cs.AI - 2023-09-22

Poster: Self-Supervised Quantization-Aware Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2309.13220
  • repo_url: None
  • paper_authors: Kaiqi Zhao, Ming Zhao
  • for: Improving the performance of quantization-aware training (QAT) without relying on label supervision.
  • methods: Proposes a novel Self-Supervised Quantization-Aware Knowledge Distillation (SQAKD) framework that unifies the forward and backward dynamics of various quantization functions and reframes QAT as a co-optimization problem that simultaneously minimizes the KL-Loss and the discretization error.
  • results: Evaluation on a variety of state-of-the-art QAT works shows that SQAKD significantly improves their performance without requiring extensive labeled training data.
    Abstract Quantization-aware training (QAT) starts with a pre-trained full-precision model and performs quantization during retraining. However, existing QAT works require supervision from the labels and they suffer from accuracy loss due to reduced precision. To address these limitations, this paper proposes a novel Self-Supervised Quantization-Aware Knowledge Distillation framework (SQAKD). SQAKD first unifies the forward and backward dynamics of various quantization functions and then reframes QAT as a co-optimization problem that simultaneously minimizes the KL-Loss and the discretization error, in a self-supervised manner. The evaluation shows that SQAKD significantly improves the performance of various state-of-the-art QAT works. SQAKD establishes stronger baselines and does not require extensive labeled training data, potentially making state-of-the-art QAT research more accessible.
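To make the co-optimization idea concrete, here is a minimal PyTorch-style sketch of a label-free loss combining a KL term between the quantized student and its full-precision teacher with a discretization-error penalty; the function name, temperature scaling, and MSE form of the discretization error are illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def sqakd_style_loss(student_logits, teacher_logits, w_quant, w_full, temperature=4.0, lam=1.0):
    """Self-supervised co-optimization sketch: no ground-truth labels are used."""
    # KL divergence between the quantized student and the full-precision teacher.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Discretization error: distance between quantized and latent full-precision weights.
    disc_err = F.mse_loss(w_quant, w_full)
    return kl + lam * disc_err
```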

AI-Copilot for Business Optimisation: A Framework and A Case Study in Production Scheduling

  • paper_url: http://arxiv.org/abs/2309.13218
  • repo_url: None
  • paper_authors: Pivithuru Thejan Amarasinghe, Su Nguyen, Yuan Sun, Damminda Alahakoon
  • for: Proposes a Large Language Model (LLM)-based synthesizer of business optimisation problem formulations, aiming to minimise the human expertise required in problem formulation.
  • methods: Adopts an LLM fine-tuning approach for the proposed AI-Copilot, introduces modularization and prompt engineering techniques to work around the token limitations of LLMs, and designs performance metrics better suited for assessing the accuracy and quality of problem formulations.
  • results: Experiments show that the approach can synthesise complex and large problem formulations for a typical business optimisation problem in production scheduling.
    Abstract Business optimisation refers to the process of finding and implementing efficient and cost-effective means of operation to bring a competitive advantage for businesses. Synthesizing problem formulations is an integral part of business optimisation, which relies on human expertise to construct problem formulations using optimisation languages. Interestingly, with advancements in Large Language Models (LLMs), the human expertise needed in problem formulation can be minimized. However, developing an LLM for problem formulation is challenging, due to training data, token limitations, and lack of appropriate performance metrics. For the requirement of training data, recent attention has been directed towards fine-tuning pre-trained LLMs for downstream tasks rather than training an LLM from scratch for a specific task. In this paper, we adopt an LLM fine-tuning approach and propose an AI-Copilot for business optimisation problem formulation. For token limitations, we introduce modularization and prompt engineering techniques to synthesize complex problem formulations as modules that fit into the token limits of LLMs. Additionally, we design performance evaluation metrics that are better suited for assessing the accuracy and quality of problem formulations. The experiment results demonstrate that with this approach we can synthesize complex and large problem formulations for a typical business optimisation problem in production scheduling.
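A hypothetical sketch of the modularization idea described in the abstract: a large formulation is synthesized module by module so that each prompt stays within the token limit. The module names, prompt template, and `call_llm` interface are illustrative assumptions, not the paper's actual design.

```python
# Synthesize a formulation in pieces that each fit the LLM's token limit.
MODULES = ["sets_and_indices", "parameters", "decision_variables", "constraints", "objective"]

PROMPT_TEMPLATE = (
    "You are an optimisation modelling assistant.\n"
    "Problem description:\n{description}\n\n"
    "Write only the '{module}' section of the model in an optimisation language."
)

def synthesize_formulation(description: str, call_llm) -> str:
    """Assemble a full formulation from per-module LLM calls."""
    parts = []
    for module in MODULES:
        prompt = PROMPT_TEMPLATE.format(description=description, module=module)
        parts.append(call_llm(prompt))  # each call stays within the token limit
    return "\n\n".join(parts)
```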

MISFIT-V: Misaligned Image Synthesis and Fusion using Information from Thermal and Visual

  • paper_url: http://arxiv.org/abs/2309.13216
  • repo_url: https://github.com/Aadharc/Visual_Thermal_Image_Fusion
  • paper_authors: Aadhar Chauhan, Isaac Remy, Danny Broyles, Karen Leung
  • for: Aims to improve the detection of humans from airborne visual and thermal imagery for Wilderness Search-and-Rescue (WiSAR) teams, increasing both the efficiency and the accuracy of search operations.
  • methods: Proposes Misaligned Image Synthesis and Fusion using Information from Thermal and Visual (MISFIT-V), a two-pronged unsupervised deep learning approach that uses a Generative Adversarial Network (GAN) and a cross-attention mechanism to fuse the visual and thermal modalities.
  • results: Experiments show that MISFIT-V is more robust against misalignment and poor lighting/thermal environmental conditions than existing visual-thermal image fusion methods.
    Abstract Detecting humans from airborne visual and thermal imagery is a fundamental challenge for Wilderness Search-and-Rescue (WiSAR) teams, who must perform this function accurately in the face of immense pressure. The ability to fuse these two sensor modalities can potentially reduce the cognitive load on human operators and/or improve the effectiveness of computer vision object detection models. However, the fusion task is particularly challenging in the context of WiSAR due to hardware limitations and extreme environmental factors. This work presents Misaligned Image Synthesis and Fusion using Information from Thermal and Visual (MISFIT-V), a novel two-pronged unsupervised deep learning approach that utilizes a Generative Adversarial Network (GAN) and a cross-attention mechanism to capture the most relevant features from each modality. Experimental results show MISFIT-V offers enhanced robustness against misalignment and poor lighting/thermal environmental conditions compared to existing visual-thermal image fusion methods.
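A minimal PyTorch sketch of cross-attention fusion between visual and thermal feature maps, to illustrate the cross-attention component mentioned in the abstract; the layer sizes, class name, and query/key assignment are assumptions and not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Visual features attend to thermal features via multi-head cross-attention."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, visual_feats: torch.Tensor, thermal_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats, thermal_feats: (batch, channels, H, W)
        b, c, h, w = visual_feats.shape
        q = visual_feats.flatten(2).transpose(1, 2)    # queries from the visual stream
        kv = thermal_feats.flatten(2).transpose(1, 2)  # keys/values from the thermal stream
        fused, _ = self.attn(q, kv, kv)
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Usage: CrossModalFusion(256)(torch.randn(1, 256, 32, 32), torch.randn(1, 256, 32, 32))
```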

Assessing the Impact of Personality on Affective States from Video Game Communication

  • paper_url: http://arxiv.org/abs/2309.13214
  • repo_url: None
  • paper_authors: Atieh Kashani, Johannes Pfau, Magy Seif El-Nasr
  • for: This paper explores the impact of personality on the way players express themselves affectively in a team-based collaborative alternate reality game.
  • methods: The authors collected chat logs from eleven players over two weeks, labeled them according to their affective state, and assessed the connection between them and the five-factor personality domains and facets using multi-linear regression.
  • results: The study found a series of reasonable correlations between (combinations of) personality variables and expressed affect, including increased confusion predicted by lower self-competence, personal annoyance predicted by vulnerability to stress, and expressing anger more often in players prone to anxiety.
    Abstract Individual differences in personality determine our preferences, traits and values, which should similarly hold for the way we express ourselves. With current advancements and transformations of technology and society, text-based communication has become ordinary and often even surpasses natural voice conversations -- with distinct challenges and opportunities. In this exploratory work, we investigate the impact of personality on the tendency how players of a team-based collaborative alternate reality game express themselves affectively. We collected chat logs from eleven players over two weeks, labeled them according to their affective state, and assessed the connection between them and the five-factor personality domains and facets. After applying multi-linear regression, we found a series of reasonable correlations between (combinations of) personality variables and expressed affect -- as increased confusion could be predicted by lower self-competence (C1), personal annoyance by vulnerability to stress (N6) and expressing anger occured more often in players that are prone to anxiety (N1), less humble and modest (A5), think less carefully before they act (C6) and have higher neuroticism (N). Expanding the data set, sample size and input modalities in subsequent work, we aim to confirm these findings and reveal even more interesting connections that could inform affective computing and games user research equally.
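An illustrative sketch of the multi-linear regression step: predicting the frequency of an expressed affect (e.g. anger) from five-factor personality facet scores. The column names and values are hypothetical placeholders, not the study's data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "N1_anxiety":      [2.1, 3.4, 1.8, 4.0],
    "A5_modesty":      [3.0, 2.2, 4.1, 1.9],
    "C6_deliberation": [3.5, 2.8, 4.2, 2.0],
    "anger_frequency": [0.05, 0.12, 0.02, 0.18],  # share of chat messages labeled as anger
})

X = df[["N1_anxiety", "A5_modesty", "C6_deliberation"]]
y = df["anger_frequency"]

model = LinearRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_)))  # sign/magnitude of each facet's association
```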

Intent-Aware Autonomous Driving: A Case Study on Highway Merging Scenarios

  • paper_url: http://arxiv.org/abs/2309.13206
  • repo_url: None
  • paper_authors: Nishtha Mahajan, Qi Zhang
  • for: Uses the communication of intent between autonomous vehicle agents as a means to facilitate cooperation.
  • methods: Implements an intent-sharing task on top of the merging environment of the highway-env simulator and, in a simple two-agent setting, investigates how intent sharing helps the receiving vehicle adjust its behavior in highway merging scenarios.
  • results: With intent sharing, the receiving vehicle can better adapt its behavior in highway merging scenarios, improving merging efficiency and safety.
    Abstract In this work, we use the communication of intent as a means to facilitate cooperation between autonomous vehicle agents. Generally speaking, intents can be any reliable information about its future behavior that a vehicle communicates with another vehicle. We implement this as an intent-sharing task atop the merging environment in the simulator of highway-env, which provides a collection of environments for learning decision-making strategies for autonomous vehicles. Under a simple setting between two agents, we carefully investigate how intent-sharing can aid the receiving vehicle in adjusting its behavior in highway merging scenarios.

A Practical Survey on Zero-shot Prompt Design for In-context Learning

  • paper_url: http://arxiv.org/abs/2309.13205
  • repo_url: None
  • paper_authors: Yinheng Li
  • for: This paper surveys in-context learning prompting techniques, including discrete, continuous, few-shot, and zero-shot prompts, and their impact on large language model (LLM) performance.
  • methods: Reviews approaches to prompt design, including manual design, optimization algorithms, and evaluation methods, for optimizing LLM performance across diverse tasks.
  • results: Summarizes key research studies in prompt engineering, their methodologies and contributions to the field, as well as the challenges of evaluating prompt performance given the absence of a single "best" prompt.
    Abstract The remarkable advancements in large language models (LLMs) have brought about significant improvements in Natural Language Processing(NLP) tasks. This paper presents a comprehensive review of in-context learning techniques, focusing on different types of prompts, including discrete, continuous, few-shot, and zero-shot, and their impact on LLM performance. We explore various approaches to prompt design, such as manual design, optimization algorithms, and evaluation methods, to optimize LLM performance across diverse tasks. Our review covers key research studies in prompt engineering, discussing their methodologies and contributions to the field. We also delve into the challenges faced in evaluating prompt performance, given the absence of a single "best" prompt and the importance of considering multiple metrics. In conclusion, the paper highlights the critical role of prompt design in harnessing the full potential of LLMs and provides insights into the combination of manual design, optimization techniques, and rigorous evaluation for more effective and efficient use of LLMs in various NLP tasks.
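A minimal illustration of two of the prompt types the survey covers: a zero-shot prompt (instruction only) versus a few-shot prompt (instruction plus in-context examples). The task and examples are made up for demonstration.

```python
zero_shot_prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

few_shot_prompt = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: Absolutely love this phone. Sentiment: positive\n"
    "Review: Screen cracked within a week. Sentiment: negative\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)
```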

Large Language Models and Control Mechanisms Improve Text Readability of Biomedical Abstracts

  • paper_url: http://arxiv.org/abs/2309.13202
  • repo_url: https://github.com/hecta-uom/plaba-mu
  • paper_authors: Zihao Li, Samuel Belkadi, Nicolo Micheletti, Lifeng Han, Matthew Shardlow, Goran Nenadic
  • for: Aims to improve the readability of biomedical literature by using natural language processing (NLP) models to automate abstract simplification, thereby improving public health literacy.
  • methods: Investigates state-of-the-art large language models (LLMs) on biomedical abstract simplification using the PLABA dataset, applying domain fine-tuning and prompt-based learning (PBL) to encoder-decoder models (T5, SciFive, BART), decoder-only GPT models (GPT-3.5, GPT-4, BioGPT), and control-token mechanisms on BART-based models.
  • results: Evaluated with automatic metrics (BLEU, ROUGE, SARI, BERTScore) and human evaluation, BART-Large with Control Token (BART-L-w-CT) achieved the highest SARI score of 46.54 and T5-base the highest BERTScore of 72.62. In human evaluation, BART-L-w-CT scored better on simplicity (2.9 vs. 2.2), while T5-base scored better on meaning preservation (3.1 vs. 2.6).
    Abstract Biomedical literature often uses complex language and inaccessible professional terminologies. That is why simplification plays an important role in improving public health literacy. Applying Natural Language Processing (NLP) models to automate such tasks allows for quick and direct accessibility for lay readers. In this work, we investigate the ability of state-of-the-art large language models (LLMs) on the task of biomedical abstract simplification, using the publicly available dataset for plain language adaptation of biomedical abstracts (\textbf{PLABA}). The methods applied include domain fine-tuning and prompt-based learning (PBL) on: 1) Encoder-decoder models (T5, SciFive, and BART), 2) Decoder-only GPT models (GPT-3.5 and GPT-4) from OpenAI and BioGPT, and 3) Control-token mechanisms on BART-based models. We used a range of automatic evaluation metrics, including BLEU, ROUGE, SARI, and BERTscore, and also conducted human evaluations. BART-Large with Control Token (BART-L-w-CT) mechanisms reported the highest SARI score of 46.54 and T5-base reported the highest BERTscore 72.62. In human evaluation, BART-L-w-CTs achieved a better simplicity score over T5-Base (2.9 vs. 2.2), while T5-Base achieved a better meaning preservation score over BART-L-w-CTs (3.1 vs. 2.6). We also categorised the system outputs with examples, hoping this will shed some light for future research on this task. Our code, fine-tuned models, and data splits are available at \url{https://github.com/HECTA-UoM/PLABA-MU}
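A minimal sketch of the control-token mechanism used with the BART-based models: tokens describing target attributes of the simplified output are prepended to the source abstract before it is fed to the model. The token names and values below are illustrative (in the spirit of prior controllable-simplification work), not necessarily the exact ones used in this paper.

```python
abstract = ("Myocardial infarction occurs when blood flow decreases to part "
            "of the heart, causing damage to the heart muscle.")

control_tokens = "<LengthRatio_0.9> <WordRankRatio_0.75> <DepthRatio_0.8>"

model_input = f"{control_tokens} {abstract}"
# `model_input` is tokenized and passed to the fine-tuned BART-L-w-CT model,
# whose generation is conditioned on the requested simplification attributes.
```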

Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation

  • paper_url: http://arxiv.org/abs/2309.13192
  • repo_url: https://github.com/pittisl/greentrainer
  • paper_authors: Kai Huang, Hanyun Yin, Heng Huang, Wei Gao
  • for: Aims to improve the energy efficiency of fine-tuning large language models (LLMs) and thereby reduce its environmental impact.
  • methods: Proposes GreenTrainer, a fine-tuning technique that adaptively evaluates the backpropagation costs and accuracy contributions of different tensors and selects the most appropriate set of tensors to train, so as to minimize fine-tuning FLOPs under a given reduction objective.
  • results: Compared to fine-tuning the whole LLM, GreenTrainer saves up to 64% of fine-tuning FLOPs without noticeable accuracy loss, and achieves up to 4% higher accuracy than existing techniques such as LoRA with on-par FLOPs reduction.
    Abstract Fine-tuning is the most effective way of adapting pre-trained large language models (LLMs) to downstream applications. With the fast growth of LLM-enabled AI applications and democratization of open-souced LLMs, fine-tuning has become possible for non-expert individuals, but intensively performed LLM fine-tuning worldwide could result in significantly high energy consumption and carbon footprint, which may bring large environmental impact. Mitigating such environmental impact towards Green AI directly correlates to reducing the FLOPs of fine-tuning, but existing techniques on efficient LLM fine-tuning can only achieve limited reduction of such FLOPs, due to their ignorance of the backpropagation cost in fine-tuning. To address this limitation, in this paper we present GreenTrainer, a new LLM fine-tuning technique that adaptively evaluates different tensors' backpropagation costs and contributions to the fine-tuned model accuracy, to minimize the fine-tuning cost by selecting the most appropriate set of tensors in training. Such selection in GreenTrainer is made based on a given objective of FLOPs reduction, which can flexibly adapt to the carbon footprint in energy supply and the need in Green AI. Experiment results over multiple open-sourced LLM models and abstractive summarization datasets show that, compared to fine-tuning the whole LLM model, GreenTrainer can save up to 64% FLOPs in fine-tuning without any noticeable model accuracy loss. Compared to the existing fine-tuning techniques such as LoRa, GreenTrainer can achieve up to 4% improvement on model accuracy with on-par FLOPs reduction.
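A simplified sketch (not the paper's actual algorithm) of the underlying idea of selective backpropagation: parameter tensors are ranked and kept trainable greedily until the estimated backward-pass FLOPs meet a given budget. Both `estimate_backward_flops` and the ranking heuristic are stand-in assumptions.

```python
import torch.nn as nn

def estimate_backward_flops(param) -> float:
    return 2.0 * param.numel()  # crude proxy for the cost of computing its gradient

def select_trainable_tensors(model: nn.Module, flops_budget: float) -> float:
    """Freeze tensors whose gradients would exceed the FLOPs budget."""
    params = list(model.named_parameters())
    ranked = list(reversed(params))  # stand-in importance: later layers first
    spent = 0.0
    for name, p in ranked:
        cost = estimate_backward_flops(p)
        if spent + cost <= flops_budget:
            p.requires_grad_(True)   # keep this tensor in the backward pass
            spent += cost
        else:
            p.requires_grad_(False)  # freeze it to save FLOPs
    return spent
```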

Masked Discriminators for Content-Consistent Unpaired Image-to-Image Translation

  • paper_url: http://arxiv.org/abs/2309.13188
  • repo_url: https://github.com/bonifazstuhr/feamgan
  • paper_authors: Bonifaz Stuhr, Jürgen Brauer, Bernhard Schick, Jordi Gonzàlez
  • for: Aims to improve unpaired image-to-image translation, in particular the content inconsistencies that arise during translation in practical settings such as sim-to-real and weather translation.
  • methods: Masks the inputs of a global discriminator for both domains with a content-based mask, adds a local discriminator operating on pairs of small crops selected with a similarity sampling strategy (also used to sample global input crops), and applies feature-attentive denormalization to selectively incorporate content-based statistics into the generator stream.
  • results: Experiments show state-of-the-art performance in photorealistic sim-to-real translation and weather translation and good performance in day-to-night translation; the paper also proposes the cKVD metric, which builds on sKVD to examine translation quality at the class or category level.
    Abstract A common goal of unpaired image-to-image translation is to preserve content consistency between source images and translated images while mimicking the style of the target domain. Due to biases between the datasets of both domains, many methods suffer from inconsistencies caused by the translation process. Most approaches introduced to mitigate these inconsistencies do not constrain the discriminator, leading to an even more ill-posed training setup. Moreover, none of these approaches is designed for larger crop sizes. In this work, we show that masking the inputs of a global discriminator for both domains with a content-based mask is sufficient to reduce content inconsistencies significantly. However, this strategy leads to artifacts that can be traced back to the masking process. To reduce these artifacts, we introduce a local discriminator that operates on pairs of small crops selected with a similarity sampling strategy. Furthermore, we apply this sampling strategy to sample global input crops from the source and target dataset. In addition, we propose feature-attentive denormalization to selectively incorporate content-based statistics into the generator stream. In our experiments, we show that our method achieves state-of-the-art performance in photorealistic sim-to-real translation and weather translation and also performs well in day-to-night translation. Additionally, we propose the cKVD metric, which builds on the sKVD metric and enables the examination of translation quality at the class or category level.
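A minimal sketch of the masked-discriminator idea: the global discriminator only sees regions selected by a content-based mask, here with a generic hinge loss. The masking and loss details are assumptions for illustration; the paper's exact formulation may differ.

```python
import torch

def masked_discriminator_loss(disc, real_target, fake_target, content_mask):
    """Mask both domains' discriminator inputs before the adversarial loss."""
    real_masked = real_target * content_mask
    fake_masked = fake_target * content_mask
    d_real = disc(real_masked)
    d_fake = disc(fake_masked.detach())
    # Hinge loss on the masked inputs (illustrative choice).
    return torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()
```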

Diagnosing and exploiting the computational demands of videos games for deep reinforcement learning

  • paper_url: http://arxiv.org/abs/2309.13181
  • repo_url: None
  • paper_authors: Lakshmi Narasimhan Govindarajan, Rex G Liu, Drew Linsley, Alekh Karkada Ashok, Max Reuter, Michael J Frank, Thomas Serre
  • for: Investigates whether the successes of deep reinforcement learning (dRL) in video games reflect advances in visual representation learning, the effectiveness of reinforcement learning algorithms at discovering better policies, or both.
  • methods: Introduces the Learning Challenge Diagnosticator (LCD), a tool that separately measures the perceptual and reinforcement learning demands of a task; using LCD, the authors discover a novel taxonomy of challenges in the Procgen benchmark and show that its predictions are highly reliable and can instruct algorithmic development.
  • results: LCD reveals multiple failure cases that can occur when optimizing dRL algorithms over entire video game benchmarks such as Procgen, and provides a pathway towards more efficient progress.
    Abstract Humans learn by interacting with their environments and perceiving the outcomes of their actions. A landmark in artificial intelligence has been the development of deep reinforcement learning (dRL) algorithms capable of doing the same in video games, on par with or better than humans. However, it remains unclear whether the successes of dRL models reflect advances in visual representation learning, the effectiveness of reinforcement learning algorithms at discovering better policies, or both. To address this question, we introduce the Learning Challenge Diagnosticator (LCD), a tool that separately measures the perceptual and reinforcement learning demands of a task. We use LCD to discover a novel taxonomy of challenges in the Procgen benchmark, and demonstrate that these predictions are both highly reliable and can instruct algorithmic development. More broadly, the LCD reveals multiple failure cases that can occur when optimizing dRL algorithms over entire video game benchmarks like Procgen, and provides a pathway towards more efficient progress.

AI Risk Profiles: A Standards Proposal for Pre-Deployment AI Risk Disclosures

  • paper_url: http://arxiv.org/abs/2309.13176
  • repo_url: None
  • paper_authors: Eli Sherman, Ian W. Eisenberg
  • for: Proposes a risk-profiling standard to guide downstream decision-making, including triaging further risk assessment, informing procurement and deployment, and directing regulatory frameworks.
  • methods: Builds on the authors' proposed taxonomy of AI risks, a high-level categorization of the wide variety of risks proposed in the literature, and introduces a template-based methodology for collating risk information into a standard, yet flexible, structure.
  • results: Applying the methodology to a number of prominent AI systems using publicly available information, the authors show how the resulting Risk Profiles can help consumers understand the risks of an AI system and guide downstream decisions.
    Abstract As AI systems' sophistication and proliferation have increased, awareness of the risks has grown proportionally (Sorkin et al. 2023). In response, calls have grown for stronger emphasis on disclosure and transparency in the AI industry (NTIA 2023; OpenAI 2023b), with proposals ranging from standardizing use of technical disclosures, like model cards (Mitchell et al. 2019), to yet-unspecified licensing regimes (Sindhu 2023). Since the AI value chain is complicated, with actors representing various expertise, perspectives, and values, it is crucial that consumers of a transparency disclosure be able to understand the risks of the AI system the disclosure concerns. In this paper we propose a risk profiling standard which can guide downstream decision-making, including triaging further risk assessment, informing procurement and deployment, and directing regulatory frameworks. The standard is built on our proposed taxonomy of AI risks, which reflects a high-level categorization of the wide variety of risks proposed in the literature. We outline the myriad data sources needed to construct informative Risk Profiles and propose a template-based methodology for collating risk information into a standard, yet flexible, structure. We apply this methodology to a number of prominent AI systems using publicly available information. To conclude, we discuss design decisions for the profiles and future work.
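A hypothetical sketch of what a template-based Risk Profile structure could look like; the field names and risk categories below are illustrative and not the paper's actual standard.

```python
from dataclasses import dataclass, field

@dataclass
class RiskEntry:
    category: str          # e.g. a category from the proposed risk taxonomy
    description: str       # how this risk manifests for the system
    evidence: list = field(default_factory=list)  # public sources consulted
    mitigation: str = ""   # known mitigations, if disclosed

@dataclass
class RiskProfile:
    system_name: str
    developer: str
    intended_use: str
    risks: list = field(default_factory=list)

profile = RiskProfile(
    system_name="ExampleLLM",
    developer="Example Corp",
    intended_use="General-purpose text generation",
    risks=[RiskEntry(category="Misinformation",
                     description="May generate plausible but false statements.")],
)
```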

Investigating Efficient Deep Learning Architectures For Side-Channel Attacks on AES

  • paper_url: http://arxiv.org/abs/2309.13170
  • repo_url: None
  • paper_authors: Yohaï-Eliel Berreby, Laurent Sauvage
  • for: Aims to make deep-learning-based side-channel attacks on embedded cryptographic (AES) implementations more effective while reducing the computing resources and data they require.
  • methods: Builds a JAX-based framework for deep-learning-based side-channel analysis and investigates various Transformer-based models, in order to reproduce and improve upon previous results.
  • results: The authors reproduce a selection of previously published attack results on the ANSSI Side-Channel Attack Database (ASCAD) and build upon them in an attempt to improve their performance.
    Abstract Over the past few years, deep learning has been getting progressively more popular for the exploitation of side-channel vulnerabilities in embedded cryptographic applications, as it offers advantages in terms of the amount of attack traces required for effective key recovery. A number of effective attacks using neural networks have already been published, but reducing their cost in terms of the amount of computing resources and data required is an ever-present goal, which we pursue in this work. We focus on the ANSSI Side-Channel Attack Database (ASCAD), and produce a JAX-based framework for deep-learning-based SCA, with which we reproduce a selection of previous results and build upon them in an attempt to improve their performance. We also investigate the effectiveness of various Transformer-based models.

Large Language Models Are Also Good Prototypical Commonsense Reasoners

  • paper_url: http://arxiv.org/abs/2309.13165
  • repo_url: None
  • paper_authors: Chenin Li, Qianglong Chen, Yin Zhang, Yifei Zhang, Hongxiang Yao
  • for: The paper aims to improve the performance of large language models on complex commonsense reasoning tasks by developing novel prompts that better support the models' reasoning abilities.
  • methods: The authors draw inspiration from the outputs of large models for tailored tasks and semi-automatically develop a set of novel prompts from multiple perspectives, including task-relevance, supportive evidence generation (e.g. chain-of-thought and knowledge), and diverse path decoding.
  • results: On the ProtoQA dataset, the proposed prompts achieve a new state of the art (SOTA) on the leaderboard, improving the Max Answer@1 score by 8% and the Max Incorrect@1 score by 4% over the previous SOTA model, with further gains on StrategyQA and CommonsenseQA2.0 (3% and 1%, respectively); the generated Chain-of-Thought and knowledge also improve the interpretability of the model.
    Abstract Commonsense reasoning is a pivotal skill for large language models, yet it presents persistent challenges in specific tasks requiring this competence. Traditional fine-tuning approaches can be resource-intensive and potentially compromise a model's generalization capacity. Furthermore, state-of-the-art language models like GPT-3.5 and Claude are primarily accessible through API calls, which makes fine-tuning models challenging. To address these challenges, we draw inspiration from the outputs of large models for tailored tasks and semi-automatically developed a set of novel prompts from several perspectives, including task-relevance, supportive evidence generation (e.g. chain-of-thought and knowledge), diverse path decoding to aid the model. Experimental results on ProtoQA dataset demonstrate that with better designed prompts we can achieve the new state-of-art(SOTA) on the ProtoQA leaderboard, improving the Max Answer@1 score by 8%, Max Incorrect@1 score by 4% (breakthrough 50% for the first time) compared to the previous SOTA model and achieved an improvement on StrategyQA and CommonsenseQA2.0 (3% and 1%, respectively). Furthermore, with the generated Chain-of-Thought and knowledge, we can improve the interpretability of the model while also surpassing the previous SOTA models. We hope that our work can provide insight for the NLP community to develop better prompts and explore the potential of large language models for more complex reasoning tasks.

GAMIX-VAE: A VAE with Gaussian Mixture Based Posterior

  • paper_url: http://arxiv.org/abs/2309.13160
  • repo_url: None
  • paper_authors: Mariano Rivera
  • for: Examines the Kullback-Leibler (KL) divergence in Variational Autoencoders (VAEs), a critical component of the Evidence Lower Bound (ELBO) that governs the trade-off between reconstruction accuracy and regularization in generative modeling and representation learning.
  • methods: Redefines the ELBO with a mixture of Gaussians for the posterior probability, introduces a regularization term to prevent variance collapse, and employs a PatchGAN discriminator to enhance texture realism; both the encoder and decoder use ResNetV2 architectures.
  • results: Experiments demonstrate the ability to generate realistic faces, offering a promising solution for enhancing VAE-based generative models.
    Abstract Variational Autoencoders (VAEs) have become a cornerstone in generative modeling and representation learning within machine learning. This paper explores a nuanced aspect of VAEs, focusing on interpreting the Kullback Leibler (KL) Divergence, a critical component within the Evidence Lower Bound (ELBO) that governs the trade-off between reconstruction accuracy and regularization. While the KL Divergence enforces alignment between latent variable distributions and a prior imposing a structure on the overall latent space but leaves individual variable distributions unconstrained. The proposed method redefines the ELBO with a mixture of Gaussians for the posterior probability, introduces a regularization term to prevent variance collapse, and employs a PatchGAN discriminator to enhance texture realism. Implementation details involve ResNetV2 architectures for both the Encoder and Decoder. The experiments demonstrate the ability to generate realistic faces, offering a promising solution for enhancing VAE based generative models.
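For reference, a standard way to write the ELBO when the approximate posterior is a K-component Gaussian mixture (the general form the abstract refers to; the paper's exact objective additionally includes a variance-collapse regularizer and an adversarial PatchGAN term) is:

```latex
q_\phi(z \mid x) = \sum_{k=1}^{K} \pi_k(x)\, \mathcal{N}\!\big(z \mid \mu_k(x), \operatorname{diag}(\sigma_k^2(x))\big),
\qquad
\mathcal{L}_{\mathrm{ELBO}}(x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big).
```

Unlike the single-Gaussian case, the KL term has no closed form here and is typically estimated by sampling or bounded.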

Contextual Emotion Estimation from Image Captions

  • paper_url: http://arxiv.org/abs/2309.13136
  • repo_url: None
  • paper_authors: Vera Yang, Archita Srivastava, Yasaman Etesam, Chuxuan Zhang, Angelica Lim
  • for: Explores whether Large Language Models (LLMs) can support contextual emotion estimation by first captioning images and then using an LLM for inference.
  • methods: Proposes a set of natural language descriptors for faces, bodies, interactions, and environments, uses them to manually generate captions and emotion annotations for a subset of 331 images from the EMOTIC dataset, and then tests whether an LLM can infer a person's emotion from the resulting captions.
  • results: GPT-3.5 (specifically the text-davinci-003 model) provides surprisingly reasonable emotion predictions consistent with human annotations, although accuracy can depend on the emotion concept; overall, the results suggest promise in the image captioning and LLM approach.
    Abstract Emotion estimation in images is a challenging task, typically using computer vision methods to directly estimate people's emotions using face, body pose and contextual cues. In this paper, we explore whether Large Language Models (LLMs) can support the contextual emotion estimation task, by first captioning images, then using an LLM for inference. First, we must understand: how well do LLMs perceive human emotions? And which parts of the information enable them to determine emotions? One initial challenge is to construct a caption that describes a person within a scene with information relevant for emotion perception. Towards this goal, we propose a set of natural language descriptors for faces, bodies, interactions, and environments. We use them to manually generate captions and emotion annotations for a subset of 331 images from the EMOTIC dataset. These captions offer an interpretable representation for emotion estimation, towards understanding how elements of a scene affect emotion perception in LLMs and beyond. Secondly, we test the capability of a large language model to infer an emotion from the resulting image captions. We find that GPT-3.5, specifically the text-davinci-003 model, provides surprisingly reasonable emotion predictions consistent with human annotations, but accuracy can depend on the emotion concept. Overall, the results suggest promise in the image captioning and LLM approach.
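An illustrative sketch of the caption-then-infer pipeline: a scene caption built from face/body/interaction/environment descriptors is given to an LLM, which is asked to name the person's likely emotion. The caption text and prompt wording are made-up examples, not the paper's exact descriptors.

```python
caption = (
    "A woman with furrowed brows and crossed arms stands in a crowded subway "
    "station while another person gestures toward a delayed-train notice."
)

prompt = (
    f"Scene description: {caption}\n"
    "Question: Based on the description, which emotions is the woman most "
    "likely feeling? Answer with up to three emotion words and a brief reason."
)
# `prompt` would then be sent to an LLM (the paper uses GPT-3.5, text-davinci-003).
```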

Insights from an OTTR-centric Ontology Engineering Methodology

  • paper_url: http://arxiv.org/abs/2309.13130
  • repo_url: None
  • paper_authors: Moritz Blum, Basil Ell, Philipp Cimiano
  • for: This paper discusses the use of OTTR templates in ontology engineering for the domain of Material Science.
  • methods: The paper uses a bottom-up and top-down approach to ontology engineering, starting with existing data and using OTTR templates to feed the data into a knowledge graph.
  • results: The paper finds that OTTR templates are especially useful as a means of communication with domain experts and that the engineering process becomes flexible as a result of encapsulating modeling decisions.
    Abstract OTTR is a language for representing ontology modeling patterns, which enables to build ontologies or knowledge bases by instantiating templates. Thereby, particularities of the ontological representation language are hidden from the domain experts, and it enables ontology engineers to, to some extent, separate the processes of deciding about what information to model from deciding about how to model the information, e.g., which design patterns to use. Certain decisions can thus be postponed for the benefit of focusing on one of these processes. To date, only few works on ontology engineering where ontology templates are applied are described in the literature. In this paper, we outline our methodology and report findings from our ontology engineering activities in the domain of Material Science. In these activities, OTTR templates play a key role. Our ontology engineering process is bottom-up, as we begin modeling activities from existing data that is then, via templates, fed into a knowledge graph, and it is top-down, as we first focus on which data to model and postpone the decision of how to model the data. We find, among other things, that OTTR templates are especially useful as a means of communication with domain experts. Furthermore, we find that because OTTR templates encapsulate modeling decisions, the engineering process becomes flexible, meaning that design decisions can be changed at little cost.

E(2)-Equivariant Graph Planning for Navigation

  • paper_url: http://arxiv.org/abs/2309.13043
  • repo_url: None
  • paper_authors: Linfeng Zhao, Hongyu Li, Taskin Padir, Huaizu Jiang, Lawson L. S. Wong
  • for: Aims to improve the learning efficiency, stability, and generalization of robot navigation, given the scarcity and cost of real-world datasets.
  • methods: Exploits Euclidean symmetry in planning for 2D navigation, which originates from Euclidean transformations between reference frames and enables parameter sharing; for unstructured environments, the navigation problem is formulated as planning on a geometric graph and an equivariant message passing network performs value iteration, with a learnable equivariant layer that lifts multi-camera features to the desired space.
  • results: Evaluations across five diverse tasks, covering structured and unstructured environments with known and unknown maps and point or semantic goals, confirm substantial benefits in training efficiency, stability, and generalization.
    Abstract Learning for robot navigation presents a critical and challenging task. The scarcity and costliness of real-world datasets necessitate efficient learning approaches. In this letter, we exploit Euclidean symmetry in planning for 2D navigation, which originates from Euclidean transformations between reference frames and enables parameter sharing. To address the challenges of unstructured environments, we formulate the navigation problem as planning on a geometric graph and develop an equivariant message passing network to perform value iteration. Furthermore, to handle multi-camera input, we propose a learnable equivariant layer to lift features to a desired space. We conduct comprehensive evaluations across five diverse tasks encompassing structured and unstructured environments, along with maps of known and unknown, given point goals or semantic goals. Our experiments confirm the substantial benefits on training efficiency, stability, and generalization.

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

  • paper_url: http://arxiv.org/abs/2309.13042
  • repo_url: https://github.com/jiahao000/mosaicfusion
  • paper_authors: Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy
  • for: Proposes a new data augmentation approach to improve the performance of large-vocabulary instance segmentation models.
  • methods: MosaicFusion, a training-free, diffusion-based data augmentation method that requires no label supervision: an off-the-shelf text-to-image diffusion model generates multiple object instances simultaneously by dividing the image canvas into regions conditioned on different text prompts, and instance masks are obtained by aggregating cross-attention maps across layers and diffusion time steps, followed by thresholding and edge-aware refinement.
  • results: Experiments on the LVIS long-tailed and open-vocabulary benchmarks show that the generated synthetic labeled data significantly improves existing instance segmentation models, especially for rare and novel categories.
    Abstract We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation. Our method is training-free and does not rely on any label supervision. Two key designs enable us to employ an off-the-shelf text-to-image diffusion model as a useful dataset generator for object instances and mask annotations. First, we divide an image canvas into several regions and perform a single round of diffusion process to generate multiple instances simultaneously, conditioning on different text prompts. Second, we obtain corresponding instance masks by aggregating cross-attention maps associated with object prompts across layers and diffusion time steps, followed by simple thresholding and edge-aware refinement processing. Without bells and whistles, our MosaicFusion can produce a significant amount of synthetic labeled data for both rare and novel categories. Experimental results on the challenging LVIS long-tailed and open-vocabulary benchmarks demonstrate that MosaicFusion can significantly improve the performance of existing instance segmentation models, especially for rare and novel categories. Code will be released at https://github.com/Jiahao000/MosaicFusion.
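A simplified sketch of the mask-extraction step described in the abstract: cross-attention maps associated with an object prompt are averaged over layers and diffusion time steps, normalized, and thresholded into a binary instance mask. Collecting the attention maps from a particular diffusion pipeline, and the edge-aware refinement, are assumed to happen elsewhere.

```python
import numpy as np

def extract_instance_mask(attn_maps, threshold: float = 0.5) -> np.ndarray:
    """attn_maps: list of per-(layer, timestep) cross-attention maps for one
    object token, each of shape (H, W) and resized to a common resolution."""
    agg = np.mean(np.stack(attn_maps, axis=0), axis=0)        # aggregate over layers/timesteps
    agg = (agg - agg.min()) / (agg.max() - agg.min() + 1e-8)  # normalize to [0, 1]
    return (agg > threshold).astype(np.uint8)                 # simple thresholding
```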

Memory-augmented conformer for improved end-to-end long-form ASR

  • paper_url: http://arxiv.org/abs/2309.13029
  • repo_url: https://github.com/miamoto/conformer-ntm
  • paper_authors: Carlos Carvalho, Alberto Abad
  • for: Improving end-to-end automatic speech recognition (ASR) models, in particular their performance on long utterances.
  • methods: Adds a fully-differentiable external memory network, a neural Turing machine (NTM), between the encoder and decoder of a conformer, allowing the system to store and retrieve information recurrently and thus generalize better to long utterances.
  • results: On the Librispeech train-clean-100 and train-960 sets, the proposed Conformer-NTM outperforms the baseline conformer without memory on long utterances.
    Abstract Conformers have recently been proposed as a promising modelling approach for automatic speech recognition (ASR), outperforming recurrent neural network-based approaches and transformers. Nevertheless, in general, the performance of these end-to-end models, especially attention-based models, is particularly degraded in the case of long utterances. To address this limitation, we propose adding a fully-differentiable memory-augmented neural network between the encoder and decoder of a conformer. This external memory can enrich the generalization for longer utterances since it allows the system to store and retrieve more information recurrently. Notably, we explore the neural Turing machine (NTM) that results in our proposed Conformer-NTM model architecture for ASR. Experimental results using Librispeech train-clean-100 and train-960 sets show that the proposed system outperforms the baseline conformer without memory for long utterances.
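For context, a minimal PyTorch sketch of the generic NTM content-based read operation (the standard mechanism, not necessarily the exact variant used in this paper): the controller emits a key, addressing weights are a sharpened cosine-similarity softmax over memory slots, and the read vector is their weighted sum.

```python
import torch
import torch.nn.functional as F

def ntm_content_read(memory: torch.Tensor, key: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """memory: (num_slots, slot_dim); key: (slot_dim,). Returns a (slot_dim,) read vector."""
    sim = F.cosine_similarity(memory, key.unsqueeze(0), dim=-1)  # (num_slots,)
    weights = torch.softmax(beta * sim, dim=0)                   # content-based addressing weights
    return weights @ memory                                      # weighted sum over memory slots
```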

OpportunityFinder: A Framework for Automated Causal Inference

  • paper_url: http://arxiv.org/abs/2309.13103
  • repo_url: None
  • paper_authors: Huy Nguyen, Prince Grover, Devashish Khatwani
  • for: A code-less framework that lets non-expert users perform a variety of causal inference studies with panel data.
  • methods: Users provide raw observational data and a configuration file; a pipeline is then triggered that inspects and processes the data, chooses the suitable algorithm(s), and executes the causal study.
  • results: The framework returns the causal impact of the treatment on the configured outcome, together with sensitivity and robustness results.
    Abstract We introduce OpportunityFinder, a code-less framework for performing a variety of causal inference studies with panel data for non-expert users. In its current state, OpportunityFinder only requires users to provide raw observational data and a configuration file. A pipeline is then triggered that inspects/processes data, chooses the suitable algorithm(s) to execute the causal study. It returns the causal impact of the treatment on the configured outcome, together with sensitivity and robustness results. Causal inference is widely studied and used to estimate the downstream impact of individual's interactions with products and features. It is common that these causal studies are performed by scientists and/or economists periodically. Business stakeholders are often bottle-necked on scientist or economist bandwidth to conduct causal studies. We offer OpportunityFinder as a solution for commonly performed causal studies with four key features: (1) easy to use for both Business Analysts and Scientists, (2) abstraction of multiple algorithms under a single I/O interface, (3) support for causal impact analysis under binary treatment with panel data and (4) dynamic selection of algorithm based on scale of data.
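A hypothetical example of what the user-provided configuration could look like; the field names and values are illustrative assumptions, not OpportunityFinder's actual schema.

```python
config = {
    "data_path": "s3://bucket/panel_data.parquet",
    "unit_column": "customer_id",
    "time_column": "week",
    "treatment_column": "received_feature",   # binary treatment
    "outcome_column": "weekly_purchases",
    "covariates": ["tenure_months", "region"],
    "algorithm": "auto",                       # let the pipeline pick based on data scale
    "robustness_checks": ["placebo_treatment", "leave_one_out"],
}
```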

A Hybrid Deep Learning-based Approach for Optimal Genotype by Environment Selection

  • paper_url: http://arxiv.org/abs/2309.13021
  • repo_url: None
  • paper_authors: Zahra Khalilzadeh, Motahareh Kashanian, Saeed Khaki, Lizhi Wang
  • for: The paper aims to improve crop yield prediction by integrating weather data across the growing season, especially for different crop varieties, to understand their adaptability in the face of climate change.
  • methods: The authors used a dataset of 93,028 training records and 10,337 test records, covering 159 locations across 28 U.S. states and Canadian provinces over 13 years (2003-2015), and developed two novel convolutional neural network (CNN) architectures, the CNN-DNN model and the CNN-LSTM-DNN model, combined with the Generalized Ensemble Method (GEM) to determine optimal model weights.
  • results: The GEM model achieved lower RMSE (5.55% to 39.88%), reduced MAE (5.34% to 43.76%), and higher correlation coefficients (1.1% to 10.79%) compared to baseline models; the CNN-DNN model was used to identify top-performing genotypes for various locations and weather conditions, aiding genotype selection based on weather variables.
    Abstract Precise crop yield prediction is essential for improving agricultural practices and ensuring crop resilience in varying climates. Integrating weather data across the growing season, especially for different crop varieties, is crucial for understanding their adaptability in the face of climate change. In the MLCAS2021 Crop Yield Prediction Challenge, we utilized a dataset comprising 93,028 training records to forecast yields for 10,337 test records, covering 159 locations across 28 U.S. states and Canadian provinces over 13 years (2003-2015). This dataset included details on 5,838 distinct genotypes and daily weather data for a 214-day growing season, enabling comprehensive analysis. As one of the winning teams, we developed two novel convolutional neural network (CNN) architectures: the CNN-DNN model, combining CNN and fully-connected networks, and the CNN-LSTM-DNN model, with an added LSTM layer for weather variables. Leveraging the Generalized Ensemble Method (GEM), we determined optimal model weights, resulting in superior performance compared to baseline models. The GEM model achieved lower RMSE (5.55% to 39.88%), reduced MAE (5.34% to 43.76%), and higher correlation coefficients (1.1% to 10.79%) when evaluated on test data. We applied the CNN-DNN model to identify top-performing genotypes for various locations and weather conditions, aiding genotype selection based on weather variables. Our data-driven approach is valuable for scenarios with limited testing years. Additionally, a feature importance analysis using RMSE change highlighted the significance of location, MG, year, and genotype, along with the importance of weather variables MDNI and AP.
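A simplified sketch of the ensemble-weighting step: find the convex combination of the two sub-models' validation predictions that minimizes RMSE. The grid search below is an illustrative simplification of how GEM weights could be determined, not the paper's exact procedure.

```python
import numpy as np

def gem_weights(preds_a: np.ndarray, preds_b: np.ndarray, y_val: np.ndarray):
    """preds_a, preds_b: validation predictions of CNN-DNN and CNN-LSTM-DNN."""
    best_w, best_rmse = 0.0, np.inf
    for w in np.linspace(0.0, 1.0, 101):           # weight assigned to model A
        blended = w * preds_a + (1.0 - w) * preds_b
        rmse = np.sqrt(np.mean((blended - y_val) ** 2))
        if rmse < best_rmse:
            best_w, best_rmse = w, rmse
    return best_w, 1.0 - best_w, best_rmse
```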

Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design

  • paper_url: http://arxiv.org/abs/2309.13015
  • repo_url: None
  • paper_authors: Chao Fang, Wei Sun, Aojun Zhou, Zhongfeng Wang
  • for: Reducing the computational cost of DNN training with N:M fine-grained structured sparsity while retaining high accuracy.
  • methods: Proposes a computation-efficient training scheme co-designed across algorithm, architecture, and dataflow: a bidirectional weight pruning method (BDWP) that exploits the N:M sparsity of weights during both the forward and backward passes; a sparse accelerator for DNN training (SAT) that supports both regular dense operations and computation-efficient N:M sparse operations; and dataflow optimizations including interleave mapping, pre-generation of N:M sparse weights, and offline scheduling.
  • results: On a Xilinx VCU1525 FPGA card with various DNN models and datasets, SAT with BDWP at a 2:8 sparse ratio achieves an average 1.75x speedup over dense training with a negligible average accuracy loss of 0.56%, and improves training throughput by 2.97-25.22x and energy efficiency by 1.36-3.58x over prior FPGA-based accelerators.
    Abstract Sparse training is one of the promising techniques to reduce the computational cost of DNNs while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where only N out of consecutive M elements can be nonzero, has attracted attention due to its hardware-friendly pattern and capability of achieving a high sparse ratio. However, the potential to accelerate N:M sparse DNN training has not been fully exploited, and there is a lack of efficient hardware supporting N:M sparse training. To tackle these challenges, this paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design. At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both forward and backward passes of DNN training, which can significantly reduce the computational cost while maintaining model accuracy. At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to neatly support both the regular dense operations and the computation-efficient N:M sparse operations. At the dataflow level, multiple optimization methods ranging from interleave mapping, pre-generation of N:M sparse weights, and offline scheduling, are proposed to boost the computational efficiency of SAT. Finally, the effectiveness of our training scheme is evaluated on a Xilinx VCU1525 FPGA card using various DNN models and datasets. Experimental results show the SAT accelerator with the BDWP sparse training method under 2:8 sparse ratio achieves an average speedup of 1.75x over that with the dense training, accompanied by a negligible accuracy loss of 0.56% on average. Furthermore, our proposed training scheme significantly improves the training throughput by 2.97~25.22x and the energy efficiency by 1.36~3.58x over prior FPGA-based accelerators.
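To illustrate the N:M sparsity pattern itself, here is a minimal PyTorch sketch that, in every group of M consecutive weights, keeps the N entries with the largest magnitude and zeroes the rest. This shows only the pattern; BDWP additionally applies such sparsity during the backward pass, and the paper's pruning criterion may differ.

```python
import torch

def nm_sparse_mask(weights: torch.Tensor, n: int = 2, m: int = 8) -> torch.Tensor:
    """Return a 0/1 mask enforcing N:M sparsity along the last dimension."""
    assert weights.shape[-1] % m == 0
    groups = weights.reshape(-1, m)
    idx = groups.abs().topk(n, dim=-1).indices   # positions of the N largest magnitudes
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, idx, 1.0)
    return mask.reshape(weights.shape)

# Usage: w_sparse = w * nm_sparse_mask(w, n=2, m=8)
```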

ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs

  • paper_url: http://arxiv.org/abs/2309.13007
  • repo_url: https://github.com/dinobby/reconcile
  • paper_authors: Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal
  • for: Improving the complex reasoning capabilities of large language models (LLMs).
  • methods: Proposes ReConcile, a multi-model multi-agent framework designed as a round-table conference among diverse LLM agents (ChatGPT, Bard, and Claude2): the agents hold multiple rounds of discussion, learn to convince each other to improve their answers, and a confidence-weighted voting mechanism determines the final answer.
  • results: Experiments on multiple benchmarks show that ReConcile significantly enhances the agents' reasoning performance, both individually and as a team, surpassing prior single-agent and multi-agent baselines by 7.7% and outperforming GPT-4 on some datasets.
    Abstract Large Language Models (LLMs) still struggle with complex reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents to foster diverse thoughts and discussion for improved consensus. ReConcile enhances the reasoning capabilities of LLMs by holding multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their uncertainties, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents. This discussion prompt enables each agent to revise their responses in light of insights from other agents. Once a consensus is reached and the discussion ends, ReConcile determines the final answer by leveraging the confidence of each agent in a weighted voting scheme. We implement ReConcile with ChatGPT, Bard, and Claude2 as the three agents. Our experimental results on various benchmarks demonstrate that ReConcile significantly enhances the reasoning performance of the agents (both individually and as a team), surpassing prior single-agent and multi-agent baselines by 7.7% and also outperforming GPT-4 on some of these datasets. We also experiment with GPT-4 itself as one of the agents in ReConcile and demonstrate that its initial performance also improves by absolute 10.0% through discussion and feedback from other agents. Finally, we also analyze the accuracy after every round and observe that ReConcile achieves better and faster consensus between agents, compared to a multi-agent debate baseline. Our code is available at: https://github.com/dinobby/ReConcile
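A minimal sketch of the confidence-weighted voting step that ends the discussion: each agent's final answer is weighted by its stated confidence, and the answer with the highest total weight wins. The agents' answers and confidences below are placeholders.

```python
from collections import defaultdict

def confidence_weighted_vote(agent_outputs):
    """agent_outputs: list of (answer, confidence) pairs, one per agent."""
    scores = defaultdict(float)
    for answer, confidence in agent_outputs:
        scores[answer] += confidence
    return max(scores, key=scores.get)

final = confidence_weighted_vote([
    ("Paris", 0.9),   # e.g. ChatGPT
    ("Paris", 0.6),   # e.g. Bard
    ("Lyon",  0.7),   # e.g. Claude2
])
print(final)  # "Paris"
```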

Pursuing Counterfactual Fairness via Sequential Autoencoder Across Domains

  • paper_url: http://arxiv.org/abs/2309.13005
  • repo_url: None
  • paper_authors: Yujie Lin, Chen Zhao, Minglai Shao, Baoluo Meng, Xujiang Zhao, Haifeng Chen
  • for: Improving the performance of machine learning systems on out-of-distribution data while maintaining fairness as data distributions gradually change across a sequence of domains.
  • methods: Proposes Counterfactual Fairness-Aware Domain Generalization with Sequential Autoencoder (CDSAE), a framework rooted in causal inference that separates environmental information and sensitive attributes from the embedded representation of classification features by categorizing exogenous uncertainty into four latent variables, improving generalization to diverse and unfamiliar domains while addressing unfair classification.
  • results: Empirical validation on synthetic and real-world datasets demonstrates improved accuracy while preserving fairness in the evolving landscape of continuous domains.
    Abstract Recognizing the prevalence of domain shift as a common challenge in machine learning, various domain generalization (DG) techniques have been developed to enhance the performance of machine learning systems when dealing with out-of-distribution (OOD) data. Furthermore, in real-world scenarios, data distributions can gradually change across a sequence of sequential domains. While current methodologies primarily focus on improving model effectiveness within these new domains, they often overlook fairness issues throughout the learning process. In response, we introduce an innovative framework called Counterfactual Fairness-Aware Domain Generalization with Sequential Autoencoder (CDSAE). This approach effectively separates environmental information and sensitive attributes from the embedded representation of classification features. This concurrent separation not only greatly improves model generalization across diverse and unfamiliar domains but also effectively addresses challenges related to unfair classification. Our strategy is rooted in the principles of causal inference to tackle these dual issues. To examine the intricate relationship between semantic information, sensitive attributes, and environmental cues, we systematically categorize exogenous uncertainty factors into four latent variables: 1) semantic information influenced by sensitive attributes, 2) semantic information unaffected by sensitive attributes, 3) environmental cues influenced by sensitive attributes, and 4) environmental cues unaffected by sensitive attributes. By incorporating fairness regularization, we exclusively employ semantic information for classification purposes. Empirical validation on synthetic and real-world datasets substantiates the effectiveness of our approach, demonstrating improved accuracy levels while ensuring the preservation of fairness in the evolving landscape of continuous domains.

Audience-specific Explanations for Machine Translation

  • paper_url: http://arxiv.org/abs/2309.12998
  • repo_url: None
  • paper_authors: Renhan Lou, Jan Niehues
  • for: Addressing the problem that certain words in machine translation, even when translated, can be incomprehensible to the target-language audience because of different cultural backgrounds.
  • methods: Proposes a semi-automatic technique to extract example explanations from a large parallel corpus.
  • results: On the English->German, English->French and English->Chinese language pairs, the method extracts sentence sets in which more than 10% of the sentences contain explanations, compared with only 1.9% of the original sentences.
    Abstract In machine translation, a common problem is that the translation of certain words even if translated can cause incomprehension of the target language audience due to different cultural backgrounds. A solution to solve this problem is to add explanations for these words. In a first step, we therefore need to identify these words or phrases. In this work we explore techniques to extract example explanations from a parallel corpus. However, the sparsity of sentences containing words that need to be explained makes building the training dataset extremely difficult. In this work, we propose a semi-automatic technique to extract these explanations from a large parallel corpus. Experiments on English->German language pair show that our method is able to extract sentence so that more than 10% of the sentences contain explanation, while only 1.9% of the original sentences contain explanations. In addition, experiments on English->French and English->Chinese language pairs also show similar conclusions. This is therefore an essential first automatic step to create a explanation dataset. Furthermore we show that the technique is robust for all three language pairs.

Higher-order Graph Convolutional Network with Flower-Petals Laplacians on Simplicial Complexes

  • paper_url: http://arxiv.org/abs/2309.12971
  • repo_url: https://github.com/zeniSoida/pl1
  • paper_authors: Yiming Huang, Yujie Zeng, Qiang Wu, Linyuan Lü
  • for: Proposing a higher-order feature learning method based on simplicial complexes (SCs) to strengthen the expressive power of graph neural networks (GNNs).
  • methods: Builds on a Flower-Petals (FP) model and uses learnable graph filters to quantify the strength of different higher-order interactions.
  • results: Experiments show that the proposed model achieves state-of-the-art (SOTA) performance on a range of graph tasks and provides a scalable and flexible way to explore higher-order interactions in graphs.
    Abstract Despite the recent successes of vanilla Graph Neural Networks (GNNs) on many tasks, their foundation on pairwise interaction networks inherently limits their capacity to discern latent higher-order interactions in complex systems. To bridge this capability gap, we propose a novel approach exploiting the rich mathematical theory of simplicial complexes (SCs) - a robust tool for modeling higher-order interactions. Current SC-based GNNs are burdened by high complexity and rigidity, and quantifying higher-order interaction strengths remains challenging. Innovatively, we present a higher-order Flower-Petals (FP) model, incorporating FP Laplacians into SCs. Further, we introduce a Higher-order Graph Convolutional Network (HiGCN) grounded in FP Laplacians, capable of discerning intrinsic features across varying topological scales. By employing learnable graph filters, a parameter group within each FP Laplacian domain, we can identify diverse patterns where the filters' weights serve as a quantifiable measure of higher-order interaction strengths. The theoretical underpinnings of HiGCN's advanced expressiveness are rigorously demonstrated. Additionally, our empirical investigations reveal that the proposed model accomplishes state-of-the-art (SOTA) performance on a range of graph tasks and provides a scalable and flexible solution to explore higher-order interactions in graphs.

Trusta: Reasoning about Assurance Cases with Formal Methods and Large Language Models

  • paper_url: http://arxiv.org/abs/2309.12941
  • repo_url: None
  • paper_authors: Zezhong Chen, Yuxin Deng, Wenjie Du
  • for: This paper focuses on the development of a tool called Trustworthiness Derivation Tree Analyzer (Trusta) that automates the construction and verification of assurance cases for safety-critical systems.
  • methods: The tool uses formal methods, such as Prolog and constraint solvers like Z3 and MONA, to automatically reason about assurance cases. It also utilizes large language models like ChatGPT-3.5, ChatGPT-4, and PaLM 2 to generate and evaluate assurance cases, allowing for interactive human examination and modification.
  • results: The paper presents several industrial case studies that demonstrate the practical value of Trusta in finding subtle issues that are typically missed in manual inspection, and shows that the tool can quickly and efficiently enhance the assurance case development process.
    Abstract Assurance cases can be used to argue for the safety of products in safety engineering. In safety-critical areas, the construction of assurance cases is indispensable. Trustworthiness Derivation Trees (TDTs) enhance assurance cases by incorporating formal methods, rendering it possible for automatic reasoning about assurance cases. We present Trustworthiness Derivation Tree Analyzer (Trusta), a desktop application designed to automatically construct and verify TDTs. The tool has a built-in Prolog interpreter in its backend, and is supported by the constraint solvers Z3 and MONA. Therefore, it can solve constraints about logical formulas involving arithmetic, sets, Horn clauses etc. Trusta also utilizes large language models to make the creation and evaluation of assurance cases more convenient. It allows for interactive human examination and modification. We evaluated top language models like ChatGPT-3.5, ChatGPT-4, and PaLM 2 for generating assurance cases. Our tests showed a 50%-80% similarity between machine-generated and human-created cases. In addition, Trusta can extract formal constraints from text in natural languages, facilitating an easier interpretation and validation process. This extraction is subject to human review and correction, blending the best of automated efficiency with human insight. To our knowledge, this marks the first integration of large language models in automatic creating and reasoning about assurance cases, bringing a novel approach to a traditional challenge. Through several industrial case studies, Trusta has proven to quickly find some subtle issues that are typically missed in manual inspection, demonstrating its practical value in enhancing the assurance case development process.
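
To illustrate the kind of constraint solving the Z3 backend performs, here is a minimal sketch; the numeric claim about response times is invented for illustration and is not taken from Trusta or its case studies.

```python
from z3 import Ints, Solver, sat

# Hypothetical numeric claim from an assurance case:
# "response_time plus margin stays below the 100 ms deadline".
response_time, margin = Ints("response_time margin")

solver = Solver()
solver.add(response_time >= 0, margin >= 5)
solver.add(response_time + margin < 100)

if solver.check() == sat:
    print("Claim is satisfiable, e.g.:", solver.model())
else:
    print("No assignment satisfies the claim")
```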

Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models

  • paper_url: http://arxiv.org/abs/2309.12940
  • repo_url: None
  • paper_authors: Haoyu Gao, Ting-En Lin, Hangyu Li, Min Yang, Yuchuan Wu, Wentao Ma, Yongbin Li
  • for: Improving the comprehension ability of large language models in multi-turn, task-oriented dialogues.
  • methods: Uses a task-agnostic self-explanation prompting strategy that asks the model to analyze each dialogue utterance before executing the task.
  • results: Experiments on six benchmark datasets show that the strategy consistently outperforms zero-shot prompting and matches or exceeds few-shot prompting, demonstrating its potential for strengthening LLM comprehension in complex dialogue tasks.
    Abstract Task-oriented dialogue (TOD) systems facilitate users in executing various activities via multi-turn dialogues, but Large Language Models (LLMs) often struggle to comprehend these intricate contexts. In this study, we propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of LLMs in multi-turn dialogues. This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks. Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts, demonstrating its potential as a powerful tool in enhancing LLMs' comprehension in complex dialogue tasks.
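
A minimal sketch of how such a prompt could be assembled; the instruction wording and the example dialogue are assumptions for illustration, not the prompt template used in the paper.

```python
def build_self_explanation_prompt(dialogue_turns, task_instruction):
    """Ask the model to explain each utterance before performing the task."""
    history = "\n".join(
        f"[Turn {i + 1}] {speaker}: {utterance}"
        for i, (speaker, utterance) in enumerate(dialogue_turns)
    )
    return (
        "Before answering, briefly explain what each utterance below is "
        "trying to convey, then complete the task.\n\n"
        f"{history}\n\nTask: {task_instruction}"
    )

turns = [("User", "I need a table for two tonight."),
         ("System", "Sure, what time works for you?"),
         ("User", "Around 7pm, somewhere cheap.")]
print(build_self_explanation_prompt(turns, "Fill the booking slots (time, price range)."))
```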

Frustrated with Code Quality Issues? LLMs can Help!

  • paper_url: http://arxiv.org/abs/2309.12938
  • repo_url: None
  • paper_authors: Nalin Wadhwa, Jui Pradhan, Atharv Sonwane, Surya Prakash Sahu, Nagarajan Natarajan, Aditya Kanade, Suresh Parthasarathy, Sriram Rajamani
  • for: Improving code quality, and thereby the reliability, maintainability and security of software.
  • methods: Uses large language models (LLMs) to help developers fix code quality issues. The tool, CORE, pairs two LLMs in a proposer-ranker arrangement: the proposer generates candidate revisions following the recommendations of static analysis tools, and the ranker scores the candidates against the acceptance criteria a developer would enforce.
  • results: CORE revises 59.2% of Python files (across 52 quality checks) so that they pass both the static analysis tool and a human reviewer, with the ranker reducing false positives by 25.8%; on Java files (10 quality checks) it reaches 76.8%, comparable to a specialized program repair tool, with far less engineering effort.
    Abstract As software projects progress, quality of code assumes paramount importance as it affects reliability, maintainability and security of software. For this reason, static analysis tools are used in developer workflows to flag code quality issues. However, developers need to spend extra efforts to revise their code to improve code quality based on the tool findings. In this work, we investigate the use of (instruction-following) large language models (LLMs) to assist developers in revising code to resolve code quality issues. We present a tool, CORE (short for COde REvisions), architected using a pair of LLMs organized as a duo comprised of a proposer and a ranker. Providers of static analysis tools recommend ways to mitigate the tool warnings and developers follow them to revise their code. The \emph{proposer LLM} of CORE takes the same set of recommendations and applies them to generate candidate code revisions. The candidates which pass the static quality checks are retained. However, the LLM may introduce subtle, unintended functionality changes which may go un-detected by the static analysis. The \emph{ranker LLM} evaluates the changes made by the proposer using a rubric that closely follows the acceptance criteria that a developer would enforce. CORE uses the scores assigned by the ranker LLM to rank the candidate revisions before presenting them to the developer. CORE could revise 59.2% Python files (across 52 quality checks) so that they pass scrutiny by both a tool and a human reviewer. The ranker LLM is able to reduce false positives by 25.8% in these cases. CORE produced revisions that passed the static analysis tool in 76.8% Java files (across 10 quality checks) comparable to 78.3% of a specialized program repair tool, with significantly much less engineering efforts.
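
A schematic sketch of the proposer-ranker loop described above; the function signatures, candidate count, and sorting step are assumptions for illustration, with the LLM calls abstracted as callables rather than tied to any concrete API.

```python
def core_revise(code, tool_recommendations, proposer, ranker, static_check,
                n_candidates=5):
    """Sketch of a proposer-ranker revision loop in the spirit of CORE.

    proposer(code, recommendation) -> revised code   (an LLM call)
    ranker(original, revised)      -> numeric score  (an LLM call)
    static_check(code)             -> bool            (static analysis pass)
    """
    candidates = []
    for rec in tool_recommendations:
        for _ in range(n_candidates):
            revised = proposer(code, rec)
            if static_check(revised):          # keep only revisions the tool accepts
                candidates.append(revised)
    # Rank surviving candidates by the ranker LLM's rubric score, best first.
    return sorted(candidates, key=lambda c: ranker(code, c), reverse=True)
```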

On Separate Normalization in Self-supervised Transformers

  • paper_url: http://arxiv.org/abs/2309.12931
  • repo_url: None
  • paper_authors: Xiaohui Chen, Yinkai Wang, Yuanqi Du, Soha Hassoun, Li-Ping Liu
  • for: Proposing a simple modification to masked autoencoders (MAE): separate normalization layers for the tokens and the [CLS] symbol, to better capture their distinct characteristics and improve downstream task performance.
  • methods: In the MAE model, the tokens and the [CLS] symbol are normalized by two separate layers rather than sharing a single normalization layer.
  • results: Empirically, with a separate normalization layer the [CLS] embedding encodes global contextual information better and is distributed more uniformly in its anisotropic space; replacing the conventional normalization layer with two separate layers yields an average performance improvement of 2.7% across the image, natural language, and graph domains.
    Abstract Self-supervised training methods for transformers have demonstrated remarkable performance across various domains. Previous transformer-based models, such as masked autoencoders (MAE), typically utilize a single normalization layer for both the [CLS] symbol and the tokens. We propose in this paper a simple modification that employs separate normalization layers for the tokens and the [CLS] symbol to better capture their distinct characteristics and enhance downstream task performance. Our method aims to alleviate the potential negative effects of using the same normalization statistics for both token types, which may not be optimally aligned with their individual roles. We empirically show that by utilizing a separate normalization layer, the [CLS] embeddings can better encode the global contextual information and are distributed more uniformly in its anisotropic space. When replacing the conventional normalization layer with the two separate layers, we observe an average 2.7% performance improvement over the image, natural language, and graph domains.
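
A minimal PyTorch-style sketch of the proposed change, applying one LayerNorm to the [CLS] slot and another to the tokens; the dimensions and the placement inside the encoder block are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SeparateNorm(nn.Module):
    """Apply one LayerNorm to the [CLS] position and another to the tokens."""
    def __init__(self, dim):
        super().__init__()
        self.cls_norm = nn.LayerNorm(dim)
        self.token_norm = nn.LayerNorm(dim)

    def forward(self, x):          # x: (batch, 1 + num_tokens, dim), [CLS] first
        cls, tokens = x[:, :1], x[:, 1:]
        return torch.cat([self.cls_norm(cls), self.token_norm(tokens)], dim=1)

x = torch.randn(2, 1 + 196, 768)   # toy batch: [CLS] + 14x14 patch tokens
print(SeparateNorm(768)(x).shape)  # torch.Size([2, 197, 768])
```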

Lamarck’s Revenge: Inheritance of Learned Traits Can Make Robot Evolution Better

  • paper_url: http://arxiv.org/abs/2309.13099
  • repo_url: None
  • paper_authors: Jie Luo, Karine Miras, Jakub Tomczak, Agoston E. Eiben
  • for: Investigating the question "What if the 18th-century biologist Lamarck was not completely wrong and individual traits learned during a lifetime could be passed on to offspring through inheritance?"
  • methods: Simulations with an evolutionary robotics framework in which robot morphologies (bodies) and controllers (brains) are evolvable and robots can also improve their controllers by learning during their lifetime.
  • results: Comparing a Lamarckian system with a Darwinian one shows that Lamarckism amplifies the emergence of morphological intelligence, the ability of a given robot body to acquire a good brain by learning, and identifies the source of this success: newborn robots have higher fitness because their inherited brains match their bodies better than in the Darwinian system.
    Abstract Evolutionary robot systems offer two principal advantages: an advanced way of developing robots through evolutionary optimization and a special research platform to conduct what-if experiments regarding questions about evolution. Our study sits at the intersection of these. We investigate the question ``What if the 18th-century biologist Lamarck was not completely wrong and individual traits learned during a lifetime could be passed on to offspring through inheritance?'' We research this issue through simulations with an evolutionary robot framework where morphologies (bodies) and controllers (brains) of robots are evolvable and robots also can improve their controllers through learning during their lifetime. Within this framework, we compare a Lamarckian system, where learned bits of the brain are inheritable, with a Darwinian system, where they are not. Analyzing simulations based on these systems, we obtain new insights about Lamarckian evolution dynamics and the interaction between evolution and learning. Specifically, we show that Lamarckism amplifies the emergence of `morphological intelligence', the ability of a given robot body to acquire a good brain by learning, and identify the source of this success: `newborn' robots have a higher fitness because their inherited brains match their bodies better than those in a Darwinian system.

A matter of attitude: Focusing on positive and active gradients to boost saliency maps

  • paper_url: http://arxiv.org/abs/2309.12913
  • repo_url: https://github.com/oscarllorente/positive_active_saliency_maps
  • paper_authors: Oscar Llorente, Jaime Boal, Eugenio F. Sánchez-Úbeda
  • for: Exploring how rescuing the sign of the gradients in saliency maps leads to a deeper understanding of neural networks in multi-class classification problems.
  • methods: Keeps the sign of the gradients in the saliency map and considers the effect not only of the correct class but also the influence of the other classes, using both pretrained and trained-from-scratch CNNs.
  • results: Considering the sign and the influence of all classes better identifies the pixels of the image that the network is really focusing on, and makes it clearer how occluding or altering those pixels is expected to affect the outcome.
    Abstract Saliency maps have become one of the most widely used interpretability techniques for convolutional neural networks (CNN) due to their simplicity and the quality of the insights they provide. However, there are still some doubts about whether these insights are a trustworthy representation of what CNNs use to come up with their predictions. This paper explores how rescuing the sign of the gradients from the saliency map can lead to a deeper understanding of multi-class classification problems. Using both pretrained and trained from scratch CNNs we unveil that considering the sign and the effect not only of the correct class, but also the influence of the other classes, allows to better identify the pixels of the image that the network is really focusing on. Furthermore, how occluding or altering those pixels is expected to affect the outcome also becomes clearer.
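
A rough PyTorch sketch of the ingredient the paper builds on, keeping the sign of the input gradients instead of taking absolute values; the paper's aggregation over the remaining classes is not reproduced here, and the input shape is an assumption.

```python
import torch

def signed_saliency(model, image, target_class):
    """Gradient of the target-class logit w.r.t. the input, sign preserved."""
    image = image.clone().requires_grad_(True)   # image: (C, H, W) tensor
    logits = model(image.unsqueeze(0))           # (1, num_classes)
    logits[0, target_class].backward()
    grad = image.grad                            # same shape as image, signed
    positive = grad.clamp(min=0)                 # pixels pushing the class up
    negative = (-grad).clamp(min=0)              # pixels pushing it down
    return positive, negative
```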

KG-MDL: Mining Graph Patterns in Knowledge Graphs with the MDL Principle

  • paper_url: http://arxiv.org/abs/2309.12908
  • repo_url: None
  • paper_authors: Francesco Bariatti, Peggy Cellier, Sébastien Ferré
  • for: Extracting meaningful graph patterns from large knowledge graphs (KGs), which are difficult to mine because of their size and complexity.
  • methods: Proposes KG-MDL, a graph pattern mining approach based on the Minimum Description Length (MDL) principle that, given a KG, generates a human-sized and descriptive set of graph patterns in a parameter-less and anytime way.
  • results: Experiments on medium-sized KGs show that KG-MDL generates pattern sets that are small enough for humans to interpret yet descriptive of the KG, highlighting both the schema used to create the data and the concrete facts it contains.
    Abstract Nowadays, increasingly more data are available as knowledge graphs (KGs). While this data model supports advanced reasoning and querying, they remain difficult to mine due to their size and complexity. Graph mining approaches can be used to extract patterns from KGs. However this presents two main issues. First, graph mining approaches tend to extract too many patterns for a human analyst to interpret (pattern explosion). Second, real-life KGs tend to differ from the graphs usually treated in graph mining: they are multigraphs, their vertex degrees tend to follow a power-law, and the way in which they model knowledge can produce spurious patterns. Recently, a graph mining approach named GraphMDL+ has been proposed to tackle the problem of pattern explosion, using the Minimum Description Length (MDL) principle. However, GraphMDL+, like other graph mining approaches, is not suited for KGs without adaptations. In this paper we propose KG-MDL, a graph pattern mining approach based on the MDL principle that, given a KG, generates a human-sized and descriptive set of graph patterns, and so in a parameter-less and anytime way. We report on experiments on medium-sized KGs showing that our approach generates sets of patterns that are both small enough to be interpreted by humans and descriptive of the KG. We show that the extracted patterns highlight relevant characteristics of the data: both of the schema used to create the data, and of the concrete facts it contains. We also discuss the issues related to mining graph patterns on knowledge graphs, as opposed to other types of graph data.

ProtoEM: A Prototype-Enhanced Matching Framework for Event Relation Extraction

  • paper_url: http://arxiv.org/abs/2309.12892
  • repo_url: None
  • paper_authors: Zhilei Hu, Zixuan Li, Daozhu Xu, Long Bai, Cheng Jin, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng
  • for: Extracting multiple kinds of relations among events in texts while better capturing the intrinsic semantics of these relations.
  • methods: Proposes a Prototype-Enhanced Matching (ProtoEM) framework with two steps: prototype representing, in which examples represent the prototypes of each relation type, and prototype matching, in which a graph neural network (GNN) module models the interdependence among event relations.
  • results: Experiments on the MAVEN-ERE dataset show that ProtoEM effectively represents the prototypes of event relations and achieves a significant improvement over baseline models.
    Abstract Event Relation Extraction (ERE) aims to extract multiple kinds of relations among events in texts. However, existing methods singly categorize event relations as different classes, which are inadequately capturing the intrinsic semantics of these relations. To comprehensively understand their intrinsic semantics, in this paper, we obtain prototype representations for each type of event relation and propose a Prototype-Enhanced Matching (ProtoEM) framework for the joint extraction of multiple kinds of event relations. Specifically, ProtoEM extracts event relations in a two-step manner, i.e., prototype representing and prototype matching. In the first step, to capture the connotations of different event relations, ProtoEM utilizes examples to represent the prototypes corresponding to these relations. Subsequently, to capture the interdependence among event relations, it constructs a dependency graph for the prototypes corresponding to these relations and utilized a Graph Neural Network (GNN)-based module for modeling. In the second step, it obtains the representations of new event pairs and calculates their similarity with those prototypes obtained in the first step to evaluate which types of event relations they belong to. Experimental results on the MAVEN-ERE dataset demonstrate that the proposed ProtoEM framework can effectively represent the prototypes of event relations and further obtain a significant improvement over baseline models.
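
A minimal sketch of the prototype matching step, scoring new event-pair representations against relation prototypes by cosine similarity; the similarity measure and tensor shapes are assumptions for illustration, and the GNN over the prototype dependency graph is omitted.

```python
import torch
import torch.nn.functional as F

def match_prototypes(pair_embeddings, prototypes):
    """Score each event pair against each relation prototype.

    pair_embeddings: (num_pairs, dim) representations of new event pairs.
    prototypes:      (num_relations, dim) prototype vectors per relation type.
    Returns cosine-similarity scores of shape (num_pairs, num_relations);
    argmax over the last axis gives the predicted relation type.
    """
    pairs = F.normalize(pair_embeddings, dim=-1)
    protos = F.normalize(prototypes, dim=-1)
    return pairs @ protos.t()

scores = match_prototypes(torch.randn(4, 128), torch.randn(6, 128))
print(scores.argmax(dim=-1))   # predicted relation index for each pair
```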

Gravity Network for end-to-end small lesion detection

  • paper_url: http://arxiv.org/abs/2309.12876
  • repo_url: https://github.com/cirorusso2910/gravitynet
  • paper_authors: Ciro Russo, Alessandro Bria, Claudio Marrocco
  • for: Detecting small lesions in medical images.
  • methods: Proposes a novel one-stage end-to-end detector that introduces pixel-based gravity points, anchors that dynamically move towards the targeted lesion during detection.
  • results: On two well-established medical imaging tasks involving small lesions (microcalcification detection in digital mammograms and microaneurysm detection in digital fundus images), the method detects small lesions effectively.
    Abstract This paper introduces a novel one-stage end-to-end detector specifically designed to detect small lesions in medical images. Precise localization of small lesions presents challenges due to their appearance and the diverse contextual backgrounds in which they are found. To address this, our approach introduces a new type of pixel-based anchor that dynamically moves towards the targeted lesion for detection. We refer to this new architecture as GravityNet, and the novel anchors as gravity points since they appear to be "attracted" by the lesions. We conducted experiments on two well-established medical problems involving small lesions to evaluate the performance of the proposed approach: microcalcifications detection in digital mammograms and microaneurysms detection in digital fundus images. Our method demonstrates promising results in effectively detecting small lesions in these medical imaging tasks.

AnglE-optimized Text Embeddings

  • paper_url: http://arxiv.org/abs/2309.12871
  • repo_url: https://github.com/SeanLee97/AnglE
  • paper_authors: Xianming Li, Jing Li
  • for: Producing high-quality text embeddings for semantic textual similarity (STS) tasks, which are crucial components of large language model (LLM) applications.
  • methods: Proposes AnglE, an angle-optimized text embedding model that introduces angle optimization in a complex space to mitigate the saturation zones of the cosine function, which impede gradients and hinder optimization.
  • results: On short-text STS benchmarks, a newly collected long-text STS dataset from GitHub Issues, and domain-specific STS tasks, AnglE outperforms state-of-the-art (SOTA) STS models that ignore the cosine saturation zone, demonstrating that it generates high-quality text embeddings and that angle optimization is useful for STS.
    Abstract High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradient and hinder optimization processes. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks. The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.
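
One illustrative reading of "angle optimization in a complex space": read each embedding as a complex vector and compare element-wise phase angles instead of the cosine of the whole vectors. This is a hedged sketch of the idea under that assumption, not the paper's actual objective.

```python
import torch

def angle_difference(u, v):
    """Compare two embeddings by element-wise phase angle in complex space.

    Each embedding of size 2d is read as a complex vector of size d
    (first half = real part, second half = imaginary part). Illustrative
    reading of "angle optimization"; not the paper's exact loss.
    """
    d = u.shape[-1] // 2
    zu = torch.complex(u[..., :d], u[..., d:])
    zv = torch.complex(v[..., :d], v[..., d:])
    delta = torch.angle(zu) - torch.angle(zv)   # element-wise phase gap
    return delta.abs().mean(dim=-1)             # smaller = more similar

u, v = torch.randn(768), torch.randn(768)
print(angle_difference(u, v))
```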

Accurate and Fast Compressed Video Captioning

  • paper_url: http://arxiv.org/abs/2309.12867
  • repo_url: https://github.com/acherstyx/CoCap
  • paper_authors: Yaojie Shen, Xin Gu, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang
  • for: Proposing a new video captioning approach that avoids the drawbacks of sampling frames from decoded video.
  • methods: Works directly in the compressed video domain, using I-frames, motion vectors and residuals, with a specially designed transformer that learns captions from the compressed video.
  • results: Achieves state-of-the-art performance on different benchmarks while running almost 2x faster than existing approaches.
    Abstract Existing video captioning approaches typically require to first sample video frames from a decoded video and then conduct a subsequent process (e.g., feature extraction and/or captioning model learning). In this pipeline, manual frame sampling may ignore key information in videos and thus degrade performance. Additionally, redundant information in the sampled frames may result in low efficiency in the inference of video captioning. Addressing this, we study video captioning from a different perspective in compressed domain, which brings multi-fold advantages over the existing pipeline: 1) Compared to raw images from the decoded video, the compressed video, consisting of I-frames, motion vectors and residuals, is highly distinguishable, which allows us to leverage the entire video for learning without manual sampling through a specialized model design; 2) The captioning model is more efficient in inference as smaller and less redundant information is processed. We propose a simple yet effective end-to-end transformer in the compressed domain for video captioning that enables learning from the compressed video for captioning. We show that even with a simple design, our method can achieve state-of-the-art performance on different benchmarks while running almost 2x faster than existing approaches. Code is available at https://github.com/acherstyx/CoCap.

Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts

  • paper_url: http://arxiv.org/abs/2309.12863
  • repo_url: None
  • paper_authors: Emad A. Alghamdi, Jezia Zakraoui, Fares A. Abanmy
  • for: Exploring the effectiveness of domain-specific adaptation for Arabic machine translation (AMT) in a previously unexplored domain: financial news articles.
  • methods: Builds a carefully curated Arabic-English (AR-EN) parallel corpus in the financial domain and fine-tunes several pre-trained neural machine translation and large language models, including ChatGPT-3.5 Turbo, on it.
  • results: Fine-tuning succeeds with only a few well-aligned in-domain AR-EN segments, and ChatGPT's translation quality surpasses the other models in both automatic and human evaluations; to the authors' knowledge this is the first work on fine-tuning ChatGPT for financial-domain transfer learning.
    Abstract Neural machine translation (NMT) has shown impressive performance when trained on large-scale corpora. However, generic NMT systems have demonstrated poor performance on out-of-domain translation. To mitigate this issue, several domain adaptation methods have recently been proposed which often lead to better translation quality than genetic NMT systems. While there has been some continuous progress in NMT for English and other European languages, domain adaption in Arabic has received little attention in the literature. The current study, therefore, aims to explore the effectiveness of domain-specific adaptation for Arabic MT (AMT), in yet unexplored domain, financial news articles. To this end, we developed carefully a parallel corpus for Arabic-English (AR- EN) translation in the financial domain for benchmarking different domain adaptation methods. We then fine-tuned several pre-trained NMT and Large Language models including ChatGPT-3.5 Turbo on our dataset. The results showed that the fine-tuning is successful using just a few well-aligned in-domain AR-EN segments. The quality of ChatGPT translation was superior than other models based on automatic and human evaluations. To the best of our knowledge, this is the first work on fine-tuning ChatGPT towards financial domain transfer learning. To contribute to research in domain translation, we made our datasets and fine-tuned models available at https://huggingface.co/asas-ai/.

Diffusion Augmentation for Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2309.12858
  • repo_url: https://github.com/liuqidong07/diffuasr
  • paper_authors: Qidong Liu, Fan Yan, Xiangyu Zhao, Zhaocheng Du, Huifeng Guo, Ruiming Tang, Feng Tian
  • for: Addressing the data sparsity and long-tail user problems in sequential recommendation (SRS).
  • methods: Proposes Diffusion Augmentation for Sequential Recommendation (DiffuASR), which generates high-quality augmented interaction sequences that can be used directly to train sequential recommendation models without complex training procedures.
  • results: Experiments on three real-world datasets with three sequential recommendation models show that DiffuASR effectively alleviates data sparsity and the long-tail user problem and improves recommendation performance.
    Abstract Sequential recommendation (SRS) has become the technical foundation in many applications recently, which aims to recommend the next item based on the user's historical interactions. However, sequential recommendation often faces the problem of data sparsity, which widely exists in recommender systems. Besides, most users only interact with a few items, but existing SRS models often underperform these users. Such a problem, named the long-tail user problem, is still to be resolved. Data augmentation is a distinct way to alleviate these two problems, but they often need fabricated training strategies or are hindered by poor-quality generated interactions. To address these problems, we propose a Diffusion Augmentation for Sequential Recommendation (DiffuASR) for a higher quality generation. The augmented dataset by DiffuASR can be used to train the sequential recommendation models directly, free from complex training procedures. To make the best of the generation ability of the diffusion model, we first propose a diffusion-based pseudo sequence generation framework to fill the gap between image and sequence generation. Then, a sequential U-Net is designed to adapt the diffusion noise prediction model U-Net to the discrete sequence generation task. At last, we develop two guide strategies to assimilate the preference between generated and origin sequences. To validate the proposed DiffuASR, we conduct extensive experiments on three real-world datasets with three sequential recommendation models. The experimental results illustrate the effectiveness of DiffuASR. As far as we know, DiffuASR is one pioneer that introduce the diffusion model to the recommendation.

AxOCS: Scaling FPGA-based Approximate Operators using Configuration Supersampling

  • paper_url: http://arxiv.org/abs/2309.12830
  • repo_url: None
  • paper_authors: Siva Satyendra Sahoo, Salim Ullah, Soumyo Bhattacharjee, Akash Kumar
  • for: Providing a machine-learning-based methodology for designing FPGA approximate arithmetic operators, reducing the cost of ML implementations on resource-constrained embedded systems.
  • methods: ML-based design space exploration with configuration supersampling: AxOCS exploits the correlation of PPA and BEHAV metrics across operators of varying bit-widths, traversing the smaller design space of low bit-width operators and using its Design-PPA-BEHAV relationship to seed metaheuristics-based optimization of larger operators.
  • results: Experimental evaluation on FPGA-optimized approximate operators shows that the approach significantly improves the quality-resulting hypervolume of multi-objective optimization for 8x8 signed approximate multipliers.
    Abstract The rising usage of AI and ML-based processing across application domains has exacerbated the need for low-cost ML implementation, specifically for resource-constrained embedded systems. To this end, approximate computing, an approach that explores the power, performance, area (PPA), and behavioral accuracy (BEHAV) trade-offs, has emerged as a possible solution for implementing embedded machine learning. Due to the predominance of MAC operations in ML, designing platform-specific approximate arithmetic operators forms one of the major research problems in approximate computing. Recently there has been a rising usage of AI/ML-based design space exploration techniques for implementing approximate operators. However, most of these approaches are limited to using ML-based surrogate functions for predicting the PPA and BEHAV impact of a set of related design decisions. While this approach leverages the regression capabilities of ML methods, it does not exploit the more advanced approaches in ML. To this end, we propose AxOCS, a methodology for designing approximate arithmetic operators through ML-based supersampling. Specifically, we present a method to leverage the correlation of PPA and BEHAV metrics across operators of varying bit-widths for generating larger bit-width operators. The proposed approach involves traversing the relatively smaller design space of smaller bit-width operators and employing its associated Design-PPA-BEHAV relationship to generate initial solutions for metaheuristics-based optimization for larger operators. The experimental evaluation of AxOCS for FPGA-optimized approximate operators shows that the proposed approach significantly improves the quality-resulting hypervolume for multi-objective optimization-of 8x8 signed approximate multipliers.

Synthetic Boost: Leveraging Synthetic Data for Enhanced Vision-Language Segmentation in Echocardiography

  • paper_url: http://arxiv.org/abs/2309.12829
  • repo_url: https://github.com/naamiinepal/synthetic-boost
  • paper_authors: Rabin Adhikari, Manish Dhakal, Safal Thapaliya, Kanchan Poudel, Prasiddha Bhandari, Bishesh Khanal
  • for: Improving the accuracy of echocardiography image segmentation with vision-language segmentation models (VLSMs) by using synthetic images generated with Semantic Diffusion Models (SDMs) to boost VLSM performance.
  • methods: Evaluates two popular VLSMs (CLIPSeg and CRIS) with seven kinds of language prompts derived from attributes automatically extracted from echocardiography images, segmentation masks, and their metadata.
  • results: Pretraining the VLSMs on SDM-generated synthetic images before fine-tuning on real images improves the metrics and speeds up convergence; code, configs, and prompts are available at https://github.com/naamiinepal/synthetic-boost.
    Abstract Accurate segmentation is essential for echocardiography-based assessment of cardiovascular diseases (CVDs). However, the variability among sonographers and the inherent challenges of ultrasound images hinder precise segmentation. By leveraging the joint representation of image and text modalities, Vision-Language Segmentation Models (VLSMs) can incorporate rich contextual information, potentially aiding in accurate and explainable segmentation. However, the lack of readily available data in echocardiography hampers the training of VLSMs. In this study, we explore using synthetic datasets from Semantic Diffusion Models (SDMs) to enhance VLSMs for echocardiography segmentation. We evaluate results for two popular VLSMs (CLIPSeg and CRIS) using seven different kinds of language prompts derived from several attributes, automatically extracted from echocardiography images, segmentation masks, and their metadata. Our results show improved metrics and faster convergence when pretraining VLSMs on SDM-generated synthetic images before finetuning on real images. The code, configs, and prompts are available at https://github.com/naamiinepal/synthetic-boost.

OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control

  • paper_url: http://arxiv.org/abs/2309.12825
  • repo_url: None
  • paper_authors: Botian Xu, Feng Gao, Chao Yu, Ruize Zhang, Yi Wu, Yu Wang
  • for: Presenting an efficient and flexible platform for reinforcement learning in drone control, built on Nvidia's Omniverse Isaac Sim.
  • methods: Adopts a bottom-up design that lets users easily design and experiment with various application scenarios on top of GPU-parallelized simulations.
  • results: Provides benchmark tasks ranging from single-drone hovering to over-actuated system tracking, together with widely used RL baselines, and reports preliminary results on these tasks to demonstrate the platform's capabilities and support future research.
    Abstract In this work, we introduce OmniDrones, an efficient and flexible platform tailored for reinforcement learning in drone control, built on Nvidia's Omniverse Isaac Sim. It employs a bottom-up design approach that allows users to easily design and experiment with various application scenarios on top of GPU-parallelized simulations. It also offers a range of benchmark tasks, presenting challenges ranging from single-drone hovering to over-actuated system tracking. In summary, we propose an open-sourced drone simulation platform, equipped with an extensive suite of tools for drone learning. It includes 4 drone models, 5 sensor modalities, 4 control modes, over 10 benchmark tasks, and a selection of widely used RL baselines. To showcase the capabilities of OmniDrones and to support future research, we also provide preliminary results on these benchmark tasks. We hope this platform will encourage further studies on applying RL to practical drone systems.

A Spectral Theory of Neural Prediction and Alignment

  • paper_url: http://arxiv.org/abs/2309.12821
  • repo_url: None
  • paper_authors: Abdulkadir Canatar, Jenelle Feather, Albert Wakhloo, SueYeon Chung
  • for: Understanding how deep neural networks predict neural activity and how to differentiate among models that predict it equally well.
  • methods: Uses a recent theoretical framework that relates the generalization error of regression to the spectral bias of the model activations and the alignment of the neural responses onto the learnable subspace of the model, extending it to regression between model activations and neural responses and defining geometrical properties of the error embedding geometry.
  • results: Testing a large number of deep neural networks that predict visual cortical activity shows that multiple types of geometries result in low neural prediction error; carefully decomposing representational metrics in this way provides interpretability of how models capture neural activity and points towards improved models.
    Abstract The representations of neural networks are often compared to those of biological systems by performing regression between the neural network responses and those measured from biological systems. Many different state-of-the-art deep neural networks yield similar neural predictions, but it remains unclear how to differentiate among models that perform equally well at predicting neural responses. To gain insight into this, we use a recent theoretical framework that relates the generalization error from regression to the spectral bias of the model activations and the alignment of the neural responses onto the learnable subspace of the model. We extend this theory to the case of regression between model activations and neural responses, and define geometrical properties describing the error embedding geometry. We test a large number of deep neural networks that predict visual cortical activity and show that there are multiple types of geometries that result in low neural prediction error as measured via regression. The work demonstrates that carefully decomposing representational metrics can provide interpretability of how models are capturing neural activity and points the way towards improved models of neural activity.
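
A small NumPy sketch of the basic measurement this framework builds on: ridge regression from model activations to neural responses, the eigenspectrum of the activations, and the projection of the responses onto the eigenvectors. The data here are random placeholders, and the paper's exact error decomposition is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 100))    # stimuli x model units (activations)
Y = rng.standard_normal((500, 20))     # stimuli x recorded neurons

# Ridge regression from activations to neural responses.
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
pred_err = np.mean((X @ W - Y) ** 2)

# Spectral view: eigenspectrum of the activation covariance, and how strongly
# the neural responses project onto each eigenvector (a crude alignment proxy).
eigvals, eigvecs = np.linalg.eigh(X.T @ X / X.shape[0])
alignment = np.sum((eigvecs.T @ (X.T @ Y)) ** 2, axis=1)

print(pred_err, eigvals[-5:], alignment[-5:])
```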

Computational Natural Philosophy: A Thread from Presocratics through Turing to ChatGPT

  • paper_url: http://arxiv.org/abs/2309.13094
  • repo_url: None
  • paper_authors: Gordana Dodig-Crnkovic
  • for: Examining how computational natural philosophy conceptualizes the natural world in terms of information and computation, providing a framework for the study of cognition and intelligence.
  • methods: Traces interdisciplinary approaches from computer science and AI, including deep neural networks and reinforcement learning.
  • results: Describes large language models (LLMs) such as ChatGPT, built on deep neural networks and trained with reinforcement learning from human feedback (RLHF), and current initiatives to integrate neural networks with symbolic computing into a new generation of hybrid computational models.
    Abstract Modern computational natural philosophy conceptualizes the universe in terms of information and computation, establishing a framework for the study of cognition and intelligence. Despite some critiques, this computational perspective has significantly influenced our understanding of the natural world, leading to the development of AI systems like ChatGPT based on deep neural networks. Advancements in this domain have been facilitated by interdisciplinary research, integrating knowledge from multiple fields to simulate complex systems. Large Language Models (LLMs), such as ChatGPT, represent this approach's capabilities, utilizing reinforcement learning with human feedback (RLHF). Current research initiatives aim to integrate neural networks with symbolic computing, introducing a new generation of hybrid computational models.

Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where

  • paper_url: http://arxiv.org/abs/2309.12757
  • repo_url: None
  • paper_authors: Zhi-Yi Chin, Chieh-Ming Jiang, Ching-Chun Huang, Pin-Yu Chen, Wei-Chen Chiu
  • for: Making masking-based augmentation benefit contrastive self-supervised learning for ConvNets, as it does for self-supervised vision transformers.
  • methods: Introduces masking into the ConvNet contrastive-learning framework as an extra augmentation, with a saliency constraint so that masked regions are distributed more evenly over foreground and background (avoiding masks that concentrate on salient objects), and adds hard negative samples by masking larger regions of salient patches.
  • results: Extensive experiments across datasets, contrastive learning mechanisms, and downstream tasks show that the method outperforms several state-of-the-art baselines.
    Abstract While image data starts to enjoy the simple-but-effective self-supervised learning scheme built upon masking and self-reconstruction objective thanks to the introduction of tokenization procedure and vision transformer backbone, convolutional neural networks as another important and widely-adopted architecture for image data, though having contrastive-learning techniques to drive the self-supervised learning, still face the difficulty of leveraging such straightforward and general masking operation to benefit their learning process significantly. In this work, we aim to alleviate the burden of including masking operation into the contrastive-learning framework for convolutional neural networks as an extra augmentation method. In addition to the additive but unwanted edges (between masked and unmasked regions) as well as other adverse effects caused by the masking operations for ConvNets, which have been discussed by prior works, we particularly identify the potential problem where for one view in a contrastive sample-pair the randomly-sampled masking regions could be overly concentrated on important/salient objects thus resulting in misleading contrastiveness to the other view. To this end, we propose to explicitly take the saliency constraint into consideration in which the masked regions are more evenly distributed among the foreground and background for realizing the masking-based augmentation. Moreover, we introduce hard negative samples by masking larger regions of salient patches in an input image. Extensive experiments conducted on various datasets, contrastive learning mechanisms, and downstream tasks well verify the efficacy as well as the superior performance of our proposed method with respect to several state-of-the-art baselines.
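
A rough sketch of the saliency-constrained masking idea, sampling mask patches from both salient and non-salient regions instead of purely at random; the 50/50 split, the mask ratio, and the per-patch granularity are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def saliency_balanced_mask(saliency, num_patches, mask_ratio=0.3):
    """Pick patches to mask, half from salient and half from non-salient areas.

    saliency: (num_patches,) per-patch saliency scores.
    Returns a boolean mask of shape (num_patches,), True = masked.
    """
    n_mask = int(mask_ratio * num_patches)
    order = torch.argsort(saliency, descending=True)
    salient, background = order[: num_patches // 2], order[num_patches // 2:]
    pick = torch.cat([
        salient[torch.randperm(len(salient))[: n_mask // 2]],
        background[torch.randperm(len(background))[: n_mask - n_mask // 2]],
    ])
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[pick] = True
    return mask

print(saliency_balanced_mask(torch.rand(196), 196).sum())  # ~58 patches masked
```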

Towards an MLOps Architecture for XAI in Industrial Applications

  • paper_url: http://arxiv.org/abs/2309.12756
  • repo_url: None
  • paper_authors: Leonhard Faubel, Thomas Woudsma, Leila Methnani, Amir Ghorbani Ghezeljhemeidan, Fabian Buelow, Klaus Schmid, Willem D. van Driel, Benjamin Kloepper, Andreas Theodorou, Mohsen Nosratinia, Magnus Bång
  • for: Improving explanation and feedback capabilities in machine learning operations (MLOps) to increase user trust and adoption.
  • methods: Develops a novel MLOps software architecture that integrates explanation and feedback capabilities into the ML development and deployment processes, implemented in a series of industrial use cases in the EXPLAIN project.
  • results: The architecture provides an efficient way to manage ML models in production environments while allowing explanations to be integrated into development and deployment.
    Abstract Machine learning (ML) has become a popular tool in the industrial sector as it helps to improve operations, increase efficiency, and reduce costs. However, deploying and managing ML models in production environments can be complex. This is where Machine Learning Operations (MLOps) comes in. MLOps aims to streamline this deployment and management process. One of the remaining MLOps challenges is the need for explanations. These explanations are essential for understanding how ML models reason, which is key to trust and acceptance. Better identification of errors and improved model accuracy are only two resulting advantages. An often neglected fact is that deployed models are bypassed in practice when accuracy and especially explainability do not meet user expectations. We developed a novel MLOps software architecture to address the challenge of integrating explanations and feedback capabilities into the ML development and deployment processes. In the project EXPLAIN, our architecture is implemented in a series of industrial use cases. The proposed MLOps software architecture has several advantages. It provides an efficient way to manage ML models in production environments. Further, it allows for integrating explanations into the development and deployment processes.

OpenAi’s GPT4 as coding assistant

  • paper_url: http://arxiv.org/abs/2309.12732
  • repo_url: https://github.com/lmous/openai-gpt4-coding-assistant
  • paper_authors: Lefteris Moussiades, George Zografos
  • for: Examining GPT-3.5 and GPT-4 as coding assistants.
  • methods: Constructs tests to check whether the two systems can a) answer typical questions that arise during code development, b) produce reliable code, and c) contribute to code debugging.
  • results: The test results are impressive; GPT-4's performance is outstanding and signals an increase in programmer productivity and a reorganization of software development procedures around these new tools.
    Abstract Lately, Large Language Models have been widely used in code generation. GPT4 is considered the most potent Large Language Model from Openai. In this paper, we examine GPT3.5 and GPT4 as coding assistants. More specifically, we have constructed appropriate tests to check whether the two systems can a) answer typical questions that can arise during the code development, b) produce reliable code, and c) contribute to code debugging. The test results are impressive. The performance of GPT4 is outstanding and signals an increase in the productivity of programmers and the reorganization of software development procedures based on these new tools.

Defeasible Reasoning with Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2309.12731
  • repo_url: None
  • paper_authors: Dave Raggett
  • for: Addressing the uncertainty, imprecision, incompleteness and inconsistency of human knowledge, and the challenge this poses for the Semantic Web.
  • methods: Introduces an intuitive notation and model (PKN) for defeasible reasoning with imperfect knowledge, related to previous work on argumentation theory; PKN is to N3 as defeasible reasoning is to deductive logic.
  • results: Closes with ideas on an intuitive declarative syntax for describing reasoning strategies and tactics, drawing on the AIF ontology for inspiration, and with observations on symbolic approaches in the era of large language models.
    Abstract Human knowledge is subject to uncertainties, imprecision, incompleteness and inconsistencies. Moreover, the meaning of many everyday terms is dependent on the context. That poses a huge challenge for the Semantic Web. This paper introduces work on an intuitive notation and model for defeasible reasoning with imperfect knowledge, and relates it to previous work on argumentation theory. PKN is to N3 as defeasible reasoning is to deductive logic. Further work is needed on an intuitive syntax for describing reasoning strategies and tactics in declarative terms, drawing upon the AIF ontology for inspiration. The paper closes with observations on symbolic approaches in the era of large language models.

In-context Interference in Chat-based Large Language Models

  • paper_url: http://arxiv.org/abs/2309.12727
  • repo_url: None
  • paper_authors: Eric Nuertey Coleman, Julio Hurtado, Vincenzo Lomonaco
  • for: This paper aims to study the limitations of in-context learning in large language models (LLMs) and its impact on the model’s performance.
  • methods: The paper uses a black-box scenario to evaluate the in-context learning ability of LLMs, and proposes an evaluation benchmark based on the bAbI dataset.
  • results: The study shows that in-context learning can lead to interference between information continually flowing in the context, causing the model to forget previously learned knowledge and reducing its performance.
    Abstract Large language models (LLMs) have had a huge impact on society due to their impressive capabilities and vast knowledge of the world. Various applications and tools have been created that allow users to interact with these models in a black-box scenario. However, one limitation of this scenario is that users cannot modify the internal knowledge of the model, and the only way to add or modify internal knowledge is by explicitly mentioning it to the model during the current interaction. This learning process is called in-context training, and it refers to training that is confined to the user's current session or context. In-context learning has significant applications, but also has limitations that are seldom studied. In this paper, we present a study that shows how the model can suffer from interference between information that continually flows in the context, causing it to forget previously learned knowledge, which can reduce the model's performance. Along with showing the problem, we propose an evaluation benchmark based on the bAbI dataset.

H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps

  • paper_url: http://arxiv.org/abs/2309.12716
  • repo_url: None
  • paper_authors: Haoyi Niu, Tianying Ji, Bingqi Liu, Haocheng Zhao, Xiangyu Zhu, Jianying Zheng, Pengfei Huang, Guyue Zhou, Jianming Hu, Xianyuan Zhan
  • for: This paper focuses on solving real-world complex tasks using reinforcement learning (RL) in imperfect simulation environments and with limited data.
  • methods: The authors propose a new algorithm called H2O+, which combines offline and online learning methods to address the challenges of sim-to-real transfer and dynamics gaps.
  • results: The proposed algorithm demonstrates superior performance and flexibility in both simulation and real-world robotics experiments compared to advanced cross-domain online and offline RL algorithms.
    Abstract Solving real-world complex tasks using reinforcement learning (RL) without high-fidelity simulation environments or large amounts of offline data can be quite challenging. Online RL agents trained in imperfect simulation environments can suffer from severe sim-to-real issues. Offline RL approaches although bypass the need for simulators, often pose demanding requirements on the size and quality of the offline datasets. The recently emerged hybrid offline-and-online RL provides an attractive framework that enables joint use of limited offline data and imperfect simulator for transferable policy learning. In this paper, we develop a new algorithm, called H2O+, which offers great flexibility to bridge various choices of offline and online learning methods, while also accounting for dynamics gaps between the real and simulation environment. Through extensive simulation and real-world robotics experiments, we demonstrate superior performance and flexibility over advanced cross-domain online and offline RL algorithms.

The Mathematical Game

  • paper_url: http://arxiv.org/abs/2309.12711
  • repo_url: https://github.com/xploitspeeds/Bookmarklet-Hacks-For-School
  • paper_authors: Marc Pierre, Quentin Cohen-Solal, Tristan Cazenave
  • for: To improve the performance of the Holophrasm theorem prover by using other game tree search algorithms.
  • methods: Holophrasm combines MCTS with neural networks for the policy and the evaluation to perform automated theorem proving.
  • results: The paper proposes automated-proving variants based on alternative game tree search algorithms combined with neural networks to improve the performance of the Holophrasm theorem prover.
    Abstract Monte Carlo Tree Search can be used for automated theorem proving. Holophrasm is a neural theorem prover using MCTS combined with neural networks for the policy and the evaluation. In this paper we propose to improve the performance of the Holophrasm theorem prover using other game tree search algorithms.
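
As background for the search algorithms compared here, a reference sketch of the selection rule commonly used when MCTS is combined with a policy and an evaluation network (the standard PUCT rule from game-playing programs; the paper's exact algorithmic variants are not reproduced here):

$$ a^{\star} = \arg\max_{a} \left[ Q(s,a) + c_{\mathrm{puct}} \, P(s,a) \, \frac{\sqrt{\sum_{b} N(s,b)}}{1 + N(s,a)} \right] $$

where $P(s,a)$ is the policy network's prior, $Q(s,a)$ the mean evaluation of the subtree, and $N(s,\cdot)$ the visit counts.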

PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion

  • paper_url: http://arxiv.org/abs/2309.12708
  • repo_url: None
  • paper_authors: Yuxiang Yan, Boda Liu, Jianfei Ai, Qinbu Li, Ru Wan, Jian Pu
  • for: To introduce a cooperative vehicle-infrastructure point cloud benchmark for Semantic Scene Completion that drives progress in semantic point cloud completion.
  • methods: A LiDAR-based model with a Spatial-Aware Transformer for global and local feature extraction and a Completion and Segmentation Cooperative Module for joint completion and segmentation.
  • results: The paper presents PointSSC, the first cooperative vehicle-infrastructure point cloud benchmark for semantic scene completion, serving as a testbed for measuring progress in the field.
    Abstract Semantic Scene Completion (SSC) aims to jointly generate space occupancies and semantic labels for complex 3D scenes. Most existing SSC models focus on volumetric representations, which are memory-inefficient for large outdoor spaces. Point clouds provide a lightweight alternative but existing benchmarks lack outdoor point cloud scenes with semantic labels. To address this, we introduce PointSSC, the first cooperative vehicle-infrastructure point cloud benchmark for semantic scene completion. These scenes exhibit long-range perception and minimal occlusion. We develop an automated annotation pipeline leveraging Segment Anything to efficiently assign semantics. To benchmark progress, we propose a LiDAR-based model with a Spatial-Aware Transformer for global and local feature extraction and a Completion and Segmentation Cooperative Module for joint completion and segmentation. PointSSC provides a challenging testbed to drive advances in semantic point cloud completion for real-world navigation.

Multi-Label Noise Transition Matrix Estimation with Label Correlations: Theory and Algorithm

  • paper_url: http://arxiv.org/abs/2309.12706
  • repo_url: https://github.com/tmllab/Multi-Label-T
  • paper_authors: Shikun Li, Xiaobo Xia, Hansong Zhang, Shiming Ge, Tongliang Liu
  • for: To address noise in multi-label learning, since collecting large-scale accurate labels has become increasingly difficult.
  • methods: The paper models multi-label noise with transition matrices and estimates the noise transition matrix by exploiting label correlations.
  • results: It proposes a new estimator that needs neither anchor points nor accurate fitting of noisy class posteriors; the estimator leverages label correlations and uses sample selection to extract the information carried by clean label correlations.
    Abstract Noisy multi-label learning has garnered increasing attention due to the challenges posed by collecting large-scale accurate labels, making noisy labels a more practical alternative. Motivated by noisy multi-class learning, the introduction of transition matrices can help model multi-label noise and enable the development of statistically consistent algorithms for noisy multi-label learning. However, estimating multi-label noise transition matrices remains a challenging task, as most existing estimators in noisy multi-class learning rely on anchor points and accurate fitting of noisy class posteriors, which is hard to satisfy in noisy multi-label learning. In this paper, we address this problem by first investigating the identifiability of class-dependent transition matrices in noisy multi-label learning. Building upon the identifiability results, we propose a novel estimator that leverages label correlations without the need for anchor points or precise fitting of noisy class posteriors. Specifically, we first estimate the occurrence probability of two noisy labels to capture noisy label correlations. Subsequently, we employ sample selection techniques to extract information implying clean label correlations, which are then used to estimate the occurrence probability of one noisy label when a certain clean label appears. By exploiting the mismatches in label correlations implied by these occurrence probabilities, we demonstrate that the transition matrix becomes identifiable and can be acquired by solving a bilinear decomposition problem. Theoretically, we establish an estimation error bound for our multi-label transition matrix estimator and derive a generalization error bound for our statistically consistent algorithm. Empirically, we validate the effectiveness of our estimator in estimating multi-label noise transition matrices, leading to excellent classification performance.
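
For reference, the class-dependent noise model that such estimators rely on links the clean and noisy label posteriors through a transition matrix $T$ (a standard formulation, not specific to this paper's multi-label estimator):

$$ P(\tilde{y} = j \mid x) \;=\; \sum_{i} T_{ij} \, P(y = i \mid x), \qquad T_{ij} = P(\tilde{y} = j \mid y = i), $$

so that a classifier trained on noisy labels can be corrected once $T$ is estimated.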

Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.12696
  • repo_url: https://github.com/thu-rllab/CFCQL
  • paper_authors: Jianzhun Shao, Yun Qu, Chen Chen, Hongchang Zhang, Xiangyang Ji
  • for: 解决多智能机器人在离线多智能学习中的分布偏移和维度高问题,提高行动外部分布(OOD)和价值误差现象的问题。
  • methods: 提出了一种新的多智能离线Q学习算法CounterFactual Conservative Q-Learning(CFCQL),通过计算每个机器人的保守补偿来实现保守价值估计。
  • results: 在四个环境中,包括整数和浮点动作的设定,以及现有和自己制作的数据集上,CFCQL比既有方法高效,尤其是当机器人数量很大时。
    Abstract Offline multi-agent reinforcement learning is challenging due to the coupling effect of both distribution shift issue common in offline setting and the high dimension issue common in multi-agent setting, making the action out-of-distribution (OOD) and value overestimation phenomenon excessively severe. Tomitigate this problem, we propose a novel multi-agent offline RL algorithm, named CounterFactual Conservative Q-Learning (CFCQL) to conduct conservative value estimation. Rather than regarding all the agents as a high dimensional single one and directly applying single agent methods to it, CFCQL calculates conservative regularization for each agent separately in a counterfactual way and then linearly combines them to realize an overall conservative value estimation. We prove that it still enjoys the underestimation property and the performance guarantee as those single agent conservative methods do, but the induced regularization and safe policy improvement bound are independent of the agent number, which is therefore theoretically superior to the direct treatment referred to above, especially when the agent number is large. We further conduct experiments on four environments including both discrete and continuous action settings on both existing and our man-made datasets, demonstrating that CFCQL outperforms existing methods on most datasets and even with a remarkable margin on some of them.
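
For context, the single-agent conservative penalty that CQL-style methods add to the Bellman objective (the building block that CFCQL applies per agent in a counterfactual way; shown here in its standard single-agent form, not the paper's multi-agent formulation):

$$ \min_{Q} \; \alpha \, \mathbb{E}_{s \sim \mathcal{D}} \Big[ \log \sum_{a} \exp Q(s,a) \; - \; \mathbb{E}_{a \sim \hat{\pi}_{\beta}(\cdot \mid s)} \big[ Q(s,a) \big] \Big] \; + \; \tfrac{1}{2} \, \mathbb{E}_{(s,a,s') \sim \mathcal{D}} \Big[ \big( Q(s,a) - \hat{\mathcal{B}}^{\pi} \bar{Q}(s,a) \big)^{2} \Big], $$

which pushes Q-values down on out-of-distribution actions while still fitting the logged data.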

Enhancing Graph Representation of the Environment through Local and Cloud Computation

  • paper_url: http://arxiv.org/abs/2309.12692
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Francesco Argenziano, Vincenzo Suriani, Daniele Nardi
  • for: bridging the gap between low-level sensor readings and high-level semantic understanding
  • methods: combines classical computer vision tools with modern computer vision cloud services, incorporates an ontology hierarchy with over 800 object classes
  • results: allows for the handling of small objects and integration into the semantic representation of the environment
    Abstract Enriching the robot representation of the operational environment is a challenging task that aims at bridging the gap between low-level sensor readings and high-level semantic understanding. Having a rich representation often requires computationally demanding architectures and pure point cloud based detection systems that struggle when dealing with everyday objects that have to be handled by the robot. To overcome these issues, we propose a graph-based representation that addresses this gap by providing a semantic representation of robot environments from multiple sources. In fact, to acquire information from the environment, the framework combines classical computer vision tools with modern computer vision cloud services, ensuring computational feasibility on onboard hardware. By incorporating an ontology hierarchy with over 800 object classes, the framework achieves cross-domain adaptability, eliminating the need for environment-specific tools. The proposed approach allows us to handle also small objects and integrate them into the semantic representation of the environment. The approach is implemented in the Robot Operating System (ROS) using the RViz visualizer for environment representation. This work is a first step towards the development of a general-purpose framework, to facilitate intuitive interaction and navigation across different domains.
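
To make the idea of a graph-based semantic representation concrete, a minimal sketch of storing detections as a labelled graph; the object classes, positions and the "on_top_of" relation below are hypothetical, whereas the actual framework uses an 800-class ontology, cloud vision services and ROS/RViz.

```python
import networkx as nx

# Hypothetical detections (class label, 3D position); in the framework these would
# come from onboard vision tools and cloud vision services.
detections = [("table", (1.0, 0.2, 0.0)), ("cup", (1.1, 0.25, 0.75)), ("chair", (0.3, -0.5, 0.0))]

def on_top_of(lower, upper, xy_tol=0.3):
    """Toy spatial relation: horizontally close and physically higher."""
    return (abs(lower[0] - upper[0]) < xy_tol and
            abs(lower[1] - upper[1]) < xy_tol and
            upper[2] > lower[2])

G = nx.DiGraph()
for idx, (label, pos) in enumerate(detections):
    G.add_node(idx, label=label, position=pos)

for i, (_, pi) in enumerate(detections):
    for j, (_, pj) in enumerate(detections):
        if i != j and on_top_of(pi, pj):
            G.add_edge(j, i, relation="on_top_of")   # e.g. cup -> table

print(list(G.nodes(data=True)))
print(list(G.edges(data=True)))
```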

QAL-BP: An Augmented Lagrangian Quantum Approach for Bin Packing Problem

  • paper_url: http://arxiv.org/abs/2309.12678
  • repo_url: https://github.com/lorenz92/qal-bp
  • paper_authors: Lorenzo Cellini, Antonio Macaluso, Michele Lombardi
  • for: To find efficient solutions to the bin packing problem, a well-known NP-hard problem.
  • methods: Uses quantum technologies, in particular quantum computing, and proposes a new Quadratic Unconstrained Binary Optimization (QUBO) formulation better suited to the problem.
  • results: Experiments show that the problem can be solved effectively on a quantum annealing device, with comparisons against classical solvers.
    Abstract The bin packing is a well-known NP-Hard problem in the domain of artificial intelligence, posing significant challenges in finding efficient solutions. Conversely, recent advancements in quantum technologies have shown promising potential for achieving substantial computational speedup, particularly in certain problem classes, such as combinatorial optimization. In this study, we introduce QAL-BP, a novel Quadratic Unconstrained Binary Optimization (QUBO) formulation designed specifically for bin packing and suitable for quantum computation. QAL-BP utilizes the augmented Lagrangian method to incorporate the bin packing constraints into the objective function while also facilitating an analytical estimation of heuristic, but empirically robust, penalty multipliers. This approach leads to a more versatile and generalizable model that eliminates the need for empirically calculating instance-dependent Lagrangian coefficients, a requirement commonly encountered in alternative QUBO formulations for similar problems. To assess the effectiveness of our proposed approach, we conduct experiments on a set of bin-packing instances using a real Quantum Annealing device. Additionally, we compare the results with those obtained from two different classical solvers, namely simulated annealing and Gurobi. The experimental findings not only confirm the correctness of the proposed formulation but also demonstrate the potential of quantum computation in effectively solving the bin-packing problem, particularly as more reliable quantum technology becomes available.
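
For orientation, a generic penalty-based QUBO for bin packing (not the paper's QAL-BP formulation, which instead derives the penalty multipliers analytically via the augmented Lagrangian): with binary variables $x_{ij}$ (item $i$ placed in bin $j$) and $y_j$ (bin $j$ used), item weights $w_i$ and capacity $C$,

$$ \min_{x, y \in \{0,1\}} \; \sum_{j} y_{j} \; + \; \lambda_{1} \sum_{i} \Big( \sum_{j} x_{ij} - 1 \Big)^{2} \; + \; \lambda_{2} \sum_{j} \Big( \sum_{i} w_{i} x_{ij} + s_{j} - C\, y_{j} \Big)^{2}, $$

where the $s_j \ge 0$ are slack terms (binary-encoded in a full QUBO) turning the capacity inequality into an equality, and $\lambda_1, \lambda_2$ are penalty weights.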

TrTr: A Versatile Pre-Trained Large Traffic Model based on Transformer for Capturing Trajectory Diversity in Vehicle Population

  • paper_url: http://arxiv.org/abs/2309.12677
  • repo_url: None
  • paper_authors: Ruyi Feng, Zhibin Li, Bowen Liu, Yan Ding, Ou Zheng
  • For: The paper aims to learn the diversity of trajectories within vehicle populations using the Transformer architecture, which can handle large-scale parameters and capture the spatial distribution of vehicles.
  • Methods: The authors apply the Transformer architecture to traffic tasks and design specific pre-training tasks to improve the model’s performance. They also create a data structure tailored to the attention mechanism and introduce noises that correspond to spatio-temporal demands.
  • Results: The pre-trained model captures the spatial distribution of the vehicle population with no instances of vehicle overlap and an RMSE of 0.6059 against ground truth. In time series prediction, approximately 95% of the predicted trajectories’ speeds closely align with the true speeds, within a deviation of 7.5144 m/s. The model is robust in the stability test, continuing to predict sequences far longer than the input while exhibiting smooth trajectories and diverse driving behaviors, and it provides a good basis for downstream fine-tuning.
    Abstract Understanding trajectory diversity is a fundamental aspect of addressing practical traffic tasks. However, capturing the diversity of trajectories presents challenges, particularly with traditional machine learning and recurrent neural networks due to the requirement of large-scale parameters. The emerging Transformer technology, renowned for its parallel computation capabilities enabling the utilization of models with hundreds of millions of parameters, offers a promising solution. In this study, we apply the Transformer architecture to traffic tasks, aiming to learn the diversity of trajectories within vehicle populations. We analyze the Transformer's attention mechanism and its adaptability to the goals of traffic tasks, and subsequently, design specific pre-training tasks. To achieve this, we create a data structure tailored to the attention mechanism and introduce a set of noises that correspond to spatio-temporal demands, which are incorporated into the structured data during the pre-training process. The designed pre-training model demonstrates excellent performance in capturing the spatial distribution of the vehicle population, with no instances of vehicle overlap and an RMSE of 0.6059 when compared to the ground truth values. In the context of time series prediction, approximately 95% of the predicted trajectories' speeds closely align with the true speeds, within a deviation of 7.5144m/s. Furthermore, in the stability test, the model exhibits robustness by continuously predicting a time series ten times longer than the input sequence, delivering smooth trajectories and showcasing diverse driving behaviors. The pre-trained model also provides a good basis for downstream fine-tuning tasks. The number of parameters of our model is over 50 million.

Vision Transformers for Computer Go

  • paper_url: http://arxiv.org/abs/2309.12675
  • repo_url: https://github.com/assasinator/Swin_Transformers
  • paper_authors: Amani Sagri, Tristan Cazenave, Jérôme Arjonilla, Abdallah Saffidine
  • for: investigate the application of transformers in the game of Go, specifically in the analysis of the Transformer in Vision.
  • methods: compare transformers to usual Residual Networks.
  • results: highlight the substantial role that transformers can play in the game of Go, through a detailed analysis of numerous points such as prediction accuracy, win rates, memory, speed, size, or even learning rate.
    Abstract Motivated by the success of transformers in various fields, such as language understanding and image analysis, this investigation explores their application in the context of the game of Go. In particular, our study focuses on the analysis of the Transformer in Vision. Through a detailed analysis of numerous points such as prediction accuracy, win rates, memory, speed, size, or even learning rate, we have been able to highlight the substantial role that transformers can play in the game of Go. This study was carried out by comparing them to the usual Residual Networks.

On Sparse Modern Hopfield Model

  • paper_url: http://arxiv.org/abs/2309.12673
  • repo_url: None
  • paper_authors: Jerry Yao-Chieh Hu, Donglin Yang, Dennis Wu, Chenwei Xu, Bo-Yu Chen, Han Liu
  • For: The paper introduces the Sparse Modern Hopfield Model, a sparse extension of the modern Hopfield model.
  • Methods: It uses a sparse attention mechanism, derives a closed-form sparse Hopfield energy via the convex conjugate of the sparse entropic regularizer, and builds sparse memory-retrieval dynamics on top of this energy.
  • Results: The one-step approximation of the proposed retrieval dynamics is equivalent to sparse-structured attention, and the paper gives a sparsity-dependent memory retrieval error bound that is provably tighter than its dense counterpart.
    Abstract We introduce the sparse modern Hopfield model as a sparse extension of the modern Hopfield model. Like its dense counterpart, the sparse modern Hopfield model equips a memory-retrieval dynamics whose one-step approximation corresponds to the sparse attention mechanism. Theoretically, our key contribution is a principled derivation of a closed-form sparse Hopfield energy using the convex conjugate of the sparse entropic regularizer. Building upon this, we derive the sparse memory retrieval dynamics from the sparse energy function and show its one-step approximation is equivalent to the sparse-structured attention. Importantly, we provide a sparsity-dependent memory retrieval error bound which is provably tighter than its dense analog. The conditions for the benefits of sparsity to arise are therefore identified and discussed. In addition, we show that the sparse modern Hopfield model maintains the robust theoretical properties of its dense counterpart, including rapid fixed point convergence and exponential memory capacity. Empirically, we use both synthetic and real-world datasets to demonstrate that the sparse Hopfield model outperforms its dense counterpart in many situations.
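
A minimal numpy sketch of what sparse retrieval looks like when the softmax in the modern Hopfield update is replaced by sparsemax (the Euclidean projection onto the simplex); this is an illustrative reading of the one-step update, not the paper's implementation, and the patterns, query and beta below are toy values.

```python
import numpy as np

def sparsemax(z):
    """Projection of z onto the probability simplex (Martins & Astudillo, 2016)."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum            # coordinates that stay nonzero
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_z
    return np.maximum(z - tau, 0.0)

def sparse_hopfield_retrieve(X, xi, beta=1.0, steps=3):
    """Iterated one-step update: xi <- X @ sparsemax(beta * X.T @ xi).
    X: (d, N) stored patterns as columns; xi: (d,) query."""
    for _ in range(steps):
        p = sparsemax(beta * X.T @ xi)             # sparse attention over stored patterns
        xi = X @ p
    return xi

# Toy usage: retrieve a stored pattern from a noisy query.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
query = X[:, 1] + 0.1 * rng.normal(size=8)
print(sparse_hopfield_retrieve(X, query, beta=4.0))
```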

How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization

  • paper_url: http://arxiv.org/abs/2309.12671
  • repo_url: https://github.com/betray12138/unified-model-shift-and-model-bias-policy-optimization
  • paper_authors: Hai Zhang, Hang Yu, Junqiao Zhao, Di Zhang, Chang Huang, Hongtu Zhou, Xiao Zhang, Chen Ye
  • for: To provide a principled way of designing model-based reinforcement learning (MBRL) algorithms that come with a performance improvement guarantee.
  • methods: The method uses the return discrepancy to guide model learning and adaptively adjusts model updates, unifying model shift and model bias so as to guarantee performance improvement while avoiding model overfitting.
  • results: Experiments show that the resulting algorithm, USB-PO, achieves state-of-the-art performance on several challenging benchmark tasks.
    Abstract Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly attributed to the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impacts of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use performance difference bound to explicitly consider model shift. However, these methods rely on a fixed threshold to constrain model shift, resulting in a heavy dependence on the threshold and a lack of adaptability during the training process. In this paper, we theoretically derive an optimization objective that can unify model shift and model bias and then formulate a fine-tuning process. This process adaptively adjusts the model updates to get a performance improvement guarantee while avoiding model overfitting. Based on these, we develop a straightforward algorithm USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks.

Natural revision is contingently-conditionalized revision

  • paper_url: http://arxiv.org/abs/2309.12655
  • repo_url: None
  • paper_authors: Paolo Liberatore
  • for: To explore how natural revision extends from simple formulae expressing universal truths to conditionals expressing conditional truths.
  • methods: The study builds on the basic principles natural revision follows: minimal change, indifference and naivety.
  • results: The extension helps resolve some counterexamples to natural revision while also exposing its limitations; it shows that natural revision restricts changes to the current conditions.
    Abstract Natural revision seems so natural: it changes beliefs as little as possible to incorporate new information. Yet, some counterexamples show it wrong. It is so conservative that it never fully believes. It only believes in the current conditions. This is right in some cases and wrong in others. Which is which? The answer requires extending natural revision from simple formulae expressing universal truths (something holds) to conditionals expressing conditional truth (something holds in certain conditions). The extension is based on the basic principles natural revision follows, identified as minimal change, indifference and naivety: change beliefs as little as possible; equate the likeliness of scenarios by default; believe all until contradicted. The extension says that natural revision restricts changes to the current conditions. A comparison with an unrestricting revision shows what exactly the current conditions are. It is not what currently considered true if it contradicts the new information. It includes something more and more unlikely until the new information is at least possible.

Are Deep Learning Classification Results Obtained on CT Scans Fair and Interpretable?

  • paper_url: http://arxiv.org/abs/2309.12632
  • repo_url: None
  • paper_authors: Mohamad M. A. Ashames, Ahmet Demir, Omer N. Gerek, Mehmet Fidan, M. Bilginer Gulmezoglu, Semih Ergin, Mehmet Koc, Atalay Barkana, Cuneyt Calisir
  • for: To make deep learning classification results on medical CT images fairer and more interpretable.
  • methods: Deep learning models are trained with strict patient-level separation between the training, validation and test sets.
  • results: Models trained with the conventional, unfair data shuffling can report misleading accuracy and learn irrelevant features, whereas models trained with strict patient-level separation keep their accuracy on new patients' images and show a higher degree of focus on the relevant nodules.
    Abstract Following the great success of various deep learning methods in image and object classification, the biomedical image processing society is also overwhelmed with their applications to various automatic diagnosis cases. Unfortunately, most of the deep learning-based classification attempts in the literature solely focus on the aim of extreme accuracy scores, without considering interpretability, or patient-wise separation of training and test data. For example, most lung nodule classification papers using deep learning randomly shuffle data and split it into training, validation, and test sets, causing certain images from the CT scan of a person to be in the training set, while other images of the exact same person to be in the validation or testing image sets. This can result in reporting misleading accuracy rates and the learning of irrelevant features, ultimately reducing the real-life usability of these models. When the deep neural networks trained on the traditional, unfair data shuffling method are challenged with new patient images, it is observed that the trained models perform poorly. In contrast, deep neural networks trained with strict patient-level separation maintain their accuracy rates even when new patient images are tested. Heat-map visualizations of the activations of the deep neural networks trained with strict patient-level separation indicate a higher degree of focus on the relevant nodules. We argue that the research question posed in the title has a positive answer only if the deep neural networks are trained with images of patients that are strictly isolated from the validation and testing patient sets.
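
A minimal sketch of how patient-level separation can be enforced with scikit-learn's group-aware splitters; the slice counts, patient IDs and labels below are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical metadata: one row per CT slice, tagged with the patient it came from.
slice_ids = np.arange(10)
patient_ids = np.array([0, 0, 0, 1, 1, 2, 2, 3, 3, 3])   # several slices per patient
labels = np.array([1, 1, 1, 0, 0, 1, 1, 0, 0, 0])

# Split so that all slices of a given patient land on the same side of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(splitter.split(slice_ids, labels, groups=patient_ids))

assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
print("train patients:", sorted(set(patient_ids[train_idx])))
print("test patients: ", sorted(set(patient_ids[test_idx])))
```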

A Quantum Computing-based System for Portfolio Optimization using Future Asset Values and Automatic Reduction of the Investment Universe

  • paper_url: http://arxiv.org/abs/2309.12627
  • repo_url: None
  • paper_authors: Eneko Osaba, Guillaume Gelabert, Esther Villar-Rodriguez, Antón Asla, Izaskun Oregi
  • for: To solve the portfolio optimization problem using quantum computing techniques.
  • methods: The system models the problem with future predicted asset values rather than historical ones, and includes an automatic universe reduction module that intelligently reduces the complexity of the problem.
  • results: The authors give a preliminary discussion of the performance of the different modules that compose the prototype.
    Abstract One of the problems in quantitative finance that has received the most attention is the portfolio optimization problem. Regarding its solving, this problem has been approached using different techniques, with those related to quantum computing being especially prolific in recent years. In this study, we present a system called Quantum Computing-based System for Portfolio Optimization with Future Asset Values and Automatic Universe Reduction (Q4FuturePOP), which deals with the Portfolio Optimization Problem considering the following innovations: i) the developed tool is modeled for working with future prediction of assets, instead of historical values; and ii) Q4FuturePOP includes an automatic universe reduction module, which is conceived to intelligently reduce the complexity of the problem. We also introduce a brief discussion about the preliminary performance of the different modules that compose the prototypical version of Q4FuturePOP.
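
As a point of reference, a typical QUBO-style portfolio selection objective of the kind such systems compile for annealers (illustrative only; Q4FuturePOP's exact formulation, its use of predicted values and its universe-reduction step are described in the paper):

$$ \min_{x \in \{0,1\}^{n}} \; \lambda \, x^{\top} \Sigma \, x \; - \; \hat{\mu}^{\top} x \; + \; \rho \Big( \sum_{i=1}^{n} x_{i} - K \Big)^{2}, $$

where $\hat{\mu}$ are the (here, predicted) asset returns, $\Sigma$ their covariance, $K$ the number of assets to hold, and $\lambda, \rho$ risk-aversion and penalty weights.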

Construction contract risk identification based on knowledge-augmented language model

  • paper_url: http://arxiv.org/abs/2309.12626
  • repo_url: None
  • paper_authors: Saika Wong, Chunmo Zheng, Xing Su, Yinqiu Tang
  • for: To improve the effectiveness of construction contract review and thereby prevent potential losses.
  • methods: The study uses large language models (LLMs) augmented with construction contract domain knowledge, without fine-tuning, to strengthen the models' ability to identify contract risks.
  • results: The method was evaluated on real construction contracts and achieved solid performance; the paper also examines how LLMs apply logical reasoning in this task and offers recommendations for future research.
    Abstract Contract review is an essential step in construction projects to prevent potential losses. However, the current methods for reviewing construction contracts lack effectiveness and reliability, leading to time-consuming and error-prone processes. While large language models (LLMs) have shown promise in revolutionizing natural language processing (NLP) tasks, they struggle with domain-specific knowledge and addressing specialized issues. This paper presents a novel approach that leverages LLMs with construction contract knowledge to emulate the process of contract review by human experts. Our tuning-free approach incorporates construction contract domain knowledge to enhance language models for identifying construction contract risks. The use of a natural language when building the domain knowledge base facilitates practical implementation. We evaluated our method on real construction contracts and achieved solid performance. Additionally, we investigated how large language models employ logical thinking during the task and provide insights and recommendations for future research.
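
A minimal sketch of what knowledge-augmented prompting for contract risk identification can look like; the clause, the knowledge snippets and the prompt wording are hypothetical and do not reproduce the paper's knowledge base or prompts.

```python
# Hypothetical clause under review.
clause = ("The contractor shall complete all works within 90 days; "
          "no extension of time is granted for adverse weather.")

# Hypothetical natural-language domain knowledge retrieved for this clause.
domain_knowledge = [
    "Clauses excluding extensions of time for events beyond the contractor's control "
    "shift schedule risk entirely to the contractor.",
    "Standard forms typically allow time extensions for exceptionally adverse weather.",
]

# Assemble the knowledge-augmented prompt to send to an LLM of your choice.
prompt = (
    "You are a construction contract reviewer.\n"
    "Relevant domain knowledge:\n- " + "\n- ".join(domain_knowledge) + "\n\n"
    "Clause under review:\n" + clause + "\n\n"
    "Identify the contractual risks in this clause and explain your reasoning."
)
print(prompt)
```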

  • paper_url: http://arxiv.org/abs/2309.12625
  • repo_url: https://github.com/hanyin88/drg-llama
  • paper_authors: Hanyin Wang, Chufan Gao, Christopher Dantona, Bryan Hull, Jimeng Sun
  • For: This paper aims to improve the efficiency of the Diagnosis-Related Group (DRG) assignment process in the U.S. inpatient payment system by using an advanced large language model (LLM) fine-tuned on clinical notes.
  • Methods: The paper introduces DRG-LLaMA, an LLM fine-tuned on 236,192 MIMIC-IV discharge summaries using Low-Rank Adaptation (LoRA) to enhance DRG assignment, with a maximum input token length of 512.
  • Results: DRG-LLaMA achieves a macro-averaged F1 score of 0.327, a top-1 prediction accuracy of 52.0%, and a macro-averaged AUC of 0.986, surpassing prior leading models with relative macro-averaged F1 improvements of 40.3% and 35.7% over ClinicalBERT and CAML, respectively. For base DRG and CC/MCC prediction, it reaches top-1 accuracies of 67.8% and 67.5%.
    Abstract In the U.S. inpatient payment system, the Diagnosis-Related Group (DRG) is pivotal, but its assignment process is inefficient. The study introduces DRG-LLaMA, an advanced large language model (LLM) fine-tuned on clinical notes to enhance DRGs assignment. Utilizing LLaMA as the foundational model and optimizing it through Low-Rank Adaptation (LoRA) on 236,192 MIMIC-IV discharge summaries, our DRG-LLaMA-7B model exhibited a noteworthy macro-averaged F1 score of 0.327, a top-1 prediction accuracy of 52.0%, and a macro-averaged Area Under the Curve (AUC) of 0.986, with a maximum input token length of 512. This model surpassed the performance of prior leading models in DRG prediction, showing a relative improvement of 40.3% and 35.7% in macro-averaged F1 score compared to ClinicalBERT and CAML, respectively. Applied to base DRG and complication or comorbidity (CC)/major complication or comorbidity (MCC) prediction, DRG-LLaMA achieved a top-1 prediction accuracy of 67.8% and 67.5%, respectively. Additionally, our findings indicate that DRG-LLaMA's performance correlates with increased model parameters and input context lengths.
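
A minimal sketch of LoRA fine-tuning for sequence classification with Hugging Face transformers and peft, in the spirit of DRG-LLaMA; the base checkpoint name, label count and LoRA hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative settings: checkpoint, label count and LoRA hyperparameters are assumptions.
base_model = "meta-llama/Llama-2-7b-hf"
num_drg_codes = 738                                # approximate number of DRG classes; illustrative

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token          # LLaMA tokenizers ship without a pad token

model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=num_drg_codes)
model.config.pad_token_id = tokenizer.pad_token_id

lora_cfg = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"])   # adapt only attention projections
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()                 # only a tiny fraction of weights train

inputs = tokenizer("Discharge summary text ...", truncation=True, max_length=512,
                   return_tensors="pt")
logits = model(**inputs).logits                    # one score per DRG code
```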

  • paper_url: http://arxiv.org/abs/2309.12579
  • repo_url: None
  • paper_authors: Parag Saxena
  • for: To bring data-driven intelligence to modern agriculture by using a machine learning framework to improve education and outreach in horticulture.
  • methods: Machine learning models and natural language processing (NLP) are used to classify incoming questions and to forecast future question volumes and topics.
  • results: The results show that machine learning can predict horticulture trends, anticipating the topics of upcoming gardening questions; large-scale agriculture industries that curate and maintain comparable text data could use the same approach for trend prediction and strategic planning.
    Abstract Data-driven insights are essential for modern agriculture. This research paper introduces a machine learning framework designed to improve how we educate and reach out to people in the field of horticulture. The framework relies on data from the Horticulture Online Help Desk (HOHD), which is like a big collection of questions from people who love gardening and are part of the Extension Master Gardener Program (EMGP). This framework has two main parts. First, it uses special computer programs (machine learning models) to sort questions into categories. This helps us quickly send each question to the right expert, so we can answer it faster. Second, it looks at when questions are asked and uses that information to guess how many questions we might get in the future and what they will be about. This helps us plan on topics that will be really important. It's like knowing what questions will be popular in the coming months. We also take into account where the questions come from by looking at the Zip Code. This helps us make research that fits the challenges faced by gardeners in different places. In this paper, we demonstrate the potential of machine learning techniques to predict trends in horticulture by analyzing textual queries from homeowners. We show that NLP, classification, and time series analysis can be used to identify patterns in homeowners' queries and predict future trends in horticulture. Our results suggest that machine learning could be used to predict trends in other agricultural sectors as well. If large-scale agriculture industries curate and maintain a comparable repository of textual data, the potential for trend prediction and strategic agricultural planning could be revolutionized. This convergence of technology and agriculture offers a promising pathway for the future of sustainable farming and data-informed agricultural practices
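
A minimal sketch of the question-classification step with scikit-learn; the example questions and categories below are made up, and the actual HOHD categories and models used in the study may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical help-desk questions and their categories.
questions = [
    "Why are my tomato leaves turning yellow?",
    "When should I prune my apple tree?",
    "How do I get rid of aphids on my roses?",
    "What fertilizer works best for a raised vegetable bed?",
]
categories = ["plant health", "pruning", "pests", "soil and fertilizer"]

# Route new questions to the right expert by predicting their category.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
clf.fit(questions, categories)
print(clf.predict(["Something is eating holes in my cabbage leaves"]))
```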

  • paper_url: http://arxiv.org/abs/2309.12576
  • repo_url: None
  • paper_authors: Robert Underwood, Meghana Madhastha, Randal Burns, Bogdan Nicolae
  • for: To study how deep learning model architectures empirically evolve over time under architecture search, with design implications for caching policies, search algorithm refinement and other use cases.
  • methods: The paper algorithmically analyzes and quantitatively characterizes the evolution patterns of models from the Candle project and the Nasbench-201 search space under the Regularized Evolution algorithm.
  • results: The study shows how the regularized evolution algorithm shapes the evolution of model structures, how evolutionary patterns appear in distributed settings and can be exploited for caching and improved scheduling, and under what conditions particular architectures rise and fall in popularity.
    Abstract Network Architecture Search and specifically Regularized Evolution is a common way to refine the structure of a deep learning model.However, little is known about how models empirically evolve over time which has design implications for designing caching policies, refining the search algorithm for particular applications, and other important use cases.In this work, we algorithmically analyze and quantitatively characterize the patterns of model evolution for a set of models from the Candle project and the Nasbench-201 search space.We show how the evolution of the model structure is influenced by the regularized evolution algorithm. We describe how evolutionary patterns appear in distributed settings and opportunities for caching and improved scheduling. Lastly, we describe the conditions that affect when particular model architectures rise and fall in popularity based on their frequency of acting as a donor in a sliding window.

Creativity Support in the Age of Large Language Models: An Empirical Study Involving Emerging Writers

  • paper_url: http://arxiv.org/abs/2309.12570
  • repo_url: None
  • paper_authors: Tuhin Chakrabarty, Vishakh Padmakumar, Faeze Brahman, Smaranda Muresan
  • for: To investigate the utility of modern large language models (LLMs) as writing-support tools for professional writers.
  • methods: An empirical user study (n=30) examining how useful modern LLMs are throughout the assisted writing process.
  • results: Writers seek the LLM's help across all three cognitive activities of writing, but find it most helpful for translating and reviewing.
    Abstract The development of large language models (LLMs) capable of following instructions and engaging in conversational interactions sparked increased interest in their utilization across various support tools. We investigate the utility of modern LLMs in assisting professional writers via an empirical user study (n=30). The design of our collaborative writing interface is grounded in the cognitive process model of writing that views writing as a goal-oriented thinking process encompassing non-linear cognitive activities: planning, translating, and reviewing. Participants are asked to submit a post-completion survey to provide feedback on the potential and pitfalls of LLMs as writing collaborators. Upon analyzing the writer-LLM interactions, we find that while writers seek LLM's help across all three types of cognitive activities, they find LLMs more helpful in translation and reviewing. Our findings from analyzing both the interactions and the survey responses highlight future research directions in creative writing assistance using LLMs.

A Study on Learning Social Robot Navigation with Multimodal Perception

  • paper_url: http://arxiv.org/abs/2309.12568
  • repo_url: https://github.com/robotixx/multimodal-fusion-network
  • paper_authors: Bhabaranjan Panigrahi, Amir Hossain Raj, Mohammad Nazeri, Xuesu Xiao
  • for: To develop robots that navigate autonomously in human-inhabited public spaces, making navigation decisions that account for surrounding humans, their intentions and the underlying social norms.
  • methods: Machine learning is used to capture complex and subtle social interactions in a data-driven way, drawing on multiple sensing modalities (LiDAR and RGB cameras) and comparing learning approaches across different social scenarios.
  • results: Multimodal learning shows a clear advantage over unimodal learning for social navigation decisions, confirmed both on the dataset and in a human study; training and generalizability performance are analyzed, and the code is open-sourced for future research on multimodal perception for navigation.
    Abstract Autonomous mobile robots need to perceive the environments with their onboard sensors (e.g., LiDARs and RGB cameras) and then make appropriate navigation decisions. In order to navigate human-inhabited public spaces, such a navigation task becomes more than only obstacle avoidance, but also requires considering surrounding humans and their intentions to somewhat change the navigation behavior in response to the underlying social norms, i.e., being socially compliant. Machine learning methods are shown to be effective in capturing those complex and subtle social interactions in a data-driven manner, without explicitly hand-crafting simplified models or cost functions. Considering multiple available sensor modalities and the efficiency of learning methods, this paper presents a comprehensive study on learning social robot navigation with multimodal perception using a large-scale real-world dataset. The study investigates social robot navigation decision making on both the global and local planning levels and contrasts unimodal and multimodal learning against a set of classical navigation approaches in different social scenarios, while also analyzing the training and generalizability performance from the learning perspective. We also conduct a human study on how learning with multimodal perception affects the perceived social compliance. The results show that multimodal learning has a clear advantage over unimodal learning in both dataset and human studies. We open-source our code for the community's future use to study multimodal perception for learning social robot navigation.
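
A minimal PyTorch sketch of a late-fusion policy that combines LiDAR and RGB features before predicting a velocity command; the layer sizes and the fusion scheme are illustrative assumptions, not the architecture released in the repository.

```python
import torch
import torch.nn as nn

class LateFusionPolicy(nn.Module):
    """Encode LiDAR and RGB features separately, concatenate, predict a velocity command."""
    def __init__(self, lidar_dim=720, rgb_feat_dim=512, hidden=256):
        super().__init__()
        self.lidar_enc = nn.Sequential(nn.Linear(lidar_dim, hidden), nn.ReLU())
        self.rgb_enc = nn.Sequential(nn.Linear(rgb_feat_dim, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 2))   # (linear, angular) velocity

    def forward(self, lidar_scan, rgb_features):
        z = torch.cat([self.lidar_enc(lidar_scan), self.rgb_enc(rgb_features)], dim=-1)
        return self.head(z)

policy = LateFusionPolicy()
cmd = policy(torch.randn(1, 720), torch.randn(1, 512))
print(cmd.shape)  # torch.Size([1, 2])
```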

Machine Learning Meets Advanced Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2309.12560
  • repo_url: None
  • paper_authors: Saeid Nahavandi, Roohallah Alizadehsani, Darius Nahavandi, Chee Peng Lim, Kevin Kelly, Fernando Bello
  • for: The paper discusses machine learning methods for automation and advanced robotic manipulation, aimed at higher-quality production, lower manufacturing cost and better utilization of human resources.
  • methods: It reviews cutting-edge technologies and recent trends in machine learning methods applied to real-world manipulation tasks, covering applications in industry, healthcare, agriculture, space, military, and search and rescue.
  • results: It highlights the potential of machine learning to improve the safety, reliability and efficiency of automation systems, gives an overview of the current state of the field, and identifies important research directions for future work.
    Abstract Automated industries lead to high quality production, lower manufacturing cost and better utilization of human resources. Robotic manipulator arms have major role in the automation process. However, for complex manipulation tasks, hard coding efficient and safe trajectories is challenging and time consuming. Machine learning methods have the potential to learn such controllers based on expert demonstrations. Despite promising advances, better approaches must be developed to improve safety, reliability, and efficiency of ML methods in both training and deployment phases. This survey aims to review cutting edge technologies and recent trends on ML methods applied to real-world manipulation tasks. After reviewing the related background on ML, the rest of the paper is devoted to ML applications in different domains such as industry, healthcare, agriculture, space, military, and search and rescue. The paper is closed with important research directions for future works.

Invariant Learning via Probability of Sufficient and Necessary Causes

  • paper_url: http://arxiv.org/abs/2309.12559
  • repo_url: https://github.com/ymy4323460/casn
  • paper_authors: Mengyue Yang, Zhen Fang, Yonggang Zhang, Yali Du, Furui Liu, Jean-Francois Ton, Jun Wang
  • for: To improve out-of-distribution (OOD) generalization, where the test distribution is unknown and differs from the training distribution.
  • methods: A causality-based approach that uses the probability of sufficient and necessary causes (PNS) to capture features that are both necessary and sufficient, and a PNS risk used to learn representations with high PNS values.
  • results: Experiments on synthetic and real-world data demonstrate the effectiveness of the method, and a theoretical analysis proves its generalizability. More details are available at the GitHub repository: https://github.com/ymy4323460/CaSN.
    Abstract Out-of-distribution (OOD) generalization is indispensable for learning models in the wild, where testing distribution typically unknown and different from the training. Recent methods derived from causality have shown great potential in achieving OOD generalization. However, existing methods mainly focus on the invariance property of causes, while largely overlooking the property of \textit{sufficiency} and \textit{necessity} conditions. Namely, a necessary but insufficient cause (feature) is invariant to distribution shift, yet it may not have required accuracy. By contrast, a sufficient yet unnecessary cause (feature) tends to fit specific data well but may have a risk of adapting to a new domain. To capture the information of sufficient and necessary causes, we employ a classical concept, the probability of sufficiency and necessary causes (PNS), which indicates the probability of whether one is the necessary and sufficient cause. To associate PNS with OOD generalization, we propose PNS risk and formulate an algorithm to learn representation with a high PNS value. We theoretically analyze and prove the generalizability of the PNS risk. Experiments on both synthetic and real-world benchmarks demonstrate the effectiveness of the proposed method. The details of the implementation can be found at the GitHub repository: https://github.com/ymy4323460/CaSN.
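
For reference, the classical counterfactual quantity the paper builds on (the probability of necessity and sufficiency, in Pearl's notation) together with its standard bounds; the paper's PNS risk is defined on top of this concept and is not reproduced here:

$$ \mathrm{PNS} \;=\; P\big(Y_{x} = y,\; Y_{x'} = y'\big), \qquad \max\{0,\; P(y_{x}) - P(y_{x'})\} \;\le\; \mathrm{PNS} \;\le\; \min\{P(y_{x}),\; P(y'_{x'})\}. $$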

PlanFitting: Tailoring Personalized Exercise Plans with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.12555
  • repo_url: None
  • paper_authors: Donghoon Shin, Gary Hsieh, Young-Ho Kim
  • for: To help users create personalized exercise plans that fit their individual circumstances while staying grounded in foundational exercise principles.
  • methods: The system leverages the generative capabilities of large language models, letting users describe constraints and queries in natural language to create and refine their weekly exercise plan.
  • results: A user study (N=18) and an expert evaluation (N=3) show that PlanFitting can generate personalized, actionable, evidence-based exercise plans; the paper discusses future design opportunities for AI assistants that better comply with exercise principles and accommodate personal constraints.
    Abstract A personally tailored exercise regimen is crucial to ensuring sufficient physical activities, yet challenging to create as people have complex schedules and considerations and the creation of plans often requires iterations with experts. We present PlanFitting, a conversational AI that assists in personalized exercise planning. Leveraging generative capabilities of large language models, PlanFitting enables users to describe various constraints and queries in natural language, thereby facilitating the creation and refinement of their weekly exercise plan to suit their specific circumstances while staying grounded in foundational principles. Through a user study where participants (N=18) generated a personalized exercise plan using PlanFitting and expert planners (N=3) evaluated these plans, we identified the potential of PlanFitting in generating personalized, actionable, and evidence-based exercise plans. We discuss future design opportunities for AI assistants in creating plans that better comply with exercise principles and accommodate personal constraints.

Provably Robust and Plausible Counterfactual Explanations for Neural Networks via Robust Optimisation

  • paper_url: http://arxiv.org/abs/2309.12545
  • repo_url: https://github.com/junqi-jiang/proplace
  • paper_authors: Junqi Jiang, Jianglin Lan, Francesco Leofante, Antonio Rago, Francesca Toni
  • for: To generate counterfactual explanations (CEs) for neural network classifiers that remain valid and plausible under model parameter changes.
  • methods: The paper proposes PROPLACE, a method that uses robust optimisation techniques to compute provably robust and plausible CEs.
  • results: In a comparison against six baselines, five of which target robustness, PROPLACE achieves state-of-the-art performance on three evaluation aspects.
    Abstract Counterfactual Explanations (CEs) have received increasing interest as a major methodology for explaining neural network classifiers. Usually, CEs for an input-output pair are defined as data points with minimum distance to the input that are classified with a different label than the output. To tackle the established problem that CEs are easily invalidated when model parameters are updated (e.g. retrained), studies have proposed ways to certify the robustness of CEs under model parameter changes bounded by a norm ball. However, existing methods targeting this form of robustness are not sound or complete, and they may generate implausible CEs, i.e., outliers wrt the training dataset. In fact, no existing method simultaneously optimises for proximity and plausibility while preserving robustness guarantees. In this work, we propose Provably RObust and PLAusible Counterfactual Explanations (PROPLACE), a method leveraging on robust optimisation techniques to address the aforementioned limitations in the literature. We formulate an iterative algorithm to compute provably robust CEs and prove its convergence, soundness and completeness. Through a comparative experiment involving six baselines, five of which target robustness, we show that PROPLACE achieves state-of-the-art performances against metrics on three evaluation aspects.
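
To ground the terminology, a minimal PyTorch sketch of a plain gradient-based counterfactual search (a nearby input that flips the prediction); this is a generic baseline, not the PROPLACE robust-optimisation procedure, and the toy model and inputs below are arbitrary.

```python
import torch

def counterfactual(model, x, target_class, steps=200, lr=0.05, dist_weight=0.1):
    """Search for a nearby input the model assigns to `target_class`."""
    x_cf = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x_cf.unsqueeze(0))
        loss = (torch.nn.functional.cross_entropy(logits, torch.tensor([target_class]))
                + dist_weight * torch.norm(x_cf - x, p=1))   # flip the label, stay close
        loss.backward()
        opt.step()
    return x_cf.detach()

# Toy usage with a hypothetical 2-class MLP on 4 features.
model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(), torch.nn.Linear(16, 2))
x = torch.tensor([0.2, -1.0, 0.5, 0.0])
x_cf = counterfactual(model, x, target_class=1)
print(x_cf, model(x_cf.unsqueeze(0)).argmax().item())
```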