2023-10-15

cs.AI

On Statistical Learning of Branch and Bound for Vehicle Routing Optimization

paper_url: http://arxiv.org/abs/2310.09986
repo_url: https://github.com/isotlaboratory/ml4vrp
paper_authors: Andrew Naguib, Waleed A. Yousef, Issa Traoré, Mohammad Mamun
for: solve the capacitated vehicle routing problem (CVRP) using machine learning
methods: utilize and compare the performance of three neural networks (GCNN, GraphSAGE, and GAT) to emulate the Strong Branching strategy
results: match or improve upon the performance of the branch and bound algorithm with significantly less computational time

Abstract
Recently, machine learning of the branch and bound algorithm has shown promise in approximating competent solutions to NP-hard problems. In this paper, we utilize and comprehensively compare the outcomes of three neural networks--graph convolutional neural network (GCNN), GraphSAGE, and graph attention network (GAT)--to solve the capacitated vehicle routing problem. We train these neural networks to emulate the decision-making process of the computationally expensive Strong Branching strategy. The neural networks are trained on six instances with distinct topologies from the CVRPLIB and evaluated on eight additional instances. Moreover, we reduced the minimum number of vehicles required to solve a CVRP instance to a bin-packing problem, which was addressed in a similar manner. Through rigorous experimentation, we found that this approach can match or improve upon the performance of the branch and bound algorithm with the Strong Branching strategy while requiring significantly less computational time. The source code that corresponds to our research findings and methodology is readily accessible and available for reference at the following web address: https://isotlaboratory.github.io/ml4vrp

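Summary
Machine learning of the branch and bound algorithm has recently shown promise for approximating good solutions to NP-hard problems. This paper compares three neural networks, a graph convolutional neural network (GCNN), GraphSAGE, and a graph attention network (GAT), on the capacitated vehicle routing problem (CVRP). The networks are trained to imitate the decisions of the computationally expensive Strong Branching strategy, using six CVRPLIB instances with distinct topologies for training and eight additional instances for evaluation. The problem of finding the minimum number of vehicles needed for a CVRP instance is reduced to a bin-packing problem and handled in the same way. In the experiments, the approach matches or improves on branch and bound with Strong Branching while requiring significantly less computation time. Source code: https://isotlaboratory.github.io/ml4vrp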
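
The reduction of the minimum-vehicle question to bin packing can be illustrated with a small sketch: customer demands become items and the vehicle capacity becomes the bin size. The code below uses the classical first-fit decreasing heuristic (an upper bound) and a simple volume bound (a lower bound). It is not the paper's learned branch-and-bound method, and the demand values are made up for illustration.

```python
import math


def vehicle_lower_bound(demands, capacity):
    """Volume bound: no solution can use fewer vehicles than this."""
    return math.ceil(sum(demands) / capacity)


def first_fit_decreasing(demands, capacity):
    """Pack demands into vehicles greedily, largest demand first.

    Returns the list of vehicle loads; its length is an upper bound
    on the minimum number of vehicles.
    """
    loads = []
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError("a single demand exceeds the vehicle capacity")
        for i, load in enumerate(loads):
            if load + demand <= capacity:
                loads[i] += demand  # fits in an already opened vehicle
                break
        else:
            loads.append(demand)  # open a new vehicle
    return loads


# Illustrative (hypothetical) instance: six customers, capacity 10.
demands = [4, 8, 1, 4, 2, 1]
capacity = 10
loads = first_fit_decreasing(demands, capacity)
lower = vehicle_lower_bound(demands, capacity)
```

When the heuristic count `len(loads)` equals `lower`, the vehicle count is provably optimal; otherwise an exact method such as branch and bound has to close the gap.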

Farzi Data: Autoregressive Data Distillation

paper_url: http://arxiv.org/abs/2310.09983
repo_url: None
paper_authors: Noveen Sachdeva, Zexue He, Wang-Cheng Kang, Jianmo Ni, Derek Zhiyuan Cheng, Julian McAuley
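for: provide data distillation for auto-regressive machine learning tasks, so that large models can be trained on much smaller datasets
methods: propose Farzi, which summarizes an event sequence dataset with a strict left-to-right causal structure into a small number of synthetic sequences (Farzi Data) that maintain or improve model performance; internally, Farzi achieves memory-efficient distillation through reverse-mode differentiation of the Adam optimizer via Hessian-vector products and a latent-space factorization of the discrete event space
results: on sequential recommendation and language modeling tasks, state-of-the-art models trained on Farzi Data as small as 0.1% of the original dataset achieve 98-120% of downstream full-data performance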

Abstract
We study data distillation for auto-regressive machine learning tasks, where the input and output have a strict left-to-right causal structure. More specifically, we propose Farzi, which summarizes an event sequence dataset into a small number of synthetic sequences -- Farzi Data -- which are optimized to maintain (if not improve) model performance compared to training on the full dataset. Under the hood, Farzi conducts memory-efficient data distillation by (i) deriving efficient reverse-mode differentiation of the Adam optimizer by leveraging Hessian-Vector Products; and (ii) factorizing the high-dimensional discrete event-space into a latent-space which provably promotes implicit regularization. Empirically, for sequential recommendation and language modeling tasks, we are able to achieve 98-120% of downstream full-data performance when training state-of-the-art models on Farzi Data of size as little as 0.1% of the original dataset. Notably, being able to train better models with significantly less data sheds light on the design of future large auto-regressive models, and opens up new opportunities to further scale up model and data sizes.

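Summary
The paper studies data distillation for auto-regressive machine learning tasks, in which the input and output have a strict left-to-right causal structure. It proposes Farzi, which summarizes an event sequence dataset into a small number of synthetic sequences (Farzi Data) that maintain, or even improve, model performance relative to training on the full dataset. Farzi achieves memory-efficient distillation in two ways: (i) it derives an efficient reverse-mode differentiation of the Adam optimizer using Hessian-vector products, and (ii) it factorizes the high-dimensional discrete event space into a latent space, which promotes implicit regularization. On sequential recommendation and language modeling tasks, state-of-the-art models trained on Farzi Data as small as 0.1% of the original dataset reach 98-120% of downstream full-data performance. That better models can be trained with far less data informs the design of future large auto-regressive models and opens opportunities to further scale model and data sizes.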

Chinese Painting Style Transfer Using Deep Generative Models

paper_url: http://arxiv.org/abs/2310.09978
repo_url: https://github.com/yanyangbaobeiisemma/chinsepaintingstyletransfer
paper_authors: Weijian Ma, Yanyang Kong
for: transfer traditional Chinese painting styles to modern images such as nature objects, portraits, and landscapes
methods: apply state-of-the-art deep generative models to Chinese painting style transfer and evaluate them both qualitatively and quantitatively; additionally propose an algorithm that combines several style transfer models
results: evaluate and compare different deep generative models on the traditional Chinese painting style transfer task, and propose a new method that combines several style transfer models

Abstract
Artistic style transfer aims to modify the style of the image while preserving its content. Style transfer using deep learning models has been widely studied since 2015, and most of the applications are focused on specific artists like Van Gogh, Monet, Cezanne. There is little research on, and there are few applications of, traditional Chinese painting style transfer. In this paper, we will study and leverage different state-of-the-art deep generative models for Chinese painting style transfer and evaluate the performance both qualitatively and quantitatively. In addition, we propose our own algorithm that combines several style transfer models for our task. Specifically, we will transfer two main types of traditional Chinese painting style, known as "Gong-bi" and "Shui-mo", to modern images like nature objects, portraits and landscapes.

Summary
Artistic style transfer modifies the style of an image while preserving its content. Style transfer with deep learning models has been widely studied since 2015, with most applications focused on specific artists such as Van Gogh, Monet, and Cezanne; little research addresses traditional Chinese painting. This paper studies and applies different state-of-the-art deep generative models to Chinese painting style transfer and evaluates them both qualitatively and quantitatively. It also proposes an algorithm that combines several style transfer models for the task. Specifically, two main traditional Chinese painting styles, "Gong-bi" (工笔) and "Shui-mo" (水墨), are transferred to modern images such as nature objects, portraits, and landscapes.

Specialized Deep Residual Policy Safe Reinforcement Learning-Based Controller for Complex and Continuous State-Action Spaces

paper_url: http://arxiv.org/abs/2310.14788
repo_url: https://github.com/ammar-n-abbas/CoL-SDRPRL
paper_authors: Ammar N. Abbas, Georgios C. Chasparis, John D. Kelleher
for: address the limitations of traditional controllers in safety-critical environments by proposing a specialized deep reinforcement learning approach for complex and continuous state-action spaces
methods: a cycle-of-learning approach that combines residual policy learning with expert trajectory guidance, and specializes the policy through an input-output hidden Markov model to optimize it within the region of interest
results: the solution is validated on the Tennessee Eastman process control problem; the hybrid control architecture, which combines the reinforcement learning agent with the conventional controller, can improve control performance and adapt to abnormal situations

Abstract
Traditional controllers have limitations as they rely on prior knowledge about the physics of the problem, require modeling of dynamics, and struggle to adapt to abnormal situations. Deep reinforcement learning has the potential to address these problems by learning optimal control policies through exploration in an environment. For safety-critical environments, it is impractical to explore randomly, and replacing conventional controllers with black-box models is also undesirable. Also, it is expensive in continuous state and action spaces, unless the search space is constrained. To address these challenges we propose a specialized deep residual policy safe reinforcement learning with a cycle of learning approach adapted for complex and continuous state-action spaces. Residual policy learning allows learning a hybrid control architecture where the reinforcement learning agent acts in synchronous collaboration with the conventional controller. The cycle of learning initiates the policy through the expert trajectory and guides the exploration around it. Further, the specialization through the input-output hidden Markov model helps to optimize policy that lies within the region of interest (such as abnormality), where the reinforcement learning agent is required and is activated. The proposed solution is validated on the Tennessee Eastman process control.

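Summary
Traditional controllers are limited because they rely on prior knowledge of the physics of the problem, require a model of the dynamics, and adapt poorly to abnormal situations. Deep reinforcement learning can address these problems by learning optimal control policies through exploration of an environment. In safety-critical environments, however, random exploration is impractical, and replacing conventional controllers with black-box models is undesirable. Search is also expensive in continuous state and action spaces unless the search space is constrained. To address these challenges, the paper proposes a specialized deep residual policy safe reinforcement learning method with a cycle-of-learning approach. Residual policy learning lets the reinforcement learning agent act in synchronous collaboration with the conventional controller, the cycle of learning initializes the policy from an expert trajectory and guides exploration around it, and specialization through an input-output hidden Markov model optimizes the policy within the region of interest (such as abnormal situations). The solution is validated on the Tennessee Eastman process control problem.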
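
The hybrid architecture described above, in which the learned agent adds a correction on top of the conventional controller and is active only in the region of interest, can be sketched as follows. This is a minimal illustration under assumptions: the proportional controller, the `in_region_of_interest` flag (which the paper derives from an input-output hidden Markov model), and the action bounds are all hypothetical stand-ins.

```python
import numpy as np


def conventional_controller(state, setpoint, gain=0.5):
    """Hypothetical proportional controller standing in for the existing control law."""
    return gain * (setpoint - state)


def hybrid_action(state, setpoint, residual_policy, in_region_of_interest,
                  low=-1.0, high=1.0):
    """Residual policy: conventional action plus a learned correction.

    The learned residual is applied only when the specialization signal
    says the agent is required (e.g. an abnormal situation).
    """
    base = conventional_controller(state, setpoint)
    residual = residual_policy(state) if in_region_of_interest else 0.0
    # Keep the combined command inside the actuator limits.
    return float(np.clip(base + residual, low, high))


def toy_residual(state):
    """Placeholder for a trained policy network; returns a constant correction."""
    return 0.2


normal = hybrid_action(0.0, 1.0, toy_residual, in_region_of_interest=False)
abnormal = hybrid_action(0.0, 1.0, toy_residual, in_region_of_interest=True)
```

Because the conventional controller always contributes the base action, the system falls back to known behaviour whenever the residual is switched off.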

Seeking Next Layer Neurons’ Attention for Error-Backpropagation-Like Training in a Multi-Agent Network Framework

paper_url: http://arxiv.org/abs/2310.09952
repo_url: None
paper_authors: Arshia Soltani Moakhar, Mohammad Azizmalayeri, Hossein Mirzaei, Mohammad Taghi Manzuri, Mohammad Hossein Rohban
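for: propose a local-objective training method for neural networks viewed as multi-agent systems, to improve their applicability to real-world problems
methods: model the network as decentralized, self-interested neurons, each maximizing its own local objective (attention from next-layer neurons), derive the optimal strategy, and establish conditions under which it is equivalent to error-backpropagation
results: experiments on three datasets demonstrate the learning capacity of the method, which outperforms error-backpropagation on a catastrophic forgetting benchmark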

Abstract
Despite considerable theoretical progress in the training of neural networks viewed as a multi-agent system of neurons, particularly concerning biological plausibility and decentralized training, their applicability to real-world problems remains limited due to scalability issues. In contrast, error-backpropagation has demonstrated its effectiveness for training deep networks in practice. In this study, we propose a local objective for neurons that, when pursued by neurons individually, align them to exhibit similarities to error-backpropagation in terms of efficiency and scalability during training. For this purpose, we examine a neural network comprising decentralized, self-interested neurons seeking to maximize their local objective -- attention from subsequent layer neurons -- and identify the optimal strategy for neurons. We also analyze the relationship between this strategy and backpropagation, establishing conditions under which the derived strategy is equivalent to error-backpropagation. Lastly, we demonstrate the learning capacity of these multi-agent neural networks through experiments on three datasets and showcase their superior performance relative to error-backpropagation in a catastrophic forgetting benchmark.

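Summary
Despite considerable theoretical progress in training neural networks viewed as a multi-agent system of neurons, particularly regarding biological plausibility and decentralized training, their applicability to real-world problems remains limited by scalability issues. Error-backpropagation, by contrast, has proven effective for training deep networks in practice. This study proposes a local objective for neurons that, when each neuron pursues it individually, yields training efficiency and scalability similar to error-backpropagation. To this end, it analyzes a network of decentralized, self-interested neurons that seek to maximize attention from next-layer neurons and identifies their optimal strategy. It also analyzes the relationship between this strategy and backpropagation and establishes conditions under which the derived strategy is equivalent to error-backpropagation. Finally, experiments on three datasets demonstrate the learning capacity of these multi-agent neural networks, which outperform error-backpropagation on a catastrophic forgetting benchmark.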

Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models

paper_url: http://arxiv.org/abs/2310.09949
repo_url: None
paper_authors: Wenqi Jiang, Marco Zeller, Roger Waleffe, Torsten Hoefler, Gustavo Alonso
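methods: build Chameleon, a heterogeneous accelerator system that integrates language model (LM) and retrieval accelerators in a disaggregated architecture
results: compared with CPU-based and CPU-GPU vector search systems, Chameleon achieves up to 23.72x speedup and 26.2x energy efficiency; across various RALM configurations it achieves up to 2.16x lower latency and 3.18x higher throughput than the hybrid CPU-GPU architecture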

Abstract
A Retrieval-Augmented Language Model (RALM) augments a generative language model by retrieving context-specific knowledge from an external database. This strategy facilitates impressive text generation quality even with smaller models, thus reducing orders of magnitude of computational demands. However, RALMs introduce unique system design challenges due to (a) the diverse workload characteristics between LM inference and retrieval and (b) the various system requirements and bottlenecks for different RALM configurations such as model sizes, database sizes, and retrieval frequencies. We propose Chameleon, a heterogeneous accelerator system that integrates both LM and retrieval accelerators in a disaggregated architecture. The heterogeneity ensures efficient acceleration of both LM inference and retrieval, while the accelerator disaggregation enables the system to independently scale both types of accelerators to fulfill diverse RALM requirements. Our Chameleon prototype implements retrieval accelerators on FPGAs and assigns LM inference to GPUs, with a CPU server orchestrating these accelerators over the network. Compared to CPU-based and CPU-GPU vector search systems, Chameleon achieves up to 23.72x speedup and 26.2x energy efficiency. Evaluated on various RALMs, Chameleon exhibits up to 2.16x reduction in latency and 3.18x speedup in throughput compared to the hybrid CPU-GPU architecture. These promising results pave the way for bringing accelerator heterogeneity and disaggregation into future RALM systems.

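Summary
A Retrieval-Augmented Language Model (RALM) augments a generative language model by retrieving context-specific knowledge from an external database. This allows strong text generation quality even with smaller models, reducing computational demands by orders of magnitude. RALMs, however, pose distinct system design challenges because of (a) the different workload characteristics of LM inference and retrieval and (b) the varying system requirements and bottlenecks across RALM configurations, such as model size, database size, and retrieval frequency. The paper proposes Chameleon, a heterogeneous accelerator system that integrates LM and retrieval accelerators in a disaggregated architecture. Heterogeneity accelerates both LM inference and retrieval efficiently, and disaggregation lets the two accelerator types scale independently to meet different RALM requirements. The prototype implements retrieval accelerators on FPGAs and assigns LM inference to GPUs, with a CPU server orchestrating the accelerators over the network. Compared with CPU-based and CPU-GPU vector search systems, Chameleon achieves up to 23.72x speedup and 26.2x energy efficiency. Across various RALMs, it reduces latency by up to 2.16x and increases throughput by up to 3.18x relative to the hybrid CPU-GPU architecture.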

“Reading Between the Heat”: Co-Teaching Body Thermal Signatures for Non-intrusive Stress Detection

paper_url: http://arxiv.org/abs/2310.09932
repo_url: None
paper_authors: Yi Xiao, Harshit Sharma, Zhongyang Zhang, Dessa Bergen-Cico, Tauhidur Rahman, Asif Salekin
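for: develop a reliable, contactless indoor stress monitoring system for workplace productivity assessment, smart homes, and personalized mental health monitoring
methods: introduce ThermaStrain, a co-teaching framework that combines a wearable electrodermal activity (EDA) sensor with contactless thermal sensing during training to improve contactless stress prediction
results: ThermaStrain classifies stress accurately across distances and stress scenarios (F1 score of 0.8293 in binary classification) and performs well in real-time execution on edge platforms and in multi-individual sensing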

Abstract
Stress impacts our physical and mental health as well as our social life. A passive and contactless indoor stress monitoring system can unlock numerous important applications such as workplace productivity assessment, smart homes, and personalized mental health monitoring. While the thermal signatures from a user's body captured by a thermal camera can provide important information about the "fight-flight" response of the sympathetic and parasympathetic nervous system, relying solely on thermal imaging for training a stress prediction model often leads to overfitting and consequently suboptimal performance. This paper addresses this challenge by introducing ThermaStrain, a novel co-teaching framework that achieves high stress-prediction performance by transferring knowledge from the wearable modality to the contactless thermal modality. During training, ThermaStrain incorporates a wearable electrodermal activity (EDA) sensor to generate stress-indicative representations from thermal videos, emulating stress-indicative representations from a wearable EDA sensor. During testing, only thermal sensing is used, and stress-indicative patterns from thermal data and emulated EDA representations are extracted to improve stress assessment. The study collected a comprehensive dataset with thermal video and EDA data under various stress conditions and distances. ThermaStrain achieves an F1 score of 0.8293 in binary stress classification, outperforming the thermal-only baseline approach by over 9%. Extensive evaluations highlight ThermaStrain's effectiveness in recognizing stress-indicative attributes, its adaptability across distances and stress scenarios, its real-time executability on edge platforms, its applicability to multi-individual sensing, its ability to function under limited visibility and unfamiliar conditions, and the advantages of its co-teaching approach.

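Summary
Stress affects physical and mental health as well as social life. A passive, contactless indoor stress monitoring system could enable applications such as workplace productivity assessment, smart homes, and personalized mental health monitoring. Thermal signatures of a user's body captured by a thermal camera carry information about the "fight-flight" response, but training a stress prediction model on thermal imaging alone often leads to overfitting and suboptimal performance. The paper addresses this with ThermaStrain, a co-teaching framework that transfers knowledge from the wearable modality to the contactless thermal modality. During training, a wearable electrodermal activity (EDA) sensor guides the model to generate, from thermal video, representations that emulate the stress-indicative representations of the EDA sensor. During testing, only thermal sensing is used: stress-indicative patterns from the thermal data and the emulated EDA representations are extracted to improve stress assessment. On a dataset of thermal video and EDA data collected under various stress conditions and distances, ThermaStrain achieves an F1 score of 0.8293 in binary stress classification, outperforming the thermal-only baseline by over 9%. Further evaluations show that it recognizes stress-indicative attributes, adapts across distances and stress scenarios, runs in real time on edge platforms, applies to multi-individual sensing, and functions under limited visibility and unfamiliar conditions.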

Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data

paper_url: http://arxiv.org/abs/2310.09926
repo_url: https://github.com/AlaaLab/WebCP
paper_authors: Shiladitya Dutta, Hongbo Wei, Lars van der Laan, Ahmed M. Alaa
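for: quantify uncertainty in the zero-shot predictions of multimodal foundation models
methods: perform zero-shot classification with CLIP-style models at test time and apply conformal prediction on calibration data sourced from the web, using a novel conformity score that accounts for errors in the retrieved data
results: preliminary results on a variety of biomedical datasets show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency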

Abstract
Foundation models are trained on vast amounts of data at scale using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a heuristic approach for uncertainty estimation in zero-shot settings using conformal prediction with web data. Given a set of classes at test time, we conduct zero-shot classification with CLIP-style models using a prompt template, e.g., "an image of a <class>", and use the same template as a search query to source calibration data from the open web. Given a web-based calibration set, we apply conformal prediction with a novel conformity score that accounts for potential errors in retrieved web data. We evaluate the utility of our proposed method in Biomedical foundation models; our preliminary results show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency on a variety of biomedical datasets.

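Summary
Foundation models are trained on vast amounts of data with self-supervised learning and can adapt to a wide range of downstream tasks. At test time they can classify previously unseen, user-specified categories in a zero-shot manner. The paper addresses the problem of quantifying uncertainty in these zero-shot predictions and proposes a heuristic approach based on conformal prediction with web data. Given a set of classes at test time, zero-shot classification is performed with CLIP-style models using a prompt template, e.g. "an image of a <class>", and the same template is used as a search query to source calibration data from the open web. Conformal prediction is then applied to the web-based calibration set with a novel conformity score that accounts for potential errors in the retrieved data. Preliminary results on biomedical foundation models show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency on a variety of biomedical datasets.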
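
For readers unfamiliar with conformal prediction, the following sketch shows the standard split-conformal recipe on class probabilities. It uses the common score "one minus the probability of the true class", not the paper's novel conformity score for noisy web data, and the calibration probabilities are invented for illustration.

```python
import numpy as np


def conformal_quantile(cal_probs, cal_labels, alpha):
    """Score threshold from a calibration set (standard split conformal).

    cal_probs: (n, n_classes) predicted probabilities.
    cal_labels: (n,) integer labels (here they would come from web search).
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
    if k > n:
        return np.inf  # too few calibration points: include every class
    return np.sort(scores)[k - 1]


def prediction_set(probs, qhat):
    """All classes whose score does not exceed the calibrated threshold."""
    return np.flatnonzero(1.0 - probs <= qhat)


# Hypothetical calibration data: 4 samples, 3 classes.
cal_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
    [0.30, 0.50, 0.20],
])
cal_labels = np.array([0, 1, 2, 0])
qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.5)
pred = prediction_set(np.array([0.70, 0.25, 0.05]), qhat)
```

With exchangeable and correctly labelled calibration data, such sets contain the true class with probability at least 1 - alpha; the paper's contribution concerns what to do when the web-sourced calibration labels may be wrong.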

Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers

paper_url: http://arxiv.org/abs/2310.09925
repo_url: https://github.com/hmohebbi/ContextMixingASR
paper_authors: Hosein Mohebbi, Grzegorz Chrupała, Willem Zuidema, Afra Alishahi
for: This paper aims to investigate how measures of ‘context-mixing’ developed for text models can be adapted and applied to models of spoken language, specifically in the case of homophony in French.
methods: The authors use a series of controlled experiments and probing analyses on Transformer-based speech models to explore how representations in encoder-only models and encoder-decoder models incorporate syntactic cues to identify the correct transcription.
results: The authors find that representations in encoder-only models effectively incorporate these cues, while encoders in encoder-decoder models mainly relegate the task of capturing contextual dependencies to decoder modules.

Abstract
Transformers have become a key architecture in speech processing, but our understanding of how they build up representations of acoustic and linguistic structure is limited. In this study, we address this gap by investigating how measures of 'context-mixing' developed for text models can be adapted and applied to models of spoken language. We identify a linguistic phenomenon that is ideal for such a case study: homophony in French (e.g. livre vs livres), where a speech recognition model has to attend to syntactic cues such as determiners and pronouns in order to disambiguate spoken words with identical pronunciations and transcribe them while respecting grammatical agreement. We perform a series of controlled experiments and probing analyses on Transformer-based speech models. Our findings reveal that representations in encoder-only models effectively incorporate these cues to identify the correct transcription, whereas encoders in encoder-decoder models mainly relegate the task of capturing contextual dependencies to decoder modules.

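Summary
Transformers have become a key architecture in speech processing, but how they build up representations of acoustic and linguistic structure is poorly understood. This study adapts measures of 'context-mixing' developed for text models and applies them to models of spoken language. As a case study it uses homophony in French (e.g. livre vs livres), where a speech recognition model must attend to syntactic cues such as determiners and pronouns to disambiguate spoken words with identical pronunciations and transcribe them with correct grammatical agreement. Controlled experiments and probing analyses on Transformer-based speech models show that representations in encoder-only models effectively incorporate these cues, whereas the encoders of encoder-decoder models mainly leave the capture of contextual dependencies to the decoder modules.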

Predictive Maintenance Model Based on Anomaly Detection in Induction Motors: A Machine Learning Approach Using Real-Time IoT Data

paper_url: http://arxiv.org/abs/2310.14949
repo_url: None
paper_authors: Sergio F. Chevtchenko, Monalisa C. M. dos Santos, Diego M. Vieira, Ricardo L. Mota, Elisson Rocha, Bruna V. Cruz, Danilo Araújo, Ermeson Andrade
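for: use data on degradation phenomena collected by Internet of Things (IoT) devices to build data-driven models for anomaly detection in industrial equipment
methods: evaluate combinations of pre-processing techniques, namely Fast Fourier Transform (FFT), Wavelet Transform (WT), and binning, with machine learning (ML) models, and apply multiobjective optimization and analysis to balance anomaly detection rate, false positive rate, and inference speed
results: present Pareto-optimal models validated on a custom dataset acquired from induction motors; the methodology is intended to generalize to other industrial contexts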

Abstract
With the support of Internet of Things (IoT) devices, it is possible to acquire data from degradation phenomena and design data-driven models to perform anomaly detection in industrial equipment. This approach not only identifies potential anomalies but can also serve as a first step toward building predictive maintenance policies. In this work, we demonstrate a novel anomaly detection system on induction motors used in pumps, compressors, fans, and other industrial machines. This work evaluates a combination of pre-processing techniques and machine learning (ML) models with a low computational cost. We use a combination of pre-processing techniques such as Fast Fourier Transform (FFT), Wavelet Transform (WT), and binning, which are well-known approaches for extracting features from raw data. We also aim to guarantee an optimal balance between multiple conflicting parameters, such as anomaly detection rate, false positive rate, and inference speed of the solution. To this end, multiobjective optimization and analysis are performed on the evaluated models. Pareto-optimal solutions are presented to select which models have the best results regarding classification metrics and computational effort. Differently from most works in this field that use publicly available datasets to validate their models, we propose an end-to-end solution combining low-cost and readily available IoT sensors. The approach is validated by acquiring a custom dataset from induction motors. Also, we fuse vibration, temperature, and noise data from these sensors as the input to the proposed ML model. Therefore, we aim to propose a methodology general enough to be applied in different industrial contexts in the future.

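Summary
With Internet of Things (IoT) devices, data on degradation phenomena can be acquired and used to design data-driven models for anomaly detection in industrial equipment. Such detection not only identifies potential anomalies but can also serve as a first step toward predictive maintenance policies. This work demonstrates an anomaly detection system for induction motors. It evaluates combinations of low-cost machine learning (ML) models and pre-processing techniques, namely Fast Fourier Transform (FFT), Wavelet Transform (WT), and binning, which are well-known approaches for extracting features from raw data. To balance conflicting criteria such as anomaly detection rate, false positive rate, and inference speed, multiobjective optimization and analysis are performed, and Pareto-optimal solutions are presented to identify the models with the best trade-off between classification metrics and computational effort. Unlike most work in this field, which validates models on publicly available datasets, the paper proposes an end-to-end solution built on low-cost, readily available IoT sensors and validates it on a custom dataset acquired from induction motors, fusing vibration, temperature, and noise data as the model input. The methodology is intended to be general enough to apply in other industrial contexts.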
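
The FFT-plus-binning pre-processing mentioned above can be sketched in a few lines: take the magnitude spectrum of a sensor window and average it over a fixed number of frequency bands. The sampling rate, window length, band count, and the synthetic 120 Hz signal are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np


def fft_band_features(signal, n_bins):
    """Magnitude spectrum of a real signal, averaged over n_bins bands."""
    spectrum = np.abs(np.fft.rfft(signal))
    bands = np.array_split(spectrum, n_bins)  # near-equal contiguous bands
    return np.array([band.mean() for band in bands])


# Synthetic one-second vibration window sampled at 1 kHz (hypothetical values).
fs = 1000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 120 * t)  # a single 120 Hz component

features = fft_band_features(signal, n_bins=10)
dominant_band = int(np.argmax(features))
```

The resulting low-dimensional vector can be fed to a lightweight anomaly detector; a shift of energy between bands relative to a healthy baseline is the kind of change such a detector would look for.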

Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation

paper_url: http://arxiv.org/abs/2310.09886
repo_url: None
paper_authors: Chengwei Qin, Chen Chen, Shafiq Joty
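for: address lifelong sequence generation (LSG), a continual learning problem in which a model is trained on a sequence of generation tasks to learn newly emerging generation patterns without forgetting previous knowledge
methods: propose Dynamic Module Expansion and Adaptation (DMEA), which dynamically determines the architecture needed based on task correlation and selects the most similar previous tasks to facilitate adaptation to new tasks; also propose dynamic gradient scaling to balance learning of the current task and replayed tasks
results: extensive experiments show that DMEA consistently outperforms existing methods across LSG settings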

Abstract
Lifelong sequence generation (LSG), a problem in continual learning, aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns while avoiding the forgetting of previous knowledge. Existing LSG methods mainly focus on maintaining old knowledge while paying little attention to knowledge transfer across tasks. In contrast, humans can better learn new tasks by leveraging previously acquired knowledge from similar tasks. Inspired by the learning paradigm of humans, we propose Dynamic Module Expansion and Adaptation (DMEA), which enables the model to dynamically determine the architecture for acquiring new knowledge based on task correlation and select the most similar previous tasks to facilitate adaptation to new tasks. In addition, as the learning process can easily be biased towards the current task which might cause more severe forgetting of previously learned knowledge, we propose dynamic gradient scaling to balance the learning of the current task and replayed tasks. With extensive experiments, we demonstrate that DMEA can consistently outperform existing methods in different LSG settings.

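Summary
Lifelong sequence generation (LSG), a problem in continual learning, aims to train a model continually on a sequence of generation tasks so that it learns constantly emerging generation patterns without forgetting previous knowledge. Existing LSG methods mainly focus on preserving old knowledge and pay little attention to knowledge transfer across tasks, whereas humans learn new tasks better by drawing on knowledge acquired from similar tasks. Inspired by this, the paper proposes Dynamic Module Expansion and Adaptation (DMEA), which lets the model dynamically determine the architecture for acquiring new knowledge based on task correlation and select the most similar previous tasks to facilitate adaptation to new tasks. Because learning can easily become biased toward the current task, causing more severe forgetting, the paper also proposes dynamic gradient scaling to balance the learning of the current task and replayed tasks. Extensive experiments show that DMEA consistently outperforms existing methods in different LSG settings.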

In-Context Learning with Iterative Demonstration Selection

paper_url: http://arxiv.org/abs/2310.09881
repo_url: None
paper_authors: Chengwei Qin, Aston Zhang, Anirudh Dagar, Wenming Ye
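for: improve the few-shot in-context learning (ICL) performance of large language models (LLMs)
methods: propose Iterative Demonstration Selection (IDS), which uses zero-shot chain-of-thought reasoning (Zero-shot-CoT) to select demonstrations over several iterations and obtains the final answer by majority voting
results: on tasks including commonsense reasoning, question answering, topic classification, and sentiment analysis, IDS consistently outperforms existing ICL demonstration selection methods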

Abstract
Spurred by advancements in scale, large language models (LLMs) have demonstrated strong few-shot learning ability via in-context learning (ICL). However, the performance of ICL has been shown to be highly sensitive to the selection of few-shot demonstrations. Selecting the most suitable examples as context remains an ongoing challenge and an open problem. Existing literature has highlighted the importance of selecting examples that are diverse or semantically similar to the test sample while ignoring the fact that the optimal selection dimension, i.e., diversity or similarity, is task-specific. Leveraging the merits of both dimensions, we propose Iterative Demonstration Selection (IDS). Using zero-shot chain-of-thought reasoning (Zero-shot-CoT), IDS iteratively selects examples that are diverse but still strongly correlated with the test sample as ICL demonstrations. Specifically, IDS applies Zero-shot-CoT to the test sample before demonstration selection. The output reasoning path is then used to choose demonstrations that are prepended to the test sample for inference. The generated answer is accompanied by its corresponding reasoning path for extracting a new set of demonstrations in the next iteration. After several iterations, IDS adopts majority voting to obtain the final result. Through extensive experiments on tasks including commonsense reasoning, question answering, topic classification, and sentiment analysis, we demonstrate that IDS can consistently outperform existing ICL demonstration selection methods.

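Summary
Driven by advances in scale, large language models (LLMs) have shown strong few-shot learning ability through in-context learning (ICL). ICL performance, however, is highly sensitive to the choice of few-shot demonstrations, and selecting the most suitable examples remains an open problem. Existing work emphasizes selecting examples that are either diverse or semantically similar to the test sample, ignoring that the better dimension, diversity or similarity, is task-specific. Combining the merits of both, the paper proposes Iterative Demonstration Selection (IDS). IDS first applies zero-shot chain-of-thought reasoning (Zero-shot-CoT) to the test sample. The resulting reasoning path is used to choose demonstrations, which are prepended to the test sample for inference. The generated answer comes with its own reasoning path, which is used to extract a new set of demonstrations in the next iteration. After several iterations, IDS takes a majority vote to obtain the final result. Experiments on commonsense reasoning, question answering, topic classification, and sentiment analysis show that IDS consistently outperforms existing ICL demonstration selection methods.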
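
The control flow of the iterative procedure described above can be sketched as a short loop. The three callables (`zero_shot_cot`, `select_demos`, `answer_with_demos`) are hypothetical placeholders for LLM calls and a retrieval step; the toy versions below only exist so the sketch runs end to end and do not reflect the paper's actual selection criterion.

```python
from collections import Counter


def iterative_demonstration_selection(question, pool, zero_shot_cot,
                                      select_demos, answer_with_demos,
                                      n_iters=3, k=2):
    """Select demonstrations from reasoning paths, then majority-vote the answers."""
    reasoning = zero_shot_cot(question)  # initial reasoning path, no demonstrations
    answers = []
    for _ in range(n_iters):
        demos = select_demos(reasoning, pool, k)
        # The new reasoning path drives demonstration selection in the next round.
        reasoning, answer = answer_with_demos(demos, question)
        answers.append(answer)
    return Counter(answers).most_common(1)[0][0]


# Toy stand-ins (assumptions, not real model calls).
def toy_cot(question):
    return "add the two numbers"


def toy_select(reasoning, pool, k):
    # Rank candidates by word overlap with the reasoning path.
    words = set(reasoning.split())
    ranked = sorted(pool, key=lambda d: len(words & set(d.split())), reverse=True)
    return ranked[:k]


def toy_answer(demos, question):
    return "add the two numbers", "4"


pool = ["add 1 and 3 to get 4", "the capital of France is Paris", "add the numbers 2 and 5"]
result = iterative_demonstration_selection("What is 2 + 2?", pool,
                                           toy_cot, toy_select, toy_answer)
```

In a real setting each callable would wrap a model query, and the number of iterations trades inference cost against the stability of the vote.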

Statistical inference using machine learning and classical techniques based on accumulated local effects (ALE)

paper_url: http://arxiv.org/abs/2310.09877
repo_url: None
paper_authors: Chitu Okoli
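for: enable statistical inference with Accumulated Local Effects (ALE), a model-agnostic approach for global explanations of black-box machine learning (ML) algorithms
methods: introduce bootstrapped confidence intervals tailored to dataset size, which address reliability with small datasets, and ALE effect size measures that intuitively characterize a variable's overall effect
results: practical tools, implemented in the 'ale' package in R, for ensuring reliable ALE analyses and drawing statistical inferences from ML data analysis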

Abstract
Accumulated Local Effects (ALE) is a model-agnostic approach for global explanations of the results of black-box machine learning (ML) algorithms. There are at least three challenges with conducting statistical inference based on ALE: ensuring the reliability of ALE analyses, especially in the context of small datasets; intuitively characterizing a variable's overall effect in ML; and making robust inferences from ML data analysis. In response, we introduce innovative tools and techniques for statistical inference using ALE, establishing bootstrapped confidence intervals tailored to dataset size and introducing ALE effect size measures that intuitively indicate effects on both the outcome variable scale and a normalized scale. Furthermore, we demonstrate how to use these tools to draw reliable statistical inferences, reflecting the flexible patterns ALE adeptly highlights, with implementations available in the 'ale' package in R. This work propels the discourse on ALE and its applicability in ML and statistical analysis forward, offering practical solutions to prevailing challenges in the field.

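Summary
Accumulated Local Effects (ALE) is a model-agnostic approach for global explanations of the results of black-box machine learning (ML) algorithms. Statistical inference based on ALE faces at least three challenges: ensuring the reliability of ALE analyses, especially with small datasets; intuitively characterizing a variable's overall effect in ML; and making robust inferences from ML data analysis. In response, the paper introduces tools and techniques for statistical inference using ALE, including bootstrapped confidence intervals tailored to dataset size and ALE effect size measures that indicate effects on both the scale of the outcome variable and a normalized scale. It also shows how to use these tools to draw reliable statistical inferences that reflect the flexible patterns ALE highlights, with implementations available in the 'ale' package in R.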
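
As background for the ALE quantity on which the paper builds its inference tools, here is a minimal first-order ALE computation for one numeric feature. It is a plain NumPy sketch, not the R 'ale' package: the quantile binning, the centering step, and the toy linear model are assumptions, and the feature is assumed to be non-constant.

```python
import numpy as np


def ale_1d(predict, X, feature, n_bins=10):
    """First-order accumulated local effects of one feature.

    predict: callable mapping an (n, d) array to (n,) predictions.
    Returns the bin edges and the centered ALE value at each edge.
    """
    x = X[:, feature]
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    # Bin j covers (edges[j], edges[j+1]]; the first bin also includes edges[0].
    idx = np.clip(np.searchsorted(edges, x, side="left") - 1, 0, len(edges) - 2)
    local = np.zeros(len(edges) - 1)
    counts = np.zeros(len(edges) - 1)
    for j in range(len(edges) - 1):
        rows = X[idx == j]
        if len(rows) == 0:
            continue
        lo, hi = rows.copy(), rows.copy()
        lo[:, feature] = edges[j]      # move the feature to the lower edge
        hi[:, feature] = edges[j + 1]  # and to the upper edge
        local[j] = np.mean(predict(hi) - predict(lo))
        counts[j] = len(rows)
    ale = np.concatenate([[0.0], np.cumsum(local)])
    # Center so that the data-weighted average effect is approximately zero.
    midpoints = (ale[:-1] + ale[1:]) / 2
    ale -= np.sum(midpoints * counts) / np.sum(counts)
    return edges, ale


# Toy check with a known model: the effect of feature 0 has slope 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
edges, ale = ale_1d(lambda Z: 2 * Z[:, 0] + Z[:, 1], X, feature=0)
```

A bootstrapped confidence band of the kind the paper discusses would repeat this computation on resampled data and take percentiles of the ALE values at each edge.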

Federated Multi-Objective Learning

paper_url: http://arxiv.org/abs/2310.09866
repo_url: https://github.com/Zakaria-Dahi/Multi-Objective_Optimiser_For_Federated_Learning
paper_authors: Haibo Yang, Zhuqing Liu, Jia Liu, Chaosheng Dong, Michinari Momma
for: Multi-agent multi-task learning applications with distributed nature and data privacy needs.
methods: Federated multi-objective learning (FMOL) framework with multiple clients distributively and collaboratively solving an MOO problem while keeping their training data private.
results: Proposed two new federated multi-objective optimization (FMOO) algorithms called federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA), which allow local updates to significantly reduce communication costs, while achieving the same convergence rates as those of their algorithmic counterparts in the single-objective federated learning.

Abstract
In recent years, multi-objective optimization (MOO) has emerged as a foundational problem underpinning many multi-agent multi-task learning applications. However, existing algorithms in MOO literature remain limited to centralized learning settings, which do not satisfy the distributed nature and data privacy needs of such multi-agent multi-task learning applications. This motivates us to propose a new federated multi-objective learning (FMOL) framework with multiple clients distributively and collaboratively solving an MOO problem while keeping their training data private. Notably, our FMOL framework allows a different set of objective functions across different clients to support a wide range of applications, which advances and generalizes the MOO formulation to the federated learning paradigm for the first time. For this FMOL framework, we propose two new federated multi-objective optimization (FMOO) algorithms called federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA). Both algorithms allow local updates to significantly reduce communication costs, while achieving the same convergence rates as those of their algorithmic counterparts in the single-objective federated learning. Our extensive experiments also corroborate the efficacy of our proposed FMOO algorithms.

Summary
To solve the FMOL problem, we propose two new federated multi-objective optimization (FMOO) algorithms, called federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA). Both algorithms allow for local updates to significantly reduce communication costs, while achieving the same convergence rates as their algorithmic counterparts in single-objective federated learning. Our extensive experiments also demonstrate the effectiveness of our proposed FMOO algorithms.

Federated Reinforcement Learning for Resource Allocation in V2X Networks

paper_url: http://arxiv.org/abs/2310.09858
repo_url: None
paper_authors: Kaidi Xu, Shenglong Zhou, Geoffrey Ye Li
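for: This paper studies resource allocation in vehicle-to-everything (V2X) networks.
methods: The paper uses a federated reinforcement learning (FRL) framework, implemented with the inexact alternating direction method of multipliers (ADMM), where subproblems are solved approximately using policy gradients with an adaptive step size computed from their second moments.
results: The resulting algorithm, PASM, is proven to converge under mild conditions and shows good numerical performance compared with several baseline methods for the resource allocation problem.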

Abstract
Resource allocation significantly impacts the performance of vehicle-to-everything (V2X) networks. Most existing algorithms for resource allocation are based on optimization or machine learning (e.g., reinforcement learning). In this paper, we explore resource allocation in a V2X network under the framework of federated reinforcement learning (FRL). On one hand, the usage of RL overcomes many challenges from the model-based optimization schemes. On the other hand, federated learning (FL) enables agents to deal with a number of practical issues, such as privacy, communication overhead, and exploration efficiency. The framework of FRL is then implemented by the inexact alternating direction method of multipliers (ADMM), where subproblems are solved approximately using policy gradients and accelerated by an adaptive step size calculated from their second moments. The developed algorithm, PASM, is proven to be convergent under mild conditions and shows good numerical performance compared with some baseline methods for solving the resource allocation problem in a V2X network.

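Summary
Resource allocation has a significant impact on the performance of vehicle-to-everything (V2X) networks. Most existing resource allocation algorithms are based on optimization or machine learning (for example, reinforcement learning). This paper explores resource allocation in a V2X network under the federated reinforcement learning (FRL) framework. On one hand, RL overcomes many of the challenges of model-based optimization schemes. On the other hand, federated learning (FL) helps agents deal with practical issues such as privacy, communication overhead, and exploration efficiency. The FRL framework is implemented with the inexact alternating direction method of multipliers (ADMM), where subproblems are solved approximately using policy gradients and accelerated by an adaptive step size computed from their second moments. The resulting algorithm, PASM, is proven to converge under mild conditions and shows good numerical performance compared with several baseline methods.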

MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

paper_url: http://arxiv.org/abs/2310.09853
repo_url: None
paper_authors: Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li
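for: This study proposes a method for automatic detection of instrument playing techniques (IPTs) that addresses data scarcity and class imbalance.
methods: A self-supervised model pre-trained on large-scale unlabeled music data is finetuned on IPT detection tasks. The study also investigates multi-task finetuning with pitch and onset detection as auxiliary tasks.
results: The method outperforms prior approaches on multiple IPT benchmark datasets in both frame-level and event-level metrics, and further experiments show that multi-task finetuning is effective for each IPT class.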

Abstract
Instrument playing techniques (IPTs) constitute a pivotal component of musical expression. However, the development of automatic IPT detection methods suffers from limited labeled data and inherent class imbalance issues. In this paper, we propose to apply a self-supervised learning model pre-trained on large-scale unlabeled music data and finetune it on IPT detection tasks. This approach addresses data scarcity and class imbalance challenges. Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks. Additionally, we apply a post-processing approach for event-level prediction, where an IPT activation initiates an event only if the onset output confirms an onset in that frame. Our method outperforms prior approaches in both frame-level and event-level metrics across multiple IPT benchmark datasets. Further experiments demonstrate the efficacy of multi-task finetuning on each IPT class.
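
The post-processing rule for event-level prediction can be sketched as follows. This is a minimal illustration, assuming binarized per-frame outputs for one IPT class and for the onset head; the function name `ipt_events` and the rule that an event lasts until the IPT activation drops are assumptions, not details from the source.

```python
from typing import List, Tuple


def ipt_events(ipt_active: List[bool], onset: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) events for one IPT class.

    An event may only start in a frame where the IPT is active AND the
    onset output also fires. Assumption: once started, the event runs
    until the IPT activation turns off.
    """
    n = min(len(ipt_active), len(onset))
    events: List[Tuple[int, int]] = []
    start = None
    for t in range(n):
        if start is None:
            if ipt_active[t] and onset[t]:
                start = t  # onset confirms the activation: open an event
        elif not ipt_active[t]:
            events.append((start, t))  # activation dropped: close the event
            start = None
    if start is not None:
        events.append((start, n))  # event still open at the end of the clip
    return events
```

With this rule, a stretch of IPT activation that never coincides with an onset produces no event, which is the filtering effect the abstract describes.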

Summary
To further enhance performance, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks. Pitch is essential for capturing the nuances of IPTs, while onset information is critical for locating IPT events. We also apply a post-processing approach for event-level prediction, where an IPT activation is only triggered if the onset output confirms an onset in that frame. Our method outperforms prior approaches in both frame-level and event-level metrics across multiple IPT benchmark datasets. Additionally, we demonstrate the effectiveness of multi-task finetuning on each IPT class. Our approach provides a significant improvement in IPT detection accuracy, addressing the challenges of limited labeled data and class imbalance issues.

ACES: Generating Diverse Programming Puzzles with Autotelic Language Models and Semantic Descriptors

paper_url: http://arxiv.org/abs/2310.10692
repo_url: None
paper_authors: Julien Pourcel, Cédric Colas, Pierre-Yves Oudeyer, Laetitia Teodorescu
for: Studying automated problem generation in the context of Python programming puzzles, with a focus on interesting diversity optimization.
methods: Using semantic descriptors produced by a large language model (LLM) to directly optimize for interesting diversity, as well as few-shot-based generation.
results: Discovering a richer diversity of puzzles than existing diversity-maximizing algorithms, as measured across a range of diversity metrics.

Abstract
Finding and selecting new and interesting problems to solve is at the heart of curiosity, science and innovation. We here study automated problem generation in the context of the open-ended space of python programming puzzles. Existing generative models often aim at modeling a reference distribution without any explicit diversity optimization. Other methods explicitly optimizing for diversity do so either in limited hand-coded representation spaces or in uninterpretable learned embedding spaces that may not align with human perceptions of interesting variations. With ACES (Autotelic Code Exploration via Semantic descriptors), we introduce a new autotelic generation method that leverages semantic descriptors produced by a large language model (LLM) to directly optimize for interesting diversity, as well as few-shot-based generation. Each puzzle is labeled along 10 dimensions, each capturing a programming skill required to solve it. ACES generates and pursues novel and feasible goals to explore that abstract semantic space, slowly discovering a diversity of solvable programming puzzles in any given run. Across a set of experiments, we show that ACES discovers a richer diversity of puzzles than existing diversity-maximizing algorithms as measured across a range of diversity metrics. We further study whether and in which conditions this diversity can translate into the successful training of puzzle solving models.

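Summary
Finding and selecting new and interesting problems to solve is at the heart of curiosity, science, and innovation. We study automated problem generation in the open-ended space of Python programming puzzles. Existing generative models usually model a reference distribution without explicitly optimizing for diversity. Other methods do optimize for diversity explicitly, but in hand-coded representation spaces or learned embedding spaces that may not match human perceptions of interesting variation. With ACES (Autotelic Code Exploration via Semantic descriptors), we introduce a new autotelic generation method that uses semantic descriptors produced by a large language model to directly optimize for interesting diversity, together with few-shot-based generation. Each puzzle is labeled along 10 dimensions, each capturing a programming skill required to solve it. ACES generates and pursues novel, feasible goals to explore this abstract semantic space, gradually discovering a diversity of solvable programming puzzles within a run. Across a set of experiments, ACES discovers a richer diversity of puzzles than existing diversity-maximizing algorithms, as measured by a range of diversity metrics. We further study whether, and under which conditions, this diversity can translate into the successful training of puzzle-solving models.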

CoCoFormer: A controllable feature-rich polyphonic music generation method

paper_url: http://arxiv.org/abs/2310.09843
repo_url: None
paper_authors: Jiuyang Zhou, Tengfei Niu, Hong Zhu, Xingping Wang
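for: This study explores modeling methods for polyphonic music sequences, in particular controllable music generation with Transformer models.
methods: The study proposes the Condition Choir Transformer (CoCoFormer), which controls the model's output through fine-grained control of the chord and rhythm inputs. Training uses a self-supervised method that improves the loss function, joint training on conditional and unconditional inputs, and adversarial training.
results: Experiments indicate that CoCoFormer performs better than current models and that, given a specified polyphonic music texture, the same melody can be generated in a variety of ways.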

Abstract
This paper explores modeling methods for polyphonic music sequences. Due to the great potential of Transformer models in music generation, controllable music generation is receiving more attention. In polyphonic music, current controllable generation research focuses on controlling the generation of chords, but lacks precise adjustment for the controllable generation of choral music textures. This paper proposes the Condition Choir Transformer (CoCoFormer), which controls the output of the model by controlling the chord and rhythm inputs at a fine-grained level. A self-supervised method improves the loss function and performs joint training through conditional control input and unconditional input training. To alleviate the lack of diversity in generated samples caused by teacher-forcing training, the paper adds an adversarial training method. CoCoFormer enhances model performance with explicit and implicit inputs for chords and rhythms. The experiments show that CoCoFormer performs better than current models. Given a specified polyphonic music texture, the same melody can also be generated in a variety of ways.

Summary
The paper uses a self-supervised method to improve the loss function and performs joint training through conditional control input and unconditional input training. To alleviate the lack of diversity in generated samples caused by teacher-forcing training, the paper adds an adversarial training method. CoCoFormer enhances model performance with explicit and implicit inputs for chords and rhythms. Experiments show that CoCoFormer performs better than current models. Given a specified polyphonic music texture, the same melody can also be generated in a variety of ways.

Explaining How a Neural Network Play the Go Game and Let People Learn

paper_url: http://arxiv.org/abs/2310.09838
repo_url: None
paper_authors: Huilin Zhou, Huijie Tang, Mingjie Li, Hao Zhang, Zhenyu Liu, Quanshi Zhang
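for: This study aims to explain the knowledge encoded by an AI model for the game of Go and to use that knowledge to teach human players.
methods: The study extracts the interaction primitives between stones encoded by the value network, so that people can learn accurate and verifiable knowledge from the value network.
results: Experiments show the effectiveness of the method.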

Abstract
The AI model has surpassed human players in the game of Go, and it is widely believed that the AI model has encoded new knowledge about the Go game beyond human players. In this way, explaining the knowledge encoded by the AI model and using it to teach human players represent a promising-yet-challenging issue in explainable AI. To this end, mathematical supports are required to ensure that human players can learn accurate and verifiable knowledge, rather than specious intuitive analysis. Thus, in this paper, we extract interaction primitives between stones encoded by the value network for the Go game, so as to enable people to learn from the value network. Experiments show the effectiveness of our method.

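Summary
AI models have surpassed human players in the game of Go, and it is widely believed that they have encoded new knowledge about Go beyond that of human players. Explaining the knowledge encoded by the AI model and using it to teach human players is therefore a promising yet challenging problem in explainable AI. This requires mathematical support to ensure that human players learn accurate and verifiable knowledge rather than specious intuitive analysis. In this paper, we extract the interaction primitives between stones encoded by the value network, so that people can learn from the value network. Experiments show the effectiveness of the method.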

MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

paper_url: http://arxiv.org/abs/2310.09833
repo_url: None
paper_authors: Simin Li, Ruixiao Xu, Jun Guo, Pu Feng, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu, Weifeng Lv
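for: This paper proposes a robust multi-agent reinforcement learning (MARL) method that improves resilience to uncertain or worst-case actions by unknown allies.
methods: The method trains the policy in routine scenarios and minimizes mutual information as a robust regularizer, rather than training against worst-case adversaries.
results: In StarCraft II, Multi-agent Mujoco, and rendezvous, MIR2 shows greater resilience against worst-case adversaries than max-min optimization, and this advantage holds in a real-world robot swarm control scenario.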

Abstract
Robust multi-agent reinforcement learning (MARL) necessitates resilience to uncertain or worst-case actions by unknown allies. Existing max-min optimization techniques in robust MARL seek to enhance resilience by training agents against worst-case adversaries, but this becomes intractable as the number of agents grows, leading to exponentially increasing worst-case scenarios. Attempts to simplify this complexity often yield overly pessimistic policies, inadequate robustness across scenarios and high computational demands. Unlike these approaches, humans naturally learn adaptive and resilient behaviors without the necessity of preparing for every conceivable worst-case scenario. Motivated by this, we propose MIR2, which trains policy in routine scenarios and minimize Mutual Information as Robust Regularization. Theoretically, we frame robustness as an inference problem and prove that minimizing mutual information between histories and actions implicitly maximizes a lower bound on robustness under certain assumptions. Further analysis reveals that our proposed approach prevents agents from overreacting to others through an information bottleneck and aligns the policy with a robust action prior. Empirically, our MIR2 displays even greater resilience against worst-case adversaries than max-min optimization in StarCraft II, Multi-agent Mujoco and rendezvous. Our superiority is consistent when deployed in challenging real-world robot swarm control scenario. See code and demo videos in Supplementary Materials.

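Summary
Robust multi-agent reinforcement learning (MARL) requires resilience to uncertain or worst-case actions by unknown allies. Existing max-min optimization techniques in robust MARL improve resilience by training agents against worst-case adversaries, but the number of worst-case scenarios grows exponentially with the number of agents, which makes this intractable. Attempts to simplify this complexity often yield overly pessimistic policies, inadequate robustness across scenarios, and high computational demands. In contrast, humans naturally learn adaptive and resilient behaviors without preparing for every conceivable worst case. Motivated by this, we propose MIR2, which trains the policy in routine scenarios and minimizes mutual information as a robust regularizer. Theoretically, we frame robustness as an inference problem and prove that, under certain assumptions, minimizing the mutual information between histories and actions implicitly maximizes a lower bound on robustness. Further analysis shows that the approach prevents agents from overreacting to others through an information bottleneck and aligns the policy with a robust action prior. Empirically, MIR2 shows greater resilience against worst-case adversaries than max-min optimization in StarCraft II, Multi-agent Mujoco, and rendezvous. This advantage also holds in a challenging real-world robot swarm control scenario. Code and demo videos are in the Supplementary Materials.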

Large Language Models for In-Context Student Modeling: Synthesizing Student’s Behavior in Visual Programming from One-Shot Observation

paper_url: http://arxiv.org/abs/2310.10690
repo_url: None
paper_authors: Manh Hung Nguyen, Sebastian Tschiatschek, Adish Singla
for: This paper is written for researchers and practitioners in the field of educational technology, particularly those interested in student modeling and personalized learning.
methods: The paper explores the use of Large Language Models (LLMs) for in-context student modeling in open-ended learning environments. The proposed framework, LLM-SS, leverages LLMs to synthesize a student’s behavior based on their solving attempts on a reference task. The authors fine-tune LLMs using domain-specific expertise to improve their understanding of domain background and student behaviors.
results: The paper reports significant improvements in student behavior synthesis compared to baseline methods included in the StudentSyn benchmark. Specifically, the method using the fine-tuned Llama2-70B model improves noticeably compared to using the base model and becomes on par with using the state-of-the-art GPT-4 model.

Abstract
Student modeling is central to many educational technologies as it enables the prediction of future learning outcomes and targeted instructional strategies. However, open-ended learning environments pose challenges for accurately modeling students due to the diverse behaviors exhibited by students and the absence of a well-defined set of learning skills. To approach these challenges, we explore the application of Large Language Models (LLMs) for in-context student modeling in open-ended learning environments. We introduce a novel framework, LLM-SS, that leverages LLMs for synthesizing student's behavior. More concretely, given a particular student's solving attempt on a reference task as observation, the goal is to synthesize the student's attempt on a target task. Our framework can be combined with different LLMs; moreover, we fine-tune LLMs using domain-specific expertise to boost their understanding of domain background and student behaviors. We evaluate several concrete methods based on LLM-SS using the StudentSyn benchmark, an existing student's attempt synthesis benchmark in visual programming. Experimental results show a significant improvement compared to baseline methods included in the StudentSyn benchmark. Furthermore, our method using the fine-tuned Llama2-70B model improves noticeably compared to using the base model and becomes on par with using the state-of-the-art GPT-4 model.

Summary
We propose a novel framework, LLM-SS, which leverages LLMs to synthesize a student's behavior. Given a particular student's attempt at a reference task, the goal is to synthesize their attempt on a target task. Our framework can be combined with different LLMs, and we fine-tune these models using domain-specific expertise to improve their understanding of the domain background and student behaviors. We evaluate several concrete methods based on LLM-SS using the StudentSyn benchmark, an existing student attempt synthesis benchmark in visual programming. The results show a significant improvement compared to baseline methods included in the StudentSyn benchmark. Additionally, our method using the fine-tuned Llama2-70B model improves noticeably compared to using the base model and is on par with using the state-of-the-art GPT-4 model.

Optimizing K-means for Big Data: A Comparative Study

paper_url: http://arxiv.org/abs/2310.09819
repo_url: None
paper_authors: Ravil Mussabayev, Rustam Mussabayev
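for: This paper compares different optimization techniques for the K-means algorithm in big data settings.
methods: The paper covers approaches including parallelization, approximation, and sampling to address the scalability issues of K-means on large datasets.
results: A comparison on various benchmark datasets shows that different techniques suit different types of datasets, and gives insight into the trade-off between speed and accuracy.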

Abstract
This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with large datasets. The paper explores different approaches to overcome these issues, including parallelization, approximation, and sampling methods. The authors evaluate the performance of these techniques on various benchmark datasets and compare them in terms of speed, quality of clustering, and scalability according to the LIMA dominance criterion. The results show that different techniques are more suitable for different types of datasets and provide insights into the trade-offs between speed and accuracy in K-means clustering for big data. Overall, the paper offers a comprehensive guide for practitioners and researchers on how to optimize K-means for big data applications.
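
As an illustration of the sampling family of approaches mentioned above, here is a minimal sketch that is not taken from the paper: plain Lloyd iterations are run on a random subset of the data, and every point is then assigned to the nearest resulting center. The data is one-dimensional to keep the example short, and the names `kmeans_1d` and `sampled_kmeans_1d` and all parameter defaults are hypothetical.

```python
import random
from typing import List, Tuple


def kmeans_1d(points: List[float], k: int, iters: int, rng: random.Random) -> List[float]:
    """Plain Lloyd iterations on 1-D data; returns the k centers."""
    centers = rng.sample(points, k)  # initialize from k distinct data points
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Keep the old center when a cluster receives no points.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def sampled_kmeans_1d(
    points: List[float], k: int, sample_size: int, iters: int = 20, seed: int = 0
) -> Tuple[List[float], List[int]]:
    """Fit centers on a random sample, then label the full dataset."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = kmeans_1d(sample, k, iters, rng)
    labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in points]
    return centers, labels
```

The expensive iterative part touches only `sample_size` points, while the full dataset is scanned once for the final assignment. The price is that the centers are estimated from the sample alone, which is one form of the speed versus accuracy trade-off the abstract refers to.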

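Summary
This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in big data settings. K-means is widely used, but it can run into scalability problems on large datasets. The paper explores approaches to overcome these problems, including parallelization, approximation, and sampling. The authors evaluate these techniques on various benchmark datasets and compare their speed, clustering quality, and scalability according to the LIMA dominance criterion. The results show that different techniques suit different types of datasets and give insight into the trade-off between speed and accuracy in K-means clustering. Overall, the paper offers practitioners and researchers a comprehensive guide to optimizing K-means for big data applications.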

Negative Sampling with Adaptive Denoising Mixup for Knowledge Graph Embedding

paper_url: http://arxiv.org/abs/2310.09781
repo_url: https://github.com/DeMix2023/Demix
paper_authors: Xiangnan Chen, Wen Zhang, Zhen Yao, Mingyang Chen, Siliang Tang
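for: This study aims to improve the quality of entity and relation embeddings for knowledge graphs (KGs) by reducing noise in negative samples.
methods: The paper proposes DeMix, a pluggable denoising mixup method that refines sampled negative triples in a self-supervised manner.
results: Experiments on knowledge graph completion show that DeMix outperforms other negative sampling techniques, giving KGE models faster convergence and better link prediction results.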

Abstract
Knowledge graph embedding (KGE) aims to map entities and relations of a knowledge graph (KG) into a low-dimensional and dense vector space via contrasting the positive and negative triples. In the training process of KGEs, negative sampling is essential to find high-quality negative triples since KGs only contain positive triples. Most existing negative sampling methods assume that non-existent triples with high scores are high-quality negative triples. However, negative triples sampled by these methods are likely to contain noise. Specifically, they ignore that non-existent triples with high scores might also be true facts due to the incompleteness of KGs, which are usually called false negative triples. To alleviate the above issue, we propose an easily pluggable denoising mixup method called DeMix, which generates high-quality triples by refining sampled negative triples in a self-supervised manner. Given a sampled unlabeled triple, DeMix firstly classifies it into a marginal pseudo-negative triple or a negative triple based on the judgment of the KGE model itself. Secondly, it selects an appropriate mixup partner for the current triple to synthesize a partially positive or a harder negative triple. Experimental results on the knowledge graph completion task show that the proposed DeMix is superior to other negative sampling techniques, ensuring corresponding KGEs a faster convergence and better link prediction results.

Summary
Knowledge graph embedding (KGE) aims to map the entities and relations of a knowledge graph (KG) into a low-dimensional, dense vector space by contrasting positive and negative triples. Negative sampling is essential in KGE training because KGs contain only positive triples. Most existing negative sampling methods assume that non-existent triples with high scores are high-quality negatives, but such samples are likely to contain noise: because KGs are incomplete, a high-scoring non-existent triple may in fact be a true fact, usually called a false negative triple. To alleviate this issue, we propose an easily pluggable denoising mixup method called DeMix, which generates high-quality triples by refining sampled negative triples in a self-supervised manner. Given a sampled unlabeled triple, DeMix first classifies it as a marginal pseudo-negative triple or a negative triple based on the judgment of the KGE model itself. It then selects an appropriate mixup partner for the current triple to synthesize a partially positive or a harder negative triple. Experimental results on the knowledge graph completion task show that DeMix is superior to other negative sampling techniques, giving the corresponding KGE models faster convergence and better link prediction results.

Notes on Applicability of Explainable AI Methods to Machine Learning Models Using Features Extracted by Persistent Homology

paper_url: http://arxiv.org/abs/2310.09780
repo_url: https://github.com/naofumihama/xai_ph_ml
paper_authors: Naofumi Hama
for: The paper explores the potential application of explainable AI methodologies to the persistent homology (PH)-machine learning (ML) pipeline for predicting gas adsorption in metal-organic frameworks.
methods: The paper uses the PH-ML pipeline to extract features from topological data analysis and applies explainable AI methodologies to improve the interpretability of the results.
results: The paper demonstrates suggestive results for predicting gas adsorption in metal-organic frameworks using the PH-ML pipeline with explainable AI methodologies. The code to reproduce the results is available on GitHub.

Abstract
Data analysis that uses the output of topological data analysis as input for machine learning algorithms has been the subject of extensive research. This approach offers a means of capturing the global structure of data. Persistent homology (PH), a common methodology within the field of TDA, has found wide-ranging applications in machine learning. One of the key reasons for the success of the PH-ML pipeline lies in the deterministic nature of feature extraction conducted through PH. The ability to achieve satisfactory levels of accuracy with relatively simple downstream machine learning models, when processing these extracted features, underlines the pipeline's superior interpretability. However, it must be noted that this interpretation has encountered issues. Specifically, it fails to accurately reflect the feasible parameter region in the data generation process, and the physical or chemical constraints that restrict this process. Against this backdrop, we explore the potential application of explainable AI methodologies to this PH-ML pipeline. We apply this approach to the specific problem of predicting gas adsorption in metal-organic frameworks and demonstrate that it can yield suggestive results. The codes to reproduce our results are available at https://github.com/naofumihama/xai_ph_ml

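Summary
Data analysis that feeds the output of topological data analysis (TDA) into machine learning algorithms has been studied extensively. This approach can capture the global structure of data. Persistent homology (PH), a common TDA method, has found wide application in machine learning. A key reason for the success of the PH-ML pipeline is the deterministic nature of PH feature extraction, which allows relatively simple downstream machine learning models to reach satisfactory accuracy. However, this interpretation has problems: it does not accurately reflect the feasible parameter region of the data generation process or the physical and chemical constraints on it. Against this backdrop, we explore applying explainable AI methods to the PH-ML pipeline. We apply the approach to predicting gas adsorption in metal-organic frameworks and show that it can yield suggestive results. The code is available at https://github.com/naofumihama/xai_ph_ml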

Worst-Case Analysis is Maximum-A-Posteriori Estimation

paper_url: http://arxiv.org/abs/2310.09774
repo_url: None
paper_authors: Hongjun Wu, Di Wang
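for: Software-engineering tasks such as performance optimization and algorithmic-complexity-vulnerability discovery, which benefit from knowing a program's worst-case resource usage.
methods: A generic, adaptive, and sound fuzzing framework called DSE-SMC, which combines sequential Monte Carlo (SMC) with adaptive evolutionary fuzzing to estimate worst-case resource usage.
results: Experimental evaluation on Java applications shows that DSE-SMC is significantly more effective than existing black-box fuzzing methods for worst-case analysis.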

Abstract
The worst-case resource usage of a program can provide useful information for many software-engineering tasks, such as performance optimization and algorithmic-complexity-vulnerability discovery. This paper presents a generic, adaptive, and sound fuzzing framework, called DSE-SMC, for estimating worst-case resource usage. DSE-SMC is generic because it is black-box as long as the user provides an interface for retrieving resource-usage information on a given input; adaptive because it automatically balances between exploration and exploitation of candidate inputs; and sound because it is guaranteed to converge to the true resource-usage distribution of the analyzed program. DSE-SMC is built upon a key observation: resource accumulation in a program is isomorphic to the soft-conditioning mechanism in Bayesian probabilistic programming; thus, worst-case resource analysis is isomorphic to the maximum-a-posteriori-estimation problem of Bayesian statistics. DSE-SMC incorporates sequential Monte Carlo (SMC) -- a generic framework for Bayesian inference -- with adaptive evolutionary fuzzing algorithms, in a sound manner, i.e., DSE-SMC asymptotically converges to the posterior distribution induced by resource-usage behavior of the analyzed program. Experimental evaluation on Java applications demonstrates that DSE-SMC is significantly more effective than existing black-box fuzzing methods for worst-case analysis.

Summary
The worst-case resource usage of a program provides useful information for many software-engineering tasks, such as performance optimization and algorithmic-complexity-vulnerability discovery. This paper presents DSE-SMC, a generic, adaptive, and sound fuzzing framework for estimating worst-case resource usage. DSE-SMC is generic because it is black-box, provided the user supplies an interface for retrieving resource-usage information on a given input; adaptive because it automatically balances exploration and exploitation of candidate inputs; and sound because it is guaranteed to converge to the true resource-usage distribution of the analyzed program. It builds on a key observation: resource accumulation in a program is isomorphic to the soft-conditioning mechanism in Bayesian probabilistic programming, so worst-case resource analysis is isomorphic to maximum-a-posteriori estimation in Bayesian statistics. DSE-SMC combines sequential Monte Carlo (SMC), a generic framework for Bayesian inference, with adaptive evolutionary fuzzing algorithms in a sound manner. Experiments on Java applications show that DSE-SMC is significantly more effective than existing black-box fuzzing methods for worst-case analysis.

A Critical Survey on Fairness Benefits of XAI

paper_url: http://arxiv.org/abs/2310.13007
repo_url: None
paper_authors: Luca Deck, Jakob Schoeffer, Maria De-Arteaga, Niklas Kühl
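for: This survey examines the relationship between explainable AI (XAI) and fairness, and how XAI is claimed to support fairness.
methods: A systematic literature review followed by a qualitative content analysis of 175 papers on the alleged fairness benefits of XAI.
results: The survey identifies seven archetypal claims that XAI helps achieve various fairness desiderata, along with important caveats and limitations of these claims.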

Abstract
In this critical survey, we analyze typical claims on the relationship between explainable AI (XAI) and fairness to disentangle the multidimensional relationship between these two concepts. Based on a systematic literature review and a subsequent qualitative content analysis, we identify seven archetypal claims from 175 papers on the alleged fairness benefits of XAI. We present crucial caveats with respect to these claims and provide an entry point for future discussions around the potentials and limitations of XAI for specific fairness desiderata. While the literature often suggests XAI to be an enabler for several fairness desiderata, we notice a misalignment between these desiderata and the capabilities of XAI. We encourage to conceive XAI as one of many tools to approach the multidimensional, sociotechnical challenge of algorithmic fairness and to be more specific about how exactly what kind of XAI method enables whom to address which fairness desideratum.

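Summary
In this critical survey, we analyze typical claims about the relationship between explainable AI (XAI) and fairness in order to disentangle the multidimensional relationship between the two concepts. Based on a systematic literature review and a subsequent qualitative content analysis, we identify seven archetypal claims from 175 papers on the alleged fairness benefits of XAI. We present crucial caveats regarding these claims and provide an entry point for future discussion of the potential and limitations of XAI for specific fairness desiderata. Although the literature often presents XAI as an enabler of several fairness desiderata, we observe a misalignment between these desiderata and the capabilities of XAI. We suggest treating XAI as one of many tools for approaching the multidimensional, sociotechnical challenge of algorithmic fairness, and being more specific about which XAI method enables whom to address which fairness desideratum.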

VLIS: Unimodal Language Models Guide Multimodal Language Generation

paper_url: http://arxiv.org/abs/2310.09767
repo_url: https://github.com/jiwanchung/vlis
paper_authors: Jiwan Chung, Youngjae Yu
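for: Improving vision-language models on multimodal language generation tasks that require complex linguistic understanding.
methods: Combines the visual conditioning capability of vision-language models with the language understanding of unimodal text-only language models without further training.
results: VLIS improves vision-language models on diverse tasks, including commonsense understanding and complex text generation.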

Abstract
Multimodal language generation, which leverages the synergy of language and vision, is a rapidly expanding field. However, existing vision-language models face challenges in tasks that require complex linguistic understanding. To address this issue, we introduce Visual-Language models as Importance Sampling weights (VLIS), a novel framework that combines the visual conditioning capability of vision-language models with the language understanding of unimodal text-only language models without further training. It extracts pointwise mutual information of each image and text from a visual-language model and uses the value as an importance sampling weight to adjust the token likelihood from a text-only model. VLIS improves vision-language models on diverse tasks, including commonsense understanding (WHOOPS, OK-VQA, and ScienceQA) and complex text generation (Concadia, Image Paragraph Captioning, and ROCStories). Our results suggest that VLIS represents a promising new direction for multimodal language generation.
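
The weighting described in the abstract can be illustrated with a toy next-token example. This is a minimal sketch under assumptions not stated in the source: the pointwise mutual information is taken as the log-ratio of the vision-language model's token probability with and without the image, it is added to the text-only model's log-probability, and the result is renormalized over a small candidate set. The function name `vlis_rescore` and the three input distributions are hypothetical.

```python
import math
from typing import Dict


def vlis_rescore(
    p_text: Dict[str, float],
    p_vl_with_image: Dict[str, float],
    p_vl_without_image: Dict[str, float],
) -> Dict[str, float]:
    """Reweight text-only token probabilities by an image-text PMI term."""
    log_scores: Dict[str, float] = {}
    for token, p in p_text.items():
        # Assumed PMI estimate: how much the image raises this token's probability.
        pmi = math.log(p_vl_with_image[token]) - math.log(p_vl_without_image[token])
        log_scores[token] = math.log(p) + pmi
    # Renormalize over the candidate tokens (numerically stable softmax).
    top = max(log_scores.values())
    exp_scores = {t: math.exp(s - top) for t, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {t: v / total for t, v in exp_scores.items()}
```

In this sketch the text-only model supplies the base likelihood and the vision-language model only contributes a ratio, so a token the image makes more likely is boosted and a token the image makes less likely is suppressed, while neither model is retrained.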

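Summary
Multimodal language generation, which leverages the synergy of language and vision, is a rapidly expanding field. However, existing vision-language models struggle with tasks that require complex linguistic understanding. To address this, we introduce Visual-Language models as Importance Sampling weights (VLIS), a new framework that combines the visual conditioning capability of vision-language models with the language understanding of unimodal text-only language models, without further training. It extracts the pointwise mutual information of each image and text from a vision-language model and uses it as an importance sampling weight to adjust the token likelihood of a text-only model. VLIS improves vision-language models on diverse tasks, including commonsense understanding (WHOOPS, OK-VQA, and ScienceQA) and complex text generation (Concadia, Image Paragraph Captioning, and ROCStories). Our results suggest that VLIS is a promising new direction for multimodal language generation.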

Improving Access to Justice for the Indian Population: A Benchmark for Evaluating Translation of Legal Text to Indian Languages

paper_url: http://arxiv.org/abs/2310.09765
repo_url: None
paper_authors: Sayan Mahapatra, Debtanu Datta, Shubham Soni, Adrijit Goswami, Saptarshi Ghosh
for: The paper aims to make legal text in the Indian judiciary more accessible to the general population, most of whom are not comfortable reading English.
methods: The authors construct a high-quality legal parallel corpus containing aligned text units in English and nine Indian languages, and benchmark the performance of various Machine Translation (MT) systems over this corpus.
results: The authors survey Law practitioners to evaluate the quality of the translations produced by the MT systems, and compare the results with automatic MT evaluation metrics.

Abstract
Most legal text in the Indian judiciary is written in complex English due to historical reasons. However, only about 10% of the Indian population is comfortable in reading English. Hence legal text needs to be made available in various Indian languages, possibly by translating the available legal text from English. Though there has been a lot of research on translation to and between Indian languages, to our knowledge, there has not been much prior work on such translation in the legal domain. In this work, we construct the first high-quality legal parallel corpus containing aligned text units in English and nine Indian languages, that includes several low-resource languages. We also benchmark the performance of a wide variety of Machine Translation (MT) systems over this corpus, including commercial MT systems, open-source MT systems and Large Language Models. Through a comprehensive survey by Law practitioners, we check how satisfied they are with the translations by some of these MT systems, and how well automatic MT evaluation metrics agree with the opinions of Law practitioners.

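Summary
For historical reasons, most legal text in the Indian judiciary is written in complex English. However, only about 10% of the Indian population is comfortable reading English. Legal text therefore needs to be made available in various Indian languages, possibly by translating it from English. Although there has been much research on translation to and between Indian languages, to our knowledge there has been little work on such translation in the legal domain. In this work, we construct the first high-quality legal parallel corpus containing aligned text units in English and nine Indian languages. We also benchmark a range of MT systems on this corpus, including commercial MT systems, open-source MT systems, and large language models. Through a detailed survey of legal practitioners, we examine how satisfied they are with the translations from these MT systems and how well automatic MT evaluation metrics agree with their opinions.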

DropMix: Better Graph Contrastive Learning with Harder Negative Samples

paper_url: http://arxiv.org/abs/2310.09764
repo_url: https://github.com/Mayueq/DropMix-Code
paper_authors: Yueqi Ma, Minjie Chen, Xiang Li
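for: improving the quality of negative samples in graph contrastive learning
methods: DropMix consists of two main steps: it first selects hard negative samples in the graph, then mixes them only on a subset of representation dimensions to generate harder negatives
results: extensive experiments on six benchmark datasets show that DropMix improves contrastive learning performance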

Abstract
While generating better negative samples for contrastive learning has been widely studied in the areas of CV and NLP, very few work has focused on graph-structured data. Recently, Mixup has been introduced to synthesize hard negative samples in graph contrastive learning (GCL). However, due to the unsupervised learning nature of GCL, without the help of soft labels, directly mixing representations of samples could inadvertently lead to the information loss of the original hard negative and further adversely affect the quality of the newly generated harder negative. To address the problem, in this paper, we propose a novel method DropMix to synthesize harder negative samples, which consists of two main steps. Specifically, we first select some hard negative samples by measuring their hardness from both local and global views in the graph simultaneously. After that, we mix hard negatives only on partial representation dimensions to generate harder ones and decrease the information loss caused by Mixup. We conduct extensive experiments to verify the effectiveness of DropMix on six benchmark datasets. Our results show that our method can lead to better GCL performance. Our data and codes are publicly available at https://github.com/Mayueq/DropMix-Code.

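Summary
While generating better negative samples for contrastive learning has been widely studied in CV and NLP, little work has focused on graph-structured data. Recently, Mixup was introduced into graph contrastive learning (GCL) to synthesize hard negative samples. However, because GCL is unsupervised and has no soft labels to help, directly mixing sample representations can lose information from the original hard negatives and thereby degrade the quality of the newly generated harder negatives. To address this problem, we propose a new method, DropMix, which consists of two main steps. Specifically, we first select hard negative samples by measuring their hardness from both local and global views of the graph simultaneously. We then mix hard negatives only on a subset of representation dimensions to generate harder negatives and reduce the information loss caused by Mixup. Extensive experiments on six benchmark datasets show that our method improves GCL performance. Our data and code are publicly available at https://github.com/Mayueq/DropMix-Code.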
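
The second step, mixing only on part of the representation, can be illustrated with a minimal sketch. It assumes embeddings are plain lists of floats and that the mixed dimensions are chosen uniformly at random; the names `partial_mix`, `lam`, and `mix_ratio` are hypothetical, and the hardness-based selection of the negatives (the first step) is not shown.

```python
import random

def partial_mix(hard_neg, other, lam=0.5, mix_ratio=0.5, rng=None):
    """Mix two embeddings on a random subset of dimensions only.

    hard_neg, other: equal-length lists of floats (hypothetical embeddings).
    lam: Mixup coefficient applied on the selected dimensions.
    mix_ratio: fraction of dimensions that get mixed; the rest keep the
    original hard negative's values, which limits information loss.
    """
    rng = rng or random.Random(0)
    dim = len(hard_neg)
    k = int(dim * mix_ratio)
    mixed_dims = set(rng.sample(range(dim), k))
    return [
        lam * a + (1 - lam) * b if i in mixed_dims else a
        for i, (a, b) in enumerate(zip(hard_neg, other))
    ]

harder = partial_mix([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])
print(harder)  # two entries stay 1.0, two become 0.5
```

Setting `mix_ratio=1.0` recovers ordinary Mixup on the full representation, which makes the comparison with the baseline explicit.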

Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer

paper_url: http://arxiv.org/abs/2310.09762
repo_url: None
paper_authors: Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao
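for: improving the performance and representation diversity of MoE models
methods: proposes a simple yet effective solution, an orthogonal expert optimizer (OMoE) for models with an MoE structure, and introduces an alternating training strategy that encourages each expert to update in a direction orthogonal to the subspace spanned by the other experts
results: extensive experiments show that the proposed optimization algorithm significantly improves the performance of MoE models on GLUE, SuperGLUE, question answering, and named entity recognition tasks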

Abstract
The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning, based on the principle of divide-and-conquer to maximize model capacity without significant additional computational cost. Even in the era of large-scale language models (LLMs), MoE continues to play a crucial role, as some researchers have indicated that GPT-4 adopts the MoE structure to ensure diverse inference results. However, MoE is susceptible to performance degeneracy, particularly evident in the issues of imbalance and homogeneous representation among experts. While previous studies have extensively addressed the problem of imbalance, the challenge of homogeneous representation remains unresolved. In this study, we shed light on the homogeneous representation problem, wherein experts in the MoE fail to specialize and lack diversity, leading to frustratingly high similarities in their representations (up to 99% in a well-performed MoE model). This problem restricts the expressive power of the MoE and, we argue, contradicts its original intention. To tackle this issue, we propose a straightforward yet highly effective solution: OMoE, an orthogonal expert optimizer. Additionally, we introduce an alternating training strategy that encourages each expert to update in a direction orthogonal to the subspace spanned by other experts. Our algorithm facilitates MoE training in two key ways: firstly, it explicitly enhances representation diversity, and secondly, it implicitly fosters interaction between experts during orthogonal weights computation. Through extensive experiments, we demonstrate that our proposed optimization algorithm significantly improves the performance of fine-tuning the MoE model on the GLUE benchmark, SuperGLUE benchmark, question-answering task, and name entity recognition tasks.

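Summary
Mixture of Experts (MoE) has become a highly successful technique in deep learning. It builds on the divide-and-conquer principle to increase model capacity without significant additional computational cost. Even in the era of large language models (LLMs), MoE still plays a crucial role; some researchers have indicated that GPT-4 adopts an MoE structure to ensure diverse inference results. However, MoE is prone to performance degeneracy, most visibly imbalance and homogeneous representation among experts. While previous studies have extensively addressed imbalance, the problem of homogeneous representation remains unresolved. In this study, we shed light on the homogeneous representation problem: experts in the MoE fail to specialize and lack diversity, so the similarity of their representations reaches up to 99% in a well-performing MoE model. This problem limits the expressive power of MoE and, we argue, contradicts its original intention. To tackle it, we propose a simple yet highly effective solution: OMoE, an orthogonal expert optimizer. We also introduce an alternating training strategy that encourages each expert to update in a direction orthogonal to the subspace spanned by the other experts. Our algorithm helps MoE training in two key ways: it explicitly enhances representation diversity, and it implicitly fosters interaction between experts during the computation of orthogonal weights. Through extensive experiments, we demonstrate that the proposed optimization algorithm significantly improves the performance of fine-tuning MoE models on the GLUE benchmark, the SuperGLUE benchmark, question answering, and named entity recognition tasks.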
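
The idea of updating an expert in a direction orthogonal to the subspace spanned by the other experts can be sketched as a projection. The sketch below is an assumption-laden illustration rather than the OMoE optimizer itself: it treats each expert's weights as a flattened vector, builds an orthonormal basis of the other experts with Gram-Schmidt, and removes the component of the update that lies in that subspace. The names `dot` and `orthogonalize` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(update, others, eps=1e-12):
    """Project `update` onto the orthogonal complement of span(others).

    update: proposed weight update for one expert (flattened list of floats).
    others: flattened weight vectors of the remaining experts (assumption).
    """
    # Gram-Schmidt: build an orthonormal basis of the other experts' span.
    basis = []
    for v in others:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors that are linearly dependent
            basis.append([wi / norm for wi in w])
    # Remove the component of the update along each basis vector.
    result = list(update)
    for b in basis:
        c = dot(result, b)
        result = [ri - c * bi for ri, bi in zip(result, b)]
    return result

others = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
print(orthogonalize([1.0, 2.0, 3.0], others))  # only the third axis survives
```

A real optimizer would apply this per layer on tensors and in alternation across experts; the list-based version only shows the geometry.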

CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes

paper_url: http://arxiv.org/abs/2310.09761
repo_url: https://github.com/yuleiqin/capro
paper_authors: Yulei Qin, Xingyu Chen, Yunhang Shen, Chaoyou Fu, Yun Gu, Ke Li, Xing Sun, Rongrong Ji
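for: this paper proposes a visual representation learning method that learns jointly from text and images in order to cope with real-world noise.
methods: the method uses textual prototypes to select clean images by text matching, which disambiguates the formation of visual prototypes. It also uses the visual feature space to complete and enhance the texts attached to images, and uses collective bootstrapping to encourage better label reference.
results: experiments show that CAPro effectively handles real-world noise and achieves new state-of-the-art performance in both single-label and multi-label scenarios. It also shows robustness to open-set recognition. Code is available at https://github.com/yuleiqin/capro.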

Abstract
Webly supervised learning has attracted increasing attention for its effectiveness in exploring publicly accessible data at scale without manual annotation. However, most existing methods of learning with web datasets are faced with challenges from label noise, and they have limited assumptions on clean samples under various noise. For instance, web images retrieved with queries of tiger cat (a cat species) and drumstick (a musical instrument) are almost dominated by images of tigers and chickens, which exacerbates the challenge of fine-grained visual concept learning. In this case, exploiting both web images and their associated texts is a requisite solution to combat real-world noise. In this paper, we propose Cross-modality Aligned Prototypes (CAPro), a unified prototypical contrastive learning framework to learn visual representations with correct semantics. For one thing, we leverage textual prototypes, which stem from the distinct concept definition of classes, to select clean images by text matching and thus disambiguate the formation of visual prototypes. For another, to handle missing and mismatched noisy texts, we resort to the visual feature space to complete and enhance individual texts and thereafter improve text matching. Such semantically aligned visual prototypes are further polished up with high-quality samples, and engaged in both cluster regularization and noise removal. Besides, we propose collective bootstrapping to encourage smoother and wiser label reference from appearance-similar instances in a manner of dictionary look-up. Extensive experiments on WebVision1k and NUS-WIDE (Web) demonstrate that CAPro well handles realistic noise under both single-label and multi-label scenarios. CAPro achieves new state-of-the-art performance and exhibits robustness to open-set recognition. Codes are available at https://github.com/yuleiqin/capro.

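Summary
Webly supervised learning has attracted increasing attention because it can exploit publicly accessible data at scale without manual annotation. However, most existing methods for learning from web datasets are challenged by label noise, and they make limited assumptions about clean samples under various kinds of noise. For instance, web images retrieved with the queries "tiger cat" (a cat species) and "drumstick" (a musical instrument) are dominated by images of tigers and chickens, which makes fine-grained visual concept learning harder. In this case, exploiting both web images and their associated texts is necessary to combat real-world noise. In this paper, we propose Cross-modality Aligned Prototypes (CAPro), a unified prototypical contrastive learning framework for learning visual representations with correct semantics. First, we use textual prototypes, which stem from the distinct concept definitions of classes, to select clean images by text matching and thus disambiguate the formation of visual prototypes. Second, to handle missing and mismatched noisy texts, we resort to the visual feature space to complete and enhance individual texts and thereby improve text matching. These semantically aligned visual prototypes are further polished with high-quality samples and used in both cluster regularization and noise removal. In addition, we propose collective bootstrapping, which encourages smoother and wiser label reference from appearance-similar instances in the manner of a dictionary look-up. Extensive experiments show that CAPro handles realistic noise well in both single-label and multi-label scenarios, achieving new state-of-the-art performance and robustness. Code is available at https://github.com/yuleiqin/capro.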

EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification

paper_url: http://arxiv.org/abs/2310.09754
repo_url: https://github.com/dependentsign/EX-FEVER
paper_authors: Huanhuan Ma, Weizhi Xu, Yifan Wei, Liuji Chen, Liang Wang, Qiang Liu, Shu Wu, Liang Wang
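for: the paper aims to build an explainable fact verification system for automated veracity checking in complex multi-hop scenarios.
methods: the paper introduces a new dataset built from Wikipedia documents and proposes a baseline system on it with three components: document retrieval, explanation generation, and claim verification.
results: experiments on the EX-FEVER dataset show that existing fact verification models perform poorly on it and that large language models show potential for the task.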

Abstract
Fact verification aims to automatically probe the veracity of a claim based on several pieces of evidence. Existing works are always engaging in the accuracy improvement, let alone the explainability, a critical capability of fact verification system. Constructing an explainable fact verification system in a complex multi-hop scenario is consistently impeded by the absence of a relevant high-quality dataset. Previous dataset either suffer from excessive simplification or fail to incorporate essential considerations for explainability. To address this, we present EX-FEVER, a pioneering dataset for multi-hop explainable fact verification. With over 60,000 claims involving 2-hop and 3-hop reasoning, each is created by summarizing and modifying information from hyperlinked Wikipedia documents. Each instance is accompanied by a veracity label and an explanation that outlines the reasoning path supporting the veracity classification. Additionally, we demonstrate a novel baseline system on our EX-FEVER dataset, showcasing document retrieval, explanation generation, and claim verification and observe that existing fact verification models trained on previous datasets struggle to perform well on our dataset. Furthermore, we highlight the potential of utilizing Large Language Models in the fact verification task. We hope our dataset could make a significant contribution by providing ample opportunities to explore the integration of natural language explanations in the domain of fact verification.

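Summary
Fact verification aims to automatically check the veracity of a claim based on multiple pieces of evidence. Existing work focuses on improving accuracy but neglects explainability, a critical capability of a fact verification system. Building an explainable fact verification system for complex multi-hop scenarios is hindered by the lack of a high-quality dataset: existing datasets are either oversimplified or omit considerations essential for explainability. To address this, we present the EX-FEVER dataset, which contains over 60,000 claims involving 2-hop and 3-hop reasoning, each created by summarizing and modifying information from hyperlinked Wikipedia documents. Each instance comes with a veracity label and an explanation that outlines the reasoning path supporting the veracity classification. We also present a baseline system on EX-FEVER that covers document retrieval, explanation generation, and claim verification, and we observe that existing fact verification models trained on previous datasets perform poorly on our dataset. Furthermore, we highlight the potential of large language models for the fact verification task. We hope our dataset gives researchers ample opportunity to explore natural language explanations in fact verification.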

paper_url: http://arxiv.org/abs/2310.09755
repo_url: None
paper_authors: Sumedh Rasal, Sanjay Kumar Boddhu
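for: this study presents an approach to road network generation that uses a multi-modal large language model (LLM) to produce detailed, navigable road networks.
methods: the model draws on the BLIP-2 architecture arXiv:2301.12597, using pre-trained frozen image encoders and large language models to build a multi-modal LLM.
results: the experimental results indicate that the method generates road networks accurately without producing binary segmentation masks. The approach can strengthen autonomous navigation systems, especially in road network scenarios where accurate guidance is essential.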

Abstract
This paper introduces an innovative approach to road network generation through the utilization of a multi-modal Large Language Model (LLM). Our model is specifically designed to process aerial images of road layouts and produce detailed, navigable road networks within the input images. The core innovation of our system lies in the unique training methodology employed for the large language model to generate road networks as its output. This approach draws inspiration from the BLIP-2 architecture arXiv:2301.12597, leveraging pre-trained frozen image encoders and large language models to create a versatile multi-modal LLM. Our work also offers an alternative to the reasoning segmentation method proposed in the LISA paper arXiv:2308.00692. By training the large language model with our approach, the necessity for generating binary segmentation masks, as suggested in the LISA paper arXiv:2308.00692, is effectively eliminated. Experimental results underscore the efficacy of our multi-modal LLM in providing precise and valuable navigational guidance. This research represents a significant stride in bolstering autonomous navigation systems, especially in road network scenarios, where accurate guidance is of paramount importance.

When can transformers reason with abstract symbols?

paper_url: http://arxiv.org/abs/2310.09753
repo_url: https://github.com/eboix/relational-reasoning
paper_authors: Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind
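for: this study investigates the capabilities of transformer large language models (LLMs) on relational reasoning tasks involving abstract symbols.
methods: the tasks studied have long been examined in the neuroscience literature as basic building blocks for abilities in programming, mathematics, and verbal reasoning.
results: for regression tasks, transformers generalize when trained but require very large amounts of training data; for next-token-prediction tasks, transformers fail to generalize as their embedding dimension increases. Adding two trainable parameters per head reduces the amount of data needed.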

Abstract
We investigate the capabilities of transformer large language models (LLMs) on relational reasoning tasks involving abstract symbols. Such tasks have long been studied in the neuroscience literature as fundamental building blocks for more complex abilities in programming, mathematics, and verbal reasoning. For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases. For both settings (i) and (ii), we propose subtle transformer modifications which can reduce the amount of data needed by adding two trainable parameters per head.

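Summary
We study the capabilities of transformer large language models (LLMs) on relational reasoning tasks, which the neuroscience literature has long regarded as building blocks for more advanced abilities in programming, mathematics, and verbal reasoning. For (i) regression tasks, we prove that transformers generalize when trained, but require very large quantities of training data. For (ii) next-token-prediction tasks, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases. For both settings (i) and (ii), we propose subtle transformer modifications that reduce the amount of training data needed by adding two trainable parameters per head.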

Domain-Specific Language Model Post-Training for Indonesian Financial NLP

paper_url: http://arxiv.org/abs/2310.09736
repo_url: https://github.com/intanq/indonesian-financial-domain-lm
paper_authors: Ni Putu Intan Maharani, Yoga Yustiawan, Fauzy Caesar Rochim, Ayu Purwarianti
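for: this paper concerns applying and adapting BERT and IndoBERT to natural language processing (NLP) tasks in the financial domain.
methods: the paper post-trains pre-trained IndoBERT on a small Indonesian financial corpus. It also constructs Indonesian financial sentiment analysis and topic classification datasets and releases a family of BERT models for financial NLP.
results: the experiments indicate that domain-specific post-training makes a language model more effective when it is fine-tuned on domain-specific downstream tasks.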

Abstract
BERT and IndoBERT have achieved impressive performance in several NLP tasks. There has been several investigation on its adaption in specialized domains especially for English language. We focus on financial domain and Indonesian language, where we perform post-training on pre-trained IndoBERT for financial domain using a small scale of Indonesian financial corpus. In this paper, we construct an Indonesian self-supervised financial corpus, Indonesian financial sentiment analysis dataset, Indonesian financial topic classification dataset, and release a family of BERT models for financial NLP. We also evaluate the effectiveness of domain-specific post-training on sentiment analysis and topic classification tasks. Our findings indicate that the post-training increases the effectiveness of a language model when it is fine-tuned to domain-specific downstream tasks.

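Summary
BERT and IndoBERT have achieved impressive performance on several NLP tasks. There have been many studies on adapting them to specialized domains. We focus on the financial domain and the Indonesian language, post-training on a small Indonesian financial corpus. In this paper, we construct an Indonesian self-supervised financial corpus, an Indonesian financial sentiment analysis dataset, and an Indonesian financial topic classification dataset, and we release a family of BERT models for financial NLP. We also evaluate the effect of domain-specific post-training on sentiment analysis and topic classification tasks. Our findings indicate that post-training makes a language model more effective on domain-specific downstream tasks.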

Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting

paper_url: http://arxiv.org/abs/2310.09716
repo_url: None
paper_authors: Fanghua Ye, Meng Fang, Shenghui Li, Emine Yilmaz
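for: improving conversational search performance by using language models to rewrite user queries.
methods: uses large language models (LLMs) as query rewriters, generating informative rewrites through well-designed instructions.
results: experiments on the QReCC dataset show that informative rewrites improve retrieval performance, especially with sparse retrievers.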

Abstract
Query rewriting plays a vital role in enhancing conversational search by transforming context-dependent user queries into standalone forms. Existing approaches primarily leverage human-rewritten queries as labels to train query rewriting models. However, human rewrites may lack sufficient information for optimal retrieval performance. To overcome this limitation, we propose utilizing large language models (LLMs) as query rewriters, enabling the generation of informative query rewrites through well-designed instructions. We define four essential properties for well-formed rewrites and incorporate all of them into the instruction. In addition, we introduce the role of rewrite editors for LLMs when initial query rewrites are available, forming a "rewrite-then-edit" process. Furthermore, we propose distilling the rewriting capabilities of LLMs into smaller models to reduce rewriting latency. Our experimental evaluation on the QReCC dataset demonstrates that informative query rewrites can yield substantially improved retrieval performance compared to human rewrites, especially with sparse retrievers.


New Advances in Body Composition Assessment with ShapedNet: A Single Image Deep Regression Approach

paper_url: http://arxiv.org/abs/2310.09709
repo_url: None
paper_authors: Navar Medeiros M. Nascimento, Pedro Cavalcante de Sousa Junior, Pedro Yuri Rodrigues Nunes, Suane Pires Pinheiro da Silva, Luiz Lannes Loureiro, Victor Zaban Bittencourt, Valden Luis Matos Capistrano Junior, Pedro Pedrosa Rebouças Filho
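for: enhancing body composition assessment
methods: uses a deep neural network to estimate body fat percentage (BFP), identify individuals, and perform localization from a single photograph
results: validated against the gold-standard method, Dual-Energy X-ray Absorptiometry (DXA), on 1273 healthy adults of various ages, sexes, and BFP levels; ShapedNet improves on previous methods by 19.5%, with a MAPE of 4.91% and an MAE of 1.42, and the gender-neutral approach performs best.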

Abstract
We introduce a novel technique called ShapedNet to enhance body composition assessment. This method employs a deep neural network capable of estimating Body Fat Percentage (BFP), performing individual identification, and enabling localization using a single photograph. The accuracy of ShapedNet is validated through comprehensive comparisons against the gold standard method, Dual-Energy X-ray Absorptiometry (DXA), utilizing 1273 healthy adults spanning various ages, sexes, and BFP levels. The results demonstrate that ShapedNet outperforms in 19.5% state of the art computer vision-based approaches for body fat estimation, achieving a Mean Absolute Percentage Error (MAPE) of 4.91% and Mean Absolute Error (MAE) of 1.42. The study evaluates both gender-based and Gender-neutral approaches, with the latter showcasing superior performance. The method estimates BFP with 95% confidence within an error margin of 4.01% to 5.81%. This research advances multi-task learning and body composition assessment theory through ShapedNet.

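Summary
We introduce a new technique called ShapedNet to improve body composition assessment. The method uses a deep neural network that can estimate Body Fat Percentage (BFP), perform individual identification, and enable localization from a single image. We validate the accuracy of ShapedNet by comparing it against the gold-standard method, DXA, on 1273 healthy adults of various ages, sexes, and BFP levels. The results show that ShapedNet outperforms state-of-the-art computer vision-based approaches for body fat estimation by 19.5%, with a Mean Absolute Percentage Error (MAPE) of 4.91% and a Mean Absolute Error (MAE) of 1.42. We also evaluate gender-based and gender-neutral approaches, and the latter performs better. ShapedNet estimates BFP with 95% confidence within an error margin of 4.01% to 5.81%, which contributes to body composition assessment theory and multi-task learning.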
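
For readers unfamiliar with the two reported metrics, the following sketch shows how MAE and MAPE are computed from reference and predicted values. The sample numbers are invented purely for illustration and have no connection to the study's data; the function names `mae` and `mape` are likewise just illustrative.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error, in the same unit as the inputs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (y_true must be non-zero)."""
    return 100.0 * sum(
        abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)
    ) / len(y_true)

# Invented body-fat percentages: reference measurements vs. model estimates.
reference = [20.0, 25.0, 40.0]
estimated = [22.0, 24.0, 38.0]
print(round(mae(reference, estimated), 3))   # 1.667
print(round(mape(reference, estimated), 3))  # 6.333
```

Because BFP is itself a percentage, MAE here is measured in percentage points, while MAPE is relative to each reference value.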

AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking

paper_url: http://arxiv.org/abs/2310.09706
repo_url: https://github.com/yflyl613/AdaptSSR
paper_authors: Yang Yu, Qi Liu, Kai Zhang, Yuren Zhang, Chao Song, Min Hou, Yuqing Yuan, Zhihao Ye, Zaixi Zhang, Sanshi Lei Yu
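for: improving the generalization of user models and easing the data sparsity problem.
methods: applies data augmentation to behavior sequences and pre-trains the user model with an augmentation-adaptive self-supervised ranking task.
results: extensive experiments show that the method improves user model performance and alleviates the data sparsity problem.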

Abstract
User modeling, which aims to capture users' characteristics or interests, heavily relies on task-specific labeled data and suffers from the data sparsity issue. Several recent studies tackled this problem by pre-training the user model on massive user behavior sequences with a contrastive learning task. Generally, these methods assume different views of the same behavior sequence constructed via data augmentation are semantically consistent, i.e., reflecting similar characteristics or interests of the user, and thus maximizing their agreement in the feature space. However, due to the diverse interests and heavy noise in user behaviors, existing augmentation methods tend to lose certain characteristics of the user or introduce noisy behaviors. Thus, forcing the user model to directly maximize the similarity between the augmented views may result in a negative transfer. To this end, we propose to replace the contrastive learning task with a new pretext task: Augmentation-Adaptive SelfSupervised Ranking (AdaptSSR), which alleviates the requirement of semantic consistency between the augmented views while pre-training a discriminative user model. Specifically, we adopt a multiple pairwise ranking loss which trains the user model to capture the similarity orders between the implicitly augmented view, the explicitly augmented view, and views from other users. We further employ an in-batch hard negative sampling strategy to facilitate model training. Moreover, considering the distinct impacts of data augmentation on different behavior sequences, we design an augmentation-adaptive fusion mechanism to automatically adjust the similarity order constraint applied to each sample based on the estimated similarity between the augmented views. Extensive experiments on both public and industrial datasets with six downstream tasks verify the effectiveness of AdaptSSR.

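Summary
User modeling, which aims to capture users' characteristics or interests, suffers from a shortage of task-specific labeled data. Several recent studies address this by pre-training the user model on massive user behavior sequences with a contrastive learning task. These methods generally assume that different views of the same behavior sequence, constructed via data augmentation, are semantically consistent, that is, they reflect similar characteristics or interests of the user, and so they maximize the agreement of the views in the feature space. However, because user interests are diverse and behaviors are noisy, existing augmentation methods tend to lose some user characteristics or introduce noisy behaviors. Forcing the user model to directly maximize the similarity between augmented views may therefore cause negative transfer. To this end, we propose replacing the contrastive learning task with a new pretext task, Augmentation-Adaptive Self-Supervised Ranking (AdaptSSR), which relaxes the requirement of semantic consistency between augmented views while pre-training a discriminative user model. Specifically, we adopt a multiple pairwise ranking loss that trains the user model to capture the similarity order between the implicitly augmented view, the explicitly augmented view, and views from other users. In addition, because data augmentation affects different behavior sequences differently, we design an augmentation-adaptive fusion mechanism that automatically adjusts the similarity order constraint applied to each sample based on the estimated similarity between its augmented views. Extensive experiments on public and industrial datasets with six downstream tasks verify the effectiveness of AdaptSSR.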
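
The similarity order described above (implicitly augmented view, then explicitly augmented view, then other users) can be turned into a loss in several ways. The sketch below uses a simple log-sigmoid pairwise form as an assumption; it omits the augmentation-adaptive fusion mechanism and the hard negative sampling, and the names `log_sigmoid` and `ranking_loss` are hypothetical rather than taken from the AdaptSSR code.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def ranking_loss(s_implicit, s_explicit, s_others):
    """Pairwise ranking loss for one user (illustrative form only).

    s_implicit: similarity between the user and the implicitly augmented view.
    s_explicit: similarity between the user and the explicitly augmented view.
    s_others:   similarities between the user and views from other users.
    Encourages s_implicit >= s_explicit >= every value in s_others.
    """
    loss = -log_sigmoid(s_implicit - s_explicit)
    loss += -sum(log_sigmoid(s_explicit - s) for s in s_others) / len(s_others)
    return loss

# A correctly ordered sample gets a lower loss than a reversed one.
print(ranking_loss(0.9, 0.6, [0.1, 0.0]) < ranking_loss(0.1, 0.6, [0.9, 0.8]))  # True
```

Unlike a contrastive objective, this form never pushes the two augmented views to be identical; it only asks that their similarities respect an order.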

Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering

paper_url: http://arxiv.org/abs/2310.09696
repo_url: None
paper_authors: Shuwen Yang, Anran Wu, Xingjiao Wu, Luwei Xiao, Tianlong Ma, Cheng Jin, Liang He
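for: improving retrieval-based question answering models by addressing two problems: the loss of fine-grained information when compressed evidence features are used, and the gap between the feature extraction of the question and of the evidence.
methods: proposes a two-stage framework that combines progressive evidence refinement, a semi-supervised contrastive learning training strategy, and multi-turn retrieval and question answering.
results: extensive experiments show that the model achieves outstanding performance on the WebQA and MultimodelQA benchmarks.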

Abstract
Pre-trained multimodal models have achieved significant success in retrieval-based question answering. However, current multimodal retrieval question-answering models face two main challenges. Firstly, utilizing compressed evidence features as input to the model results in the loss of fine-grained information within the evidence. Secondly, a gap exists between the feature extraction of evidence and the question, which hinders the model from effectively extracting critical features from the evidence based on the given question. We propose a two-stage framework for evidence retrieval and question-answering to alleviate these issues. First and foremost, we propose a progressive evidence refinement strategy for selecting crucial evidence. This strategy employs an iterative evidence retrieval approach to uncover the logical sequence among the evidence pieces. It incorporates two rounds of filtering to optimize the solution space, thus further ensuring temporal efficiency. Subsequently, we introduce a semi-supervised contrastive learning training strategy based on negative samples to expand the scope of the question domain, allowing for a more thorough exploration of latent knowledge within known samples. Finally, in order to mitigate the loss of fine-grained information, we devise a multi-turn retrieval and question-answering strategy to handle multimodal inputs. This strategy involves incorporating multimodal evidence directly into the model as part of the historical dialogue and question. Meanwhile, we leverage a cross-modal attention mechanism to capture the underlying connections between the evidence and the question, and the answer is generated through a decoding generation approach. We validate the model's effectiveness through extensive experiments, achieving outstanding performance on WebQA and MultimodelQA benchmark tests.

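Summary
Pre-trained multimodal models have achieved significant success in retrieval-based question answering. However, current multimodal retrieval question-answering models face two main challenges. First, using compressed evidence features as model input loses fine-grained information in the evidence. Second, there is a gap between the feature extraction of the evidence and of the question, which makes it hard for the model to extract the critical features from the evidence for a given question. We propose a two-stage framework for evidence retrieval and question answering to alleviate these issues. First, we propose a progressive evidence refinement strategy for selecting crucial evidence. It uses iterative evidence retrieval to uncover the logical sequence among the evidence pieces, and two rounds of filtering to narrow the solution space and keep the process time-efficient. Second, we introduce a semi-supervised contrastive learning training strategy based on negative samples, which expands the scope of the question domain and allows a more thorough exploration of latent knowledge within known samples. Finally, to reduce the loss of fine-grained information, we devise a multi-turn retrieval and question-answering strategy for multimodal inputs, which feeds multimodal evidence directly into the model as part of the historical dialogue and question. We also use a cross-modal attention mechanism to capture the underlying connections between the evidence and the question, and generate the answer by decoding. Extensive experiments validate the model's effectiveness, with outstanding performance on the WebQA and MultimodelQA benchmarks.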

Spike-based Neuromorphic Computing for Next-Generation Computer Vision

paper_url: http://arxiv.org/abs/2310.09692
repo_url: None
paper_authors: Md Sakib Hasan, Catherine D. Schuman, Zhongyang Zhang, Tauhidur Rahman, Garrett S. Rose
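for: this paper explores applications of neuromorphic computing in computer vision.
methods: it introduces neuromorphic computing through representative examples from different layers of the design stack (devices, circuits, and algorithms).
results: it concludes with several promising applications and future research directions, such as vision tasks on edge devices.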

Abstract
Neuromorphic Computing promises orders of magnitude improvement in energy efficiency compared to traditional von Neumann computing paradigm. The goal is to develop an adaptive, fault-tolerant, low-footprint, fast, low-energy intelligent system by learning and emulating brain functionality which can be realized through innovation in different abstraction layers including material, device, circuit, architecture and algorithm. As the energy consumption in complex vision tasks keep increasing exponentially due to larger data set and resource-constrained edge devices become increasingly ubiquitous, spike-based neuromorphic computing approaches can be viable alternative to deep convolutional neural network that is dominating the vision field today. In this book chapter, we introduce neuromorphic computing, outline a few representative examples from different layers of the design stack (devices, circuits and algorithms) and conclude with a few exciting applications and future research directions that seem promising for computer vision in the near future.

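Summary
Neuromorphic computing promises orders-of-magnitude improvements in energy efficiency over the traditional von Neumann computing paradigm. The goal is to develop an adaptive, fault-tolerant, low-footprint, fast, low-energy intelligent system by learning from and emulating brain functionality. As the energy consumption of complex vision tasks keeps increasing and resource-constrained edge devices become increasingly ubiquitous, spike-based neuromorphic computing approaches can be a viable alternative to the approaches that dominate the vision field today. In this book chapter, we introduce neuromorphic computing, present representative examples from different layers of the design stack (devices, circuits, and algorithms), and conclude with promising applications and future research directions.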

Configuration Validation with Large Language Models

paper_url: http://arxiv.org/abs/2310.09690
repo_url: https://github.com/ciri4conf/ciri
paper_authors: Xinyu Lian, Yinfang Chen, Runxiang Cheng, Jie Huang, Parth Thakkar, Tianyin Xu
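for: this paper explores the feasibility and effectiveness of configuration validation with natural language processing (NLP) and machine learning (ML), specifically large language models (LLMs).
methods: the paper evaluates several LLMs on configuration data and develops a generic LLM-based validation framework, Ciri. Ciri builds effective prompts with few-shot learning from a small number of examples, and validates and aggregates the outputs of multiple LLMs into a validation result.
results: the analysis indicates that configuration validation with LLMs is feasible and that prompt engineering with few-shot learning yields effective prompts. It also reveals open problems: certain types of misconfigurations are not detected reliably, and LLMs are biased toward popular configuration parameters.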

Abstract
Misconfigurations are the major causes of software failures. Existing configuration validation techniques rely on manually written rules or test cases, which are expensive to implement and maintain, and are hard to be comprehensive. Leveraging machine learning (ML) and natural language processing (NLP) for configuration validation is considered a promising direction, but has been facing challenges such as the need of not only large-scale configuration data, but also system-specific features and models which are hard to generalize. Recent advances in Large Language Models (LLMs) show the promises to address some of the long-lasting limitations of ML/NLP-based configuration validation techniques. In this paper, we present an exploratory analysis on the feasibility and effectiveness of using LLMs like GPT and Codex for configuration validation. Specifically, we take a first step to empirically evaluate LLMs as configuration validators without additional fine-tuning or code generation. We develop a generic LLM-based validation framework, named Ciri, which integrates different LLMs. Ciri devises effective prompt engineering with few-shot learning based on both valid configuration and misconfiguration data. Ciri also validates and aggregates the outputs of LLMs to generate validation results, coping with known hallucination and nondeterminism of LLMs. We evaluate the validation effectiveness of Ciri on five popular LLMs using configuration data of six mature, widely deployed open-source systems. Our analysis (1) confirms the potential of using LLMs for configuration validation, (2) understands the design space of LLMbased validators like Ciri, especially in terms of prompt engineering with few-shot learning, and (3) reveals open challenges such as ineffectiveness in detecting certain types of misconfigurations and biases to popular configuration parameters.

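Summary
Misconfigurations are a major cause of software failures. Existing configuration validation techniques rely on manually written rules or test cases, which are costly to implement and maintain and hard to make comprehensive. Using machine learning (ML) and natural language processing (NLP) for configuration validation is a promising direction, but it needs large-scale configuration data as well as system-specific features and models that are hard to generalize. Recent advances in large language models (LLMs) suggest that some of these long-standing limitations can be addressed. In this paper, we present an exploratory analysis of using LLMs such as GPT and Codex for configuration validation, evaluating them as validators without additional fine-tuning or code generation. We develop a generic LLM-based validation framework, named Ciri, which integrates different LLMs. Ciri applies prompt engineering with few-shot learning based on both valid configuration and misconfiguration data. It also validates and aggregates the outputs of the LLMs to produce validation results, coping with their known hallucination and nondeterminism. We evaluate Ciri on five popular LLMs using configuration data from six mature, widely deployed open-source systems. Our analysis (1) confirms the potential of LLMs for configuration validation, (2) explores the design space of LLM-based validators such as Ciri, especially prompt engineering with few-shot learning, and (3) reveals open challenges, such as ineffectiveness in detecting certain types of misconfigurations and biases toward popular configuration parameters.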
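The aggregation step described above can be illustrated with a small sketch. This is not Ciri's implementation: the function names, the two verdict labels, and the majority-vote rule are all assumptions made for illustration, and `query_llm` is a hypothetical callable standing in for any LLM client. The idea is to sample the model several times, discard malformed answers (one simple guard against hallucinated output), and take the most frequent verdict to dampen nondeterminism.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical verdict labels; the real framework's output format is not given here.
ALLOWED_VERDICTS = {"valid", "misconfigured"}


def aggregate_votes(answers: List[str]) -> str:
    """Return the most frequent well-formed verdict, or "unknown" if none is well formed."""
    normalized = [a.strip().lower() for a in answers]
    well_formed = [a for a in normalized if a in ALLOWED_VERDICTS]
    if not well_formed:
        return "unknown"
    return Counter(well_formed).most_common(1)[0][0]


def validate_config(query_llm: Callable[[str], str], prompt: str, n_samples: int = 5) -> str:
    """Query the (hypothetical) LLM callable n_samples times and aggregate the answers."""
    answers = [query_llm(prompt) for _ in range(n_samples)]
    return aggregate_votes(answers)
```

A caller would pass its own client as `query_llm`, together with a few-shot prompt containing examples of both valid configurations and misconfigurations. Other aggregation rules, such as requiring unanimity before reporting a misconfiguration, fit the same interface.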

A Partially Supervised Reinforcement Learning Framework for Visual Active Search

paper_url: http://arxiv.org/abs/2310.09689
repo_url: https://github.com/anindyasarkariith/psrl_vas
paper_authors: Anindya Sarkar, Nathan Jacobs, Yevgeniy Vorobeychik
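for: visual active search (VAS), in which visual cues guide exploration to identify regions of interest in a large geospatial area
methods: combine deep reinforcement learning (DRL) with traditional active search by decomposing the search policy into a prediction module and a search module, learned jointly with a meta-learning approach
results: experiments show that the proposed approach significantly outperforms the state of the art in visual active search on several problem domains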

Abstract
Visual active search (VAS) has been proposed as a modeling framework in which visual cues are used to guide exploration, with the goal of identifying regions of interest in a large geospatial area. Its potential applications include identifying hot spots of rare wildlife poaching activity, search-and-rescue scenarios, identifying illegal trafficking of weapons, drugs, or people, and many others. State of the art approaches to VAS include applications of deep reinforcement learning (DRL), which yield end-to-end search policies, and traditional active search, which combines predictions with custom algorithmic approaches. While the DRL framework has been shown to greatly outperform traditional active search in such domains, its end-to-end nature does not make full use of supervised information attained either during training, or during actual search, a significant limitation if search tasks differ significantly from those in the training distribution. We propose an approach that combines the strength of both DRL and conventional active search by decomposing the search policy into a prediction module, which produces a geospatial distribution of regions of interest based on task embedding and search history, and a search module, which takes the predictions and search history as input and outputs the search distribution. We develop a novel meta-learning approach for jointly learning the resulting combined policy that can make effective use of supervised information obtained both at training and decision time. Our extensive experiments demonstrate that the proposed representation and meta-learning frameworks significantly outperform state of the art in visual active search on several problem domains.

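Summary
Visual active search (VAS) has been proposed as a modeling framework in which visual cues guide exploration, with the goal of identifying regions of interest in a large geospatial area. Potential applications include identifying hot spots of rare wildlife poaching, search and rescue, and detecting illegal trafficking of weapons, drugs, or people. State-of-the-art approaches include deep reinforcement learning (DRL), which yields end-to-end search policies, and traditional active search, which combines predictions with custom algorithmic approaches. Although DRL has been shown to greatly outperform traditional active search in these domains, its end-to-end nature does not make full use of the supervised information obtained during training or during the actual search. We propose an approach that combines the strengths of DRL and traditional active search by decomposing the search policy into a prediction module and a search module. The prediction module produces a geospatial distribution of regions of interest from the task embedding and search history; the search module takes the predictions and search history as input and outputs the search distribution. We develop a novel meta-learning approach for jointly learning the combined policy so that it can make effective use of supervised information obtained at both training and decision time. Our extensive experiments show that the proposed representation and meta-learning frameworks significantly outperform the state of the art on several problem domains.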
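The decomposition into a prediction module and a search module can be pictured with a toy loop. This is only a hand-coded sketch under strong assumptions, not the paper's method: in the paper both modules are learned, whereas here `predict` simply overrides a fixed prior with observed labels and `search` greedily picks the most promising unqueried cell. All function names and the grid-cell representation are hypothetical.

```python
from typing import Dict, List


def predict(prior: List[float], history: Dict[int, bool]) -> List[float]:
    """Prediction-module stand-in: per-cell probability of containing a target.

    Observed cells take their true label; unobserved cells keep the prior.
    """
    return [float(history[i]) if i in history else p for i, p in enumerate(prior)]


def search(predictions: List[float], history: Dict[int, bool]) -> int:
    """Search-module stand-in: choose the unqueried cell with the highest prediction."""
    candidates = [i for i in range(len(predictions)) if i not in history]
    return max(candidates, key=lambda i: predictions[i])


def run_search(prior: List[float], truth: List[bool], budget: int) -> Dict[int, bool]:
    """Alternate prediction and search until the query budget is spent."""
    history: Dict[int, bool] = {}
    for _ in range(min(budget, len(prior))):
        preds = predict(prior, history)
        cell = search(preds, history)
        history[cell] = truth[cell]  # querying a cell reveals its true label
    return history
```

The point of the split is that each query returns a label, which is supervised information the prediction step can consume immediately; an end-to-end policy has no explicit place to use it.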

Recursively-Constrained Partially Observable Markov Decision Processes

paper_url: http://arxiv.org/abs/2310.09688
repo_url: None
paper_authors: Qi Heng Ho, Tyler Becker, Ben Kraske, Zakariya Laouar, Martin Feather, Federico Rossi, Morteza Lahijanian, Zachary N. Sunberg
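for: optimize an objective function under constraints in problems with transition uncertainty and partial observability
methods: start from the Constrained Partially Observable Markov Decision Process (C-POMDP) model and introduce a new formulation, the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints to address the C-POMDP's shortcomings
results: optimal C-POMDP policies may violate Bellman's principle of optimality and exhibit pathological behavior, whereas RC-POMDPs always have deterministic optimal policies that obey Bellman's principle. A point-based dynamic programming algorithm synthesizes optimal RC-POMDP policies. On a set of benchmark problems, RC-POMDP policies behave more desirably than C-POMDP policies, and the algorithm is shown to be effective.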

Abstract
In many problems, it is desirable to optimize an objective function while imposing constraints on some other aspect of the problem. A Constrained Partially Observable Markov Decision Process (C-POMDP) allows modelling of such problems while subject to transition uncertainty and partial observability. Typically, the constraints in C-POMDPs enforce a threshold on expected cumulative costs starting from an initial state distribution. In this work, we first show that optimal C-POMDP policies may violate Bellman's principle of optimality and thus may exhibit pathological behaviors, which can be undesirable for many applications. To address this drawback, we introduce a new formulation, the Recursively-Constrained POMDP (RC-POMDP), that imposes additional history dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies, and that optimal policies obey Bellman's principle of optimality. We also present a point-based dynamic programming algorithm that synthesizes optimal policies for RC-POMDPs. In our evaluations, we show that policies for RC-POMDPs produce more desirable behavior than policies for C-POMDPs and demonstrate the efficacy of our algorithm across a set of benchmark problems.

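Summary
In many problems, it is desirable to optimize an objective function while imposing constraints on some other aspect of the problem. A Constrained Partially Observable Markov Decision Process (C-POMDP) can model such problems under transition uncertainty and partial observability. Typically, the constraints in C-POMDPs enforce a threshold on the expected cumulative cost starting from an initial state distribution. In this work, we first show that optimal C-POMDP policies may violate Bellman's principle of optimality and may therefore exhibit pathological behavior, which is undesirable for many applications. To address this drawback, we introduce a new formulation, the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies, and that these optimal policies obey Bellman's principle of optimality. We also present a point-based dynamic programming algorithm that synthesizes optimal policies for RC-POMDPs. In our evaluations, policies for RC-POMDPs produce more desirable behavior than policies for C-POMDPs, and the algorithm proves effective across a set of benchmark problems.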
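For orientation, the C-POMDP constraint described above can be written in a common textbook form. The notation is an assumption, not taken from the paper: a discounted infinite-horizon setting with discount factor $\gamma$, reward $r$, a single cost function $c$, cost threshold $\hat{c}$, and initial belief $b_0$.

```latex
\max_{\pi} \;
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \;\middle|\; b_0 \right]
\quad \text{subject to} \quad
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t) \;\middle|\; b_0 \right]
  \le \hat{c}
```

In this form the cost constraint is stated only at the initial belief $b_0$. The RC-POMDP adds further cost constraints that depend on the history; their exact form is not given in the abstract and is not reproduced here.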

Generative artificial intelligence for de novo protein design

paper_url: http://arxiv.org/abs/2310.09685
repo_url: None
paper_authors: Adam Winnifrith, Carlos Outeiral, Brian Hie
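for: review the use of generative artificial intelligence in de novo protein design, with the aim of extending our ability to engineer proteins
methods: survey generative architectures, such as language models and diffusion processes, that generate novel yet realistic proteins with desirable properties and specified functions
results: state-of-the-art design protocols now achieve experimental success rates nearing 20%, widening access to de novo designed proteins. Open challenges remain, such as determining the best in silico metrics to prioritize designs for experimental testing, and designing proteins that can undergo large conformational changes or be regulated by post-translational modifications and other cellular processes.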

Abstract
Engineering new molecules with desirable functions and properties has the potential to extend our ability to engineer proteins beyond what nature has so far evolved. Advances in the so-called "de novo" design problem have recently been brought forward by developments in artificial intelligence. Generative architectures, such as language models and diffusion processes, seem adept at generating novel, yet realistic proteins that display desirable properties and perform specified functions. State-of-the-art design protocols now achieve experimental success rates nearing 20%, thus widening the access to de novo designed proteins. Despite extensive progress, there are clear field-wide challenges, for example in determining the best in silico metrics to prioritise designs for experimental testing, and in designing proteins that can undergo large conformational changes or be regulated by post-translational modifications and other cellular processes. With an increase in the number of models being developed, this review provides a framework to understand how these tools fit into the overall process of de novo protein design. Throughout, we highlight the power of incorporating biochemical knowledge to improve performance and interpretability.

Summary
Engineering new molecules with desirable functions and properties has the potential to extend our ability to engineer proteins beyond what nature has so far evolved. Advances in the so-called "de novo" design problem have recently been brought forward by developments in artificial intelligence. Generative architectures, such as language models and diffusion processes, seem adept at generating novel yet realistic proteins that display desirable properties and perform specified functions. State-of-the-art design protocols now achieve experimental success rates nearing 20%, widening access to de novo designed proteins. Despite extensive progress, there are clear field-wide challenges, for example in determining the best in silico metrics to prioritize designs for experimental testing, and in designing proteins that can undergo large conformational changes or be regulated by post-translational modifications and other cellular processes. As the number of models grows, this review provides a framework for understanding how these tools fit into the overall process of de novo protein design. Throughout, we highlight the power of incorporating biochemical knowledge to improve performance and interpretability.

2023-10-15

On Statistical Learning of Branch and Bound for Vehicle Routing Optimization

Farzi Data: Autoregressive Data Distillation

Chinese Painting Style Transfer Using Deep Generative Models

Specialized Deep Residual Policy Safe Reinforcement Learning-Based Controller for Complex and Continuous State-Action Spaces

Seeking Next Layer Neurons’ Attention for Error-Backpropagation-Like Training in a Multi-Agent Network Framework

Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models

“Reading Between the Heat”: Co-Teaching Body Thermal Signatures for Non-intrusive Stress Detection

Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data

Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers

Predictive Maintenance Model Based on Anomaly Detection in Induction Motors: A Machine Learning Approach Using Real-Time IoT Data

Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation

In-Context Learning with Iterative Demonstration Selection

Statistical inference using machine learning and classical techniques based on accumulated local effects (ALE)

Federated Multi-Objective Learning

Federated Reinforcement Learning for Resource Allocation in V2X Networks

MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

ACES: Generating Diverse Programming Puzzles with Autotelic Language Models and Semantic Descriptors

CoCoFormer: A controllable feature-rich polyphonic music generation method

Explaining How a Neural Network Play the Go Game and Let People Learn

MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

Large Language Models for In-Context Student Modeling: Synthesizing Student’s Behavior in Visual Programming from One-Shot Observation

Optimizing K-means for Big Data: A Comparative Study

Negative Sampling with Adaptive Denoising Mixup for Knowledge Graph Embedding

Notes on Applicability of Explainable AI Methods to Machine Learning Models Using Features Extracted by Persistent Homology

Worst-Case Analysis is Maximum-A-Posteriori Estimation

A Critical Survey on Fairness Benefits of XAI

VLIS: Unimodal Language Models Guide Multimodal Language Generation

Improving Access to Justice for the Indian Population: A Benchmark for Evaluating Translation of Legal Text to Indian Languages

DropMix: Better Graph Contrastive Learning with Harder Negative Samples

Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer

CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes

EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification

Beyond Segmentation: Road Network Generation with Multi-Modal LLMs

When can transformers reason with abstract symbols?

Domain-Specific Language Model Post-Training for Indonesian Financial NLP

Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting

New Advances in Body Composition Assessment with ShapedNet: A Single Image Deep Regression Approach

AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking

Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering

Spike-based Neuromorphic Computing for Next-Generation Computer Vision

Configuration Validation with Large Language Models

A Partially Supervised Reinforcement Learning Framework for Visual Active Search

Recursively-Constrained Partially Observable Markov Decision Processes

Generative artificial intelligence for de novo protein design