cs.CL - 2023-09-07

Evaluation and Mitigation of Agnosia in Multimodal Large Language Models

  • paper_url: http://arxiv.org/abs/2309.04041
  • repo_url: None
  • paper_authors: Jiaying Lu, Jinmeng Rao, Kezhen Chen, Xiaoyuan Guo, Yawen Zhang, Baochen Sun, Carl Yang, Jie Yang
  • for: This study aims to evaluate and mitigate agnosia in Multimodal Large Language Models (MLLMs), improving their performance on vision-language tasks.
  • methods: We propose a framework named EMMA (Evaluation and Mitigation of Multimodal Agnosia), consisting of an evaluation module and a mitigation module. The evaluation module automatically creates fine-grained and diverse visual question answering examples to assess the extent and aspects of agnosia in MLLMs. The mitigation module reduces agnosia through multimodal instruction tuning.
  • results: We evaluated seven state-of-the-art MLLMs on 9K test samples and found that most exhibit agnosia across various aspects and degrees. We further developed a fine-grained instruction set and tuned the MLLMs with it, which yielded notable improvements.
    Abstract While Multimodal Large Language Models (MLLMs) are widely used for a variety of vision-language tasks, one observation is that they sometimes misinterpret visual inputs or fail to follow textual instructions even in straightforward cases, leading to irrelevant responses, mistakes, and ungrounded claims. This observation is analogous to a phenomenon in neuropsychology known as Agnosia, an inability to correctly process sensory modalities and recognize things (e.g., objects, colors, relations). In our study, we adapt this similar concept to define "agnosia in MLLMs", and our goal is to comprehensively evaluate and mitigate such agnosia in MLLMs. Inspired by the diagnosis and treatment process in neuropsychology, we propose a novel framework EMMA (Evaluation and Mitigation of Multimodal Agnosia). In EMMA, we develop an evaluation module that automatically creates fine-grained and diverse visual question answering examples to assess the extent of agnosia in MLLMs comprehensively. We also develop a mitigation module to reduce agnosia in MLLMs through multimodal instruction tuning on fine-grained conversations. To verify the effectiveness of our framework, we evaluate and analyze agnosia in seven state-of-the-art MLLMs using 9K test samples. The results reveal that most of them exhibit agnosia across various aspects and degrees. We further develop a fine-grained instruction set and tune MLLMs to mitigate agnosia, which led to notable improvement in accuracy.

Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

  • paper_url: http://arxiv.org/abs/2309.04031
  • repo_url: None
  • paper_authors: Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Masayasu Muraoka, George Saon
  • for: Incorporating the knowledge of large language models (LLMs) into end-to-end automatic speech recognition (ASR) systems.
  • methods: We explore a wide range of techniques to obtain and transfer multiple LLM representations from different layers, contexts, and models.
  • results: We show that transferring multiple LLM representations into a transducer-based ASR system can be an effective alternative to transferring only a single representation.
    Abstract Transferring the knowledge of large language models (LLMs) is a promising technique to incorporate linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works only transfer a single representation of LLM (e.g. the last layer of pretrained BERT), while the representation of a text is inherently non-unique and can be obtained variously from different layers, contexts and models. In this work, we explore a wide range of techniques to obtain and transfer multiple representations of LLMs into a transducer-based ASR system. While being conceptually simple, we show that transferring multiple representations of LLMs can be an effective alternative to transferring only a single representation.
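
A minimal sketch of the core idea, pooling hidden states from every layer of a pretrained LM with learned scalar weights, may help make "multiple representations" concrete. The BERT checkpoint and the softmax layer-weighting are illustrative assumptions; the transducer-based ASR system the paper targets is not reproduced here.

```python
# Hedged sketch: pooling multiple hidden-layer representations of a
# pretrained LM with learned scalar weights (ELMo-style). Checkpoint
# and weighting scheme are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

class MultiLayerPooler(torch.nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.lm = AutoModel.from_pretrained(model_name, output_hidden_states=True)
        n_layers = self.lm.config.num_hidden_layers + 1  # +1 for embeddings
        self.layer_logits = torch.nn.Parameter(torch.zeros(n_layers))

    def forward(self, input_ids, attention_mask):
        out = self.lm(input_ids=input_ids, attention_mask=attention_mask)
        # out.hidden_states: tuple of (batch, seq, dim), one entry per layer
        stacked = torch.stack(out.hidden_states, dim=0)
        weights = torch.softmax(self.layer_logits, dim=0)
        # Weighted sum over layers -> one representation per token
        return (weights[:, None, None, None] * stacked).sum(dim=0)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
pooler = MultiLayerPooler()
batch = tok(["hello world"], return_tensors="pt")
reps = pooler(batch["input_ids"], batch["attention_mask"])
print(reps.shape)  # (1, seq_len, 768)
```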

TIDE: Textual Identity Detection for Evaluating and Augmenting Classification and Language Models

  • paper_url: http://arxiv.org/abs/2309.04027
  • repo_url: https://github.com/google-research-datasets/TIDAL
  • paper_authors: Emmanuel Klu, Sameer Sethi
  • for: This paper aims to improve fairness in text classifiers and language models, especially where text datasets are unfair and imbalanced.
  • methods: The paper introduces a new identity lexicon (TIDAL) comprising 15,123 identity terms with associated sense context across three demographic categories, together with an identity annotation and augmentation tool for improving the availability of identity context and the effectiveness of ML fairness techniques.
  • results: The assistive annotation technique improves the reliability and velocity of human-in-the-loop review; during evaluation and remediation, the approach uncovers more disparities and produces fairer models.
    Abstract Machine learning models can perpetuate unintended biases from unfair and imbalanced datasets. Evaluating and debiasing these datasets and models is especially hard in text datasets where sensitive attributes such as race, gender, and sexual orientation may not be available. When these models are deployed into society, they can lead to unfair outcomes for historically underrepresented groups. In this paper, we present a dataset coupled with an approach to improve text fairness in classifiers and language models. We create a new, more comprehensive identity lexicon, TIDAL, which includes 15,123 identity terms and associated sense context across three demographic categories. We leverage TIDAL to develop an identity annotation and augmentation tool that can be used to improve the availability of identity context and the effectiveness of ML fairness techniques. We evaluate our approaches using human contributors, and additionally run experiments focused on dataset and model debiasing. Results show our assistive annotation technique improves the reliability and velocity of human-in-the-loop processes. Our dataset and methods uncover more disparities during evaluation, and also produce more fair models during remediation. These approaches provide a practical path forward for scaling classifier and generative model fairness in real-world settings.

LanSER: Language-Model Supported Speech Emotion Recognition

  • paper_url: http://arxiv.org/abs/2309.03978
  • repo_url: None
  • paper_authors: Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou
  • for: Replacing costly human-labeled data so that SER can scale to large speech datasets and nuanced emotion taxonomies.
  • methods: Weakly-supervised learning that infers weak emotion labels on unlabeled data via pre-trained large language models, using a textual entailment approach over ASR transcripts.
  • results: Models pre-trained with this weak supervision outperform baseline models on standard SER tasks when fine-tuned, show improved label efficiency, and appear to model the prosodic content of speech.
    Abstract Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects an emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show that models pre-trained on large datasets with this weak supervision outperform other baseline models on standard SER datasets when fine-tuned, and show improved label efficiency. Despite being pre-trained on labels derived only from text, we show that the resulting representations appear to model the prosodic content of speech.
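
The entailment-based weak labeling step maps naturally onto an off-the-shelf zero-shot classification pipeline. A hedged sketch, assuming an NLI checkpoint, taxonomy, and hypothesis template that the paper does not specify:

```python
# Hedged sketch of the textual-entailment labeling step: given an ASR
# transcript, pick the emotion in a fixed taxonomy with the highest
# entailment score. Checkpoint and template are illustrative assumptions.
from transformers import pipeline

EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise", "neutral"]

nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def weak_emotion_label(transcript: str) -> str:
    result = nli(transcript, candidate_labels=EMOTIONS,
                 hypothesis_template="The speaker feels {}.")
    return result["labels"][0]  # label with highest entailment score

print(weak_emotion_label("I can't believe we actually won the game!"))
```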

ImageBind-LLM: Multi-modality Instruction Tuning

  • paper_url: http://arxiv.org/abs/2309.03905
  • repo_url: https://github.com/opengvlab/llama-adapter
  • paper_authors: Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao
  • for: A multi-modality instruction tuning method for large language models via ImageBind.
  • methods: A learnable bind network aligns the embedding space between LLaMA and ImageBind's image encoder; the transformed image features are then added to the word tokens of all LLaMA layers through an attention-free, zero-initialized gating mechanism.
  • results: With only image-text alignment training, ImageBind-LLM can respond to instructions of diverse modalities, including audio, 3D point clouds, and video, and demonstrates significant language generation quality.
    Abstract We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder. Then, the image features transformed by the bind network are added to word tokens of all layers in LLaMA, which progressively injects visual instructions via an attention-free and zero-initialized gating mechanism. Aided by the joint embedding of ImageBind, the simple image-text training enables our model to exhibit superior multi-modality instruction-following capabilities. During inference, the multi-modality inputs are fed into the corresponding ImageBind encoders, and processed by a proposed visual cache model for further cross-modal embedding enhancement. The training-free cache model retrieves from three million image features extracted by ImageBind, which effectively mitigates the training-inference modality discrepancy. Notably, with our approach, ImageBind-LLM can respond to instructions of diverse modalities and demonstrate significant language generation quality. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.
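
The attention-free, zero-initialized gating can be sketched in a few lines: the gate starts at zero, so the LLM initially behaves exactly like the text-only model and learns to admit visual features gradually. Module names and shapes below are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of zero-initialized gating for injecting visual features
# into word tokens. Shapes and the tanh squashing are assumptions.
import torch

class ZeroInitGate(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Gate initialized to zero: at step 0 the injected signal is nil,
        # avoiding disruption of the pretrained LLM early in training.
        self.gate = torch.nn.Parameter(torch.zeros(1))
        self.proj = torch.nn.Linear(dim, dim)  # bind-network-style projection

    def forward(self, word_tokens, image_feature):
        # word_tokens: (batch, seq, dim); image_feature: (batch, dim)
        visual = self.proj(image_feature).unsqueeze(1)  # (batch, 1, dim)
        return word_tokens + self.gate.tanh() * visual

layer = ZeroInitGate(dim=4096)
tokens = torch.randn(2, 16, 4096)
img = torch.randn(2, 4096)
print(layer(tokens, img).shape)  # (2, 16, 4096)
```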

Zero-Shot Audio Captioning via Audibility Guidance

  • paper_url: http://arxiv.org/abs/2309.03884
  • repo_url: None
  • paper_authors: Tal Shaharabany, Ariel Shaulov, Lior Wolf
  • for: The paper proposes a method for audio captioning, with the goal of generating fluent and faithful text descriptions of audio files.
  • methods: The method uses a combination of three networks: a large language model (GPT-2), a multimodal matching network (ImageBind), and a text classifier. The method does not involve learning to perform captioning; instead, captioning occurs as an inference process over the input audio.
  • results: On the AudioCap dataset, the method significantly enhances performance compared to a baseline lacking audibility guidance, achieving high fluency, faithfulness to the input audio, and audibility.
    Abstract The task of audio captioning is similar in essence to tasks such as image and video captioning. However, it has received much less attention. We propose three desiderata for captioning audio -- (i) fluency of the generated text, (ii) faithfulness of the generated text to the input audio, and the somewhat related (iii) audibility, which is the quality of being able to be perceived based only on audio. Our method is a zero-shot method, i.e., we do not learn to perform captioning. Instead, captioning occurs as an inference process that involves three networks that correspond to the three desired qualities: (i) A Large Language Model, in our case, for reasons of convenience, GPT-2, (ii) A model that provides a matching score between an audio file and a text, for which we use a multimodal matching network called ImageBind, and (iii) A text classifier, trained using a dataset we collected automatically by instructing GPT-4 with prompts designed to direct the generation of both audible and inaudible sentences. We present our results on the AudioCap dataset, demonstrating that audibility guidance significantly enhances performance compared to the baseline, which lacks this objective.
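
The inference-time combination of the three networks amounts to scoring candidate text against three criteria. A minimal sketch with placeholder scorers standing in for GPT-2 (fluency), ImageBind (faithfulness), and the trained audibility classifier; the paper's guided-decoding details are not reproduced, and the weighting is an assumption.

```python
# Hedged sketch: rerank candidate captions by a weighted sum of fluency,
# faithfulness, and audibility scores. The scorer callables are
# placeholders for the three networks named in the abstract.
from typing import Callable, List

def rerank_captions(
    candidates: List[str],
    fluency: Callable[[str], float],      # e.g. GPT-2 mean token log-prob
    faithfulness: Callable[[str], float], # e.g. ImageBind audio-text score
    audibility: Callable[[str], float],   # e.g. P(audible | caption)
    weights=(1.0, 1.0, 1.0),
) -> str:
    w_f, w_m, w_a = weights
    def score(c: str) -> float:
        return w_f * fluency(c) + w_m * faithfulness(c) + w_a * audibility(c)
    return max(candidates, key=score)

# Toy usage with dummy scorers:
best = rerank_captions(
    ["a dog barking loudly", "a red wooden chair"],
    fluency=lambda c: -len(c) * 0.01,
    faithfulness=lambda c: 1.0 if "barking" in c else 0.0,
    audibility=lambda c: 1.0 if "barking" in c else 0.1,
)
print(best)  # "a dog barking loudly"
```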

On Large Language Models’ Selection Bias in Multi-Choice Questions

  • paper_url: http://arxiv.org/abs/2309.03882
  • repo_url: None
  • paper_authors: Chujie Zheng, Hao Zhou, Fandong Meng, Jie Zhou, Minlie Huang
  • for: This work investigates the behavior of large language models (LLMs) on multi-choice questions (MCQs) and shows that LLMs exhibit an inherent selection bias: a preference for options located at specific positions (like "Option C").
  • methods: We propose a new method called PriDe to mitigate this bias. PriDe first decomposes the observed model prediction distribution into an intrinsic prediction over option contents and a prior distribution over option IDs. It then estimates the prior by permuting option contents on a small number of test samples and uses this estimate to debias subsequent test samples.
  • results: Our experiments show that PriDe debiases LLMs more effectively and computation-efficiently than strong baselines, and that the priors it estimates generalize well across domains, highlighting its practical potential in broader scenarios.
    Abstract Multi-choice questions (MCQs) serve as a common yet important task format in the research of large language models (LLMs). Our work shows that LLMs exhibit an inherent "selection bias" in MCQs, which refers to LLMs' preferences to select options located at specific positions (like "Option C"). This bias is prevalent across various LLMs, making their performance vulnerable to option position changes in MCQs. We identify that one primary cause resulting in selection bias is option numbering, i.e., the ID symbols A/B/C/D associated with the options. To mitigate selection bias, we propose a new method called PriDe. PriDe first decomposes the observed model prediction distribution into an intrinsic prediction over option contents and a prior distribution over option IDs. It then estimates the prior by permutating option contents on a small number of test samples, which is used to debias the subsequent test samples. We demonstrate that, as a label-free, inference-time method, PriDe achieves a more effective and computation-efficient debiasing than strong baselines. We further show that the priors estimated by PriDe generalize well across different domains, highlighting its practical potential in broader scenarios.
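
The decomposition-and-debias step can be sketched directly from the abstract: estimate a prior over option IDs by permuting option contents on a few samples, then divide it out of later predictions. The cyclic-permutation estimator and the toy model below are illustrative simplifications of the paper's procedure.

```python
# Hedged sketch of the PriDe idea. `model_probs` is a placeholder for the
# LLM's probability distribution over option IDs A-D.
import numpy as np

def estimate_id_prior(model_probs, questions, options_list, n_ids=4):
    """Average P(option ID) over cyclic permutations of option contents."""
    prior, count = np.zeros(n_ids), 0
    for q, opts in zip(questions, options_list):
        for shift in range(n_ids):
            permuted = opts[shift:] + opts[:shift]
            prior += model_probs(q, permuted)  # shape (n_ids,)
            count += 1
    return prior / count

def debias(model_probs, question, options, prior):
    obs = model_probs(question, options)  # observed P(ID | question)
    intrinsic = obs / prior               # divide out the ID prior
    return intrinsic / intrinsic.sum()    # P(content), renormalized

# Toy model that prefers position C regardless of content:
def toy_model(question, options):
    return np.array([0.15, 0.15, 0.55, 0.15])

prior = estimate_id_prior(toy_model, ["q1"], [["w", "x", "y", "z"]])
print(debias(toy_model, "q2", ["a", "b", "c", "d"], prior))
# -> uniform: the positional bias has been divided out
```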

Introducing “Forecast Utterance” for Conversational Data Science

  • paper_url: http://arxiv.org/abs/2309.03877
  • repo_url: None
  • paper_authors: Md Mahadi Hassan, Alex Knipper, Shubhra Kanti Karmaker
  • for: Assisting users in conducting forecasting tasks through natural conversations, without requiring in-depth knowledge of the underlying machine learning process.
  • methods: We introduce the new concept of a "Forecast Utterance" and frame the automatic, accurate interpretation of users' prediction goals as a slot-filling problem, employing two zero-shot methods: 1) Entity Extraction (EE) and 2) Question-Answering (QA) techniques.
  • results: Experiments on three meticulously crafted datasets validate the viability of our goal and demonstrate the effectiveness of both EE and QA techniques in interpreting Forecast Utterances.
    Abstract Envision an intelligent agent capable of assisting users in conducting forecasting tasks through intuitive, natural conversations, without requiring in-depth knowledge of the underlying machine learning (ML) processes. A significant challenge for the agent in this endeavor is to accurately comprehend the user's prediction goals and, consequently, formulate precise ML tasks. In this paper, we take a pioneering step towards this ambitious goal by introducing a new concept called Forecast Utterance and then focus on the automatic and accurate interpretation of users' prediction goals from these utterances. Specifically, we frame the task as a slot-filling problem, where each slot corresponds to a specific aspect of the goal prediction task. We then employ two zero-shot methods for solving the slot-filling task, namely: 1) Entity Extraction (EE), and 2) Question-Answering (QA) techniques. Our experiments, conducted with three meticulously crafted data sets, validate the viability of our ambitious goal and demonstrate the effectiveness of both EE and QA techniques in interpreting Forecast Utterances.
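
The QA-based slot-filling strategy can be approximated with an off-the-shelf extractive QA model: each slot becomes a slot-specific question posed against the Forecast Utterance. The checkpoint and the slot questions below are assumptions, not the paper's exact prompts.

```python
# Hedged sketch of QA-style zero-shot slot filling over a Forecast
# Utterance. Checkpoint and slot questions are illustrative assumptions.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

SLOT_QUESTIONS = {
    "target": "What quantity does the user want to predict?",
    "horizon": "Over what time period should the prediction be made?",
}

def fill_slots(utterance: str) -> dict:
    return {slot: qa(question=q, context=utterance)["answer"]
            for slot, q in SLOT_QUESTIONS.items()}

print(fill_slots("Can you forecast my store's weekly sales for the next month?"))
```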

USA: Universal Sentiment Analysis Model & Construction of Japanese Sentiment Text Classification and Part of Speech Dataset

  • paper_url: http://arxiv.org/abs/2309.03787
  • repo_url: https://huggingface.co/ganchengguang/USA-7B-instruction-incontext-learning
  • paper_authors: Chengguang Gan, Qinghao Zhang, Tatsunori Mori
  • for: This paper aims to improve sentiment analysis performance by leveraging the Mutual Reinforcement Effect (MRE) between individual word polarity and the overall sentiment of a text.
  • methods: The paper proposes a large-language-model-based approach in which word-level sentiment polarity reinforces the sentiment of the whole text; it annotates four new Sentiment text Classification and Part of Speech (SCPOS) datasets and develops a 7-billion-parameter Universal Sentiment Analysis (USA) model.
  • results: Experimental results show that the model surpasses gpt-3.5-turbo on all four SCPOS datasets.
    Abstract Sentiment analysis is a pivotal task in the domain of natural language processing. It encompasses both text-level sentiment polarity classification and word-level Part of Speech(POS) sentiment polarity determination. Such analysis challenges models to understand text holistically while also extracting nuanced information. With the rise of Large Language Models(LLMs), new avenues for sentiment analysis have opened. This paper proposes enhancing performance by leveraging the Mutual Reinforcement Effect(MRE) between individual words and the overall text. It delves into how word polarity influences the overarching sentiment of a passage. To support our research, we annotated four novel Sentiment Text Classification and Part of Speech(SCPOS) datasets, building upon existing sentiment classification datasets. Furthermore, we developed a Universal Sentiment Analysis(USA) model, with a 7-billion parameter size. Experimental results revealed that our model surpassed the performance of gpt-3.5-turbo across all four datasets, underscoring the significance of MRE in sentiment analysis.

The Daunting Dilemma with Sentence Encoders: Success on Standard Benchmarks, Failure in Capturing Basic Semantic Properties

  • paper_url: http://arxiv.org/abs/2309.03747
  • repo_url: None
  • paper_authors: Yash Mahajan, Naman Bansal, Shubhra Kanti Karmaker
  • for: This study compares five widely used sentence encoders, namely Sentence-BERT, Universal Sentence Encoder (USE), LASER, InferSent, and Doc2vec, in terms of their performance on downstream tasks versus their capability to capture basic semantic properties.
  • methods: Taking a retrospective approach, the study evaluates the five encoders on the SentEval benchmark and designs four semantic evaluation criteria, Paraphrasing, Synonym Replacement, Antonym Replacement, and Sentence Jumbling, to assess them further.
  • results: Sentence-BERT and USE pass the paraphrasing criterion, with SBERT the superior of the two; LASER dominates on synonym replacement. Interestingly, all the sentence encoders fail the antonym replacement and jumbling criteria. Although these encoders perform well on SentEval, they still struggle to capture some basic semantic properties.
    Abstract In this paper, we adopted a retrospective approach to examine and compare five existing popular sentence encoders, i.e., Sentence-BERT, Universal Sentence Encoder (USE), LASER, InferSent, and Doc2vec, in terms of their performance on downstream tasks versus their capability to capture basic semantic properties. Initially, we evaluated all five sentence encoders on the popular SentEval benchmark and found that multiple sentence encoders perform quite well on a variety of popular downstream tasks. However, being unable to find a single winner in all cases, we designed further experiments to gain a deeper understanding of their behavior. Specifically, we proposed four semantic evaluation criteria, i.e., Paraphrasing, Synonym Replacement, Antonym Replacement, and Sentence Jumbling, and evaluated the same five sentence encoders using these criteria. We found that the Sentence-Bert and USE models pass the paraphrasing criterion, with SBERT being the superior between the two. LASER dominates in the case of the synonym replacement criterion. Interestingly, all the sentence encoders failed the antonym replacement and jumbling criteria. These results suggest that although these popular sentence encoders perform quite well on the SentEval benchmark, they still struggle to capture some basic semantic properties, thus, posing a daunting dilemma in NLP research.
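
The synonym/antonym-replacement criteria translate into a simple embedding-similarity probe. A hedged sketch using an SBERT checkpoint and hand-picked example sentences, both of which are assumptions rather than the paper's test items:

```python
# Hedged sketch of the replacement probes: a good encoder should keep a
# synonym-swapped sentence close to the original and push an
# antonym-swapped one away.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

original = "The movie was good and the ending was happy."
synonym  = "The movie was great and the ending was joyful."
antonym  = "The movie was bad and the ending was sad."

emb = model.encode([original, synonym, antonym], convert_to_tensor=True)
print("synonym sim:", util.cos_sim(emb[0], emb[1]).item())
print("antonym sim:", util.cos_sim(emb[0], emb[2]).item())
# The paper's finding: antonym similarity often stays high, i.e. encoders
# fail to separate opposite meanings.
```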

Word segmentation granularity in Korean

  • paper_url: http://arxiv.org/abs/2309.03713
  • repo_url: None
  • paper_authors: Jungyeul Park, Mija Kim
  • for: This paper addresses word segmentation granularity in Korean.
  • methods: The paper analyzes the different granularity levels that have been proposed and utilized for specific language processing and corpus annotation tasks, with examples from Korean language processing systems.
  • results: Segmenting only functional morphemes, including case markers and verbal endings, while keeping other suffixes for morphological derivation intact, yields the optimal performance for phrase structure parsing. This contradicts the previous de facto standard for Korean language processing, which separates all morphemes.
    Abstract This paper describes word segmentation granularity in Korean language processing. From a word separated by blank space, which is termed an eojeol, to a sequence of morphemes in Korean, there are multiple possible levels of word segmentation granularity in Korean. For specific language processing and corpus annotation tasks, several different granularity levels have been proposed and utilized, because agglutinative languages, including Korean, have a one-to-one mapping between functional morpheme and syntactic category. Thus, we analyze these different granularity levels, presenting examples of Korean language processing systems for future reference. Interestingly, the granularity obtained by separating only functional morphemes, including case markers and verbal endings, while keeping other suffixes for morphological derivation, results in the optimal performance for phrase structure parsing. This contradicts previous best practices for Korean language processing, which have been the de facto standard for various applications that require separating all morphemes.

Exploring an LM to generate Prolog Predicates from Mathematics Questions

  • paper_url: http://arxiv.org/abs/2309.03667
  • repo_url: None
  • paper_authors: Xiaocheng Yang, Yik-Cheung Tam
  • for: The paper investigates whether fine-tuning a model to generate Prolog code can improve logical reasoning, and whether Prolog-code generation outperforms other strategies.
  • methods: Using chain-of-thought techniques, the authors fine-tune LLaMA7B as a baseline and develop further fine-tuned LLaMA7B variants for generating Prolog code, Prolog code + chain-of-thought, and chain-of-thought + Prolog code, passing the generated code to a compiler.
  • results: The Prolog generation model surpasses the baseline, while the combination generation models do not yield significant improvements. The GSM8K-based Prolog corpus and the corresponding fine-tuned Prolog generation model are released to the research community.
    Abstract Recently, there has been a surge in interest in NLP driven by ChatGPT. ChatGPT, a transformer-based generative language model of substantial scale, exhibits versatility in performing various tasks based on natural language. Nevertheless, large language models often exhibit poor performance in solving mathematics questions that require reasoning. Prior research has demonstrated the effectiveness of chain-of-thought prompting in enhancing reasoning capabilities. Now, we aim to investigate whether fine-tuning a model for the generation of Prolog codes, a logic language, and subsequently passing these codes to a compiler can further improve accuracy. Consequently, we employ chain-of-thought to fine-tune LLaMA7B as a baseline model and develop other fine-tuned LLaMA7B models for the generation of Prolog code, Prolog code + chain-of-thought, and chain-of-thought + Prolog code, respectively. The results reveal that the Prolog generation model surpasses the baseline in performance, while the combination generation models do not yield significant improvements. The Prolog corpus based on GSM8K and the correspondingly finetuned Prolog generation model based on LLaMA7B are released to the research community.
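
The generate-then-execute pipeline bottoms out in running Prolog. A minimal sketch of the execution half, with a hand-written predicate standing in for model output and pyswip as one possible SWI-Prolog bridge; the paper does not name its tooling.

```python
# Hedged sketch: execute a model-generated Prolog predicate to obtain the
# numeric answer to a word problem. The predicate is hand-written here
# for illustration.
from pyswip import Prolog

prolog = Prolog()
# "Alice has 5 apples and buys 3 bags of 4 apples. How many in total?"
prolog.assertz("solution(X) :- X is 5 + 3 * 4")

for result in prolog.query("solution(X)"):
    print(result["X"])  # 17
```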

BNS-Net: A Dual-channel Sarcasm Detection Method Considering Behavior-level and Sentence-level Conflicts

  • paper_url: http://arxiv.org/abs/2309.03658
  • repo_url: None
  • paper_authors: Liming Zhou, Xiaowei Xu, Xiaodong Wang
  • for: This work aims to develop a model that effectively identifies sarcastic expressions in text, improving human-computer interaction in real-world applications.
  • methods: The study proposes BNS-Net, a dual-channel sarcasm detection model. Channel 1 (behavior-level conflict) reconstructs the text based on core verbs and uses a modified attention mechanism to highlight conflict information; Channel 2 (sentence-level conflict) introduces external sentiment knowledge to segment the text into explicit and implicit sentences and capture conflicts between them.
  • results: Comparative and ablation experiments on three public sarcasm datasets show that BNS-Net effectively identifies sarcasm in text and achieves state-of-the-art performance.
    Abstract Sarcasm detection is a binary classification task that aims to determine whether a given utterance is sarcastic. Over the past decade, sarcasm detection has evolved from classical pattern recognition to deep learning approaches, where features such as user profile, punctuation and sentiment words have been commonly employed for sarcasm detection. In real-life sarcastic expressions, behaviors without explicit sentimental cues often serve as carriers of implicit sentimental meanings. Motivated by this observation, we proposed a dual-channel sarcasm detection model named BNS-Net. The model considers behavior and sentence conflicts in two channels. Channel 1: Behavior-level Conflict Channel reconstructs the text based on core verbs while leveraging the modified attention mechanism to highlight conflict information. Channel 2: Sentence-level Conflict Channel introduces external sentiment knowledge to segment the text into explicit and implicit sentences, capturing conflicts between them. To validate the effectiveness of BNS-Net, several comparative and ablation experiments are conducted on three public sarcasm datasets. The analysis and evaluation of experimental results demonstrate that the BNS-Net effectively identifies sarcasm in text and achieves the state-of-the-art performance.

Loquacity and Visible Emotion: ChatGPT as a Policy Advisor

  • paper_url: http://arxiv.org/abs/2309.03595
  • repo_url: None
  • paper_authors: Claudia Biancotti, Carolina Camassa
  • for: This paper assesses the potential of ChatGPT in complex writing tasks.
  • methods: The authors ask ChatGPT to compose a policy brief for the Board of the Bank of Italy and evaluate the generated content.
  • results: ChatGPT can accelerate workflows by providing well-structured content suggestions and producing extensive, linguistically correct text in seconds, but it requires significant expert supervision; used naively, its output can be incorrect, superficial, or irrelevant.
    Abstract ChatGPT, a software seeking to simulate human conversational abilities, is attracting increasing attention. It is sometimes portrayed as a groundbreaking productivity aid, including for creative work. In this paper, we run an experiment to assess its potential in complex writing tasks. We ask the software to compose a policy brief for the Board of the Bank of Italy. We find that ChatGPT can accelerate workflows by providing well-structured content suggestions, and by producing extensive, linguistically correct text in a matter of seconds. It does, however, require a significant amount of expert supervision, which partially offsets productivity gains. If the app is used naively, output can be incorrect, superficial, or irrelevant. Superficiality is an especially problematic limitation in the context of policy advice intended for high-level audiences.

Evaluating the Efficacy of Supervised Learning vs Large Language Models for Identifying Cognitive Distortions and Suicidal Risks in Chinese Social Media

  • paper_url: http://arxiv.org/abs/2309.03564
  • repo_url: https://github.com/thudm/chatglm2-6b
  • paper_authors: Hongzhi Qi, Qing Zhao, Changwei Song, Wei Zhai, Dan Luo, Shuo Liu, Yi Jing Yu, Fan Wang, Huijing Zou, Bing Xiang Yang, Jianqiang Li, Guanghui Fu
  • for: This study explores the applicability of large language models on Chinese social media platforms, particularly for tasks in psychology.
  • methods: Using supervised learning as a baseline, the study compares large language models under three strategies: zero-shot, few-shot, and fine-tuning.
  • results: A discernible performance gap exists between the large language models and supervised learning, mainly because the models cannot fully grasp subtle categories. GPT-4 outperforms its counterparts in multiple scenarios, while GPT-3.5 shows significant enhancement in suicide risk classification after fine-tuning.
    Abstract Large language models, particularly those akin to the rapidly progressing GPT series, are gaining traction for their expansive influence. While there is keen interest in their applicability within medical domains such as psychology, tangible explorations on real-world data remain scant. Concurrently, users on social media platforms are increasingly vocalizing personal sentiments; under specific thematic umbrellas, these sentiments often manifest as negative emotions, sometimes escalating to suicidal inclinations. Timely discernment of such cognitive distortions and suicidal risks is crucial to effectively intervene and potentially avert dire circumstances. Our study ventured into this realm by experimenting on two pivotal tasks: suicidal risk and cognitive distortion identification on Chinese social media platforms. Using supervised learning as a baseline, we examined and contrasted the efficacy of large language models via three distinct strategies: zero-shot, few-shot, and fine-tuning. Our findings revealed a discernible performance gap between the large language models and traditional supervised learning approaches, primarily attributed to the models' inability to fully grasp subtle categories. Notably, while GPT-4 outperforms its counterparts in multiple scenarios, GPT-3.5 shows significant enhancement in suicide risk classification after fine-tuning. To our knowledge, this investigation stands as the maiden attempt at gauging large language models on Chinese social media tasks. This study underscores the forward-looking and transformative implications of using large language models in the field of psychology. It lays the groundwork for future applications in psychological research and practice.

All Labels Together: Low-shot Intent Detection with an Efficient Label Semantic Encoding Paradigm

  • paper_url: http://arxiv.org/abs/2309.03563
  • repo_url: None
  • paper_authors: Jiangshu Du, Congying Xia, Wenpeng Yin, Tingting Liang, Philip S. Yu
  • for: This paper presents an end-to-end One-to-All system that enables the comparison of an input utterance with all intent label candidates, thereby fully utilizing label semantics.
  • methods: The system is trained end-to-end, with a novel pretraining strategy that uses indirect supervision from paraphrasing.
  • results: Experiments on three few-shot intent detection tasks show that One-to-All is especially effective when training resources are extremely scarce, achieving state-of-the-art performance in 1-, 3-, and 5-shot settings; the pretraining strategy enables zero-shot cross-domain generalization.
    Abstract In intent detection tasks, leveraging meaningful semantic information from intent labels can be particularly beneficial for few-shot scenarios. However, existing few-shot intent detection methods either ignore the intent labels, (e.g. treating intents as indices) or do not fully utilize this information (e.g. only using part of the intent labels). In this work, we present an end-to-end One-to-All system that enables the comparison of an input utterance with all label candidates. The system can then fully utilize label semantics in this way. Experiments on three few-shot intent detection tasks demonstrate that One-to-All is especially effective when the training resource is extremely scarce, achieving state-of-the-art performance in 1-, 3- and 5-shot settings. Moreover, we present a novel pretraining strategy for our model that utilizes indirect supervision from paraphrasing, enabling zero-shot cross-domain generalization on intent detection tasks. Our code is at https://github.com/jiangshdd/AllLablesTogether.
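
The core architectural idea, a single encoder pass that sees the utterance together with every label candidate, can be sketched as below. The scoring head (cosine similarity between [CLS] and each label's mean-pooled span, on an untrained encoder) is an illustrative simplification; the actual system is trained end-to-end.

```python
# Hedged sketch of the One-to-All input/scoring shape: one sequence holds
# the utterance and all label candidates, so label semantics can interact.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def one_to_all_scores(utterance, labels):
    # Build one sequence: [CLS] utterance [SEP] label_1 [SEP] ... [SEP]
    ids = [tok.cls_token_id] + tok(utterance, add_special_tokens=False)["input_ids"] + [tok.sep_token_id]
    spans = []
    for lab in labels:
        lab_ids = tok(lab, add_special_tokens=False)["input_ids"]
        spans.append((len(ids), len(ids) + len(lab_ids)))
        ids += lab_ids + [tok.sep_token_id]
    hidden = enc(input_ids=torch.tensor([ids])).last_hidden_state[0]
    cls = hidden[0]
    # Score each label by similarity between [CLS] and its pooled span
    return torch.stack([torch.cosine_similarity(cls, hidden[s:e].mean(0), dim=0)
                        for s, e in spans])

labels = ["book a flight", "check weather", "play music"]
scores = one_to_all_scores("what's the forecast for tomorrow?", labels)
print(labels[int(scores.argmax())])
```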

An Anchor Learning Approach for Citation Field Learning

  • paper_url: http://arxiv.org/abs/2309.03559
  • repo_url: None
  • paper_authors: Zilin Yuan, Borun Chen, Yimeng Dai, Yinghui Li, Hai-Tao Zheng, Rui Zhang
  • for: This work aims to improve citation field learning performance with a new anchor-learning-based algorithm, CIFAL.
  • methods: CIFAL leverages anchor learning, which is model-agnostic for any pre-trained language model, to capture citation patterns from data in different citation styles.
  • results: Experiments show that CIFAL outperforms state-of-the-art methods in citation field learning, achieving a 2.83% improvement in field-level F1-scores, confirmed both quantitatively and qualitatively.
    Abstract Citation field learning is to segment a citation string into fields of interest such as author, title, and venue. Extracting such fields from citations is crucial for citation indexing, researcher profile analysis, etc. User-generated resources like academic homepages and Curriculum Vitae, provide rich citation field information. However, extracting fields from these resources is challenging due to inconsistent citation styles, incomplete sentence syntax, and insufficient training data. To address these challenges, we propose a novel algorithm, CIFAL (citation field learning by anchor learning), to boost the citation field learning performance. CIFAL leverages the anchor learning, which is model-agnostic for any Pre-trained Language Model, to help capture citation patterns from the data of different citation styles. The experiments demonstrate that CIFAL outperforms state-of-the-art methods in citation field learning, achieving a 2.83% improvement in field-level F1-scores. Extensive analysis of the results further confirms the effectiveness of CIFAL quantitatively and qualitatively.

Machine Learning for Tangible Effects: Natural Language Processing for Uncovering the Illicit Massage Industry & Computer Vision for Tactile Sensing

  • paper_url: http://arxiv.org/abs/2309.03470
  • repo_url: None
  • paper_authors: Rui Ouyang
  • for: This thesis explores how computer science can be used to fight human trafficking, specifically in the illicit massage industry, and how computer vision can create a sense of touch.
  • methods: The thesis uses natural language processing (NLP) to monitor the industry and create datasets, and also considers the use of agent-based models to create synthetic financial data. Additionally, the thesis describes the development of a novel sensor, the Digger Finger, which adapts the Gelsight sensor to find objects in granular media, and a low-cost six-axis force-torque sensor using a webcam and printed reference marker.
  • results: The thesis shows how NLP can be used to derive insights into the labor pressures and language barriers faced by employees in the industry, as well as the income, demographics, and societal pressures affecting sex buyers. Additionally, the thesis reports on the development of a novel sensor that is up to a hundred times less expensive than commercial sensors, allowing for a wider range of applications.
    Abstract I explore two questions in this thesis: how can computer science be used to fight human trafficking? And how can computer vision create a sense of touch? I use natural language processing (NLP) to monitor the United States illicit massage industry (IMI), a multi-billion dollar industry that offers not just therapeutic massages but also commercial sexual services. Employees of this industry are often immigrant women with few job opportunities, leaving them vulnerable to fraud, coercion, and other facets of human trafficking. Monitoring spatiotemporal trends helps prevent trafficking in the IMI. By creating datasets with three publicly-accessible websites: Google Places, Rubmaps, and AMPReviews, combined with NLP techniques such as bag-of-words and Word2Vec, I show how to derive insights into the labor pressures and language barriers that employees face, as well as the income, demographics, and societal pressures affecting sex buyers. I include a call-to-action to other researchers given these datasets. I also consider how to creating synthetic financial data, which can aid with counter-trafficking in the banking sector. I use an agent-based model to create both tabular and payee-recipient graph data. I then consider the role of computer vision in making tactile sensors. I report on a novel sensor, the Digger Finger, that adapts the Gelsight sensor to finding objects in granular media. Changes include using a wedge shape to facilitate digging, replacing the internal lighting LEDs with fluorescent paint, and adding a vibrator motor to counteract jamming. Finally, I also show how to use a webcam and a printed reference marker, or fiducial, to create a low-cost six-axis force-torque sensor. This sensor is up to a hundred times less expensive than commercial sensors, allowing for a wider range of applications. For this and earlier chapters I release design files and code as open source.

Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty

  • paper_url: http://arxiv.org/abs/2309.03433
  • repo_url: None
  • paper_authors: Chen Ling, Xujiang Zhao, Xuchao Zhang, Yanchi Liu, Wei Cheng, Haoyu Wang, Zhengzhang Chen, Takao Osaki, Katsushi Matsuda, Haifeng Chen, Liang Zhao
  • for: Improving the performance of large language models (LLMs) on the open information extraction (OIE) task.
  • methods: The paper proposes various in-context learning strategies to enhance the LLM's instruction-following ability, together with a demonstration uncertainty quantification module to enhance the confidence of the generated relations.
  • results: Experiments on three OIE benchmark datasets show that the approach holds its own against established supervised methods, both quantitatively and qualitatively.
    Abstract Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text, typically in the form of (subject, relation, object) triples. Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks due to two key issues. First, LLMs struggle to distinguish irrelevant context from relevant relations and generate structured output due to the restrictions on fine-tuning the model. Second, LLMs generates responses autoregressively based on probability, which makes the predicted relations lack confidence. In this paper, we assess the capabilities of LLMs in improving the OIE task. Particularly, we propose various in-context learning strategies to enhance LLM's instruction-following ability and a demonstration uncertainty quantification module to enhance the confidence of the generated relations. Our experiments on three OIE benchmark datasets show that our approach holds its own against established supervised methods, both quantitatively and qualitatively.

From Base to Conversational: Japanese Instruction Dataset and Tuning Large Language Models

  • paper_url: http://arxiv.org/abs/2309.03412
  • repo_url: https://github.com/retarfi/jallm
  • paper_authors: Masahiro Suzuki, Masanori Hirano, Hiroki Sakaji
  • for: This paper addresses the instruction tuning that large language models (LLMs) need in order to become interactive, focusing on Japanese.
  • methods: The authors construct a Japanese instruction dataset by expanding and filtering existing datasets and apply it to a Japanese pre-trained base model, performing Low-Rank Adaptation (LoRA) tuning on both Japanese and English existing models.
  • results: The effectiveness of the Japanese instruction dataset is confirmed, and the results indicate that even with relatively small LLMs, performance on downstream tasks improves through instruction tuning. The instruction dataset, tuned models, and implementation are publicly available online.
    Abstract Instruction tuning is essential for large language models (LLMs) to become interactive. While many instruction tuning datasets exist in English, there is a noticeable lack in other languages. Also, their effectiveness has not been well verified in non-English languages. We construct a Japanese instruction dataset by expanding and filtering existing datasets and apply the dataset to a Japanese pre-trained base model. We performed Low-Rank Adaptation (LoRA) tuning on both Japanese and English existing models using our instruction dataset. We evaluated these models from both quantitative and qualitative perspectives. As a result, the effectiveness of Japanese instruction datasets is confirmed. The results also indicate that even with relatively small LLMs, performances in downstream tasks would be improved through instruction tuning. Our instruction dataset, tuned models, and implementation are publicly available online.
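
The LoRA step can be sketched with the peft library; the base checkpoint, target modules, and hyperparameters below are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: wrap a pretrained causal LM with low-rank adapters
# before instruction tuning. Only the adapters are trained.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt-neox-3.6b")
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                # rank of the low-rank update
    lora_alpha=16,      # scaling factor
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # GPT-NeoX attention projection
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
# Only the adapter weights are trainable; the base model stays frozen.
```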

cs.LG - 2023-09-07

Bayesian Dynamic DAG Learning: Application in Discovering Dynamic Effective Connectome of Brain

  • paper_url: http://arxiv.org/abs/2309.07080
  • repo_url: None
  • paper_authors: Abdolmahdi Bagheri, Mohammad Pasande, Kevin Bello, Alireza Akhondi-Asl, Babak Nadjar Araabi
  • for: Extracting the Dynamic Effective Connectome (DEC) of the brain.
  • methods: Bayesian Dynamic DAG learning with M-matrices acyclicity characterization (BDyMA).
  • results: More accurate and reliable DEC compared to state-of-the-art and baseline methods, with further improvements when DTI data is incorporated as prior knowledge.
    Abstract Understanding the complex mechanisms of the brain can be unraveled by extracting the Dynamic Effective Connectome (DEC). Recently, score-based Directed Acyclic Graph (DAG) discovery methods have shown significant improvements in extracting the causal structure and inferring effective connectivity. However, learning DEC through these methods still faces two main challenges: one with the fundamental impotence of high-dimensional dynamic DAG discovery methods and the other with the low quality of fMRI data. In this paper, we introduce Bayesian Dynamic DAG learning with M-matrices Acyclicity characterization \textbf{(BDyMA)} method to address the challenges in discovering DEC. The presented dynamic causal model enables us to discover bidirected edges as well. Leveraging an unconstrained framework in the BDyMA method leads to more accurate results in detecting high-dimensional networks, achieving sparser outcomes, making it particularly suitable for extracting DEC. Additionally, the score function of the BDyMA method allows the incorporation of prior knowledge into the process of dynamic causal discovery which further enhances the accuracy of results. Comprehensive simulations on synthetic data and experiments on Human Connectome Project (HCP) data demonstrate that our method can handle both of the two main challenges, yielding more accurate and reliable DEC compared to state-of-the-art and baseline methods. Additionally, we investigate the trustworthiness of DTI data as prior knowledge for DEC discovery and show the improvements in DEC discovery when the DTI data is incorporated into the process.
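
Given that one of the authors co-developed DAGMA, the "M-matrices acyclicity characterization" in the method name presumably follows DAGMA's log-determinant formulation (Bello et al., 2022); a sketch of that constraint, under this assumption, with $W$ the weighted adjacency matrix over $d$ nodes, $\circ$ the Hadamard product, and $s > 0$:

```latex
h^{s}(W) \;=\; -\log\det\!\left(sI - W \circ W\right) \;+\; d \log s,
\qquad
h^{s}(W) = 0 \;\iff\; W \text{ encodes a DAG},
```

valid on the domain where $sI - W \circ W$ is an M-matrix; minimizing the score subject to $h^{s}(W) = 0$ yields the (dynamic) DAG.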

SRN-SZ: Deep Leaning-Based Scientific Error-bounded Lossy Compression with Super-resolution Neural Networks

  • paper_url: http://arxiv.org/abs/2309.04037
  • repo_url: None
  • paper_authors: Jinyang Liu, Sheng Di, Sian Jin, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello
  • for: This paper proposes a deep-learning-based scientific error-bounded lossy compressor to improve compression ratios on datasets that existing compressors handle poorly.
  • methods: The compressor leverages a hierarchical data grid expansion paradigm implemented by super-resolution neural networks, using the HAT network without per-data training.
  • results: Compared with state-of-the-art compressors, SRN-SZ achieves up to 75% compression ratio improvements under the same error bound and up to 80% under the same PSNR.
    Abstract The fast growth of computational power and scales of modern super-computing systems have raised great challenges for the management of exascale scientific data. To maintain the usability of scientific data, error-bound lossy compression is proposed and developed as an essential technique for the size reduction of scientific data with constrained data distortion. Among the diverse datasets generated by various scientific simulations, certain datasets cannot be effectively compressed by existing error-bounded lossy compressors with traditional techniques. The recent success of Artificial Intelligence has inspired several researchers to integrate neural networks into error-bounded lossy compressors. However, those works still suffer from limited compression ratios and/or extremely low efficiencies. To address those issues and improve the compression on the hard-to-compress datasets, in this paper, we propose SRN-SZ, which is a deep learning-based scientific error-bounded lossy compressor leveraging the hierarchical data grid expansion paradigm implemented by super-resolution neural networks. SRN-SZ applies the most advanced super-resolution network HAT for its compression, which is free of time-costing per-data training. In experiments compared with various state-of-the-art compressors, SRN-SZ achieves up to 75% compression ratio improvements under the same error bound and up to 80% compression ratio improvements under the same PSNR than the second-best compressor.

Brief technical note on linearizing recurrent neural networks (RNNs) before vs after the pointwise nonlinearity

  • paper_url: http://arxiv.org/abs/2309.04030
  • repo_url: None
  • paper_authors: Marino Pagan, Adrian Valente, Srdjan Ostojic, Carlos D. Brody
  • for: study the properties of recurrent neural networks (RNNs)
  • methods: linearization of activation dynamics and activity dynamics
  • results: context-dependent effects are more apparent under linearization of activity dynamics than under linearization of activation dynamics
    Abstract Linearization of the dynamics of recurrent neural networks (RNNs) is often used to study their properties. The same RNN dynamics can be written in terms of the ``activations" (the net inputs to each unit, before its pointwise nonlinearity) or in terms of the ``activities" (the output of each unit, after its pointwise nonlinearity); the two corresponding linearizations are different from each other. This brief and informal technical note describes the relationship between the two linearizations, between the left and right eigenvectors of their dynamics matrices, and shows that some context-dependent effects are readily apparent under linearization of activity dynamics but not linearization of activation dynamics.
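
For reference, the two linearizations can be written out side by side; this is a minimal sketch assuming the standard rate-model conventions (time constant $\tau$, pointwise nonlinearity $\phi$):

```latex
\tau\,\dot{x} = -x + W\,\phi(x) + b
\quad\Longrightarrow\quad
J_{x} = \tfrac{1}{\tau}\bigl(-I + W D\bigr)
\qquad \text{(activations)}
\\[4pt]
\tau\,\dot{r} = -r + \phi(W r + b)
\quad\Longrightarrow\quad
J_{r} = \tfrac{1}{\tau}\bigl(-I + D W\bigr)
\qquad \text{(activities)}
```

where $D = \operatorname{diag}(\phi')$ is evaluated at the fixed point. When $D$ is invertible, $DW = D\,(WD)\,D^{-1}$, so the two Jacobians share eigenvalues while their left and right eigenvectors are exchanged up to a factor of $D$; this is why some context-dependent effects are visible under one linearization but not the other.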

Optimal Transport with Tempered Exponential Measures

  • paper_url: http://arxiv.org/abs/2309.04015
  • repo_url: None
  • paper_authors: Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth
  • for: This paper studies optimal transport, specifically the gap between unregularized optimal transport and entropic-regularized optimal transport.
  • methods: The paper generalizes entropic regularization to tempered exponential measures, a generalization of exponential families with indirect measure normalization.
  • results: This yields a convenient middle ground: very fast approximation algorithms together with sparsity that is under control up to sparsity patterns; the approach also fits naturally in the unbalanced optimal transport setting.
    Abstract In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, ``\`a-la-Kantorovich'', which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized optimal transport, ``\`a-la-Sinkhorn-Cuturi'', which gets near-linear approximation algorithms but leads to maximally un-sparse plans. In this paper, we show that a generalization of the latter to tempered exponential measures, a generalization of exponential families with indirect measure normalization, gets to a very convenient middle ground, with both very fast approximation algorithms and sparsity which is under control up to sparsity patterns. In addition, it fits naturally in the unbalanced optimal transport problem setting as well.

An Element-wise RSAV Algorithm for Unconstrained Optimization Problems

  • paper_url: http://arxiv.org/abs/2309.04013
  • repo_url: None
  • paper_authors: Shiheng Zhang, Jiahao Zhang, Jie Shen, Guang Lin
  • for: A novel optimization algorithm, element-wise relaxed scalar auxiliary variable (E-RSAV), that satisfies an unconditional energy dissipation law and exhibits improved alignment between the modified and the original energy.
  • methods: The algorithm comes with rigorous proofs of linear convergence in the convex setting, a simple accelerated variant that improves the rate to super-linear in the univariate case, and an adaptive version with Steffensen step size.
  • results: Ample numerical experiments validate the robustness and fast convergence of the algorithm.
    Abstract We present a novel optimization algorithm, element-wise relaxed scalar auxiliary variable (E-RSAV), that satisfies an unconditional energy dissipation law and exhibits improved alignment between the modified and the original energy. Our algorithm features rigorous proofs of linear convergence in the convex setting. Furthermore, we present a simple accelerated algorithm that improves the linear convergence rate to super-linear in the univariate case. We also propose an adaptive version of E-RSAV with Steffensen step size. We validate the robustness and fast convergence of our algorithm through ample numerical experiments.

Creating a Systematic ESG (Environmental Social Governance) Scoring System Using Social Network Analysis and Machine Learning for More Sustainable Company Practices

  • paper_url: http://arxiv.org/abs/2309.05607
  • repo_url: None
  • paper_authors: Aarav Patel, Peter Gloor
  • for: This paper aims to create a data-driven ESG evaluation system that provides better guidance and more systemized scores by incorporating social sentiment.
  • methods: The authors use Python web scrapers to collect data from Wikipedia, Twitter, LinkedIn, and Google News for S&P 500 companies, then clean the data and pass it through NLP algorithms to obtain sentiment scores for ESG subcategories. Machine-learning algorithms are trained and calibrated to S&P Global ESG Ratings to test their predictive capabilities.
  • results: The Random-Forest model shows encouraging results, with a mean absolute error of 13.4% and a correlation of 26.1% (p-value 0.0372). Measuring ESG social sentiment across subcategories can help executives focus efforts on the areas people care about most, and the data-driven methodology can provide ratings for companies without coverage, allowing more socially responsible firms to thrive.
    Abstract Environmental Social Governance (ESG) is a widely used metric that measures the sustainability of a company practices. Currently, ESG is determined using self-reported corporate filings, which allows companies to portray themselves in an artificially positive light. As a result, ESG evaluation is subjective and inconsistent across raters, giving executives mixed signals on what to improve. This project aims to create a data-driven ESG evaluation system that can provide better guidance and more systemized scores by incorporating social sentiment. Social sentiment allows for more balanced perspectives which directly highlight public opinion, helping companies create more focused and impactful initiatives. To build this, Python web scrapers were developed to collect data from Wikipedia, Twitter, LinkedIn, and Google News for the S&P 500 companies. Data was then cleaned and passed through NLP algorithms to obtain sentiment scores for ESG subcategories. Using these features, machine-learning algorithms were trained and calibrated to S&P Global ESG Ratings to test their predictive capabilities. The Random-Forest model was the strongest model with a mean absolute error of 13.4% and a correlation of 26.1% (p-value 0.0372), showing encouraging results. Overall, measuring ESG social sentiment across sub-categories can help executives focus efforts on areas people care about most. Furthermore, this data-driven methodology can provide ratings for companies without coverage, allowing more socially responsible firms to thrive.

Derivation of Coordinate Descent Algorithms from Optimal Control Theory

  • paper_url: http://arxiv.org/abs/2309.03990
  • repo_url: None
  • paper_authors: I. M. Ross
  • for: This paper explores how disparate optimization algorithms may be coalesced under a central principle from optimal control theory, deriving coordinate descent algorithms from it.
  • methods: Basic coordinate descent algorithms are derived using a maximum principle and a collection of max functions as "control" Lyapunov functions.
  • results: The convergence of the resulting coordinate descent algorithms is connected to the controlled dissipation of their corresponding Lyapunov functions, with the operational metric for the search vector given by the Hessian of the convex objective function.
    Abstract Recently, it was posited that disparate optimization algorithms may be coalesced in terms of a central source emanating from optimal control theory. Here we further this proposition by showing how coordinate descent algorithms may be derived from this emerging new principle. In particular, we show that basic coordinate descent algorithms can be derived using a maximum principle and a collection of max functions as "control" Lyapunov functions. The convergence of the resulting coordinate descent algorithms is thus connected to the controlled dissipation of their corresponding Lyapunov functions. The operational metric for the search vector in all cases is given by the Hessian of the convex objective function.
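As a rough illustration of the scheme the abstract describes, the sketch below runs coordinate descent in which a max function over the gradient components selects the coordinate to update (the Gauss-Southwell rule) and the Hessian diagonal supplies the step metric. The quadratic objective is an assumed example; the paper's optimal-control derivation is not reproduced here.

```python
import numpy as np

def coordinate_descent(grad, hess_diag, x0, steps=100):
    """Gauss-Southwell coordinate descent: the max over |gradient|
    components picks the update coordinate; the Hessian diagonal
    supplies the step metric, as in Newton-type coordinate updates."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        g = grad(x)
        i = int(np.argmax(np.abs(g)))      # max function selects the "control"
        x[i] -= g[i] / hess_diag(x)[i]     # Hessian-metric step on coordinate i
    return x

# Illustrative convex quadratic f(x) = 0.5 x^T A x - b^T x (assumed example)
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])
x_star = coordinate_descent(lambda x: A @ x - b, lambda x: np.diag(A), np.zeros(2))
```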

DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient Estimation

  • paper_url: http://arxiv.org/abs/2309.03974
  • repo_url: None
  • paper_authors: Pau Mulet Arabi, Alec Flowers, Lukas Mauch, Fabien Cardinaux
  • for: This paper addresses the problem of computing gradients of an expectation with respect to the distributional parameters of a discrete distribution, which arises in many fields of science and engineering.
  • methods: The problem is typically tackled with Reinforce, but the Reinforce estimator is especially sensitive to discrepancies between the true probability distribution and the drawn samples, producing inaccurate gradient estimates in low sampling regimes; the paper therefore introduces DBsurf, a Reinforce-based estimator that uses a novel sampling procedure to reduce the discrepancy between the samples and the actual distribution.
  • results: DBsurf attains the lowest variance among existing estimators in a least squares benchmark commonly used in the literature, achieves the best results for training variational auto-encoders (VAEs) across different datasets and sampling setups, and supports a simple and efficient Neural Architecture Search (NAS) algorithm with state-of-the-art performance.
    Abstract Computing gradients of an expectation with respect to the distributional parameters of a discrete distribution is a problem arising in many fields of science and engineering. Typically, this problem is tackled using Reinforce, which frames the problem of gradient estimation as a Monte Carlo simulation. Unfortunately, the Reinforce estimator is especially sensitive to discrepancies between the true probability distribution and the drawn samples, a common issue in low sampling regimes that results in inaccurate gradient estimates. In this paper, we introduce DBsurf, a reinforce-based estimator for discrete distributions that uses a novel sampling procedure to reduce the discrepancy between the samples and the actual distribution. To assess the performance of our estimator, we subject it to a diverse set of tasks. Among existing estimators, DBsurf attains the lowest variance in a least squares problem commonly used in the literature for benchmarking. Furthermore, DBsurf achieves the best results for training variational auto-encoders (VAE) across different datasets and sampling setups. Finally, we apply DBsurf to build a simple and efficient Neural Architecture Search (NAS) algorithm with state-of-the-art performance.
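For context, here is the baseline REINFORCE estimator that DBsurf builds on, for a categorical distribution parameterized by logits. DBsurf's contribution, per the abstract, is to replace the i.i.d. sampling below with a lower-discrepancy sampling procedure, which the abstract does not specify and which is therefore not shown.

```python
import numpy as np

def reinforce_grad(logits, loss_fn, n_samples=32, rng=None):
    """Baseline REINFORCE estimate of d/d(logits) E_{x~p}[loss(x)] for a
    categorical distribution; the score of sample x is onehot(x) - p."""
    if rng is None:
        rng = np.random.default_rng()
    p = np.exp(logits - logits.max())
    p /= p.sum()
    xs = rng.choice(len(p), size=n_samples, p=p)
    grad = np.zeros_like(p)
    for x in xs:
        score = -p.copy()
        score[x] += 1.0                # d log p(x) / d logits
        grad += loss_fn(x) * score
    return grad / n_samples
```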

Automatic Concept Embedding Model (ACEM): No train-time concepts, No issue!

  • paper_url: http://arxiv.org/abs/2309.03970
  • repo_url: None
  • paper_authors: Rishabh Jain
  • for: This paper proposes Automatic Concept Embedding Models (ACEMs) so that concept embedding models can be learned on large datasets without manually annotated concepts.
  • methods: ACEMs learn the concept annotations automatically, removing the need for concept labels on all of the training data.
  • results: ACEMs make concept-based, highly interpretable models feasible on large datasets for which manual concept annotation would be expensive or infeasible.
    Abstract Interpretability and explainability of neural networks is continuously increasing in importance, especially within safety-critical domains and to provide the social right to explanation. Concept based explanations align well with how humans reason, proving to be a good way to explain models. Concept Embedding Models (CEMs) are one such concept based explanation architectures. These have shown to overcome the trade-off between explainability and performance. However, they have a key limitation -- they require concept annotations for all their training data. For large datasets, this can be expensive and infeasible. Motivated by this, we propose Automatic Concept Embedding Models (ACEMs), which learn the concept annotations automatically.

A Tutorial on the Non-Asymptotic Theory of System Identification

  • paper_url: http://arxiv.org/abs/2309.03873
  • repo_url: None
  • paper_authors: Ingvar Ziemann, Anastasios Tsiamis, Bruce Lee, Yassir Jedra, Nikolai Matni, George J. Pappas
  • for: This tutorial introduces recently developed non-asymptotic methods in the theory of (mainly linear) system identification.
  • methods: It emphasizes tools such as the covering technique, the Hanson-Wright inequality, and the method of self-normalized martingales, using them to give streamlined proofs of the performance of least-squares-based estimators for identifying the parameters of autoregressive models.
  • results: The tutorial concludes by sketching how the ideas presented can be extended to certain nonlinear identification problems.
    Abstract This tutorial serves as an introduction to recently developed non-asymptotic methods in the theory of -- mainly linear -- system identification. We emphasize tools we deem particularly useful for a range of problems in this domain, such as the covering technique, the Hanson-Wright Inequality and the method of self-normalized martingales. We then employ these tools to give streamlined proofs of the performance of various least-squares based estimators for identifying the parameters in autoregressive models. We conclude by sketching out how the ideas presented herein can be extended to certain nonlinear identification problems.
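A minimal example of the object of study: least-squares identification of a scalar AR(1) system, whose non-asymptotic error bounds are what the tutorial's tools (covering arguments, Hanson-Wright, self-normalized martingales) quantify. The system parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stable scalar AR(1) system x_{t+1} = a* x_t + w_t (illustrative setup)
a_star, T = 0.8, 500
x = np.zeros(T + 1)
for t in range(T):
    x[t + 1] = a_star * x[t] + rng.normal(scale=0.1)

# Ordinary least-squares estimate a_hat = (sum x_t x_{t+1}) / (sum x_t^2),
# whose finite-sample error is what the non-asymptotic theory bounds
a_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
print(f"a* = {a_star}, a_hat = {a_hat:.3f}")
```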

Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples

  • paper_url: http://arxiv.org/abs/2309.03847
  • repo_url: None
  • paper_authors: Mohammad Afzali, Hassan Ashtiani, Christopher Liaw
  • for: The problem studied is estimating mixtures of Gaussians under the constraint of differential privacy (DP).
  • methods: A new framework, potentially useful for other tasks, shows that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small" cover with respect to total variation distance, then the class of its mixtures is privately learnable.
  • results: The main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples suffice to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP; this is the first finite sample complexity upper bound for the problem that makes no structural assumptions on the GMMs.
    Abstract We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small'' cover [BKSW19] with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover [AAL21].

Gradient-Based Feature Learning under Structured Data

  • paper_url: http://arxiv.org/abs/2309.03843
  • repo_url: None
  • paper_authors: Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu
  • for: This paper studies the sample complexity of gradient-based learning of single-index models when the input data has a spiked covariance structure, revealing several interesting phenomena.
  • methods: The analysis examines spherical gradient dynamics under anisotropic inputs and an appropriate weight normalization reminiscent of batch normalization, exploiting the alignment between the (spiked) input covariance and the target.
  • results: In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction even when the spike is perfectly aligned with the target, while suitable weight normalization alleviates this issue; exploiting the alignment between the input covariance and the target yields improved sample complexity over the isotropic case, and with a suitably large spike the sample complexity of gradient-based training becomes independent of the information exponent while outperforming lower bounds for rotationally invariant kernel methods.
    Abstract Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods.

Early warning via transitions in latent stochastic dynamical systems

  • paper_url: http://arxiv.org/abs/2309.03842
  • repo_url: None
  • paper_authors: Lingyu Feng, Ting Gao, Wang Xiao, Jinqiao Duan
  • for: Early warnings for dynamical transitions in complex systems or high-dimensional observation data
  • methods: A directed anisotropic diffusion map is used to capture the latent evolutionary dynamics on a low-dimensional manifold
  • results: The method successfully finds appropriate effective coordinates and derives early warning signals capable of detecting the tipping point during the state transition, bridging the latent dynamics with the original dataset; it is validated as accurate and effective through numerical experiments
    Abstract Early warnings for dynamical transitions in complex systems or high-dimensional observation data are essential in many real world applications, such as gene mutation, brain diseases, natural disasters, financial crises, and engineering reliability. To effectively extract early warning signals, we develop a novel approach: the directed anisotropic diffusion map that captures the latent evolutionary dynamics in low-dimensional manifold. Applying the methodology to authentic electroencephalogram (EEG) data, we successfully find the appropriate effective coordinates, and derive early warning signals capable of detecting the tipping point during the state transition. Our method bridges the latent dynamics with the original dataset. The framework is validated to be accurate and effective through numerical experiments, in terms of density and transition probability. It is shown that the second coordinate holds meaningful information for critical transition in various evaluation metrics.

Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.03839
  • repo_url: None
  • paper_authors: Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan, Sergey Levine
  • for: The paper aims to help users perform sequential decision-making tasks, such as robotic teleoperation, using noisy, high-dimensional command signals (e.g., from a brain-computer interface).
  • methods: Human-in-the-loop machine learning lets such systems improve by interacting with users but is limited by the amount of data that can be collected from individual users in practice; the paper proposes a reinforcement learning algorithm that trains an interface to map raw command signals to actions using a combination of offline pre-training and online fine-tuning.
  • results: In a user study in which 12 participants performed a simulated navigation task by using their eye gaze to modulate a 128-dimensional command signal, the method enabled successful goal navigation more often than a baseline directional interface; it also outperformed baseline interfaces on a simulated Sawyer pushing task with eye-gaze control and on the Lunar Lander game with simulated user commands, and ablation experiments with simulated commands confirm the importance of each component of the method.
    Abstract Adaptive interfaces can help users perform sequential decision-making tasks like robotic teleoperation given noisy, high-dimensional command signals (e.g., from a brain-computer interface). Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users, but tend to be limited by the amount of data that they can collect from individual users in practice. In this paper, we propose a reinforcement learning algorithm to address this by training an interface to map raw command signals to actions using a combination of offline pre-training and online fine-tuning. To address the challenges posed by noisy command signals and sparse rewards, we develop a novel method for representing and inferring the user's long-term intent for a given trajectory. We primarily evaluate our method's ability to assist users who can only communicate through noisy, high-dimensional input channels through a user study in which 12 participants performed a simulated navigation task by using their eye gaze to modulate a 128-dimensional command signal from their webcam. The results show that our method enables successful goal navigation more often than a baseline directional interface, by learning to denoise user commands signals and provide shared autonomy assistance. We further evaluate on a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander game with simulated user commands, and find that our method improves over baseline interfaces in these domains as well. Extensive ablation experiments with simulated user commands empirically motivate each component of our method.

Learning from Demonstration via Probabilistic Diagrammatic Teaching

  • paper_url: http://arxiv.org/abs/2309.03835
  • repo_url: None
  • paper_authors: Weiming Zhi, Tianyi Zhang, Matthew Johnson-Roberson
  • for: This work introduces Diagrammatic Teaching, a new paradigm for Learning from Demonstration (LfD) that lets users teach robots novel skills through simple sketches rather than kinesthetic teaching or teleoperation.
  • methods: The user sketches demonstration trajectories on 2D images of the scene; the proposed Ray-tracing Probabilistic Trajectory Learning (RPTL) framework extracts time-varying probability densities from the 2D sketches, applies ray-tracing to find the corresponding regions in 3D Cartesian space, and fits a probabilistic model of motion trajectories to these regions, from which new trajectories mimicking the user's sketches can be generated.
  • results: The framework is validated empirically both in simulation and on real robots, including a fixed-base manipulator and a quadruped-mounted manipulator.
    Abstract Learning for Demonstration (LfD) enables robots to acquire new skills by imitating expert demonstrations, allowing users to communicate their instructions in an intuitive manner. Recent progress in LfD often relies on kinesthetic teaching or teleoperation as the medium for users to specify the demonstrations. Kinesthetic teaching requires physical handling of the robot, while teleoperation demands proficiency with additional hardware. This paper introduces an alternative paradigm for LfD called Diagrammatic Teaching. Diagrammatic Teaching aims to teach robots novel skills by prompting the user to sketch out demonstration trajectories on 2D images of the scene, these are then synthesised as a generative model of motion trajectories in 3D task space. Additionally, we present the Ray-tracing Probabilistic Trajectory Learning (RPTL) framework for Diagrammatic Teaching. RPTL extracts time-varying probability densities from the 2D sketches, applies ray-tracing to find corresponding regions in 3D Cartesian space, and fits a probabilistic model of motion trajectories to these regions. New motion trajectories, which mimic those sketched by the user, can then be generated from the probabilistic model. We empirically validate our framework both in simulation and on real robots, which include a fixed-base manipulator and a quadruped-mounted manipulator.

Prime and Modulate Learning: Generation of forward models with signed back-propagation and environmental cues

  • paper_url: http://arxiv.org/abs/2309.03825
  • repo_url: None
  • paper_authors: Sama Daryanavard, Bernd Porr
  • for: This work proposes a new learning approach for deep neural networks that addresses the exploding and vanishing gradient problems arising in error back-propagation.
  • methods: The Prime and Modulate approach uses only the sign of the error signal in back-propagation to prime the learning, while a global relevance signal modulates the rate of learning, without requiring normalization techniques or restricting activations to linear rectifying units.
  • results: Experiments on a robotic platform show a significant improvement in the speed of convergence compared with conventional back-propagation.
    Abstract Deep neural networks employing error back-propagation for learning can suffer from exploding and vanishing gradient problems. Numerous solutions have been proposed such as normalisation techniques or limiting activation functions to linear rectifying units. In this work we follow a different approach which is particularly applicable to closed-loop learning of forward models where back-propagation makes exclusive use of the sign of the error signal to prime the learning, whilst a global relevance signal modulates the rate of learning. This is inspired by the interaction between local plasticity and a global neuromodulation. For example, whilst driving on an empty road, one can allow for slow step-wise optimisation of actions, whereas, at a busy junction, an error must be corrected at once. Hence, the error is the priming signal and the intensity of the experience is a modulating factor in the weight change. The advantages of this Prime and Modulate paradigm is twofold: it is free from normalisation and it makes use of relevant cues from the environment to enrich the learning. We present a mathematical derivation of the learning rule in z-space and demonstrate the real-time performance with a robotic platform. The results show a significant improvement in the speed of convergence compared to that of the conventional back-propagation.
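A minimal sketch of the Prime and Modulate idea as stated in the abstract: the sign of the back-propagated error primes the direction of the weight update, while a global relevance signal modulates its magnitude. The single linear layer and the exact update form are assumptions for illustration, not the authors' network.

```python
import numpy as np

def prime_and_modulate_step(W, x, target, relevance, lr=0.01):
    """One update: the sign of the error primes the direction of the
    weight change, while a global relevance signal scales its magnitude."""
    y = W @ x
    err = y - target
    prime = np.sign(err)                 # priming: only the sign of the error
    dW = np.outer(prime, x)              # sign-driven update direction
    return W - lr * relevance * dW       # modulation: relevance scales the step
```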

Empirical Risk Minimization for Losses without Variance

  • paper_url: http://arxiv.org/abs/2309.03818
  • repo_url: None
  • paper_authors: Guanhua Fang, Ping Li, Gennady Samorodnitsky
  • for: This paper considers an empirical risk minimization (ERM) problem in a heavy-tailed setting where the data has no finite variance and only a $p$-th moment with $p \in (1,2)$.
  • methods: Instead of an estimation procedure based on truncated observations, the optimizer is chosen by minimizing risk values that are robustly estimated via Catoni's method (Catoni, 2012); the structure of Catoni-type influence functions allows excess risk upper bounds to be established via generalized generic chaining, and computational issues are addressed through a theoretical study of robust gradient descent and empirical-risk-based methods.
  • results: In an extensive numerical study, the optimizer based on empirical risks with Catoni-style estimation outperforms the other baselines, indicating that estimation based directly on truncated data may yield unsatisfactory results.
    Abstract This paper considers an empirical risk minimization problem under heavy-tailed settings, where data does not have finite variance, but only has $p$-th moment with $p \in (1,2)$. Instead of using estimation procedure based on truncated observed data, we choose the optimizer by minimizing the risk value. Those risk values can be robustly estimated via using the remarkable Catoni's method (Catoni, 2012). Thanks to the structure of Catoni-type influence functions, we are able to establish excess risk upper bounds via using generalized generic chaining methods. Moreover, we take computational issues into consideration. We especially theoretically investigate two types of optimization methods, robust gradient descent algorithm and empirical risk-based methods. With an extensive numerical study, we find that the optimizer based on empirical risks via Catoni-style estimation indeed shows better performance than other baselines. It indicates that estimation directly based on truncated data may lead to unsatisfactory results.
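Since the method hinges on Catoni's robust risk estimates, here is a sketch of Catoni's M-estimator of a mean: it replaces the sample average with the root of a sum of logarithmically growing influence terms, which single heavy-tailed observations cannot dominate. The tuning parameter alpha and the bisection solver are implementation choices, not prescriptions from the paper.

```python
import numpy as np

def psi(x):
    """Catoni's influence function: grows only logarithmically, so a
    single heavy-tailed observation cannot dominate the estimate."""
    return np.where(x >= 0, np.log1p(x + 0.5 * x**2), -np.log1p(-x + 0.5 * x**2))

def catoni_mean(samples, alpha=0.1, iters=50):
    """Catoni's M-estimator: solve sum_i psi(alpha * (x_i - mu)) = 0 for mu
    by bisection (the left-hand side is decreasing in mu)."""
    lo, hi = samples.min(), samples.max()
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if psi(alpha * (samples - mu)).sum() > 0:
            lo = mu      # root lies above mu
        else:
            hi = mu
    return 0.5 * (lo + hi)
```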

Improved theoretical guarantee for rank aggregation via spectral method

  • paper_url: http://arxiv.org/abs/2309.03808
  • repo_url: None
  • paper_authors: Ziliang Samuel Zhong, Shuyang Ling
  • for: Ranking multiple items based on pairwise comparisons, with applications in sports, recommendation systems, and other web applications.
  • methods: Spectral ranking algorithms based on unnormalized and normalized data matrices, with a focus on deriving entry-wise perturbation error bounds and an error bound on the maximum displacement for each item.
  • results: Improved sample complexity and theoretical analysis of the eigenvectors and error bounds for the ranking problem, with confirmation from numerical experiments.
    Abstract Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations? This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications. As it is generally NP-hard to find a global ranking that minimizes the mismatch (known as the Kemeny optimization), we focus on the Erd\"os-R\'enyi outliers (ERO) model for this ranking problem. Here, each pairwise comparison is a corrupted copy of the true score difference. We investigate spectral ranking algorithms that are based on unnormalized and normalized data matrices. The key is to understand their performance in recovering the underlying scores of each item from the observed data. This reduces to deriving an entry-wise perturbation error bound between the top eigenvectors of the unnormalized/normalized data matrix and its population counterpart. By using the leave-one-out technique, we provide a sharper $\ell_{\infty}$-norm perturbation bound of the eigenvectors and also derive an error bound on the maximum displacement for each item, with only $\Omega(n\log n)$ samples. Our theoretical analysis improves upon the state-of-the-art results in terms of sample complexity, and our numerical experiments confirm these theoretical findings.
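The paper's exact matrix construction is not given in the abstract, so the sketch below uses a standard spectral ranking recipe in the same spirit (a Rank-Centrality-style random walk on pairwise comparison outcomes, ranked by its stationary distribution) to illustrate what ranking from the top eigenvector of a data matrix looks like.

```python
import numpy as np

def spectral_rank(wins):
    """Standard spectral ranking sketch: build a random-walk matrix from
    pairwise comparison outcomes and rank items by its stationary
    distribution. `wins[i, j]` counts wins of item j over item i."""
    n = wins.shape[0]
    totals = wins + wins.T
    P = np.divide(wins, totals, out=np.zeros_like(wins, dtype=float), where=totals > 0)
    P /= n                                     # lazy random walk off-diagonal
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # rows sum to one
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = np.abs(pi) / np.abs(pi).sum()         # stationary distribution = scores
    return np.argsort(-pi)                     # items ranked best to worst
```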

Conformal Autoregressive Generation: Beam Search with Coverage Guarantees

  • paper_url: http://arxiv.org/abs/2309.03797
  • repo_url: None
  • paper_authors: Nicolas Deutschmann, Marvin Alberts, María Rodríguez Martínez
  • for: Both extensions augment beam search with conformal predictions (CP) in order to produce sets of sequences with theoretical coverage guarantees, targeting applications such as machine translation and chemistry.
  • methods: The first method is very simple and proposes dynamically sized subsets of beam search results, but, unlike typical CP procedures, its achievable guarantee has an upper bound that depends on a post-hoc calibration measure; the second algorithm introduces the conformal set prediction procedure into the decoding process itself, producing a variable beam width that adapts to the current uncertainty and can achieve coverage guarantees selected a priori.
  • results: Marginal coverage bounds are provided for each method, and both are evaluated empirically on a selection of tasks drawn from natural language processing and chemistry.
    Abstract We introduce two new extensions to the beam search algorithm based on conformal predictions (CP) to produce sets of sequences with theoretical coverage guarantees. The first method is very simple and proposes dynamically-sized subsets of beam search results but, unlike typical CP procedures, has an upper bound on the achievable guarantee depending on a post-hoc calibration measure. Our second algorithm introduces the conformal set prediction procedure as part of the decoding process, producing a variable beam width which adapts to the current uncertainty. While more complex, this procedure can achieve coverage guarantees selected a priori. We provide marginal coverage bounds for each method, and evaluate them empirically on a selection of tasks drawing from natural language processing and chemistry.
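To convey the flavor of the first method, here is a split-conformal sketch: a coverage threshold is calibrated on held-out data, then the highest-scoring beam candidates are kept until that threshold is met. The scoring rule and calibration details are assumptions; the paper's own procedures differ in how the guarantee is established.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split-conformal quantile: with n calibration scores, the
    ceil((n+1)(1-alpha))/n empirical quantile gives marginal coverage."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0))

def conformal_beam_set(beam_scores, tau):
    """Keep the highest-scoring beam candidates until their cumulative
    (softmax-normalized) mass reaches the calibrated threshold tau."""
    p = np.exp(beam_scores - np.max(beam_scores))
    p /= p.sum()
    order = np.argsort(-p)
    keep, mass = [], 0.0
    for i in order:
        keep.append(int(i))
        mass += p[i]
        if mass >= tau:
            break
    return keep
```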

Adversarially Robust Deep Learning with Optimal-Transport-Regularized Divergences

  • paper_url: http://arxiv.org/abs/2309.03791
  • repo_url: None
  • paper_authors: Jeremiah Birrell, Mohammadreza Ebrahimi
  • for: The paper aims to enhance the adversarial robustness of deep learning models against various attacks, such as FGSM and PGD.
  • methods: The paper proposes a novel approach called $ARMOR_D$, which uses optimal-transport-regularized divergences to enhance adversarial robustness by maximizing the expected loss over a neighborhood of distributions, a technique known as distributionally robust optimization.
  • results: The paper demonstrates the effectiveness of $ARMOR_D$ on malware detection and image recognition applications, achieving higher robustness against adversarial attacks than prior methods: a robustified accuracy of 98.29% against FGSM and 98.18% against PGD on the MNIST dataset, and a 37.0% improvement in robustified accuracy for malware detection over previous best-performing methods.
    Abstract We introduce the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These methods are based on a new class of optimal-transport-regularized divergences, constructed via an infimal convolution between an information divergence and an optimal-transport (OT) cost. We use these as tools to enhance adversarial robustness by maximizing the expected loss over a neighborhood of distributions, a technique known as distributionally robust optimization. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence. We demonstrate the effectiveness of our method on malware detection and image recognition applications and find that, to our knowledge, it outperforms existing methods at enhancing the robustness against adversarial attacks. $ARMOR_D$ yields the robustified accuracy of $98.29\%$ against $FGSM$ and $98.18\%$ against $PGD^{40}$ on the MNIST dataset, reducing the error rate by more than $19.7\%$ and $37.2\%$ respectively compared to prior methods. Similarly, in malware detection, a discrete (binary) data domain, $ARMOR_D$ improves the robustified accuracy under $rFGSM^{50}$ attack compared to the previous best-performing adversarial training methods by $37.0\%$ while lowering false negative and false positive rates by $51.1\%$ and $57.53\%$, respectively.

Neural lasso: a unifying approach of lasso and neural networks

  • paper_url: http://arxiv.org/abs/2309.03770
  • repo_url: None
  • paper_authors: David Delgado, Ernesto Curbelo, Danae Carreras
  • for: The aim is to combine statistical and machine learning techniques so as to obtain the benefits of both approaches.
  • methods: The statistical variable-selection technique lasso is represented through a neural network; although both share the same objective function, they differ in their optimization, since the neural version is usually optimized in one step using a single validation set while the statistical counterpart uses a two-step optimization based on cross-validation, whose more elaborate optimization yields more accurate parameter estimates, especially when the training set is small. A modification of the standard approach for training neural networks that mimics the statistical framework is therefore proposed.
  • results: During the development of this modification, a new optimization algorithm for identifying the significant variables emerged; experiments on synthetic and real data sets show that it outperforms the three previous optimization approaches.
    Abstract In recent years, there is a growing interest in combining techniques attributed to the areas of Statistics and Machine Learning in order to obtain the benefits of both approaches. In this article, the statistical technique lasso for variable selection is represented through a neural network. It is observed that, although both the statistical approach and its neural version have the same objective function, they differ due to their optimization. In particular, the neural version is usually optimized in one-step using a single validation set, while the statistical counterpart uses a two-step optimization based on cross-validation. The more elaborated optimization of the statistical method results in more accurate parameter estimation, especially when the training set is small. For this reason, a modification of the standard approach for training neural networks, that mimics the statistical framework, is proposed. During the development of the above modification, a new optimization algorithm for identifying the significant variables emerged. Experimental results, using synthetic and real data sets, show that this new optimization algorithm achieves better performance than any of the three previous optimization approaches.
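For reference, the lasso objective shared by the statistical and neural formulations can be minimized by a single linear layer trained with proximal gradient descent (ISTA), sketched below. The paper's cross-validation-style two-step training is not reproduced here.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam=0.1, lr=None, steps=500):
    """Lasso as a single linear layer trained with proximal gradient
    descent (ISTA): a gradient step on the squared loss followed by
    soft-thresholding, which enforces the L1 penalty / variable selection."""
    n, d = X.shape
    if lr is None:
        lr = 1.0 / np.linalg.norm(X, 2) ** 2   # conservative step from the Lipschitz constant
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - lr * grad, lr * lam)
    return w                                    # zeros mark dropped variables
```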

Convergence Analysis of Decentralized ASGD

  • paper_url: http://arxiv.org/abs/2309.03754
  • repo_url: None
  • paper_authors: Mauro DL Tosi, Martin Theobald
  • for: The paper presents a novel convergence-rate analysis for Decentralized and Asynchronous SGD (DASGD), aimed at reducing training time.
  • methods: The analysis covers asynchronous SGD without requiring a centralized parameter server, partial synchronization among nodes, or restrictive network topologies.
  • results: A convergence rate of $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(QS_{avg}\epsilon^{-3/2}) + \mathcal{O}(S_{avg}\epsilon^{-1})$ is established for DASGD, where $S_{avg}$ is the average staleness between models, $Q$ is a constant bounding the norm of the gradients, and $\epsilon$ is a (small) error allowed within the bound; a corresponding rate is also proved when the gradients are not bounded.
    Abstract Over the last decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, the optimization of large models via SGD still is a time-consuming task. To reduce training time, it is common to distribute the training process across multiple devices. Recently, it has been shown that the convergence of asynchronous SGD (ASGD) will always be faster than mini-batch SGD. However, despite these improvements in the theoretical bounds, most ASGD convergence-rate proofs still rely on a centralized parameter server, which is prone to become a bottleneck when scaling out the gradient computations across many distributed processes. In this paper, we present a novel convergence-rate analysis for decentralized and asynchronous SGD (DASGD) which does not require partial synchronization among nodes nor restrictive network topologies. Specifically, we provide a bound of $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(QS_{avg}\epsilon^{-3/2}) + \mathcal{O}(S_{avg}\epsilon^{-1})$ for the convergence rate of DASGD, where $S_{avg}$ is the average staleness between models, $Q$ is a constant that bounds the norm of the gradients, and $\epsilon$ is a (small) error that is allowed within the bound. Furthermore, when gradients are not bounded, we prove the convergence rate of DASGD to be $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(\sqrt{\hat{S}_{avg}\hat{S}_{max}\epsilon^{-1})$, with $\hat{S}_{max}$ and $\hat{S}_{avg}$ representing a loose version of the average and maximum staleness, respectively. Our convergence proof holds for a fixed stepsize and any non-convex, homogeneous, and L-smooth objective function. We anticipate that our results will be of high relevance for the adoption of DASGD by a broad community of researchers and developers.

Medoid Silhouette clustering with automatic cluster number selection

  • paper_url: http://arxiv.org/abs/2309.03751
  • repo_url: None
  • paper_authors: Lars Lenssen, Erich Schubert
  • for: The paper is written to discuss and improve the efficiency of the Silhouette method for clustering evaluation, specifically the medoid-based variant.
  • methods: The paper combines the medoid-based Silhouette with the PAM algorithm and its latest improvement FasterPAM, proposes two fast versions of the algorithm for direct optimization, and provides a theoretical analysis of its properties.
  • results: The authors report a speedup of $O(k^2)$ compared to the original PAMMEDSIL algorithm, observing a 10464x speedup on real data with 30000 samples and 100 clusters; they also provide a variant to choose the optimal number of clusters directly.
    Abstract The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss the use to choose the optimal number of clusters. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm. Additionally, we provide a variant to choose the optimal number of clusters directly.
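As a sketch of the quantity being optimized: the medoid Silhouette replaces the full Silhouette's average distances with distances to the nearest and second-nearest medoids, which is what makes fast evaluation possible. The per-point score 1 - d1/d2 below follows that simplification; treat it as an illustration, not the paper's exact implementation.

```python
import numpy as np

def medoid_silhouette(D, medoids):
    """Average medoid Silhouette: for each point, with d1 the distance to
    its nearest medoid and d2 to the second nearest, the per-point score
    is 1 - d1/d2. D is a precomputed distance matrix; `medoids` must
    contain at least two medoid indices."""
    dm = D[:, medoids]                    # distances of every point to each medoid
    part = np.partition(dm, 1, axis=1)
    d1, d2 = part[:, 0], part[:, 1]
    s = np.where(d2 > 0, 1.0 - d1 / np.where(d2 > 0, d2, 1.0), 0.0)
    return float(s.mean())
```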

Learning continuous-valued treatment effects through representation balancing

  • paper_url: http://arxiv.org/abs/2309.03731
  • repo_url: https://github.com/christopher-br/cbrnet
  • paper_authors: Christopher Bockel-Rickermann, Toon Vanderschueren, Jeroen Berrevoets, Tim Verdonck, Wouter Verbeke
  • for: Estimating the effect of a treatment with an associated dose on an outcome, the "dose response", is relevant in a variety of domains, from healthcare to business and economics.
  • methods: The study proposes CBRNet, a causal machine learning approach for estimating individual dose responses from observational data; it adopts the Neyman-Rubin potential outcome framework and extends balanced representation learning to continuous-valued treatments in order to overcome dose selection bias.
  • results: Experiments on a newly proposed benchmark show that CBRNet accurately learns treatment effects under selection bias and is competitive with other state-of-the-art methods.
    Abstract Estimating the effects of treatments with an associated dose on an instance's outcome, the "dose response", is relevant in a variety of domains, from healthcare to business, economics, and beyond. Such effects, also known as continuous-valued treatment effects, are typically estimated from observational data, which may be subject to dose selection bias. This means that the allocation of doses depends on pre-treatment covariates. Previous studies have shown that conventional machine learning approaches fail to learn accurate individual estimates of dose responses under the presence of dose selection bias. In this work, we propose CBRNet, a causal machine learning approach to estimate an individual dose response from observational data. CBRNet adopts the Neyman-Rubin potential outcome framework and extends the concept of balanced representation learning for overcoming selection bias to continuous-valued treatments. Our work is the first to apply representation balancing in a continuous-valued treatment setting. We evaluate our method on a newly proposed benchmark. Our experiments demonstrate CBRNet's ability to accurately learn treatment effects under selection bias and competitive performance with respect to other state-of-the-art methods.

A Causal Perspective on Loan Pricing: Investigating the Impacts of Selection Bias on Identifying Bid-Response Functions

  • paper_url: http://arxiv.org/abs/2309.03730
  • repo_url: None
  • paper_authors: Christopher Bockel-Rickermann, Sam Verboven, Tim Verdonck, Wouter Verbeke
  • for: This paper studies personalized pricing policies in lending and investigates the impact of selection bias (more precisely, bid selection bias) on identifying bid-response functions.
  • methods: Pricing is posed as a problem of causal inference, treating a customer's reaction to a price as a treatment effect; conventional methods such as logistic regression and neural networks are compared against state-of-the-art methods from causal machine learning on a semi-synthetic dataset of mortgage loan applications in Belgium with varying simulated levels of selection bias.
  • results: Conventional methods suffer adversely from selection bias, whereas the state-of-the-art causal machine learning methods are shown to overcome selection bias in pricing data.
    Abstract In lending, where prices are specific to both customers and products, having a well-functioning personalized pricing policy in place is essential to effective business making. Typically, such a policy must be derived from observational data, which introduces several challenges. While the problem of ``endogeneity'' is prominently studied in the established pricing literature, the problem of selection bias (or, more precisely, bid selection bias) is not. We take a step towards understanding the effects of selection bias by posing pricing as a problem of causal inference. Specifically, we consider the reaction of a customer to price a treatment effect. In our experiments, we simulate varying levels of selection bias on a semi-synthetic dataset on mortgage loan applications in Belgium. We investigate the potential of parametric and nonparametric methods for the identification of individual bid-response functions. Our results illustrate how conventional methods such as logistic regression and neural networks suffer adversely from selection bias. In contrast, we implement state-of-the-art methods from causal machine learning and show their capability to overcome selection bias in pricing data.

DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection

  • paper_url: http://arxiv.org/abs/2309.07147
  • repo_url: None
  • paper_authors: Cunhang Fan, Hongyu Zhang, Wei Huang, Jun Xue, Jianhua Tao, Jiangyan Yi, Zhao Lv, Xiaopei Wu
  • for: The goal is to detect the target speaker from EEG signals in a multi-speaker environment, improving the accuracy of auditory spatial attention detection.
  • methods: A dynamical graph self-distillation (DGSD) approach is proposed that does not require speech stimuli as input: dynamical graph convolutional networks represent the graph structure of EEG signals and extract crucial features related to auditory spatial attention, while self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer, uses features and classification results from the deepest layers to guide the learning of the shallow layers.
  • results: On the public KUL and DTU datasets with a 1-second time window, detection accuracies of 90.0% and 79.6% are achieved, respectively; the method not only outperforms the best reproducible baseline but also reduces the number of trainable parameters by approximately 100 times.
    Abstract Auditory Attention Detection (AAD) aims to detect target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural network designed for processing Euclidean data like images. This makes it challenging to handle EEG signals, which possess non-Euclidean characteristics. In order to address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input. Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals. In addition, to further improve AAD detection performance, self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer, is integrated. These strategies leverage features and classification results from the deepest network layers to guide the learning of shallow layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. Under a 1-second time window, we achieve results of 90.0\% and 79.6\% accuracy on KUL and DTU, respectively. We compare our DGSD method with competitive baselines, and the experimental results indicate that the detection performance of our proposed DGSD method is not only superior to the best reproducible baseline but also significantly reduces the number of trainable parameters by approximately 100 times.

A State Representation for Diminishing Rewards

  • paper_url: http://arxiv.org/abs/2309.03710
  • repo_url: None
  • paper_authors: Ted Moskovitz, Samo Hromadka, Ahmed Touati, Diana Borsa, Maneesh Sahani
  • for: This work studies multitask reinforcement learning settings in which an agent must rapidly adapt to various stationary reward functions randomly sampled from a fixed distribution.
  • methods: Building on the successor representation (SR) framework, which supports rapid policy evaluation, and observing that sequential tasks in the natural world are rarely independent but instead reflect shifting priorities driven by the availability and subjective perception of rewarding stimuli, the paper studies diminishing marginal utility and introduces a novel state representation, the $\lambda$ representation ($\lambda$R).
  • results: The $\lambda$R is shown to be required for policy evaluation in this setting and generalizes the SR as well as several other state representations from the literature; its formal properties are established, and its normative advantages for machine learning, together with its usefulness for studying natural behaviors such as foraging, are examined.
    Abstract A common setting in multitask reinforcement learning (RL) demands that an agent rapidly adapt to various stationary reward functions randomly sampled from a fixed distribution. In such situations, the successor representation (SR) is a popular framework which supports rapid policy evaluation by decoupling a policy's expected discounted, cumulative state occupancies from a specific reward function. However, in the natural world, sequential tasks are rarely independent, and instead reflect shifting priorities based on the availability and subjective perception of rewarding stimuli. Reflecting this disjunction, in this paper we study the phenomenon of diminishing marginal utility and introduce a novel state representation, the $\lambda$ representation ($\lambda$R) which, surprisingly, is required for policy evaluation in this setting and which generalizes the SR as well as several other state representations from the literature. We establish the $\lambda$R's formal properties and examine its normative advantages in the context of machine learning, as well as its usefulness for studying natural behaviors, particularly foraging.

Chat Failures and Troubles: Reasons and Solutions

  • paper_url: http://arxiv.org/abs/2309.03708
  • repo_url: None
  • paper_authors: Manal Helal, Patrick Holthaus, Gabriella Lakatos, Farshid Amirabdollahian
  • for: This paper examines common problems in Human-Robot Interaction (HRI) that cause failures and troubles in chat.
  • methods: For a given use case, design decisions start with choosing a suitable robot and a suitable chatting model, identifying the common problems that cause failures, identifying potential solutions, and planning continuous improvement.
  • results: The paper recommends a closed-loop control algorithm that guides the use of pre-trained AI models and provides vocabulary filtering, re-training of batched models on new datasets, online learning from data streams, and/or reinforcement learning models that self-update the trained models to reduce errors.
    Abstract This paper examines some common problems in Human-Robot Interaction (HRI) causing failures and troubles in Chat. A given use case's design decisions start with the suitable robot, the suitable chatting model, identifying common problems that cause failures, identifying potential solutions, and planning continuous improvement. In conclusion, it is recommended to use a closed-loop control algorithm that guides the use of trained Artificial Intelligence (AI) pre-trained models and provides vocabulary filtering, re-train batched models on new datasets, learn online from data streams, and/or use reinforcement learning models to self-update the trained models and reduce errors.

A Probabilistic Semi-Supervised Approach with Triplet Markov Chains

  • paper_url: http://arxiv.org/abs/2309.03707
  • repo_url: None
  • paper_authors: Katherine Morales, Yohan Petetin
  • for: This paper presents a semi-supervised training method for parameterized triplet Markov chain models based on variational Bayesian inference.
  • methods: A variational Bayesian framework is used to train triplet Markov chain models in a semi-supervised context, where not all labels associated with the observations are available.
  • results: The generality of the approach allows semi-supervised algorithms to be derived for a variety of generative models for sequential Bayesian classification.
    Abstract Triplet Markov chains are general generative models for sequential data which take into account three kinds of random variables: (noisy) observations, their associated discrete labels and latent variables which aim at strengthening the distribution of the observations and their associated labels. However, in practice, we do not have at our disposal all the labels associated to the observations to estimate the parameters of such models. In this paper, we propose a general framework based on a variational Bayesian inference to train parameterized triplet Markov chain models in a semi-supervised context. The generality of our approach enables us to derive semi-supervised algorithms for a variety of generative models for sequential Bayesian classification.

Short-Term Load Forecasting Using A Particle-Swarm Optimized Multi-Head Attention-Augmented CNN-LSTM Network

  • paper_url: http://arxiv.org/abs/2309.03694
  • repo_url: None
  • paper_authors: Paapa Kwesi Quansah, Edwin Kwesi Ansah Tenkorang
  • for: Short-term load forecasting is of paramount importance in the efficient operation and planning of power systems, given its inherently non-linear and dynamic nature.
  • methods: The approach harnesses the Particle-Swarm Optimization algorithm to autonomously explore and optimize hyperparameters, a Multi-Head Attention mechanism to discern the salient features crucial for accurate forecasting, and a streamlined framework for computational efficiency.
  • results: Evaluated on a genuine electricity demand dataset, the method outperforms existing state-of-the-art approaches in accuracy, robustness, and computational efficiency; notably, its Mean Absolute Percentage Error of 1.9376 marks a significant advancement over existing methods.
    Abstract Short-term load forecasting is of paramount importance in the efficient operation and planning of power systems, given its inherent non-linear and dynamic nature. Recent strides in deep learning have shown promise in addressing this challenge. However, these methods often grapple with hyperparameter sensitivity, opaqueness in interpretability, and high computational overhead for real-time deployment. In this paper, I propose a novel solution that surmounts these obstacles. Our approach harnesses the power of the Particle-Swarm Optimization algorithm to autonomously explore and optimize hyperparameters, a Multi-Head Attention mechanism to discern the salient features crucial for accurate forecasting, and a streamlined framework for computational efficiency. Our method undergoes rigorous evaluation using a genuine electricity demand dataset. The results underscore its superiority in terms of accuracy, robustness, and computational efficiency. Notably, our Mean Absolute Percentage Error of 1.9376 marks a significant advancement over existing state-of-the-art approaches, heralding a new era in short-term load forecasting.
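A minimal particle-swarm optimization loop of the kind used for the hyperparameter search; in the paper the objective would wrap training and validating the attention-augmented CNN-LSTM, whereas the quadratic below is only a placeholder, and the hyperparameter ranges are assumptions.

```python
import numpy as np

def pso_minimize(objective, bounds, n_particles=20, iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO: particles move under inertia plus attraction toward
    their personal best and the global best; positions are clipped to bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)]
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)]
    return gbest

# Placeholder objective over (learning_rate, hidden_units) -- assumed ranges
best = pso_minimize(lambda p: (p[0] - 1e-3) ** 2 + (p[1] - 64) ** 2,
                    np.array([[1e-4, 1e-2], [16.0, 256.0]]))
```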

A computationally lightweight safe learning algorithm

  • paper_url: http://arxiv.org/abs/2309.03672
  • repo_url: None
  • paper_authors: Dominik Baumann, Krzysztof Kowalczyk, Koen Tiels, Paweł Wachel
  • for: This work provides a safe learning algorithm that offers probabilistic safety guarantees during training without requiring knowledge of the underlying system dynamics.
  • methods: The algorithm uses the Nadaraya-Watson estimator instead of Gaussian process inference, achieving logarithmic rather than cubic scaling with the number of data points.
  • results: Theoretical guarantees are provided for the estimates, which are embedded into a safe learning algorithm and demonstrated in numerical experiments on a simulated seven-degrees-of-freedom robot manipulator.
    Abstract Safety is an essential asset when learning control policies for physical systems, as violating safety constraints during training can lead to expensive hardware damage. In response to this need, the field of safe learning has emerged with algorithms that can provide probabilistic safety guarantees without knowledge of the underlying system dynamics. Those algorithms often rely on Gaussian process inference. Unfortunately, Gaussian process inference scales cubically with the number of data points, limiting applicability to high-dimensional and embedded systems. In this paper, we propose a safe learning algorithm that provides probabilistic safety guarantees but leverages the Nadaraya-Watson estimator instead of Gaussian processes. For the Nadaraya-Watson estimator, we can reach logarithmic scaling with the number of data points. We provide theoretical guarantees for the estimates, embed them into a safe learning algorithm, and show numerical experiments on a simulated seven-degrees-of-freedom robot manipulator.
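The computational advantage comes from the Nadaraya-Watson estimator, which is just a kernel-weighted average of observed outputs and needs no matrix inversion, unlike Gaussian process inference. A minimal version with a Gaussian kernel (the bandwidth value is an assumption):

```python
import numpy as np

def nadaraya_watson(x_query, X, y, bandwidth=0.2):
    """Nadaraya-Watson regression: a kernel-weighted average of observed
    outputs y at training inputs X, evaluated at x_query."""
    d2 = np.sum((X - x_query) ** 2, axis=1)
    weights = np.exp(-0.5 * d2 / bandwidth**2)
    return (weights @ y) / np.maximum(weights.sum(), 1e-12)
```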

Alzheimer Disease Detection from Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning

  • paper_url: http://arxiv.org/abs/2309.03664
  • repo_url: None
  • paper_authors: Francesco Conti, Martina Banchelli, Valentina Bessi, Cristina Cecchi, Fabrizio Chiti, Sara Colantonio, Cristiano D’Andrea, Marella de Angelis, Davide Moroni, Benedetta Nacmias, Maria Antonietta Pascali, Sandro Sorbi, Paolo Matteini
  • for: The study aims to support (confirm or disprove) a clinical diagnosis of Alzheimer's disease (AD).
  • methods: Cerebrospinal fluid samples from 19 subjects with a clinical AD diagnosis and 5 pathological controls were analysed by Raman spectroscopy (RS); standard machine learning (ML) methods were first applied to the raw and preprocessed spectra, and ML was then applied to a set of topological descriptors extracted from the raw spectra.
  • results: Standard ML on the spectra gave unsatisfactory results, whereas ML on the topological descriptors achieved a very good classification accuracy (>87%), indicating that RS and topological analysis together may provide an effective combination for AD diagnosis.
    Abstract The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer's disease (AD) as well as of 5 pathological controls have been collected and analysed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods obtaining unsatisfactory results. Then, we applied ML to a set of topological descriptors extracted from raw spectra, achieving a very good classification accuracy (>87%). Although our results are preliminary, they indicate that RS and topological analysis together may provide an effective combination to confirm or disprove a clinical diagnosis of AD. The next steps will include enlarging the dataset of CSF samples to validate the proposed method better and, possibly, to understand if topological data analysis could support the characterization of AD subtypes.

Insights Into the Inner Workings of Transformer Models for Protein Function Prediction

  • paper_url: http://arxiv.org/abs/2309.03631
  • repo_url: https://github.com/markuswenzel/xai-proteins
  • paper_authors: Markus Wenzel, Erik Grüner, Nils Strodthoff
  • for: The goal is to shed light on the inner workings of transformer models for protein function prediction by extending explainable AI (XAI) methods so that the latent representations inside the models can be inspected as well.
  • methods: The widely used XAI method of integrated gradients is extended to inspect latent representations inside transformer models fine-tuned for Gene Ontology term and Enzyme Commission number prediction.
  • results: The approach identifies amino acids in the sequences that the transformers pay particular attention to and shows that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside the model, where transformer heads exhibit a statistically significant correspondence between attribution maps and ground-truth sequence annotations (e.g., transmembrane regions, active sites) across many proteins.
    Abstract Motivation: We explored how explainable AI (XAI) can help to shed light into the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients such that latent representations inside of transformer models, which were finetuned to Gene Ontology term and Enzyme Commission number prediction, can be inspected too. Results: The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside of the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g., transmembrane regions, active sites) across many proteins. Availability and Implementation: Source code can be accessed at https://github.com/markuswenzel/xai-proteins .

Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction

  • paper_url: http://arxiv.org/abs/2309.03619
  • repo_url: None
  • paper_authors: Yusuf Brima, Ulf Krumnack, Simone Pika, Gunther Heidemann
  • for: investigates how different formulations of the Barlow Twins (BT) objective impact downstream task performance for speech data.
  • methods: proposes Modified Barlow Twins (MBT) with normalized latents to enforce scale-invariance and evaluates on speaker identification, gender recognition and keyword spotting tasks.
  • results: improves representation generalization over the original BT, especially when fine-tuning with limited target data, highlighting the importance of designing objectives that encourage invariant and transferable representations.
    Abstract The choice of the objective function is crucial in emerging high-quality representations from self-supervised learning. This paper investigates how different formulations of the Barlow Twins (BT) objective impact downstream task performance for speech data. We propose Modified Barlow Twins (MBT) with normalized latents to enforce scale-invariance and evaluate on speaker identification, gender recognition and keyword spotting tasks. Our results show MBT improves representation generalization over original BT, especially when fine-tuning with limited target data. This highlights the importance of designing objectives that encourage invariant and transferable representations. Our analysis provides insights into how the BT learning objective can be tailored to produce speech representations that excel when adapted to new downstream tasks. This study is an important step towards developing reusable self-supervised speech representations.
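A minimal sketch of the Barlow Twins objective with an added latent normalization, which is our reading of what Modified Barlow Twins (MBT) introduces for scale-invariance; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def barlow_twins_loss(z1, z2, lam=5e-3, normalize_latents=True):
    """Barlow Twins redundancy-reduction loss. With normalize_latents=True,
    each embedding is L2-normalised first; this is our reading of the MBT
    scale-invariance idea, not the authors' released code."""
    if normalize_latents:
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    # standardise each dimension across the batch, as in the original BT
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    n, d = z1.shape
    c = (z1.T @ z2) / n                        # d x d cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()
    return on_diag + lam * off_diag

# toy usage: embeddings of two augmented "views" of the same speech batch
z_a, z_b = torch.randn(32, 128), torch.randn(32, 128)
loss = barlow_twins_loss(z_a, z_b)
```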

Filtration Surfaces for Dynamic Graph Classification

  • paper_url: http://arxiv.org/abs/2309.03616
  • repo_url: https://github.com/aidos-lab/filtration_surfaces
  • paper_authors: Franz Srambical, Bastian Rieck
  • for: This paper proposes filtration surfaces, a new method that addresses the shortcomings of existing approaches to dynamic graph classification.
  • methods: The filtration-surface construction is scalable and flexible, lifting the restrictions of current baselines (limited scalability, fixed node sets, and ignored edge weights).
  • results: Experiments show that filtration surfaces outperform state-of-the-art baselines on datasets that rely on edge weight information, while requiring at most one parameter (or none) and yielding the lowest overall standard deviation.
    Abstract Existing approaches for classifying dynamic graphs either lift graph kernels to the temporal domain, or use graph neural networks (GNNs). However, current baselines have scalability issues, cannot handle a changing node set, or do not take edge weight information into account. We propose filtration surfaces, a novel method that is scalable and flexible, to alleviate said restrictions. We experimentally validate the efficacy of our model and show that filtration surfaces outperform previous state-of-the-art baselines on datasets that rely on edge weight information. Our method does so while being either completely parameter-free or having at most one parameter, and yielding the lowest overall standard deviation.

Your Battery Is a Blast! Safeguarding Against Counterfeit Batteries with Authentication

  • paper_url: http://arxiv.org/abs/2309.03607
  • repo_url: https://github.com/mhackiori/eisthentication
  • paper_authors: Francesco Marchiori, Mauro Conti
  • for: To advance Li-ion battery authentication so that devices use only legitimate batteries, guaranteeing the operational state of the system and the safety of its users.
  • methods: Two novel authentication methods, DCAuth and EISthentication, use machine learning models on each cell's internal characteristics to authenticate batteries automatically from regular-usage data; they require no external devices and resist the most common and critical large-scale counterfeiting practices.
  • results: Analysis of 20 datasets yields highly accurate authentication of battery architectures (up to 0.99) and models (up to 0.96), with comparable identification performance.
    Abstract Lithium-ion (Li-ion) batteries are the primary power source in various applications due to their high energy and power density. Their market was estimated to be up to 48 billion U.S. dollars in 2022. However, the widespread adoption of Li-ion batteries has resulted in counterfeit cell production, which can pose safety hazards to users. Counterfeit cells can cause explosions or fires, and their prevalence in the market makes it difficult for users to detect fake cells. Indeed, current battery authentication methods can be susceptible to advanced counterfeiting techniques and are often not adaptable to various cells and systems. In this paper, we improve the state of the art on battery authentication by proposing two novel methodologies, DCAuth and EISthentication, which leverage the internal characteristics of each cell through Machine Learning models. Our methods automatically authenticate lithium-ion battery models and architectures using data from their regular usage without the need for any external device. They are also resilient to the most common and critical counterfeit practices and can scale to several batteries and devices. To evaluate the effectiveness of our proposed methodologies, we analyze time-series data from a total of 20 datasets that we have processed to extract meaningful features for our analysis. Our methods achieve high accuracy in battery authentication for both architectures (up to 0.99) and models (up to 0.96). Moreover, our methods offer comparable identification performances. By using our proposed methodologies, manufacturers can ensure that devices only use legitimate batteries, guaranteeing the operational state of any system and safety measures for the users.

Beyond attention: deriving biologically interpretable insights from weakly-supervised multiple-instance learning models

  • paper_url: http://arxiv.org/abs/2309.03925
  • repo_url: None
  • paper_authors: Willem Bonnaffé, CRUK ICGC Prostate Group, Freddie Hamdy, Yang Hu, Ian Mills, Jens Rittscher, Clare Verrill, Dan J. Woodcock
  • for: To provide better interpretability for weakly-supervised multiple-instance learning (MIL) models applied to pathology images.
  • methods: A post-training analysis combines tile-level attention with prediction scores from a refined encoder to compute the predictive contribution of high-attention regions (prediction-attention-weighted, PAW, maps); a biological feature instantiation technique integrates these maps with nuclei segmentation masks, providing biologically meaningful features related to the cellular organisation of the tissue that can be compared with known clinical features.
  • results: Comparing prostate cancer diagnosis (samples containing malignant tissue, 381/516 tissue samples) and prognosis (patients with biochemical recurrence following surgery, 98/663 tissue samples), regions predictive of adverse prognosis do not tend to co-locate with the tumour regions, indicating that non-cancer cells should also be studied when evaluating prognosis.
    Abstract Recent advances in attention-based multiple instance learning (MIL) have improved our insights into the tissue regions that models rely on to make predictions in digital pathology. However, the interpretability of these approaches is still limited. In particular, they do not report whether high-attention regions are positively or negatively associated with the class labels or how well these regions correspond to previously established clinical and biological knowledge. We address this by introducing a post-training methodology to analyse MIL models. Firstly, we introduce prediction-attention-weighted (PAW) maps by combining tile-level attention and prediction scores produced by a refined encoder, allowing us to quantify the predictive contribution of high-attention regions. Secondly, we introduce a biological feature instantiation technique by integrating PAW maps with nuclei segmentation masks. This further improves interpretability by providing biologically meaningful features related to the cellular organisation of the tissue and facilitates comparisons with known clinical features. We illustrate the utility of our approach by comparing PAW maps obtained for prostate cancer diagnosis (i.e. samples containing malignant tissue, 381/516 tissue samples) and prognosis (i.e. samples from patients with biochemical recurrence following surgery, 98/663 tissue samples) in a cohort of patients from the international cancer genome consortium (ICGC UK Prostate Group). Our approach reveals that regions that are predictive of adverse prognosis do not tend to co-locate with the tumour regions, indicating that non-cancer cells should also be studied when evaluating prognosis.
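A minimal sketch of how prediction-attention-weighted (PAW) scores can be formed from tile-level attention and prediction scores, as the abstract describes; the combination rule and all inputs below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def paw_map(attention, tile_scores, class_idx):
    """Per-tile attention re-weighted by that tile's prediction score for a
    given class, so high-attention tiles are scaled by their predictive
    contribution. One plausible reading of the PAW construction."""
    a = attention / attention.sum()          # normalise attention over tiles
    return a * tile_scores[:, class_idx]     # one contribution value per tile

# hypothetical inputs: 1000 tiles, attention from the MIL pooling layer,
# per-tile class scores from the refined encoder (2 classes)
rng = np.random.default_rng(0)
attn = rng.random(1000)
scores = rng.random((1000, 2))
paw = paw_map(attn, scores, class_idx=1)
top_tiles = np.argsort(paw)[-20:]            # tiles driving the prediction
```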

Trinary Decision Trees for missing value handling

  • paper_url: http://arxiv.org/abs/2309.03561
  • repo_url: None
  • paper_authors: Henning Zakrisson
  • for: To improve the handling of missing data in decision tree regressors and classifiers.
  • methods: A novel trinary decision tree algorithm is proposed that does not assume missing values carry information about the response variable.
  • results: Theoretical calculations of estimator bias and numerical experiments on real data compare the method with established algorithms under Missing Completely at Random (MCAR) and Informative Missingness (IM) scenarios. The trinary tree outperforms its peers in MCAR settings, especially when data is only missing out-of-sample, while lagging behind under IM. A hybrid TrinaryMIA tree, combining the trinary tree with the Missing In Attributes (MIA) approach, shows robust performance under all types of missingness.
    Abstract This paper introduces the Trinary decision tree, an algorithm designed to improve the handling of missing data in decision tree regressors and classifiers. Unlike other approaches, the Trinary decision tree does not assume that missing values contain any information about the response. Both theoretical calculations on estimator bias and numerical illustrations using real data sets are presented to compare its performance with established algorithms in different missing data scenarios (Missing Completely at Random (MCAR), and Informative Missingness (IM)). Notably, the Trinary tree outperforms its peers in MCAR settings, especially when data is only missing out-of-sample, while lacking behind in IM settings. A hybrid model, the TrinaryMIA tree, which combines the Trinary tree and the Missing In Attributes (MIA) approach, shows robust performance in all types of missingness. Despite the potential drawback of slower training speed, the Trinary tree offers a promising and more accurate method of handling missing data in decision tree algorithms.
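A toy sketch of the routing idea behind a trinary split: observed values go left or right, while missing values take a third path that treats missingness as uninformative (here, a blend of the two children weighted by training mass). This illustrates the concept only; the paper's estimator is constructed differently in detail.

```python
import numpy as np

class Leaf:
    def __init__(self, value): self.value = value
    def predict_one(self, x): return self.value

class TrinaryNode:
    """Toy trinary split: left/right children for observed feature values plus
    a 'missing' branch that blends the children by their training mass,
    encoding the assumption that missingness carries no information."""
    def __init__(self, feature, threshold, left, right, p_left):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.p_left = p_left                 # fraction of training rows going left

    def predict_one(self, x):
        v = x[self.feature]
        if np.isnan(v):                      # missing: probabilistic blend
            return (self.p_left * self.left.predict_one(x)
                    + (1 - self.p_left) * self.right.predict_one(x))
        child = self.left if v <= self.threshold else self.right
        return child.predict_one(x)

# toy tree: split on feature 0 at threshold 0.5
tree = TrinaryNode(0, 0.5, Leaf(1.0), Leaf(3.0), p_left=0.6)
print(tree.predict_one(np.array([0.2])))     # 1.0 (observed, goes left)
print(tree.predict_one(np.array([np.nan])))  # 0.6*1.0 + 0.4*3.0 = 1.8
```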

On the dynamics of multi agent nonlinear filtering and learning

  • paper_url: http://arxiv.org/abs/2309.03557
  • repo_url: None
  • paper_authors: Sayed Pouria Talebi, Danilo Mandic
  • for: Multiagent systems aim to accomplish highly complex learning tasks through decentralized consensus seeking dynamics, and their use has garnered a great deal of attention in the signal processing and computational intelligence societies.
  • methods: The paper presents a general formulation for the actions of an agent in multiagent networked systems and conditions for achieving a cohesive learning behavior.
  • results: The paper applies the derived framework in distributed and federated learning scenarios.
    Abstract Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus seeking dynamics and their use has garnered a great deal of attention in the signal processing and computational intelligence societies. This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics. To this end, a general formulation for the actions of an agent in multiagent networked systems is presented and conditions for achieving a cohesive learning behaviour is given. Importantly, application of the so derived framework in distributed and federated learning scenarios are presented.

MVD: A Novel Methodology and Dataset for Acoustic Vehicle Type Classification

  • paper_url: http://arxiv.org/abs/2309.03544
  • repo_url: None
  • paper_authors: Mohd Ashhad, Omar Ahmed, Sooraj K. Ambat, Zeeshan Ali Haq, Mansaf Alam
  • for: To provide a reliable acoustic traffic monitoring method as a cost-effective alternative to computer-vision techniques.
  • methods: Cepstrum- and spectrum-based local and global audio features feed a multi-input neural network that classifies acoustic signals into four classes: trucks, cars, motorbikes, and no-vehicle.
  • results: The method improves upon established baselines, achieving 91.98% and 96.66% accuracy on the MVD and MVDA datasets, respectively; the model was also deployed as an Android application to demonstrate its usability.
    Abstract Rising urban populations have led to a surge in vehicle use and made traffic monitoring and management indispensable. Acoustic traffic monitoring (ATM) offers a cost-effective and efficient alternative to more computationally expensive methods of monitoring traffic such as those involving computer vision technologies. In this paper, we present MVD and MVDA: two open datasets for the development of acoustic traffic monitoring and vehicle-type classification algorithms, which contain audio recordings of moving vehicles. The dataset contain four classes- Trucks, Cars, Motorbikes, and a No-vehicle class. Additionally, we propose a novel and efficient way to accurately classify these acoustic signals using cepstrum and spectrum based local and global audio features, and a multi-input neural network. Experimental results show that our methodology improves upon the established baselines of previous works and achieves an accuracy of 91.98% and 96.66% on MVD and MVDA Datasets, respectively. Finally, the proposed model was deployed through an Android application to make it accessible for testing and demonstrate its efficacy.
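A minimal sketch of the described pipeline: cepstral (local) and spectral (global) features feeding a two-branch network with a 4-way output. The feature choices, layer sizes, and synthetic clip below are placeholders, not the paper's configuration.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

def cepstral_and_spectral(y, sr):
    """Local cepstral features (MFCC means) and global spectral features
    (centroid/rolloff means) as stand-ins for the paper's feature sets."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)
    cent = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    roll = librosa.feature.spectral_rolloff(y=y, sr=sr).mean()
    return mfcc.astype(np.float32), np.array([cent, roll], dtype=np.float32)

class MultiInputNet(nn.Module):
    """Two feature branches fused before a 4-way output
    (truck / car / motorbike / no-vehicle)."""
    def __init__(self):
        super().__init__()
        self.cep = nn.Sequential(nn.Linear(20, 32), nn.ReLU())
        self.spec = nn.Sequential(nn.Linear(2, 8), nn.ReLU())
        self.head = nn.Linear(40, 4)

    def forward(self, cep, spec):
        return self.head(torch.cat([self.cep(cep), self.spec(spec)], dim=1))

# toy usage on a synthetic one-second clip
y = np.random.default_rng(0).standard_normal(16000).astype(np.float32)
c, s = cepstral_and_spectral(y, 16000)
logits = MultiInputNet()(torch.from_numpy(c)[None], torch.from_numpy(s)[None])
```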

Subgraph-based Tight Frames on Graphs with Compact Supports and Vanishing Moments

  • paper_url: http://arxiv.org/abs/2309.03537
  • repo_url: None
  • paper_authors: Ruigang Zheng, Xiaosheng Zhuang
  • for: To provide a new and general method for constructing tight frames on graphs with compact supports, generalizing existing partition-tree-based constructions.
  • methods: The construction builds on a series of hierarchical partitions and incorporates subgraph Laplacians into the frame design, so that the (subgraph) vanishing moments of the framelets and extra properties such as directionality can be adjusted for efficiently representing graph signals with path-like supports.
  • results: Experimental results show that the proposed graph frames perform superiorly in non-linear approximation tasks.
    Abstract In this work, we proposed a novel and general method to construct tight frames on graphs with compact supports based on a series of hierarchical partitions. Starting from our abstract construction that generalizes previous methods based on partition trees, we are able to flexibly incorporate subgraph Laplacians into our design of graph frames. Consequently, our general methods permit adjusting the (subgraph) vanishing moments of the framelets and extra properties, such as directionality, for efficiently representing graph signals with path-like supports. Several variants are explicitly defined and tested. Experimental results show our proposed graph frames perform superiorly in non-linear approximation tasks.

Privacy-preserving Continual Federated Clustering via Adaptive Resonance Theory

  • paper_url: http://arxiv.org/abs/2309.03487
  • repo_url: https://github.com/Masuyama-lab/FCAC
  • paper_authors: Naoki Masuyama, Yusuke Nojima, Yuichiro Toda, Chu Kiong Loo, Hisao Ishibuchi, Naoyuki Kubota
  • for: With data privacy protection growing in importance, many privacy-preserving machine learning methods have been proposed; in clustering, algorithms built on the federated learning framework (i.e., federated clustering) have been widely studied and show high clustering performance while preserving data privacy, but most base clusterers require the number of clusters to be specified in advance.
  • methods: The proposed privacy-preserving continual federated clustering algorithm uses an adaptive resonance theory-based clustering algorithm with continual learning capability as its base clusterer.
  • results: Experiments on synthetic and real-world datasets show superior clustering performance over state-of-the-art federated clustering algorithms while realizing data privacy protection and continual learning.
    Abstract With the increasing importance of data privacy protection, various privacy-preserving machine learning methods have been proposed. In the clustering domain, various algorithms with a federated learning framework (i.e., federated clustering) have been actively studied and showed high clustering performance while preserving data privacy. However, most of the base clusterers (i.e., clustering algorithms) used in existing federated clustering algorithms need to specify the number of clusters in advance. These algorithms, therefore, are unable to deal with data whose distributions are unknown or continually changing. To tackle this problem, this paper proposes a privacy-preserving continual federated clustering algorithm. In the proposed algorithm, an adaptive resonance theory-based clustering algorithm capable of continual learning is used as a base clusterer. Therefore, the proposed algorithm inherits the ability of continual learning. Experimental results with synthetic and real-world datasets show that the proposed algorithm has superior clustering performance to state-of-the-art federated clustering algorithms while realizing data privacy protection and continual learning ability. The source code is available at \url{https://github.com/Masuyama-lab/FCAC}.

Cross-domain Sound Recognition for Efficient Underwater Data Analysis

  • paper_url: http://arxiv.org/abs/2309.03451
  • repo_url: None
  • paper_authors: Jeongsoo Park, Dong-Gyun Han, Hyoung Sul La, Sangmin Lee, Yoonchang Han, Eun-Jin Yang
  • for: This study proposes a novel deep learning approach for analyzing massive underwater acoustic data.
  • methods: Feature vectors from a non-underwater (aerial) sound recognition model are used to visualize the underwater data with PCA and UMAP; listening to selected points within the resulting clusters reveals their defining characteristics, accelerating the labor-intensive labeling process.
  • results: A neural network model trained on both the selected underwater data and the non-underwater dataset recognizes airgun sounds effectively, achieving an F1 score above 84.3% alongside strong precision and recall.
    Abstract This paper presents a novel deep learning approach for analyzing massive underwater acoustic data by leveraging a model trained on a broad spectrum of non-underwater (aerial) sounds. Recognizing the challenge in labeling vast amounts of underwater data, we propose a two-fold methodology to accelerate this labor-intensive procedure. The first part of our approach involves PCA and UMAP visualization of the underwater data using the feature vectors of an aerial sound recognition model. This enables us to cluster the data in a two dimensional space and listen to points within these clusters to understand their defining characteristics. This innovative method simplifies the process of selecting candidate labels for further training. In the second part, we train a neural network model using both the selected underwater data and the non-underwater dataset. We conducted a quantitative analysis to measure the precision, recall, and F1 score of our model for recognizing airgun sounds, a common type of underwater sound. The F1 score achieved by our model exceeded 84.3%, demonstrating the effectiveness of our approach in analyzing underwater acoustic data. The methodology presented in this paper holds significant potential to reduce the amount of labor required in underwater data analysis and opens up new possibilities for further research in the field of cross-domain data analysis.
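A minimal sketch of the labeling-acceleration step: embeddings from a (stubbed) aerial sound model are reduced with PCA and UMAP, then clustered so that exemplars can be auditioned. The embedding source, dimensions, and parameters are placeholders.

```python
import numpy as np
import umap
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Stand-in for feature vectors produced by the aerial sound recognition model
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((5000, 512))

z = PCA(n_components=50).fit_transform(embeddings)             # compress first
z2 = umap.UMAP(n_neighbors=30, min_dist=0.1).fit_transform(z)  # 2-D map

labels = KMeans(n_clusters=12, n_init=10).fit_predict(z2)
# Audition a few clips nearest each cluster centre to pick candidate labels,
# then fine-tune on the mixed aerial + selected underwater data as described.
```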

Broadband Ground Motion Synthesis via Generative Adversarial Neural Operators: Development and Validation

  • paper_url: http://arxiv.org/abs/2309.03447
  • repo_url: https://github.com/yzshi5/gm-gano
  • paper_authors: Yaozhong Shi, Grigorios Lavrentiadis, Domniki Asimaki, Zachary E. Ross, Kamyar Azizzadenesheli
  • for: Ground-motion synthesis using a Generative Adversarial Neural Operator (GANO) to generate three-component acceleration time histories.
  • methods: The paper uses Neural Operators, a resolution-invariant architecture that guarantees the model training is independent of the data sampling frequency, within a conditional ground-motion synthesis algorithm (cGM-GANO) that combines recent advancements in machine learning and open-access strong motion data sets.
  • results: The paper shows that cGM-GANO can recover the magnitude, distance, and $V_{S30}$ scaling of Fourier amplitude and pseudo-spectral accelerations. The framework is evaluated through residual analysis with the empirical dataset and comparison with conventional Ground Motion Models (GMMs) for selected ground motion scenarios; cGM-GANO produces consistent median scaling with the GMMs for the corresponding tectonic environments, with the largest misfit observed at short distances.
    Abstract We present a data-driven model for ground-motion synthesis using a Generative Adversarial Neural Operator (GANO) that combines recent advancements in machine learning and open access strong motion data sets to generate three-component acceleration time histories conditioned on moment magnitude ($M$), rupture distance ($R_{rup}$), time-average shear-wave velocity at the top $30m$ ($V_{S30}$), and tectonic environment or style of faulting. We use Neural Operators, a resolution invariant architecture that guarantees that the model training is independent of the data sampling frequency. We first present the conditional ground-motion synthesis algorithm (referred to heretofore as cGM-GANO) and discuss its advantages compared to previous work. Next, we verify the cGM-GANO framework using simulated ground motions generated with the Southern California Earthquake Center (SCEC) Broadband Platform (BBP). We lastly train cGM-GANO on a KiK-net dataset from Japan, showing that the framework can recover the magnitude, distance, and $V_{S30}$ scaling of Fourier amplitude and pseudo-spectral accelerations. We evaluate cGM-GANO through residual analysis with the empirical dataset as well as by comparison with conventional Ground Motion Models (GMMs) for selected ground motion scenarios. Results show that cGM-GANO produces consistent median scaling with the GMMs for the corresponding tectonic environments. The largest misfit is observed at short distances due to the scarcity of training data. With the exception of short distances, the aleatory variability of the response spectral ordinates is also well captured, especially for subduction events due to the adequacy of training data. Applications of the presented framework include generation of risk-targeted ground motions for site-specific engineering applications.

Personalized Tucker Decomposition: Modeling Commonality and Peculiarity on Tensor Data

  • paper_url: http://arxiv.org/abs/2309.03439
  • repo_url: None
  • paper_authors: Jiuyun Hu, Naichen Shi, Raed Al Kontar, Hao Yan
  • for: To capture the heterogeneity across different datasets that traditional tensor decomposition methods miss.
  • methods: Personalized Tucker decomposition (perTucker) splits tensor data into shared global components and personalized local components; under a mode orthogonality assumption, a proximal gradient regularized block coordinate descent algorithm is developed that is guaranteed to converge to a stationary point.
  • results: By learning unique and common representations across datasets, perTucker proves effective for anomaly detection, client classification, and clustering in a simulation study and two case studies on solar flare detection and tonnage signal classification.
    Abstract We propose personalized Tucker decomposition (perTucker) to address the limitations of traditional tensor decomposition methods in capturing heterogeneity across different datasets. perTucker decomposes tensor data into shared global components and personalized local components. We introduce a mode orthogonality assumption and develop a proximal gradient regularized block coordinate descent algorithm that is guaranteed to converge to a stationary point. By learning unique and common representations across datasets, we demonstrate perTucker's effectiveness in anomaly detection, client classification, and clustering through a simulation study and two case studies on solar flare detection and tonnage signal classification.

Byzantine-Robust Federated Learning with Variance Reduction and Differential Privacy

  • paper_url: http://arxiv.org/abs/2309.03437
  • repo_url: None
  • paper_authors: Zikai Zhang, Rui Hu
  • for: To preserve data privacy during model training while strengthening robustness against Byzantine attacks.
  • methods: Sparsification- and momentum-driven variance reduction is introduced into the client-level differential privacy (DP) mechanism, and the security design preserves the client-level DP guarantee.
  • results: Extensive experiments on IID and non-IID datasets and different tasks show that the framework improves robustness against Byzantine attacks while maintaining a strong privacy guarantee.
    Abstract Federated learning (FL) is designed to preserve data privacy during model training, where the data remains on the client side (i.e., IoT devices), and only model updates of clients are shared iteratively for collaborative learning. However, this process is vulnerable to privacy attacks and Byzantine attacks: the local model updates shared throughout the FL network will leak private information about the local training data, and they can also be maliciously crafted by Byzantine attackers to disturb the learning. In this paper, we propose a new FL scheme that guarantees rigorous privacy and simultaneously enhances system robustness against Byzantine attacks. Our approach introduces sparsification- and momentum-driven variance reduction into the client-level differential privacy (DP) mechanism, to defend against Byzantine attackers. The security design does not violate the privacy guarantee of the client-level DP mechanism; hence, our approach achieves the same client-level DP guarantee as the state-of-the-art. We conduct extensive experiments on both IID and non-IID datasets and different tasks and evaluate the performance of our approach against different Byzantine attacks by comparing it with state-of-the-art defense methods. The results of our experiments show the efficacy of our framework and demonstrate its ability to improve system robustness against Byzantine attacks while achieving a strong privacy guarantee.
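A sketch combining the ingredients the abstract names (momentum-driven variance reduction, sparsification, clipping, and Gaussian noise for client-level DP); the paper's exact ordering and noise calibration may differ.

```python
import numpy as np

def private_client_update(grad, momentum, k, clip, sigma, beta=0.9, rng=None):
    """One client's shared update: momentum for variance reduction, top-k
    sparsification, L2 clipping to bound sensitivity, and Gaussian noise for
    client-level DP. A sketch of the named ingredients, not the paper's code."""
    rng = rng if rng is not None else np.random.default_rng()
    momentum = beta * momentum + (1 - beta) * grad      # variance reduction
    update = momentum.copy()
    idx = np.argsort(np.abs(update))[:-k]               # zero all but top-k
    update[idx] = 0.0
    norm = np.linalg.norm(update)                       # clip, then add noise
    update *= min(1.0, clip / (norm + 1e-12))
    update += rng.normal(0.0, sigma * clip, size=update.shape)
    return update, momentum

g = np.random.default_rng(0).standard_normal(10_000)
m = np.zeros_like(g)
u, m = private_client_update(g, m, k=500, clip=1.0, sigma=0.8)
```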

Equal Long-term Benefit Rate: Adapting Static Fairness Notions to Sequential Decision Making

  • paper_url: http://arxiv.org/abs/2309.03426
  • repo_url: https://github.com/yuancheng-xu/elbert
  • paper_authors: Yuancheng Xu, Chenghao Deng, Yanchao Sun, Ruijie Zheng, Xiyao Wang, Jieyu Zhao, Furong Huang
  • for: To account for long-term fairness in sequential decision making, where naively imposing static fairness criteria can exacerbate bias over time.
  • methods: Long-term fairness is formulated in the Markov Decision Process (MDP) framework. Because summing up step-wise static biases ignores the differing importance of time steps during transitions and can create a false sense of fairness, the proposed Equal Long-term Benefit Rate (ELBERT) explicitly accounts for varying temporal importance and adapts static fairness principles to the sequential setting; the policy gradient of the Long-term Benefit Rate reduces analytically to the standard policy gradient, making standard policy optimization applicable.
  • results: Experiments on three sequential decision-making environments show that the resulting ELBERT-PO method significantly reduces bias while maintaining high utility. Code is available at https://github.com/Yuancheng-Xu/ELBERT.
    Abstract Decisions made by machine learning models may have lasting impacts over time, making long-term fairness a crucial consideration. It has been shown that when ignoring the long-term effect, naively imposing fairness criterion in static settings can actually exacerbate bias over time. To explicitly address biases in sequential decision-making, recent works formulate long-term fairness notions in Markov Decision Process (MDP) framework. They define the long-term bias to be the sum of static bias over each time step. However, we demonstrate that naively summing up the step-wise bias can cause a false sense of fairness since it fails to consider the importance difference of different time steps during transition. In this work, we introduce a long-term fairness notion called Equal Long-term Benefit Rate (ELBERT), which explicitly considers varying temporal importance and adapts static fairness principles to the sequential setting. Moreover, we show that the policy gradient of Long-term Benefit Rate can be analytically reduced to standard policy gradient. This makes standard policy optimization methods applicable for reducing the bias, leading to our proposed bias mitigation method ELBERT-PO. Experiments on three sequential decision making environments show that ELBERT-PO significantly reduces bias and maintains high utility. Code is available at https://github.com/Yuancheng-Xu/ELBERT.
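The abstract does not give the formula; on one natural reading, a group's long-term benefit rate is a ratio of cumulative quantities rather than a sum of per-step ratios, which is what lets time steps be weighted by their actual importance. A hedged numpy sketch of that reading:

```python
import numpy as np

def long_term_benefit_rate(benefit, eligible, gamma=1.0):
    """Benefit rate of one group over a trajectory: cumulative (optionally
    discounted) benefit divided by cumulative eligible demand. A ratio of
    sums, not a sum of per-step ratios; our reading of ELBERT's idea."""
    w = gamma ** np.arange(len(benefit))
    return (w * benefit).sum() / ((w * eligible).sum() + 1e-12)

# toy trajectories for two demographic groups
rng = np.random.default_rng(0)
b_a, e_a = rng.random(100), np.ones(100)
b_b, e_b = rng.random(100) * 0.8, np.ones(100)
bias = abs(long_term_benefit_rate(b_a, e_a) - long_term_benefit_rate(b_b, e_b))
```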

eess.IV - 2023-09-07

Secure Control of Networked Inverted Pendulum Visual Servo System with Adverse Effects of Image Computation (Extended Version)

  • paper_url: http://arxiv.org/abs/2309.03556
  • repo_url: None
  • paper_authors: Dajun Du, Changda Zhang, Qianjiang Lu, Minrui Fei, Huiyu Zhou
  • for: To secure image information transmitted over communication networks, preventing image attacks from degrading performance or crashing the system.
  • methods: A new networked inverted pendulum visual servo system (NIPVSS) built on the fast scaled-selective image encryption (F2SIE) algorithm is proposed, which meets the real-time requirement by reducing computational complexity while lowering the probability that valuable information is compromised by eavesdropping-based image attacks.
  • results: Experimental results demonstrate the feasibility and effectiveness of the proposed method, with a robust controller guaranteeing asymptotic stability of the closed-loop system despite the extra computational delay and errors introduced by the F2SIE algorithm and image attacks.
    Abstract When visual image information is transmitted via communication networks, it easily suffers from image attacks, leading to system performance degradation or even crash. This paper investigates secure control of networked inverted pendulum visual servo system (NIPVSS) with adverse effects of image computation. Firstly, the image security limitation of the traditional NIPVSS is revealed, where its stability will be destroyed by eavesdropping-based image attacks. Then, a new NIPVSS with the fast scaled-selective image encryption (F2SIE) algorithm is proposed, which not only meets the real-time requirement by reducing the computational complexity, but also improve the security by reducing the probability of valuable information being compromised by eavesdropping-based image attacks. Secondly, adverse effects of the F2SIE algorithm and image attacks are analysed, which will produce extra computational delay and errors. Then, a closed-loop uncertain time-delay model of the new NIPVSS is established, and a robust controller is designed to guarantee system asymptotic stability. Finally, experimental results of the new NIPVSS demonstrate the feasibility and effectiveness of the proposed method.

eess.SP - 2023-09-07

Channel Estimation for Quantized Systems based on Conditionally Gaussian Latent Models

  • paper_url: http://arxiv.org/abs/2309.04014
  • repo_url: None
  • paper_authors: Benedikt Fesl, Nurettin Turan, Benedikt Böck, Wolfgang Utschick
  • for: To develop channel estimators tailored to coarsely quantized systems.
  • methods: Conditionally Gaussian latent generative models, namely Gaussian mixture models (GMMs), mixtures of factor analyzers (MFAs), and variational autoencoders (VAEs), learn the unknown channel distribution of the radio propagation environment; conditioning on their latent variable yields a locally Gaussian channel, enabling the Bussgang decomposition and parameterized linear MMSE estimators.
  • results: Extensive simulations show that the new estimators outperform state-of-the-art methods for coarsely quantized systems in terms of mean square error (MSE) and achievable rate.
    Abstract This work introduces a novel class of channel estimators tailored for coarse quantization systems. The proposed estimators are founded on conditionally Gaussian latent generative models, specifically Gaussian mixture models (GMMs), mixture of factor analyzers (MFAs), and variational autoencoders (VAEs). These models effectively learn the unknown channel distribution inherent in radio propagation scenarios, providing valuable prior information. Conditioning on the latent variable of these generative models yields a locally Gaussian channel distribution, thus enabling the application of the well-known Bussgang decomposition. By exploiting the resulting conditional Bussgang decomposition, we derive parameterized linear minimum mean square error (MMSE) estimators for the considered generative latent variable models. In this context, we explore leveraging model-based structural features to reduce memory and complexity overhead associated with the proposed estimators. Furthermore, we devise necessary training adaptations, enabling direct learning of the generative models from quantized pilot observations without requiring ground-truth channel samples during the training phase. Through extensive simulations, we demonstrate the superiority of our introduced estimators over existing state-of-the-art methods for coarsely quantized systems, as evidenced by significant improvements in mean square error (MSE) and achievable rate metrics.
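The Bussgang decomposition that the estimators rely on can be checked numerically in the scalar one-bit case, where the Bussgang gain is $\sqrt{2/\pi}/\sigma$; a short Monte-Carlo verification:

```python
import numpy as np

# Scalar Bussgang decomposition y = A*x + d for one-bit quantization of a
# Gaussian input: A = E[yx]/E[x^2] = sqrt(2/pi)/sigma, with the distortion d
# uncorrelated with x. Conditioning on the latent variable of a GMM/MFA/VAE
# makes the channel locally Gaussian, which is what licenses this step.
rng = np.random.default_rng(0)
sigma = 2.0
x = sigma * rng.standard_normal(1_000_000)
y = np.sign(x)

A_mc = np.mean(y * x) / np.mean(x**2)
A_theory = np.sqrt(2 / np.pi) / sigma
d = y - A_mc * x
print(A_mc, A_theory)   # both ~0.3989 for sigma = 2
print(np.mean(d * x))   # ~0: distortion is uncorrelated with the input
```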

HDR Imaging With One-Bit Quantization

  • paper_url: http://arxiv.org/abs/2309.03982
  • repo_url: None
  • paper_authors: Arian Eamaz, Farhang Yeganegi, Mojtaba Soltanalian
  • for: To explore the synergies between modulo sampling and dithered one-bit quantization and apply them to non-bandlimited signals within spline spaces.
  • methods: Modulo ADCs provide an unlimited dynamic range at high resolution, while dithered one-bit quantization offers cost efficiency and reduced power consumption at elevated sampling rates; the Unlimited One-Bit (UNO) sampling framework is extended from bandlimited to non-bandlimited signals, with a recovery algorithm and a sufficient condition for perfect recovery.
  • results: Numerical results vividly demonstrate the effectiveness of UNO sampling in High Dynamic Range (HDR) imaging.
    Abstract Modulo sampling and dithered one-bit quantization frameworks have emerged as promising solutions to overcome the limitations of traditional analog-to-digital converters (ADCs) and sensors. Modulo sampling, with its high-resolution approach utilizing modulo ADCs, offers an unlimited dynamic range, while dithered one-bit quantization offers cost-efficiency and reduced power consumption while operating at elevated sampling rates. Our goal is to explore the synergies between these two techniques, leveraging their unique advantages, and to apply them to non-bandlimited signals within spline spaces. One noteworthy application of these signals lies in High Dynamic Range (HDR) imaging. In this paper, we expand upon the Unlimited One-Bit (UNO) sampling framework, initially conceived for bandlimited signals, to encompass non-bandlimited signals found in the context of HDR imaging. We present a novel algorithm rigorously examined for its ability to recover images from one-bit modulo samples. Additionally, we introduce a sufficient condition specifically designed for UNO sampling to perfectly recover non-bandlimited signals within spline spaces. Our numerical results vividly demonstrate the effectiveness of UNO sampling in the realm of HDR imaging.
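A sketch of the two front-end operations that UNO sampling combines: the centered modulo fold of a modulo ADC and dithered one-bit quantization. The reconstruction algorithm itself is the paper's contribution and is not reproduced here.

```python
import numpy as np

def modulo_fold(x, lam):
    """Centered modulo of a modulo ADC: folds any amplitude into [-lam, lam)."""
    return np.mod(x + lam, 2 * lam) - lam

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
x = 5.0 * np.sin(2 * np.pi * 3 * t)        # HDR-like signal, amplitude >> lam
lam = 1.0
folded = modulo_fold(x, lam)                # unlimited dynamic range at the ADC

tau = rng.uniform(-lam, lam, size=x.shape)  # random dither thresholds
bits = np.sign(folded - tau)                # one-bit dithered measurements
```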

Multivariate, Multi-step, and Spatiotemporal Traffic Prediction for NextG Network Slicing under SLA Constraints

  • paper_url: http://arxiv.org/abs/2309.03898
  • repo_url: None
  • paper_authors: Evren Tuna, Alkan Soysal
  • for: To propose a spatiotemporal traffic prediction approach for NextG mobile networks that ensures the service-level agreement (SLA) of each network slice.
  • methods: The approach is multivariate, multi-step, and spatiotemporal; leveraging 20 radio access network (RAN) features, peak traffic hour data, and mobility-based clustering, a parametric SLA-based loss function is proposed to guarantee an SLA violation rate.
  • results: Single-cell, multi-cell, and slice-based prediction approaches are compared in detail. Single-cell training offers individual cell-level prediction, while multi-cell training uses traffic from multiple cells of the same or different base stations; the single-cell approach outperforms, improving test loss by 11.4% and 38.1% over baseline SLA-based and MAE-based models, respectively. For slice-based downlink traffic volume prediction, single-slice and multi-slice methods are presented, with multi-slice prediction giving the more accurate forecast; the slice-based model improves test loss by 28.2%, 36.4%, and 55.6% over the cell-based model, the baseline SLA-based model, and the baseline MAE-based model, respectively.
    Abstract This study presents a spatiotemporal traffic prediction approach for NextG mobile networks, ensuring the service-level agreements (SLAs) of each network slice. Our approach is multivariate, multi-step, and spatiotemporal. Leveraging 20 radio access network (RAN) features, peak traffic hour data, and mobility-based clustering, we propose a parametric SLA-based loss function to guarantee an SLA violation rate. We focus on single-cell, multi-cell, and slice-based prediction approaches and present a detailed comparative analysis of their performances, strengths, and limitations. First, we address the application of single-cell and multi-cell training architectures. While single-cell training offers individual cell-level prediction, multi-cell training involves training a model using traffic from multiple cells from the same or different base stations. We show that the single-cell approach outperforms the multi-cell approach and results in test loss improvements of 11.4% and 38.1% compared to baseline SLA-based and MAE-based models, respectively. Next, we explore slice-based traffic prediction. We present single-slice and multi-slice methods for slice-based downlink traffic volume prediction, arguing that multi-slice prediction offers a more accurate forecast. The slice-based model we introduce offers substantial test loss improvements of 28.2%, 36.4%, and 55.6% compared to our cell-based model, the baseline SLA-based model, and the baseline MAE-based model, respectively.
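The parametric SLA-based loss is not spelled out in the abstract; one plausible form is an asymmetric error that penalizes under-provisioning (an SLA violation) more heavily than over-provisioning, with the asymmetry parameter tied to the tolerated violation rate. A hedged sketch:

```python
import numpy as np

def sla_loss(y_true, y_pred, alpha=5.0):
    """Asymmetric MAE: a shortfall (y_pred < y_true) breaks the slice's SLA
    and costs alpha times more than the same amount of over-provisioning.
    A minimal stand-in for the paper's parametric SLA-based loss."""
    err = y_true - y_pred
    return np.mean(np.where(err > 0, alpha * err, -err))

y, yhat = np.array([10.0, 12.0, 9.0]), np.array([11.0, 10.0, 9.5])
print(sla_loss(y, yhat))  # the shortfall on the 2nd sample dominates the loss
```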

Private Membership Aggregation

  • paper_url: http://arxiv.org/abs/2309.03872
  • repo_url: None
  • paper_authors: Mohamed Nomeir, Sajani Vithana, Sennur Ulukus
  • for: The paper addresses the problem of private membership aggregation (PMA), where a user wants to count the number of times an element is stored in a system of independent parties without learning which element is being counted or which party has the element.
  • methods: The paper proposes achievable schemes for four variants of the PMA problem based on the concept of cross-subspace alignment (CSA), which achieve linear communication complexity.
  • results: The proposed schemes satisfy more privacy and security constraints than previous $K$-PSI schemes while avoiding their exponential complexity.
    Abstract We consider the problem of private membership aggregation (PMA), in which a user counts the number of times a certain element is stored in a system of independent parties that store arbitrary sets of elements from a universal alphabet. The parties are not allowed to learn which element is being counted by the user. Further, neither the user nor the other parties are allowed to learn the stored elements of each party involved in the process. PMA is a generalization of the recently introduced problem of $K$ private set intersection ($K$-PSI). The $K$-PSI problem considers a set of $M$ parties storing arbitrary sets of elements, and a user who wants to determine if a certain element is repeated at least at $K$ parties out of the $M$ parties without learning which party has the required element and which party does not. To solve the general problem of PMA, we dissect it into four categories based on the privacy requirement and the collusions among databases/parties. We map these problems into equivalent private information retrieval (PIR) problems. We propose achievable schemes for each of the four variants of the problem based on the concept of cross-subspace alignment (CSA). The proposed schemes achieve \emph{linear} communication complexity as opposed to the state-of-the-art $K$-PSI scheme that requires \emph{exponential} complexity even though our PMA problems contain more security and privacy constraints.

Experimental Study of Adversarial Attacks on ML-based xApps in O-RAN

  • paper_url: http://arxiv.org/abs/2309.03844
  • repo_url: None
  • paper_authors: Naveen Naik Sapavath, Brian Kim, Kaushik Chowdhury, Vijay K Shah
  • For: This paper focuses on the vulnerability of ML models used in O-RAN to adversarial attacks, and the impact of such attacks on the performance of the entire O-RAN deployment.
  • Methods: The paper uses an example ML model for interference classification in near-real time (near-RT) RAN intelligent controllers (RIC), and demonstrates the vulnerability of this model to adversarial attacks through manipulation of data stored in a shared database inside the near-RT RIC.
  • Results: The paper shows that even small adversarial attacks can significantly decrease the accuracy of the interference classifier xApp, evaluated on both clean and perturbed data, which can directly impact the performance of the entire O-RAN deployment.
    Abstract Open Radio Access Network (O-RAN) is considered as a major step in the evolution of next-generation cellular networks given its support for open interfaces and utilization of artificial intelligence (AI) into the deployment, operation, and maintenance of RAN. However, due to the openness of the O-RAN architecture, such AI models are inherently vulnerable to various adversarial machine learning (ML) attacks, i.e., adversarial attacks which correspond to slight manipulation of the input to the ML model. In this work, we showcase the vulnerability of an example ML model used in O-RAN, and experimentally deploy it in the near-real time (near-RT) RAN intelligent controller (RIC). Our ML-based interference classifier xApp (extensible application in near-RT RIC) tries to classify the type of interference to mitigate the interference effect on the O-RAN system. We demonstrate the first-ever scenario of how such an xApp can be impacted through an adversarial attack by manipulating the data stored in a shared database inside the near-RT RIC. Through a rigorous performance analysis deployed on a laboratory O-RAN testbed, we evaluate the performance in terms of capacity and the prediction accuracy of the interference classifier xApp using both clean and perturbed data. We show that even small adversarial attacks can significantly decrease the accuracy of ML application in near-RT RIC, which can directly impact the performance of the entire O-RAN deployment.
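For illustration, the classic FGSM perturbation shows how slight input manipulation can flip an ML classifier of the kind deployed as an xApp; the paper's actual attack (tampering with the shared near-RT RIC database) may differ in mechanism, and the model below is a toy stand-in.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps):
    """Fast gradient sign method: a one-step adversarial perturbation used
    here only to illustrate the vulnerability of ML-based xApps."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# toy interference classifier over 64-dim features, 4 interference types
model = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 4))
x, y = torch.randn(8, 64), torch.randint(0, 4, (8,))
x_adv = fgsm_perturb(model, x, y, eps=0.05)
# fraction of predictions unchanged after the attack
print((model(x).argmax(1) == model(x_adv).argmax(1)).float().mean())
```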

Novel Power-Imbalanced Dense Codebooks for Reliable Multiplexing in Nakagami Channels

  • paper_url: http://arxiv.org/abs/2309.03806
  • repo_url: None
  • paper_authors: Yiming Gui, Zilong Liu, Lisu Yu, Chunlei Li, Pingzhi Fan
  • for: To design enhanced dense code multiple access (DCMA) systems for downlink transmission over Nakagami-$m$ fading channels.
  • methods: The DCMA pairwise error probability (PEP) over a Nakagami-$m$ channel is analyzed, from which a novel design metric called minimum logarithmic sum distance (MLSD) is derived.
  • results: With respect to the proposed MLSD, a new family of power-imbalanced dense codebooks is introduced by deleting certain rows of a special non-unimodular circulant matrix; simulations show larger minimum Euclidean distance and MLSD, yielding significant error-performance gains over existing sparse code multiple access and conventional unimodular DCMA schemes under different overloading factors.
    Abstract This paper studies enhanced dense code multiple access (DCMA) system design for downlink transmission over the Nakagami-$m$ fading channels. By studying the DCMA pairwise error probability (PEP) in a Nakagami-$m$ channel, a novel design metric called minimum logarithmic sum distance (MLSD) is first derived. With respect to the proposed MLSD, we introduce a new family of power-imbalanced dense codebooks by deleting certain rows of a special non-unimodular circulant matrix. Simulation results demonstrate that our proposed dense codebooks lead to both larger minimum Euclidean distance and MLSD, thus yielding significant improvements of error performance over the existing sparse code multiple access and conventional unimodular DCMA schemes in Nakagami-$m$ fading channels under different overloading factors.

Space-Time Shift Keying Aided OTFS Modulation for Orthogonal Multiple Access

  • paper_url: http://arxiv.org/abs/2309.03771
  • repo_url: None
  • paper_authors: Zeping Sui, Hongming Zhang, Sumei Sun, Lie-Liang Yang, Lajos Hanzo
  • For: To improve the reliability of uplink transmission in high-Doppler scenarios.
  • Methods: A space-time shift keying-aided orthogonal time frequency space modulation-based multiple access (STSK-OTFS-MA) system.
  • Results: Improved robustness to multi-user interference and higher coding gains, together with a better trade-off between detection complexity and performance.
    Abstract Space-time shift keying-aided orthogonal time frequency space modulation-based multiple access (STSK-OTFS-MA) is proposed for reliable uplink transmission in high-Doppler scenarios. As a beneficial feature of our STSK-OTFS-MA system, extra information bits are mapped onto the indices of the active dispersion matrices, which allows the system to enjoy the joint benefits of both STSK and OTFS signalling. Due to the fact that both the time-, space- and DD-domain degrees of freedom are jointly exploited, our STSK-OTFS-MA achieves increased diversity and coding gains. To mitigate the potentially excessive detection complexity, the sparse structure of the equivalent transmitted symbol vector is exploited, resulting in a pair of low-complexity near-maximum likelihood (ML) multiuser detection algorithms. Explicitly, we conceive a progressive residual check-based greedy detector (PRCGD) and an iterative reduced-space check-based detector (IRCD). Then, we derive both the unconditional single-user pairwise error probability (SU-UPEP) and a tight bit error ratio (BER) union-bound for our single-user STSK-OTFS-MA system employing the ML detector. Furthermore, the discrete-input continuous-output memoryless channel (DCMC) capacity of the proposed system is derived. The optimal dispersion matrices (DMs) are designed based on the maximum attainable diversity and coding gain metrics. Finally, it is demonstrated that our STSK-OTFS-MA system achieves both a lower BER and a higher DCMC capacity than its conventional spatial modulation (SM) {and its orthogonal frequency-division multiplexing (OFDM) counterparts. As a benefit, the proposed system strikes a compelling BER vs. system complexity as well as BER vs. detection complexity trade-offs.

Resource Management for IRS-assisted WP-MEC Networks with Practical Phase Shift Model

  • paper_url: http://arxiv.org/abs/2309.03471
  • repo_url: None
  • paper_authors: Nana Li, Wanming Hao, Fuhui Zhou, Zheng Chu, Shouyi Yang, Pei Xiao
  • for: To enhance the computational capability and sustainable energy supply of low-power wireless devices (WDs) in wireless powered mobile edge computing (WP-MEC).
  • methods: Multiple intelligent reflecting surfaces (IRSs) are employed in the WP-MEC network; the downlink/uplink IRS passive beamforming, the downlink energy beamforming and uplink multi-user detection (MUD) vector at the HAPs, the task offloading power and local computing frequency of the WDs, and the time slot allocation are jointly optimized.
  • results: Under the practical IRS phase shift model, the proposed design achieves a higher total computation rate than the baseline schemes.
    Abstract Wireless powered mobile edge computing (WP-MEC) has been recognized as a promising solution to enhance the computational capability and sustainable energy supply for low-power wireless devices (WDs). However, when the communication links between the hybrid access point (HAP) and WDs are hostile, the energy transfer efficiency and task offloading rate are compromised. To tackle this problem, we propose to employ multiple intelligent reflecting surfaces (IRSs) to WP-MEC networks. Based on the practical IRS phase shift model, we formulate a total computation rate maximization problem by jointly optimizing downlink/uplink IRSs passive beamforming, downlink energy beamforming and uplink multi-user detection (MUD) vector at HAPs, task offloading power and local computing frequency of WDs, and the time slot allocation. Specifically, we first derive the optimal time allocation for downlink wireless energy transmission (WET) to IRSs and the corresponding energy beamforming. Next, with fixed time allocation for the downlink WET to WDs, the original optimization problem can be divided into two independent subproblems. For the WD charging subproblem, the optimal IRSs passive beamforming is derived by utilizing the successive convex approximation (SCA) method and the penalty-based optimization technique, and for the offloading computing subproblem, we propose a joint optimization framework based on the fractional programming (FP) method. Finally, simulation results validate that our proposed optimization method based on the practical phase shift model can achieve a higher total computation rate compared to the baseline schemes.

RIS-Assisted Wireless Communications: Long-Term versus Short-Term Phase Shift Designs

  • paper_url: http://arxiv.org/abs/2309.03436
  • repo_url: None
  • paper_authors: Trinh Van Chien, Lam Thanh Tu, Waqas Khalid, Heejung Yu, Symeon Chatzinotas, Marco Di Renzo
  • for: To improve the coverage probability and performance of future wireless networks.
  • methods: RIS-assisted transmission is analyzed with mathematical optimization: closed-form expressions for the coverage probability and ergodic rate under long-term and short-term phase shift designs, and a gradient-ascent RIS placement optimization that relies only on partial channel state information.
  • results: Improved coverage probability and ergodic rate, outperforming several heuristic benchmarks.
    Abstract Reconfigurable intelligent surface (RIS) has recently gained significant interest as an emerging technology for future wireless networks thanks to its potential for improving the coverage probability in challenging propagation environments. This paper studies an RIS-assisted propagation environment, where a source transmits data to a destination in the presence of a weak direct link. We analyze and compare RIS designs based on long-term and short-term channel statistics in terms of coverage probability and ergodic rate. For the considered optimization designs, we derive closed-form expressions for the coverage probability and ergodic rate, which explicitly unveil the impact of both the propagation environment and the RIS on the system performance. Besides the optimization of the RIS phase profile, we formulate an RIS placement optimization problem with the aim of maximizing the coverage probability by relying only on partial channel state information. An efficient algorithm is proposed based on the gradient ascent method. Simulation results are illustrated in order to corroborate the analytical framework and findings. The proposed RIS phase profile is shown to outperform several heuristic benchmarks in terms of outage probability and ergodic rate. In addition, the proposed RIS placement strategy provides an extra degree of freedom that remarkably improves system performance.

cs.SD - 2023-09-06

Highly Controllable Diffusion-based Any-to-Any Voice Conversion Model with Frame-level Prosody Feature

  • paper_url: http://arxiv.org/abs/2309.03364
  • repo_url: None
  • paper_authors: Kyungguen Byun, Sunkuk Moon, Erik Visser
  • for: To propose a highly controllable voice manipulation system that performs any-to-any voice conversion and speaking-rate adjustment simultaneously.
  • methods: A frame-level prosody feature transfers frame-level properties such as pitch and energy trajectories; these are fed, together with speaker and content embeddings, to a diffusion-based decoder that generates the converted speech mel-spectrogram. A self-supervised-model-based post-processing step adjusts the speaking rate for improved controllability.
  • results: The system offers better controllability and intelligibility than a state-of-the-art (SOTA) method, covering a wide range of fundamental frequency (F0), energy, and speed modulation while maintaining converted speech quality.
    Abstract We propose a highly controllable voice manipulation system that can perform any-to-any voice conversion (VC) and prosody modulation simultaneously. State-of-the-art VC systems can transfer sentence-level characteristics such as speaker, emotion, and speaking style. However, manipulating the frame-level prosody, such as pitch, energy and speaking rate, still remains challenging. Our proposed model utilizes a frame-level prosody feature to effectively transfer such properties. Specifically, pitch and energy trajectories are integrated in a prosody conditioning module and then fed alongside speaker and contents embeddings to a diffusion-based decoder generating a converted speech mel-spectrogram. To adjust the speaking rate, our system includes a self-supervised model based post-processing step which allows improved controllability. The proposed model showed comparable speech quality and improved intelligibility compared to a SOTA approach. It can cover a varying range of fundamental frequency (F0), energy and speed modulation while maintaining converted speech quality.
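The prosody conditioning module integrates frame-level pitch and energy trajectories with speaker and content embeddings before the diffusion decoder. A hedged sketch of how such conditioning tensors might be assembled; module names and dimensions are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class ProsodyConditioning(nn.Module):
    """Projects frame-level F0/energy trajectories and broadcasts a speaker
    embedding so all conditions align per frame (dimensions are illustrative)."""
    def __init__(self, d_model=256):
        super().__init__()
        self.prosody_proj = nn.Linear(2, d_model)    # (F0, energy) per frame
        self.merge = nn.Linear(3 * d_model, d_model)

    def forward(self, f0, energy, content, speaker):
        # f0, energy: (B, T); content: (B, T, D); speaker: (B, D)
        prosody = self.prosody_proj(torch.stack([f0, energy], dim=-1))
        spk = speaker.unsqueeze(1).expand(-1, content.size(1), -1)
        return self.merge(torch.cat([prosody, content, spk], dim=-1))

cond = ProsodyConditioning()
out = cond(torch.randn(2, 100), torch.randn(2, 100),
           torch.randn(2, 100, 256), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 100, 256]) -> fed to the diffusion decoder
```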

Leveraging Geometrical Acoustic Simulations of Spatial Room Impulse Responses for Improved Sound Event Detection and Localization

  • paper_url: http://arxiv.org/abs/2309.03337
  • repo_url: https://github.com/ChrisIck/DCASE_Synth_Data
  • paper_authors: Christopher Ick, Brian McFee
  • for: To address the scarcity of annotated spatial audio data for training sound event localization and detection (SELD) models.
  • methods: Geometrical acoustic simulation is used to generate a novel spatial room impulse response (SRIR) dataset, which is then used to train SELD models.
  • results: Experiments show that simulated SRIRs can train a SELD model to performance similar to a real SRIR dataset and can augment existing datasets, improving on benchmarks set by state-of-the-art SELD models.
    Abstract As deeper and more complex models are developed for the task of sound event localization and detection (SELD), the demand for annotated spatial audio data continues to increase. Annotating field recordings with 360$^{\circ}$ video takes many hours from trained annotators, while recording events within motion-tracked laboratories are bounded by cost and expertise. Because of this, localization models rely on a relatively limited amount of spatial audio data in the form of spatial room impulse response (SRIR) datasets, which limits the progress of increasingly deep neural network based approaches. In this work, we demonstrate that simulated geometrical acoustics can provide an appealing solution to this problem. We use simulated geometrical acoustics to generate a novel SRIR dataset that can train a SELD model to provide similar performance to that of a real SRIR dataset. Furthermore, we demonstrate using simulated data to augment existing datasets, improving on benchmarks set by state of the art SELD models. We explore the potential and limitations of geometric acoustic simulation for localization and event detection. We also propose further studies to verify the limitations of this method, as well as further methods to generate synthetic data for SELD tasks without the need to record more data.
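Geometrical acoustics here refers to techniques like the image-source model. A hedged sketch using the open-source pyroomacoustics library to simulate room impulse responses that way; the paper's actual SRIR pipeline (spatial/Ambisonic formats, DCASE-style room configurations) is more involved.

```python
# pip install pyroomacoustics
import numpy as np
import pyroomacoustics as pra

fs = 24000
# A shoebox room simulated with the image-source method (a geometrical
# acoustics technique); dimensions and absorption are illustrative.
room = pra.ShoeBox(
    [6.0, 5.0, 3.0], fs=fs,
    materials=pra.Material(0.3),  # uniform energy absorption coefficient
    max_order=10,                 # image-source reflection order
)
room.add_source([2.0, 3.0, 1.5])

# A small 4-mic array standing in for a spatial (e.g. tetrahedral) capsule.
mics = np.c_[[3.0, 2.0, 1.2], [3.05, 2.0, 1.2],
             [3.0, 2.05, 1.2], [3.0, 2.0, 1.25]]
room.add_microphone_array(mics)

room.compute_rir()
rir = room.rir[0][0]  # impulse response from source 0 to mic 0
print(f"RIR length: {len(rir)} samples ({len(rir) / fs:.3f} s)")
```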

Presenting the SWTC: A Symbolic Corpus of Themes from John Williams’ Star Wars Episodes I-IX

  • paper_url: http://arxiv.org/abs/2309.03298
  • repo_url: None
  • paper_authors: Claire Arthur, Frank Lehman, John McNamara
  • for: This paper presents a new symbolic corpus of musical themes from the complete Star Wars trilogies (Episodes I-IX) by John Williams.
  • methods: The corpus files are made available in multiple formats (.krn, .sib, and .musicxml) and include melodic, harmonic, and formal information. The authors also introduce a new humdrum standard for non-functional harmony encodings, **harte, based on Harte (2005, 2010).
  • results: The Star Wars Thematic Corpus (SWTC) contains a total of 64 distinctive, recurring, and symbolically meaningful themes and motifs, commonly referred to as leitmotifs. The authors provide some brief summary statistics and hope that the SWTC will provide insights into John Williams’ compositional style and be useful in comparisons against other thematic corpora from film and beyond.
    Abstract This paper presents a new symbolic corpus of musical themes from the complete Star Wars trilogies (Episodes I-IX) by John Williams. The corpus files are made available in multiple formats (.krn, .sib, and .musicxml) and include melodic, harmonic, and formal information. The Star Wars Thematic Corpus (SWTC) contains a total of 64 distinctive, recurring, and symbolically meaningful themes and motifs, commonly referred to as leitmotifs. Through this corpus we also introduce a new humdrum standard for non-functional harmony encodings, **harte, based on Harte (2005, 2010). This report details the motivation, describes the transcription and encoding processes, and provides some brief summary statistics. While relatively small in scale, the SWTC represents a unified collection from one of the most prolific and influential composers of the 20th century, and the under-studied subset of film and multimedia musical material in general. We hope the SWTC will provide insights into John Williams' compositional style, as well as prove useful in comparisons against other thematic corpora from film and beyond.

Real-time auralization for performers on virtual stages

  • paper_url: http://arxiv.org/abs/2309.03149
  • repo_url: None
  • paper_authors: Ernesto Accolti, Lukas Aspöck, Manuj Yadav, Michael Vorländer
  • for: This paper presents an interactive system for stage acoustics experimentation in music performance laboratories.
  • methods: A real-time auralization system accounts for hearing one's own and others' instruments, with calibration filters covering microphone-instrument distances, directivity factors, and transducer frequency responses; hardware latency is compensated without cropping the simulated impulse responses.
  • results: The paper delivers an accurate auralization system for hearing one's own and others' instruments, and a proof of concept with objective and subjective experiments supports the feasibility of the proposed setup.
    Abstract This article presents an interactive system for stage acoustics experimentation including considerations for hearing one's own and others' instruments. The quality of real-time auralization systems for psychophysical experiments on music performance depends on the system's calibration and latency, among other factors (e.g. visuals, simulation methods, haptics, etc). The presented system focuses on the acoustic considerations for laboratory implementations. The calibration is implemented as a set of filters accounting for the microphone-instrument distances and the directivity factors, as well as the transducers' frequency responses. Moreover, sources of errors are characterized using both state-of-the-art information and derivations from the mathematical definition of the calibration filter. In order to compensate for hardware latency without cropping parts of the simulated impulse responses, the virtual direct sound of musicians hearing themselves is skipped from the simulation and addressed by letting the actual direct sound reach the listener through open headphones. The required latency compensation of the interactive part (i.e. hearing others) meets the minimum distance requirement between musicians, which is 2 m for the implemented system. Finally, a proof of concept is provided that includes objective and subjective experiments, which give support to the feasibility of the proposed setup.
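The latency requirement follows from the 2 m minimum distance between musicians: the interactive rendering of others' sound must arrive no later than the real direct sound would. A small back-of-the-envelope check (speed of sound assumed 343 m/s):

```python
SPEED_OF_SOUND = 343.0  # m/s, at roughly 20 degrees C
min_distance = 2.0      # m, minimum spacing between musicians in the setup

# The direct-sound propagation delay sets the budget the interactive chain
# (capture + simulation + playback) must stay within.
latency_budget_ms = min_distance / SPEED_OF_SOUND * 1000.0
print(f"latency budget: {latency_budget_ms:.1f} ms")  # ~5.8 ms
```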

Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals

  • paper_url: http://arxiv.org/abs/2309.02796
  • repo_url: https://github.com/WuYiming6526/HARD-DAFx2023
  • paper_authors: Yiming Wu
  • for: To infer the disentangled rhythmic and harmonic latent representations behind music audio generation, a key factor in controllable data generation.
  • methods: A deep neural network-based self-supervised learning method trains a variational autoencoder that generates a mel-spectrogram from two latent features representing rhythmic and harmonic content. During training, the autoencoder reconstructs the input mel-spectrogram given its pitch-shifted version; at each forward computation, a vector rotation is applied to one latent feature, assuming its dimensions correspond to pitch intervals. The rotated feature thus captures the pitch-related information, while the unrotated feature captures the pitch-invariant (rhythmic) content.
  • results: The learned features are evaluated with a predictor-based disentanglement metric, and the method is applied to the automatic generation of music remixes.
    Abstract The aim of latent variable disentanglement is to infer the multiple informative latent representations that lie behind a data generation process and is a key factor in controllable data generation. In this paper, we propose a deep neural network-based self-supervised learning method to infer the disentangled rhythmic and harmonic representations behind music audio generation. We train a variational autoencoder that generates an audio mel-spectrogram from two latent features representing the rhythmic and harmonic content. In the training phase, the variational autoencoder is trained to reconstruct the input mel-spectrogram given its pitch-shifted version. At each forward computation in the training phase, a vector rotation operation is applied to one of the latent features, assuming that the dimensions of the feature vectors are related to pitch intervals. Therefore, in the trained variational autoencoder, the rotated latent feature represents the pitch-related information of the mel-spectrogram, and the unrotated latent feature represents the pitch-invariant information, i.e., the rhythmic content. The proposed method was evaluated using a predictor-based disentanglement metric on the learned features. Furthermore, we demonstrate its application to the automatic generation of music remixes.
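The key trick is the vector rotation applied to one latent feature at each forward pass, assuming latent dimensions align with pitch intervals. A hedged sketch of that operation as a circular shift of latent dimensions by the pitch-shift amount; the paper's exact rotation operator may differ.

```python
import torch

def rotate_latent(z: torch.Tensor, semitones: int) -> torch.Tensor:
    """Circularly shift latent dimensions by the pitch-shift amount, so a
    k-semitone pitch shift maps to a rotation of the harmonic latent.
    One plausible realization of the paper's 'vector rotation'."""
    return torch.roll(z, shifts=semitones, dims=-1)

z_harmonic = torch.randn(8, 24)           # harmonic latent (batch, dims)
z_shifted = rotate_latent(z_harmonic, 3)  # input was pitch-shifted +3 st

# Training intuition: decoder(z_rhythm, rotate_latent(z_harmonic, k)) should
# reconstruct the k-semitone-shifted mel-spectrogram, pushing pitch info into
# z_harmonic and leaving z_rhythm pitch-invariant.
print(z_shifted.shape)
```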

Simultaneous Measurement of Multiple Acoustic Attributes Using Structured Periodic Test Signals Including Music and Other Sound Materials

  • paper_url: http://arxiv.org/abs/2309.02767
  • repo_url: None
  • paper_authors: Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Mitsunori Mizumachi, Tatsuya Kitamura
  • for: A framework for simultaneously measuring acoustic attributes: the linear time-invariant (LTI) response, the signal-dependent time-invariant (SDTI) component, and the random and time-varying (RTV) component.
  • methods: Structured periodic test signals are used; music pieces and other sound materials can also serve as test signals by "safeguarding" them with slight deterministic noise. Swept-sine and MLS measurements are special cases of the framework.
  • results: Interactive, real-time measurement tools based on this framework were implemented and open-sourced, and the framework was applied to objectively assess pitch extractors.
    Abstract We introduce a general framework for measuring acoustic properties such as the linear time-invariant (LTI) response, the signal-dependent time-invariant (SDTI) component, and the random and time-varying (RTV) component simultaneously using structured periodic test signals. The framework also enables music pieces and other sound materials to serve as test signals by "safeguarding" them through the addition of slight deterministic "noise." Measurements using the swept-sine, MLS (Maximum Length Sequence), and their variants are special cases of the proposed framework. We implemented interactive and real-time measuring tools based on this framework and made them open-source. Furthermore, we applied this framework to assess pitch extractors objectively.
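With a periodic test signal, the deterministic (LTI plus SDTI) part of the response repeats every period while the RTV part does not, so period-synchronous averaging separates the two. A hedged numpy sketch of that decomposition; it illustrates the principle only, not the authors' full framework.

```python
import numpy as np

rng = np.random.default_rng(0)
period, n_periods = 480, 32
t = np.arange(period * n_periods)

# Toy "response": a periodic deterministic component plus random variation.
deterministic = (np.sin(2 * np.pi * t / period)
                 + 0.3 * np.sin(6 * np.pi * t / period))
response = deterministic + 0.1 * rng.standard_normal(t.size)

frames = response.reshape(n_periods, period)
det_estimate = frames.mean(axis=0)    # deterministic (LTI + SDTI) estimate
rtv_estimate = frames - det_estimate  # residual random time-varying part

snr_db = 10 * np.log10(np.mean(det_estimate**2) / np.mean(rtv_estimate**2))
print(f"deterministic-to-RTV ratio: {snr_db:.1f} dB")
```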

MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023

  • paper_url: http://arxiv.org/abs/2309.02743
  • repo_url: None
  • paper_authors: Zhihang Xu, Shaofei Zhang, Xi Wang, Jiajun Zhang, Wenning Wei, Lei He, Sheng Zhao
  • for: To present MuLanTTS, Microsoft's end-to-end neural text-to-speech (TTS) system designed for the Blizzard Challenge 2023.
  • methods: Building upon DelightfulTTS, contextual and emotion encoders adapt the audiobook data for long-form prosody and dialogue expressiveness beyond single sentences; denoising algorithms and long-audio processing improve recording quality. The hub task uses only the 50-hour single-speaker data, while the spoke task fine-tunes a multi-speaker source model on the target speaker.
  • results: The system achieved mean quality scores of 4.3 and 4.5 in the respective tasks, statistically comparable with natural speech while keeping good similarity according to the similarity assessment, demonstrating its effectiveness in both tasks.
    Abstract In this paper, we present MuLanTTS, the Microsoft end-to-end neural text-to-speech (TTS) system designed for the Blizzard Challenge 2023. About 50 hours of audiobook corpus for French TTS as the hub task and another 2 hours of speaker adaptation data as the spoke task are released to build synthesized voices for different test purposes including sentences, paragraphs, homographs, lists, etc. Building upon DelightfulTTS, we adopt contextual and emotion encoders to adapt the audiobook data and enrich it beyond sentences for long-form prosody and dialogue expressiveness. Regarding the recording quality, we also apply denoising algorithms and long audio processing to both corpora. For the hub task, only the 50-hour single-speaker data is used for building the TTS system, while for the spoke task, a multi-speaker source model is used for target speaker fine-tuning. MuLanTTS achieves mean quality-assessment scores of 4.3 and 4.5 in the respective tasks, statistically comparable with natural speech while keeping good similarity according to the similarity assessment. The excellent quality and similarity results in this year's new and dense statistical evaluation show the effectiveness of our proposed system in both tasks.

eess.AS - 2023-09-06

Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2309.03019
  • repo_url: None
  • paper_authors: Danwei Cai, Ming Li
  • for: To explore ASR-pretrained Conformers for speaker verification, leveraging their strengths in modeling speech signals.
  • methods: Three strategies are proposed: (1) transfer learning to initialize the speaker embedding network, improving generalization and reducing overfitting; (2) knowledge distillation to train a more flexible speaker verification model, incorporating a frame-level ASR loss as an auxiliary task; (3) a lightweight speaker adaptor for efficient feature conversion without altering the original ASR Conformer, allowing parallel ASR and speaker verification.
  • results: On VoxCeleb, transfer learning yields a 0.48% EER, knowledge distillation a 0.43% EER, and the speaker adaptor approach a 0.57% EER. Overall, the methods effectively transfer ASR capabilities to speaker verification.
    Abstract This paper explores the use of ASR-pretrained Conformers for speaker verification, leveraging their strengths in modeling speech signals. We introduce three strategies: (1) Transfer learning to initialize the speaker embedding network, improving generalization and reducing overfitting. (2) Knowledge distillation to train a more flexible speaker verification model, incorporating frame-level ASR loss as an auxiliary task. (3) A lightweight speaker adaptor for efficient feature conversion without altering the original ASR Conformer, allowing parallel ASR and speaker verification. Experiments on VoxCeleb show significant improvements: transfer learning yields a 0.48% EER, knowledge distillation results in a 0.43% EER, and the speaker adaptor approach, with just an added 4.92M parameters to a 130.94M-parameter model, achieves a 0.57% EER. Overall, our methods effectively transfer ASR capabilities to speaker verification tasks.
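Strategy (2) trains the speaker model with a frame-level ASR loss as an auxiliary task. A hedged PyTorch sketch of the combined objective; the loss weight, heads, the stand-in encoder, and the CTC choice are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Assumed components: a shared encoder, a speaker head, and a frame-level
# ASR head; the 0.1 auxiliary weight is illustrative.
encoder = nn.LSTM(80, 256, batch_first=True)  # stand-in for a Conformer
speaker_head = nn.Linear(256, 1000)           # 1000 training speakers
asr_head = nn.Linear(256, 50)                 # 50 output tokens incl. blank
ctc = nn.CTCLoss(blank=0)
ce = nn.CrossEntropyLoss()

feats = torch.randn(4, 200, 80)               # (batch, frames, fbank)
spk_labels = torch.randint(0, 1000, (4,))
tokens = torch.randint(1, 50, (4, 20))        # frame-level ASR targets
frames_out, _ = encoder(feats)

# Utterance-level speaker loss (mean pooling as a simple aggregator).
spk_loss = ce(speaker_head(frames_out.mean(dim=1)), spk_labels)

# Auxiliary frame-level ASR loss over the same encoder outputs.
log_probs = asr_head(frames_out).log_softmax(-1).transpose(0, 1)  # (T, B, C)
asr_loss = ctc(log_probs, tokens,
               torch.full((4,), 200), torch.full((4,), 20))

loss = spk_loss + 0.1 * asr_loss
loss.backward()
```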

cs.CV - 2023-09-06

Distribution-Aware Prompt Tuning for Vision-Language Models

  • paper_url: http://arxiv.org/abs/2309.03406
  • repo_url: https://github.com/mlvlab/dapt
  • paper_authors: Eulrang Cho, Jooyeon Kim, Hyunwoo J. Kim
  • for: To improve the performance of pre-trained vision-language models (VLMs) on target tasks via prompt tuning.
  • methods: Prompt tuning adds context to the input image or text through learnable vectors; distribution-aware prompt tuning (DAPT) improves the feature-space alignment between the two modalities by maximizing inter-dispersion (the distance between classes) and minimizing intra-dispersion (the distance between embeddings of the same class).
  • results: Extensive experiments on 11 benchmark datasets demonstrate that DAPT significantly improves generalizability.
    Abstract Pre-trained vision-language models (VLMs) have shown impressive performance on various downstream tasks by utilizing knowledge learned from large data. In general, the performance of VLMs on target tasks can be further improved by prompt tuning, which adds context to the input image or text. By leveraging data from target tasks, various prompt-tuning methods have been studied in the literature. A key to prompt tuning is the feature space alignment between two modalities via learnable vectors with model parameters fixed. We observed that the alignment becomes more effective when embeddings of each modality are `well-arranged' in the latent space. Inspired by this observation, we proposed distribution-aware prompt tuning (DAPT) for vision-language models, which is simple yet effective. Specifically, the prompts are learned by maximizing inter-dispersion, the distance between classes, as well as minimizing the intra-dispersion measured by the distance between embeddings from the same class. Our extensive experiments on 11 benchmark datasets demonstrate that our method significantly improves generalizability. The code is available at https://github.com/mlvlab/DAPT.
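The core of DAPT is the pair of dispersion terms: push class embeddings apart (inter-dispersion) while pulling same-class embeddings toward their prototype (intra-dispersion). A hedged sketch of such a regularizer; the distance metric and weighting are illustrative, and the official repo holds the exact formulation.

```python
import torch
import torch.nn.functional as F

def dispersion_loss(embeds, labels, num_classes, w_inter=1.0, w_intra=1.0):
    """Encourage 'well-arranged' embeddings: maximize distance between class
    prototypes, minimize distance of samples to their own prototype.
    Assumes every class appears in the batch."""
    protos = torch.stack([embeds[labels == c].mean(0)
                          for c in range(num_classes)])
    # Intra-dispersion: mean distance of each embedding to its prototype.
    intra = (embeds - protos[labels]).norm(dim=-1).mean()
    # Inter-dispersion: mean pairwise distance between distinct prototypes.
    pdist = torch.cdist(protos, protos)
    inter = pdist[~torch.eye(num_classes, dtype=torch.bool)].mean()
    return w_intra * intra - w_inter * inter  # minimize intra, maximize inter

embeds = F.normalize(torch.randn(64, 512), dim=-1)
labels = torch.arange(10).repeat_interleave(7)[:64]  # all 10 classes present
print(dispersion_loss(embeds, labels, num_classes=10))
```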

Reasonable Anomaly Detection in Long Sequences

  • paper_url: http://arxiv.org/abs/2309.03401
  • repo_url: https://github.com/allenyljiang/anomaly-detection-in-sequences
  • paper_authors: Yalong Jiang, Changkang Li
  • for: A new video anomaly detection method addressing the limited representational power of the short-term observations used by existing approaches.
  • methods: A Stacked State Machine (SSM) model represents temporal dependencies that are consistent across long-range observations and predicts future states from past ones; the divergence between predictions following normal motion patterns and the observed states determines anomalies.
  • results: Extensive experiments show improvements over state-of-the-art methods on the evaluated datasets.
    Abstract Video anomaly detection is a challenging task due to the lack of approaches for representing samples. The visual representations of most existing approaches are limited to short-term sequences of observations, which cannot provide enough clues for achieving reasonable detections. In this paper, we propose to represent the motion patterns of objects completely by learning from long-term sequences. Firstly, a Stacked State Machine (SSM) model is proposed to represent the temporal dependencies which are consistent across long-range observations. The SSM model then predicts future states based on past ones; the divergence between predictions following inherent normal patterns and the observed states determines anomalies which violate normal motion patterns. Extensive experiments are carried out to evaluate the proposed approach on the dataset and existing ones. Improvements over state-of-the-art methods can be observed. Our code is available at https://github.com/AllenYLJiang/Anomaly-Detection-in-Sequences.
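Detection then reduces to comparing the model's prediction of the next state with what is observed: large divergence flags an anomaly. A minimal hedged sketch of that scoring rule with a generic one-step predictor; the persistence predictor stands in for the SSM.

```python
import numpy as np

def anomaly_scores(states, predictor):
    """Score each step by the divergence between the predicted next state
    (from normal-pattern dynamics) and the observed one."""
    scores = []
    for t in range(1, len(states)):
        predicted = predictor(states[:t])  # forecast from the past
        scores.append(np.linalg.norm(predicted - states[t]))
    return np.array(scores)

def persistence(history):
    # Stand-in predictor: normal motion is assumed smooth, so the next
    # state resembles the last one. The SSM would replace this.
    return history[-1]

track = np.cumsum(np.full((100, 2), 0.5), axis=0)  # smooth normal motion
track[70] += np.array([8.0, -6.0])                 # injected abrupt anomaly
scores = anomaly_scores(track, persistence)
print("anomalous step:", scores.argmax() + 1)      # ~70
```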

A novel method for iris recognition using BP neural network and parallel computing by the aid of GPUs (Graphics Processing Units)

  • paper_url: http://arxiv.org/abs/2309.03390
  • repo_url: None
  • paper_authors: Farahnaz Hosseini, Hossein Ebrahimpour, Samaneh Askari
  • for: To present a new method for designing an iris recognition system.
  • methods: Haar wavelet features are first extracted from iris images; their advantages are high-speed extraction and uniqueness to each iris. A back propagation neural network (BPNN) is then used as the classifier, with parallel BPNN algorithms implemented on GPUs via CUDA to speed up learning.
  • results: The paper reports the system performance and the speedup achieved over a serial implementation.
    Abstract In this paper, we present a new method for designing an iris recognition system. In this method, the Haar wavelet features are first extracted from iris images. The advantages of these features are their high-speed extraction and their uniqueness to each iris. A back propagation neural network (BPNN) is then used as a classifier. In this system, parallel BPNN algorithms implemented on GPUs with the aid of CUDA are used to speed up the learning process. Finally, the system performance and the speedup over a serial implementation of the algorithm are presented.
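Haar wavelet features come from a 2D discrete wavelet transform of the (normalized) iris image. A hedged sketch with PyWavelets; the decomposition level and which subband statistics feed the BPNN are assumptions, not the paper's exact feature set.

```python
# pip install PyWavelets numpy
import numpy as np
import pywt

def haar_features(iris_img: np.ndarray, levels: int = 3) -> np.ndarray:
    """Multi-level 2D Haar decomposition; summarize each subband by its
    mean absolute value and energy (a common compact feature choice)."""
    coeffs = pywt.wavedec2(iris_img, wavelet="haar", level=levels)
    feats = []
    for band in coeffs[1:]:          # (cH, cV, cD) detail triplets per level
        for sub in band:
            feats += [np.abs(sub).mean(), (sub**2).mean()]
    feats += [coeffs[0].mean(), (coeffs[0]**2).mean()]  # approximation band
    return np.asarray(feats, dtype=np.float32)

img = np.random.rand(64, 512)   # stand-in for a normalized iris strip
print(haar_features(img).shape)  # compact vector for the BPNN classifier
```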

Kidney abnormality segmentation in thorax-abdomen CT scans

  • paper_url: http://arxiv.org/abs/2309.03383
  • repo_url: None
  • paper_authors: Gabriel Efrain Humpire Mamani, Nikolas Lessmann, Ernst Th. Scholten, Mathias Prokop, Colin Jacobs, Bram van Ginneken
  • for: This paper aims to support clinicians in identifying and quantifying renal abnormalities such as cysts, lesions, masses, metastases, and primary tumors through the use of deep learning for segmenting kidney parenchyma and kidney abnormalities.
  • methods: The paper introduces an end-to-end segmentation method that utilizes a modified 3D U-Net network with four additional components: end-to-end multi-resolution approach, task-specific data augmentations, modified loss function using top-$k$, and spatial dropout. The method was trained on 215 contrast-enhanced thoracic-abdominal CT scans.
  • results: The paper reports that the best-performing model achieved Dice scores of 0.965 and 0.947 for segmenting kidney parenchyma in two test sets, outperforming an independent human observer. The method also achieved a Dice score of 0.585 for segmenting kidney abnormalities within the 30 test scans containing them, suggesting potential for further improvement in computerized methods.
    Abstract In this study, we introduce a deep learning approach for segmenting kidney parenchyma and kidney abnormalities to support clinicians in identifying and quantifying renal abnormalities such as cysts, lesions, masses, metastases, and primary tumors. Our end-to-end segmentation method was trained on 215 contrast-enhanced thoracic-abdominal CT scans, with half of these scans containing one or more abnormalities. We began by implementing our own version of the original 3D U-Net network and incorporated four additional components: an end-to-end multi-resolution approach, a set of task-specific data augmentations, a modified loss function using top-$k$, and spatial dropout. Furthermore, we devised a tailored post-processing strategy. Ablation studies demonstrated that each of the four modifications enhanced kidney abnormality segmentation performance, while three out of four improved kidney parenchyma segmentation. Subsequently, we trained the nnUNet framework on our dataset. By ensembling the optimized 3D U-Net and the nnUNet with our specialized post-processing, we achieved marginally superior results. Our best-performing model attained Dice scores of 0.965 and 0.947 for segmenting kidney parenchyma in two test sets (20 scans without abnormalities and 30 with abnormalities), outperforming an independent human observer who scored 0.944 and 0.925, respectively. In segmenting kidney abnormalities within the 30 test scans containing them, the top-performing method achieved a Dice score of 0.585, while an independent second human observer reached a score of 0.664, suggesting potential for further improvement in computerized methods. All training data is available to the research community under a CC-BY 4.0 license on https://doi.org/10.5281/zenodo.8014289
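The modified top-$k$ loss keeps only the hardest fraction of voxels when averaging the per-voxel loss, focusing training on difficult regions such as small abnormalities. A hedged PyTorch sketch; the percentage and base loss are illustrative.

```python
import torch
import torch.nn.functional as F

def topk_cross_entropy(logits, target, k_frac=0.1):
    """Average the cross-entropy over only the k_frac hardest voxels,
    a common 'top-k' loss variant for class-imbalanced segmentation."""
    # logits: (B, C, D, H, W); target: (B, D, H, W) integer labels
    voxel_loss = F.cross_entropy(logits, target, reduction="none").flatten()
    k = max(1, int(k_frac * voxel_loss.numel()))
    hardest, _ = torch.topk(voxel_loss, k)  # keep the k largest losses
    return hardest.mean()

logits = torch.randn(2, 3, 16, 64, 64, requires_grad=True)
target = torch.randint(0, 3, (2, 16, 64, 64))
loss = topk_cross_entropy(logits, target)
loss.backward()
print(float(loss))
```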

Active shooter detection and robust tracking utilizing supplemental synthetic data

  • paper_url: http://arxiv.org/abs/2309.03381
  • repo_url: None
  • paper_authors: Joshua R. Waite, Jiale Feng, Riley Tavassoli, Laura Harris, Sin Yong Tan, Subhadeep Chakraborty, Soumik Sarkar
  • for: Motivated by growing concern over gun violence in the United States, this work develops a public-safety system that detects and tracks shooters to help prevent or mitigate violent incidents.
  • methods: The paper proposes detecting shooters as a whole rather than just guns, improving tracking robustness since obscuring the gun no longer causes the system to lose sight of the threat. Because public data on shooters is limited and challenging to create, domain randomization and transfer learning are used with synthetic data from Unreal Engine environments to improve generalization.
  • results: Using YOLOv8 and Deep OC-SORT, an initial version of a shooter tracking system was implemented that runs on edge hardware, including a Raspberry Pi and a Jetson Nano.
    Abstract The increasing concern surrounding gun violence in the United States has led to a focus on developing systems to improve public safety. One approach to developing such a system is to detect and track shooters, which would help prevent or mitigate the impact of violent incidents. In this paper, we proposed detecting shooters as a whole, rather than just guns, which would allow for improved tracking robustness, as obscuring the gun would no longer cause the system to lose sight of the threat. However, publicly available data on shooters is much more limited and challenging to create than a gun dataset alone. Therefore, we explore the use of domain randomization and transfer learning to improve the effectiveness of training with synthetic data obtained from Unreal Engine environments. This enables the model to be trained on a wider range of data, increasing its ability to generalize to different situations. Using these techniques with YOLOv8 and Deep OC-SORT, we implemented an initial version of a shooter tracking system capable of running on edge hardware, including both a Raspberry Pi and a Jetson Nano.

ViewMix: Augmentation for Robust Representation in Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2309.03360
  • repo_url: None
  • paper_authors: Arjon Das, Xin Zhong
  • for: To propose ViewMix, an augmentation policy designed for self-supervised joint embedding representation learning.
  • methods: When generating different views of the same image, patches are cut from one view and pasted into another; the resulting views serve as positive pairs for joint embedding-based self-supervised methods.
  • results: Multiple joint embedding-based methods obtain better localization capability and consistently outperform their baselines with ViewMix; the policy also promotes representation robustness and introduces no additional compute overhead.
    Abstract Joint Embedding Architecture-based self-supervised learning methods have identified the composition of data augmentations as a crucial factor for their strong representation learning capabilities. While regional dropout strategies have proven to guide models to focus on lesser indicative parts of the objects in supervised methods, they haven't been adopted by self-supervised methods for generating positive pairs. This is because the regional dropout methods are not suitable for the input sampling process of the self-supervised methodology. Whereas dropping informative pixels from the positive pairs can result in inefficient training, replacing patches of a specific object with a different one can steer the model away from maximizing the agreement between different positive pairs. Moreover, joint embedding representation learning methods have not made robustness their primary training outcome. To this end, we propose the ViewMix augmentation policy, specially designed for self-supervised learning: upon generating different views of the same image, patches are cut and pasted from one view to another. By leveraging the different views created by this augmentation strategy, multiple joint embedding-based self-supervised methodologies obtained better localization capability and consistently outperformed their corresponding baseline methods. It is also demonstrated that incorporating ViewMix augmentation policy promotes robustness of the representations in the state-of-the-art methods. Furthermore, our experimentation and analysis of compute times suggest that ViewMix augmentation doesn't introduce any additional overhead compared to other counterparts.
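ViewMix cuts a patch from one augmented view and pastes it into another view of the same image, so both positives still depict the same instance. A hedged numpy sketch of the patch transfer; patch sizing and sampling are assumptions.

```python
import numpy as np

def viewmix(view_a, view_b, rng, frac=0.3):
    """Paste a random rectangle from view_a into view_b (both are augmented
    views of the SAME image, so the pair remains a valid positive)."""
    h, w = view_a.shape[:2]
    ph, pw = int(h * frac), int(w * frac)
    ys, xs = rng.integers(0, h - ph), rng.integers(0, w - pw)  # source box
    yd, xd = rng.integers(0, h - ph), rng.integers(0, w - pw)  # destination
    mixed = view_b.copy()
    mixed[yd:yd + ph, xd:xd + pw] = view_a[ys:ys + ph, xs:xs + pw]
    return mixed

rng = np.random.default_rng(0)
view_a = rng.random((224, 224, 3))  # stand-ins for two augmented views
view_b = rng.random((224, 224, 3))
positive_pair = (view_a, viewmix(view_a, view_b, rng))
```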

Source Camera Identification and Detection in Digital Videos through Blind Forensics

  • paper_url: http://arxiv.org/abs/2309.03353
  • repo_url: None
  • paper_authors: Venkata Udaya Sameer, Shilpa Mukhopadhyay, Ruchira Naskar, Ishaan Dali
  • for: To verify the source of a digital video, i.e., to determine whether a claimed source device actually produced the video and, if not, to identify the original source.
  • methods: A blind forensic technique based on feature extraction, feature selection, and subsequent source classification using machine learning.
  • results: Experimental results demonstrate the efficiency of the proposed method compared to the traditional fingerprint-based (PRNU/SPN) technique.
    Abstract Source camera identification in digital videos is the problem of associating an unknown digital video with its source device, within a closed set of possible devices. The existing techniques in source detection of digital videos try to find a fingerprint of the actual source in the video in the form of PRNU (Photo Response Non-Uniformity), and match it against the SPN (Sensor Pattern Noise) of each possible device. The highest correlation indicates the correct source. We investigate the problem of identifying a video source through a feature based approach using machine learning. In this paper, we present a blind forensic technique of video source authentication and identification, based on feature extraction, feature selection and subsequent source classification. The main aim is to determine whether a claimed source for a video is actually its original source. If not, we identify its original source. Our experimental results prove the efficiency of the proposed method compared to the traditional fingerprint based technique.
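The traditional baseline the paper compares against correlates a video's noise residual with each candidate camera's sensor pattern noise (SPN). A hedged sketch of that fingerprint matching with a crude denoising residual; real pipelines use wavelet denoising and peak-to-correlation-energy statistics.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(frame):
    # Crude stand-in for wavelet denoising: residual = frame - smooth.
    return frame - gaussian_filter(frame, sigma=1.5)

def ncc(a, b):
    # Normalized cross-correlation between residual and fingerprint.
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(1)
# Toy SPN "fingerprints" for three candidate devices.
spn_bank = {f"cam{i}": rng.standard_normal((480, 640)) * 0.01 for i in range(3)}

frame = rng.random((480, 640)) + spn_bank["cam1"]  # frame shot on cam1
residual = noise_residual(frame)
scores = {cam: ncc(residual, spn) for cam, spn in spn_bank.items()}
print(max(scores, key=scores.get))                 # highest correlation: cam1
```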

Using Neural Networks for Fast SAR Roughness Estimation of High Resolution Images

  • paper_url: http://arxiv.org/abs/2309.03351
  • repo_url: https://github.com/jeovafarias/sar-roughness-estimation-neural-nets
  • paper_authors: Li Fan, Jeova Farias Sales Rocha Neto
  • for: To provide quick and reliable estimation of the roughness parameter of the $G_I^0$ distribution from speckle-affected Synthetic Aperture Radar (SAR) images, especially at high resolution.
  • methods: A neural network first learns to predict the underlying parameters of $G_I^0$ samples and is then used to estimate the roughness of unseen data; the same methodology is generalized to handle image inputs.
  • results: The estimator is quicker, yields less estimation error, and is less prone to failures than traditional estimation procedures, even with a simple network. Trained on purely synthetic data for a few seconds, it performs real-time pixel-wise roughness estimation on high-resolution real SAR imagery.
    Abstract The analysis of Synthetic Aperture Radar (SAR) imagery is an important step in remote sensing applications, and it is a challenging problem due to its inherent speckle noise. One typical solution is to model the data using the $G_I^0$ distribution and extract its roughness information, which in turn can be used in posterior imaging tasks, such as segmentation, classification and interpretation. This leads to the need of quick and reliable estimation of the roughness parameter from SAR data, especially with high resolution images. Unfortunately, traditional parameter estimation procedures are slow and prone to estimation failures. In this work, we proposed a neural network-based estimation framework that first learns how to predict underlying parameters of $G_I^0$ samples and then can be used to estimate the roughness of unseen data. We show that this approach leads to an estimator that is quicker, yields less estimation error and is less prone to failures than the traditional estimation procedures for this problem, even when we use a simple network. More importantly, we show that this same methodology can be generalized to handle image inputs and, even if trained on purely synthetic data for a few seconds, is able to perform real time pixel-wise roughness estimation for high resolution real SAR imagery.
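The estimator learns a mapping from $G_I^0$ samples to the roughness parameter. A hedged sketch: simulate samples with the product construction commonly attributed to this model (unit-mean Gamma speckle times inverse-Gamma texture), summarize them with log-moment features, and fit a small regressor. The construction, features, and network design here are stand-ins for the paper's.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def gi0_samples(alpha, gamma, L, n):
    """One common construction of G_I^0 intensity data: Z = X * Y with
    X ~ Gamma(L, 1/L) speckle (unit mean) and Y ~ inverse-Gamma(-alpha, gamma).
    Treat this as an assumption; the paper may parameterize differently."""
    speckle = rng.gamma(shape=L, scale=1.0 / L, size=n)
    texture = gamma / rng.gamma(shape=-alpha, scale=1.0, size=n)
    return speckle * texture

def features(z):
    lz = np.log(z)
    return [lz.mean(), lz.var(), np.log(z.mean()), np.log(z.var() + 1e-12)]

# Synthetic training set: roughness alpha in a typical range, L looks fixed;
# gamma = -alpha - 1 keeps the texture mean near one.
alphas = rng.uniform(-10.0, -1.5, size=4000)
X = np.array([features(gi0_samples(a, gamma=-a - 1.0, L=4, n=512))
              for a in alphas])
reg = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, alphas)

test = gi0_samples(alpha=-4.0, gamma=3.0, L=4, n=512)
print("estimated alpha:", reg.predict([features(test)])[0])
```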

SADIR: Shape-Aware Diffusion Models for 3D Image Reconstruction

  • paper_url: http://arxiv.org/abs/2309.03335
  • repo_url: None
  • paper_authors: Nivetha Jayakumar, Tonmoy Hossain, Miaomiao Zhang
  • for: To improve the accuracy and shape preservation of 3D image reconstruction from a limited number of 2D images using deep learning.
  • methods: A shape-aware network based on diffusion models (SADIR) jointly learns a mean shape under deformation models; each reconstructed image is treated as a deformed variant of the mean shape, so shape priors learned from the training data guide the reconstruction process.
  • results: On brain and cardiac MRIs, SADIR outperforms the baselines with lower reconstruction error and better preservation of the shape structure of objects within the images.
    Abstract 3D image reconstruction from a limited number of 2D images has been a long-standing challenge in computer vision and image analysis. While deep learning-based approaches have achieved impressive performance in this area, existing deep networks often fail to effectively utilize the shape structures of objects presented in images. As a result, the topology of reconstructed objects may not be well preserved, leading to the presence of artifacts such as discontinuities, holes, or mismatched connections between different parts. In this paper, we propose a shape-aware network based on diffusion models for 3D image reconstruction, named SADIR, to address these issues. In contrast to previous methods that primarily rely on spatial correlations of image intensities for 3D reconstruction, our model leverages shape priors learned from the training data to guide the reconstruction process. To achieve this, we develop a joint learning network that simultaneously learns a mean shape under deformation models. Each reconstructed image is then considered as a deformed variant of the mean shape. We validate our model, SADIR, on both brain and cardiac magnetic resonance images (MRIs). Experimental results show that our method outperforms the baselines with lower reconstruction error and better preservation of the shape structure of objects within the images.

Expert Uncertainty and Severity Aware Chest X-Ray Classification by Multi-Relationship Graph Learning

  • paper_url: http://arxiv.org/abs/2309.03331
  • repo_url: None
  • paper_authors: Mengliang Zhang, Xinyue Hu, Lin Gu, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Ronald M. Summers, Yingying Zhu
  • for: To make disease labels extracted from chest X-ray (CXR) reports more realistic, since complex pathologies, subtle texture changes of different lung lesions, and patient differences leave even experienced radiologists uncertain, introducing noise into label extraction.
  • methods: Disease labels with severity and uncertainty are re-extracted by a rule-based approach using keywords discussed with clinical experts; a multi-relationship graph learning method with an expert uncertainty-aware loss function further improves the explainability of chest X-ray diagnosis.
  • results: Experiments show that models considering disease severity and uncertainty outperform previous state-of-the-art methods.
    Abstract Patients undergoing chest X-rays (CXR) often suffer from multiple lung diseases. When evaluating a patient's condition, due to the complex pathologies, the subtle texture changes of different lung lesions in images, and differences in patient condition, radiologists may remain uncertain even after long-term clinical training and professional guidance, which introduces much noise into disease labels extracted from CXR reports. In this paper, we re-extract disease labels from CXR reports to make them more realistic by considering disease severity and uncertainty in classification. Our contributions are as follows: 1. We re-extracted the disease labels with severity and uncertainty by a rule-based approach with keywords discussed with clinical experts. 2. To further improve the explainability of chest X-ray diagnosis, we designed a multi-relationship graph learning method with an expert uncertainty-aware loss function. 3. Our multi-relationship graph learning method can also interpret the disease classification results. Our experimental results show that models considering disease severity and uncertainty outperform previous state-of-the-art methods.
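The label re-extraction is rule-based: keywords agreed with clinical experts map report sentences to (disease, severity, uncertainty) triples. A hedged toy sketch of such keyword matching; the keyword lists here are illustrative, not the paper's curated sets.

```python
import re

# Illustrative keyword lists; the paper's sets were curated with clinicians.
SEVERITY = {"mild": 1, "moderate": 2, "severe": 3}
UNCERTAIN = ["may", "possible", "cannot exclude", "suggestive of", "likely"]
DISEASES = ["atelectasis", "effusion", "pneumonia", "edema"]

def extract_labels(report: str):
    labels = []
    for sent in re.split(r"[.;]\s*", report.lower()):
        for disease in DISEASES:
            if disease not in sent:
                continue
            severity = next((v for k, v in SEVERITY.items() if k in sent), 0)
            uncertain = any(cue in sent for cue in UNCERTAIN)
            labels.append({"disease": disease, "severity": severity,
                           "uncertain": uncertain})
    return labels

report = "Moderate pleural effusion. Findings may represent early pneumonia."
print(extract_labels(report))
```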

MEGANet: Multi-Scale Edge-Guided Attention Network for Weak Boundary Polyp Segmentation

  • paper_url: http://arxiv.org/abs/2309.03329
  • repo_url: https://github.com/dinhhieuhoang/meganet
  • paper_authors: Nhat-Tan Bui, Dinh-Hieu Hoang, Quang-Thuc Nguyen, Minh-Triet Tran, Ngan Le
  • for: To improve the accuracy of polyp segmentation in colonoscopy images, supporting early diagnosis of colorectal cancer.
  • methods: A Multi-Scale Edge-Guided Attention Network (MEGANet) combines a classical edge detection technique (the Laplacian operator) with an attention mechanism to preserve high-frequency information such as edges and boundaries; it comprises an encoder, a decoder, and an Edge-Guided Attention (EGA) module.
  • results: Extensive qualitative and quantitative experiments on five benchmark datasets show that the network outperforms existing SOTA methods under six evaluation metrics.
    Abstract Efficient polyp segmentation in healthcare plays a critical role in enabling early diagnosis of colorectal cancer. However, the segmentation of polyps presents numerous challenges, including the intricate distribution of backgrounds, variations in polyp sizes and shapes, and indistinct boundaries. Defining the boundary between the foreground (i.e. polyp itself) and the background (surrounding tissue) is difficult. To mitigate these challenges, we propose Multi-Scale Edge-Guided Attention Network (MEGANet) tailored specifically for polyp segmentation within colonoscopy images. This network draws inspiration from the fusion of a classical edge detection technique with an attention mechanism. By combining these techniques, MEGANet effectively preserves high-frequency information, notably edges and boundaries, which tend to erode as neural networks deepen. MEGANet is designed as an end-to-end framework, encompassing three key modules: an encoder, which is responsible for capturing and abstracting the features from the input image, a decoder, which focuses on salient features, and the Edge-Guided Attention module (EGA) that employs the Laplacian Operator to accentuate polyp boundaries. Extensive experiments, both qualitative and quantitative, on five benchmark datasets, demonstrate that our EGANet outperforms other existing SOTA methods under six evaluation metrics. Our code is available at \url{https://github.com/DinhHieuHoang/MEGANet}
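The EGA module applies the Laplacian operator to accentuate boundaries and uses the result as attention. A hedged PyTorch sketch of a Laplacian-based attention gate over decoder features; the exact wiring in MEGANet differs, and the linked repo has the real implementation.

```python
import torch
import torch.nn.functional as F

# Fixed 3x3 Laplacian kernel, applied depthwise to highlight boundaries.
LAPLACIAN = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])

def edge_guided_attention(feat: torch.Tensor) -> torch.Tensor:
    """Reweight features by their Laplacian edge response, so weak polyp
    boundaries are accentuated rather than eroded in deeper layers."""
    c = feat.shape[1]
    kernel = LAPLACIAN.expand(c, 1, 3, 3).to(feat.dtype)
    edges = F.conv2d(feat, kernel, padding=1, groups=c)  # depthwise Laplacian
    attn = torch.sigmoid(edges.abs())                    # edge-driven gate
    return feat * attn + feat                            # residual gating

feat = torch.randn(2, 64, 88, 88)  # a decoder feature map (illustrative)
print(edge_guided_attention(feat).shape)
```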

C-CLIP: Contrastive Image-Text Encoders to Close the Descriptive-Commentative Gap

  • paper_url: http://arxiv.org/abs/2309.03921
  • repo_url: None
  • paper_authors: William Theisen, Walter Scheirer
  • for: To improve the performance of multimodal image-text models on social media content, across languages and platforms.
  • methods: Contrastive image-text encoders are trained on explicitly commentative image-caption pairs rather than the descriptive pairs used in standard CLIP training.
  • results: Training on commentative pairs yields large improvements in retrieval results, extending across a variety of non-English languages.
    Abstract The interplay between the image and comment on a social media post is one of high importance for understanding its overall message. Recent strides in multimodal embedding models, namely CLIP, have provided an avenue forward in relating image and text. However the current training regime for CLIP models is insufficient for matching content found on social media, regardless of site or language. Current CLIP training data is based on what we call ``descriptive'' text: text in which an image is merely described. This is something rarely seen on social media, where the vast majority of text content is ``commentative'' in nature. The captions provide commentary and broader context related to the image, rather than describing what is in it. Current CLIP models perform poorly on retrieval tasks where image-caption pairs display a commentative relationship. Closing this gap would be beneficial for several important application areas related to social media. For instance, it would allow groups focused on Open-Source Intelligence Operations (OSINT) to further aid efforts during disaster events, such as the ongoing Russian invasion of Ukraine, by easily exposing data to non-technical users for discovery and analysis. In order to close this gap we demonstrate that training contrastive image-text encoders on explicitly commentative pairs results in large improvements in retrieval results, with the results extending across a variety of non-English languages.

CoNeS: Conditional neural fields with shift modulation for multi-sequence MRI translation

  • paper_url: http://arxiv.org/abs/2309.03320
  • repo_url: https://github.com/cyjdswx/cones
  • paper_authors: Yunjie Chen, Marius Staring, Olaf M. Neve, Stephan R. Romeijn, Erik F. Hensen, Berit M. Verbist, Jelmer M. Wolterink, Qian Tao
  • for: To synthesize missing MRI sequences so that deep learning models trained on multi-sequence data remain usable in clinical practice.
  • methods: Conditional neural fields with shift modulation (CoNeS) take voxel coordinates as input and use a multi-layer perceptron (MLP), instead of a CNN, as the decoder for pixel-to-pixel mapping; each target image is represented as a neural field conditioned on the source image via shift modulation with a learned latent code.
  • results: On BraTS 2018 and an in-house clinical dataset of vestibular schwannoma patients, the method outperforms state-of-the-art multi-sequence MRI translation approaches both visually and quantitatively and better preserves high-frequency details; the synthesized images were also evaluated on a downstream segmentation task.
    Abstract Multi-sequence magnetic resonance imaging (MRI) has found wide applications in both modern clinical studies and deep learning research. However, in clinical practice, it frequently occurs that one or more of the MRI sequences are missing due to different image acquisition protocols or contrast agent contraindications of patients, limiting the utilization of deep learning models trained on multi-sequence data. One promising approach is to leverage generative models to synthesize the missing sequences, which can serve as a surrogate acquisition. State-of-the-art methods tackling this problem are based on convolutional neural networks (CNN) which usually suffer from spectral biases, resulting in poor reconstruction of high-frequency fine details. In this paper, we propose Conditional Neural fields with Shift modulation (CoNeS), a model that takes voxel coordinates as input and learns a representation of the target images for multi-sequence MRI translation. The proposed model uses a multi-layer perceptron (MLP) instead of a CNN as the decoder for pixel-to-pixel mapping. Hence, each target image is represented as a neural field that is conditioned on the source image via shift modulation with a learned latent code. Experiments on BraTS 2018 and an in-house clinical dataset of vestibular schwannoma patients showed that the proposed method outperformed state-of-the-art methods for multi-sequence MRI translation both visually and quantitatively. Moreover, we conducted spectral analysis, showing that CoNeS was able to overcome the spectral bias issue common in conventional CNN models. To further evaluate the usage of synthesized images in clinical downstream tasks, we tested a segmentation network using the synthesized images at inference.
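Shift modulation conditions the MLP decoder by adding a latent-derived shift to each hidden layer's pre-activation. A hedged sketch of one such modulated coordinate MLP; layer sizes and where the latent code comes from are assumptions.

```python
import torch
import torch.nn as nn

class ShiftModulatedMLP(nn.Module):
    """Coordinate MLP whose hidden activations are shifted by per-layer
    vectors predicted from a latent code (shift-only FiLM-style modulation)."""
    def __init__(self, d_hidden=128, n_layers=4, d_latent=64, d_out=1):
        super().__init__()
        self.inp = nn.Linear(2, d_hidden)  # (x, y) pixel coordinates
        self.hidden = nn.ModuleList(nn.Linear(d_hidden, d_hidden)
                                    for _ in range(n_layers))
        self.shifts = nn.ModuleList(nn.Linear(d_latent, d_hidden)
                                    for _ in range(n_layers))
        self.out = nn.Linear(d_hidden, d_out)

    def forward(self, coords, z):
        h = torch.relu(self.inp(coords))
        for layer, shift in zip(self.hidden, self.shifts):
            h = torch.relu(layer(h) + shift(z))  # additive latent shift
        return self.out(h)

net = ShiftModulatedMLP()
coords = torch.rand(4096, 2)             # query coordinates
z = torch.randn(1, 64).expand(4096, 64)  # latent code from the source image
print(net(coords, z).shape)              # predicted target intensities
```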

Bayes’ Rays: Uncertainty Quantification for Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2309.03185
  • repo_url: https://github.com/BayesRays/BayesRays
  • paper_authors: Lily Goli, Cody Reading, Silvia Sellán, Alec Jacobson, Andrea Tagliasacchi
  • for: To quantify the uncertainty of pre-trained Neural Radiance Fields (NeRFs).
  • methods: BayesRays is a post-hoc framework that evaluates uncertainty in any pre-trained NeRF without modifying the training process, establishing a volumetric uncertainty field using spatial perturbations and a Bayesian Laplace approximation.
  • results: The algorithm is derived statistically and shows superior performance in key metrics and applications; additional results are available at https://bayesrays.github.io.
    Abstract Neural Radiance Fields (NeRFs) have shown promise in applications like view synthesis and depth estimation, but learning from multiview images faces inherent uncertainties. Current methods to quantify them are either heuristic or computationally demanding. We introduce BayesRays, a post-hoc framework to evaluate uncertainty in any pre-trained NeRF without modifying the training process. Our method establishes a volumetric uncertainty field using spatial perturbations and a Bayesian Laplace approximation. We derive our algorithm statistically and show its superior performance in key metrics and applications. Additional results available at: https://bayesrays.github.io.

3D Transformer based on deformable patch location for differential diagnosis between Alzheimer’s disease and Frontotemporal dementia

  • paper_url: http://arxiv.org/abs/2309.03183
  • repo_url: None
  • paper_authors: Huy-Dung Nguyen, Michaël Clément, Boris Mansencal, Pierrick Coupé
  • for: To propose a transformer-based architecture for 3D medical data that improves the multi-class differential diagnosis of Alzheimer's disease and Frontotemporal dementia.
  • methods: A 3D transformer with a deformable patch location module improves accuracy; to overcome data scarcity, an efficient combination of data augmentation techniques is adapted for training transformer-based models on 3D structural MRI, and the transformer is combined with a traditional machine learning model using brain structure volumes.
  • results: Experiments show competitive results compared to state-of-the-art methods, and the deformable patch locations can be visualized, revealing the brain regions most relevant to the diagnosis of each disease.
    Abstract Alzheimer's disease and Frontotemporal dementia are common types of neurodegenerative disorders that present overlapping clinical symptoms, making their differential diagnosis very challenging. Numerous efforts have been made toward the diagnosis of each disease, but the problem of multi-class differential diagnosis has not been actively explored. In recent years, transformer-based models have demonstrated remarkable success in various computer vision tasks. However, their use in disease diagnosis is uncommon due to the limited amount of 3D medical data relative to the large size of such models. In this paper, we present a novel 3D transformer-based architecture using a deformable patch location module to improve the differential diagnosis of Alzheimer's disease and Frontotemporal dementia. Moreover, to overcome the problem of data scarcity, we propose an efficient combination of various data augmentation techniques, adapted for training transformer-based models on 3D structural magnetic resonance imaging data. Finally, we propose to combine our transformer-based model with a traditional machine learning model using brain structure volumes to better exploit the available data. Our experiments demonstrate the effectiveness of the proposed approach, showing competitive results compared to state-of-the-art methods. Moreover, the deformable patch locations can be visualized, revealing the most relevant brain regions used to establish the diagnosis of each disease.

SLiMe: Segment Like Me

  • paper_url: http://arxiv.org/abs/2309.03179
  • repo_url: None
  • paper_authors: Aliasghar Khani, Saeid Asgari Taghanaki, Aditya Sanghi, Ali Mahdavi Amiri, Ghassan Hamarneh
  • for: To propose a one-shot segmentation method that, at inference, segments any real-world image at the granularity of a single annotated example.
  • methods: Segmentation is framed as an optimization task over large vision-language models such as Stable Diffusion (SD): attention maps, including a novel "weighted accumulated self-attention map", are extracted from the SD prior, and SD's text embeddings are optimized so that each learns about a single segmented region of the training image.
  • results: SLiMe segments unseen images using just one example, outperforms other existing one-shot and few-shot segmentation methods, and improves further when additional training data (few-shot) is available.
    Abstract Significant strides have been made using large vision-language models, like Stable Diffusion (SD), for a variety of downstream tasks, including image editing, image correspondence, and 3D shape generation. Inspired by these advancements, we explore leveraging these extensive vision-language models for segmenting images at any desired granularity using as few as one annotated sample by proposing SLiMe. SLiMe frames this problem as an optimization task. Specifically, given a single training image and its segmentation mask, we first extract attention maps, including our novel "weighted accumulated self-attention map" from the SD prior. Then, using the extracted attention maps, the text embeddings of Stable Diffusion are optimized such that, each of them, learn about a single segmented region from the training image. These learned embeddings then highlight the segmented region in the attention maps, which in turn can then be used to derive the segmentation map. This enables SLiMe to segment any real-world image during inference with the granularity of the segmented region in the training image, using just one example. Moreover, leveraging additional training data when available, i.e. few-shot, improves the performance of SLiMe. We carried out a knowledge-rich set of experiments examining various design factors and showed that SLiMe outperforms other existing one-shot and few-shot segmentation methods.

3D Object Positioning Using Differentiable Multimodal Learning

  • paper_url: http://arxiv.org/abs/2309.03177
  • repo_url: None
  • paper_authors: Sean Zanyk-McLean, Krishna Kumar, Paul Navratil
  • for: To optimize an object's position with respect to an observer or reference objects in a computer graphics scene.
  • methods: Simulated Lidar data obtained via ray tracing is combined with image pixel loss under differentiable rendering; the object's position is optimized by gradient descent on a loss function influenced by both modalities.
  • results: Using two sensing modalities (image and Lidar) leads to faster convergence of the object position; the approach is potentially useful for autonomous vehicles, e.g., for establishing the locations of multiple actors in a scene.
    Abstract This article describes a multi-modal method that uses simulated Lidar data via ray tracing and image pixel loss with differentiable rendering to optimize an object's position with respect to an observer or some reference objects in a computer graphics scene. Object position optimization is completed using gradient descent, with the loss function being influenced by both modalities. Typical object placement optimization is done using image pixel loss with differentiable rendering only; this work shows that the use of a second modality (Lidar) leads to faster convergence. This method of fusing sensor input is potentially useful for autonomous vehicles, as it can be used to establish the locations of multiple actors in a scene. This article also presents a method for simulating multiple types of data to be used in the training of autonomous vehicles.
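Position optimization is plain gradient descent on a loss combining both modalities. A hedged PyTorch sketch with differentiable stand-ins for the rendered-image and Lidar terms; real differentiable rendering and ray tracing replace these toy losses, and the weights are illustrative.

```python
import torch

target_pos = torch.tensor([2.0, 0.5, -1.0])  # ground-truth placement
pos = torch.zeros(3, requires_grad=True)     # object position to optimize

def image_pixel_loss(p):   # stand-in for the differentiable-rendering loss
    return ((p - target_pos) ** 2).sum()

def lidar_loss(p):         # stand-in for the simulated-Lidar (ray-traced) loss
    return (p - target_pos).abs().sum()

opt = torch.optim.Adam([pos], lr=0.05)
for step in range(300):
    opt.zero_grad()
    # Both modalities shape the gradient, echoing the paper's fusion idea.
    loss = 1.0 * image_pixel_loss(pos) + 0.5 * lidar_loss(pos)
    loss.backward()
    opt.step()
print(pos.detach().round(decimals=2))        # converges to target_pos
```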

PDiscoNet: Semantically consistent part discovery for fine-grained recognition

  • paper_url: http://arxiv.org/abs/2309.03173
  • repo_url: https://github.com/robertdvdk/part_detection
  • paper_authors: Robert van der Klis, Stephan Alaniz, Massimiliano Mancini, Cassio F. Dantas, Dino Ienco, Zeynep Akata, Diego Marcos
  • for: To improve fine-grained classification by encouraging the model to first detect specific object parts and then use them to infer the class, making it easier to gauge whether the model attends to the right details.
  • methods: PDiscoNet discovers object parts using only image-level class labels, together with priors encouraging the parts to be discriminative, compact, distinct from each other, equivariant to rigid transforms, and active in at least some images; part-dropout and part feature vector modulation prevent a single part from dominating the classification and keep each part's information distinct.
  • results: On CUB, CelebA, and PartImageNet, the method provides substantially better part discovery than previous methods, without additional hyper-parameter tuning and without penalizing classification performance.
    Abstract Fine-grained classification often requires recognizing specific object parts, such as beak shape and wing patterns for birds. Encouraging a fine-grained classification model to first detect such parts and then using them to infer the class could help us gauge whether the model is indeed looking at the right details better than with interpretability methods that provide a single attribution map. We propose PDiscoNet to discover object parts by using only image-level class labels along with priors encouraging the parts to be: discriminative, compact, distinct from each other, equivariant to rigid transforms, and active in at least some of the images. In addition to using the appropriate losses to encode these priors, we propose to use part-dropout, where full part feature vectors are dropped at once to prevent a single part from dominating in the classification, and part feature vector modulation, which makes the information coming from each part distinct from the perspective of the classifier. Our results on CUB, CelebA, and PartImageNet show that the proposed method provides substantially better part discovery performance than previous methods while not requiring any additional hyper-parameter tuning and without penalizing the classification performance. The code is available at https://github.com/robertdvdk/part_detection.
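Part-dropout zeroes entire part feature vectors at once so no single part can dominate the classification. A hedged sketch; the drop probability and tensor layout are assumptions.

```python
import torch

def part_dropout(parts: torch.Tensor, p: float = 0.3, training: bool = True):
    """parts: (B, K, D) -- K part feature vectors per image. Unlike standard
    dropout, whole part vectors are dropped at once, forcing the classifier
    to spread evidence across parts."""
    if not training or p == 0.0:
        return parts
    keep = (torch.rand(parts.shape[:2], device=parts.device) > p).float()
    return parts * keep.unsqueeze(-1) / (1.0 - p)  # inverted-dropout scaling

parts = torch.randn(8, 6, 256)   # 8 images, 6 discovered parts each
print(part_dropout(parts).shape)
```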

ResFields: Residual Neural Fields for Spatiotemporal Signals

  • paper_url: http://arxiv.org/abs/2309.03160
  • repo_url: https://github.com/markomih/ResFields
  • paper_authors: Marko Mihajlovic, Sergey Prokudin, Marc Pollefeys, Siyu Tang
  • for: To model complex spatiotemporal signals, such as large neural signed distance fields (SDFs) or radiance fields (NeRFs), via a single multi-layer perceptron (MLP).
  • methods: Temporal residual layers are incorporated into neural fields, yielding ResFields, a novel class of networks designed to effectively represent complex temporal signals; a matrix factorization technique reduces the number of trainable parameters and enhances generalization.
  • results: The formulation integrates seamlessly with existing techniques and consistently improves results on 2D video approximation, dynamic shape modeling via temporal SDFs, and dynamic NeRF reconstruction, including capturing dynamic 3D scenes from the sparse sensory inputs of a lightweight capture system.
    Abstract Neural fields, a category of neural networks trained to represent high-frequency signals, have gained significant attention in recent years due to their impressive performance in modeling complex 3D data, especially large neural signed distance (SDFs) or radiance fields (NeRFs) via a single multi-layer perceptron (MLP). However, despite the power and simplicity of representing signals with an MLP, these methods still face challenges when modeling large and complex temporal signals due to the limited capacity of MLPs. In this paper, we propose an effective approach to address this limitation by incorporating temporal residual layers into neural fields, dubbed ResFields, a novel class of networks specifically designed to effectively represent complex temporal signals. We conduct a comprehensive analysis of the properties of ResFields and propose a matrix factorization technique to reduce the number of trainable parameters and enhance generalization capabilities. Importantly, our formulation seamlessly integrates with existing techniques and consistently improves results across various challenging tasks: 2D video approximation, dynamic shape modeling via temporal SDFs, and dynamic NeRF reconstruction. Lastly, we demonstrate the practical utility of ResFields by showcasing its effectiveness in capturing dynamic 3D scenes from sparse sensory inputs of a lightweight capture system.
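
A plausible reading of the temporal residual layer with matrix factorization is a linear layer whose weights receive a time-indexed low-rank correction; the rank and the per-frame coefficient table below are assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResFieldLinear(nn.Module):
    """Linear layer with time-dependent residual weights:
    W(t) = W + sum_r v_r(t) * M_r (low-rank factorization over time)."""
    def __init__(self, d_in, d_out, n_frames, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.coeff = nn.Parameter(torch.zeros(n_frames, rank))             # v_r(t)
        self.basis = nn.Parameter(torch.randn(rank, d_out, d_in) * 1e-3)   # M_r

    def forward(self, x, t):                 # t: integer frame index
        dW = torch.einsum('r,roi->oi', self.coeff[t], self.basis)
        return F.linear(x, self.base.weight + dW, self.base.bias)

layer = ResFieldLinear(3, 64, n_frames=100)
y = layer(torch.randn(16, 3), t=42)
print(y.shape)  # torch.Size([16, 64])
```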

Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling with IoU-Aware Calibration

  • paper_url: http://arxiv.org/abs/2309.03110
  • repo_url: None
  • paper_authors: Johannes Gilg, Torben Teepe, Fabian Herzog, Philipp Wolters, Gerhard Rigoll
  • for: Improving the reliability and interpretability of object detection systems.
  • methods: Replacing classic NMS post-processing with IoU-aware calibration.
  • results: Better-calibrated, more interpretable detections, outperforming the standard sequential NMS-plus-calibration approach with less complexity.
    Abstract Object detectors are at the heart of many semi- and fully autonomous decision systems and are poised to become even more indispensable. They are, however, still lacking in accessibility and can sometimes produce unreliable predictions. Especially concerning in this regard are the -- essentially hand-crafted -- non-maximum suppression algorithms that lead to an obfuscated prediction process and biased confidence estimates. We show that we can eliminate classic NMS-style post-processing by using IoU-aware calibration. IoU-aware calibration is a conditional Beta calibration; this makes it parallelizable with no hyper-parameters. Instead of arbitrary cutoffs or discounts, it implicitly accounts for the likelihood of each detection being a duplicate and adjusts the confidence score accordingly, resulting in empirically based precision estimates for each detection. Our extensive experiments on diverse detection architectures show that the proposed IoU-aware calibration can successfully model duplicate detections and improve calibration. Compared to the standard sequential NMS and calibration approach, our joint modeling can deliver performance gains over the best NMS-based alternative while producing consistently better-calibrated confidence predictions with less complexity. The code for all our experiments is publicly available at https://github.com/Blueblue4/IoU-AwareCalibration.
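
For reference, standard Beta calibration (Kull et al., 2017) maps a score s to sigmoid(a·ln s − b·ln(1−s) + c); a conditional, IoU-aware variant could select the parameters by each detection's overlap with higher-scoring boxes. The binning scheme and parameter values below are assumptions, not the paper's fitted model.

```python
import numpy as np

def beta_calibrate(score, a, b, c):
    """Beta calibration map: sigmoid(a*ln(s) - b*ln(1-s) + c)."""
    s = np.clip(score, 1e-6, 1 - 1e-6)
    z = a * np.log(s) - b * np.log(1.0 - s) + c
    return 1.0 / (1.0 + np.exp(-z))

def iou_aware_calibrate(score, iou_to_best_overlap, params_by_iou_bin):
    """Hypothetical conditional variant: pick Beta parameters by the
    detection's max IoU with higher-scoring boxes, so likely duplicates
    get their confidence discounted instead of being hard-suppressed."""
    bin_idx = min(int(iou_to_best_overlap * len(params_by_iou_bin)),
                  len(params_by_iou_bin) - 1)
    a, b, c = params_by_iou_bin[bin_idx]
    return beta_calibrate(score, a, b, c)

params = [(1.0, 1.0, 0.0), (1.0, 1.0, -0.5), (1.0, 1.0, -1.5)]  # assumed fits
print(iou_aware_calibrate(0.9, 0.75, params))
```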

FArMARe: a Furniture-Aware Multi-task methodology for Recommending Apartments based on the user interests

  • paper_url: http://arxiv.org/abs/2309.03100
  • repo_url: https://github.com/aliabdari/farmare
  • paper_authors: Ali Abdari, Alex Falcon, Giuseppe Serra
  • for: A text-query-based apartment recommendation system that eases the time-consuming process of searching for new accommodation.
  • methods: A multi-task approach, FArMARe, that supports cross-modal contrastive training with a furniture-aware objective.
  • results: Experiments with three different methods and two raw feature extraction procedures show the effectiveness of FArMARe on this problem; the authors also collect and annotate a dataset of more than 6000 apartments.
    Abstract Nowadays, many people frequently have to search for new accommodation options. Searching for a suitable apartment is a time-consuming process, especially because visiting them is often mandatory to assess the truthfulness of the advertisements found on the Web. While this process could be alleviated by visiting the apartments in the metaverse, the Web-based recommendation platforms are not suitable for the task. To address this shortcoming, in this paper, we define a new problem called text-to-apartment recommendation, which requires ranking the apartments based on their relevance to a textual query expressing the user's interests. To tackle this problem, we introduce FArMARe, a multi-task approach that supports cross-modal contrastive training with a furniture-aware objective. Since public datasets related to indoor scenes do not contain detailed descriptions of the furniture, we collect and annotate a dataset comprising more than 6000 apartments. A thorough experimentation with three different methods and two raw feature extraction procedures reveals the effectiveness of FArMARe in dealing with the problem at hand.

Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation

  • paper_url: http://arxiv.org/abs/2309.03072
  • repo_url: https://github.com/jungomi/character-queries
  • paper_authors: Michael Jungo, Beat Wolf, Andrii Maksai, Claudiu Musat, Andreas Fischer
  • for: Improving on-line handwritten character segmentation when the transcription is known beforehand, by casting segmentation as an assignment problem between sampling points of the stylus trajectory and characters in the text.
  • methods: A Transformer-based architecture in which each cluster is formed from a learned character query in the Transformer decoder block, viewed from the perspective of k-means-style cluster assignment.
  • results: New character segmentation ground truths for two popular on-line handwriting datasets (IAM-OnDB and HANDS-VNOnDB); across the evaluated methods, the proposed approach achieves the overall best results.
    Abstract On-line handwritten character segmentation is often associated with handwriting recognition and even though recognition models include mechanisms to locate relevant positions during the recognition process, it is typically insufficient to produce a precise segmentation. Decoupling the segmentation from the recognition unlocks the potential to further utilize the result of the recognition. We specifically focus on the scenario where the transcription is known beforehand, in which case the character segmentation becomes an assignment problem between sampling points of the stylus trajectory and characters in the text. Inspired by the $k$-means clustering algorithm, we view it from the perspective of cluster assignment and present a Transformer-based architecture where each cluster is formed based on a learned character query in the Transformer decoder block. In order to assess the quality of our approach, we create character segmentation ground truths for two popular on-line handwriting datasets, IAM-OnDB and HANDS-VNOnDB, and evaluate multiple methods on them, demonstrating that our approach achieves the overall best results.

Prompt-based All-in-One Image Restoration using CNNs and Transformer

  • paper_url: http://arxiv.org/abs/2309.03063
  • repo_url: None
  • paper_authors: Hu Gao, Jing Yang, Ning Wang, Jingfan Yang, Ying Zhang, Depeng Dang
  • for: Recovering high-quality images from degraded observations; most existing methods target a single degradation and therefore fall short in real-world scenarios.
  • methods: A data-ingredient-oriented approach in which an encoder captures features and degradation-specific prompts guide the decoder, combining CNN operations and Transformers to model local invariant properties and non-local information.
  • results: A single model that efficiently handles multiple degradation tasks and performs competitively with task-specific algorithms.
    Abstract Image restoration aims to recover high-quality images from their degraded observations. Since most existing methods have been dedicated to single degradation removal, they may not yield optimal results on other types of degradations, which does not satisfy real-world applications. In this paper, we propose a novel data ingredient-oriented approach that leverages prompt-based learning to enable a single model to efficiently tackle multiple image degradation tasks. Specifically, we utilize an encoder to capture features and introduce prompts with degradation-specific information to guide the decoder in adaptively recovering images affected by various degradations. To model the local invariant properties and non-local information for high-quality image restoration, we combine CNN operations and Transformers. Simultaneously, we make several key designs in the Transformer blocks (multi-head rearranged attention with prompts and a simple-gate feed-forward network) to reduce computational requirements and to selectively determine what information should be preserved to facilitate efficient recovery of potentially sharp images. Furthermore, we incorporate a feature fusion mechanism that further explores the multi-scale information to improve the aggregated features. Despite being designed to handle different types of degradations, the resulting tightly interlinked hierarchical architecture, named CAPTNet, performs competitively with task-specific algorithms, as extensive experiments demonstrate.

Adaptive Growth: Real-time CNN Layer Expansion

  • paper_url: http://arxiv.org/abs/2309.03049
  • repo_url: https://github.com/yunjiezhu/extensible-convolutional-layer-git-version
  • paper_authors: Yunjie Zhu, Yunhao Chen
  • for: Improving the adaptability and efficiency of deep learning models in dynamic environments.
  • methods: A dynamically growing convolutional layer that integrates into existing DNNs; kernels are added iteratively based on a real-time assessment of the layer's ability to discern image features.
  • results: The unsupervised method outperforms its supervised counterparts on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100, and shows enhanced adaptability in transfer learning scenarios.
    Abstract Deep Neural Networks (DNNs) have shown unparalleled achievements in numerous applications, reflecting their proficiency in managing vast data sets. Yet, their static structure limits their adaptability in ever-changing environments. This research presents a new algorithm that allows the convolutional layer of a Convolutional Neural Network (CNN) to dynamically evolve based on data input, while still being seamlessly integrated into existing DNNs. Instead of a rigid architecture, our approach iteratively introduces kernels to the convolutional layer, gauging its real-time response to varying data. This process is refined by evaluating the layer's capacity to discern image features, guiding its growth. Remarkably, our unsupervised method has outstripped its supervised counterparts across diverse datasets like MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100. It also showcases enhanced adaptability in transfer learning scenarios. By introducing a data-driven model scalability strategy, we are filling a void in deep learning, leading to more flexible and efficient DNNs suited for dynamic settings. Code:(https://github.com/YunjieZhu/Extensible-Convolutional-Layer-git-version).
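
A minimal sketch of growing a convolutional layer at run time, assuming PyTorch: existing kernels are copied and new ones appended (widening the consumers of this layer is omitted). The initialization choice and growth trigger are assumptions.

```python
import torch
import torch.nn as nn

def grow_conv(conv: nn.Conv2d, n_new: int) -> nn.Conv2d:
    """Return a copy of `conv` with `n_new` extra output kernels.
    Existing weights are preserved; new kernels are freshly initialized.
    Downstream layers must be widened accordingly (not shown)."""
    new = nn.Conv2d(conv.in_channels, conv.out_channels + n_new,
                    conv.kernel_size, conv.stride, conv.padding,
                    bias=conv.bias is not None)
    with torch.no_grad():
        new.weight[:conv.out_channels] = conv.weight
        nn.init.kaiming_normal_(new.weight[conv.out_channels:])
        if conv.bias is not None:
            new.bias[:conv.out_channels] = conv.bias
            new.bias[conv.out_channels:].zero_()
    return new

conv = nn.Conv2d(3, 16, 3, padding=1)
conv = grow_conv(conv, n_new=4)   # e.g. when feature discrimination stalls
print(conv(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 20, 32, 32])
```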

Exploring Semantic Consistency in Unpaired Image Translation to Generate Data for Surgical Applications

  • paper_url: http://arxiv.org/abs/2309.03048
  • repo_url: None
  • paper_authors: Danush Kumar Venkatesh, Dominik Rivoir, Micha Pfeiffer, Fiona Kolbinger, Marius Distler, Jürgen Weitz, Stefanie Speidel
  • for: Studying unpaired image-to-image translation for generating large annotated datasets with high semantic consistency for surgical computer vision applications.
  • methods: An empirical evaluation of state-of-the-art image translation models, with an explicit focus on semantic consistency, including structural-similarity loss and contrastive learning.
  • results: A simple combination of structural-similarity loss and contrastive learning yields the most semantically consistent images, which can be used more effectively as training data for downstream surgical semantic segmentation.
    Abstract In surgical computer vision applications, obtaining labeled training data is challenging due to data-privacy concerns and the need for expert annotation. Unpaired image-to-image translation techniques have been explored to automatically generate large annotated datasets by translating synthetic images to the realistic domain. However, preserving the structure and semantic consistency between the input and translated images presents significant challenges, mainly when there is a distributional mismatch in the semantic characteristics of the domains. This study empirically investigates unpaired image translation methods for generating suitable data in surgical applications, explicitly focusing on semantic consistency. We extensively evaluate various state-of-the-art image translation models on two challenging surgical datasets and downstream semantic segmentation tasks. We find that a simple combination of structural-similarity loss and contrastive learning yields the most promising results. Quantitatively, we show that the data generated with this approach yields higher semantic consistency and can be used more effectively as training data.

MCM: Multi-condition Motion Synthesis Framework for Multi-scenario

  • paper_url: http://arxiv.org/abs/2309.03031
  • repo_url: None
  • paper_authors: Zeyu Ling, Bo Han, Yongkang Wong, Mohan Kangkanhalli, Weidong Geng
  • for: Multi-condition human motion synthesis that accepts diverse conditional inputs such as text, music, and speech, covering scenarios from text-to-motion to music-to-dance.
  • methods: MCM, a two-branch design (a main branch and a control branch with the same structure, the latter initialized from the former) that integrates with any DDPM-like diffusion model while preserving its generative ability; the main branch is MWNet, a Transformer-based diffusion model with a channel-dimension self-attention module.
  • results: State-of-the-art results in text-to-motion and competitive results in music-to-dance, comparable to task-specific methods, while enabling effective multi-condition modal control ("once trained is motion need").
    Abstract The objective of the multi-condition human motion synthesis task is to incorporate diverse conditional inputs, encompassing various forms like text, music, speech, and more. This endows the task with the capability to adapt across multiple scenarios, ranging from text-to-motion and music-to-dance, among others. While existing research has primarily focused on single conditions, the multi-condition human motion generation remains underexplored. In this paper, we address these challenges by introducing MCM, a novel paradigm for motion synthesis that spans multiple scenarios under diverse conditions. The MCM framework is able to integrate with any DDPM-like diffusion model to accommodate multi-conditional information input while preserving its generative capabilities. Specifically, MCM employs two-branch architecture consisting of a main branch and a control branch. The control branch shares the same structure as the main branch and is initialized with the parameters of the main branch, effectively maintaining the generation ability of the main branch and supporting multi-condition input. We also introduce a Transformer-based diffusion model MWNet (DDPM-like) as our main branch that can capture the spatial complexity and inter-joint correlations in motion sequences through a channel-dimension self-attention module. Quantitative comparisons demonstrate that our approach achieves SoTA results in both text-to-motion and competitive results in music-to-dance tasks, comparable to task-specific methods. Furthermore, the qualitative evaluation shows that MCM not only streamlines the adaptation of methodologies originally designed for text-to-motion tasks to domains like music-to-dance and speech-to-gesture, eliminating the need for extensive network re-configurations but also enables effective multi-condition modal control, realizing "once trained is motion need".

SEAL: A Framework for Systematic Evaluation of Real-World Super-Resolution

  • paper_url: http://arxiv.org/abs/2309.03020
  • repo_url: https://github.com/xpixelgroup/seal
  • paper_authors: Wenlong Zhang, Xiaohui Li, Xiangyu Chen, Yu Qiao, Xiao-Ming Wu, Chao Dong
  • for: A systematic evaluation platform for real-world image super-resolution (real-SR) methods.
  • methods: SEAL, a framework that clusters the extensive degradation space into a set of representative test cases and applies a coarse-to-fine evaluation protocol.
  • results: Benchmarks of existing real-SR methods, a new strong baseline, and two new metrics, acceptance rate (AR) and relative performance ratio (RPR), derived from an acceptance line and an excellence line.
    Abstract Real-world Super-Resolution (real-SR) methods focus on dealing with diverse real-world images and have attracted increasing attention in recent years. The key idea is to use a complex and high-order degradation model to mimic real-world degradations. Although they have achieved impressive results in various scenarios, they are faced with the obstacle of evaluation. Currently, these methods are only assessed by their average performance on a small set of degradation cases randomly selected from a large space, which fails to provide a comprehensive understanding of their overall performance and often yields biased results. To overcome the limitation in evaluation, we propose SEAL, a framework for systematic evaluation of real-SR. In particular, we cluster the extensive degradation space to create a set of representative degradation cases, which serves as a comprehensive test set. Next, we propose a coarse-to-fine evaluation protocol to measure the distributed and relative performance of real-SR methods on the test set. The protocol incorporates two new metrics: acceptance rate (AR) and relative performance ratio (RPR), derived from an acceptance line and an excellence line. Under SEAL, we benchmark existing real-SR methods, obtain new observations and insights into their performance, and develop a new strong baseline. We consider SEAL as the first step towards creating an unbiased and comprehensive evaluation platform, which can promote the development of real-SR.
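
The exact definitions of AR and RPR are not given in the abstract; the sketch below shows one plausible reading under stated assumptions, in which AR is the fraction of representative degradation cases clearing the acceptance line and RPR normalizes performance between the acceptance and excellence lines.

```python
import numpy as np

def acceptance_rate(scores, acceptance_line):
    """Assumed reading of SEAL's AR: fraction of representative
    degradation cases whose score clears the acceptance line."""
    scores, acc = np.asarray(scores), np.asarray(acceptance_line)
    return float(np.mean(scores >= acc))

def relative_performance_ratio(scores, acceptance_line, excellence_line):
    """Assumed reading of RPR: performance normalized between the
    acceptance line (0) and the excellence line (1), averaged."""
    s = np.asarray(scores, dtype=float)
    a = np.asarray(acceptance_line, dtype=float)
    e = np.asarray(excellence_line, dtype=float)
    return float(np.mean((s - a) / (e - a)))

psnr = [27.1, 24.3, 25.8]            # a method's PSNR per degradation case
acc, exc = [25.0] * 3, [30.0] * 3    # hypothetical acceptance/excellence lines
print(acceptance_rate(psnr, acc), relative_performance_ratio(psnr, acc, exc))
```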

Sparse 3D Reconstruction via Object-Centric Ray Sampling

  • paper_url: http://arxiv.org/abs/2309.03008
  • repo_url: None
  • paper_authors: Llukman Cerkezi, Paolo Favaro
  • for: 3D object reconstruction from a sparse set of views captured from a 360-degree calibrated camera rig
  • methods: hybrid model using both MLP-based neural representation and triangle mesh, object-centric sampling scheme of the neural representation, and differentiable renderer
  • results: state of the art 3D reconstructions, does not require additional supervision of segmentation masks, works with sparse views on several datasets (Google’s Scanned Objects, Tank and Temples, and MVMC Car)
    Abstract We propose a novel method for 3D object reconstruction from a sparse set of views captured from a 360-degree calibrated camera rig. We represent the object surface through a hybrid model that uses both an MLP-based neural representation and a triangle mesh. A key contribution in our work is a novel object-centric sampling scheme of the neural representation, where rays are shared among all views. This efficiently concentrates and reduces the number of samples used to update the neural model at each iteration. This sampling scheme relies on the mesh representation to ensure also that samples are well-distributed along its normals. The rendering is then performed efficiently by a differentiable renderer. We demonstrate that this sampling scheme results in a more effective training of the neural representation, does not require the additional supervision of segmentation masks, yields state of the art 3D reconstructions, and works with sparse views on the Google's Scanned Objects, Tank and Temples and MVMC Car datasets.

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

  • paper_url: http://arxiv.org/abs/2309.02999
  • repo_url: https://github.com/ch3cook-fdu/vote2cap-detr
  • paper_authors: Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen
  • for: A simple yet effective transformer framework for 3D dense captioning.
  • methods: Decoupled, parallel decoding of caption generation and object localization, with task-specific localization and caption queries and an iterative spatial refinement strategy for the vote queries.
  • results: Extensive experiments on two commonly used datasets, ScanRefer and Nr3D, show that Vote2Cap-DETR and Vote2Cap-DETR++ surpass conventional "detect-then-describe" methods by a large margin.
    Abstract 3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a sophisticated "detect-then-describe" pipeline, which builds explicit relation modules upon a 3D detector with numerous hand-crafted components. While these methods have achieved initial success, the cascade pipeline tends to accumulate errors because of duplicated and inaccurate box estimations and messy 3D scenes. In this paper, we first propose Vote2Cap-DETR, a simple-yet-effective transformer framework that decouples the decoding process of caption generation and object localization through parallel decoding. Moreover, we argue that object localization and description generation require different levels of scene understanding, which could be challenging for a shared set of queries to capture. To this end, we propose an advanced version, Vote2Cap-DETR++, which decouples the queries into localization and caption queries to capture task-specific features. Additionally, we introduce the iterative spatial refinement strategy to vote queries for faster convergence and better localization performance. We also insert additional spatial information to the caption head for more accurate descriptions. Without bells and whistles, extensive experiments on two commonly used datasets, ScanRefer and Nr3D, demonstrate Vote2Cap-DETR and Vote2Cap-DETR++ surpass conventional "detect-then-describe" methods by a large margin. Codes will be made available at https://github.com/ch3cook-fdu/Vote2Cap-DETR.

Continual Evidential Deep Learning for Out-of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2309.02995
  • repo_url: None
  • paper_authors: Eduardo Aguilar, Bogdan Raducanu, Petia Radeva, Joost Van de Weijer
  • for: Performing incremental object classification and out-of-distribution (OOD) detection simultaneously, using an evidential deep learning method for OOD detection.
  • methods: CEDL integrates evidential deep learning into a continual learning framework, and analyzes the ability of vacuity and dissonance to differentiate in-distribution data from old classes and OOD data.
  • results: On CIFAR-100 under 5- and 10-task settings, CEDL provides object classification results comparable to the baseline while largely outperforming several posthoc OOD detection methods on AUROC, AUPR, and FPR95.
    Abstract Uncertainty-based deep learning models have attracted a great deal of interest for their ability to provide accurate and reliable predictions. Evidential deep learning stands out, achieving remarkable performance in detecting out-of-distribution (OOD) data with a single deterministic neural network. Motivated by this fact, in this paper we propose the integration of an evidential deep learning method into a continual learning framework in order to perform simultaneous incremental object classification and OOD detection. Moreover, we analyze the ability of vacuity and dissonance to differentiate between in-distribution data belonging to old classes and OOD data. The proposed method, called CEDL, is evaluated on CIFAR-100 considering two settings consisting of 5 and 10 tasks, respectively. From the obtained results, we can appreciate that the proposed method, in addition to providing comparable results in object classification with respect to the baseline, largely outperforms OOD detection compared to several posthoc methods on three evaluation metrics: AUROC, AUPR and FPR95.
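
Vacuity and dissonance have standard subjective-logic definitions for Dirichlet-based evidential classifiers (vacuity u = K/S with S = Σ_k α_k, and dissonance measuring conflicting belief masses); the sketch below implements those standard formulas, not necessarily CEDL's exact variant.

```python
import numpy as np

def vacuity_and_dissonance(evidence):
    """Subjective-logic uncertainty measures for an evidential classifier.
    evidence: non-negative per-class evidence e_k; alpha = e + 1."""
    e = np.asarray(evidence, dtype=float)
    alpha = e + 1.0
    S, K = alpha.sum(), len(alpha)
    b = e / S                         # per-class belief masses
    vacuity = K / S                   # high when total evidence is low (OOD cue)
    diss = 0.0
    for k in range(K):
        others = np.delete(b, k)
        if others.sum() > 0:
            bal = 1.0 - np.abs(others - b[k]) / (others + b[k] + 1e-12)
            diss += b[k] * (others * bal).sum() / others.sum()
    return vacuity, diss              # high dissonance = conflicting evidence

print(vacuity_and_dissonance([0.1, 0.1, 0.1]))   # near-max vacuity -> likely OOD
print(vacuity_and_dissonance([10.0, 9.0, 0.0]))  # low vacuity, high dissonance
```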

FishMOT: A Simple and Effective Method for Fish Tracking Based on IoU Matching

  • paper_url: http://arxiv.org/abs/2309.02975
  • repo_url: https://github.com/gakkistar/fishmot
  • paper_authors: Shuo Liu, Lulu Han, Xiaoyang Liu, Junli Ren, Fang Wang, Yuanshan Lin
  • for: An accurate and robust fish tracking method for fish behavior and ecology research.
  • methods: FishMOT (Multiple Object Tracking for Fish) combines object detection with IoU matching via three modules: a basic module that associates targets by the IoU of detection boxes across successive frames (handling morphological change), an interaction module that fuses detection-box IoU with fish-entity IoU (handling occlusion), and a refind module that uses spatio-temporal information to recover from missed detections in complex environments.
  • results: FishMOT outperforms state-of-the-art multi-object trackers and specialized fish tracking tools in MOTA, accuracy, computation time, and memory consumption, with excellent robustness and generalizability across environments and fish numbers.
    Abstract Fish tracking plays a vital role in understanding fish behavior and ecology. However, existing tracking methods face challenges in accuracy and robustness due to morphological changes of fish, occlusion, and complex environments. This paper proposes FishMOT (Multiple Object Tracking for Fish), a novel fish tracking approach combining object detection and IoU matching, comprising a basic module, an interaction module, and a refind module. The basic module performs target association based on the IoU of detection boxes between successive frames to deal with morphological changes of fish; the interaction module combines the IoU of detection boxes and the IoU of fish entities to handle occlusions; the refind module uses spatio-temporal information to overcome tracking failures resulting from missed detections under complex environments. FishMOT reduces computational complexity and memory consumption since it requires neither complex feature extraction nor identity assignment per fish, and does not need a Kalman filter to predict the detection boxes of successive frames. Experimental results demonstrate that FishMOT outperforms state-of-the-art multi-object trackers and specialized fish tracking tools in terms of MOTA, accuracy, computation time, memory consumption, etc. Furthermore, the method exhibits excellent robustness and generalizability for varying environments and fish numbers. The simplified workflow and strong performance make FishMOT a highly effective fish tracking approach. The source codes and pre-trained models are available at: https://github.com/gakkistar/FishMOT
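
The basic module's frame-to-frame association reduces to matching boxes by IoU; below is a greedy sketch of that step (the threshold is an assumption, and the paper's interaction and refind modules are not shown).

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def associate(prev_boxes, curr_boxes, thresh=0.3):
    """Greedy frame-to-frame association by IoU, in the spirit of
    FishMOT's basic module (threshold value is an assumption)."""
    cands = sorted(((iou(p, c), i, j) for i, p in enumerate(prev_boxes)
                    for j, c in enumerate(curr_boxes)), reverse=True)
    pairs, used_prev, used_curr = [], set(), set()
    for score, i, j in cands:
        if score < thresh or i in used_prev or j in used_curr:
            continue
        pairs.append((i, j)); used_prev.add(i); used_curr.add(j)
    return pairs  # unmatched detections would go to the refind module

prev = [(0, 0, 10, 10), (20, 20, 30, 30)]
curr = [(1, 1, 11, 11), (40, 40, 50, 50)]
print(associate(prev, curr))   # [(0, 0)]; track 1 is lost this frame
```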

Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction

  • paper_url: http://arxiv.org/abs/2309.02965
  • repo_url: None
  • paper_authors: Zhiying Leng, Shun-Cheng Wu, Mahdi Saleh, Antonio Montanaro, Hao Yu, Yin Wang, Nassir Navab, Xiaohui Liang, Federico Tombari
  • for: Precise hand-object reconstruction from a single RGB image by learning features in hyperbolic space, which better preserves the geometric properties of meshes.
  • methods: The Dynamic Hyperbolic Attention Network (DHANet) projects mesh and image features into a unified hyperbolic space through two modules: dynamic hyperbolic graph convolution and image-attention hyperbolic graph convolution.
  • results: Outperforms most state-of-the-art methods on three public datasets, offering a promising alternative for fine hand-object reconstruction.
    Abstract Reconstructing both objects and hands in 3D from a single RGB image is complex. Existing methods rely on manually defined hand-object constraints in Euclidean space, leading to suboptimal feature learning. Compared with Euclidean space, hyperbolic space better preserves the geometric properties of meshes thanks to its exponentially-growing space distance, which amplifies the differences between the features based on similarity. In this work, we propose the first precise hand-object reconstruction method in hyperbolic space, namely Dynamic Hyperbolic Attention Network (DHANet), which leverages intrinsic properties of hyperbolic space to learn representative features. Our method that projects mesh and image features into a unified hyperbolic space includes two modules, ie. dynamic hyperbolic graph convolution and image-attention hyperbolic graph convolution. With these two modules, our method learns mesh features with rich geometry-image multi-modal information and models better hand-object interaction. Our method provides a promising alternative for fine hand-object reconstruction in hyperbolic space. Extensive experiments on three public datasets demonstrate that our method outperforms most state-of-the-art methods.
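
Projecting Euclidean features into hyperbolic space is typically done with the exponential map at the origin of the Poincaré ball; the sketch below shows that standard map, not DHANet's full graph-convolution pipeline.

```python
import torch

def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6):
    """Exponential map at the origin of the Poincare ball with curvature c:
    exp_0(v) = tanh(sqrt(c) * ||v||) * v / (sqrt(c) * ||v||).
    A standard way to lift Euclidean features into hyperbolic space, where
    distances grow exponentially and amplify feature differences."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

feats = torch.randn(32, 128)               # e.g. mesh or image features
hyp = expmap0(feats)
print(bool(hyp.norm(dim=-1).max() < 1.0))  # True: points lie inside the unit ball
```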

Hierarchical-level rain image generative model based on GAN

  • paper_url: http://arxiv.org/abs/2309.02964
  • repo_url: None
  • paper_authors: Zhenyuan Liu, Tong Jia, Xingyu Xing, Jianfeng Wu, Junyi Chen
  • for: Probing the performance limitations (SOTIF problems) of autonomous-vehicle visual perception under rain by generating image data of varying rain intensity.
  • methods: RCCycleGAN, a hierarchical-level rain image generative model based on GAN that generates light, medium, and heavy rain, with rain intensities introduced as labels in a conditional GAN (CGAN); the model structure and training strategy are tuned to alleviate mode collapse.
  • results: PSNR improves by 2.58 dB and 0.74 dB and SSIM by 18% and 8% over the CycleGAN and DerainCycleGAN baselines, respectively; ablation experiments validate the model tuning.
    Abstract Autonomous vehicles are exposed to various weather during operation, which is likely to trigger the performance limitations of the perception system, leading to the safety of the intended functionality (SOTIF) problems. To efficiently generate data for testing the performance of visual perception algorithms under various weather conditions, a hierarchical-level rain image generative model, rain conditional CycleGAN (RCCycleGAN), is constructed. RCCycleGAN is based on the generative adversarial network (GAN) and can generate images of light, medium, and heavy rain. Different rain intensities are introduced as labels in conditional GAN (CGAN). Meanwhile, the model structure is optimized and the training strategy is adjusted to alleviate the problem of mode collapse. In addition, natural rain images of different intensities are collected and processed for model training and validation. Compared with the two baseline models, CycleGAN and DerainCycleGAN, the peak signal-to-noise ratio (PSNR) of RCCycleGAN on the test dataset is improved by 2.58 dB and 0.74 dB, and the structural similarity (SSIM) is improved by 18% and 8%, respectively. The ablation experiments are also carried out to validate the effectiveness of the model tuning.

Indoor Localization Using Radio, Vision and Audio Sensors: Real-Life Data Validation and Discussion

  • paper_url: http://arxiv.org/abs/2309.02961
  • repo_url: None
  • paper_authors: Ilayda Yaman, Guoda Tian, Erik Tegler, Patrik Persson, Nikhil Challa, Fredrik Tufvesson, Ove Edfors, Kalle Astrom, Steffen Malkowsky, Liang Liu
  • for: Comparing indoor localization methods using radio, vision, and audio sensors in the same environment.
  • methods: Evaluation of state-of-the-art algorithms on a real-life dataset: a machine learning algorithm for radio-based localization with massive MIMO, ORB-SLAM3 for vision-based localization with an RGB-D camera, and SFS2 for audio-based localization with microphone arrays.
  • results: A discussion of localization accuracy, reliability, calibration requirements, and potential system complexity per sensor, serving as a guideline and basis for developing robust, high-precision multi-sensory localization systems, e.g., via sensor fusion and context- and environment-aware adaptation.
    Abstract This paper investigates indoor localization methods using radio, vision, and audio sensors, respectively, in the same environment. The evaluation is based on state-of-the-art algorithms and uses a real-life dataset. More specifically, we evaluate a machine learning algorithm for radio-based localization with massive MIMO technology, an ORB-SLAM3 algorithm for vision-based localization with an RGB-D camera, and an SFS2 algorithm for audio-based localization with microphone arrays. Aspects including localization accuracy, reliability, calibration requirements, and potential system complexity are discussed to analyze the advantages and limitations of using different sensors for indoor localization tasks. The results can serve as a guideline and basis for further development of robust and high-precision multi-sensory localization systems, e.g., through sensor fusion and context and environment-aware adaptation.

A Non-Invasive Interpretable NAFLD Diagnostic Method Combining TCM Tongue Features

  • paper_url: http://arxiv.org/abs/2309.02959
  • repo_url: https://github.com/cshan-github/selectornet
  • paper_authors: Shan Cao, Qunsheng Ruan, Qingfeng Wu
  • for: A non-invasive, interpretable diagnostic method for non-alcoholic fatty liver disease (NAFLD) whose only required user-provided indicators are gender, age, height, weight, waist circumference, hip circumference, and a tongue image.
  • methods: Patients' physiological indicators are merged with tongue features and fed into a fusion network named SelectorNet, which combines attention mechanisms with feature selection mechanisms to autonomously learn which features matter.
  • results: An accuracy of 77.22% using only non-invasive data, along with compelling interpretability matrices.
    Abstract Non-alcoholic fatty liver disease (NAFLD) is a clinicopathological syndrome characterized by hepatic steatosis resulting from the exclusion of alcohol and other identifiable liver-damaging factors. It has emerged as a leading cause of chronic liver disease worldwide. Currently, the conventional methods for NAFLD detection are expensive and not suitable for users to perform daily diagnostics. To address this issue, this study proposes a non-invasive and interpretable NAFLD diagnostic method; the only required user-provided indicators are gender, age, height, weight, waist circumference, hip circumference, and a tongue image. This method involves merging patients' physiological indicators with tongue features, which are then input into a fusion network named SelectorNet. SelectorNet combines attention mechanisms with feature selection mechanisms, enabling it to autonomously learn the ability to select important features. The experimental results show that the proposed method achieves an accuracy of 77.22% using only non-invasive data, and it also provides compelling interpretability matrices. This study contributes to the early diagnosis of NAFLD and the intelligent advancement of TCM tongue diagnosis. The project in this paper is available at: https://github.com/cshan-github/SelectorNet.

Robust Visual Tracking by Motion Analyzing

  • paper_url: http://arxiv.org/abs/2309.03247
  • repo_url: https://github.com/XJLeoYu/Robust-Visual-Tracking-by-Motion-Analyzing
  • paper_authors: Mohammed Leo, Kurban Ubul, ShengJie Cheng, Michael Ma
  • for: A new video object segmentation algorithm that improves the accuracy and efficiency of visual object tracking (VOT).
  • methods: The target's motion pattern is described with a tensor structure obtained through Tucker2 tensor decomposition and integrated into the segmentation module.
  • results: State-of-the-art results on four benchmarks (LaSOT, AVisT, OTB100, and GOT-10k) while remaining capable of real-time operation.
    Abstract In recent years, Video Object Segmentation (VOS) has emerged as a complementary method to Video Object Tracking (VOT). VOS focuses on classifying all the pixels around the target, allowing for precise shape labeling, while VOT primarily focuses on the approximate region where the target might be. However, traditional segmentation modules usually classify pixels frame by frame, disregarding information between adjacent frames. In this paper, we propose a new algorithm that addresses this limitation by analyzing the motion pattern using the inherent tensor structure. The tensor structure, obtained through Tucker2 tensor decomposition, proves to be effective in describing the target's motion. By incorporating this information, we achieved results competitive with the state of the art on four benchmarks: LaSOT [fan2019lasot], AVisT [noman2022avist], OTB100 [7001050], and GOT-10k [huang2019got]. Furthermore, the proposed tracker is capable of real-time operation, adding value to its practical application.

M3D-NCA: Robust 3D Segmentation with Built-in Quality Control

  • paper_url: http://arxiv.org/abs/2309.02954
  • repo_url: None
  • paper_authors: John Kalkhof, Anirban Mukhopadhyay
  • for: 3D medical image segmentation based on Neural Cellular Automata (NCA) that remains practical in resource-constrained settings such as primary care facilities and conflict zones.
  • methods: M3D-NCA applies NCA segmentation to 3D medical images using n-level patchification, and exploits the variance in M3D-NCA outputs to build a quality metric that automatically detects segmentation errors.
  • results: M3D-NCA outperforms UNet models two orders of magnitude larger by 2% Dice on hippocampus and prostate segmentation, and can run on a Raspberry Pi 4 Model B (2GB RAM).
    Abstract Medical image segmentation relies heavily on large-scale deep learning models, such as UNet-based architectures. However, the real-world utility of such models is limited by their high computational requirements, which makes them impractical for resource-constrained environments such as primary care facilities and conflict zones. Furthermore, shifts in the imaging domain can render these models ineffective and even compromise patient safety if such errors go undetected. To address these challenges, we propose M3D-NCA, a novel methodology that leverages Neural Cellular Automata (NCA) segmentation for 3D medical images using n-level patchification. Moreover, we exploit the variance in M3D-NCA to develop a novel quality metric which can automatically detect errors in the segmentation process of NCAs. M3D-NCA outperforms the two magnitudes larger UNet models in hippocampus and prostate segmentation by 2% Dice and can be run on a Raspberry Pi 4 Model B (2GB RAM). This highlights the potential of M3D-NCA as an effective and efficient alternative for medical image segmentation in resource-constrained environments.
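
One plausible reading of the built-in quality control, given that NCA updates are stochastic: disagreement across repeated runs can flag unreliable segmentations. The run count and the variance-based score below are assumptions, not the paper's exact metric.

```python
import torch

@torch.no_grad()
def nca_quality_score(nca_forward, volume, n_runs: int = 4):
    """Run the stochastic NCA several times on the same volume; where the
    model is unsure, the runs disagree, so the mean per-voxel variance of
    the predicted probabilities serves as an automatic failure flag."""
    preds = torch.stack([torch.sigmoid(nca_forward(volume))
                         for _ in range(n_runs)])     # (n_runs, ...)
    per_voxel_var = preds.var(dim=0)
    return per_voxel_var.mean().item()                # higher = less trustworthy

# toy stand-in for a stochastic NCA segmentation model
fake_nca = lambda x: x + 0.1 * torch.randn_like(x)
score = nca_quality_score(fake_nca, torch.randn(1, 1, 16, 16, 16))
print(score)
```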

Patched Line Segment Learning for Vector Road Mapping

  • paper_url: http://arxiv.org/abs/2309.02923
  • repo_url: None
  • paper_authors: Jiakun Xu, Bowen Xu, Gui-Song Xia, Liang Dong, Nan Xue
  • for: A novel approach to computing vector road maps from satellite remote sensing images.
  • methods: Roads are represented with patched line segments that capture both location and orientation, making them a robust representation; the input image is divided into non-overlapping patches and a suitable line segment is predicted within each patch.
  • results: The effective road-graph representation substantially improves vector road mapping on established benchmarks without extensive changes to the network architecture, and training takes only 6 GPU hours, a 32-fold reduction in training cost compared with existing methods.
    Abstract This paper presents a novel approach to computing vector road maps from satellite remotely sensed images, building upon a well-defined Patched Line Segment (PaLiS) representation for road graphs that holds geometric significance. Unlike prevailing methods that derive road vector representations from satellite images using binary masks or keypoints, our method employs line segments. These segments not only convey road locations but also capture their orientations, making them a robust choice for representation. More precisely, given an input image, we divide it into non-overlapping patches and predict a suitable line segment within each patch. This strategy enables us to capture spatial and structural cues from these patch-based line segments, simplifying the process of constructing the road network graph without the necessity of additional neural networks for connectivity. In our experiments, we demonstrate how an effective representation of a road graph significantly enhances the performance of vector road mapping on established benchmarks, without requiring extensive modifications to the neural network architecture. Furthermore, our method achieves state-of-the-art performance with just 6 GPU hours of training, leading to a substantial 32-fold reduction in training costs in terms of GPU hours.

Towards Efficient Training with Negative Samples in Visual Tracking

  • paper_url: http://arxiv.org/abs/2309.02903
  • repo_url: None
  • paper_authors: Qingmao Wei, Bi Zeng, Guotian Zeng
  • for: Reducing the computational resources and training data required by modern visual object tracking methods, and thereby the risk of overfitting.
  • methods: Joint learning with Negative samples (JN) mixes negative samples (where the template object is absent from the search region) with positive ones from the outset, preventing the model from simply memorizing the target and forcing it to use the template for localization; a distribution-based head models the bounding box as a distribution of distances to express uncertainty under negative samples, and a target-indicating token encapsulates the target's precise location within the template image.
  • results: JN-256 achieves 75.8% AO on GOT-10k and 84.1% AUC on TrackingNet, outperforming previous SOTA trackers that use larger models and higher input resolutions, despite training on only half as many sampled data.
    Abstract Current state-of-the-art (SOTA) methods in visual object tracking often require extensive computational resources and vast amounts of training data, leading to a risk of overfitting. This study introduces a more efficient training strategy to mitigate overfitting and reduce computational requirements. We balance the training process with a mix of negative and positive samples from the outset, named as Joint learning with Negative samples (JN). Negative samples refer to scenarios where the object from the template is not present in the search region, which helps to prevent the model from simply memorizing the target, and instead encourages it to use the template for object location. To handle the negative samples effectively, we adopt a distribution-based head, which modeling the bounding box as distribution of distances to express uncertainty about the target's location in the presence of negative samples, offering an efficient way to manage the mixed sample training. Furthermore, our approach introduces a target-indicating token. It encapsulates the target's precise location within the template image. This method provides exact boundary details with negligible computational cost but improving performance. Our model, JN-256, exhibits superior performance on challenging benchmarks, achieving 75.8% AO on GOT-10k and 84.1% AUC on TrackingNet. Notably, JN-256 outperforms previous SOTA trackers that utilize larger models and higher input resolutions, even though it is trained with only half the number of data sampled used in those works.
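
A distribution-based box head in this spirit predicts each box side as a categorical distribution over discretized distances and decodes the expectation, with the distribution's spread expressing uncertainty; the bin count and range below are assumptions, not the paper's configuration.

```python
import torch

def distances_from_distribution(logits: torch.Tensor, max_dist: float = 16.0):
    """Decode a distribution-based box head: each of the 4 box sides is a
    categorical distribution over discretized distances; the box offset is
    its expectation, and the entropy can flag uncertainty (e.g. on negative
    samples where the target is absent)."""
    n_bins = logits.shape[-1]
    bins = torch.linspace(0, max_dist, n_bins)         # bin centers
    probs = logits.softmax(dim=-1)
    expected = (probs * bins).sum(dim=-1)              # (batch, 4) distances
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return expected, entropy

logits = torch.randn(8, 4, 17)                         # 8 samples, 4 sides, 17 bins
dist, unc = distances_from_distribution(logits)
print(dist.shape, unc.shape)                           # (8, 4) each
```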

A Unified Framework for Discovering Discrete Symmetries

  • paper_url: http://arxiv.org/abs/2309.02898
  • repo_url: None
  • paper_authors: Pavan Karjol, Rohan Kashyap, Aditya Gopalan, Prathosh A. P
  • for: Learning a function that respects a symmetry drawn from a class of candidate subgroups.
  • methods: A unified framework for symmetry discovery across locally symmetric, dihedral, and cyclic subgroups, built on a novel architecture of linear and tensor-valued functions that expresses invariant functions in a principled manner; multi-armed bandit algorithms optimize over the linear functions and gradient descent over the tensor-valued ones.
  • results: Strong performance on image-digit sum and polynomial regression tasks.
    Abstract We consider the problem of learning a function respecting a symmetry from among a class of symmetries. We develop a unified framework that enables symmetry discovery across a broad range of subgroups including locally symmetric, dihedral and cyclic subgroups. At the core of the framework is a novel architecture composed of linear and tensor-valued functions that expresses functions invariant to these subgroups in a principled manner. The structure of the architecture enables us to leverage multi-armed bandit algorithms and gradient descent to efficiently optimize over the linear and the tensor-valued functions, respectively, and to infer the symmetry that is ultimately learnt. We also discuss the necessity of the tensor-valued functions in the architecture. Experiments on image-digit sum and polynomial regression tasks demonstrate the effectiveness of our approach.

Image Aesthetics Assessment via Learnable Queries

  • paper_url: http://arxiv.org/abs/2309.02861
  • repo_url: None
  • paper_authors: Zhiwei Xiong, Yunfan Zhang, Zhiqi Shen, Peiran Ren, Han Yu
  • for: An image aesthetics assessment (IAA) method based on learnable queries (IAA-LQ), to improve assessment performance.
  • methods: Learnable queries extract aesthetic features from pre-trained image features obtained from a frozen image encoder.
  • results: On real-world data, IAA-LQ beats the best state-of-the-art method by 2.2% in SRCC and 2.1% in PLCC.
    Abstract Image aesthetics assessment (IAA) aims to estimate the aesthetics of images. Depending on the content of an image, diverse criteria need to be selected to assess its aesthetics. Existing works utilize pre-trained vision backbones based on content knowledge to learn image aesthetics. However, training those backbones is time-consuming and suffers from attention dispersion. Inspired by learnable queries in vision-language alignment, we propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach. It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder. Extensive experiments on real-world data demonstrate the advantages of IAA-LQ, beating the best state-of-the-art method by 2.2% and 2.1% in terms of SRCC and PLCC, respectively.
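
A minimal sketch of the learnable-query idea, assuming PyTorch: a few trainable queries cross-attend to frozen encoder features and a small head regresses the aesthetic score. All sizes and the single-layer design are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AestheticQueryHead(nn.Module):
    """Learnable queries pool aesthetic cues from frozen image features;
    only the queries, attention, and regressor are trained."""
    def __init__(self, feat_dim=768, n_queries=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_queries, feat_dim) * 0.02)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frozen_feats):                  # (B, n_patches, feat_dim)
        q = self.queries.expand(frozen_feats.size(0), -1, -1)
        pooled, _ = self.attn(q, frozen_feats, frozen_feats)
        return self.score(pooled.mean(dim=1)).squeeze(-1)   # aesthetic score

head = AestheticQueryHead()
print(head(torch.randn(2, 196, 768)).shape)           # torch.Size([2])
```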

Bandwidth-efficient Inference for Neural Image Compression

  • paper_url: http://arxiv.org/abs/2309.02855
  • repo_url: None
  • paper_authors: Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu
  • for: Improving neural network inference on mobile and edge devices, where limited DRAM bandwidth and power budgets become bottlenecks as networks grow deeper and feature maps larger.
  • methods: An end-to-end differentiable, bandwidth-efficient inference method in which activations are compressed with a neural data compression pipeline: transform, quantization, and entropy coding using symmetric exponential Golomb codes and a data-dependent Gaussian entropy model for arithmetic coding.
  • results: Combined with existing model quantization methods, up to 19x bandwidth reduction with 6.21x energy saving on the low-level task of image compression.
    Abstract With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable bandwidth efficient neural inference method with the activation compressed by neural data compression method. Specifically, we propose a transform-quantization-entropy coding pipeline for activation compression with symmetric exponential Golomb coding and a data-dependent Gaussian entropy model for arithmetic coding. Optimized with existing model quantization methods, low-level task of image compression can achieve up to 19x bandwidth reduction with 6.21x energy saving.
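
Exponential Golomb coding itself is standard; a "symmetric" variant for signed, quantized activations can be obtained with a zigzag map, as in the sketch below (the exact symmetric code used in the paper may differ).

```python
def zigzag(n: int) -> int:
    """Map signed ints to non-negative ones (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...),
    a common way to make exponential Golomb codes symmetric around zero."""
    return 2 * n if n >= 0 else -2 * n - 1

def exp_golomb(n: int) -> str:
    """Order-0 exponential Golomb code of a non-negative integer:
    write (n + 1) in binary, prefixed by (length - 1) zeros."""
    b = bin(n + 1)[2:]
    return '0' * (len(b) - 1) + b

for v in [0, -1, 1, -2, 5]:
    print(v, exp_golomb(zigzag(v)))
# Small-magnitude activations (common after transform + quantization)
# get short codes, which is what saves DRAM bandwidth.
```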

Knowledge Distillation Layer that Lets the Student Decide

  • paper_url: http://arxiv.org/abs/2309.02843
  • repo_url: https://github.com/adagorgun/letkd-framework
  • paper_authors: Ada Gorgun, Yeti Z. Gurbuz, A. Aydin Alatan
  • for: Making knowledge distillation (KD) more effective by letting the student explicitly embed the teacher's knowledge in its feature transform, rather than only matching responses in the penultimate layer and beyond.
  • methods: A learnable KD layer for the student that (i) learns how to leverage the teacher's knowledge, enabling it to discard nuisance information, and (ii) feeds the transferred knowledge forward to deeper layers, so the student benefits from the teacher at inference time as well as during training.
  • results: Effectiveness demonstrated through rigorous experiments on three popular classification benchmarks.
    Abstract Typical technique in knowledge distillation (KD) is regularizing the learning of a limited capacity model (student) by pushing its responses to match a powerful model's (teacher). Albeit useful especially in the penultimate layer and beyond, its action on student's feature transform is rather implicit, limiting its practice in the intermediate layers. To explicitly embed the teacher's knowledge in feature transform, we propose a learnable KD layer for the student which improves KD with two distinct abilities: i) learning how to leverage the teacher's knowledge, enabling to discard nuisance information, and ii) feeding forward the transferred knowledge deeper. Thus, the student enjoys the teacher's knowledge during the inference besides training. Formally, we repurpose 1x1-BN-ReLU-1x1 convolution block to assign a semantic vector to each local region according to the template (supervised by the teacher) that the corresponding region of the student matches. To facilitate template learning in the intermediate layers, we propose a novel form of supervision based on the teacher's decisions. Through rigorous experimentation, we demonstrate the effectiveness of our approach on 3 popular classification benchmarks. Code is available at: https://github.com/adagorgun/letKD-framework
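
The repurposed 1x1-BN-ReLU-1x1 block is straightforward to sketch, assuming PyTorch; the hidden width and the teacher-derived template supervision (omitted here) are assumptions.

```python
import torch
import torch.nn as nn

class KDLayer(nn.Module):
    """Sketch of the repurposed 1x1-BN-ReLU-1x1 block: it maps each local
    region of the student's feature map to a semantic vector, which the
    paper supervises with templates derived from the teacher's decisions
    (supervision not shown)."""
    def __init__(self, in_ch, sem_dim, hidden=None):
        super().__init__()
        hidden = hidden or in_ch   # hidden width is an assumption
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, sem_dim, kernel_size=1),
        )

    def forward(self, x):           # (B, in_ch, H, W) -> (B, sem_dim, H, W)
        return self.block(x)

layer = KDLayer(in_ch=256, sem_dim=128)
print(layer(torch.randn(2, 256, 14, 14)).shape)   # torch.Size([2, 128, 14, 14])
```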

Adjacency-hopping de Bruijn Sequences for Non-repetitive Coding

  • paper_url: http://arxiv.org/abs/2309.02841
  • repo_url: None
  • paper_authors: Bin Chen, Zhenglin Liang, Shiqian Wu
  • for: Introducing adjacency-hopping de Bruijn sequences, a special type of cyclic sequence, and their application to structured light coding.
  • methods: A construction guaranteeing that all neighboring codes differ (adjacency hopping) while subsequences remain unique, the defining characteristic of de Bruijn sequences for coding and matching.
  • results: The existence of adjacency-hopping de Bruijn sequences is proved theoretically and their number is derived; a color fringe pattern coded by such a sequence demonstrates the application to structured light coding.
    Abstract A special type of cyclic sequences named adjacency-hopping de Bruijn sequences is introduced in this paper. The existence of such sequences is proved theoretically, and their number is derived. These sequences guarantee that all neighboring codes are different while retaining the uniqueness of subsequences, which is a significant characteristic of original de Bruijn sequences in coding and matching. Finally, the adjacency-hopping de Bruijn sequences are applied to structured light coding, and a color fringe pattern coded by such a sequence is presented. In summary, the proposed sequences demonstrate significant advantages in structured light coding by virtue of the uniqueness of subsequences and the adjacency-hopping characteristic, and show potential for extension to other fields with similar requirements of non-repetitive coding and efficient matching.
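
The two defining properties are easy to check on a candidate cyclic sequence: all length-n windows unique (the de Bruijn property used for matching) and no two neighboring symbols equal (the adjacency-hopping property).

```python
def is_adjacency_hopping(seq, n):
    """Check both properties on a cyclic sequence: neighboring symbols
    always differ, and every length-n window is unique."""
    L = len(seq)
    if any(seq[i] == seq[(i + 1) % L] for i in range(L)):
        return False                                  # repeated neighbors
    windows = {tuple(seq[(i + k) % L] for k in range(n)) for i in range(L)}
    return len(windows) == L                          # all subsequences unique

# over alphabet {0, 1, 2} with window length 2, this cyclic sequence
# covers all 6 windows that have distinct symbols
print(is_adjacency_hopping([0, 1, 2, 1, 0, 2], n=2))  # True
```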

EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2309.03244
  • repo_url: https://github.com/nikolai10/egic
  • paper_authors: Nikolai Körber, Eduard Kromer, Andreas Siebert, Sascha Hauke, Daniel Mueller-Gritschneder
  • for: This paper aims to improve image compression using a generative model.
  • methods: The paper proposes a novel method called EGIC, which uses an implicitly encoded variant of image interpolation to predict the residual between an MSE-optimized and a GAN-optimized decoder output.
  • results: The paper shows that EGIC outperforms several baseline methods, including HiFiC, MRIC, and DIRAC, while performing almost on par with VTM-20.0 on the distortion end. Additionally, EGIC is simple to implement, lightweight (e.g. 0.18x the model parameters of HiFiC), and provides excellent interpolation characteristics, making it a promising candidate for practical applications targeting the low bit range.
    Abstract We introduce EGIC, a novel generative image compression method that allows traversing the distortion-perception curve efficiently from a single model. Specifically, we propose an implicitly encoded variant of image interpolation that predicts the residual between a MSE-optimized and GAN-optimized decoder output. On the receiver side, the user can then control the impact of the residual on the GAN-based reconstruction. Together with improved GAN-based building blocks, EGIC outperforms a wide variety of perception-oriented and distortion-oriented baselines, including HiFiC, MRIC and DIRAC, while performing almost on par with VTM-20.0 on the distortion end. EGIC is simple to implement, very lightweight (e.g. 0.18x model parameters compared to HiFiC) and provides excellent interpolation characteristics, which makes it a promising candidate for practical applications targeting the low bit range.
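A hedged sketch of the receiver-side control described in the abstract: the decoder predicts the residual between the MSE-optimized and GAN-optimized outputs, and scaling that residual traverses the distortion-perception curve. All names and the clipping range below are illustrative, not EGIC's actual API.

```python
# Receiver-side traversal of the distortion-perception curve, as a sketch.
import numpy as np

def traverse_distortion_perception(x_mse: np.ndarray,
                                   residual: np.ndarray,
                                   alpha: float) -> np.ndarray:
    """alpha = 0 keeps the distortion-oriented (MSE) output;
    alpha = 1 applies the full residual, giving the perception-oriented output."""
    return np.clip(x_mse + alpha * residual, 0.0, 1.0)

x_mse = np.random.rand(64, 64, 3)   # stand-in for the MSE-optimized decoder output
x_gan = np.random.rand(64, 64, 3)   # stand-in for the GAN-optimized decoder output
residual = x_gan - x_mse            # the quantity EGIC's decoder is trained to predict
for alpha in (0.0, 0.5, 1.0):
    print(alpha, traverse_distortion_perception(x_mse, residual, alpha).mean())
```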

Image-Object-Specific Prompt Learning for Few-Shot Class-Incremental Learning

  • paper_url: http://arxiv.org/abs/2309.02833
  • repo_url: None
  • paper_authors: In-Ug Yoon, Tae-Min Choi, Sun-Kyung Lee, Young-Min Kim, Jong-Hwan Kim
  • for: This paper aims to improve the performance of Few-Shot Class-Incremental Learning (FSCIL) during incremental sessions, where the encoder often underperforms.
  • methods: The authors propose a novel training framework that leverages the generalizability of the Contrastive Language-Image Pre-training (CLIP) model to unseen classes, and formulates image-object-specific (IOS) classifiers for input images.
  • results: The proposed framework consistently demonstrates superior performance compared to state-of-the-art methods across three datasets (miniImageNet, CIFAR100, and CUB200), and the authors provide additional experiments to validate the learned model's ability to achieve IOS classifiers.
    Abstract While many FSCIL studies have been undertaken, achieving satisfactory performance, especially during incremental sessions, has remained challenging. One prominent challenge is that the encoder, trained with an ample base session training set, often underperforms in incremental sessions. In this study, we introduce a novel training framework for FSCIL, capitalizing on the generalizability of the Contrastive Language-Image Pre-training (CLIP) model to unseen classes. We achieve this by formulating image-object-specific (IOS) classifiers for the input images. Here, an IOS classifier refers to one that targets specific attributes (like wings or wheels) of class objects rather than the image's background. To create these IOS classifiers, we encode a bias prompt into the classifiers using our specially designed module, which harnesses key-prompt pairs to pinpoint the IOS features of classes in each session. From an FSCIL standpoint, our framework is structured to retain previous knowledge and swiftly adapt to new sessions without forgetting or overfitting. This considers the updatability of modules in each session and some tricks empirically found for fast convergence. Our approach consistently demonstrates superior performance compared to state-of-the-art methods across the miniImageNet, CIFAR100, and CUB200 datasets. Further, we provide additional experiments to validate our learned model's ability to achieve IOS classifiers. We also conduct ablation studies to analyze the impact of each module within the architecture.

3D Trajectory Reconstruction of Drones using a Single Camera

  • paper_url: http://arxiv.org/abs/2309.02801
  • repo_url: None
  • paper_authors: Seobin Hwang, Hanyoung Kim, Chaeyeon Heo, Youkyoung Na, Cheongeun Lee, Yeongjun Cho
  • for: To counter the illegal use of drones, this work proposes a framework for reconstructing the 3D trajectories of drones using a single camera.
  • methods: The method automatically tracks drones in 2D images using a calibrated camera, then geometrically infers their 3D trajectories by combining the estimated 2D positions with the drones' actual length information and the camera parameters.
  • results: Experimental results show that the proposed method accurately reconstructs 3D drone trajectories, demonstrating the framework's potential for single-camera surveillance systems.
    Abstract Drones have been widely utilized in various fields, but the number of drones being used illegally and for hazardous purposes has increased recently. To prevent those illegal drones, in this work, we propose a novel framework for reconstructing 3D trajectories of drones using a single camera. By leveraging calibrated cameras, we exploit the relationship between 2D and 3D spaces. We automatically track the drones in 2D images using the drone tracker and estimate their 2D rotations. By combining the estimated 2D drone positions with their actual length information and camera parameters, we geometrically infer the 3D trajectories of the drones. To address the lack of public drone datasets, we also create synthetic 2D and 3D drone datasets. The experimental results show that the proposed methods accurately reconstruct drone trajectories in 3D space, and demonstrate the potential of our framework for single camera-based surveillance systems.
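The geometric step the abstract describes can be illustrated with a pinhole camera model: the drone's known physical length and its apparent pixel length fix its depth by similar triangles, after which the 2D detection is back-projected to 3D. The intrinsics and lengths below are illustrative assumptions, not values from the paper.

```python
# Lifting a 2D drone detection to 3D using a known physical length.
import numpy as np

def lift_to_3d(u, v, pixel_length, real_length, fx, fy, cx, cy):
    """Back-project a 2D detection (u, v) to a 3D point in the camera frame."""
    depth = fx * real_length / pixel_length   # similar triangles: Z = f * L / l
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Example: a 0.35 m drone appearing 70 px long in a camera with fx = fy = 1000.
point = lift_to_3d(u=640, v=360, pixel_length=70.0, real_length=0.35,
                   fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
print(point)  # [0. 0. 5.] -> the drone is 5 m in front of the camera
```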

LightNeuS: Neural Surface Reconstruction in Endoscopy using Illumination Decline

  • paper_url: http://arxiv.org/abs/2309.02777
  • repo_url: None
  • paper_authors: Víctor M. Batlle, José M. M. Montiel, Pascal Fua, Juan D. Tardós
  • for: This paper proposes a method for 3D reconstruction from sequences of images acquired by monocular endoscopes, exploiting illumination decline.
  • methods: The method rests on two key insights: endoluminal cavities are watertight, a property naturally enforced by modeling them with a signed distance function; and scene illumination is variable, coming from the endoscope's light sources and decaying with the inverse of the squared distance to the surface. To exploit these, the authors build on NeuS, a neural implicit surface reconstruction technique that learns appearance and an SDF surface model from multiple views but is currently limited to static illumination. They modify the NeuS architecture to account for the relation between pixel brightness and depth, and introduce a calibrated photometric model of the endoscope's camera and light source.
  • results: The method produces watertight reconstructions of whole colon sections and achieves excellent accuracy on phantom imagery. The watertight prior combined with illumination decline also allows it to complete unseen portions of the surface with acceptable accuracy, paving the way to automatic quality assessment of cancer screening explorations.
    Abstract We propose a new approach to 3D reconstruction from sequences of images acquired by monocular endoscopes. It is based on two key insights. First, endoluminal cavities are watertight, a property naturally enforced by modeling them in terms of a signed distance function. Second, the scene illumination is variable. It comes from the endoscope's light sources and decays with the inverse of the squared distance to the surface. To exploit these insights, we build on NeuS, a neural implicit surface reconstruction technique with an outstanding capability to learn appearance and a SDF surface model from multiple views, but currently limited to scenes with static illumination. To remove this limitation and exploit the relation between pixel brightness and depth, we modify the NeuS architecture to explicitly account for it and introduce a calibrated photometric model of the endoscope's camera and light source. Our method is the first one to produce watertight reconstructions of whole colon sections. We demonstrate excellent accuracy on phantom imagery. Remarkably, the watertight prior combined with illumination decline, allows to complete the reconstruction of unseen portions of the surface with acceptable accuracy, paving the way to automatic quality assessment of cancer screening explorations, measuring the global percentage of observed mucosa.
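The photometric model at the core of the method can be sketched as a Lambertian surface lit by a point source co-located with the camera, with brightness decaying as the inverse square of the distance to the surface. The gain and albedo values below are placeholders, not the paper's calibrated parameters.

```python
# Inverse-square illumination model: I = g * albedo * cos(theta) / d^2.
import numpy as np

def predicted_brightness(albedo, cos_incidence, distance, light_gain=1.0):
    """Brightness of a Lambertian surface lit by a point source at the camera."""
    return light_gain * albedo * cos_incidence / distance**2

for d in (1.0, 2.0, 4.0):   # doubling the distance quarters the brightness
    print(d, predicted_brightness(albedo=0.8, cos_incidence=1.0, distance=d))
```

This link between pixel brightness and depth is what the modified NeuS architecture exploits as a supervisory signal.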

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

  • paper_url: http://arxiv.org/abs/2309.02773
  • repo_url: None
  • paper_authors: Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu
  • for: This paper studies open-vocabulary semantic segmentation and explores solving it with a conditional latent diffusion model, without any training.
  • methods: Prior work relies on pre-trained text-image discriminative models such as CLIP, but their contrastive alignment can lose the localization information and object completeness that accurate segmentation requires. The proposed training-free approach, DiffSegmenter, instead feeds an input image and candidate classes into an off-the-shelf pre-trained conditional latent diffusion model, uses the cross-attention maps produced by the denoising U-Net as segmentation scores, and refines and completes them with the self-attention maps; carefully designed textual prompts and a category filtering mechanism further enhance the results.
  • results: Extensive experiments on three benchmark datasets show that DiffSegmenter achieves impressive results for open-vocabulary semantic segmentation.
    Abstract Recent research has explored the utilization of pre-trained text-image discriminative models, such as CLIP, to tackle the challenges associated with open-vocabulary semantic segmentation. However, it is worth noting that the alignment process based on contrastive learning employed by these models may unintentionally result in the loss of crucial localization information and object completeness, which are essential for achieving accurate semantic segmentation. More recently, there has been an emerging interest in extending the application of diffusion models beyond text-to-image generation tasks, particularly in the domain of semantic segmentation. These approaches utilize diffusion models either for generating annotated data or for extracting features to facilitate semantic segmentation. This typically involves training segmentation models by generating a considerable amount of synthetic data or incorporating additional mask annotations. To this end, we uncover the potential of generative text-to-image conditional diffusion models as highly efficient open-vocabulary semantic segmenters, and introduce a novel training-free approach named DiffSegmenter. Specifically, by feeding an input image and candidate classes into an off-the-shelf pre-trained conditional latent diffusion model, the cross-attention maps produced by the denoising U-Net are directly used as segmentation scores, which are further refined and completed by the followed self-attention maps. Additionally, we carefully design effective textual prompts and a category filtering mechanism to further enhance the segmentation results. Extensive experiments on three benchmark datasets show that the proposed DiffSegmenter achieves impressive results for open-vocabulary semantic segmentation.
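A minimal sketch of the attention recipe described in the abstract: treat the denoising U-Net's cross-attention maps (one column per candidate class) as raw segmentation scores and propagate them through the image self-attention to complete the masks. The random tensors below stand in for real attention maps; shapes are illustrative.

```python
# Refining cross-attention segmentation scores with self-attention.
import torch

def refine_with_self_attention(cross_attn: torch.Tensor,
                               self_attn: torch.Tensor) -> torch.Tensor:
    """cross_attn: (num_pixels, num_classes) text-to-image attention scores.
    self_attn:  (num_pixels, num_pixels) image self-attention.
    Propagating class scores through self-attention spreads evidence across
    each object, completing the mask."""
    refined = self_attn @ cross_attn        # (num_pixels, num_classes)
    return refined.argmax(dim=-1)           # per-pixel class index

num_pixels, num_classes = 32 * 32, 4
cross_attn = torch.rand(num_pixels, num_classes).softmax(dim=-1)
self_attn = torch.rand(num_pixels, num_pixels).softmax(dim=-1)
seg = refine_with_self_attention(cross_attn, self_attn)
print(seg.shape)  # torch.Size([1024])
```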

RepSGG: Novel Representations of Entities and Relationships for Scene Graph Generation

  • paper_url: http://arxiv.org/abs/2309.03240
  • repo_url: None
  • paper_authors: Hengyue Liu, Bir Bhanu
  • for: To improve the accuracy and efficiency of Scene Graph Generation (SGG), particularly addressing the challenges of fixed-size entity representations and the long-tailed predicate distribution.
  • methods: A novel architecture called RepSGG formulates a subject as queries, an object as keys, and their relationship as the maximum attention weight between pairwise queries and keys, giving entities and relationships more fine-grained and flexible representation power; RepSGG learns to sample semantically discriminative and representative points for relationship inference. In addition, a run-time performance-guided logit adjustment (PGLA) strategy modifies the relationship logits via affine transformations based on run-time performance during training, encouraging a more balanced performance between dominant and rare classes.
  • results: Experiments show that RepSGG achieves state-of-the-art or comparable performance on the Visual Genome and Open Images V6 datasets with fast inference speed, demonstrating the effectiveness and efficiency of the proposed methods.
    Abstract Scene Graph Generation (SGG) has achieved significant progress recently. However, most previous works rely heavily on fixed-size entity representations based on bounding box proposals, anchors, or learnable queries. As each representation's cardinality has different trade-offs between performance and computation overhead, extracting highly representative features efficiently and dynamically is both challenging and crucial for SGG. In this work, a novel architecture called RepSGG is proposed to address the aforementioned challenges, formulating a subject as queries, an object as keys, and their relationship as the maximum attention weight between pairwise queries and keys. With more fine-grained and flexible representation power for entities and relationships, RepSGG learns to sample semantically discriminative and representative points for relationship inference. Moreover, the long-tailed distribution also poses a significant challenge for generalization of SGG. A run-time performance-guided logit adjustment (PGLA) strategy is proposed such that the relationship logits are modified via affine transformations based on run-time performance during training. This strategy encourages a more balanced performance between dominant and rare classes. Experimental results show that RepSGG achieves the state-of-the-art or comparable performance on the Visual Genome and Open Images V6 datasets with fast inference speed, demonstrating the efficacy and efficiency of the proposed methods.
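The query-key formulation can be sketched directly: represent the subject by a set of query vectors, the object by a set of key vectors, and score their relationship as the maximum attention weight over all pairs, as the abstract describes. The dimensions and scaling choice below are assumptions, not the paper's configuration.

```python
# Relationship strength as the max attention weight between subject queries
# and object keys, in the spirit of RepSGG's formulation.
import torch

def relationship_strength(subject_queries: torch.Tensor,
                          object_keys: torch.Tensor) -> torch.Tensor:
    """subject_queries: (num_q, d), object_keys: (num_k, d).
    Returns the maximum pairwise attention weight between the two entities."""
    d = subject_queries.shape[-1]
    attn = torch.softmax(subject_queries @ object_keys.T / d**0.5, dim=-1)
    return attn.max()

q = torch.randn(8, 64)   # sampled representative points for the subject
k = torch.randn(8, 64)   # sampled representative points for the object
print(relationship_strength(q, k).item())
```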

DMKD: Improving Feature-based Knowledge Distillation for Object Detection Via Dual Masking Augmentation

  • paper_url: http://arxiv.org/abs/2309.02719
  • repo_url: None
  • paper_authors: Guang Yang, Yin Tang, Zhijian Wu, Jun Li, Jianhua Xu, Xili Wan
  • For: The paper aims to improve the performance of object detection via masked knowledge distillation.
  • Methods: The paper proposes a Dual Masked Knowledge Distillation (DMKD) framework, which employs a dual attention mechanism and a self-adjustable weighting strategy to capture both spatially important and channel-wise informative clues for comprehensive masked feature reconstruction.
  • Results: The student networks achieve performance gains of 4.1% and 4.3% with the proposed method when RetinaNet and Cascade Mask R-CNN are respectively used as the teacher networks, outperforming other state-of-the-art distillation methods.
    Abstract Recent mainstream masked distillation methods function by reconstructing selectively masked areas of a student network from the feature map of its teacher counterpart. In these methods, the masked regions need to be properly selected, such that reconstructed features encode sufficient discrimination and representation capability like the teacher feature. However, previous masked distillation methods only focus on spatial masking, making the resulting masked areas biased towards spatial importance without encoding informative channel clues. In this study, we devise a Dual Masked Knowledge Distillation (DMKD) framework which can capture both spatially important and channel-wise informative clues for comprehensive masked feature reconstruction. More specifically, we employ dual attention mechanism for guiding the respective masking branches, leading to reconstructed feature encoding dual significance. Furthermore, fusing the reconstructed features is achieved by self-adjustable weighting strategy for effective feature distillation. Our experiments on object detection task demonstrate that the student networks achieve performance gains of 4.1% and 4.3% with the help of our method when RetinaNet and Cascade Mask R-CNN are respectively used as the teacher networks, while outperforming the other state-of-the-art distillation methods.
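A hedged sketch of dual masking in the spirit of DMKD: derive a spatial attention map and a channel attention map from the teacher's features and use them to mask the most informative positions and channels, which the student must then reconstruct. The top-k masking rule and ratios are illustrative simplifications of the paper's dual attention mechanism.

```python
# Dual (spatial + channel) attention-guided masking, as a simplified sketch.
import torch

def dual_attention_masks(teacher_feat: torch.Tensor, ratio: float = 0.5):
    """teacher_feat: (B, C, H, W). Returns boolean masks selecting the most
    attended spatial positions and channels for masked reconstruction."""
    b, c, h, w = teacher_feat.shape
    spatial_att = teacher_feat.abs().mean(dim=1).flatten(1)   # (B, H*W)
    channel_att = teacher_feat.abs().mean(dim=(2, 3))         # (B, C)
    sp_idx = spatial_att.topk(int(ratio * h * w), dim=1).indices
    ch_idx = channel_att.topk(int(ratio * c), dim=1).indices
    spatial_mask = torch.zeros(b, h * w, dtype=torch.bool).scatter_(1, sp_idx, True)
    channel_mask = torch.zeros(b, c, dtype=torch.bool).scatter_(1, ch_idx, True)
    return spatial_mask.view(b, h, w), channel_mask

feat = torch.randn(2, 256, 7, 7)
sp_mask, ch_mask = dual_attention_masks(feat)
print(sp_mask.sum().item(), ch_mask.sum().item())  # 48 positions, 256 channels masked
```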

Gene-induced Multimodal Pre-training for Image-omic Classification

  • paper_url: http://arxiv.org/abs/2309.02702
  • repo_url: None
  • paper_authors: Ting Jin, Xingran Xie, Renjie Wan, Qingli Li, Yan Wang
  • for: This paper proposes a Gene-induced Multimodal Pre-training (GiMP) framework that jointly incorporates genomics and Whole Slide Images (WSIs) for image-omic classification tasks.
  • methods: The framework uses a group multi-head self-attention gene encoder to capture global structured features in gene expression cohorts, and a masked patch modeling paradigm (MPM) that randomly masks a fixed-length contiguous subsequence of a WSI's patch embeddings to capture the latent pathological characteristics of different tissues; a triplet learning module then combines the classification tokens of the paired modalities to learn high-order relevance and discriminative patient-level information.
  • results: Experimental results on the TCGA dataset show the superiority of the network architecture and pre-training framework, achieving 99.47% accuracy for image-omic classification. Code is available at https://github.com/huangwudiduan/GIMP.
    Abstract Histology analysis of the tumor micro-environment integrated with genomic assays is the gold standard for most cancers in modern medicine. This paper proposes a Gene-induced Multimodal Pre-training (GiMP) framework, which jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks. Our work aims at dealing with the main challenges of multi-modality image-omic classification w.r.t. (1) the patient-level feature extraction difficulties from gigapixel WSIs and tens of thousands of genes, and (2) effective fusion considering high-order relevance modeling. Concretely, we first propose a group multi-head self-attention gene encoder to capture global structured features in gene expression cohorts. We design a masked patch modeling paradigm (MPM) to capture the latent pathological characteristics of different tissues. The mask strategy is randomly masking a fixed-length contiguous subsequence of patch embeddings of a WSI. Finally, we combine the classification tokens of paired modalities and propose a triplet learning module to learn high-order relevance and discriminative patient-level information.After pre-training, a simple fine-tuning can be adopted to obtain the classification results. Experimental results on the TCGA dataset show the superiority of our network architectures and our pre-training framework, achieving 99.47% in accuracy for image-omic classification. The code is publicly available at https://github.com/huangwudiduan/GIMP.
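The masked patch modeling (MPM) step is simple to sketch: pick a random start index and zero out a fixed-length contiguous run of a WSI's patch embeddings, which the pre-training objective then has to recover. The sequence length, dimension, and zero [MASK] embedding are illustrative choices.

```python
# Masking a fixed-length contiguous subsequence of patch embeddings.
import torch

def mask_contiguous_patches(patch_emb: torch.Tensor, mask_len: int):
    """patch_emb: (num_patches, dim). Returns masked embeddings and the mask."""
    num_patches, _ = patch_emb.shape
    start = torch.randint(0, num_patches - mask_len + 1, (1,)).item()
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[start:start + mask_len] = True
    masked = patch_emb.clone()
    masked[mask] = 0.0                  # replace with a zero/[MASK] embedding
    return masked, mask

emb = torch.randn(100, 256)             # 100 patch embeddings from one WSI
masked, mask = mask_contiguous_patches(emb, mask_len=16)
print(mask.sum().item())                # 16 contiguous patches are masked
```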

Improving Image Classification of Knee Radiographs: An Automated Image Labeling Approach

  • paper_url: http://arxiv.org/abs/2309.02681
  • repo_url: None
  • paper_authors: Jikai Zhang, Carlos Santos, Christine Park, Maciej Mazurowski, Roy Colglazier
  • for: The goal of this study is to develop an automated labeling approach that improves image classification models for knee radiographs, distinguishing normal knee images from those with abnormalities or prior arthroplasty.
  • methods: The automated labeler was trained on a small labeled set to automatically label a much larger unlabeled set; the approach was developed using data from 7,382 patients and validated on a separate set of 637 patients.
  • results: The model trained with both manually labeled and pseudo-labeled data achieved a higher weighted average AUC (WAUC: 0.903) and higher AUC-ROC values across the normal, abnormal, and arthroplasty classes than the baseline trained on manually labeled data alone; DeLong tests show the improvement is statistically significant on normal (p<0.002) and abnormal (p<0.001) images. These results indicate that the automated labeling approach effectively improves image classification for radiographic knee diagnosis, facilitating patient care and the curation of large knee datasets.
    Abstract Large numbers of radiographic images are available in knee radiology practices which could be used for training of deep learning models for diagnosis of knee abnormalities. However, those images do not typically contain readily available labels due to limitations of human annotations. The purpose of our study was to develop an automated labeling approach that improves the image classification model to distinguish normal knee images from those with abnormalities or prior arthroplasty. The automated labeler was trained on a small set of labeled data to automatically label a much larger set of unlabeled data, further improving the image classification performance for knee radiographic diagnosis. We developed our approach using 7,382 patients and validated it on a separate set of 637 patients. The final image classification model, trained using both manually labeled and pseudo-labeled data, had a higher weighted average AUC (WAUC: 0.903) and higher AUC-ROC values across all classes (normal AUC-ROC: 0.894; abnormal AUC-ROC: 0.896, arthroplasty AUC-ROC: 0.990) compared to the baseline model (WAUC=0.857; normal AUC-ROC: 0.842; abnormal AUC-ROC: 0.848, arthroplasty AUC-ROC: 0.987), trained using only manually labeled data. DeLong tests show that the improvement is significant on normal (p-value<0.002) and abnormal (p-value<0.001) images. Our findings demonstrated that the proposed automated labeling approach significantly improves the performance of image classification for radiographic knee diagnosis, allowing for facilitating patient care and curation of large knee datasets.
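A generic sketch of the automated-labeling recipe the abstract describes: fit a labeler on the small manually labeled set, pseudo-label the large unlabeled pool while keeping only confident predictions, and train the final classifier on the union. Synthetic features and scikit-learn models stand in for radiographs and the actual networks; the 0.9 confidence threshold is an assumption.

```python
# Pseudo-labeling pipeline sketch on synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(200, 16))
y_labeled = (X_labeled[:, 0] > 0).astype(int)        # stand-in labels
X_unlabeled = rng.normal(size=(5000, 16))

labeler = LogisticRegression().fit(X_labeled, y_labeled)
proba = labeler.predict_proba(X_unlabeled)
confident = proba.max(axis=1) > 0.9                   # keep confident predictions
X_pseudo, y_pseudo = X_unlabeled[confident], proba[confident].argmax(axis=1)

final_model = LogisticRegression().fit(
    np.vstack([X_labeled, X_pseudo]),
    np.concatenate([y_labeled, y_pseudo]),
)
print(f"pseudo-labeled {confident.sum()} of {len(X_unlabeled)} images")
```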

Efficient Training for Visual Tracking with Deformable Transformer

  • paper_url: http://arxiv.org/abs/2309.02676
  • repo_url: None
  • paper_authors: Qingmao Wei, Guotian Zeng, Bi Zeng
  • for: This paper proposes an efficient visual object tracking method suitable for real-world applications.
  • methods: The proposed DETRack uses an efficient encoder-decoder structure in which a deformable transformer decoder serves as the target head, achieving higher sparsity than convolution heads and thus fewer GFLOPs. For training, a novel one-to-many label assignment and an auxiliary denoising technique significantly accelerate the model's convergence.
  • results: DETRack achieves 72.9% AO on the challenging GOT-10k benchmark using only 20% of the training epochs required by the baseline, and runs with lower GFLOPs than all transformer-based trackers.
    Abstract Recent Transformer-based visual tracking models have showcased superior performance. Nevertheless, prior works have been resource-intensive, requiring prolonged GPU training hours and incurring high GFLOPs during inference due to inefficient training methods and convolution-based target heads. This intensive resource use renders them unsuitable for real-world applications. In this paper, we present DETRack, a streamlined end-to-end visual object tracking framework. Our framework utilizes an efficient encoder-decoder structure where the deformable transformer decoder acting as a target head, achieves higher sparsity than traditional convolution heads, resulting in decreased GFLOPs. For training, we introduce a novel one-to-many label assignment and an auxiliary denoising technique, significantly accelerating model's convergence. Comprehensive experiments affirm the effectiveness and efficiency of our proposed method. For instance, DETRack achieves 72.9% AO on challenging GOT-10k benchmarks using only 20% of the training epochs required by the baseline, and runs with lower GFLOPs than all the transformer-based trackers.

Progressive Attention Guidance for Whole Slide Vulvovaginal Candidiasis Screening

  • paper_url: http://arxiv.org/abs/2309.02670
  • repo_url: https://github.com/cjdbehumble/miccai2023-vvc-screening
  • paper_authors: Jiangdong Cai, Honglin Xiong, Maosong Cao, Luyan Liu, Lichi Zhang, Qian Wang
  • for: This paper proposes a whole slide image classification method for automated vulvovaginal candidiasis (VVC) screening, addressing the scarcity of labeled data and the unique properties of candida in pathology images.
  • methods: The method first uses a pre-trained detection model as prior instruction to initialize the classification model, then applies a Skip Self-Attention module to refine attention onto the fine-grained features of candida. Finally, contrastive learning alleviates the overfitting caused by the style gap between WSIs and suppresses attention to false-positive regions.
  • results: Experimental results show that the framework achieves state-of-the-art performance. Code and example data are available at https://github.com/cjdbehumble/MICCAI2023-VVC-Screening.
    Abstract Vulvovaginal candidiasis (VVC) is the most prevalent human candidal infection, estimated to afflict approximately 75% of all women at least once in their lifetime. It leads to symptoms including pruritus and vaginal soreness. Automatic whole slide image (WSI) classification is in high demand, given the huge burden of disease control and prevention. However, a WSI-based computer-aided VVC screening method is still lacking due to scarce labeled data and the unique properties of candida. Candida in WSIs is challenging for conventional classification models to capture due to its distinctive elongated shape, the small proportion of its spatial distribution, and the style gap across WSIs. To make it easier for the model to focus on the candida, we propose an attention-guided method, which yields a robust diagnostic classification model. Specifically, we first use a pre-trained detection model as prior instruction to initialize the classification model. Then we design a Skip Self-Attention module to refine the attention onto the fine-grained features of candida. Finally, we use a contrastive learning method to alleviate the overfitting caused by the style gap of WSIs and suppress the attention to false positive regions. Our experimental results demonstrate that our framework achieves state-of-the-art performance. Code and example data are available at https://github.com/cjdbehumble/MICCAI2023-VVC-Screening.

Fast and Resource-Efficient Object Tracking on Edge Devices: A Measurement Study

  • paper_url: http://arxiv.org/abs/2309.02666
  • repo_url: https://github.com/git-disl/emo
  • paper_authors: Sanjana Vijay Ganesh, Yanzhao Wu, Gaowen Liu, Ramana Kompella, Ling Liu
  • For: This paper focuses on the performance issues and optimization opportunities for multi-object tracking (MOT) on edge devices with heterogeneous computing resources.
  • Methods: The paper proposes several edge-specific performance optimization strategies, collectively called EMO, to speed up real-time object tracking, including window-based optimization and similarity-based optimization.
  • Results: The proposed EMO approach is competitive with representative on-device object tracking techniques in terms of run-time performance and tracking accuracy, as demonstrated through extensive experiments on popular MOT benchmarks.
    Abstract Object tracking is an important functionality of edge video analytic systems and services. Multi-object tracking (MOT) detects the moving objects and tracks their locations frame by frame as real scenes are being captured into a video. However, it is well known that real time object tracking on the edge poses critical technical challenges, especially with edge devices of heterogeneous computing resources. This paper examines the performance issues and edge-specific optimization opportunities for object tracking. We will show that even the well trained and optimized MOT model may still suffer from random frame dropping problems when edge devices have insufficient computation resources. We present several edge specific performance optimization strategies, collectively coined as EMO, to speed up the real time object tracking, ranging from window-based optimization to similarity based optimization. Extensive experiments on popular MOT benchmarks demonstrate that our EMO approach is competitive with respect to the representative methods for on-device object tracking techniques in terms of run-time performance and tracking accuracy. EMO is released on Github at https://github.com/git-disl/EMO.
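One edge-specific optimization in the spirit of EMO's similarity-based strategy can be sketched as frame skipping: when the current frame is nearly identical to the last processed one, reuse the previous tracks instead of running the expensive detector. The similarity measure, threshold, and detector stub below are illustrative assumptions, not EMO's implementation.

```python
# Similarity-based frame skipping for on-device tracking, as a sketch.
import numpy as np

def frame_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute pixel difference, normalized to [0, 1]."""
    return 1.0 - np.abs(a.astype(float) - b.astype(float)).mean() / 255.0

def track(frames, run_detector, sim_threshold=0.98):
    last_frame, last_tracks, results = None, None, []
    for frame in frames:
        if last_frame is not None and frame_similarity(frame, last_frame) > sim_threshold:
            results.append(last_tracks)        # skip detection, reuse tracks
        else:
            last_tracks = run_detector(frame)  # expensive path
            last_frame = frame
            results.append(last_tracks)
    return results

frames = [np.zeros((64, 64), dtype=np.uint8)] * 3        # three identical frames
print(len(track(frames, run_detector=lambda f: ["obj_1"])))  # 3 results, 1 detector call
```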

Multiclass Alignment of Confidence and Certainty for Network Calibration

  • paper_url: http://arxiv.org/abs/2309.02636
  • repo_url: None
  • paper_authors: Vinith Kugathasan, Muhammad Haris Khan
  • For: To improve the accuracy and reliability of model predictions, particularly in safety-critical applications.
  • Methods: The paper proposes a new train-time calibration method based on the gap between a model's predictive mean confidence and its predictive certainty (MACC), realized as a simple, plug-and-play auxiliary loss.
  • Results: Extensive experiments on ten challenging datasets show that the method achieves state-of-the-art calibration performance for both in-domain and out-of-domain predictions.
    Abstract Deep neural networks (DNNs) have made great strides in pushing the state-of-the-art in several challenging domains. Recent studies reveal that they are prone to making overconfident predictions. This greatly reduces the overall trust in model predictions, especially in safety-critical applications. Early work in improving model calibration employs post-processing techniques which rely on limited parameters and require a hold-out set. Some recent train-time calibration methods, which involve all model parameters, can outperform the postprocessing methods. To this end, we propose a new train-time calibration method, which features a simple, plug-and-play auxiliary loss known as multi-class alignment of predictive mean confidence and predictive certainty (MACC). It is based on the observation that a model miscalibration is directly related to its predictive certainty, so a higher gap between the mean confidence and certainty amounts to a poor calibration both for in-distribution and out-of-distribution predictions. Armed with this insight, our proposed loss explicitly encourages a confident (or underconfident) model to also provide a low (or high) spread in the presoftmax distribution. Extensive experiments on ten challenging datasets, covering in-domain, out-domain, non-visual recognition and medical image classification scenarios, show that our method achieves state-of-the-art calibration performance for both in-domain and out-domain predictions. Our code and models will be publicly released.
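A simplified, hedged sketch of the alignment idea: penalize the gap between a batch's mean predictive confidence (max softmax probability) and its mean certainty (here, one minus normalized entropy), added to the task loss with a small weight. This is an illustrative reduction of the paper's multi-class alignment, not its exact formulation, and the 0.1 weight is an assumption.

```python
# Auxiliary confidence-certainty alignment penalty, as a simplified sketch.
import torch
import torch.nn.functional as F

def confidence_certainty_gap(logits: torch.Tensor) -> torch.Tensor:
    probs = F.softmax(logits, dim=-1)
    confidence = probs.max(dim=-1).values                  # per-sample confidence
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
    certainty = 1.0 - entropy / torch.log(torch.tensor(float(logits.shape[-1])))
    return (confidence.mean() - certainty.mean()).abs()    # alignment penalty

logits = torch.randn(32, 10)
targets = torch.randint(0, 10, (32,))
total_loss = F.cross_entropy(logits, targets) \
             + 0.1 * confidence_certainty_gap(logits)      # weight is illustrative
print(total_loss.item())
```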

cs.AI - 2023-09-06

The Role of Communication and Reference Songs in the Mixing Process: Insights from Professional Mix Engineers

  • paper_url: http://arxiv.org/abs/2309.03404
  • repo_url: None
  • paper_authors: Soumya Sai Vanka, Maryam Safi, Jean-Baptiste Rolland, György Fazekas
  • For: This paper studies how professional mixing engineers communicate with clients and use their feedback, including reference songs and demo mixes, to guide the mixing process, in order to better understand the roles of collaboration, empathy, and intention in mixing.
  • Methods: A two-phased exploratory study: phase one consisted of semi-structured interviews with five mixing engineers, and phase two administered an online questionnaire to a larger group of 22 mixing engineers.
  • Results: The findings shed light on the importance of collaboration, empathy, and intention in the mixing process, and can inform the development of smart multi-track mixing systems that better support these practices.
    Abstract Effective music mixing requires technical and creative finesse, but clear communication with the client is crucial. The mixing engineer must grasp the client's expectations and preferences, and collaborate to achieve the desired sound. The tacit agreement for the desired sound of the mix is often established using guides like reference songs and demo mixes exchanged between the artist and the engineer and sometimes verbalised using semantic terms. This paper presents the findings of a two-phased exploratory study aimed at understanding how professional mixing engineers interact with clients and use their feedback to guide the mixing process. For phase one, semi-structured interviews were conducted with five mixing engineers with the aim of gathering insights about their communication strategies, creative processes, and decision-making criteria. Based on the inferences from these interviews, an online questionnaire was designed and administered to a larger group of 22 mixing engineers during the second phase. The results of this study shed light on the importance of collaboration, empathy, and intention in the mixing process, and can inform the development of smart multi-track mixing systems that better support these practices. By highlighting the significance of these findings, this paper contributes to the growing body of research on the collaborative nature of music production and provides actionable recommendations for the design and implementation of innovative mixing tools.

Efficient Baselines for Motion Prediction in Autonomous Driving

  • paper_url: http://arxiv.org/abs/2309.03387
  • repo_url: https://github.com/cram3r95/mapfe4mp
  • paper_authors: Carlos Gómez-Huélamo, Marcos V. Conde, Rafael Barea, Manuel Ocaña, Luis M. Bergasa
  • for: This paper proposes efficient baselines for Motion Prediction (MP) of multiple surrounding agents, a crucial task for Autonomous Driving Stacks (ADS) operating in arbitrarily complex environments.
  • methods: The models use state-of-the-art techniques for MP, including attention mechanisms and GNNs, together with a novel preprocessing step based on kinematic constraints that extracts interpretable map information (points from the driveable area and plausible centerlines), in opposition to black-box CNN-based or overly complex graph methods for map encoding, to generate plausible multimodal trajectories.
  • results: Experiments on the Argoverse 1 Motion Forecasting Benchmark show that the approach reaches accuracy on par with other state-of-the-art methods while using fewer operations and parameters and remaining more interpretable.
    Abstract Motion Prediction (MP) of multiple surrounding agents is a crucial task in arbitrarily complex environments, from simple robots to Autonomous Driving Stacks (ADS). Current techniques tackle this problem using end-to-end pipelines, where the input data is usually a rendered top-view of the physical information and the past trajectories of the most relevant agents; leveraging this information is a must to obtain optimal performance. In that sense, a reliable ADS must produce reasonable predictions on time. However, although many approaches use simple ConvNets and LSTMs to obtain the social latent features, State-Of-The-Art (SOTA) models may be too complex for real-time applications when using both sources of information (map and past trajectories), and offer little interpretability, especially considering the physical information. Moreover, the performance of such models highly depends on the number of available inputs for each particular traffic scenario, which are expensive to obtain, particularly annotated High-Definition (HD) maps. In this work, we propose several efficient baselines for the well-known Argoverse 1 Motion Forecasting Benchmark. We aim to develop compact models using SOTA techniques for MP, including attention mechanisms and GNNs. Our lightweight models use standard social information and interpretable map information, such as points from the driveable area and plausible centerlines obtained by means of a novel preprocessing step based on kinematic constraints, in opposition to black-box CNN-based or too-complex graph methods for map encoding, to generate plausible multimodal trajectories, achieving accuracy on par with other SOTA methods using fewer operations and parameters. Our code is publicly available at https://github.com/Cram3r95/mapfe4mp .

Community-Based Hierarchical Positive-Unlabeled (PU) Model Fusion for Chronic Disease Prediction

  • paper_url: http://arxiv.org/abs/2309.03386
  • repo_url: https://github.com/yangwu001/putree
  • paper_authors: Yang Wu, Xurui Li, Xuhong Zhang, Yangyang Kang, Changlong Sun, Xiaozhong Liu
  • for: This paper addresses chronic disease screening with Positive-Unlabeled (PU) learning while accounting for the differences among distinct populations.
  • methods: The paper proposes a novel Positive-Unlabeled Learning Tree (PUtree) algorithm that hierarchically builds community-based PU models (for communities such as different age or income brackets) and then aggregates their outputs; a mask-recovery data augmentation strategy, an adversarial PU risk estimator, and a model fusion network yield robust binary classification.
  • results: PUtree and its variants outperform state-of-the-art PU learning methods on two benchmarks and a new diabetes-prediction dataset.
    Abstract Positive-Unlabeled (PU) Learning is a challenge presented by binary classification problems where there is an abundance of unlabeled data along with a small number of positive data instances, which can be used to address the chronic disease screening problem. State-of-the-art PU learning methods have resulted in the development of various risk estimators, yet they neglect the differences among distinct populations. To address this issue, we present a novel Positive-Unlabeled Learning Tree (PUtree) algorithm. PUtree is designed to take into account communities such as different age or income brackets, in tasks of chronic disease prediction. We propose a novel approach for binary decision-making, which hierarchically builds community-based PU models and then aggregates their deliverables. Our method can explicate each PU model on the tree for the optimized non-leaf PU node splitting. Furthermore, a mask-recovery data augmentation strategy enables sufficient training of the model in individual communities. Additionally, the proposed approach includes an adversarial PU risk estimator to capture hierarchical PU-relationships, and a model fusion network that integrates data from each tree path, resulting in robust binary classification results. We demonstrate the superior performance of PUtree as well as its variants on two benchmarks and a new diabetes-prediction dataset.
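For background (this is the standard estimator that PU risk estimators build on, not PUtree's adversarial variant), the non-negative PU risk of Kiryo et al. (2017) estimates the classification risk from positive and unlabeled samples alone, given the positive class prior pi:

```python
# Non-negative PU risk estimator (Kiryo et al., 2017), as background.
import torch
import torch.nn.functional as F

def nn_pu_risk(scores_pos, scores_unl, prior):
    """scores_*: raw model outputs; the positive class has label +1."""
    loss = lambda s, y: F.softplus(-y * s).mean()          # logistic loss
    risk_pos = prior * loss(scores_pos, +1.0)
    risk_neg = loss(scores_unl, -1.0) - prior * loss(scores_pos, -1.0)
    return risk_pos + torch.clamp(risk_neg, min=0.0)       # non-negative correction

risk = nn_pu_risk(torch.randn(64), torch.randn(512), prior=0.3)
print(risk.item())
```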

Self-Supervised Masked Digital Elevation Models Encoding for Low-Resource Downstream Tasks

  • paper_url: http://arxiv.org/abs/2309.03367
  • repo_url: None
  • paper_authors: Priyam Mazumdar, Aiman Soliman, Volodymyr Kindratenko, Luigi Marini, Kenton McHenry
  • for: The goal of this work is to extract building and road segmentations from Digital Elevation Models (DEMs), which provide a detailed topography of the earth's surface, using little labeled data.
  • methods: The model is a Masked Autoencoder pre-trained on ImageNet (despite the large domain discrepancy between ImageNet and DEMs) with an UperNet head for decoding segmentations; it is fine-tuned with only 450 or 50 training images, roughly 5% and 0.5% of the original data.
  • results: On building segmentation the model obtains 82.1% Intersection over Union (IoU) with 450 images and 69.1% IoU with only 50 images; on the more challenging road detection task it obtains 82.7% IoU with 450 images and 73.2% IoU with only 50 images.
    Abstract The lack of quality labeled data is one of the main bottlenecks for training Deep Learning models. As the task increases in complexity, there is a higher penalty for overfitting and unstable learning. The typical paradigm employed today is Self-Supervised learning, where the model attempts to learn from a large corpus of unstructured and unlabeled data and then transfer that knowledge to the required task. Some notable examples of self-supervision in other modalities are BERT for Large Language Models, Wav2Vec for Speech Recognition, and the Masked AutoEncoder for Vision, which all utilize Transformers to solve a masked prediction task. GeoAI is uniquely poised to take advantage of the self-supervised methodology due to the decades of data collected, little of which is precisely and dependably annotated. Our goal is to extract building and road segmentations from Digital Elevation Models (DEM) that provide a detailed topography of the earth's surface. The proposed architecture is the Masked Autoencoder pre-trained on ImageNet (with the limitation that there is a large domain discrepancy between ImageNet and DEM) with an UperNet Head for decoding segmentations. We tested this model with 450 and 50 training images only, utilizing roughly 5% and 0.5% of the original data respectively. On the building segmentation task, this model obtains an 82.1% Intersection over Union (IoU) with 450 Images and 69.1% IoU with only 50 images. On the more challenging road detection task the model obtains an 82.7% IoU with 450 images and 73.2% IoU with only 50 images. Any hand-labeled dataset made today about the earth's surface will be immediately obsolete due to the constantly changing nature of the landscape. This motivates the clear necessity for data-efficient learners that can be used for a wide variety of downstream tasks.
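For reference, the Intersection over Union (IoU) metric quoted in the results is the overlap of the predicted and ground-truth masks divided by their union:

```python
# IoU between binary segmentation masks.
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0                      # both masks empty: perfect agreement
    return np.logical_and(pred, target).sum() / union

pred = np.zeros((8, 8)); pred[:4, :] = 1      # predicted mask covers 4 rows
target = np.zeros((8, 8)); target[:2, :] = 1  # ground truth covers 2 rows
print(iou(pred, target))  # 0.5 (intersection 16 px / union 32 px)
```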

ETP: Learning Transferable ECG Representations via ECG-Text Pre-training

  • paper_url: http://arxiv.org/abs/2309.07145
  • repo_url: None
  • paper_authors: Che Liu, Zhongwei Wan, Sibo Cheng, Mi Zhang, Rossella Arcucci
  • for: To learn transferable representations for the electrocardiogram (ECG), a critical non-invasive diagnostic tool in cardiovascular healthcare, by linking ECG signals with textual reports.
  • methods: The proposed ECG-Text Pre-training (ETP) framework employs an ECG encoder along with a pre-trained language model to align ECG signals with their corresponding textual reports, learning cross-modal representations and, for the first time, enabling zero-shot classification in the ECG domain.
  • results: ETP excels in both linear evaluation and zero-shot classification tasks on the PTB-XL and CPSC2018 datasets, showcasing robust and generalizable cross-modal ECG feature learning.
    Abstract In the domain of cardiovascular healthcare, the Electrocardiogram (ECG) serves as a critical, non-invasive diagnostic tool. Although recent strides in self-supervised learning (SSL) have been promising for ECG representation learning, these techniques often require annotated samples and struggle with classes not present in the fine-tuning stages. To address these limitations, we introduce ECG-Text Pre-training (ETP), an innovative framework designed to learn cross-modal representations that link ECG signals with textual reports. For the first time, this framework leverages the zero-shot classification task in the ECG domain. ETP employs an ECG encoder along with a pre-trained language model to align ECG signals with their corresponding textual reports. The proposed framework excels in both linear evaluation and zero-shot classification tasks, as demonstrated on the PTB-XL and CPSC2018 datasets, showcasing its ability for robust and generalizable cross-modal ECG feature learning.
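A hedged sketch of the CLIP-style alignment the framework implies: embed ECG signals and their paired reports, normalize, and train with a symmetric InfoNCE loss so matched pairs score highest. The encoders are stand-ins and the temperature is a conventional default, not ETP's reported value.

```python
# Symmetric contrastive (InfoNCE) alignment of ECG and text embeddings.
import torch
import torch.nn.functional as F

def clip_style_loss(ecg_emb: torch.Tensor, text_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    ecg_emb = F.normalize(ecg_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = ecg_emb @ text_emb.T / temperature        # (B, B) similarity matrix
    targets = torch.arange(len(ecg_emb))               # i-th ECG <-> i-th report
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))

ecg_emb, text_emb = torch.randn(16, 128), torch.randn(16, 128)  # encoder outputs
print(clip_style_loss(ecg_emb, text_emb).item())
```

Zero-shot classification then amounts to scoring an ECG embedding against text embeddings of class-descriptive prompts and picking the best match.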

REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation

  • paper_url: http://arxiv.org/abs/2309.03322
  • repo_url: None
  • paper_authors: Zheyuan Hu, Aaron Rovinsky, Jianlan Luo, Vikash Kumar, Abhishek Gupta, Sergey Levine
  • for: To learn dexterous manipulation skills efficiently with reinforcement learning on real-world multi-fingered robotic hands.
  • methods: The system integrates recent advances in sample-efficient RL with replay buffer bootstrapping, reusing data from different tasks or objects as a starting point for training new tasks and thereby significantly improving learning efficiency.
  • results: The approach enables a four-fingered robotic hand to quickly acquire intricate manipulation skills in the real world and completes the real-world training cycle, eliminating manual resets and reward engineering via an imitation-based pickup policy and learned reward functions.
    Abstract Dexterous manipulation tasks involving contact-rich interactions pose a significant challenge for both model-based control systems and imitation learning algorithms. The complexity arises from the need for multi-fingered robotic hands to dynamically establish and break contacts, balance non-prehensile forces, and control large degrees of freedom. Reinforcement learning (RL) offers a promising approach due to its general applicability and capacity to autonomously acquire optimal manipulation strategies. However, its real-world application is often hindered by the necessity to generate a large number of samples, reset the environment, and obtain reward signals. In this work, we introduce an efficient system for learning dexterous manipulation skills with RL to alleviate these challenges. The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping. This combination allows us to utilize data from different tasks or objects as a starting point for training new tasks, significantly improving learning efficiency. Additionally, our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy as well as learned reward functions, eliminating the need for manual resets and reward engineering. We demonstrate the benefits of reusing past data as replay buffer initialization for new tasks, for instance, the fast acquisition of intricate manipulation skills in the real world on a four-fingered robotic hand. (Videos: https://sites.google.com/view/reboot-dexterous)
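The replay-buffer bootstrapping idea reduces to a small data-plumbing sketch: before training on a new task, seed the buffer with transitions collected on earlier tasks or objects so the off-policy learner starts from informative data. The transition format and capacity below are illustrative, not the paper's implementation.

```python
# Seeding a replay buffer with prior-task data before new-task training.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def bootstrap(self, prior_task_transitions):
        """Seed the buffer with transitions reused from previous tasks/objects."""
        self.buffer.extend(prior_task_transitions)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

buf = ReplayBuffer()
prior_data = [("obs", "action", 0.0, "next_obs", False)] * 1000  # from old tasks
buf.bootstrap(prior_data)
buf.add(("obs", "action", 1.0, "next_obs", True))                # new-task data
print(len(buf.buffer), len(buf.sample(32)))
```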

Fitness Approximation through Machine Learning

  • paper_url: http://arxiv.org/abs/2309.03318
  • repo_url: https://github.com/itaitzruia4/approxml
  • paper_authors: Itai Tzruia, Tomer Halperin, Moshe Sipper, Achiya Elyasaf
  • for: This paper proposes an approach to fitness approximation in genetic algorithms (GAs) using machine-learning models, targeting evolutionary agents in Gymnasium (game) simulators, where fitness computation is costly.
  • methods: A dataset of sampled individuals and their actual fitness scores is maintained, and a fitness-approximation ML model is continually updated throughout the evolutionary run; the paper compares methods for switching between actual and approximate fitness, sampling the population, and weighting the samples.
  • results: Experiments demonstrate significant improvement in evolutionary runtimes, with fitness scores that are either identical or slightly lower than those of the fully run GA, depending on the ratio of approximate-to-actual fitness computation.
    Abstract We present a novel approach to performing fitness approximation in genetic algorithms (GAs) using machine-learning (ML) models, focusing on evolutionary agents in Gymnasium (game) simulators, where fitness computation is costly. Maintaining a dataset of sampled individuals along with their actual fitness scores, we continually update throughout an evolutionary run a fitness-approximation ML model. We compare different methods for: 1) switching between actual and approximate fitness, 2) sampling the population, and 3) weighting the samples. Experimental findings demonstrate significant improvement in evolutionary runtimes, with fitness scores that are either identical or slightly lower than that of the fully run GA -- depending on the ratio of approximate-to-actual-fitness computation. Our approach is generic and can be easily applied to many different domains.
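A hedged sketch of surrogate-assisted evolution in the spirit of the paper: maintain an archive of individuals with their true (expensive) fitness, fit a regressor on it, and evaluate most generations with the cheap surrogate while periodically refreshing with true evaluations. The toy fitness function, switching rule, and model choice are illustrative assumptions, not the paper's configuration.

```python
# Surrogate (ML-approximated) fitness inside a toy genetic algorithm.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def true_fitness(x):                      # stand-in for a costly simulation
    return -np.sum((x - 0.5) ** 2)

rng = np.random.default_rng(0)
population = rng.random((50, 8))
archive_X, archive_y = [], []
surrogate = RandomForestRegressor(n_estimators=50, random_state=0)

for generation in range(20):
    use_true = generation % 5 == 0        # periodically pay for real evaluations
    if use_true or len(archive_X) < 10:
        fitness = np.array([true_fitness(ind) for ind in population])
        archive_X.extend(population); archive_y.extend(fitness)
        surrogate.fit(np.array(archive_X), np.array(archive_y))
    else:
        fitness = surrogate.predict(population)   # cheap approximate fitness
    parents = population[np.argsort(fitness)[-25:]]          # keep the fittest half
    children = parents + rng.normal(0, 0.05, parents.shape)  # mutate
    population = np.vstack([parents, children])

print(max(true_fitness(ind) for ind in population))
```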

Comparative Analysis of Deep-Fake Algorithms

  • paper_url: http://arxiv.org/abs/2309.03295
  • repo_url: None
  • paper_authors: Nikhil Sontakke, Sejal Utekar, Shivansh Rastogi, Shriraj Sonawane
  • for: This study provides a comprehensive review of the current state of deepfake creation and detection technologies, covering deep learning-based methods for creating fakes and the techniques used to detect them.
  • methods: The review examines detection approaches including facial recognition, motion analysis, and audio-visual synchronization.
  • results: The study finds that current deepfake detection methods face limitations and challenges as deepfake technology rapidly advances, and it identifies future research directions needed to ensure the integrity of digital visual media.
    Abstract Due to the widespread use of smartphones with high-quality digital cameras and easy access to a wide range of software apps for recording, editing, and sharing videos and images, as well as the deep learning AI platforms, a new phenomenon of 'faking' videos has emerged. Deepfake algorithms can create fake images and videos that are virtually indistinguishable from authentic ones. Therefore, technologies that can detect and assess the integrity of digital visual media are crucial. Deepfakes, also known as deep learning-based fake videos, have become a major concern in recent years due to their ability to manipulate and alter images and videos in a way that is virtually indistinguishable from the original. These deepfake videos can be used for malicious purposes such as spreading misinformation, impersonating individuals, and creating fake news. Deepfake detection technologies use various approaches such as facial recognition, motion analysis, and audio-visual synchronization to identify and flag fake videos. However, the rapid advancement of deepfake technologies has made it increasingly difficult to detect these videos with high accuracy. In this paper, we aim to provide a comprehensive review of the current state of deepfake creation and detection technologies. We examine the various deep learning-based approaches used for creating deepfakes, as well as the techniques used for detecting them. Additionally, we analyze the limitations and challenges of current deepfake detection methods and discuss future research directions in this field. Overall, the paper highlights the importance of continued research and development in deepfake detection technologies in order to combat the negative impact of deepfakes on society and ensure the integrity of digital visual media.

My Art My Choice: Adversarial Protection Against Unruly AI

  • paper_url: http://arxiv.org/abs/2309.03198
  • repo_url: None
  • paper_authors: Anthony Rhodes, Ram Bhagat, Umur Aybars Ciftci, Ilke Demir
  • for: Protect creators' copyright and prevent diffusion models from exploiting artwork for their own purposes.
  • methods: Uses a UNet-based generator and a combination of several loss functions to attack black-box diffusion models, producing "protected" versions of images that defend against exploitation by diffusion models.
  • results: Experiments on several image-to-image tasks; both the "protected" images and the diffusion-model outputs are evaluated in visual, noise, structure, pixel, and generative spaces to validate the claims.
    Abstract Generative AI is on the rise, enabling everyone to produce realistic content via publicly available interfaces. Especially for guided image generation, diffusion models are changing the creator economy by producing high quality low cost content. In parallel, artists are rising against unruly AI, since their artwork is leveraged, distributed, and dissimulated by large generative models. Our approach, My Art My Choice (MAMC), aims to empower content owners by protecting their copyrighted materials from being utilized by diffusion models in an adversarial fashion. MAMC learns to generate adversarially perturbed "protected" versions of images which can in turn "break" diffusion models. The perturbation amount is decided by the artist to balance distortion vs. protection of the content. MAMC is designed with a simple UNet-based generator, attacking black box diffusion models, combining several losses to create adversarial twins of the original artwork. We experiment on three datasets for various image-to-image tasks, with different user control values. Both protected image and diffusion output results are evaluated in visual, noise, structure, pixel, and generative spaces to validate our claims. We believe that MAMC is a crucial step for preserving ownership information for AI generated content in a flawless, based-on-need, and human-centric way.
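The paper's exact generator and loss design are not spelled out here, but the core idea of artist-controlled adversarial protection can be sketched. The snippet below is a minimal stand-in that optimizes a bounded per-image perturbation against a surrogate encoder instead of training MAMC's UNet generator; the encoder, the budget `epsilon`, and the loss are all illustrative assumptions.

```python
# Minimal sketch of artist-controlled adversarial protection (not the paper's
# exact MAMC pipeline): a bounded perturbation is optimized so that a
# surrogate image encoder sees the protected image very differently, while
# pixel distortion stays within an artist-chosen budget `epsilon`.
import torch
import torch.nn as nn

encoder = nn.Sequential(            # stand-in for a diffusion model's encoder
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(8), nn.Flatten(),
)

def protect(image, epsilon=0.03, steps=50, lr=0.01):
    """Return an adversarially perturbed 'protected' copy of `image`."""
    delta = torch.zeros_like(image, requires_grad=True)
    target = encoder(image).detach()
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # push the encoder's view of the protected image away from the original
        loss = -nn.functional.mse_loss(encoder(image + delta), target)
        loss.backward()
        opt.step()
        with torch.no_grad():                       # artist-chosen budget
            delta.clamp_(-epsilon, epsilon)
    return (image + delta.detach()).clamp(0, 1)

protected = protect(torch.rand(1, 3, 64, 64))
```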

Temporal Inductive Path Neural Network for Temporal Knowledge Graph Reasoning

  • paper_url: http://arxiv.org/abs/2309.03251
  • repo_url: None
  • paper_authors: Hao Dong, Pengyang Wang, Meng Xiao, Zhiyuan Ning, Pengfei Wang, Yuanchun Zhou
  • for: Improve performance on temporal knowledge graph (TKG) reasoning, especially when handling historical information and newly emerging entities.
  • methods: Proposes Temporal Inductive Path Neural Network (TiPNN), a model that captures historical information from an entity-independent perspective and defines query-aware temporal paths to model the historical path information relevant to a query.
  • results: Experiments show that the proposed model not only achieves significant performance gains but also handles inductive settings and can provide reasoning evidence from the history temporal graph.
    Abstract Temporal Knowledge Graph (TKG) is an extension of traditional Knowledge Graph (KG) that incorporates the dimension of time. Reasoning on TKGs is a crucial task that aims to predict future facts based on historical occurrences. The key challenge lies in uncovering structural dependencies within historical subgraphs and temporal patterns. Most existing approaches model TKGs relying on entity modeling, as nodes in the graph play a crucial role in knowledge representation. However, the real-world scenario often involves an extensive number of entities, with new entities emerging over time. This makes it challenging for entity-dependent methods to cope with extensive volumes of entities, and effectively handling newly emerging entities also becomes a significant challenge. Therefore, we propose Temporal Inductive Path Neural Network (TiPNN), which models historical information in an entity-independent perspective. Specifically, TiPNN adopts a unified graph, namely history temporal graph, to comprehensively capture and encapsulate information from history. Subsequently, we utilize the defined query-aware temporal paths to model historical path information related to queries on history temporal graph for the reasoning. Extensive experiments illustrate that the proposed model not only attains significant performance enhancements but also handles inductive settings, while additionally facilitating the provision of reasoning evidence through history temporal graphs.

Split-Boost Neural Networks

  • paper_url: http://arxiv.org/abs/2309.03167
  • repo_url: https://github.com/Aastha2104/Parkinson-Disease-Prediction
  • paper_authors: Raffaele Giuseppe Cestari, Gabriele Maroni, Loris Cannelli, Dario Piga, Simone Formentin
  • for: Propose a training and calibration approach for feed-forward neural networks that improves performance and automatically incorporates regularizing behaviour without modeling it explicitly.
  • methods: A novel training strategy called "split-boost" that boosts performance and implicitly provides regularization, removing the need for an explicit regularization term.
  • results: Tested on an anonymized real-world dataset from a benchmark medical insurance design problem, the strategy improves performance and shortens the tuning phase.
    Abstract The calibration and training of a neural network is a complex and time-consuming procedure that requires significant computational resources to achieve satisfactory results. Key obstacles are a large number of hyperparameters to select and the onset of overfitting in the face of a small amount of data. In this framework, we propose an innovative training strategy for feed-forward architectures - called split-boost - that improves performance and automatically includes a regularizing behaviour without modeling it explicitly. Such a novel approach ultimately allows us to avoid explicitly modeling the regularization term, decreasing the total number of hyperparameters and speeding up the tuning phase. The proposed strategy is tested on a real-world (anonymized) dataset within a benchmark medical insurance design problem.

J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News

  • paper_url: http://arxiv.org/abs/2309.03164
  • repo_url: None
  • paper_authors: Tharindu Kumarage, Amrita Bhattacharjee, Djordje Padejski, Kristy Roschke, Dan Gillmor, Scott Ruston, Huan Liu, Joshua Garland
  • for: Reliably detect AI-generated news articles in order to curb the spread of misinformation online.
  • methods: Leveraging the expertise of an interdisciplinary team, develops J-Guard, a framework that steers existing supervised AI-text detectors towards distinguishing real-world journalism from AI-generated news articles.
  • results: Experiments show that J-Guard enhances detection capabilities while keeping the average performance drop under adversarial attacks as low as 7%.
    Abstract The rapid proliferation of AI-generated text online is profoundly reshaping the information landscape. Among various types of AI-generated text, AI-generated news presents a significant threat as it can be a prominent source of misinformation online. While several recent efforts have focused on detecting AI-generated text in general, these methods require enhanced reliability, given concerns about their vulnerability to simple adversarial attacks. Furthermore, due to the eccentricities of news writing, applying these detection methods for AI-generated news can produce false positives, potentially damaging the reputation of news organizations. To address these challenges, we leverage the expertise of an interdisciplinary team to develop a framework, J-Guard, capable of steering existing supervised AI text detectors for detecting AI-generated news while boosting adversarial robustness. By incorporating stylistic cues inspired by the unique journalistic attributes, J-Guard effectively distinguishes between real-world journalism and AI-generated news articles. Our experiments on news articles generated by a vast array of AI models, including ChatGPT (GPT3.5), demonstrate the effectiveness of J-Guard in enhancing detection capabilities while maintaining an average performance decrease of as low as 7% when faced with adversarial attacks.

Risk-reducing design and operations toolkit: 90 strategies for managing risk and uncertainty in decision problems

  • paper_url: http://arxiv.org/abs/2309.03133
  • repo_url: https://github.com/sashagutfraind/uncertainty_strategies
  • paper_authors: Alexander Gutfraind
  • for: Explore and develop RDOT (Risk-reducing Design and Operations Toolkit), a class of strategies that provide an effective response to decision problems under high uncertainty.
  • methods: Classifies RDOT strategies into six broad categories, argues that they offer an efficient response to seemingly intractable, highly uncertain decision problems, and proposes incorporating them into decision theory via multi-objective optimization.
  • results: Identifies more than 90 RDOT strategies recurring across different domains and disciplines, pointing to an important shared toolkit, and provides a framework for applying them to decision problems affected by high uncertainty.
    Abstract Uncertainty is a pervasive challenge in decision analysis, and decision theory recognizes two classes of solutions: probabilistic models and cognitive heuristics. However, engineers, public planners and other decision-makers instead use a third class of strategies that could be called RDOT (Risk-reducing Design and Operations Toolkit). These include incorporating robustness into designs, contingency planning, and others that do not fall into the categories of probabilistic models or cognitive heuristics. Moreover, identical strategies appear in several domains and disciplines, pointing to an important shared toolkit. The focus of this paper is to develop a catalog of such strategies and develop a framework for them. The paper finds more than 90 examples of such strategies falling into six broad categories and argues that they provide an efficient response to decision problems that are seemingly intractable due to high uncertainty. It then proposes a framework to incorporate them into decision theory using multi-objective optimization. Overall, RDOT represents an overlooked class of responses to uncertainty. Because RDOT strategies do not depend on accurate forecasting or estimation, they could be applied fruitfully to certain decision problems affected by high uncertainty and make them much more tractable.

MyoDex: A Generalizable Prior for Dexterous Manipulation

  • paper_url: http://arxiv.org/abs/2309.03130
  • repo_url: None
  • paper_authors: Vittorio Caggiano, Sudeep Dasari, Vikash Kumar
  • for: Develop agents that build on prior multi-task experience to quickly acquire new, previously unattainable behaviors.
  • methods: Uses multi-task learning to implicitly capture task-agnostic behavioral priors for human-like dexterity (MyoDex), training agents on a physiologically realistic human hand model (MyoHand).
  • results: MyoDex enables few-shot generalization and positive transfer to a large repertoire of unseen contact-rich manipulation tasks, solving approximately 3x more tasks, 4x faster, than a distillation baseline, and also improves the acquisition of dexterity on the 24-DoF Adroit Hand.
    Abstract Human dexterity is a hallmark of motor control. Our hands can rapidly synthesize new behaviors despite the complexity (multi-articular and multi-joints, with 23 joints controlled by more than 40 muscles) of musculoskeletal sensory-motor circuits. In this work, we take inspiration from how human dexterity builds on a diversity of prior experiences, instead of being acquired through a single task. Motivated by this observation, we set out to develop agents that can build upon their previous experience to quickly acquire new (previously unattainable) behaviors. Specifically, our approach leverages multi-task learning to implicitly capture task-agnostic behavioral priors (MyoDex) for human-like dexterity, using a physiologically realistic human hand model - MyoHand. We demonstrate MyoDex's effectiveness in few-shot generalization as well as positive transfer to a large repertoire of unseen dexterous manipulation tasks. Agents leveraging MyoDex can solve approximately 3x more tasks, and 4x faster in comparison to a distillation baseline. While prior work has synthesized single musculoskeletal control behaviors, MyoDex is the first generalizable manipulation prior that catalyzes the learning of dexterous physiological control across a large variety of contact-rich behaviors. We also demonstrate the effectiveness of our paradigms beyond musculoskeletal control towards the acquisition of dexterity in 24 DoF Adroit Hand. Website: https://sites.google.com/view/myodex

Detecting Manufacturing Defects in PCBs via Data-Centric Machine Learning on Solder Paste Inspection Features

  • paper_url: http://arxiv.org/abs/2309.03113
  • repo_url: None
  • paper_authors: Jubilee Prasad-Rao, Roohollah Heidary, Jesse Williams
  • for: Improve the automated detection of defects in Printed Circuit Board (PCB) manufacturing using Solder Paste Inspection (SPI) and Automated Optical Inspection (AOI) machines.
  • methods: A data-centric approach that trains Machine Learning (ML) models on SPI-extracted features of 6 million pins to detect PCB defects at three stages of PCB manufacturing, combining pin-level SPI features with component and PCB IDs to capture inter-pin, inter-component, and spatial effects that may not be apparent at the pin level.
  • results: Starting from a base extreme gradient boosting (XGBoost) model and iterating on the data pre-processing step improves detection performance, and combining the detection results of the pin-, component-, and PCB-level models identifies defective components more accurately.
    Abstract Automated detection of defects in Printed Circuit Board (PCB) manufacturing using Solder Paste Inspection (SPI) and Automated Optical Inspection (AOI) machines can help improve operational efficiency and significantly reduce the need for manual intervention. In this paper, using SPI-extracted features of 6 million pins, we demonstrate a data-centric approach to train Machine Learning (ML) models to detect PCB defects at three stages of PCB manufacturing. The 6 million PCB pins correspond to 2 million components that belong to 15,387 PCBs. Using a base extreme gradient boosting (XGBoost) ML model, we iterate on the data pre-processing step to improve detection performance. Combining pin-level SPI features using component and PCB IDs, we developed training instances also at the component and PCB level. This allows the ML model to capture any inter-pin, inter-component, or spatial effects that may not be apparent at the pin level. Models are trained at the pin, component, and PCB levels, and the detection results from the different models are combined to identify defective components.
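As a rough illustration of the pin-to-component aggregation described above, the sketch below groups synthetic pin-level SPI features by component ID and trains an XGBoost classifier on the aggregates. The column names and data are invented; the paper's actual SPI features and labels are not public.

```python
# Sketch of the pin-to-component aggregation idea (column names are
# hypothetical; the paper's exact SPI features are not public).
import numpy as np
import pandas as pd
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
pins = pd.DataFrame({
    "component_id": rng.integers(0, 200, size=5000),
    "solder_volume": rng.normal(100, 15, size=5000),
    "solder_height": rng.normal(50, 5, size=5000),
    "offset_x": rng.normal(0, 1, size=5000),
})
labels = pd.Series(rng.integers(0, 2, size=200), name="defective")

# Aggregate pin-level SPI features to the component level so the model can
# see inter-pin effects that are invisible to any single pin.
comp = pins.groupby("component_id").agg(["mean", "std", "min", "max"])
comp.columns = ["_".join(c) for c in comp.columns]

model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(comp, labels.loc[comp.index])
print(model.predict_proba(comp)[:3])
```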

A Multimodal Analysis of Influencer Content on Twitter

  • paper_url: http://arxiv.org/abs/2309.03064
  • repo_url: https://github.com/danaesavi/micd-influencer-content-twitter
  • paper_authors: Danae Sánchez Villegas, Catalina Goanta, Nikolaos Aletras
  • for: This paper aims to assist in the automatic detection of commercial influencer content on Twitter.
  • methods: The paper uses a new dataset of 15,998 influencer posts, and experiments with a range of predictive models that combine text and visual information, including a proposed cross-attention approach.
  • results: The paper shows that the cross-attention approach outperforms state-of-the-art multimodal models, and provides a thorough analysis of the strengths and limitations of the models. The models are effective in identifying commercial posts and reducing false positives, while capturing relevant context that aids in the discovery of undisclosed commercial posts.
    Abstract Influencer marketing involves a wide range of strategies in which brands collaborate with popular content creators (i.e., influencers) to leverage their reach, trust, and impact on their audience to promote and endorse products or services. Because followers of influencers are more likely to buy a product after receiving an authentic product endorsement rather than an explicit direct product promotion, the line between personal opinions and commercial content promotion is frequently blurred. This makes automatic detection of regulatory compliance breaches related to influencer advertising (e.g., misleading advertising or hidden sponsorships) particularly difficult. In this work, we (1) introduce a new Twitter (now X) dataset consisting of 15,998 influencer posts mapped into commercial and non-commercial categories for assisting in the automatic detection of commercial influencer content; (2) experiment with an extensive set of predictive models that combine text and visual information showing that our proposed cross-attention approach outperforms state-of-the-art multimodal models; and (3) conduct a thorough analysis of strengths and limitations of our models. We show that multimodal modeling is useful for identifying commercial posts, reducing the amount of false positives, and capturing relevant context that aids in the discovery of undisclosed commercial posts.
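A minimal sketch of the text-to-image cross-attention idea is given below, assuming both modalities are already encoded into token and patch embeddings of a shared dimension. The shapes, pooling, and classifier head are illustrative, not the paper's exact architecture.

```python
# Minimal sketch of text-to-image cross-attention for commercial-post
# classification (dimensions and pooling are illustrative, not the paper's).
import torch
import torch.nn as nn

class CrossAttentionClassifier(nn.Module):
    def __init__(self, dim=256, heads=4, n_classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, text_tokens, image_patches):
        # text tokens attend over image patch embeddings
        fused, _ = self.attn(query=text_tokens,
                             key=image_patches, value=image_patches)
        return self.head(fused.mean(dim=1))     # pool over tokens

model = CrossAttentionClassifier()
text = torch.randn(8, 32, 256)     # (batch, text tokens, dim)
image = torch.randn(8, 49, 256)    # (batch, image patches, dim)
logits = model(text, image)        # (8, 2)
```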

Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity

  • paper_url: http://arxiv.org/abs/2309.06364
  • repo_url: None
  • paper_authors: Aliya Amirova, Theodora Fteropoulli, Nafiso Ahmed, Martin R. Cowie, Joel Z. Leibo
  • for: Explores whether large-scale generative language models (LLMs) can simulate free responses to interview questions, and whether such artificial "silicon participants" can be studied with qualitative methods to produce insights that generalize to real human populations.
  • methods: Uses an LLM to generate interviews with silicon participants matched one-for-one on demographic characteristics with a set of human participants, then applies framework-based qualitative analysis to compare the key themes, structure, and tone of the two sets of interviews.
  • results: The key themes obtained from human and silicon participants are strikingly similar, but the structure and tone of the interviews differ markedly, and the hyper-accuracy distortion described by Aher et al. (2023) is also observed, suggesting the tested LLM (GPT-3.5) lacks sufficient algorithmic fidelity for research on it to generalize to human populations.
    Abstract Today, using Large-scale generative Language Models (LLMs) it is possible to simulate free responses to interview questions like those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative methods aiming to produce insights that could generalize to real human populations. The key concept in our analysis is algorithmic fidelity, a term introduced by Argyle et al. (2023) capturing the degree to which LLM-generated outputs mirror human sub-populations' beliefs and attitudes. By definition, high algorithmic fidelity suggests latent beliefs elicited from LLMs may generalize to real humans, whereas low algorithmic fidelity renders such research invalid. Here we used an LLM to generate interviews with silicon participants matching specific demographic characteristics one-for-one with a set of human participants. Using framework-based qualitative analysis, we showed the key themes obtained from both human and silicon participants were strikingly similar. However, when we analyzed the structure and tone of the interviews we found even more striking differences. We also found evidence of the hyper-accuracy distortion described by Aher et al. (2023). We conclude that the LLM we tested (GPT-3.5) does not have sufficient algorithmic fidelity to expect research on it to generalize to human populations. However, the rapid pace of LLM research makes it plausible this could change in the future. Thus we stress the need to establish epistemic norms now around how to assess validity of LLM-based qualitative research, especially concerning the need to ensure representation of heterogeneous lived experiences.

Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection

  • paper_url: http://arxiv.org/abs/2309.03057
  • repo_url: https://github.com/alohachen/hide-and-seek
  • paper_authors: Yu Chen, Tingxin Li, Huiming Liu, Yang Yu
  • for: Improve privacy protection for users of large language model (LLM) services.
  • methods: Since multi-party computation (MPC) approaches are impractical for LLM applications, private information in prompts is protected through lightweight anonymization, and a small local model is trained to de-anonymize the LLM's returned results with minimal computational overhead.
  • results: Experiments on translation and classification tasks show that the HaS framework achieves an optimal balance between privacy protection and utility.
    Abstract Numerous companies have started offering services based on large language models (LLM), such as ChatGPT, which inevitably raises privacy concerns as users' prompts are exposed to the model provider. Previous research on secure reasoning using multi-party computation (MPC) has proven to be impractical for LLM applications due to its time-consuming and communication-intensive nature. While lightweight anonymization techniques can protect private information in prompts through substitution or masking, they fail to recover sensitive data replaced in the LLM-generated results. In this paper, we expand the application scenarios of anonymization techniques by training a small local model to de-anonymize the LLM's returned results with minimal computational overhead. We introduce the HaS framework, where "H(ide)" and "S(eek)" represent its two core processes: hiding private entities for anonymization and seeking private entities for de-anonymization, respectively. To quantitatively assess HaS's privacy protection performance, we propose both black-box and white-box adversarial models. Furthermore, we conduct experiments to evaluate HaS's usability in translation and classification tasks. The experimental findings demonstrate that the HaS framework achieves an optimal balance between privacy protection and utility.
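The hide/seek loop can be illustrated with a toy substitution scheme: private entities are replaced before the prompt leaves the device and mapped back in the LLM's answer. In the paper both steps are performed by a small local model; the dictionary below merely stands in for it.

```python
# Toy illustration of the hide/seek idea: private entities are substituted
# before the prompt leaves the device, and mapped back in the LLM's answer.
# In the paper both steps are learned by small local models; a simple
# dictionary stands in for them here.
def hide(prompt, entities):
    mapping = {e: f"ENTITY_{i}" for i, e in enumerate(entities)}
    for real, fake in mapping.items():
        prompt = prompt.replace(real, fake)
    return prompt, mapping

def seek(answer, mapping):
    for real, fake in mapping.items():
        answer = answer.replace(fake, real)
    return answer

hidden, mapping = hide("Translate: Alice met Bob in Paris.",
                       ["Alice", "Bob", "Paris"])
# hidden -> "Translate: ENTITY_0 met ENTITY_1 in ENTITY_2."
llm_answer = "ENTITY_0 a rencontré ENTITY_1 à ENTITY_2."  # from remote LLM
print(seek(llm_answer, mapping))  # "Alice a rencontré Bob à Paris."
```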

Combining pre-trained Vision Transformers and CIDER for Out Of Domain Detection

  • paper_url: http://arxiv.org/abs/2309.03047
  • repo_url: None
  • paper_authors: Grégor Jouet, Clément Duhart, Francis Rousseaux, Julio Laborde, Cyril de Runz
  • for: Investigates the out-of-domain (OOD) detection performance of pre-trained models.
  • methods: Uses pre-trained transformer and CNN models together with the refinement method CIDER.
  • results: Experiments show that pre-trained transformer models achieve strong OOD detection out of the box, and that pre-trained ViTs and CNNs combined with CIDER improve OOD detection performance even further.
    Abstract Out-of-domain (OOD) detection is a crucial component in industrial applications as it helps identify when a model encounters inputs that are outside the training distribution. Most industrial pipelines rely on pre-trained models such as CNNs or Vision Transformers for downstream tasks. This paper investigates the performance of those models on the task of out-of-domain detection. Our experiments demonstrate that pre-trained transformer models achieve higher detection performance out of the box. Furthermore, we show that pre-trained ViTs and CNNs can be combined with refinement methods such as CIDER to improve their OOD detection performance even more. Our results suggest that transformers are a promising approach for OOD detection and set a stronger baseline for this task in many contexts.
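To make the scoring side of such a pipeline concrete, the sketch below computes an OOD score as one minus the maximum cosine similarity between a frozen backbone's feature and per-class prototypes, in the spirit of CIDER's hyperspherical embeddings. CIDER's actual training losses (compactness and dispersion) are omitted, and the features here are synthetic.

```python
# Sketch of prototype-based OOD scoring on frozen backbone features,
# loosely in the spirit of CIDER's hyperspherical embeddings
# (the training losses are omitted; features below are synthetic).
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def fit_prototypes(features, labels):
    feats = l2_normalize(features)
    return l2_normalize(np.stack(
        [feats[labels == c].mean(axis=0) for c in np.unique(labels)]))

def ood_score(x, prototypes):
    # low max cosine similarity to every class prototype => likely OOD
    return 1.0 - (l2_normalize(x) @ prototypes.T).max(axis=-1)

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 768))          # e.g. frozen ViT features
labels = rng.integers(0, 10, size=1000)
protos = fit_prototypes(train, labels)
print(ood_score(rng.normal(size=(5, 768)), protos))
```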

A Refutation of Shapley Values for Explainability

  • paper_url: http://arxiv.org/abs/2309.03041
  • repo_url: None
  • paper_authors: Xuanxiang Huang, Joao Marques-Silva
  • for: Refute the use of Shapley values as the theoretical underpinning of feature-attribution methods in rule-based explainability.
  • methods: Earlier work used a brute-force approach to identify Boolean functions, defined on small numbers of features, whose analysis reveals inadequacy-revealing issues; this paper proves the general case directly.
  • results: Proves that, for any number of features, there exist Boolean functions exhibiting one or more inadequacy-revealing issues, providing decisive arguments against Shapley values as the theoretical basis for rule-based explainability.
    Abstract Recent work demonstrated the existence of Boolean functions for which Shapley values provide misleading information about the relative importance of features in rule-based explanations. Such misleading information was broadly categorized into a number of possible issues. Each of those issues relates with features being relevant or irrelevant for a prediction, and all are significant regarding the inadequacy of Shapley values for rule-based explainability. This earlier work devised a brute-force approach to identify Boolean functions, defined on small numbers of features, and also associated instances, which displayed such inadequacy-revealing issues, and so served as evidence to the inadequacy of Shapley values for rule-based explainability. However, an outstanding question is how frequently such inadequacy-revealing issues can occur for Boolean functions with arbitrary large numbers of features. It is plain that a brute-force approach would be unlikely to provide insights on how to tackle this question. This paper answers the above question by proving that, for any number of features, there exist Boolean functions that exhibit one or more inadequacy-revealing issues, thereby contributing decisive arguments against the use of Shapley values as the theoretical underpinning of feature-attribution methods in explainability.
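The brute-force analysis mentioned above boils down to computing exact Shapley values of small Boolean functions under a uniform input distribution, where v(S) is the expected function value with the features in S fixed to the instance's values. The snippet below implements that computation; the example function is ours, not one of the paper's counterexamples.

```python
# Exact Shapley values of a Boolean function under the uniform input
# distribution. v(S) is the expected function value with the features in S
# fixed to the instance x. The example function is illustrative, not one of
# the paper's counterexamples.
from itertools import combinations, product
from math import factorial

def value(f, n, x, S):
    """Expected f over uniform inputs that agree with x on S."""
    free = [i for i in range(n) if i not in S]
    total = 0
    for bits in product([0, 1], repeat=len(free)):
        z = list(x)
        for i, b in zip(free, bits):
            z[i] = b
        total += f(z)
    return total / 2 ** len(free)

def shapley(f, n, x):
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in map(set, combinations(others, r)):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (value(f, n, x, S | {i}) - value(f, n, x, S))
    return phi

f = lambda z: int(z[0] and (z[1] or z[2]))
print(shapley(f, 3, [1, 1, 0]))   # one attribution per feature
```

Note that the cost is exponential in the number of features, which is exactly why the earlier work could only probe small functions by brute force and why the paper's general existence proof matters.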

An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection

  • paper_url: http://arxiv.org/abs/2309.03036
  • repo_url: https://github.com/xieyuankun/tdl-add
  • paper_authors: Yuankun Xie, Haonan Cheng, Yutian Wang, Long Ye
  • for: Proposes a fine-grained partially spoofed audio detection method, Temporal Deepfake Location (TDL), that accurately locates fake audio at the frame level.
  • methods: The method consists of two parts: an embedding similarity module that shapes an embedding space separating real frames from fake frames, and a temporal convolution operation that computes frame-specific similarities among neighboring frames and dynamically selects informative neighbors for convolution.
  • results: Experiments show that the method outperforms baseline models on the ASVspoof2019 Partial Spoof dataset and remains superior in cross-dataset scenarios; the code has been released online.
    Abstract Partially spoofed audio detection is a challenging task, lying in the need to accurately locate the authenticity of audio at the frame level. To address this issue, we propose a fine-grained partially spoofed audio detection method, namely Temporal Deepfake Location (TDL), which can effectively capture information about both features and locations. Specifically, our approach involves two novel parts: an embedding similarity module and a temporal convolution operation. To enhance the discrimination between real and fake features, the embedding similarity module is designed to generate an embedding space that can separate the real frames from fake frames. To effectively concentrate on the position information, the temporal convolution operation is proposed to calculate the frame-specific similarities among neighboring frames, and dynamically select informative neighbors for convolution. Extensive experiments show that our method outperforms baseline models on the ASVspoof2019 Partial Spoof dataset and demonstrates superior performance even in the cross-dataset scenario. The code is released online.

Synthetic Text Generation using Hypergraph Representations

  • paper_url: http://arxiv.org/abs/2309.06550
  • repo_url: None
  • paper_authors: Natraj Raman, Sameena Shah
  • for: Generate synthetic variants of a document, a problem usually posed as text-to-text transformation.
  • methods: Proposes an LLM-based method that first decomposes a document into semantic frames and then generates text from this interim sparse format; the frames are modeled with a hypergraph, which allows frame contents to be perturbed in a principled manner.
  • results: The solution generates documents that are diverse and coherent and that vary in style, sentiment, format, composition, and facts.
    Abstract Generating synthetic variants of a document is often posed as text-to-text transformation. We propose an alternate LLM based method that first decomposes a document into semantic frames and then generates text using this interim sparse format. The frames are modeled using a hypergraph, which allows perturbing the frame contents in a principled manner. Specifically, new hyperedges are mined through topological analysis and complex polyadic relationships including hierarchy and temporal dynamics are accommodated. We show that our solution generates documents that are diverse, coherent and vary in style, sentiment, format, composition and facts.

Universal Preprocessing Operators for Embedding Knowledge Graphs with Literals

  • paper_url: http://arxiv.org/abs/2309.03023
  • repo_url: https://gitlab.com/patryk.preisner/mkga
  • paper_authors: Patryk Preisner, Heiko Paulheim
  • for: Produce dense numerical representations of entities in knowledge graphs (KGs) that also take literals into account.
  • methods: Proposes a set of universal preprocessing operators that transform KGs with numeric, temporal, textual, and image literals into a form that any embedding method can consume.
  • results: Experiments on the kgbench dataset with three different embedding methods show promising results.
    Abstract Knowledge graph embeddings are dense numerical representations of entities in a knowledge graph (KG). While the majority of approaches concentrate only on relational information, i.e., relations between entities, fewer approaches exist which also take information about literal values (e.g., textual descriptions or numerical information) into account. Those which exist are typically tailored towards a particular modality of literal and a particular embedding method. In this paper, we propose a set of universal preprocessing operators which can be used to transform KGs with literals for numerical, temporal, textual, and image information, so that the transformed KGs can be embedded with any method. The results on the kgbench dataset with three different embedding methods show promising results.
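One concrete instance of such a preprocessing operator is binning numeric literals into interval entities, so that a purely relational embedding method can consume them. The sketch below is illustrative; the paper's actual operators and naming scheme may differ.

```python
# Sketch of one universal preprocessing operator: numeric literals are
# replaced by interval entities, so a purely relational embedding method can
# consume them. (The binning strategy and naming here are illustrative.)
def bin_numeric_literals(triples, n_bins=4):
    values = sorted(v for _, _, v in triples if isinstance(v, (int, float)))
    step = max(1, len(values) // n_bins)
    edges = values[step - 1 :: step][: n_bins - 1]   # simple quantile edges

    def bucket(v):
        for k, e in enumerate(edges):
            if v <= e:
                return f"interval_{k}"
        return f"interval_{len(edges)}"

    out = []
    for s, p, o in triples:
        if isinstance(o, (int, float)):
            out.append((s, p, bucket(o)))            # literal -> interval entity
        else:
            out.append((s, p, o))
    return out

kg = [("berlin", "population", 3_700_000),
      ("munich", "population", 1_500_000),
      ("berlin", "locatedIn", "germany")]
print(bin_numeric_literals(kg, n_bins=2))
```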

EdgeFL: A Lightweight Decentralized Federated Learning Framework

  • paper_url: http://arxiv.org/abs/2309.02936
  • repo_url: None
  • paper_authors: Hongyi Zhang, Jan Bosch, Helena Holmström Olsson
  • for: Provide a lightweight decentralized federated learning framework that overcomes the centralized-aggregation and scalability limitations of existing FL platforms.
  • methods: An edge-only model training and aggregation approach that eliminates the need for a central server; FL functionality can be integrated with just four lines of code (LOC), and aggregation functions can be customized to specific needs.
  • results: EdgeFL reduces weight-update latency, enables faster model evolution, and achieves better classification accuracy than traditional centralized FL approaches.
    Abstract Federated Learning (FL) has emerged as a promising approach for collaborative machine learning, addressing data privacy concerns. However, existing FL platforms and frameworks often present challenges for software engineers in terms of complexity, limited customization options, and scalability limitations. In this paper, we introduce EdgeFL, an edge-only lightweight decentralized FL framework, designed to overcome the limitations of centralized aggregation and scalability in FL deployments. By adopting an edge-only model training and aggregation approach, EdgeFL eliminates the need for a central server, enabling seamless scalability across diverse use cases. With a straightforward integration process requiring just four lines of code (LOC), software engineers can easily incorporate FL functionalities into their AI products. Furthermore, EdgeFL offers the flexibility to customize aggregation functions, empowering engineers to adapt them to specific needs. Based on the results, we demonstrate that EdgeFL achieves superior performance compared to existing FL platforms/frameworks. Our results show that EdgeFL reduces weights update latency and enables faster model evolution, enhancing the efficiency of edge devices. Moreover, EdgeFL exhibits improved classification accuracy compared to traditional centralized FL approaches. By leveraging EdgeFL, software engineers can harness the benefits of federated learning while overcoming the challenges associated with existing FL platforms/frameworks.
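The edge-only aggregation idea can be sketched as decentralized weight averaging among peers, with no server in the loop. The class and method names below are invented for illustration and are not EdgeFL's actual API.

```python
# Sketch of edge-only aggregation in the spirit of EdgeFL: every node trains
# locally, then averages the weights it can fetch from its peers; no central
# server is involved. (API names here are invented for illustration.)
import numpy as np

class EdgeNode:
    def __init__(self, dim):
        self.weights = np.zeros(dim)

    def local_update(self, grad, lr=0.1):
        self.weights -= lr * grad                 # stand-in for local training

    def aggregate(self, peer_weights):
        # decentralized FedAvg: average own weights with reachable peers
        stack = np.stack([self.weights, *peer_weights])
        self.weights = stack.mean(axis=0)

nodes = [EdgeNode(dim=10) for _ in range(3)]
rng = np.random.default_rng(0)
for _ in range(5):                                # five federated rounds
    for n in nodes:
        n.local_update(rng.normal(size=10))
    snapshots = [n.weights.copy() for n in nodes]
    for i, n in enumerate(nodes):
        n.aggregate([w for j, w in enumerate(snapshots) if j != i])
```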

Estimating irregular water demands with physics-informed machine learning to inform leakage detection

  • paper_url: http://arxiv.org/abs/2309.02935
  • repo_url: https://github.com/swn-group-at-tu-berlin/lila-pinn
  • paper_authors: Ivo Daniel, Andrea Cominola
  • for: This paper aims to develop a physics-informed machine learning algorithm for timely identifying and accurately localizing leakages in drinking water distribution networks.
  • methods: The proposed algorithm uses a fully connected neural network to analyze pressure data and estimate unknown irregular water demands, leveraging the Bernoulli equation to linearize the leakage detection problem.
  • results: The algorithm was tested on data from the L-Town benchmark network and showed good performance in estimating most irregular demands, with R2 values larger than 0.8. The results also demonstrated that the algorithm can improve the identification of leakages under the presence of irregular demands by a factor of 5.3 for abrupt leaks and a factor of 3.0 for incipient leaks compared to disregarding irregular demands.
    Abstract Leakages in drinking water distribution networks pose significant challenges to water utilities, leading to infrastructure failure, operational disruptions, environmental hazards, property damage, and economic losses. The timely identification and accurate localisation of such leakages is paramount for utilities to mitigate these unwanted effects. However, implementation of algorithms for leakage detection is limited in practice by requirements of either hydraulic models or large amounts of training data. Physics-informed machine learning can utilise hydraulic information thereby circumventing both limitations. In this work, we present a physics-informed machine learning algorithm that analyses pressure data and therefrom estimates unknown irregular water demands via a fully connected neural network, ultimately leveraging the Bernoulli equation and effectively linearising the leakage detection problem. Our algorithm is tested on data from the L-Town benchmark network, and results indicate a good capability for estimating most irregular demands, with R2 larger than 0.8. Identification results for leakages under the presence of irregular demands could be improved by a factor of 5.3 for abrupt leaks and a factor of 3.0 for incipient leaks when compared the results disregarding irregular demands.

On the Challenges of Building Datasets for Hate Speech Detection

  • paper_url: http://arxiv.org/abs/2309.02912
  • repo_url: None
  • paper_authors: Vitthal Bhandari
  • for: Provide a framework for the data-creation pipeline so that practitioners can follow best practices when building hate speech datasets in the future.
  • methods: Analyzes the issues surrounding hate speech detection through a data-centric lens and outlines a holistic framework covering seven broad dimensions, using hate speech towards sexual minorities as the running example.
  • results: The framework serves as a form of best practice for future hate speech dataset creation, improving the reliability and consistency of such data.
    Abstract Detection of hate speech has been formulated as a standalone application of NLP and different approaches have been adopted for identifying the target groups, obtaining raw data, defining the labeling process, choosing the detection algorithm, and evaluating the performance in the desired setting. However, unlike other downstream tasks, hate speech suffers from the lack of large-sized, carefully curated, generalizable datasets owing to the highly subjective nature of the task. In this paper, we first analyze the issues surrounding hate speech detection through a data-centric lens. We then outline a holistic framework to encapsulate the data creation pipeline across seven broad dimensions by taking the specific example of hate speech towards sexual minorities. We posit that practitioners would benefit from following this framework as a form of best practice when creating hate speech datasets in the future.

DECODE: Data-driven Energy Consumption Prediction leveraging Historical Data and Environmental Factors in Buildings

  • paper_url: http://arxiv.org/abs/2309.02908
  • repo_url: None
  • paper_authors: Aditya Mishra, Haroon R. Lone, Aayush Mishra
  • for: Forecast building energy consumption to enable effective energy management and distribution within the grid.
  • methods: A Long Short-Term Memory (LSTM) model that predicts building energy consumption from historical energy data, occupancy patterns, and weather conditions.
  • results: The LSTM model outperforms established prediction methods, with an R2 score of 0.97 and a mean absolute error (MAE) of 0.007, and still produces efficient energy-consumption forecasts when trained on a limited dataset, generalizing well and reliably.
    Abstract Energy prediction in buildings plays a crucial role in effective energy management. Precise predictions are essential for achieving optimal energy consumption and distribution within the grid. This paper introduces a Long Short-Term Memory (LSTM) model designed to forecast building energy consumption using historical energy data, occupancy patterns, and weather conditions. The LSTM model provides accurate short, medium, and long-term energy predictions for residential and commercial buildings compared to existing prediction models. We compare our LSTM model with established prediction methods, including linear regression, decision trees, and random forest. Encouragingly, the proposed LSTM model emerges as the superior performer across all metrics. It demonstrates exceptional prediction accuracy, boasting the highest R2 score of 0.97 and the most favorable mean absolute error (MAE) of 0.007. An additional advantage of our developed model is its capacity to achieve efficient energy consumption forecasts even when trained on a limited dataset. We address concerns about overfitting (variance) and underfitting (bias) through rigorous training and evaluation on real-world data. In summary, our research contributes to energy prediction by offering a robust LSTM model that outperforms alternative methods and operates with remarkable efficiency, generalizability, and reliability.
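A minimal version of such an LSTM forecaster is sketched below, taking windows of (energy, occupancy, weather) readings and predicting the next step. The feature set, window length, and architecture are assumptions; the paper's exact configuration is not reproduced here.

```python
# Minimal sketch of an LSTM forecaster over (energy, occupancy, weather)
# windows; the paper's exact architecture and features are not public.
import torch
import torch.nn as nn

class EnergyLSTM(nn.Module):
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # predict next-step consumption

model = EnergyLSTM()
x = torch.randn(32, 24, 3)                 # 24 hourly readings per sample
y = torch.randn(32, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```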

A deep Natural Language Inference predictor without language-specific training data

  • paper_url: http://arxiv.org/abs/2309.02887
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Lorenzo Corradi, Alessandro Manenti, Francesca Del Bonifro, Francesco Setti, Dario Del Sorbo
  • for: Solve natural language inference (NLI) between sentence pairs in a target language of choice without a language-specific training dataset.
  • methods: Exploits a manually translated generic translation dataset and two instances of the same pre-trained model: one generates sentence embeddings for the source language, and the other is fine-tuned on the target language to mimic the first. This technique is known as Knowledge Distillation.
  • results: Evaluated on the machine-translated Stanford NLI and Multi-Genre NLI test sets and the manually translated RTE3-ITA test set, and further validated on Sentiment Analysis, Aspect-Based Sentiment Analysis, and Topic Recognition over the native Italian ABSITA dataset; the Knowledge Distillation technique outperforms methods based on machine translation even though it was not directly trained on the data it was tested over.
    Abstract In this paper we present a technique of NLP to tackle the problem of inference relation (NLI) between pairs of sentences in a target language of choice without a language-specific training dataset. We exploit a generic translation dataset, manually translated, along with two instances of the same pre-trained model - the first to generate sentence embeddings for the source language, and the second fine-tuned over the target language to mimic the first. This technique is known as Knowledge Distillation. The model has been evaluated over machine translated Stanford NLI test dataset, machine translated Multi-Genre NLI test dataset, and manually translated RTE3-ITA test dataset. We also test the proposed architecture over different tasks to empirically demonstrate the generality of the NLI task. The model has been evaluated over the native Italian ABSITA dataset, on the tasks of Sentiment Analysis, Aspect-Based Sentiment Analysis, and Topic Recognition. We emphasise the generality and exploitability of the Knowledge Distillation technique that outperforms other methodologies based on machine translation, even though the former was not directly trained on the data it was tested over.
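The distillation step can be sketched as follows: a frozen teacher embeds the source-language sentence, and the student is trained so that its embedding of the translated sentence matches it. The encoders below are trivial placeholders; in practice both would be pre-trained sentence encoders.

```python
# Sketch of the knowledge-distillation step: the student (target-language)
# encoder is trained so that its embedding of a translated sentence matches
# the frozen teacher's embedding of the source sentence. The encoders here
# are placeholders, not the pre-trained models used in the paper.
import torch
import torch.nn as nn

teacher = nn.EmbeddingBag(30000, 256)      # frozen source-language encoder
student = nn.EmbeddingBag(30000, 256)      # target-language encoder, trained
teacher.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
src_ids = torch.randint(0, 30000, (16, 20))   # source sentences (token ids)
tgt_ids = torch.randint(0, 30000, (16, 24))   # their translations

with torch.no_grad():
    target_emb = teacher(src_ids)
loss = nn.functional.mse_loss(student(tgt_ids), target_emb)
loss.backward()
opt.step()
```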

MAD: Modality Agnostic Distance Measure for Image Registration

  • paper_url: http://arxiv.org/abs/2309.02875
  • repo_url: None
  • paper_authors: Vasiliki Sideri-Lampretsa, Veronika A. Zimmer, Huaqi Qiu, Georgios Kaissis, Daniel Rueckert
  • for: Propose a modality-agnostic distance measure for multi-modal image registration, a crucial pre-processing step in many medical applications.
  • methods: Uses random convolutions to learn the inherent geometry of images while remaining robust to large appearance changes; the random convolutions simulate an infinite number of synthetic modalities, removing the need for aligned paired data during training.
  • results: Experiments show that the measure registers multi-modal images successfully and has a larger capture range than traditional measures such as Mutual Information and Normalised Gradient Fields.
    Abstract Multi-modal image registration is a crucial pre-processing step in many medical applications. However, it is a challenging task due to the complex intensity relationships between different imaging modalities, which can result in large discrepancy in image appearance. The success of multi-modal image registration, whether it is conventional or learning based, is predicated upon the choice of an appropriate distance (or similarity) measure. Particularly, deep learning registration algorithms lack in accuracy or even fail completely when attempting to register data from an "unseen" modality. In this work, we present Modality Agnostic Distance (MAD), a deep image distance measure that utilises random convolutions to learn the inherent geometry of the images while being robust to large appearance changes. Random convolutions are geometry-preserving modules which we use to simulate an infinite number of synthetic modalities alleviating the need for aligned paired data during training. We can therefore train MAD on a mono-modal dataset and successfully apply it to a multi-modal dataset. We demonstrate that not only can MAD affinely register multi-modal images successfully, but it has also a larger capture range than traditional measures such as Mutual Information and Normalised Gradient Fields.
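The random-convolution distance can be sketched directly: a bank of fixed, randomly initialized convolution filters acts as a geometry-preserving feature extractor, and the distance between two images is taken in that feature space. The kernel sizes and the aggregation below are illustrative.

```python
# Sketch of a random-convolution distance: fixed, randomly initialized conv
# filters act as geometry-preserving feature extractors, and the distance
# between two images is measured in that feature space. (Filter sizes and
# the aggregation are illustrative, not MAD's exact configuration.)
import torch
import torch.nn as nn

torch.manual_seed(0)
random_convs = [nn.Conv2d(1, 8, k, padding=k // 2) for k in (3, 5, 7)]
for conv in random_convs:
    conv.requires_grad_(False)             # the filters are never trained

def mad_distance(fixed, moving):
    return sum(nn.functional.mse_loss(conv(fixed), conv(moving))
               for conv in random_convs)

a = torch.rand(1, 1, 64, 64)               # e.g. two imaging modalities
b = torch.rand(1, 1, 64, 64)
print(mad_distance(a, b))                  # differentiable registration loss
```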

Rethinking Momentum Knowledge Distillation in Online Continual Learning

  • paper_url: http://arxiv.org/abs/2309.02870
  • repo_url: None
  • paper_authors: Nicolas Michel, Maorong Wang, Ling Xiao, Toshihiko Yamasaki
  • for: Addresses the problem of training neural networks on a continuous data stream where multiple classification tasks emerge in sequence.
  • methods: Applies Momentum Knowledge Distillation (MKD) to enhance existing Online Continual Learning (OCL) methods.
  • results: Improves existing state-of-the-art accuracy by more than 10 percentage points on ImageNet100 and sheds light on MKD's internal mechanics and impacts during training in OCL.
    Abstract Online Continual Learning (OCL) addresses the problem of training neural networks on a continuous data stream where multiple classification tasks emerge in sequence. In contrast to offline Continual Learning, data can be seen only once in OCL. In this context, replay-based strategies have achieved impressive results and most state-of-the-art approaches depend heavily on them. While Knowledge Distillation (KD) has been extensively used in offline Continual Learning, it remains under-exploited in OCL, despite its potential. In this paper, we theoretically analyze the challenges in applying KD to OCL. We introduce a direct yet effective methodology for applying Momentum Knowledge Distillation (MKD) to many flagship OCL methods and demonstrate its capabilities to enhance existing approaches. In addition to improving existing state-of-the-art accuracy by more than 10 percentage points on ImageNet100, we shed light on MKD's internal mechanics and impacts during training in OCL. We argue that, similar to replay, MKD should be considered a central component of OCL.
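A minimal sketch of momentum knowledge distillation is given below: the teacher is an exponential moving average of the student, and its soft predictions regularize each training step. The replay buffer and OCL task structure are omitted, and the loss weighting is an assumption.

```python
# Sketch of momentum knowledge distillation: the teacher is an exponential
# moving average of the student, and its predictions regularize training on
# the stream (replay buffer and task structure omitted for brevity).
import copy
import torch
import torch.nn as nn

student = nn.Linear(128, 10)
teacher = copy.deepcopy(student)
teacher.requires_grad_(False)

def ema_update(teacher, student, momentum=0.999):
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.data.mul_(momentum).add_(s.data, alpha=1 - momentum)

opt = torch.optim.SGD(student.parameters(), lr=0.1)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
logits = student(x)
kd = nn.functional.kl_div(logits.log_softmax(-1),
                          teacher(x).softmax(-1), reduction="batchmean")
loss = nn.functional.cross_entropy(logits, y) + 0.5 * kd  # weight is assumed
loss.backward()
opt.step()
ema_update(teacher, student)
```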

A recommender for the management of chronic pain in patients undergoing spinal cord stimulation

  • paper_url: http://arxiv.org/abs/2309.03918
  • repo_url: None
  • paper_authors: Tigran Tchrakian, Mykhaylo Zayats, Alessandra Pascale, Dat Huynh, Pritish Parida, Carla Agurto Rios, Sergiy Zhuk, Jeffrey L. Rogers, ENVISION Studies Physician Author Group, Boston Scientific Research Scientists Consortium
  • for: Manage chronic pain in patients undergoing spinal cord stimulation (SCS).
  • methods: Uses a contextual multi-armed bandit (CMAB) approach to build a recommender system that suggests SCS settings to patients, with recommendations delivered through a digital health ecosystem and combined with a patient monitoring system to close the therapeutic loop.
  • results: SCS recommendations produced statistically significant improvements in clinical outcomes (pain and/or quality of life) in 85% of all subjects (N=21); among subjects in moderate Patient States (N=7) before receiving recommendations, 100% showed statistically significant improvements and 5/7 had improved PS dwell time.
    Abstract Spinal cord stimulation (SCS) is a therapeutic approach used for the management of chronic pain. It involves the delivery of electrical impulses to the spinal cord via an implanted device, which when given suitable stimulus parameters can mask or block pain signals. Selection of optimal stimulation parameters usually happens in the clinic under the care of a provider whereas at-home SCS optimization is managed by the patient. In this paper, we propose a recommender system for the management of pain in chronic pain patients undergoing SCS. In particular, we use a contextual multi-armed bandit (CMAB) approach to develop a system that recommends SCS settings to patients with the aim of improving their condition. These recommendations, sent directly to patients though a digital health ecosystem, combined with a patient monitoring system closes the therapeutic loop around a chronic pain patient over their entire patient journey. We evaluated the system in a cohort of SCS-implanted ENVISION study subjects (Clinicaltrials.gov ID: NCT03240588) using a combination of quality of life metrics and Patient States (PS), a novel measure of holistic outcomes. SCS recommendations provided statistically significant improvement in clinical outcomes (pain and/or QoL) in 85\% of all subjects (N=21). Among subjects in moderate PS (N=7) prior to receiving recommendations, 100\% showed statistically significant improvements and 5/7 had improved PS dwell time. This analysis suggests SCS patients may benefit from SCS recommendations, resulting in additional clinical improvement on top of benefits already received from SCS therapy.
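As an illustration of the contextual bandit machinery, the sketch below implements LinUCB over a patient-state context to pick one of K candidate stimulation settings. The study's actual algorithm, features, and reward definition are not public; everything here is a generic stand-in.

```python
# Sketch of a contextual bandit (LinUCB) for recommending one of K stimulation
# settings given a patient-state context; the study's actual algorithm and
# features are not public, so everything here is a generic stand-in.
import numpy as np

class LinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]     # per-arm covariance
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            ucb = theta @ context + self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

bandit = LinUCB(n_arms=4, dim=5)
rng = np.random.default_rng(0)
ctx = rng.normal(size=5)                 # e.g. pain scores, activity, sleep
arm = bandit.select(ctx)                 # recommended SCS setting
bandit.update(arm, ctx, reward=1.0)      # e.g. improvement in Patient State
```

The context vector is what makes the recommendation patient-specific: rewards would come from the monitoring system, and different patient states can steer the bandit to different settings.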

Generalised Mutual Information: a Framework for Discriminative Clustering

  • paper_url: http://arxiv.org/abs/2309.02858
  • repo_url: None
  • paper_authors: Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Warith Harchaoui, Mickaël Leclercq, Arnaud Droit, Frédéric Precioso
  • for: Examines recent deep clustering objectives, in particular the use of Mutual Information (MI) as an unsupervised objective for training neural networks.
  • methods: First shows that maximizing MI does not necessarily lead to satisfying clusters and identifies the Kullback-Leibler divergence as the main cause of this behaviour; then generalizes MI by changing its core distance, introducing the Generalised Mutual Information (GEMINI), a set of geometry-aware metrics based on distances or kernels in the data space.
  • results: Some GEMINIs require no regularization during training, and GEMINIs can automatically select a relevant number of clusters, a property little studied in deep discriminative clustering where the number of clusters is a priori unknown.
    Abstract In the last decade, successes in deep clustering have majorly involved the Mutual Information (MI) as an unsupervised objective for training neural networks with increasing regularisations. While the quality of the regularisations has been largely discussed for improvements, little attention has been dedicated to the relevance of MI as a clustering objective. In this paper, we first highlight how the maximisation of MI does not lead to satisfying clusters. We identified the Kullback-Leibler divergence as the main reason for this behaviour. Hence, we generalise the mutual information by changing its core distance, introducing the Generalised Mutual Information (GEMINI): a set of metrics for unsupervised neural network training. Unlike MI, some GEMINIs do not require regularisations when training as they are geometry-aware thanks to distances or kernels in the data space. Finally, we highlight that GEMINIs can automatically select a relevant number of clusters, a property that has been little studied in the deep discriminative clustering context where the number of clusters is a priori unknown.

Getting too personal(ized): The importance of feature choice in online adaptive algorithms

  • paper_url: http://arxiv.org/abs/2309.02856
  • repo_url: None
  • paper_authors: ZhaoBin Li, Luna Yee, Nathaniel Sauerberg, Irene Sakson, Joseph Jay Williams, Anna N. Rafferty
  • for: Study whether and when attempting to personalize educational technology has a cost, for example when adapting to personal information delays the adoption of policies that benefit all students.
  • methods: Uses multi-armed bandit (MAB) algorithms to learn which version of an educational technology to present to each student, varying the relation between student characteristics and outcomes as well as whether the algorithm is aware of those characteristics.
  • results: Simulations show that including student characteristics for personalization is beneficial when those characteristics are needed to learn the optimal action, but degrades the bandit algorithm's performance in other scenarios; moreover, including unneeded student characteristics can systematically disadvantage students with less common values for those characteristics.
    Abstract Digital educational technologies offer the potential to customize students' experiences and learn what works for which students, enhancing the technology as more students interact with it. We consider whether and when attempting to discover how to personalize has a cost, such as if the adaptation to personal information can delay the adoption of policies that benefit all students. We explore these issues in the context of using multi-armed bandit (MAB) algorithms to learn a policy for what version of an educational technology to present to each student, varying the relation between student characteristics and outcomes and also whether the algorithm is aware of these characteristics. Through simulations, we demonstrate that the inclusion of student characteristics for personalization can be beneficial when those characteristics are needed to learn the optimal action. In other scenarios, this inclusion decreases performance of the bandit algorithm. Moreover, including unneeded student characteristics can systematically disadvantage students with less common values for these characteristics. Our simulations do however suggest that real-time personalization will be helpful in particular real-world scenarios, and we illustrate this through case studies using existing experimental results in ASSISTments. Overall, our simulations show that adaptive personalization in educational technologies can be a double-edged sword: real-time adaptation improves student experiences in some contexts, but the slower adaptation and potentially discriminatory results mean that a more personalized model is not always beneficial.
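A minimal simulation in the spirit of the paper's setup: an epsilon-greedy bandit assigns one of two versions of a technology to each student, with or without conditioning on a binary student characteristic, in worlds where that characteristic does or does not matter. The reward probabilities and the epsilon-greedy rule are illustrative assumptions, not the authors' simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bandit(use_feature, feature_matters, n_students=2000, eps=0.1):
    """Epsilon-greedy bandit assigning one of two versions to each student.

    use_feature:     whether the algorithm conditions on a binary feature.
    feature_matters: whether the optimal version depends on that feature.
    Returns the average reward obtained over all students.
    """
    if feature_matters:
        p = np.array([[0.7, 0.4], [0.4, 0.7]])  # best arm differs by feature
    else:
        p = np.array([[0.7, 0.4], [0.7, 0.4]])  # arm 0 is best for everyone
    n_ctx = 2 if use_feature else 1
    counts = np.ones((n_ctx, 2))                # optimistic init, avoids /0
    sums = np.zeros((n_ctx, 2))
    total = 0.0
    for _ in range(n_students):
        f = rng.integers(2)                     # student's characteristic
        c = f if use_feature else 0             # context the algorithm sees
        arm = rng.integers(2) if rng.random() < eps \
            else int(np.argmax(sums[c] / counts[c]))
        r = float(rng.random() < p[f, arm])
        counts[c, arm] += 1
        sums[c, arm] += r
        total += r
    return total / n_students

for uf in (False, True):
    for fm in (False, True):
        print(f"use_feature={uf}, feature_matters={fm}: "
              f"avg reward {run_bandit(uf, fm):.3f}")
```

When the feature does not matter, the non-contextual learner pools its data and converges faster, illustrating the cost of unnecessary personalization.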

Promoting Open-domain Dialogue Generation through Learning Pattern Information between Contexts and Responses

  • paper_url: http://arxiv.org/abs/2309.02823
  • repo_url: https://github.com/russellliu0/rad
  • paper_authors: Mengjuan Liu, Chenyang Liu, Yunfan Yang, Jiang Liu, Mohan Jing
  • for: To improve the quality of responses generated by open-domain dialogue models, making them more vivid and informative.
  • methods: Builds an open-domain dialogue model on a pre-trained language model (GPT-2); proposes an improved scheduled sampling method in which responses guide response generation during training, and designs a response-aware mechanism to mine implicit pattern information between contexts and responses.
  • results: Evaluated on the Persona-Chat and DailyDialog datasets, the proposed model (RAD) outperforms the baselines on most automatic and manual metrics.
    Abstract Recently, utilizing deep neural networks to build open-domain dialogue models has become a hot topic. However, the responses generated by these models suffer from many problems, such as being poorly contextualized and tending toward generic replies that lack informative content, which seriously damages the user's experience. Therefore, many studies try to introduce more information into dialogue models to make the generated responses more vivid and informative. Unlike them, this paper improves the quality of generated responses by learning the implicit pattern information between contexts and responses in the training samples. We first build an open-domain dialogue model based on a pre-trained language model (i.e., GPT-2). Then, an improved scheduled sampling method is proposed for pre-trained models, by which responses can guide response generation in the training phase while avoiding the exposure bias problem. More importantly, we design a response-aware mechanism for mining the implicit pattern information between contexts and responses so that the generated replies are more diverse and closer to human replies. Finally, we evaluate the proposed model (RAD) on the Persona-Chat and DailyDialog datasets; the experimental results show that our model outperforms the baselines on most automatic and manual metrics.
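A generic scheduled-sampling step for a Hugging Face-style autoregressive LM: with some probability, each training-time input token is replaced by the model's own prediction so that inputs resemble inference-time ones. This sketches plain scheduled sampling only; the paper's improved schedule and response-aware mechanism are not reproduced.

```python
import torch

def scheduled_sampling_inputs(model, input_ids, sampling_prob):
    """Build decoder inputs mixing gold tokens with model predictions.

    With probability `sampling_prob`, each input token (except the first)
    is replaced by the model's greedy prediction for that position, so
    training-time inputs resemble inference-time ones (less exposure bias).
    `model` is assumed to be a Hugging Face causal LM (e.g. GPT-2).
    """
    with torch.no_grad():
        logits = model(input_ids).logits       # (B, T, V)
        preds = logits.argmax(dim=-1)          # preds[:, t] predicts token t+1
    mixed = input_ids.clone()
    replace = torch.rand_like(input_ids[:, 1:], dtype=torch.float) < sampling_prob
    mixed[:, 1:] = torch.where(replace, preds[:, :-1], input_ids[:, 1:])
    return mixed  # then train with: model(mixed, labels=input_ids).loss
```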

Roulette: A Semantic Privacy-Preserving Device-Edge Collaborative Inference Framework for Deep Learning Classification Tasks

  • paper_url: http://arxiv.org/abs/2309.02820
  • repo_url: None
  • paper_authors: Jingyi Li, Guocheng Liao, Lin Chen, Xu Chen
  • for: Proposes Roulette, a task-oriented, semantic privacy-preserving device-edge collaborative inference framework for deep learning classifiers, addressing accuracy degradation under non-i.i.d. data and privacy disclosure.
  • methods: A split-learning paradigm in which the back-end DNN is frozen and the front-end DNN is retrained to be both a feature extractor and an encryptor; the ground truth of the data is treated as private information, with a differential privacy guarantee.
  • results: Extensive evaluations on realistic datasets show that Roulette effectively defends against various attacks while maintaining model accuracy; under severely non-i.i.d. data it improves inference accuracy by 21% on average, while driving discrimination attacks down to near random guessing.
    Abstract Deep learning classifiers are crucial in the age of artificial intelligence. Device-edge-based collaborative inference has been widely adopted as an efficient framework for promoting their application in IoT and 5G/6G networks. However, it suffers from accuracy degradation under non-i.i.d. data distributions and from privacy disclosure. For accuracy degradation, direct use of transfer learning and split learning is costly, and privacy issues remain. For privacy disclosure, cryptography-based approaches incur a huge overhead. Other lightweight methods assume that the ground truth is non-sensitive and can be exposed, but for many applications the ground truth is the user's crucial privacy-sensitive information. In this paper, we propose Roulette, a task-oriented semantic privacy-preserving collaborative inference framework for deep learning classifiers. Beyond the input data, we treat the ground truth of the data as private information. We develop a novel paradigm of split learning where the back-end DNN is frozen and the front-end DNN is retrained to be both a feature extractor and an encryptor. Moreover, we provide a differential privacy guarantee and analyze the hardness of ground-truth inference attacks. To validate the proposed Roulette, we conduct extensive performance evaluations using realistic datasets, which demonstrate that Roulette can effectively defend against various attacks while achieving good model accuracy. In a situation where the non-i.i.d.-ness is very severe, Roulette improves inference accuracy by 21% averaged over benchmarks, while making the accuracy of discrimination attacks almost equivalent to random guessing.
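A minimal sketch of the split itself: a front-end that runs on the device and remains trainable (in Roulette it is retrained to be both feature extractor and encryptor) and a frozen back-end on the edge. The toy architecture and data are placeholders; the paper's privacy mechanism and differential privacy guarantee are not reproduced.

```python
import torch
import torch.nn as nn

# Device side: trainable front-end producing the features sent to the edge.
front_end = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(4), nn.Flatten())
# Edge side: frozen back-end classifier.
back_end = nn.Sequential(nn.Linear(16 * 4 * 4, 64), nn.ReLU(), nn.Linear(64, 10))
for p in back_end.parameters():
    p.requires_grad = False                     # back-end DNN stays frozen

opt = torch.optim.Adam(front_end.parameters(), lr=1e-3)  # only device side trains

x = torch.randn(8, 3, 32, 32)                   # device-side input batch
labels = torch.randint(0, 10, (8,))             # toy labels
features = front_end(x)                         # transmitted to the edge
logits = back_end(features)                     # edge-side inference
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                                 # gradients flow only to front-end
opt.step()
```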

Combining Thermodynamics-based Model of the Centrifugal Compressors and Active Machine Learning for Enhanced Industrial Design Optimization

  • paper_url: http://arxiv.org/abs/2309.02818
  • repo_url: None
  • paper_authors: Shadi Ghiasi, Guido Pazzi, Concettina Del Grosso, Giovanni De Magistris, Giacomo Veneri
  • for: To speed up the optimization loop in centrifugal compressor design, which is computationally expensive.
  • methods: Combines an in-house thermodynamics-based compressor model with a Gaussian Process-based surrogate model inside a deployable active learning (AL) setting.
  • results: The uncertainty-based query function improves surrogate modeling over random selection of data points, and in production the framework makes compressor design optimization around 46% faster than the internal thermodynamics-based simulator, at the same performance.
    Abstract The design process of centrifugal compressors requires an optimization process which is computationally expensive due to the complex analytical equations underlying the compressor's dynamics. Although regression surrogate models could drastically reduce the computational cost of such a process, the major challenge is the scarcity of data for training the surrogate model. Aiming to strategically exploit the labeled samples, we propose the Active-CompDesign framework, which combines a thermodynamics-based compressor model (i.e., our internal software for compressor design) with a Gaussian Process-based surrogate model in a deployable Active Learning (AL) setting. We first conduct experiments in an offline setting and then extend to an online AL framework where real-time interaction with the thermodynamics-based compressor model allows deployment in production. Active-CompDesign shows a significant performance improvement in surrogate modeling by leveraging an uncertainty-based query function over samples within the AL framework, compared with random selection of data points. Moreover, our framework in production has made the total computation of compressor design optimization around 46% faster than relying on the internal thermodynamics-based simulator, achieving the same performance.
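A minimal sketch of the uncertainty-driven active-learning loop with a Gaussian Process surrogate: query the candidate design where the predictive standard deviation is largest, label it with the expensive simulator, and refit. `expensive_simulator` is a hypothetical stand-in for the thermodynamics-based compressor model.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_simulator(x):
    # Hypothetical stand-in for the thermodynamics-based compressor model.
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

rng = np.random.default_rng(0)
candidates = rng.uniform(-1, 1, size=(500, 2))   # unlabelled design pool
X = candidates[:5].copy()                        # small initial labelled set
y = expensive_simulator(X)

gp = GaussianProcessRegressor()
for _ in range(20):                              # active-learning loop
    gp.fit(X, y)
    _, std = gp.predict(candidates, return_std=True)
    i = int(np.argmax(std))                      # most uncertain candidate
    X = np.vstack([X, candidates[i]])            # query the simulator there
    y = np.append(y, expensive_simulator(candidates[i:i + 1]))
print("labelled points used:", len(X))
```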

Near-continuous time Reinforcement Learning for continuous state-action spaces

  • paper_url: http://arxiv.org/abs/2309.02815
  • repo_url: None
  • paper_authors: Lorenzo Croissant, Marc Abeille, Bruno Bouchard
  • for: The reinforcement learning problem of controlling an unknown dynamical system so as to maximize the long-term average reward along a single trajectory, overcoming the limitations of previous literature, which primarily considers discrete time and discrete state-action spaces.
  • methods: Interaction times are modelled with a Poisson clock of frequency $\varepsilon^{-1}$ to capture arbitrary time scales; the reward function is generic, and the state dynamics are modelled as a jump process with an arbitrary transition kernel on $\mathbb{R}^d$. Learning is handled within the eluder dimension framework, and planning by an approximate method based on a diffusive limit approximation of the jump process.
  • results: The celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively, and the algorithm enjoys a regret of order $\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$. As the frequency of interactions blows up, the approximation error $\varepsilon^{1/2} T$ vanishes, showing that $\tilde{\mathcal{O}}(\sqrt{T})$ is attainable in near-continuous time.
    Abstract We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for mechanical or digital systems in which interactions occur at a high frequency, if not in continuous time, and whose state spaces are large if not inherently continuous. Perhaps the only exception is the Linear Quadratic framework for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of a rigid dynamic and reward structure. This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency $\varepsilon^{-1}$, which captures arbitrary time scales: from discrete ($\varepsilon=1$) to continuous time ($\varepsilon\downarrow0$). In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on $\mathbb{R}^d$. We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning within the eluder dimension framework and propose an approximate planning method based on a diffusive limit approximation of the jump process. Overall, our algorithm enjoys a regret of order $\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$. As the frequency of interactions blows up, the approximation error $\varepsilon^{1/2} T$ vanishes, showing that $\tilde{\mathcal{O}}(\sqrt{T})$ is attainable in near-continuous time.
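The Poisson-clock modelling of interaction times is easy to simulate: inter-arrival times are Exponential with mean $\varepsilon$, so $\varepsilon = 1$ gives roughly one interaction per unit time and $\varepsilon \downarrow 0$ approaches continuous time. A minimal sketch:

```python
import numpy as np

def interaction_times(eps, horizon, seed=0):
    """Sample interaction times from a Poisson clock of frequency 1/eps.

    Inter-arrival times are Exponential with mean eps, so smaller eps
    means denser interactions (approaching continuous time).
    """
    rng = np.random.default_rng(seed)
    t, times = 0.0, []
    while t < horizon:
        t += rng.exponential(eps)     # next tick of the Poisson clock
        times.append(t)
    return np.array(times)

for eps in (1.0, 0.1, 0.01):
    n = len(interaction_times(eps, horizon=10.0))
    print(f"eps={eps}: {n} interactions over T=10")
```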

Automated Bioinformatics Analysis via AutoBA

  • paper_url: http://arxiv.org/abs/2309.03242
  • repo_url: https://github.com/joshuachou2018/autoba
  • paper_authors: Juexiao Zhou, Bin Zhang, Xiuying Chen, Haoyang Li, Xiaopeng Xu, Siyuan Chen, Xin Gao
  • for: Designed to meet the analysis demands of fast-growing and evolving omics data.
  • methods: An autonomous AI agent based on a large language model, designed for conventional omics data analysis; it simplifies the analytical process, requires minimal user input, and provides detailed step-by-step plans for various bioinformatics tasks.
  • results: Validated by expert bioinformaticians, AutoBA shows robustness and adaptability across omics analysis cases including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics; it can self-design analysis processes based on input data variations, and, unlike online bioinformatics services, it deploys analyses locally, preserving data privacy.
    Abstract With the fast-growing and evolving omics data, the demand for streamlined and adaptable tools to handle the analysis continues to grow. In response to this need, we introduce Auto Bioinformatics Analysis (AutoBA), an autonomous AI agent based on a large language model designed explicitly for conventional omics data analysis. AutoBA simplifies the analytical process by requiring minimal user input while delivering detailed step-by-step plans for various bioinformatics tasks. Through rigorous validation by expert bioinformaticians, AutoBA's robustness and adaptability are affirmed across a diverse range of omics analysis cases, including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics. AutoBA's unique capacity to self-design analysis processes based on input data variations further underscores its versatility. Compared with online bioinformatic services, AutoBA deploys the analysis locally, preserving data privacy. Moreover, different from the predefined pipeline, AutoBA has adaptability in sync with emerging bioinformatics tools. Overall, AutoBA represents a convenient tool, offering robustness and adaptability for complex omics data analysis.

Norm Tweaking: High-performance Low-bit Quantization of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.02784
  • repo_url: None
  • paper_authors: Liang Li, Qingyuan Li, Bo Zhang, Xiangxiang Chu
  • for: Model compression for large language models (LLMs) that preserves accuracy at deployment time.
  • methods: Introduces norm tweaking, a high-precision, cost-efficient plugin for current post-training quantization (PTQ) methods: guided by the gap between the float model's activations and the quantized model's, it updates the weights of the normalization layers for better generalization, using generated calibration data and a channel-wise distance constraint.
  • results: Extensive experiments on various datasets show higher accuracy than existing PTQ methods for both weight-only quantization and joint quantization of weights and activations; on GLM-130B and OPT-66B, the method even reaches the accuracy of the float models at 2-bit quantization.
    Abstract As the size of large language models (LLMs) continues to grow, model compression without sacrificing accuracy has become a crucial challenge for deployment. While some quantization methods, such as GPTQ, have made progress in achieving acceptable 4-bit weight-only quantization, attempts at lower bit quantization often result in severe performance degradation. In this paper, we introduce a technique called norm tweaking, which can be used as a plugin in current PTQ methods to achieve high precision while being cost-efficient. Our approach is inspired by the observation that rectifying the quantized activation distribution to match its float counterpart can readily restore accuracy for LLMs. To achieve this, we carefully design a tweaking strategy that includes calibration data generation and channel-wise distance constraint to update the weights of normalization layers for better generalization. We conduct extensive experiments on various datasets using several open-sourced LLMs. Our method demonstrates significant improvements in both weight-only quantization and joint quantization of weights and activations, surpassing existing PTQ methods. On GLM-130B and OPT-66B, our method even achieves the same level of accuracy at 2-bit quantization as their float ones. Our simple and effective approach makes it more practical for real-world applications.
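A simplified sketch of the norm-tweaking loop described above: only the normalization-layer parameters of the quantized model are updated so that its activations track the float model's on calibration data. The use of `nn.LayerNorm` modules, final outputs as the matching target, and a plain L2 channel distance are assumptions for illustration; the paper's calibration-data generation and channel-wise constraint differ in detail.

```python
import torch
import torch.nn as nn

def norm_tweak(quant_model, float_model, calib_batches, lr=1e-5, steps=100):
    """Update only normalization-layer weights of `quant_model` so that its
    activations match `float_model`'s on calibration data (illustrative).
    Both models are assumed to be nn.Modules returning activation tensors.
    """
    norm_params = [p for m in quant_model.modules()
                   if isinstance(m, nn.LayerNorm) for p in m.parameters()]
    opt = torch.optim.Adam(norm_params, lr=lr)
    for step in range(steps):
        batch = calib_batches[step % len(calib_batches)]
        with torch.no_grad():
            target = float_model(batch)          # float activations (reference)
        out = quant_model(batch)                 # quantized-model activations
        loss = ((out - target) ** 2).mean(dim=0).sum()  # channel-wise L2 distance
        opt.zero_grad()
        loss.backward()
        opt.step()
```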

Improving diagnosis and prognosis of lung cancer using vision transformers: A scoping review

  • paper_url: http://arxiv.org/abs/2309.02783
  • repo_url: None
  • paper_authors: Hazrat Ali, Farida Mohsen, Zubair Shah
  • for: This paper aims to identify recent developments in vision transformer-based AI methods for lung cancer imaging applications, and to provide insights into their performance and potential for clinical translation.
  • methods: The paper reviews 34 studies published from 2020 to 2022 that use vision transformer-based methods for lung cancer diagnosis and prognosis, including classification of lung cancer types and segmentation of lungs. The studies combine vision transformers with other architectures such as convolutional neural networks or UNet models.
  • results: The review finds that vision transformer-based models are increasingly popular for lung cancer applications, but their computational complexity and clinical relevance are important factors to consider in future research. The studies show promising results in lung cancer diagnosis and prognosis but lack clear strategies for clinical translation.
    Abstract Vision transformer-based methods are advancing the field of medical artificial intelligence and cancer imaging, including lung cancer applications. Recently, many researchers have developed vision transformer-based AI methods for lung cancer diagnosis and prognosis. This scoping review aims to identify the recent developments on vision transformer-based AI methods for lung cancer imaging applications. It provides key insights into how vision transformers complemented the performance of AI and deep learning methods for lung cancer. Furthermore, the review also identifies the datasets that contributed to advancing the field. Of the 314 retrieved studies, this review included 34 studies published from 2020 to 2022. The most commonly addressed task in these studies was the classification of lung cancer types, such as lung squamous cell carcinoma versus lung adenocarcinoma, and identifying benign versus malignant pulmonary nodules. Other applications included survival prediction of lung cancer patients and segmentation of lungs. The studies lacked clear strategies for clinical transformation. SWIN transformer was a popular choice of the researchers; however, many other architectures were also reported where vision transformer was combined with convolutional neural networks or UNet model. It can be concluded that vision transformer-based models are increasingly in popularity for developing AI methods for lung cancer applications. However, their computational complexity and clinical relevance are important factors to be considered for future research work. This review provides valuable insights for researchers in the field of AI and healthcare to advance the state-of-the-art in lung cancer diagnosis and prognosis. We provide an interactive dashboard on lung-cancer.onrender.com/.

GPT Can Solve Mathematical Problems Without a Calculator

  • paper_url: http://arxiv.org/abs/2309.03241
  • repo_url: https://github.com/thudm/mathglm
  • paper_authors: Zhen Yang, Ming Ding, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfeng Bai, Jie Tang
  • for: Challenges the assumption that large language models cannot accurately perform multi-digit arithmetic operations.
  • methods: With sufficient training data, a 2-billion-parameter language model can accurately perform multi-digit arithmetic with almost 100% accuracy and without data leakage.
  • results: MathGLM, fine-tuned from GLM-10B on a dataset with multi-step arithmetic operations and math problems described in text, achieves performance similar to GPT-4 on a 5,000-sample Chinese math problem test set.
    Abstract Previous studies have typically assumed that large language models are unable to accurately perform arithmetic operations, particularly multiplication of >8 digits, and operations involving decimals and fractions, without the use of calculator tools. This paper aims to challenge this misconception. With sufficient training data, a 2 billion-parameter language model can accurately perform multi-digit arithmetic operations with almost 100% accuracy without data leakage, significantly surpassing GPT-4 (whose multi-digit multiplication accuracy is only 4.3%). We also demonstrate that our MathGLM, fine-tuned from GLM-10B on a dataset with additional multi-step arithmetic operations and math problems described in text, achieves similar performance to GPT-4 on a 5,000-samples Chinese math problem test set. Our code and data are public at https://github.com/THUDM/MathGLM.
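Since the claim rests on sufficient training data, a tiny generator of arithmetic training strings makes the idea concrete. This is a hypothetical illustration of producing abundant multi-digit examples, not the MathGLM dataset construction (which also contains multi-step expressions and text-described problems).

```python
import random

def arithmetic_example(rng, max_digits=12):
    """One training string for multi-digit arithmetic: expression -> answer.
    Hypothetical generator in the spirit of training on abundant examples.
    """
    a = rng.randrange(10 ** rng.randint(1, max_digits))
    b = rng.randrange(10 ** rng.randint(1, max_digits))
    op = rng.choice(["+", "-", "*"])
    result = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"{a} {op} {b} = {result}"

rng = random.Random(0)
for _ in range(3):
    print(arithmetic_example(rng))
```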

SWAP: Exploiting Second-Ranked Logits for Adversarial Attacks on Time Series

  • paper_url: http://arxiv.org/abs/2309.02752
  • repo_url: None
  • paper_authors: Chang George Dong, Liangwei Nathan Zheng, Weitong Chen, Wei Emma Zhang, Lin Yue
  • for: Proposes a new adversarial attack on time series classification (TSC) models, to probe and strengthen their robustness.
  • methods: A new attack, SWAP, that enhances the confidence of the second-ranked class while minimizing the manipulation of other logits, raising the attack success rate while keeping the perturbation small and hard to detect.
  • results: Experiments show that SWAP achieves state-of-the-art performance, with an attack success rate exceeding 50% and an 18% increase over existing methods.
    Abstract Time series classification (TSC) has emerged as a critical task in various domains, and deep neural models have shown superior performance in TSC tasks. However, these models are vulnerable to adversarial attacks, where subtle perturbations can significantly impact the prediction results. Existing adversarial methods often suffer from over-parameterization or random logit perturbation, hindering their effectiveness. Additionally, increasing the attack success rate (ASR) typically involves generating more noise, making the attack more easily detectable. To address these limitations, we propose SWAP, a novel attacking method for TSC models. SWAP focuses on enhancing the confidence of the second-ranked logits while minimizing the manipulation of other logits. This is achieved by minimizing the Kullback-Leibler divergence between the target logit distribution and the predictive logit distribution. Experimental results demonstrate that SWAP achieves state-of-the-art performance, with an ASR exceeding 50% and an 18% increase compared to existing methods.
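A sketch of the stated objective: form a target distribution by swapping the top-1 and second-ranked class probabilities, then take a small signed-gradient step on the input that minimises the KL divergence to that target. The single-step signed-gradient update is an assumption for illustration, not the authors' optimisation procedure.

```python
import torch
import torch.nn.functional as F

def swap_attack_step(model, x, alpha=1e-2):
    """One gradient step of a SWAP-style attack on a time-series classifier.

    Builds a target distribution in which the top-1 and second-ranked class
    probabilities are swapped (other logits left untouched), then perturbs
    the input to minimise KL(target || prediction). Illustrative sketch.
    """
    x = x.clone().detach().requires_grad_(True)
    probs = F.softmax(model(x), dim=-1)              # (B, C) predictions
    top2 = probs.topk(2, dim=-1).indices             # top-1 and runner-up
    target = probs.detach().clone()
    p1 = target.gather(-1, top2[:, :1])              # top-1 probability
    p2 = target.gather(-1, top2[:, 1:2])             # second-ranked probability
    target.scatter_(-1, top2[:, :1], p2)             # swap the two largest
    target.scatter_(-1, top2[:, 1:2], p1)
    loss = F.kl_div(probs.log(), target, reduction="batchmean")
    loss.backward()
    return (x - alpha * x.grad.sign()).detach()      # small perturbation step
```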

MLN-net: A multi-source medical image segmentation method for clustered microcalcifications using multiple layer normalization

  • paper_url: http://arxiv.org/abs/2309.02742
  • repo_url: https://github.com/yezanting/mln-net-verson1
  • paper_authors: Ke Wang, Zanting Ye, Xiang Xie, Haidong Cui, Tao Chen, Banteng Liu
  • for: Accurate segmentation of clustered microcalcifications in mammography images, to support breast cancer diagnosis and treatment.
  • methods: Proposes MLN-net, a framework that accurately segments multi-source images while being trained only on single-source images: a source-domain image augmentation method generates multi-source images, a structure of multiple layer normalization (LN) layers builds the segmentation network, and a branch selection strategy measures the similarity between source-domain and target-domain data.
  • results: Experiments show that MLN-net accurately segments clustered microcalcifications across domains, with segmentation accuracy surpassing state-of-the-art methods.
    Abstract Accurate segmentation of clustered microcalcifications in mammography is crucial for the diagnosis and treatment of breast cancer. Despite exhibiting expert-level accuracy, recent deep learning advancements in medical image segmentation contribute insufficiently to practical applications, due to the domain shift resulting from differences in patient postures, individual gland density, imaging modalities of mammography, etc. In this paper, a novel framework named MLN-net, which can accurately segment multi-source images using only single-source images, is proposed for clustered microcalcification segmentation. We first propose a source-domain image augmentation method to generate multi-source images, leading to improved generalization. A structure of multiple layer normalization (LN) layers is used to construct the segmentation network, which proves efficient for clustered microcalcification segmentation in different domains. Additionally, a branch selection strategy is designed for measuring the similarity of the source-domain data and the target-domain data. To validate the proposed MLN-net, extensive analyses are performed, including ablation experiments and comparisons with 12 baseline methods. Extensive experiments validate the effectiveness of MLN-net in segmenting clustered microcalcifications from different domains, and its segmentation accuracy surpasses state-of-the-art methods. Code will be available at https://github.com/yezanting/MLN-NET-VERSON1.
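A minimal sketch of the multiple-layer-normalization idea: convolution weights are shared across source domains, while each domain gets its own normalization layer (`GroupNorm` with one group behaves like a layer norm over conv features). The block below is illustrative; MLN-net's full architecture and branch selection strategy are more involved.

```python
import torch
import torch.nn as nn

class MultiLNBlock(nn.Module):
    """Conv block with weights shared across domains but one normalization
    layer per domain, so each branch absorbs its domain's statistics.
    Illustrative sketch only.
    """
    def __init__(self, channels, n_domains):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)   # shared
        self.norms = nn.ModuleList(
            nn.GroupNorm(1, channels) for _ in range(n_domains)   # per-domain
        )

    def forward(self, x, domain):
        return torch.relu(self.norms[domain](self.conv(x)))

block = MultiLNBlock(channels=8, n_domains=3)
x = torch.randn(2, 8, 64, 64)
print(block(x, domain=1).shape)    # torch.Size([2, 8, 64, 64])
```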

Rubric-Specific Approach to Automated Essay Scoring with Augmentation Training

  • paper_url: http://arxiv.org/abs/2309.02740
  • repo_url: None
  • paper_authors: Brian Cho, Youngbin Jang, Jaewoong Yoon
  • for: Automatic evaluation of subjective responses (automated essay scoring).
  • methods: Neural scoring models trained and tested with a series of data augmentation operations designed to teach features and functions, in particular rubric items, overlooked by previous work.
  • results: State-of-the-art performance on the Automated Student Assessment Prize dataset.
    Abstract Neural based approaches to automatic evaluation of subjective responses have shown superior performance and efficiency compared to traditional rule-based and feature engineering oriented solutions. However, it remains unclear whether the suggested neural solutions are sufficient replacements of human raters as we find recent works do not properly account for rubric items that are essential for automated essay scoring during model training and validation. In this paper, we propose a series of data augmentation operations that train and test an automated scoring model to learn features and functions overlooked by previous works while still achieving state-of-the-art performance in the Automated Student Assessment Prize dataset.

HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus

  • paper_url: http://arxiv.org/abs/2309.02731
  • repo_url: None
  • paper_authors: Zhenpeng Su, Xing Wu, Wei Zhou, Guangyuan Ma, Songlin Hu
  • for: To improve detection of AI-generated content, especially model-generated text on semantic-invariant tasks.
  • methods: Introduces a more extensive and comprehensive dataset than previous work, covering semantic-invariant tasks (summarization, translation, paraphrasing), and further instruction fine-tunes Tk-instruct to build a stronger detection system.
  • results: The proposed detector outperforms the previous state-of-the-art RoBERTa-based detector, and detecting model-generated text on semantic-invariant tasks is shown to be harder.
    Abstract ChatGPT has gained significant interest due to its impressive performance, but people are increasingly concerned about its potential risks, particularly around the detection of AI-generated content (AIGC), which is often difficult for untrained humans to identify. Current datasets utilized for detecting ChatGPT-generated text primarily center around question answering, yet they tend to disregard tasks with semantic-invariant properties, such as summarization, translation, and paraphrasing. Our primary studies demonstrate that detecting model-generated text on semantic-invariant tasks is more difficult. To fill this gap, we introduce a more extensive and comprehensive dataset that considers more types of tasks than previous work, including semantic-invariant tasks. In addition, the model shows strong performance after fine-tuning on a large number of task instructions. Owing to its previous success, we further instruction fine-tune Tk-instruct and build a more powerful detection system. Experimental results show that our proposed detector outperforms the previous state-of-the-art RoBERTa-based detector.

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

  • paper_url: http://arxiv.org/abs/2309.02730
  • repo_url: None
  • paper_authors: Hyungseob Lim, Kyungguen Byun, Sunkuk Moon, Erik Visser
  • for: To improve any-to-any voice conversion models so that they faithfully reproduce the target speaker's speaking style, without requiring text transcriptions or speaker labels.
  • methods: An attention mechanism built on a self-supervised learning (SSL) model collects the target speaker's speaking styles for different phonetic content and represents them as a set of embeddings called a stylebook; the stylebook is attended with the source speech's phonetic content to determine the final target style for each source segment, and a diffusion-based decoder generates the converted mel-spectrogram.
  • results: Combined with a diffusion-based generative model, the proposed method achieves better speaker similarity in any-to-any voice conversion than baseline models, while suppressing the growth in computational complexity for longer utterances.
    Abstract While many recent any-to-any voice conversion models succeed in transferring some target speech's style information to the converted speech, they still lack the ability to faithfully reproduce the speaking style of the target speaker. In this work, we propose a novel method to extract rich style information from target utterances and to efficiently transfer it to source speech content without requiring text transcriptions or speaker labeling. Our proposed approach introduces an attention mechanism utilizing a self-supervised learning (SSL) model to collect the speaking styles of a target speaker each corresponding to the different phonetic content. The styles are represented with a set of embeddings called stylebook. In the next step, the stylebook is attended with the source speech's phonetic content to determine the final target style for each source content. Finally, content information extracted from the source speech and content-dependent target style embeddings are fed into a diffusion-based decoder to generate the converted speech mel-spectrogram. Experiment results show that our proposed method combined with a diffusion-based generative model can achieve better speaker similarity in any-to-any voice conversion tasks when compared to baseline models, while the increase in computational complexity with longer utterances is suppressed.
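A minimal sketch of attending source content over a stylebook: queries come from the source speech's phonetic-content features (e.g. SSL units), and keys/values are the target speaker's style embeddings, yielding a content-dependent target style per frame. The multi-head attention module and the dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StylebookAttention(nn.Module):
    """Attend source-content features over a target speaker's stylebook to
    obtain a content-dependent style embedding per frame (illustrative).
    """
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, content, stylebook):
        # content:   (B, T, d) source phonetic-content features (e.g. SSL units)
        # stylebook: (B, S, d) target speaker's style embeddings
        style_per_frame, _ = self.attn(query=content, key=stylebook,
                                       value=stylebook)
        return style_per_frame          # (B, T, d) target style per frame

m = StylebookAttention(d=64)
out = m(torch.randn(2, 100, 64), torch.randn(2, 16, 64))
print(out.shape)   # torch.Size([2, 100, 64])
```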

Large Language Models for Automated Open-domain Scientific Hypotheses Discovery

  • paper_url: http://arxiv.org/abs/2309.02726
  • repo_url: https://github.com/zongliny/moose
  • paper_authors: Zonglin Yang, Xinya Du, Junxian Li, Jie Zheng, Soujanya Poria, Erik Cambria
  • for: To build a system that automatically generates valid, novel, and helpful social-science hypotheses for human researchers, using only raw web corpora as observations.
  • methods: A multi-module framework with three different feedback mechanisms that empirically improve performance over the base framework.
  • results: The framework achieves high performance under both GPT-4-based evaluation and evaluation by social-science experts.
    Abstract Hypothetical induction is recognized as the main reasoning type when scientists make observations about the world and try to propose hypotheses to explain those observations. Past research on hypothetical induction has been conducted in a limited setting, in that (1) the observation annotations of the dataset are not raw web corpora but manually selected sentences (resulting in a closed-domain setting); and (2) the ground-truth hypothesis annotations are mostly commonsense knowledge, making the task less challenging. In this work, we propose the first NLP dataset for social science academic hypothesis discovery, consisting of 50 recent papers published in top social science journals. Raw web corpora that are necessary for developing hypotheses in the published papers are also collected in the dataset, with the final goal of creating a system that automatically generates valid, novel, and helpful (to human researchers) hypotheses, given only a pile of raw web corpora. The new dataset can tackle the previous problems because it requires the system to (1) use raw web corpora as observations; and (2) propose hypotheses that are new even to humanity. A multi-module framework is developed for the task, as well as three different feedback mechanisms that empirically show performance gains over the base framework. Finally, our framework exhibits high performance in terms of both GPT-4-based evaluation and social science expert evaluation.

Offensive Hebrew Corpus and Detection using BERT

  • paper_url: http://arxiv.org/abs/2309.02724
  • repo_url: https://github.com/sinalab/offensivehebrew
  • paper_authors: Nagham Hamad, Mustafa Jarrar, Mohammad Khalilia, Nadim Nashif
  • for: Offensive language detection in Hebrew, a low-resource language.
  • methods: A new offensive Hebrew corpus of 15,881 tweets, each labeled with one or more of five classes (abusive, hate, violence, pornographic, or none offensive) by Arabic-Hebrew bilingual annotators familiar with Israeli culture, politics, and practices; two Hebrew BERT models, HeBERT and AlephBERT, are fine-tuned on this dataset and on another published dataset (D_OLaH).
  • results: The new data boosts HeBERT performance by 2% when combined with D_OLaH; fine-tuning AlephBERT on the new data and testing on D_OLaH yields 69% accuracy, while the reverse yields 57%, which may indicate the generalizability the data offers.
    Abstract Offensive language detection has been well studied in many languages, but it is lagging behind in low-resource languages, such as Hebrew. In this paper, we present a new offensive language corpus in Hebrew. A total of 15,881 tweets were retrieved from Twitter. Each was labeled with one or more of five classes (abusive, hate, violence, pornographic, or none offensive) by Arabic-Hebrew bilingual speakers. The annotation process was challenging as each annotator is expected to be familiar with the Israeli culture, politics, and practices to understand the context of each tweet. We fine-tuned two Hebrew BERT models, HeBERT and AlephBERT, using our proposed dataset and another published dataset. We observed that our data boosts HeBERT performance by 2% when combined with D_OLaH. Fine-tuning AlephBERT on our data and testing on D_OLaH yields 69% accuracy, while fine-tuning on D_OLaH and testing on our data yields 57% accuracy, which may be an indication to the generalizability our data offers. Our dataset and fine-tuned models are available on GitHub and Huggingface.
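A minimal multi-label fine-tuning setup for the five offence classes. The checkpoint name "avichr/heBERT" is the public HeBERT model on Hugging Face (AlephBERT can be swapped in the same way); the example text and label vector are placeholders, and dataset loading and the training loop are omitted.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["abusive", "hate", "violence", "pornographic", "none"]
tok = AutoTokenizer.from_pretrained("avichr/heBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "avichr/heBERT",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # tweets can carry several labels
)

batch = tok(["טקסט לדוגמה"], return_tensors="pt", truncation=True)
targets = torch.tensor([[0., 1., 0., 0., 0.]])  # e.g. labelled as "hate"
out = model(**batch, labels=targets)            # BCE-with-logits loss
out.loss.backward()                             # one illustrative training step
```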

SlAction: Non-intrusive, Lightweight Obstructive Sleep Apnea Detection using Infrared Video

  • paper_url: http://arxiv.org/abs/2309.02713
  • repo_url: None
  • paper_authors: You Rim Choi, Gyeongseon Eo, Wonhyuck Youn, Hyojin Lee, Haemin Jang, Dongyoon Kim, Hyunwoo Shin, Hyung-Sin Kim
  • for: Detection of obstructive sleep apnea (OSA) in daily sleep environments, enabling early detection and personalized treatment.
  • methods: A non-intrusive video-based system using infrared recordings of the sleep environment; sliding-window analysis with a low frame rate (2.5 FPS), a large window (60 seconds), and a large step (30 seconds) captures the slow, long-term motions associated with respiratory events, and a lightweight deep neural network processes all video streams locally, preserving privacy.
  • results: SlAction achieves an average F1 score of 87.6% in detecting OSA across various environments, with real-time inference (~3 seconds for a 60-second clip) on an NVIDIA Jetson Nano.
    Abstract Obstructive sleep apnea (OSA) is a prevalent sleep disorder affecting approximately one billion people world-wide. The current gold standard for diagnosing OSA, Polysomnography (PSG), involves an overnight hospital stay with multiple attached sensors, leading to potential inaccuracies due to the first-night effect. To address this, we present SlAction, a non-intrusive OSA detection system for daily sleep environments using infrared videos. Recognizing that sleep videos exhibit minimal motion, this work investigates the fundamental question: "Are respiratory events adequately reflected in human motions during sleep?" Analyzing the largest sleep video dataset of 5,098 hours, we establish correlations between OSA events and human motions during sleep. Our approach uses a low frame rate (2.5 FPS), a large size (60 seconds) and step (30 seconds) for sliding window analysis to capture slow and long-term motions related to OSA. Furthermore, we utilize a lightweight deep neural network for resource-constrained devices, ensuring all video streams are processed locally without compromising privacy. Evaluations show that SlAction achieves an average F1 score of 87.6% in detecting OSA across various environments. Implementing SlAction on NVIDIA Jetson Nano enables real-time inference (~3 seconds for a 60-second video clip), highlighting its potential for early detection and personalized treatment of OSA.
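The sliding-window analysis is easy to pin down from the numbers in the abstract: 2.5 FPS, 60-second windows, 30-second step. A minimal sketch (the classifier itself is omitted):

```python
import numpy as np

def sliding_windows(video_frames, fps=2.5, window_s=60, step_s=30):
    """Yield sliding windows over a low-frame-rate infrared video, mirroring
    the paper's settings: 2.5 FPS, 60-second windows, 30-second step.
    Each window (150 frames) would be fed to the lightweight classifier.
    """
    win = int(window_s * fps)      # 150 frames per window
    step = int(step_s * fps)       # 75-frame hop
    for start in range(0, len(video_frames) - win + 1, step):
        yield video_frames[start:start + win]

frames = np.zeros((int(10 * 60 * 2.5), 120, 160))   # 10 minutes of dummy frames
print(sum(1 for _ in sliding_windows(frames)))      # number of windows: 19
```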

Unveiling the frontiers of deep learning: innovations shaping diverse domains

  • paper_url: http://arxiv.org/abs/2309.02712
  • repo_url: None
  • paper_authors: Shams Forruque Ahmed, Md. Sakib Bin Alam, Maliha Kabir, Shaila Afrin, Sabiha Jannat Rafa, Aanushka Mehjabin, Amir H. Gandomi
  • for: To survey the applications and challenges of deep learning across all major fields of study.
  • methods: Reviews deep learning models for prediction and analysis, including their capacity for self-optimization and adaptation to data.
  • results: Deep learning delivers accurate prediction and analysis across domains, but requires massive amounts of data for effective processing and analysis.
    Abstract Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate the latest developments and applications of deep learning in these disciplines. However, the literature is lacking in exploring the applications of deep learning in all potential sectors. This paper thus extensively investigates the potential applications of deep learning across all major fields of study as well as the associated benefits and challenges. As evidenced in the literature, DL exhibits accuracy in prediction and analysis, makes it a powerful computational tool, and has the ability to articulate itself and optimize, making it effective in processing data with no prior training. Given its independence from training data, deep learning necessitates massive amounts of data for effective analysis and processing, much like data volume. To handle the challenge of compiling huge amounts of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures like LSTMs and GRUs can be utilized. For multimodal learning, shared neurons in the neural network for all activities and specialized neurons for particular tasks are necessary.

Addressing Imperfect Symmetry: a Novel Symmetry-Learning Actor-Critic Extension

  • paper_url: http://arxiv.org/abs/2309.02711
  • repo_url: https://github.com/m-abr/Adaptive-Symmetry-Learning
  • paper_authors: Miguel Abreu, Luis Paulo Reis, Nuno Lau
  • for: To capture, via reinforcement learning, the brain's ability to exploit symmetry in tasks despite human deviations from perfect symmetry (e.g., having a dominant hand).
  • methods: Proposes Adaptive Symmetry Learning (ASL), a model-minimization actor-critic extension that handles incomplete or inexact symmetry descriptions by adapting during learning; ASL consists of a symmetry-fitting component and a modular loss function that enforces a common symmetric relation across all states while adapting to the learned policy.
  • results: In a case study on a four-legged ant model performing multidirectional locomotion, ASL recovers from large perturbations and generalizes knowledge to hidden symmetric states, matching or outperforming existing symmetry-enhanced methods in most scenarios.
    Abstract Symmetry, a fundamental concept to understand our environment, often oversimplifies reality from a mathematical perspective. Humans are a prime example, deviating from perfect symmetry in terms of appearance and cognitive biases (e.g. having a dominant hand). Nevertheless, our brain can easily overcome these imperfections and efficiently adapt to symmetrical tasks. The driving motivation behind this work lies in capturing this ability through reinforcement learning. To this end, we introduce Adaptive Symmetry Learning (ASL) $\unicode{x2013}$ a model-minimization actor-critic extension that addresses incomplete or inexact symmetry descriptions by adapting itself during the learning process. ASL consists of a symmetry fitting component and a modular loss function that enforces a common symmetric relation across all states while adapting to the learned policy. The performance of ASL is compared to existing symmetry-enhanced methods in a case study involving a four-legged ant model for multidirectional locomotion tasks. The results demonstrate that ASL is capable of recovering from large perturbations and generalizing knowledge to hidden symmetric states. It achieves comparable or better performance than alternative methods in most scenarios, making it a valuable approach for leveraging model symmetry while compensating for inherent perturbations.

Certifying LLM Safety against Adversarial Prompting

  • paper_url: http://arxiv.org/abs/2309.02705
  • repo_url: None
  • paper_authors: Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Soheil Feizi, Hima Lakkaraju
  • for: To prevent language models from producing harmful content by certifying the safety of input prompts.
  • methods: The erase-and-check framework erases tokens from a prompt one by one and inspects the resulting subsequences with a safety filter; the input is labeled harmful if any subsequence (or the prompt itself) is flagged, which guarantees that any adversarial modification of a harmful prompt up to a certain size is also labeled harmful.
  • results: Against the adversarial suffix, insertion, and infusion attack modes, erase-and-check provides verifiable safety guarantees while maintaining good performance on safe prompts; for example, against adversarial suffixes of length 20 it certifiably detects 93% of harmful prompts and labels 94% of safe prompts as safe, using Llama 2 as the safety filter.
    Abstract Large language models (LLMs) released for public use incorporate guardrails to ensure their output is safe, often referred to as "model alignment." An aligned language model should decline a user's request to produce harmful content. However, such safety measures are vulnerable to adversarial prompts, which contain maliciously designed token sequences to circumvent the model's safety guards and cause it to produce harmful content. In this work, we introduce erase-and-check, the first framework to defend against adversarial prompts with verifiable safety guarantees. We erase tokens individually and inspect the resulting subsequences using a safety filter. Our procedure labels the input prompt as harmful if any subsequences or the input prompt are detected as harmful by the filter. This guarantees that any adversarial modification of a harmful prompt up to a certain size is also labeled harmful. We defend against three attack modes: i) adversarial suffix, which appends an adversarial sequence at the end of the prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block. Empirical results demonstrate that our technique obtains strong certified safety guarantees on harmful prompts while maintaining good performance on safe prompts. For example, against adversarial suffixes of length 20, it certifiably detects 93% of the harmful prompts and labels 94% of the safe prompts as safe using the open source language model Llama 2 as the safety filter.
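The suffix mode of erase-and-check is simple enough to sketch directly: erase up to `max_erase` trailing tokens one at a time and flag the prompt if the safety filter flags any resulting prefix (or the prompt itself). The toy keyword filter below is a stand-in for an LLM-based safety filter such as Llama 2 used in the paper.

```python
def erase_and_check(prompt_tokens, is_harmful, max_erase=20):
    """Certified defence against adversarial suffixes (suffix mode).

    Erase up to `max_erase` trailing tokens one at a time and run the safety
    filter on each prefix. If any subsequence (or the full prompt) is flagged,
    the prompt is labelled harmful, so any suffix attack up to that length on
    a harmful prompt is also caught. Insertion and infusion modes check more
    subsequences.
    """
    if is_harmful(prompt_tokens):
        return True
    for i in range(1, min(max_erase, len(prompt_tokens)) + 1):
        if is_harmful(prompt_tokens[:-i]):   # prompt with last i tokens erased
            return True
    return False

# Toy keyword filter standing in for an LLM-based safety filter.
harmful_words = {"attack"}
is_harmful = lambda toks: bool(harmful_words & set(toks))
print(erase_and_check("how to attack a server xx yy".split(), is_harmful))  # True
```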

Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2309.02685
  • repo_url: https://github.com/tomato1mule/diffusion_edf
  • paper_authors: Hyunwoo Ryu, Jiwoo Kim, Junwoo Chang, Hyun Seok Ahn, Joohwan Seo, Taehan Kim, Yubin Kim, Jongeun Choi, Roberto Horowitz
  • for: To improve data efficiency, generalizability, and robustness in robot learning by integrating spatial roto-translation equivariance (SE(3)-equivariance) into diffusion generative modeling.
  • methods: Proposes Diffusion-EDFs, which builds SE(3)-equivariance into the model architecture for denoising diffusion-based manipulation learning from demonstrations.
  • results: The method is remarkably data-efficient, requiring only 5 to 10 task demonstrations for effective end-to-end training, and shows superior generalizability compared to previous diffusion-based manipulation methods.
    Abstract Recent studies have verified that equivariant methods can significantly improve the data efficiency, generalizability, and robustness in robot learning. Meanwhile, denoising diffusion-based generative modeling has recently gained significant attention as a promising approach for robotic manipulation learning from demonstrations with stochastic behaviors. In this paper, we present Diffusion-EDFs, a novel approach that incorporates spatial roto-translation equivariance, i.e., SE(3)-equivariance to diffusion generative modeling. By integrating SE(3)-equivariance into our model architectures, we demonstrate that our proposed method exhibits remarkable data efficiency, requiring only 5 to 10 task demonstrations for effective end-to-end training. Furthermore, our approach showcases superior generalizability compared to previous diffusion-based manipulation methods.

Spatio-Temporal Contrastive Self-Supervised Learning for POI-level Crowd Flow Inference

  • paper_url: http://arxiv.org/abs/2309.03239
  • repo_url: None
  • paper_authors: Songyu Ke, Ting Li, Li Song, Yanping Sun, Qintian Sun, Junbo Zhang, Yu Zheng
  • for: Accurate inference of crowd flow at Points of Interest (POIs), to support traffic management, public services, and urban planning.
  • methods: Recasts crowd flow inference as self-supervised attributed graph representation learning and introduces a Contrastive Self-learning framework for Spatio-Temporal data (CSST): a spatial adjacency graph is built from the POIs and their distances, and a swapped prediction approach anticipates the representation of the target subgraph from similar instances, exploiting large volumes of unlabeled spatio-temporal data before fine-tuning on accurate crowd flow data.
  • results: Experiments on two real-world datasets show that CSST pre-trained on extensive noisy data consistently outperforms models trained from scratch.
    Abstract Accurate acquisition of crowd flow at Points of Interest (POIs) is pivotal for effective traffic management, public service, and urban planning. Despite this importance, due to the limitations of urban sensing techniques, the data quality from most sources is inadequate for monitoring crowd flow at each POI. This renders the inference of accurate crowd flow from low-quality data a critical and challenging task. The complexity is heightened by three key factors: 1) The scarcity and rarity of labeled data, 2) The intricate spatio-temporal dependencies among POIs, and 3) The myriad correlations between precise crowd flow and GPS reports. To address these challenges, we recast the crowd flow inference problem as a self-supervised attributed graph representation learning task and introduce a novel Contrastive Self-learning framework for Spatio-Temporal data (CSST). Our approach initiates with the construction of a spatial adjacency graph founded on the POIs and their respective distances. We then employ a contrastive learning technique to exploit large volumes of unlabeled spatio-temporal data. We adopt a swapped prediction approach to anticipate the representation of the target subgraph from similar instances. Following the pre-training phase, the model is fine-tuned with accurate crowd flow data. Our experiments, conducted on two real-world datasets, demonstrate that the CSST pre-trained on extensive noisy data consistently outperforms models trained from scratch.
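A small sketch of the first step, building the spatial adjacency graph over POIs from their pairwise distances. The planar coordinates, Euclidean distance, and radius threshold are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def poi_adjacency(coords, radius_km=1.0):
    """Build a spatial adjacency graph over POIs from pairwise distances:
    two POIs are connected if they lie within `radius_km` of each other.
    Simplified sketch using planar coordinates.
    """
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 2) offsets
    dist = np.linalg.norm(diff, axis=-1)             # pairwise distances
    adj = (dist <= radius_km).astype(float)
    np.fill_diagonal(adj, 0.0)                       # no self-loops
    return adj

coords = np.random.default_rng(0).uniform(0, 5, size=(10, 2))  # 10 POIs (km)
print(poi_adjacency(coords).sum(axis=1))             # degree of each POI
```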

RLSynC: Offline-Online Reinforcement Learning for Synthon Completion

  • paper_url: http://arxiv.org/abs/2309.02671
  • repo_url: None
  • paper_authors: Frazier N. Baker, Ziqi Chen, Xia Ning
  • for: A new retrosynthesis approach, RLSynC, for synthon completion in semi-template-based methods.
  • methods: An offline-online reinforcement learning method that assigns one agent to each synthon, all of which complete their synthons step by step in a synchronized fashion; the policy is learned from both offline training episodes and online interactions that explore new reaction spaces, and a forward synthesis model evaluates the likelihood that the predicted reactants synthesize the product, guiding the action search.
  • results: Compared with state-of-the-art retrosynthesis methods, RLSynC improves synthon completion by up to 14.9% and retrosynthesis by up to 14.0%, highlighting its potential in synthesis planning.
    Abstract Retrosynthesis is the process of determining the set of reactant molecules that can react to form a desired product. Semi-template-based retrosynthesis methods, which imitate the reverse logic of synthesis reactions, first predict the reaction centers in the products, and then complete the resulting synthons back into reactants. These methods enable necessary interpretability and high practical utility to inform synthesis planning. We develop a new offline-online reinforcement learning method RLSynC for synthon completion in semi-template-based methods. RLSynC assigns one agent to each synthon, all of which complete the synthons by conducting actions step by step in a synchronized fashion. RLSynC learns the policy from both offline training episodes and online interactions which allow RLSynC to explore new reaction spaces. RLSynC uses a forward synthesis model to evaluate the likelihood of the predicted reactants in synthesizing a product, and thus guides the action search. We compare RLSynC with the state-of-the-art retrosynthesis methods. Our experimental results demonstrate that RLSynC can outperform these methods with improvement as high as 14.9% on synthon completion, and 14.0% on retrosynthesis, highlighting its potential in synthesis planning.
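To make the synchronized multi-agent idea concrete, here is a toy loop: one agent per synthon, all agents act in lockstep, and a forward-model score picks the best joint action. The action space and the scoring heuristic are placeholders, not real chemistry or the paper's actual models.

```python
import random

random.seed(0)
ACTIONS = ["C", "O", "N", ""]  # toy actions: append an atom symbol, or "" = stop

def forward_score(reactants, product):
    """Stand-in for a forward synthesis model scoring P(product | reactants)."""
    return -abs(sum(len(r) for r in reactants) - len(product))

def complete_synthons(synthons, product, steps=4, samples=16):
    states = list(synthons)
    for _ in range(steps):
        # Synchronized step: sample joint actions (one per agent/synthon)
        # and keep the joint action the forward model scores highest.
        candidates = [[s + random.choice(ACTIONS) for s in states] for _ in range(samples)]
        states = max(candidates, key=lambda rs: forward_score(rs, product))
    return states

print(complete_synthons(["CC", "O"], product="CCOCC"))
```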

Subsethood Measures of Spatial Granules

  • paper_url: http://arxiv.org/abs/2309.02662
  • repo_url: None
  • paper_authors: Liquan Zhao, Yiyu Yao
  • for: This paper focuses on characterizing the knowledge spaces and knowledge structures of complex information systems, describing them through rough set theory and spatial rough granule theory.
  • methods: The paper uses rough set theory and spatial rough granule theory to describe knowledge spaces and knowledge structures, and develops measures based on conditional granularity and conditional fineness.
  • results: The paper introduces twelve axioms of monotone increasing subsethood and twelve corresponding axioms of monotone decreasing supsethood, together with five conditional granularity measures and five conditional fineness measures. Each measure satisfies its corresponding twelve axioms, although its subsethood or supsethood measure holds only one of the two boundary conditions. The paper further defines five conditional granularity entropies and five conditional fineness entropies.
    Abstract Subsethood, which is to measure the degree of set inclusion relation, is predominant in fuzzy set theory. This paper introduces some basic concepts of spatial granules, coarse-fine relation, and operations like meet, join, quotient meet and quotient join. All the atomic granules can be hierarchized by set-inclusion relation and all the granules can be hierarchized by coarse-fine relation. Viewing an information system from the micro and the macro perspectives, we can get a micro knowledge space and a macro knowledge space, from which a rough set model and a spatial rough granule model are respectively obtained. The classical rough set model is the special case of the rough set model induced from the micro knowledge space, while the spatial rough granule model will play a pivotal role in the problem-solving of structures. We discuss twelve axioms of monotone increasing subsethood and twelve corresponding axioms of monotone decreasing supsethood, and generalize subsethood and supsethood to conditional granularity and conditional fineness respectively. We develop five conditional granularity measures and five conditional fineness measures and prove that each conditional granularity or fineness measure satisfies its corresponding twelve axioms although its subsethood or supsethood measure only holds one of the two boundary conditions. We further define five conditional granularity entropies and five conditional fineness entropies respectively, and each entropy only satisfies part of the boundary conditions but all the ten monotone conditions.
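For orientation, the classical subsethood measure for finite crisp sets, which measures of this family generalize, can be written as below; this is a textbook example, not one of the paper's five conditional measures.

```latex
% Classical subsethood of A in B (finite, non-empty A):
\mathrm{sh}(A, B) \;=\; \frac{|A \cap B|}{|A|},
\qquad
\mathrm{sh}(A, B) = 1 \iff A \subseteq B .
```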

TFBEST: Dual-Aspect Transformer with Learnable Positional Encoding for Failure Prediction

  • paper_url: http://arxiv.org/abs/2309.02641
  • repo_url: None
  • paper_authors: Rohan Mohapatra, Saptarshi Sengupta
  • for: Predicting hard disk drive failures to avoid data loss and reputational damage.
  • methods: Uses Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) logs and a novel transformer architecture, the Temporal-fusion Bi-encoder Self-attention Transformer (TFBEST), for prediction.
  • results: Achieves higher accuracy than state-of-the-art RUL prediction methods and provides a novel confidence margin statistic to help manufacturers replace a hard drive within a suitable time frame.
    Abstract Hard Disk Drive (HDD) failures in datacenters are costly - from catastrophic data loss to a question of goodwill, stakeholders want to avoid it like the plague. An important tool in proactively monitoring against HDD failure is timely estimation of the Remaining Useful Life (RUL). To this end, the Self-Monitoring, Analysis and Reporting Technology employed within HDDs (S.M.A.R.T.) provide critical logs for long-term maintenance of the security and dependability of these essential data storage devices. Data-driven predictive models in the past have used these S.M.A.R.T. logs and CNN/RNN based architectures heavily. However, they have suffered significantly in providing a confidence interval around the predicted RUL values as well as in processing very long sequences of logs. In addition, some of these approaches, such as those based on LSTMs, are inherently slow to train and have tedious feature engineering overheads. To overcome these challenges, in this work we propose a novel transformer architecture - a Temporal-fusion Bi-encoder Self-attention Transformer (TFBEST) for predicting failures in hard-drives. It is an encoder-decoder based deep learning technique that enhances the context gained from understanding health statistics sequences and predicts a sequence of the number of days remaining before a disk potentially fails. In this paper, we also provide a novel confidence margin statistic that can help manufacturers replace a hard-drive within a time frame. Experiments on Seagate HDD data show that our method significantly outperforms the state-of-the-art RUL prediction methods during testing over the exhaustive 10-year data from Backblaze (2013-present). Although validated on HDD failure prediction, the TFBEST architecture is well-suited for other prognostics applications and may be adapted for allied regression problems.
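The "learnable positional encoding" in the title typically refers to a trainable per-position embedding added to the input sequence; a minimal sketch is below. Dimensions and window length are illustrative, and this is not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LearnablePositionalEncoding(nn.Module):
    def __init__(self, max_len: int, d_model: int):
        super().__init__()
        # One trainable vector per position, instead of fixed sinusoids.
        self.pos = nn.Parameter(torch.zeros(max_len, d_model))
        nn.init.normal_(self.pos, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), e.g. a window of S.M.A.R.T. features
        return x + self.pos[: x.size(1)]

x = torch.randn(8, 30, 64)      # 8 drives, 30 days of logs, 64 features
enc = LearnablePositionalEncoding(max_len=128, d_model=64)
print(enc(x).shape)             # torch.Size([8, 30, 64])
```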

Deep Reinforcement Learning from Hierarchical Weak Preference Feedback

  • paper_url: http://arxiv.org/abs/2309.02632
  • repo_url: https://github.com/abukharin3/heron
  • paper_authors: Alexander Bukharin, Yixiao Li, Pengcheng He, Weizhu Chen, Tuo Zhao
  • for: Practical reinforcement learning (RL) tasks, specifically addressing the challenges of reward engineering and the limitations of reinforcement learning from human feedback (RLHF).
  • methods: Proposes a new RL framework called HERON, which uses a hierarchical decision tree induced by a ranking of reward factors to compare trajectories and train a preference-based reward model, learning complex rewards that are well aligned with human preferences.
  • results: The proposed HERON framework can train high-performing agents on a variety of difficult tasks and provides additional benefits such as improved sample efficiency and robustness; a publicly available implementation is at https://github.com/abukharin3/HERON.
    Abstract Reward design is a fundamental, yet challenging aspect of practical reinforcement learning (RL). For simple tasks, researchers typically handcraft the reward function, e.g., using a linear combination of several reward factors. However, such reward engineering is subject to approximation bias, incurs large tuning cost, and often cannot provide the granularity required for complex tasks. To avoid these difficulties, researchers have turned to reinforcement learning from human feedback (RLHF), which learns a reward function from human preferences between pairs of trajectory sequences. By leveraging preference-based reward modeling, RLHF learns complex rewards that are well aligned with human preferences, allowing RL to tackle increasingly difficult problems. Unfortunately, the applicability of RLHF is limited due to the high cost and difficulty of obtaining human preference data. In light of this cost, we investigate learning reward functions for complex tasks with less human effort; simply by ranking the importance of the reward factors. More specifically, we propose a new RL framework -- HERON, which compares trajectories using a hierarchical decision tree induced by the given ranking. These comparisons are used to train a preference-based reward model, which is then used for policy learning. We find that our framework can not only train high performing agents on a variety of difficult tasks, but also provide additional benefits such as improved sample efficiency and robustness. Our code is available at https://github.com/abukharin3/HERON.
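The hierarchical-decision-tree comparison can be sketched as below: compare two trajectories factor by factor in the human-given importance order, descending to the next factor on ties. The factor names, tie tolerance, and the convention that higher is better for every factor are our assumptions for illustration.

```python
RANKED_FACTORS = ["task_success", "safety", "energy_cost"]  # most to least important

def heron_compare(traj_a: dict, traj_b: dict, tol: float = 0.05) -> int:
    """Return 1 if traj_a is preferred, -1 if traj_b, 0 if indistinguishable."""
    for factor in RANKED_FACTORS:
        a, b = traj_a[factor], traj_b[factor]
        if abs(a - b) > tol:            # decisive at this level of the tree
            return 1 if a > b else -1
        # tie on this factor: descend to the next-ranked factor
    return 0

a = {"task_success": 0.90, "safety": 0.5, "energy_cost": 0.2}
b = {"task_success": 0.88, "safety": 0.8, "energy_cost": 0.9}
print(heron_compare(a, b))  # tie on task_success -> decided by safety -> -1
```

Preference labels produced this way can then train a standard preference-based reward model for policy learning.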

cs.CL - 2023-09-06

RoDia: A New Dataset for Romanian Dialect Identification from Speech

  • paper_url: http://arxiv.org/abs/2309.03378
  • repo_url: https://github.com/codrut2/rodia
  • paper_authors: Codrut Rotaru, Nicolae-Catalin Ristea, Radu Tudor Ionescu
  • for: To provide a dataset for Romanian dialect identification from speech and to stimulate further research on this task.
  • methods: Introduces a set of competitive baseline models for Romanian dialect identification, including a model based on speech features and one based on text features.
  • results: The top-scoring model reaches a macro F1 score of 59.83% and a micro F1 score of 62.08%, indicating that the task is challenging.
    Abstract Dialect identification is a critical task in speech processing and language technology, enhancing various applications such as speech recognition, speaker verification, and many others. While most research studies have been dedicated to dialect identification in widely spoken languages, limited attention has been given to dialect identification in low-resource languages, such as Romanian. To address this research gap, we introduce RoDia, the first dataset for Romanian dialect identification from speech. The RoDia dataset includes a varied compilation of speech samples from five distinct regions of Romania, covering both urban and rural environments, totaling 2 hours of manually annotated speech data. Along with our dataset, we introduce a set of competitive models to be used as baselines for future research. The top scoring model achieves a macro F1 score of 59.83% and a micro F1 score of 62.08%, indicating that the task is challenging. We thus believe that RoDia is a valuable resource that will stimulate research aiming to address the challenges of Romanian dialect identification. We publicly release our dataset and code at https://github.com/codrut2/RoDia.

Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation

  • paper_url: http://arxiv.org/abs/2309.03340
  • repo_url: None
  • paper_authors: Arvind Krishna Sridhar, Yinyi Guo, Erik Visser, Rehana Mahfuz
  • for: Addressing overparameterization and the large memory footprint of models for automated audio captioning.
  • methods: Proposes a data augmentation technique for generating hallucinated audio captions, uses an audio-text shared latent space to detect hallucination, and introduces a parameter-efficient inference-time faithful decoding algorithm that reduces model size and computational complexity.
  • results: On benchmark datasets, the method matches the performance of larger models while requiring a smaller memory footprint and lower computational complexity.
    Abstract There has been significant research on developing pretrained transformer architectures for multimodal-to-text generation tasks. Albeit performance improvements, such models are frequently overparameterized, hence suffer from hallucination and large memory footprint making them challenging to deploy on edge devices. In this paper, we address both these issues for the application of automated audio captioning. First, we propose a data augmentation technique for generating hallucinated audio captions and show that similarity based on an audio-text shared latent space is suitable for detecting hallucination. Then, we propose a parameter efficient inference time faithful decoding algorithm that enables smaller audio captioning models with performance equivalent to larger models trained with more data. During the beam decoding step, the smaller model utilizes an audio-text shared latent representation to semantically align the generated text with corresponding input audio. Faithful guidance is introduced into the beam probability by incorporating the cosine similarity between latent representation projections of greedy rolled out intermediate beams and audio clip. We show the efficacy of our algorithm on benchmark datasets and evaluate the proposed scheme against baselines using conventional audio captioning and semantic similarity metrics while illustrating tradeoffs between performance and complexity.
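The faithful-guidance step can be sketched as rescoring each beam by blending its language-model score with the cosine similarity between the audio embedding and the embedding of the rolled-out partial caption, both in the shared latent space. The embedding functions and the mixing weight alpha below are stand-ins, not the paper's components.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
audio_emb = F.normalize(torch.randn(256), dim=0)        # shared-space audio embedding

def text_emb(caption: str) -> torch.Tensor:
    """Stand-in for projecting a (rolled-out) caption into the shared space."""
    g = torch.Generator().manual_seed(hash(caption) % (2**31))
    return F.normalize(torch.randn(256, generator=g), dim=0)

def faithful_score(lm_logprob: float, caption: str, alpha: float = 0.3) -> float:
    sim = float(torch.dot(text_emb(caption), audio_emb))  # cosine similarity
    return (1 - alpha) * lm_logprob + alpha * sim

beams = [("a dog barks near a road", -4.1), ("music plays softly", -3.9)]
reranked = sorted(beams, key=lambda b: faithful_score(b[1], b[0]), reverse=True)
print(reranked[0][0])
```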

Gender-specific Machine Translation with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.03175
  • repo_url: None
  • paper_authors: Eduardo Sánchez, Pierre Andrews, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà
  • for: investigate the use of Decoder-only Large Language Models (LLMs) for gender-specific translations
  • methods: use LLaMa, a decoder-only LLM, to generate gender-specific translations and compare its performance to a state-of-the-art multilingual NMT system (NLLB)
  • results: LLaMa can generate gender-specific translations with competitive accuracy and mitigate gender bias, and its translations are robust in gender-ambiguous datasets but less consistent in less ambiguous contexts.
    Abstract Decoder-only Large Language Models (LLMs) have demonstrated potential in machine translation (MT), albeit with performance slightly lagging behind traditional encoder-decoder Neural Machine Translation (NMT) systems. However, LLMs offer a unique advantage: the ability to control the properties of the output through prompts. In this study, we harness this flexibility to explore LLaMa's capability to produce gender-specific translations for languages with grammatical gender. Our results indicate that LLaMa can generate gender-specific translations with competitive accuracy and gender bias mitigation when compared to NLLB, a state-of-the-art multilingual NMT system. Furthermore, our experiments reveal that LLaMa's translations are robust, showing significant performance drops when evaluated against opposite-gender references in gender-ambiguous datasets but maintaining consistency in less ambiguous contexts. This research provides insights into the potential and challenges of using LLMs for gender-specific translations and highlights the importance of in-context learning to elicit new tasks in LLMs.
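A gender-specific translation can be elicited from a decoder-only model with a few-shot prompt of the following shape. The abstract does not give the exact prompt used with LLaMa, so the wording and the Spanish example below are our own illustration.

```python
def gender_prompt(sentence: str, gender: str, tgt_lang: str = "Spanish") -> str:
    example = "Estoy cansada." if gender == "female" else "Estoy cansado."
    return (
        f"Translate to {tgt_lang}, assuming the speaker is {gender}.\n"
        "English: I am tired.\n"
        f"{tgt_lang} ({gender}): {example}\n"
        f"English: {sentence}\n"
        f"{tgt_lang} ({gender}):"
    )

print(gender_prompt("I am happy to be here.", "female"))
```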

GPT-InvestAR: Enhancing Stock Investment Strategies through Annual Report Analysis with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.03079
  • repo_url: https://github.com/UditGupta10/GPT-InvestAR
  • paper_authors: Udit Gupta
  • for: The paper aims to simplify the assessment of Annual Reports across all firms by leveraging Large Language Models (LLMs) to generate insights and improve stock price predictions.
  • methods: Uses LLMs to analyze Annual Reports and generate insights, which are compiled into a Quant-styled dataset and augmented with historical stock price data; a machine learning model is then trained with the LLM outputs as features to predict stock prices.
  • results: Walk-forward test results show promising outperformance relative to S&P 500 returns, indicating the effectiveness of the proposed framework.
    Abstract Annual Reports of publicly listed companies contain vital information about their financial health which can help assess the potential impact on Stock price of the firm. These reports are comprehensive in nature, going up to, and sometimes exceeding, 100 pages. Analysing these reports is cumbersome even for a single firm, let alone the whole universe of firms that exist. Over the years, financial experts have become proficient in extracting valuable information from these documents relatively quickly. However, this requires years of practice and experience. This paper aims to simplify the process of assessing Annual Reports of all the firms by leveraging the capabilities of Large Language Models (LLMs). The insights generated by the LLM are compiled in a Quant styled dataset and augmented by historical stock price data. A Machine Learning model is then trained with LLM outputs as features. The walkforward test results show promising outperformance wrt S&P500 returns. This paper intends to provide a framework for future work in this direction. To facilitate this, the code has been released as open source.

Narrative as a Dynamical System

  • paper_url: http://arxiv.org/abs/2309.06600
  • repo_url: None
  • paper_authors: Isidoros Doxas, James Meiss, Steven Bottone, Tom Strelich, Andrew Plummer, Adrienne Breland, Simon Dennis, Kathy Garvin-Doxas, Michael Klymkowsky
  • for: Explores the dynamical-system character of human activity and narrative, using concepts from physics to describe their evolution.
  • methods: Constructs three average paths by averaging about 500 different narratives and shows that these average paths are consistent with an action principle.
  • results: Finds that human activity and narrative can be treated as dynamical systems whose evolution is described by an action principle.
    Abstract There is increasing evidence that human activity in general, and narrative in particular, can be treated as a dynamical system in the physics sense; a system whose evolution is described by an action integral, such that the average of all possible paths from point A to point B is given by the extremum of the action. We create by construction three such paths by averaging about 500 different narratives, and we show that the average path is consistent with an action principle.
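For readers unfamiliar with the physics being invoked, the standard action principle says that a path q(t) from point A to point B extremizes the action integral:

```latex
S[q] \;=\; \int_{t_A}^{t_B} L\big(q(t), \dot{q}(t), t\big)\, dt,
\qquad \delta S = 0 .
```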

Everyone Deserves A Reward: Learning Customized Human Preferences

  • paper_url: http://arxiv.org/abs/2309.03126
  • repo_url: https://github.com/linear95/dsp
  • paper_authors: Pengyu Cheng, Jiawen Xie, Ke Bai, Yong Dai, Nan Du
  • for: Improving the alignment of large language models (LLMs) with human preferences, particularly customized or personalized preferences, to improve interaction quality.
  • methods: Proposes a three-stage customized reward model (RM) learning scheme and experiments with multiple training and data strategies for preserving general preference ability.
  • results: The three-stage customized RM adapts better to personalized application scenarios while retaining general preference ability; several training and data strategies, notably general preference enrichment and customized preference imitation learning, further help preserve this ability.
    Abstract Reward models (RMs) are essential for aligning large language models (LLMs) with human preferences to improve interaction quality. However, the real world is pluralistic, which leads to diversified human preferences with respect to different religions, politics, cultures, etc. Moreover, each individual can have their unique preferences on various topics. Neglecting the diversity of human preferences, current human feedback aligning methods only consider a general reward model, which is below satisfaction for customized or personalized application scenarios. To explore customized preference learning, we collect a domain-specific preference (DSP) dataset, which includes preferred responses for each given query from four practical domains. Besides, from the perspective of data efficiency, we propose a three-stage customized RM learning scheme, then empirically verify its effectiveness on both general preference datasets and our DSP set. Furthermore, we test multiple training and data strategies on the three learning stages. We find several ways to better preserve the general preferring ability while training the customized RMs, especially general preference enrichment, and customized preference imitation learning. The DSP dataset and code are available at https://github.com/Linear95/DSP.
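Preference-based reward models of this kind are usually trained with the Bradley-Terry pairwise loss; a minimal sketch follows. The tiny MLP and random features are placeholders, and the paper's three-stage scheme is not reproduced here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rm = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 1))  # toy reward model
opt = torch.optim.Adam(rm.parameters(), lr=1e-2)

chosen = torch.randn(64, 16)    # features of preferred responses
rejected = torch.randn(64, 16)  # features of dispreferred responses

for step in range(100):
    # L = -log sigmoid(r(chosen) - r(rejected))
    margin = rm(chosen) - rm(rejected)
    loss = -nn.functional.logsigmoid(margin).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))  # decreases toward 0 as the RM fits the preferences
```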

Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2309.03118
  • repo_url: None
  • paper_authors: Chao Feng, Xinyu Zhang, Zichu Fei
  • for: Improving the domain-specific knowledge and explainability of large language models (LLMs).
  • methods: Proposes Knowledge Solver (KSL), which teaches LLMs to search for essential knowledge from external knowledge bases by harnessing their own strong generalizability, transforming retrieval into a multi-hop decision sequence via a simple prompt.
  • results: On three datasets (CommonsenseQA, OpenbookQA, and MedQA-USMLE), the approach improves LLM baseline performance by a relatively large margin.
    Abstract Large language models (LLMs), such as ChatGPT and GPT-4, are versatile and can solve different tasks due to their emergent ability and generalizability. However, LLMs sometimes lack domain-specific knowledge to perform tasks, which would also cause hallucination during inference. In some previous works, additional modules like graph neural networks (GNNs) are trained on retrieved knowledge from external knowledge bases, aiming to mitigate the problem of lacking domain-specific knowledge. However, incorporating additional modules: 1) would need retraining additional modules when encountering novel domains; 2) would become a bottleneck since LLMs' strong abilities are not fully utilized for retrieval. In this paper, we propose a paradigm, termed Knowledge Solver (KSL), to teach LLMs to search for essential knowledge from external knowledge bases by harnessing their own strong generalizability. Specifically, we design a simple yet effective prompt to transform retrieval into a multi-hop decision sequence, which empowers LLMs with searching knowledge ability in zero-shot manner. Additionally, KSL is able to provide complete retrieval paths and therefore increase explainability of LLMs' reasoning processes. We conduct experiments on three datasets: CommonsenseQA, OpenbookQA, and MedQA-USMLE, and found that our approach improves LLM baseline performance by a relatively large margin.

ContrastWSD: Enhancing Metaphor Detection with Word Sense Disambiguation Following the Metaphor Identification Procedure

  • paper_url: http://arxiv.org/abs/2309.03103
  • repo_url: None
  • paper_authors: Mohamad Elzohbi, Richard Zhao
  • for: Develops a RoBERTa-based metaphor detection model that combines the Metaphor Identification Procedure (MIP) with Word Sense Disambiguation (WSD), extracting and contrasting the contextual meaning of a word with its basic meaning to determine whether the word is used metaphorically in a sentence.
  • methods: Uses a WSD model to derive word senses and combines them with contextual embeddings to enhance the metaphor detection process.
  • results: Evaluated on various benchmark datasets against strong baselines, the model outperforms methods that rely solely on contextual embeddings or integrate only basic definitions and other external knowledge.
    Abstract This paper presents ContrastWSD, a RoBERTa-based metaphor detection model that integrates the Metaphor Identification Procedure (MIP) and Word Sense Disambiguation (WSD) to extract and contrast the contextual meaning with the basic meaning of a word to determine whether it is used metaphorically in a sentence. By utilizing the word senses derived from a WSD model, our model enhances the metaphor detection process and outperforms other methods that rely solely on contextual embeddings or integrate only the basic definitions and other external knowledge. We evaluate our approach on various benchmark datasets and compare it with strong baselines, indicating the effectiveness in advancing metaphor detection.
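The MIP-style contrast can be approximated by comparing an embedding of the target word in context with an embedding of its basic (gloss) meaning: low similarity suggests metaphorical use. The model choice, mean pooling, and the 0.4 threshold below are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base").eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    out = model(**tok(text, return_tensors="pt")).last_hidden_state
    return F.normalize(out.mean(dim=1).squeeze(0), dim=0)  # mean-pooled vector

sentence = "She devoured the novel in one sitting."
basic_gloss = "devour: to eat something quickly because you are hungry"

sim = float(torch.dot(embed(sentence), embed(basic_gloss)))
print("metaphorical" if sim < 0.4 else "literal", round(sim, 3))
```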

Persona-aware Generative Model for Code-mixed Language

  • paper_url: http://arxiv.org/abs/2309.02915
  • repo_url: https://github.com/victor7246/paradox
  • paper_authors: Ayan Sengupta, Md Shad Akhtar, Tanmoy Chakraborty
  • for: Developing a persona-aware code-mixed generation model that produces text resembling real-life code-mixed text of individuals.
  • methods: Proposes PARADOX, a Transformer-based encoder-decoder model that encodes an utterance conditioned on a user's persona and generates code-mixed text without monolingual reference data, together with an alignment module that re-calibrates the generated sequence to resemble real-life code-mixed text.
  • results: On the test set, PARADOX achieves an average CM BLEU 1.6 points higher than non-persona-based counterparts, along with 47% better perplexity and 32% better semantic coherence.
    Abstract Code-mixing and script-mixing are prevalent across online social networks and multilingual societies. However, a user's preference toward code-mixing depends on the socioeconomic status, demographics of the user, and the local context, which existing generative models mostly ignore while generating code-mixed texts. In this work, we make a pioneering attempt to develop a persona-aware generative model to generate texts resembling real-life code-mixed texts of individuals. We propose a Persona-aware Generative Model for Code-mixed Generation, PARADOX, a novel Transformer-based encoder-decoder model that encodes an utterance conditioned on a user's persona and generates code-mixed texts without monolingual reference data. We propose an alignment module that re-calibrates the generated sequence to resemble real-life code-mixed texts. PARADOX generates code-mixed texts that are semantically more meaningful and linguistically more valid. To evaluate the personification capabilities of PARADOX, we propose four new metrics -- CM BLEU, CM Rouge-1, CM Rouge-L and CM KS. On average, PARADOX achieves 1.6 points better CM BLEU, 47% better perplexity and 32% better semantic coherence than the non-persona-based counterparts.

Leave no Place Behind: Improved Geolocation in Humanitarian Documents

  • paper_url: http://arxiv.org/abs/2309.02914
  • repo_url: None
  • paper_authors: Enrico M. Belliardo, Kyriaki Kalimeri, Yelena Mejova
  • for: The paper aims to improve the performance of natural language processing (NLP) tools in the humanitarian sector by developing annotated resources for geotagging humanitarian texts.
  • methods: The authors use two popular Named Entity Recognition (NER) tools, Spacy and roBERTa, and develop a geocoding method called FeatureRank to link candidate locations to the GeoNames database.
  • results: The authors find that the humanitarian-domain data improves the performance of the classifiers (up to F1 = 0.92) and alleviates some of the bias of the existing tools, which erroneously favor locations in Western countries. However, they conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for deployment in the humanitarian sector.
    Abstract Geographical location is a crucial element of humanitarian response, outlining vulnerable populations, ongoing events, and available resources. Latest developments in Natural Language Processing may help in extracting vital information from the deluge of reports and documents produced by the humanitarian sector. However, the performance and biases of existing state-of-the-art information extraction tools are unknown. In this work, we develop annotated resources to fine-tune the popular Named Entity Recognition (NER) tools Spacy and roBERTa to perform geotagging of humanitarian texts. We then propose a geocoding method FeatureRank which links the candidate locations to the GeoNames database. We find that not only does the humanitarian-domain data improves the performance of the classifiers (up to F1 = 0.92), but it also alleviates some of the bias of the existing tools, which erroneously favor locations in the Western countries. Thus, we conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for the deployment in the humanitarian sector.
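A FeatureRank-style geocoder can be sketched as scoring GeoNames candidates for an extracted place name with weighted features (name match, feature class, population) and keeping the best one. The feature set, weights, and toy records below are our assumptions; the paper's exact scoring is not given in the abstract.

```python
from difflib import SequenceMatcher

# Toy stand-ins for GeoNames records: name, feature class, population
CANDIDATES = [
    {"name": "Kabul", "fclass": "P", "population": 4_273_000},   # populated place
    {"name": "Kabul River", "fclass": "H", "population": 0},     # hydrographic
    {"name": "Kabul Province", "fclass": "A", "population": 5_000_000},
]
CLASS_PRIOR = {"P": 1.0, "A": 0.8, "H": 0.3}  # prefer populated places

def feature_rank(mention: str, candidates: list[dict]) -> dict:
    def score(c):
        name_sim = SequenceMatcher(None, mention.lower(), c["name"].lower()).ratio()
        pop = min(c["population"], 10_000_000) / 10_000_000
        return 0.6 * name_sim + 0.25 * CLASS_PRIOR[c["fclass"]] + 0.15 * pop
    return max(candidates, key=score)

print(feature_rank("Kabul", CANDIDATES)["name"])  # -> "Kabul"
```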

ViCGCN: Graph Convolutional Network with Contextualized Language Models for Social Media Mining in Vietnamese

  • paper_url: http://arxiv.org/abs/2309.02902
  • repo_url: https://github.com/phanchauthang/ViCGCN
  • paper_authors: Chau-Thang Phan, Quoc-Nam Nguyen, Chi-Thanh Dang, Trong-Hop Do, Kiet Van Nguyen
  • for: Improving information mining on Vietnamese social media by exploiting the graph structure of the data to address imbalanced and noisy data.
  • methods: Proposes ViCGCN, a novel approach based on PhoBERT and Graph Convolutional Networks that jointly trains contextualized embeddings with GCNs to capture more syntactic and semantic dependencies.
  • results: Experiments on several Vietnamese benchmark datasets show that applying GCN to the final layer of BERTology models significantly improves performance, and that ViCGCN outperforms 13 strong baselines, including BERTology models, fused BERTology-GCN models, other baselines, and the SOTA, on three social media datasets.
    Abstract Social media processing is a fundamental task in natural language processing with numerous applications. As Vietnamese social media and information science have grown rapidly, the necessity of information-based mining on Vietnamese social media has become crucial. However, state-of-the-art research faces several significant drawbacks, including imbalanced data and noisy data on social media platforms. Imbalanced and noisy are two essential issues that need to be addressed in Vietnamese social media texts. Graph Convolutional Networks can address the problems of imbalanced and noisy data in text classification on social media by taking advantage of the graph structure of the data. This study presents a novel approach based on contextualized language model (PhoBERT) and graph-based method (Graph Convolutional Networks). In particular, the proposed approach, ViCGCN, jointly trained the power of Contextualized embeddings with the ability of Graph Convolutional Networks, GCN, to capture more syntactic and semantic dependencies to address those drawbacks. Extensive experiments on various Vietnamese benchmark datasets were conducted to verify our approach. The observation shows that applying GCN to BERTology models as the final layer significantly improves performance. Moreover, the experiments demonstrate that ViCGCN outperforms 13 powerful baseline models, including BERTology models, fusion BERTology and GCN models, other baselines, and SOTA on three benchmark social media datasets. Our proposed ViCGCN approach demonstrates a significant improvement of up to 6.21%, 4.61%, and 2.63% over the best Contextualized Language Models, including multilingual and monolingual, on three benchmark datasets, UIT-VSMEC, UIT-ViCTSD, and UIT-VSFC, respectively. Additionally, our integrated model ViCGCN achieves the best performance compared to other BERTology integrated with GCN models.
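The core pattern, a graph-convolution layer stacked on contextualized embeddings, can be sketched as below. Random features stand in for PhoBERT vectors, and the adjacency matrix is a toy document graph, not the paper's construction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_docs, d_in, n_classes = 5, 768, 3
X = torch.randn(n_docs, d_in)                 # stand-in for PhoBERT [CLS] vectors
A = torch.eye(n_docs)                         # adjacency with self-loops
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.0   # toy edges between documents

# Symmetric normalization: D^{-1/2} A D^{-1/2}
d_inv_sqrt = A.sum(1).pow(-0.5)
A_hat = d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)

class GCNLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x, a_hat):
        return torch.relu(self.lin(a_hat @ x))  # aggregate neighbors, then transform

logits = nn.Linear(128, n_classes)(GCNLayer(d_in, 128)(X, A_hat))
print(logits.shape)  # torch.Size([5, 3])
```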

Addressing the Blind Spots in Spoken Language Processing

  • paper_url: http://arxiv.org/abs/2309.06572
  • repo_url: None
  • paper_authors: Amit Moryossef
  • for: Examines the critical yet often overlooked role of non-verbal cues in human communication, including co-speech gestures and facial expressions, and their implications for Natural Language Processing (NLP).
  • methods: Borrowing from advances in sign language processing, proposes developing universal automatic gesture segmentation and transcription models to transcribe non-verbal cues into textual form, bridging blind spots in spoken language understanding and broadening the scope and applicability of NLP models.
  • results: Motivating examples demonstrate the limitations of relying solely on text-based models; the proposed approach is computationally efficient and flexible, integrates with existing NLP pipelines, and the paper calls on the research community to develop and validate universal transcription methods.
    Abstract This paper explores the critical but often overlooked role of non-verbal cues, including co-speech gestures and facial expressions, in human communication and their implications for Natural Language Processing (NLP). We argue that understanding human communication requires a more holistic approach that goes beyond textual or spoken words to include non-verbal elements. Borrowing from advances in sign language processing, we propose the development of universal automatic gesture segmentation and transcription models to transcribe these non-verbal cues into textual form. Such a methodology aims to bridge the blind spots in spoken language understanding, enhancing the scope and applicability of NLP models. Through motivating examples, we demonstrate the limitations of relying solely on text-based models. We propose a computationally efficient and flexible approach for incorporating non-verbal cues, which can seamlessly integrate with existing NLP pipelines. We conclude by calling upon the research community to contribute to the development of universal transcription methods and to validate their effectiveness in capturing the complexities of real-world, multi-modal interactions.

Aligning Large Language Models for Clinical Tasks

  • paper_url: http://arxiv.org/abs/2309.02884
  • repo_url: https://github.com/ssm123ssm/medGPT
  • paper_authors: Supun Manathunga, Isuru Hettigoda
  • for: Examining the applicability and effectiveness of large language models (LLMs) for clinical applications.
  • methods: Combines instruction-tuning with in-prompt strategies such as few-shot and chain-of-thought prompting to improve LLM performance.
  • results: A preliminary analysis shows that the proposed 'expand-guess-refine' strategy improves performance, achieving a score of 70.63% on a subset of questions sourced from the USMLE dataset.
    Abstract Large Language Models (LLMs) have demonstrated remarkable adaptability, showcasing their capacity to excel in tasks for which they were not explicitly trained. However, despite their impressive natural language processing (NLP) capabilities, effective alignment of LLMs remains a crucial challenge when deploying them for specific clinical applications. The ability to generate responses with factually accurate content and to engage in non-trivial reasoning steps are crucial for the LLMs to be eligible for applications in clinical medicine. Employing a combination of techniques including instruction-tuning and in-prompt strategies like few-shot and chain-of-thought prompting has significantly enhanced the performance of LLMs. Our proposed alignment strategy for medical question-answering, known as 'expand-guess-refine', offers a parameter and data-efficient solution. A preliminary analysis of this method demonstrated outstanding performance, achieving a score of 70.63% on a subset of questions sourced from the USMLE dataset.

Agent-based simulation of pedestrians’ earthquake evacuation; application to Beirut, Lebanon

  • paper_url: http://arxiv.org/abs/2309.02812
  • repo_url: None
  • paper_authors: Rouba Iskandar, Kamel Allaw, Julie Dugdale, Elise Beck, Jocelyne Adjizian-Gérard, Cécile Cornou, Jacques Harb, Pascal Lacroix, Nada Badaro-Saliba, Stéphane Cartier, Rita Zaarour
  • for: Developing a city-scale pedestrian simulator to estimate evacuation behaviour during earthquakes.
  • methods: The agent-based model integrates seismic hazard, physical vulnerability, and individual behaviour and mobility. It is implemented in GAMA within a highly realistic urban environment, combining previous data on buildings and soils with new geographic data extracted from high-resolution Pleiades satellite images.
  • results: Simulations for Beirut show that 52% of the population can reach an open space within 5 minutes after an earthquake, but this drops to 39% when one of the open spaces is locked, indicating that the presence of accessible open spaces and their proximity to residential buildings are crucial for people's safety.
    Abstract Most seismic risk assessment methods focus on estimating the damages to the built environment and the consequent socioeconomic losses without fully taking into account the social aspect of risk. Yet, human behaviour is a key element in predicting the human impact of an earthquake, therefore, it is important to include it in quantitative risk assessment studies. In this study, an interdisciplinary approach simulating pedestrians' evacuation during earthquakes at the city scale is developed using an agent-based model. The model integrates the seismic hazard, the physical vulnerability as well as individuals' behaviours and mobility. The simulator is applied to the case of Beirut, Lebanon. Lebanon is at the heart of the Levant fault system that has generated several Mw>7 earthquakes, the latest being in 1759. It is one of the countries with the highest seismic risk in the Mediterranean region. This is due to the high seismic vulnerability of the buildings due to the absence of mandatory seismic regulation until 2012, the high level of urbanization, and the lack of adequate spatial planning and risk prevention policies. Beirut as the main residential, economic and institutional hub of Lebanon is densely populated. To accommodate the growing need for urban development, constructions have almost taken over all of the green areas of the city; squares and gardens are disappearing to give place to skyscrapers. However, open spaces are safe places to shelter, away from debris, and therefore play an essential role in earthquake evacuation. Despite the massive urbanization, there are a few open spaces but locked gates and other types of anthropogenic barriers often limit their access. To simulate this complex context, pedestrians' evacuation simulations are run in a highly realistic spatial environment implemented in GAMA [1]. Previous data concerning soil and buildings in Beirut [2, 3] are complemented by new geographic data extracted from high-resolution Pleiades satellite images. The seismic loading is defined as a peak ground acceleration of 0.3g, as stated in Lebanese seismic regulations. Building damages are estimated using an artificial neural network trained to predict the mean damage [4] based on the seismic loading as well as the soil and building vibrational properties [5]. Moreover, the quantity and the footprint of the generated debris around each building are also estimated and included in the model. We simulate how topography, buildings, debris, and access to open spaces, affect individuals' mobility. Two city configurations are implemented: 1. Open spaces are accessible without any barriers; 2. Access to some open spaces is blocked. The first simulation results show that while 52% of the population is able to arrive to an open space within 5 minutes after an earthquake, this number is reduced to 39% when one of the open spaces is locked. These results show that the presence of accessible open spaces in a city and their proximity to the residential buildings is a crucial factor for ensuring people's safety when an earthquake occurs.

GRASS: Unified Generation Model for Speech-to-Semantic Tasks

  • paper_url: http://arxiv.org/abs/2309.02780
  • repo_url: None
  • paper_authors: Aobo Xia, Shuyu Lei, Yushu Yang, Xiang Guo, Hua Chai
  • for: Explores instruction fine-tuning for speech-to-semantic tasks and proposes a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data.
  • methods: Pre-trains the model on large and diverse data, constructing instruction-speech pairs with a text-to-speech (TTS) system.
  • results: Extensive experiments on multiple benchmarks show state-of-the-art (SOTA) performance on tasks such as speech named entity recognition, speech sentiment analysis, and speech question answering, with competitive performance in zero-shot and few-shot scenarios.
    Abstract This paper explores the instruction fine-tuning technique for speech-to-semantic tasks by introducing a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data. We pre-train the model using large and diverse data, where instruction-speech pairs are constructed via a text-to-speech (TTS) system. Extensive experiments demonstrate that our proposed model achieves state-of-the-art (SOTA) results on many benchmarks covering speech named entity recognition, speech sentiment analysis, speech question answering, and more, after fine-tuning. Furthermore, the proposed model achieves competitive performance in zero-shot and few-shot scenarios. To facilitate future work on instruction fine-tuning for speech-to-semantic tasks, we release our instruction dataset and code.

Improving Code Generation by Dynamic Temperature Sampling

  • paper_url: http://arxiv.org/abs/2309.02772
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Yuqi Zhu, Jia Allen Li, Ge Li, YunFei Zhao, Jia Li, Zhi Jin, Hong Mei
  • for: Studies a decoding strategy specialized for code generation, to improve the performance of existing large language models (LLMs) on code.
  • methods: An analysis of the loss distributions of code tokens shows that they fall into two categories: challenging tokens that are difficult to predict and confident tokens that are easy to infer, with challenging tokens appearing mainly at the beginning of a code block. The proposed Adaptive Temperature (AdapT) sampling dynamically adjusts the temperature coefficient when decoding different tokens.
  • results: Applied to LLMs of different sizes and evaluated on two popular datasets, AdapT sampling significantly outperforms state-of-the-art decoding strategies.
    Abstract Recently, Large Language Models (LLMs) have shown impressive results in code generation. However, existing decoding strategies are designed for Natural Language (NL) generation, overlooking the differences between NL and programming languages (PL). Due to this oversight, a better decoding strategy for code generation remains an open question. In this paper, we conduct the first systematic study to explore a decoding strategy specialized in code generation. With an analysis of loss distributions of code tokens, we find that code tokens can be divided into two categories: challenging tokens that are difficult to predict and confident tokens that can be easily inferred. Among them, the challenging tokens mainly appear at the beginning of a code block. Inspired by the above findings, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling, which dynamically adjusts the temperature coefficient when decoding different tokens. We apply a larger temperature when sampling for challenging tokens, allowing LLMs to explore diverse choices. We employ a smaller temperature for confident tokens avoiding the influence of tail randomness noises. We apply AdapT sampling to LLMs with different sizes and conduct evaluations on two popular datasets. Results show that AdapT sampling significantly outperforms state-of-the-art decoding strategy.
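A minimal sketch of the AdapT idea follows: sample with a higher temperature for "challenging" tokens (where the model is uncertain) and a lower one for "confident" tokens. The uncertainty test and the two temperature values are our assumptions, not the paper's exact schedule.

```python
import torch

torch.manual_seed(0)

def adapt_sample(logits: torch.Tensor, t_hot: float = 1.0, t_cold: float = 0.2,
                 conf_threshold: float = 0.7) -> int:
    probs = torch.softmax(logits, dim=-1)
    confident = probs.max() > conf_threshold       # easy-to-infer token?
    temp = t_cold if confident else t_hot          # small temp avoids tail noise
    return int(torch.multinomial(torch.softmax(logits / temp, dim=-1), 1))

confident_logits = torch.tensor([8.0, 1.0, 0.5, 0.2])    # e.g. a closing bracket
challenging_logits = torch.tensor([1.2, 1.1, 1.0, 0.9])  # e.g. start of a block
print(adapt_sample(confident_logits), adapt_sample(challenging_logits))
```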

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models

  • paper_url: http://arxiv.org/abs/2309.02706
  • repo_url: None
  • paper_authors: Guijin Son, Hanwool Lee, Suwan Kim, Huiseo Kim, Jaecheol Lee, Je Won Yeom, Jihyu Jung, Jung Woo Kim, Songseong Kim
  • for: Evaluating the performance of large language models (LLMs) on Korean language and culture, and the viability of language-specific models (LLSMs) for language-specific knowledge.
  • methods: Uses six tasks covering vocabulary, history, and general knowledge to evaluate language models across domains.
  • results: LLSMs built specifically for Korean and roughly 13 times smaller than GPT-3.5 can match its performance on language-specific knowledge retrieval, underscoring the importance of homogeneous corpora for training professional-level language-specific models; however, these smaller LMs show a perplexing performance dip when asked to generate structured answers.
    Abstract Large Language Models (LLMs) pretrained on massive corpora exhibit remarkable capabilities across a wide range of tasks, however, the attention given to non-English languages has been limited in this field of research. To address this gap and assess the proficiency of language models in the Korean language and culture, we present HAE-RAE Bench, covering 6 tasks including vocabulary, history, and general knowledge. Our evaluation of language models on this benchmark highlights the potential advantages of employing Large Language-Specific Models(LLSMs) over a comprehensive, universal model like GPT-3.5. Remarkably, our study reveals that models approximately 13 times smaller than GPT-3.5 can exhibit similar performance levels in terms of language-specific knowledge retrieval. This observation underscores the importance of homogeneous corpora for training professional-level language-specific models. On the contrary, we also observe a perplexing performance dip in these smaller LMs when they are tasked to generate structured answers.

A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models

  • paper_url: http://arxiv.org/abs/2309.02691
  • repo_url: https://github.com/lil-lab/phrase_grounding
  • paper_authors: Noriyuki Kojima, Hadar Averbuch-Elor, Yoav Artzi
  • for: Studies phrase grounding in visual contexts, i.e., associating words and phrases with image regions, as a key component of natural language understanding.
  • methods: Proposes a framework for jointly studying task performance and phrase grounding, together with three benchmarks for examining the relation between the two.
  • results: Contemporary models show inconsistency between their ability to ground phrases and to solve tasks; this can be addressed through brute-force training on phrase grounding annotations.
    Abstract Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions. However, observing this grounding in contemporary models is complex, even if it is generally expected to take place if the task is addressed in a way that is conductive to generalization. We propose a framework to jointly study task performance and phrase grounding, and propose three benchmarks to study the relation between the two. Our results show that contemporary models demonstrate inconsistency between their ability to ground phrases and solve tasks. We show how this can be addressed through brute-force training on ground phrasing annotations, and analyze the dynamics it creates. Code and at available at https://github.com/lil-lab/phrase_grounding.

Implicit Design Choices and Their Impact on Emotion Recognition Model Development and Evaluation

  • paper_url: http://arxiv.org/abs/2309.03238
  • repo_url: None
  • paper_authors: Mimansa Jaiswal
  • for: This thesis aims to improve the accuracy and robustness of emotion recognition by examining challenges across the field.
  • methods: Uses a range of methods, including diverse data collection, data augmentation, annotation analysis, and adversarial handling of confounding and sensitive demographic variables, to improve the accuracy and reliability of emotion recognition models.
  • results: Collecting data under controlled stressors better reflects real-world influences on emotion production, and the annotation analyses help avoid biases introduced by the subjectivity of labels; the thesis also proposes optimized sociological evaluation metrics for model testing.
    Abstract Emotion recognition is a complex task due to the inherent subjectivity in both the perception and production of emotions. The subjectivity of emotions poses significant challenges in developing accurate and robust computational models. This thesis examines critical facets of emotion recognition, beginning with the collection of diverse datasets that account for psychological factors in emotion production. To handle the challenge of non-representative training data, this work collects the Multimodal Stressed Emotion dataset, which introduces controlled stressors during data collection to better represent real-world influences on emotion production. To address issues with label subjectivity, this research comprehensively analyzes how data augmentation techniques and annotation schemes impact emotion perception and annotator labels. It further handles natural confounding variables and variations by employing adversarial networks to isolate key factors like stress from learned emotion representations during model training. For tackling concerns about leakage of sensitive demographic variables, this work leverages adversarial learning to strip sensitive demographic information from multimodal encodings. Additionally, it proposes optimized sociological evaluation metrics aligned with cost-effective, real-world needs for model testing. This research advances robust, practical emotion recognition through multifaceted studies of challenges in datasets, labels, modeling, demographic and membership variable encoding in representations, and evaluation. The groundwork has been laid for cost-effective, generalizable emotion recognition models that are less likely to encode sensitive demographic information.

Zero-Resource Hallucination Prevention for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.02654
  • repo_url: None
  • paper_authors: Junyu Luo, Cao Xiao, Fenglong Ma
  • for: This paper aims to reduce the "hallucination" phenomenon in large language models (LLMs), i.e., generated content that is ungrounded or factually wrong.
  • methods: It introduces a novel pre-detection self-evaluation technique called SELF-FAMILIARITY, which estimates how familiar the model is with the concepts in the input instruction and withholds the response when unfamiliar concepts are found.
  • results: Across four different LLMs, SELF-FAMILIARITY consistently outperforms existing detection methods while also improving reliability, applicability, and interpretability.
    Abstract The prevalent use of large language models (LLMs) in various domains has drawn attention to the issue of "hallucination," which refers to instances where LLMs generate factually inaccurate or ungrounded information. Existing techniques for hallucination detection in language assistants rely on intricate fuzzy, specific free-language-based chain of thought (CoT) techniques or parameter-based methods that suffer from interpretability issues. Additionally, the methods that identify hallucinations post-generation could not prevent their occurrence and suffer from inconsistent performance due to the influence of the instruction format and model style. In this paper, we introduce a novel pre-detection self-evaluation technique, referred to as SELF-FAMILIARITY, which focuses on evaluating the model's familiarity with the concepts present in the input instruction and withholding the generation of response in case of unfamiliar concepts. This approach emulates the human ability to refrain from responding to unfamiliar topics, thus reducing hallucinations. We validate SELF-FAMILIARITY across four different large language models, demonstrating consistently superior performance compared to existing techniques. Our findings propose a significant shift towards preemptive strategies for hallucination mitigation in LLM assistants, promising improvements in reliability, applicability, and interpretability.
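To make the pre-detection idea concrete, here is a minimal sketch of a SELF-FAMILIARITY-style gate. Everything below (the concept extractor, the frequency table, and the threshold) is a simplified assumption for illustration; the paper derives its familiarity scores from the model itself.

```python
# Minimal sketch of a SELF-FAMILIARITY-style pre-detection gate.
# `concept_familiarity` is a hypothetical stand-in for the paper's
# familiarity scoring; here it is approximated by a toy frequency table.

KNOWN_CONCEPT_FREQ = {"paris": 0.9, "eiffel tower": 0.8}  # assumed corpus stats

def extract_concepts(instruction: str) -> list[str]:
    # Placeholder concept extraction; the paper's method is more involved.
    return [w.strip(".,?").lower() for w in instruction.split() if len(w) > 3]

def concept_familiarity(concept: str) -> float:
    return KNOWN_CONCEPT_FREQ.get(concept, 0.0)

def guarded_generate(instruction: str, generate, threshold: float = 0.5) -> str:
    scores = [concept_familiarity(c) for c in extract_concepts(instruction)]
    if scores and min(scores) < threshold:
        # Withhold generation when any concept is unfamiliar.
        return "I am not familiar enough with this topic to answer reliably."
    return generate(instruction)

print(guarded_generate("Describe the Glorbian dynasty", lambda s: "..."))
```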

Epi-Curriculum: Episodic Curriculum Learning for Low-Resource Domain Adaptation in Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2309.02640
  • repo_url: None
  • paper_authors: Keyu Chen, Di Zhuang, Mingchen Li, J. Morris Chang
  • for: Improving the performance of Neural Machine Translation (NMT) models on new domains, especially when only limited data is available.
  • methods: Proposes a new episodic training framework together with a denoised curriculum learning technique that strengthens the model's robustness to domain shift.
  • results: Experiments show that Epi-Curriculum improves performance on both seen and unseen domains and makes the encoder and decoder more robust to domain shift.
    Abstract Neural Machine Translation (NMT) models have become successful, but their performance remains poor when translating on new domains with a limited number of data. In this paper, we present a novel approach Epi-Curriculum to address low-resource domain adaptation (DA), which contains a new episodic training framework along with denoised curriculum learning. Our episodic training framework enhances the model's robustness to domain shift by episodically exposing the encoder/decoder to an inexperienced decoder/encoder. The denoised curriculum learning filters the noised data and further improves the model's adaptability by gradually guiding the learning process from easy to more difficult tasks. Experiments on English-German and English-Romanian translation show that: (i) Epi-Curriculum improves both model's robustness and adaptability in seen and unseen domains; (ii) Our episodic training framework enhances the encoder and decoder's robustness to domain shift.

cs.LG - 2023-09-06

Ensemble linear interpolators: The role of ensembling

  • paper_url: http://arxiv.org/abs/2309.03354
  • repo_url: None
  • paper_authors: Mingqi Wu, Qiang Sun
  • for: Studies how ensembling stabilizes interpolators and improves their generalization performance, particularly when dealing with noisy data.
  • methods: Uses randomization-based ensemble methods such as bagging, formulated via a multiplier bootstrap that encompasses the classical bootstrap with replacement and a Bernoulli variant.
  • results: Bagging effectively mitigates the variance of the interpolator, yielding a bounded limiting out-of-sample prediction risk; the analysis also clarifies the distinct statistical roles of sketching and bagging across underparametrized and overparameterized regimes.
    Abstract Interpolators are unstable. For example, the minimum $\ell_2$ norm least square interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensemble stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least square estimator, which can then be formulated as an average of the sketched least square estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least square estimators in both underparametrized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator continues to be unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stability improvement property, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.
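The multiplier-bootstrap construction is easy to prototype. The sketch below, under the assumption of a Bernoulli-multiplier variant and a toy overparameterized regression problem, averages minimum-norm least squares fits across bags; it illustrates the mechanism, not the paper's exact estimator.

```python
import numpy as np

# Sketch of a multiplier-bootstrap bagged minimum-norm least squares
# interpolator. Bernoulli multipliers randomly keep samples; each bag
# fits the min-norm interpolator via pinv, and predictions are averaged.

rng = np.random.default_rng(0)
n, d, bags = 100, 200, 20                    # overparameterized: d > n
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) * 0.1 + rng.standard_normal(n)  # noisy labels

def bagged_predict(X_test):
    preds = []
    for _ in range(bags):
        w = rng.binomial(1, 0.7, size=n).astype(float)   # Bernoulli multipliers
        Xw, yw = X * w[:, None], y * w
        beta = np.linalg.pinv(Xw) @ yw                   # min l2-norm solution
        preds.append(X_test @ beta)
    return np.mean(preds, axis=0)                        # average across bags

print(bagged_predict(rng.standard_normal((5, d))))
```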

Robotic Table Tennis: A Case Study into a High Speed Learning System

  • paper_url: http://arxiv.org/abs/2309.03315
  • repo_url: None
  • paper_authors: David B. D’Ambrosio, Jonathan Abelian, Saminda Abeyruwan, Michael Ahn, Alex Bewley, Justin Boyd, Krzysztof Choromanski, Omar Cortes, Erwin Coumans, Tianli Ding, Wenbo Gao, Laura Graesser, Atil Iscen, Navdeep Jaitly, Deepali Jain, Juhana Kangaspunta, Satoshi Kataoka, Gus Kouretas, Yuheng Kuang, Nevena Lazic, Corey Lynch, Reza Mahjourian, Sherry Q. Moore, Thinh Nguyen, Ken Oslund, Barney J Reed, Krista Reymann, Pannag R. Sanketi, Anish Shankar, Pierre Sermanet, Vikas Sindhwani, Avi Singh, Vincent Vanhoucke, Grace Vesom, Peng Xu
  • for: Presents a deep dive into a real-world robotic learning system that, in prior work, sustained hundreds of table tennis rallies with humans and can precisely return the ball to desired targets.
  • methods: The system combines a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that prevents real-world damage and enables zero-shot policy transfer, and automated real-world environment resets for autonomous training and evaluation on physical robots.
  • results: A complete system description, including design decisions that are rarely disseminated, is complemented by studies on mitigating various sources of latency, handling training and deployment distribution shifts, perception robustness, sensitivity to policy hyper-parameters, and the choice of action space. A video demonstrating the system components and experimental results is available at https://youtu.be/uFcnWjB42I0.
    Abstract We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be found at https://youtu.be/uFcnWjB42I0.

Scalable Learning of Intrusion Responses through Recursive Decomposition

  • paper_url: http://arxiv.org/abs/2309.03292
  • repo_url: None
  • paper_authors: Kim Hammar, Rolf Stadler
  • for: Aims to automate network intrusion response in order to improve the security of IT infrastructures.
  • methods: Formulates the attacker-defender interaction as a partially observed stochastic game solved through reinforcement learning and self-play; to tame the computational complexity of large games, the game is recursively decomposed into subgames whose best-response strategies, shown via optimal stopping theory to have threshold structure, can be computed efficiently in parallel.
  • results: Strategies learned with the proposed DFSP algorithm approximate an equilibrium in an emulation environment where real intrusions and response actions can be executed, and significantly outperform a state-of-the-art algorithm on a realistic infrastructure configuration.
    Abstract We study automated intrusion response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed stochastic game. To solve the game we follow an approach where attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium. Solutions proposed in previous work prove the feasibility of this approach for small infrastructures but do not scale to realistic scenarios due to the exponential growth in computational complexity with the infrastructure size. We address this problem by introducing a method that recursively decomposes the game into subgames which can be solved in parallel. Applying optimal stopping theory we show that the best response strategies in these subgames exhibit threshold structures, which allows us to compute them efficiently. To solve the decomposed game we introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns Nash equilibria through stochastic approximation. We evaluate the learned strategies in an emulation environment where real intrusions and response actions can be executed. The results show that the learned strategies approximate an equilibrium and that DFSP significantly outperforms a state-of-the-art algorithm for a realistic infrastructure configuration.

R2D2: Deep neural network series for near real-time high-dynamic range imaging in radio astronomy

  • paper_url: http://arxiv.org/abs/2309.03291
  • repo_url: None
  • paper_authors: Aghabiglou A, Chu C S, Jackson A, Dabbech A, Wiaux Y
  • for: Describes a novel AI approach, combining deep neural networks (DNNs) with data-consistency updates, for high-resolution high-dynamic range synthesis imaging by radio interferometry (RI) in astronomy.
  • methods: The reconstruction is built as a series of residual images, each estimated as the output of a DNN that takes the previous iteration's residual dirty image as input; the approach can be read as a learned matching pursuit in which model components are iteratively identified from residual dirty images, with CLEAN as a well-known example. Two variants are proposed, built on a standard U-Net and a novel unrolled architecture.
  • results: On highly sensitive S-band VLA observations of the radio galaxy Cygnus A, R2D2 delivers high-precision imaging matching AIRI and uSARA and significantly superior to CLEAN, while running at a fraction of the cost of AIRI and uSARA and faster than CLEAN, opening the door to near real-time precision imaging in RI.
    Abstract We present a novel AI approach for high-resolution high-dynamic range synthesis imaging by radio interferometry (RI) in astronomy. R2D2, standing for "{R}esidual-to-{R}esidual {D}NN series for high-{D}ynamic range imaging", is a model-based data-driven approach relying on hybrid deep neural networks (DNNs) and data-consistency updates. Its reconstruction is built as a series of residual images estimated as the outputs of DNNs, each taking the residual dirty image of the previous iteration as an input. The approach can be interpreted as a learned version of a matching pursuit approach, whereby model components are iteratively identified from residual dirty images, and of which CLEAN is a well-known example. We propose two variants of the R2D2 model, built upon two distinctive DNN architectures: a standard U-Net, and a novel unrolled architecture. We demonstrate their use for monochromatic intensity imaging on highly-sensitive observations of the radio galaxy Cygnus~A at S band, from the Very Large Array (VLA). R2D2 is validated against CLEAN and the recent RI algorithms AIRI and uSARA, which respectively inject a learned implicit regularization and an advanced handcrafted sparsity-based regularization into the RI data. With only few terms in its series, the R2D2 model is able to deliver high-precision imaging, significantly superior to CLEAN and matching the precision of AIRI and uSARA. In terms of computational efficiency, R2D2 runs at a fraction of the cost of AIRI and uSARA, and is also faster than CLEAN, opening the door to real-time precision imaging in RI.
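The series structure can be summarized in a few lines. The following sketch is a conceptual rendering with placeholder networks and an identity operator standing in for the radio-interferometric measurement operator; it is not the authors' implementation.

```python
import torch

# Conceptual sketch of the R2D2 iteration: each network in the series
# maps the current residual dirty image to an image update.

def r2d2_reconstruct(dirty, networks, A, A_adj):
    """dirty: back-projected observations; A / A_adj: measurement
    operator and its adjoint (assumed given)."""
    x = torch.zeros_like(dirty)
    residual = dirty
    for net in networks:                # series of residual DNNs
        x = x + net(residual)           # add the estimated residual image
        residual = dirty - A_adj(A(x))  # data consistency: new residual dirty image
    return x

# Toy instantiation: identity measurement operator, tiny conv "DNNs".
A = lambda img: img
A_adj = A
nets = [torch.nn.Conv2d(1, 1, 3, padding=1) for _ in range(3)]
out = r2d2_reconstruct(torch.randn(1, 1, 16, 16), nets, A, A_adj)
print(out.shape)
```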

Let Quantum Neural Networks Choose Their Own Frequencies

  • paper_url: http://arxiv.org/abs/2309.03279
  • repo_url: None
  • paper_authors: Ben Jaderberg, Antonio A. Gentile, Youssef Achari Berrada, Elvira Shishenina, Vincent E. Elfving
  • for: Examines parameterized quantum circuits as machine learning models, whose outputs are well described as partial Fourier series of the input features with frequencies determined by the feature map's generator Hamiltonians.
  • methods: Generalizes quantum models by adding trainable parameters to the generators, yielding trainable-frequency (TF) quantum models that can learn generators better suited to the task at hand.
  • results: Numerical experiments show that TF models learn generators with desirable properties, including non-regularly spaced frequencies in their spectra and flexible spectral richness; adding a single trainable parameter to each encoding operation improves accuracy in solving the Navier-Stokes equations.
    Abstract Parameterized quantum circuits as machine learning models are typically well described by their representation as a partial Fourier series of the input features, with frequencies uniquely determined by the feature map's generator Hamiltonians. Ordinarily, these data-encoding generators are chosen in advance, fixing the space of functions that can be represented. In this work we consider a generalization of quantum models to include a set of trainable parameters in the generator, leading to a trainable frequency (TF) quantum model. We numerically demonstrate how TF models can learn generators with desirable properties for solving the task at hand, including non-regularly spaced frequencies in their spectra and flexible spectral richness. Finally, we showcase the real-world effectiveness of our approach, demonstrating an improved accuracy in solving the Navier-Stokes equations using a TF model with only a single parameter added to each encoding operation. Since TF models encompass conventional fixed frequency models, they may offer a sensible default choice for variational quantum machine learning.
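A purely classical analogue conveys the idea of trainable frequencies: a truncated Fourier model whose frequencies are optimized jointly with its coefficients. This toy sketch makes no attempt to simulate a quantum circuit.

```python
import torch

# Classical surrogate for a trainable-frequency (TF) model: a truncated
# Fourier series whose frequencies are themselves trainable, mirroring
# trainable generator eigenvalues.

torch.manual_seed(0)
x = torch.linspace(0, 1, 64).unsqueeze(1)
y = torch.sin(7.3 * x) + 0.3 * torch.cos(2.1 * x)    # target with non-integer freqs

K = 4
freqs = torch.randn(K, requires_grad=True)           # trainable frequencies
coeffs = torch.randn(2 * K, requires_grad=True)      # Fourier coefficients

opt = torch.optim.Adam([freqs, coeffs], lr=0.05)
for step in range(500):
    basis = torch.cat([torch.sin(x * freqs), torch.cos(x * freqs)], dim=1)
    loss = ((basis @ coeffs - y.squeeze()) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))   # frequencies converge toward the target spectrum
```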

Matcha-TTS: A fast TTS architecture with conditional flow matching

  • paper_url: http://arxiv.org/abs/2309.03199
  • repo_url: https://github.com/shivammehta25/Matcha-TTS
  • paper_authors: Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter
  • for: 这个研究是为了提出一个新的encoder-decoder架构,实现快速的语音处理模型,并且使用最佳运输汇流匹配(OT-CFM)进行训练。
  • methods: 这个模型使用了一个基于射影函数的数据驱动的decoder,并且通过调整特定的设计选择,使每个合成步骤的运行时间变得更加快速。这个模型是 probabilistic、非 autoregressive,并且从零学习说话。
  • results: 与对照模型相比,Matcha-TTS系统具有最小的内存占用量,在长语音上与最快的模型相当,并且在听力测试中获得了最高的意见分数。另外,这个系统还提供了一些audio例子、代码和预训练模型,请参考https://shivammehta25.github.io/Matcha-TTS/。
    Abstract We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM). This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained using score matching. Careful design choices additionally ensure each synthesis step is fast to run. The method is probabilistic, non-autoregressive, and learns to speak from scratch without external alignments. Compared to strong pre-trained baseline models, the Matcha-TTS system has the smallest memory footprint, rivals the speed of the fastest models on long utterances, and attains the highest mean opinion score in a listening test. Please see https://shivammehta25.github.io/Matcha-TTS/ for audio examples, code, and pre-trained models.
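For readers unfamiliar with OT-CFM, the sketch below shows a generic conditional-flow-matching training step of the kind Matcha-TTS builds on; the tiny vector-field network, data dimensionality, and sigma_min value are placeholder assumptions, not the released implementation (see the repository above for that).

```python
import torch

# Generic optimal-transport conditional flow matching training step.
# The network regresses the constant target velocity along the OT
# conditional path from noise x0 to data x1.

sigma_min = 1e-4
net = torch.nn.Sequential(torch.nn.Linear(17, 64), torch.nn.SiLU(),
                          torch.nn.Linear(64, 16))    # v(x_t, t) for 16-d data

def cfm_loss(x1: torch.Tensor) -> torch.Tensor:
    x0 = torch.randn_like(x1)                         # noise sample
    t = torch.rand(x1.shape[0], 1)
    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1     # OT conditional path
    target = x1 - (1 - sigma_min) * x0                # constant target velocity
    v = net(torch.cat([x_t, t], dim=1))
    return ((v - target) ** 2).mean()

loss = cfm_loss(torch.randn(8, 16))
loss.backward()
print(float(loss))
```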

Blink: Link Local Differential Privacy in Graph Neural Networks via Bayesian Estimation

  • paper_url: http://arxiv.org/abs/2309.03190
  • repo_url: https://github.com/zhxchd/blink_gnn
  • paper_authors: Xiaochen Zhu, Vincent Y. F. Tan, Xiaokui Xiao
  • for: Enables collaborative training of Graph Neural Networks (GNNs) without revealing the existence of any link, addressing the privacy concerns raised by training on sensitive graph structure.
  • methods: Applies link local differential privacy over decentralized nodes and spends the privacy budget separately on links and degrees, so that an untrusted server can better denoise the graph topology via Bayesian estimation, limiting the accuracy loss LDP imposes on the trained GNNs.
  • results: Proposes two complementary privacy mechanisms suited to different privacy budgets, plus a hybrid variant that performs well across budgets; experiments show the approach outperforms existing methods in accuracy under varying privacy budgets.
    Abstract Graph neural networks (GNNs) have gained an increasing amount of popularity due to their superior capability in learning node embeddings for various graph inference tasks, but training them can raise privacy concerns. To address this, we propose using link local differential privacy over decentralized nodes, enabling collaboration with an untrusted server to train GNNs without revealing the existence of any link. Our approach spends the privacy budget separately on links and degrees of the graph for the server to better denoise the graph topology using Bayesian estimation, alleviating the negative impact of LDP on the accuracy of the trained GNNs. We bound the mean absolute error of the inferred link probabilities against the ground truth graph topology. We then propose two variants of our LDP mechanism complementing each other in different privacy settings, one of which estimates fewer links under lower privacy budgets to avoid false positive link estimates when the uncertainty is high, while the other utilizes more information and performs better given relatively higher privacy budgets. Furthermore, we propose a hybrid variant that combines both strategies and is able to perform better across different privacy budgets. Extensive experiments show that our approach outperforms existing methods in terms of accuracy under varying privacy budgets.
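The core privacy primitive can be illustrated with randomized response on adjacency bits, plus the server-side debiasing that would precede Bayesian denoising. The function names and parameters below are illustrative, not Blink's actual interface.

```python
import numpy as np

# Sketch of link local differential privacy via randomized response on
# each node's adjacency bits, with an unbiased frequency correction the
# server could apply before further (e.g., Bayesian) denoising.

def randomize_links(adj_row: np.ndarray, eps: float, rng) -> np.ndarray:
    p = np.exp(eps) / (np.exp(eps) + 1.0)       # probability of keeping a bit
    flip = rng.random(adj_row.shape) > p
    return np.where(flip, 1 - adj_row, adj_row)

def debias_link_estimate(noisy_mean: float, eps: float) -> float:
    p = np.exp(eps) / (np.exp(eps) + 1.0)
    return (noisy_mean + p - 1.0) / (2.0 * p - 1.0)  # unbiased link frequency

rng = np.random.default_rng(0)
row = rng.binomial(1, 0.1, size=1000)           # one node's true links
noisy = randomize_links(row, eps=1.0, rng=rng)
print(row.mean(), debias_link_estimate(noisy.mean(), eps=1.0))
```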

Impression-Informed Multi-Behavior Recommender System: A Hierarchical Graph Attention Approach

  • paper_url: http://arxiv.org/abs/2309.03169
  • repo_url: None
  • paper_authors: Dong Li, Divya Bhargavi, Vidya Sagar Ravipati
  • for: This paper aims to address the limitations of traditional recommender systems that rely solely on implicit feedback, such as item purchases, by incorporating multi-behavior interactions and hierarchical attention mechanisms to improve the accuracy of recommendations.
  • methods: The proposed Hierarchical Multi-behavior Graph Attention Network (HMGN) utilizes attention mechanisms to distinguish between different types of behaviors and hierarchical Bayesian personalized ranking for optimization. The model also incorporates a specialized multi-behavior sub-graph sampling technique and can seamlessly integrate knowledge metadata and time-series data.
  • results: The paper reports up to 64% performance boost in NDCG@100 metrics compared to conventional graph neural network methods, demonstrating the effectiveness of the proposed HMGN model in improving the accuracy of recommendations based on multi-behavior interactions.
    Abstract While recommender systems have significantly benefited from implicit feedback, they have often missed the nuances of multi-behavior interactions between users and items. Historically, these systems either amalgamated all behaviors, such as \textit{impression} (formerly \textit{view}), \textit{add-to-cart}, and \textit{buy}, under a singular 'interaction' label, or prioritized only the target behavior, often the \textit{buy} action, discarding valuable auxiliary signals. Although recent advancements tried addressing this simplification, they primarily gravitated towards optimizing the target behavior alone, battling with data scarcity. Additionally, they tended to bypass the nuanced hierarchy intrinsic to behaviors. To bridge these gaps, we introduce the \textbf{H}ierarchical \textbf{M}ulti-behavior \textbf{G}raph Attention \textbf{N}etwork (HMGN). This pioneering framework leverages attention mechanisms to discern information from both inter and intra-behaviors while employing a multi-task Hierarchical Bayesian Personalized Ranking (HBPR) for optimization. Recognizing the need for scalability, our approach integrates a specialized multi-behavior sub-graph sampling technique. Moreover, the adaptability of HMGN allows for the seamless inclusion of knowledge metadata and time-series data. Empirical results attest to our model's prowess, registering a notable performance boost of up to 64\% in NDCG@100 metrics over conventional graph neural network methods.

Learning to Recharge: UAV Coverage Path Planning through Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.03157
  • repo_url: https://github.com/theilem/uavSim
  • paper_authors: Mirco Theile, Harald Bayerlein, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli
  • for: solving the power-constrained coverage path planning problem for battery-limited unmanned aerial vehicles (UAVs)
  • methods: using a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, action masking, and discount factor scheduling
  • results: outperforming a baseline heuristic, generalizing to different target zones and maps, with limited generalization to unseen maps.
    Abstract Coverage path planning (CPP) is a critical problem in robotics, where the goal is to find an efficient path that covers every point in an area of interest. This work addresses the power-constrained CPP problem with recharge for battery-limited unmanned aerial vehicles (UAVs). In this problem, a notable challenge emerges from integrating recharge journeys into the overall coverage strategy, highlighting the intricate task of making strategic, long-term decisions. We propose a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, utilizing action masking and discount factor scheduling to optimize coverage trajectories over the entire mission horizon. We further provide the agent with a position history to handle emergent state loops caused by the recharge capability. Our approach outperforms a baseline heuristic, generalizes to different target zones and maps, with limited generalization to unseen maps. We offer valuable insights into DRL algorithm design for long-horizon problems and provide a publicly available software framework for the CPP problem.
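One concrete ingredient of the proposed agent, action masking, is straightforward to reproduce: logits of invalid actions are set to negative infinity before sampling, so the policy never selects them. A minimal sketch (the action semantics are assumed for illustration):

```python
import torch

# Action masking for a discrete PPO policy: invalid actions (e.g.,
# flying off-map or exceeding the battery budget) get -inf logits and
# therefore zero probability.

def masked_action_distribution(logits: torch.Tensor, valid: torch.Tensor):
    masked = logits.masked_fill(~valid, float("-inf"))
    return torch.distributions.Categorical(logits=masked)

logits = torch.randn(6)                        # 6 hypothetical UAV actions
valid = torch.tensor([1, 1, 0, 1, 0, 1], dtype=torch.bool)
dist = masked_action_distribution(logits, valid)
action = dist.sample()                         # never an invalid action
print(action.item(), dist.log_prob(action).item())
```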

Data-Driven Neural Polar Codes for Unknown Channels With and Without Memory

  • paper_url: http://arxiv.org/abs/2309.03148
  • repo_url: None
  • paper_authors: Ziv Aharoni, Bashar Huleihel, Henry D. Pfister, Haim H. Permuter
  • for: Proposes a data-driven methodology for designing polar codes for channels with and without memory, suited to the case where the channel is a "black box".
  • methods: Leverages the structure of the successive cancellation (SC) decoder to devise a neural SC (NSC) decoder, replacing the SC decoder's core elements (check node, bit node, and soft decision) with neural networks (NNs), plus an additional NN that embeds the channel outputs into the SC decoder's input space.
  • results: The method comes with theoretical guarantees, and the NSC's computational complexity does not grow with the channel memory size, unlike the successive cancellation trellis (SCT) decoder whose complexity is $O(|\mathcal{S}|^3 N\log N)$; empirical results on memoryless channels and channels with memory compare favourably with the optimal polar decoders, and the method also applies where SC and SCT decoders are unavailable.
    Abstract In this work, a novel data-driven methodology for designing polar codes for channels with and without memory is proposed. The methodology is suitable for the case where the channel is given as a "black-box" and the designer has access to the channel for generating observations of its inputs and outputs, but does not have access to the explicit channel model. The proposed method leverages the structure of the successive cancellation (SC) decoder to devise a neural SC (NSC) decoder. The NSC decoder uses neural networks (NNs) to replace the core elements of the original SC decoder, the check-node, the bit-node and the soft decision. Along with the NSC, we devise additional NN that embeds the channel outputs into the input space of the SC decoder. The proposed method is supported by theoretical guarantees that include the consistency of the NSC. Also, the NSC has computational complexity that does not grow with the channel memory size. This sets its main advantage over successive cancellation trellis (SCT) decoder for finite state channels (FSCs) that has complexity of $O(|\mathcal{S}|^3 N\log N)$, where $|\mathcal{S}|$ denotes the number of channel states. We demonstrate the performance of the proposed algorithms on memoryless channels and on channels with memory. The empirical results are compared with the optimal polar decoder, given by the SC and SCT decoders. We further show that our algorithms are applicable for the case where there SC and SCT decoders are not applicable.

The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits

  • paper_url: http://arxiv.org/abs/2309.03145
  • repo_url: None
  • paper_authors: Sepehr Assadi, Chen Wang
  • for: Studies the pure exploration problem in multi-armed bandits (MABs) and establishes a near-optimal sample-pass trade-off for streaming algorithms with sublinear memory.
  • methods: Analyzes multi-pass streaming algorithms that use the optimal sample complexity of $O(\frac{n}{\Delta^2})$, where $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms.
  • results: Proves that such algorithms require $\Omega(\frac{\log(1/\Delta)}{\log\log(1/\Delta)})$ passes, matching (up to lower-order terms) the $O(\log(\frac{1}{\Delta}))$-pass, $O(1)$-memory algorithm of Jin et al. [ICML'21] and resolving an open question posed by Assadi and Wang [STOC'20].
    Abstract We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(\frac{n}{\Delta^2})$ requires $\Omega(\frac{\log(1/\Delta)}{\log\log(1/\Delta)})$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(\frac{1}{\Delta}))$-pass algorithm of Jin et al. [ICML'21] (up to lower order terms) that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang [STOC'20].

Using Multiple Vector Channels Improves E(n)-Equivariant Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2309.03139
  • repo_url: None
  • paper_authors: Daniel Levy, Sékou-Oumar Kaba, Carmelo Gonzales, Santiago Miret, Siamak Ravanbakhsh
  • for: Extends E(n)-equivariant graph neural networks to use multiple equivariant vectors per node.
  • methods: Formulates the multi-channel extension and shows that it improves performance across physical-systems benchmark tasks with minimal differences in runtime or number of parameters.
  • results: The multichannel EGNN outperforms the standard single-channel EGNN on N-body charged particle dynamics, molecular property prediction, and predicting the trajectories of solar system bodies.
    Abstract We present a natural extension to E(n)-equivariant graph neural networks that uses multiple equivariant vectors per node. We formulate the extension and show that it improves performance across different physical systems benchmark tasks, with minimal differences in runtime or number of parameters. The proposed multichannel EGNN outperforms the standard singlechannel EGNN on N-body charged particle dynamics, molecular property predictions, and predicting the trajectories of solar system bodies. Given the additional benefits and minimal additional cost of multi-channel EGNN, we suggest that this extension may be of practical use to researchers working in machine learning for the physical sciences.

Graph Theory Applications in Advanced Geospatial Research

  • paper_url: http://arxiv.org/abs/2309.03249
  • repo_url: None
  • paper_authors: Surajit Ghosh, Archita Mallick, Anuva Chowdhury, Kounik De Sarkar
  • for: Surveys applications of graph theory in the geospatial sciences, including network analysis, spatial connectivity, and geographic information systems.
  • methods: Reviews the key graph-theoretic concepts and algorithms used to model and analyse spatial relationships, such as centrality measures, shortest paths, and maximum flows.
  • results: Highlights practical application scenarios such as environmental monitoring, transportation, and infrastructure planning, and catalogues the research, innovative technologies, and methodologies that apply graph theory to real-world geospatial challenges.
    Abstract Geospatial sciences include a wide range of applications, from environmental monitoring transportation to infrastructure planning, as well as location-based analysis and services. Graph theory algorithms in mathematics have emerged as indispensable tools in these domains due to their capability to model and analyse spatial relationships efficiently. This technical report explores the applications of graph theory algorithms in geospatial sciences, highlighting their role in network analysis, spatial connectivity, geographic information systems, and various other spatial problem-solving scenarios. It provides a comprehensive idea about the key concepts and algorithms of graph theory that assist the modelling processes. The report provides insights into the practical significance of graph theory in addressing real-world geospatial challenges and opportunities. It lists the extensive research, innovative technologies and methodologies implemented in this field.
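A minimal example of the kind of network analysis the report surveys: road junctions as nodes, segments as weighted edges, and a shortest-route query. The node names and distances are made up.

```python
import networkx as nx

# Model a tiny road network and query a shortest route by distance.
G = nx.Graph()
G.add_weighted_edges_from([
    ("depot", "junction_a", 2.5),   # weights as kilometres
    ("junction_a", "market", 1.2),
    ("depot", "junction_b", 1.0),
    ("junction_b", "market", 4.0),
])
route = nx.shortest_path(G, "depot", "market", weight="weight")
length = nx.shortest_path_length(G, "depot", "market", weight="weight")
print(route, length)   # ['depot', 'junction_a', 'market'] 3.7
```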

ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.03081
  • repo_url: https://github.com/link-zju/orl-auditor
  • paper_authors: Linkang Du, Min Chen, Mingyang Sun, Shouling Ji, Peng Cheng, Jiming Chen, Zhikun Zhang
  • for: Provides a trajectory-level dataset auditing mechanism for the offline deep reinforcement learning (offline DRL) scenario.
  • methods: Uses cumulative rewards as a unique identifier that distinguishes DRL models trained on a specific dataset, yielding the ORL-AUDITOR auditing mechanism.
  • results: Experiments on multiple offline DRL models and tasks show auditing accuracy over 95% with false positive rates below 2.88%, and demonstrate the auditing capability on open-source datasets from Google and DeepMind.
    Abstract Data is a critical asset in AI, as high-quality datasets can significantly improve the performance of machine learning models. In safety-critical domains such as autonomous vehicles, offline deep reinforcement learning (offline DRL) is frequently used to train models on pre-collected datasets, as opposed to training these models by interacting with the real-world environment as the online DRL. To support the development of these models, many institutions make datasets publicly available with opensource licenses, but these datasets are at risk of potential misuse or infringement. Injecting watermarks to the dataset may protect the intellectual property of the data, but it cannot handle datasets that have already been published and is infeasible to be altered afterward. Other existing solutions, such as dataset inference and membership inference, do not work well in the offline DRL scenario due to the diverse model behavior characteristics and offline setting constraints. In this paper, we advocate a new paradigm by leveraging the fact that cumulative rewards can act as a unique identifier that distinguishes DRL models trained on a specific dataset. To this end, we propose ORL-AUDITOR, which is the first trajectory-level dataset auditing mechanism for offline RL scenarios. Our experiments on multiple offline DRL models and tasks reveal the efficacy of ORL-AUDITOR, with auditing accuracy over 95% and false positive rates less than 2.88%. We also provide valuable insights into the practical implementation of ORL-AUDITOR by studying various parameter settings. Furthermore, we demonstrate the auditing capability of ORL-AUDITOR on open-source datasets from Google and DeepMind, highlighting its effectiveness in auditing published datasets. ORL-AUDITOR is open-sourced at https://github.com/link-zju/ORL-Auditor.
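The auditing intuition, that cumulative rewards identify the training dataset, can be sketched as a simple reference-distribution test. The z-score rule below is a stand-in for the paper's actual hypothesis test.

```python
import numpy as np

# Sketch of the auditing idea: cumulative rewards of models trained on
# the audited dataset form a reference distribution; a suspect model
# whose returns fall inside it is flagged as likely trained on the data.

def audit(reference_returns: np.ndarray, suspect_returns: np.ndarray,
          z_threshold: float = 3.0) -> bool:
    mu = reference_returns.mean()
    sigma = reference_returns.std() + 1e-8
    z = abs(suspect_returns.mean() - mu) / sigma
    return z < z_threshold   # True -> consistent with training on the dataset

rng = np.random.default_rng(0)
reference = rng.normal(200.0, 5.0, size=50)   # returns of the auditor's models
print(audit(reference, rng.normal(201.0, 5.0, size=20)))  # True
print(audit(reference, rng.normal(150.0, 5.0, size=20)))  # False
```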

Parameterizing pressure-temperature profiles of exoplanet atmospheres with neural networks

  • paper_url: http://arxiv.org/abs/2309.03075
  • repo_url: https://github.com/timothygebhard/ml4ptp
  • paper_authors: Timothy D. Gebhard, Daniel Angerhausen, Björn S. Konrad, Eleonora Alei, Sascha P. Quanz, Bernhard Schölkopf
  • For: The paper aims to improve the accuracy and efficiency of atmospheric retrieval (AR) for exoplanets by introducing a new, data-driven parameterization scheme for pressure-temperature (PT) profiles.
  • Methods: The authors use a latent variable model (based on a neural network) to learn a distribution over functions (PT profiles) and a decoder network to map pressure to temperature. They train and evaluate their method on two publicly available datasets of self-consistent PT profiles.
  • Results: The authors find that their method achieves better fit quality than existing baseline methods, despite using fewer parameters. In an AR based on existing literature, their model (using two parameters) produces a tighter, more accurate posterior for the PT profile than the five-parameter polynomial baseline, while also speeding up the retrieval by more than a factor of three.
    Abstract Atmospheric retrievals (AR) of exoplanets typically rely on a combination of a Bayesian inference technique and a forward simulator to estimate atmospheric properties from an observed spectrum. A key component in simulating spectra is the pressure-temperature (PT) profile, which describes the thermal structure of the atmosphere. Current AR pipelines commonly use ad hoc fitting functions here that limit the retrieved PT profiles to simple approximations, but still use a relatively large number of parameters. In this work, we introduce a conceptually new, data-driven parameterization scheme for physically consistent PT profiles that does not require explicit assumptions about the functional form of the PT profiles and uses fewer parameters than existing methods. Our approach consists of a latent variable model (based on a neural network) that learns a distribution over functions (PT profiles). Each profile is represented by a low-dimensional vector that can be used to condition a decoder network that maps $P$ to $T$. When training and evaluating our method on two publicly available datasets of self-consistent PT profiles, we find that our method achieves, on average, better fit quality than existing baseline methods, despite using fewer parameters. In an AR based on existing literature, our model (using two parameters) produces a tighter, more accurate posterior for the PT profile than the five-parameter polynomial baseline, while also speeding up the retrieval by more than a factor of three. By providing parametric access to physically consistent PT profiles, and by reducing the number of parameters required to describe a PT profile (thereby reducing computational cost or freeing resources for additional parameters of interest), our method can help improve AR and thus our understanding of exoplanet atmospheres and their habitability.
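The parameterization is easy to picture as a small decoder network conditioned on a low-dimensional profile code. Layer sizes and the latent dimensionality below are illustrative assumptions; the repository linked above defines the real model.

```python
import torch
import torch.nn as nn

# Minimal sketch of the latent-variable PT parameterization: a decoder
# maps (latent code z, log-pressure) to temperature, so a whole PT
# profile is summarized by a low-dimensional z.

class PTDecoder(nn.Module):
    def __init__(self, z_dim: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + 1, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z: torch.Tensor, log_p: torch.Tensor) -> torch.Tensor:
        # z: (z_dim,) profile code; log_p: (n_levels, 1) pressure grid
        zs = z.expand(log_p.shape[0], -1)
        return self.net(torch.cat([zs, log_p], dim=1)).squeeze(-1)

decoder = PTDecoder()
log_p = torch.linspace(-6, 2, 50).unsqueeze(1)      # log10 pressure grid
temps = decoder(torch.tensor([0.3, -0.7]), log_p)   # one PT profile
print(temps.shape)   # torch.Size([50])
```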

Learning Active Subspaces for Effective and Scalable Uncertainty Quantification in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2309.03061
  • repo_url: None
  • paper_authors: Sanket Jantre, Nathan M. Urban, Xiaoning Qian, Byung-Jun Yoon
  • for: Addresses the computational complexity of Bayesian deep learning so that neural networks can provide well-calibrated predictions with quantified uncertainty and robustness.
  • methods: Constructs a low-dimensional active subspace of the neural network parameters by identifying the parameter directions that most influence the network's output, which makes Bayesian inference tractable via Monte Carlo sampling or variational inference.
  • results: Empirically, the approach delivers reliable predictions with robust uncertainty estimates across a range of regression tasks.
    Abstract Bayesian inference for neural networks, or Bayesian deep learning, has the potential to provide well-calibrated predictions with quantified uncertainty and robustness. However, the main hurdle for Bayesian deep learning is its computational complexity due to the high dimensionality of the parameter space. In this work, we propose a novel scheme that addresses this limitation by constructing a low-dimensional subspace of the neural network parameters-referred to as an active subspace-by identifying the parameter directions that have the most significant influence on the output of the neural network. We demonstrate that the significantly reduced active subspace enables effective and scalable Bayesian inference via either Monte Carlo (MC) sampling methods, otherwise computationally intractable, or variational inference. Empirically, our approach provides reliable predictions with robust uncertainty estimates for various regression tasks.
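The standard recipe for finding an active subspace, an eigendecomposition of the expected gradient outer product, takes a few lines of NumPy. The gradients here are synthetic stand-ins for per-sample gradients with respect to the network parameters.

```python
import numpy as np

# Sketch of identifying an active subspace: eigenvectors of the
# uncentered gradient covariance with the largest eigenvalues span the
# directions that most influence the output.

rng = np.random.default_rng(0)
n_samples, n_params, k = 500, 50, 5
grads = rng.standard_normal((n_samples, n_params)) @ np.diag(
    np.linspace(3.0, 0.01, n_params))            # anisotropic toy gradients

C = grads.T @ grads / n_samples                  # gradient outer-product matrix
eigvals, eigvecs = np.linalg.eigh(C)             # ascending eigenvalues
active_basis = eigvecs[:, -k:]                   # top-k directions
explained = eigvals[-k:].sum() / eigvals.sum()
print(active_basis.shape, round(float(explained), 3))
```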

CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

  • paper_url: http://arxiv.org/abs/2309.03060
  • repo_url: https://github.com/wilson-labs/cola
  • paper_authors: Andres Potapczynski, Marc Finzi, Geoff Pleiss, Andrew Gordon Wilson
  • for: Tackles large-scale linear algebra problems in machine learning and science, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation, where the matrices often have Kronecker, convolutional, block diagonal, sum, or product structure.
  • methods: Proposes CoLA (Compositional Linear Algebra), a simple but general framework that combines a linear operator abstraction with compositional dispatch rules to automatically construct memory- and runtime-efficient numerical algorithms, with memory-efficient automatic differentiation, low-precision computation, and GPU acceleration in both JAX and PyTorch, and support for new objects, operations, and rules in downstream packages via multiple dispatch.
  • results: Demonstrates effectiveness across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
    Abstract Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal, sum, or product structure. In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA (Compositional Linear Algebra). By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms. Moreover, CoLA provides memory efficient automatic differentiation, low precision computation, and GPU acceleration in both JAX and PyTorch, while also accommodating new objects, operations, and rules in downstream packages via multiple dispatch. CoLA can accelerate many algebraic operations, while making it easy to prototype matrix structures and algorithms, providing an appealing drop-in tool for virtually any computational effort that requires linear algebra. We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
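The payoff of exploiting compositional structure is easy to demonstrate with a hand-rolled Kronecker operator whose matvec never materializes the full matrix. Note this toy class only illustrates the idea; it is not CoLA's actual API.

```python
import numpy as np

# A Kronecker-structured operator exposing a matvec that avoids forming
# the full (pr x qs) matrix, so iterative solvers get the structure
# "for free".

class Kronecker:
    def __init__(self, A: np.ndarray, B: np.ndarray):
        self.A, self.B = A, B
        self.shape = (A.shape[0] * B.shape[0], A.shape[1] * B.shape[1])

    def matvec(self, x: np.ndarray) -> np.ndarray:
        X = x.reshape(self.A.shape[1], self.B.shape[1])
        return (self.A @ X @ self.B.T).ravel()   # (A kron B) x, implicitly

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((3, 3))
op = Kronecker(A, B)
x = rng.standard_normal(12)
print(np.allclose(op.matvec(x), np.kron(A, B) @ x))   # True
```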

Automated CVE Analysis for Threat Prioritization and Impact Prediction

  • paper_url: http://arxiv.org/abs/2309.03040
  • repo_url: None
  • paper_authors: Ehsan Aghaei, Ehab Al-Shaer, Waseem Shadid, Xi Niu
  • For: This paper aims to improve the efficiency and accuracy of CVE analysis and threat prioritization by introducing a novel predictive model and tool called CVEDrill.
  • Methods: CVEDrill uses machine learning algorithms to estimate the CVSS vector for precise threat mitigation and priority ranking, and it also automates the classification of CVEs into the appropriate CWE hierarchy classes.
  • Results: CVEDrill outperforms state-of-the-art tools such as ChatGPT in terms of accuracy and timeliness, allowing organizations to implement cybersecurity countermeasure mitigation with unparalleled effectiveness.
    Abstract The Common Vulnerabilities and Exposures (CVE) are pivotal information for proactive cybersecurity measures, including service patching, security hardening, and more. However, CVEs typically offer low-level, product-oriented descriptions of publicly disclosed cybersecurity vulnerabilities, often lacking the essential attack semantic information required for comprehensive weakness characterization and threat impact estimation. This critical insight is essential for CVE prioritization and the identification of potential countermeasures, particularly when dealing with a large number of CVEs. Current industry practices involve manual evaluation of CVEs to assess their attack severities using the Common Vulnerability Scoring System (CVSS) and mapping them to Common Weakness Enumeration (CWE) for potential mitigation identification. Unfortunately, this manual analysis presents a major bottleneck in the vulnerability analysis process, leading to slowdowns in proactive cybersecurity efforts and the potential for inaccuracies due to human errors. In this research, we introduce our novel predictive model and tool (called CVEDrill) which revolutionizes CVE analysis and threat prioritization. CVEDrill accurately estimates the CVSS vector for precise threat mitigation and priority ranking and seamlessly automates the classification of CVEs into the appropriate CWE hierarchy classes. By harnessing CVEDrill, organizations can now implement cybersecurity countermeasure mitigation with unparalleled accuracy and timeliness, surpassing in this domain the capabilities of state-of-the-art tools like ChatGPT.

Deep Learning for Polycystic Kidney Disease: Utilizing Neural Networks for Accurate and Early Detection through Gene Expression Analysis

  • paper_url: http://arxiv.org/abs/2309.03033
  • repo_url: None
  • paper_authors: Kapil Panda, Anirudh Mazumder
  • for: Early detection of Polycystic Kidney Disease (PKD), which is crucial for effective management of the condition.
  • methods: A deep learning approach that analyses patient gene expression to detect the disease accurately and robustly.
  • results: The devised neural network achieves accurate and robust predictions of possible PKD in patients.
    Abstract With Polycystic Kidney Disease (PKD) potentially leading to fatal complications in patients due to the formation of cysts in the kidneys, early detection of PKD is crucial for effective management of the condition. However, the various patient-specific factors that play a role in the diagnosis make it an intricate puzzle for clinicians to solve. Therefore, in this study, we aim to utilize a deep learning-based approach for early disease detection. The devised neural network can achieve accurate and robust predictions for possible PKD in patients by analyzing patient gene expressions.

Amortised Inference in Bayesian Neural Networks

  • paper_url: http://arxiv.org/abs/2309.03018
  • repo_url: https://github.com/sheev13/bnn_amort_inf
  • paper_authors: Tommy Rochussen
  • for: Proposes a significantly more data-efficient approach to probabilistic meta-learning, enabling high-quality predictions when only limited data is available.
  • methods: Combines Bayesian neural networks with per-datapoint amortisation of inference, yielding the Amortised Pseudo-Observation Variational Inference Bayesian Neural Network (APOVI-BNN), whose approximate posteriors are obtained in a single forward pass and are of similar or better quality than those from traditional variational inference.
  • results: On a one-dimensional regression problem and a significantly more complex image completion setting, the model is the best in its class when the amount of training data is limited.
    Abstract Meta-learning is a framework in which machine learning models train over a set of datasets in order to produce predictions on new datasets at test time. Probabilistic meta-learning has received an abundance of attention from the research community in recent years, but a problem shared by many existing probabilistic meta-models is that they require a very large number of datasets in order to produce high-quality predictions with well-calibrated uncertainty estimates. In many applications, however, such quantities of data are simply not available. In this dissertation we present a significantly more data-efficient approach to probabilistic meta-learning through per-datapoint amortisation of inference in Bayesian neural networks, introducing the Amortised Pseudo-Observation Variational Inference Bayesian Neural Network (APOVI-BNN). First, we show that the approximate posteriors obtained under our amortised scheme are of similar or better quality to those obtained through traditional variational inference, despite the fact that the amortised inference is performed in a single forward pass. We then discuss how the APOVI-BNN may be viewed as a new member of the neural process family, motivating the use of neural process training objectives for potentially better predictive performance on complex problems as a result. Finally, we assess the predictive performance of the APOVI-BNN against other probabilistic meta-models in both a one-dimensional regression problem and in a significantly more complex image completion setting. In both cases, when the amount of training data is limited, our model is the best in its class.

SymED: Adaptive and Online Symbolic Representation of Data on the Edge

  • paper_url: http://arxiv.org/abs/2309.03014
  • repo_url: None
  • paper_authors: Daniel Hofstätter, Shashikant Ilager, Ivan Lujic, Ivona Brandic
  • for: 这个研究旨在实现对互联网预设设备(IoT)产生的数据进行 proximity 处理,并解决将资料传输、储存和处理到资源有限的边缘设备上所出现的挑战。
  • methods: 这个研究使用了符号表示法(SR)来将实际的原始数据转换为符号,以便在边缘设备上进行数据分析(例如异常检测和趋势预测),从而帮助大量边缘应用程序。
  • results: 这个研究的结果显示了 SymED 可以实现以下三个目的:(i)将原始数据压缩为平均压缩率为 9.5%;(ii)在 DTW 空间中保持低的重建误差为 13.25;(iii)同时提供实时适应性,以便在一般延迟为 42ms 的符号中进行在线流动 IoT 数据处理。
    Abstract The edge computing paradigm helps handle the Internet of Things (IoT) generated data in proximity to its source. Challenges occur in transferring, storing, and processing this rapidly growing amount of data on resource-constrained edge devices. Symbolic Representation (SR) algorithms are promising solutions to reduce the data size by converting actual raw data into symbols. Also, they allow data analytics (e.g., anomaly detection and trend prediction) directly on symbols, benefiting large classes of edge applications. However, existing SR algorithms are centralized in design and work offline with batch data, which is infeasible for real-time cases. We propose SymED - Symbolic Edge Data representation method, i.e., an online, adaptive, and distributed approach for symbolic representation of data on edge. SymED is based on the Adaptive Brownian Bridge-based Aggregation (ABBA), where we assume low-powered IoT devices do initial data compression (senders) and the more robust edge devices do the symbolic conversion (receivers). We evaluate SymED by measuring compression performance, reconstruction accuracy through Dynamic Time Warping (DTW) distance, and computational latency. The results show that SymED is able to (i) reduce the raw data with an average compression rate of 9.5%; (ii) keep a low reconstruction error of 13.25 in the DTW space; (iii) simultaneously provide real-time adaptability for online streaming IoT data at typical latencies of 42ms per symbol, reducing the overall network traffic.
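A heavily simplified picture of ABBA-style symbolization: greedily segment the stream into near-linear pieces (the sender's compression), then map each piece's (length, increment) pair to a symbol (the receiver's conversion). Real ABBA clusters the pairs adaptively; the sketch below just bins the increment's sign and is only meant to convey the data flow.

```python
import numpy as np

# Toy ABBA-style symbolization: piecewise-linear segmentation followed
# by a crude symbol assignment. Tolerance and alphabet are arbitrary.

def segment(ts: np.ndarray, tol: float = 0.5):
    pieces, start, end = [], 0, 2
    while end <= len(ts):
        t = np.arange(start, end)
        fit = np.interp(t, [start, end - 1], [ts[start], ts[end - 1]])
        if np.max(np.abs(ts[start:end] - fit)) > tol:   # piece no longer linear
            pieces.append((end - 2 - start, ts[end - 2] - ts[start]))
            start = end - 2                             # break at last good point
        end += 1
    pieces.append((len(ts) - 1 - start, ts[-1] - ts[start]))
    return pieces

ts = np.concatenate([np.linspace(0, 5, 20), np.linspace(5, 1, 10)])
pieces = segment(ts)
symbols = [chr(ord("a") + int(np.sign(inc)) + 1) for _, inc in pieces]
print(pieces, "".join(symbols))   # [(20, 5.0), (9, -4.0)] 'ca'
```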

Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2309.03004
  • repo_url: None
  • paper_authors: Ze Peng, Lei Qi, Yinghuan Shi, Yang Gao
  • for: Seeks a theoretical explanation for the emergence of activation sparsity and its relation to adversarial robustness.
  • methods: Proposes gradient sparsity as the source of activation sparsity, explains both as necessary steps toward adversarial robustness w.r.t. hidden features and parameters (approximately the flatness of minima for well-learned models), and uses random matrix theory (RMT) to analyse stochastic gradient noise and the emergence of spectral concentration.
  • results: Proposes two plug-and-play modules (for training from scratch and for sparsity finetuning) and one radical modification for from-scratch training, with experiments verifying the explanation and demonstrating improved sparsity, indicating further cost reductions in both training and inference.
    Abstract A recent empirical observation of activation sparsity in MLP layers offers an opportunity to drastically reduce computation costs for free. Despite several works attributing it to training dynamics, the theoretical explanation of activation sparsity's emergence is restricted to shallow networks, small training steps well as modified training, even though the sparsity has been found in deep models trained by vanilla protocols for large steps. To fill the three gaps, we propose the notion of gradient sparsity as the source of activation sparsity and a theoretical explanation based on it that explains gradient sparsity and then activation sparsity as necessary steps to adversarial robustness w.r.t. hidden features and parameters, which is approximately the flatness of minima for well-learned models. The theory applies to standardly trained LayerNorm-ed pure MLPs, and further to Transformers or other architectures if noises are added to weights during training. To eliminate other sources of flatness when arguing sparsities' necessity, we discover the phenomenon of spectral concentration, i.e., the ratio between the largest and the smallest non-zero singular values of weight matrices is small. We utilize random matrix theory (RMT) as a powerful theoretical tool to analyze stochastic gradient noises and discuss the emergence of spectral concentration. With these insights, we propose two plug-and-play modules for both training from scratch and sparsity finetuning, as well as one radical modification that only applies to from-scratch training. Another under-testing module for both sparsity and flatness is also immediate from our theories. Validational experiments are conducted to verify our explanation. Experiments for productivity demonstrate modifications' improvement in sparsity, indicating further theoretical cost reduction in both training and inference.
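The spectral-concentration quantity discussed in the paper is directly measurable: the ratio between the largest and smallest non-zero singular values of each weight matrix. A quick check on a randomly initialized MLP follows; the initialization and the numerical-zero threshold are arbitrary choices here.

```python
import torch

# Measure spectral concentration: max/min ratio over the non-zero
# singular values of each 2-D weight matrix. Small ratios indicate
# concentrated spectra.

def spectral_concentration(weight: torch.Tensor, eps: float = 1e-6) -> float:
    s = torch.linalg.svdvals(weight)
    s = s[s > eps * s.max()]          # drop numerically-zero singular values
    return float(s.max() / s.min())

mlp = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.GELU(),
                          torch.nn.Linear(256, 64))
for name, p in mlp.named_parameters():
    if p.ndim == 2:
        print(name, round(spectral_concentration(p.detach()), 2))
```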

Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models

  • paper_url: http://arxiv.org/abs/2309.02976
  • repo_url: None
  • paper_authors: Pierre Schumacher, Thomas Geijtenbeek, Vittorio Caggiano, Vikash Kumar, Syn Schmitt, Georg Martius, Daniel F. B. Haeufle
  • for: The paper aims to develop a reinforcement learning (RL) method for natural bipedal walking without relying on extensive expert demonstrations.
  • methods: The paper uses RL to learn a controller that can generate human-like walking with bipedal biomechanical models in complex natural environments.
  • results: The paper achieves natural locomotion with RL without sacrificing robustness, paving the way for a novel approach to studying human walking in complex natural environments.
    Abstract Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem considering stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on particular motions at a time and the resulting controllers are limited when compensating for perturbations. In robotics, reinforcement learning (RL) methods recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its incredible robustness might pave the way for a novel approach to studying human walking in complex natural environments. Videos: https://sites.google.com/view/naturalwalkingrl

On the Impact of Feeding Cost Risk in Aquaculture Valuation and Decision Making

  • paper_url: http://arxiv.org/abs/2309.02970
  • repo_url: https://github.com/kevinkamm/aquaculturestochasticfeeding
  • paper_authors: Christian Oliver Ewald, Kevin Kamm
  • For: Studies the impact of stochastic feeding costs on animal-based commodities, with a particular focus on aquaculture.
  • Methods: Uses soybean futures to infer the stochastic behavior of salmon feed, assumed to follow a Schwartz-2-factor model; compares harvesting decision rules that account for stochastic versus deterministic feeding costs, and uses deep neural networks to infer the decision boundary between harvesting and continuation.
  • Results: Accounting for stochastic feeding costs leads to significant improvements in some cases, while deterministic feeding costs are a good enough proxy in others; the newly derived decision rules consistently perform better at negligible additional computational cost, and the deep-classifier approach improves on regression- and curve-fitting-based methods while scaling well to higher-dimensional problems.
    Abstract We study the effect of stochastic feeding costs on animal-based commodities with particular focus on aquaculture. More specifically, we use soybean futures to infer on the stochastic behaviour of salmon feed, which we assume to follow a Schwartz-2-factor model. We compare the decision of harvesting salmon using a decision rule assuming either deterministic or stochastic feeding costs, i.e. including feeding cost risk. We identify cases, where accounting for stochastic feeding costs leads to significant improvements as well as cases where deterministic feeding costs are a good enough proxy. Nevertheless, in all of these cases, the newly derived rules show superior performance, while the additional computational costs are negligible. From a methodological point of view, we demonstrate how to use Deep-Neural-Networks to infer on the decision boundary that determines harvesting or continuation, improving on more classical regression-based and curve-fitting methods. To achieve this we use a deep classifier, which not only improves on previous results but also scales well for higher dimensional problems, and in addition mitigates effects due to model uncertainty, which we identify in this article.
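
For readers unfamiliar with the Schwartz 2-factor model used for the feed price, the following NumPy sketch simulates spot-price paths under a simple Euler scheme. All parameter values are placeholders, not the calibrated values from the paper.

```python
import numpy as np

def schwartz2f_paths(s0=1.0, d0=0.02, r=0.03, kappa=1.5, alpha=0.05,
                     sig1=0.3, sig2=0.2, rho=0.5, T=1.0, n=252, paths=1000,
                     rng=None):
    # Schwartz two-factor model: log-spot plus a mean-reverting
    # stochastic convenience yield, with correlated Brownian drivers
    rng = rng or np.random.default_rng(0)
    dt = T / n
    logS = np.full(paths, np.log(s0))
    delta = np.full(paths, d0)
    for _ in range(n):
        z1 = rng.standard_normal(paths)
        z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(paths)
        logS += (r - delta - 0.5 * sig1**2) * dt + sig1 * np.sqrt(dt) * z1
        delta += kappa * (alpha - delta) * dt + sig2 * np.sqrt(dt) * z2
    return np.exp(logS)

spot_at_maturity = schwartz2f_paths()  # terminal feed prices across paths
```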

CR-VAE: Contrastive Regularization on Variational Autoencoders for Preventing Posterior Collapse

  • paper_url: http://arxiv.org/abs/2309.02968
  • repo_url: None
  • paper_authors: Fotios Lygerakis, Elmar Rueckert
  • for: Addresses the posterior collapse phenomenon in Variational Autoencoders (VAEs), where the latent representations become independent of the inputs.
  • methods: Proposes Contrastive Regularization for Variational Autoencoders (CR-VAE), which augments the VAE with a contrastive objective that maximizes the mutual information between the representations of similar visual inputs, preventing posterior collapse.
  • results: Evaluated on several visual datasets, CR-VAE outperforms state-of-the-art approaches in preventing posterior collapse.
    Abstract The Variational Autoencoder (VAE) is known to suffer from the phenomenon of *posterior collapse*, where the latent representations generated by the model become independent of the inputs. This leads to degenerated representations of the input, which is attributed to the limitations of the VAE's objective function. In this work, we propose a novel solution to this issue, the Contrastive Regularization for Variational Autoencoders (CR-VAE). The core of our approach is to augment the original VAE with a contrastive objective that maximizes the mutual information between the representations of similar visual inputs. This strategy ensures that the information flow between the input and its latent representation is maximized, effectively avoiding posterior collapse. We evaluate our method on a series of visual datasets and demonstrate, that CR-VAE outperforms state-of-the-art approaches in preventing posterior collapse.
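
A minimal PyTorch sketch of what a contrastive-regularized VAE objective can look like. It assumes an `enc` module returning (mean, log-variance) and a `dec` module mapping latents back to inputs; the InfoNCE term and the use of the latent means as contrastive features are illustrative choices, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    # matching latents of two views of the same input are positives;
    # every other pair in the batch serves as a negative
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def cr_vae_loss(enc, dec, x, x_aug, beta=1.0, gamma=1.0):
    # standard ELBO terms plus a contrastive term tying the two views' latents
    mu, logvar = enc(x)
    mu_aug, _ = enc(x_aug)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    recon = F.mse_loss(dec(z), x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl + gamma * info_nce(mu, mu_aug)
```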

EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System

  • paper_url: http://arxiv.org/abs/2309.03246
  • repo_url: https://github.com/simula-complex/evoclinical
  • paper_authors: Chengjie Lu, Qinghua Xu, Tao Yue, Shaukat Ali, Thomas Schwitalla, Jan F. Nygård
  • for: Improving the reliability and accuracy of GURI, the automated cancer registry system of Norway, which provides the foundation for cancer research and related statistics.
  • methods: Proposes EvoCLINICAL, which treats the cyber-cyber digital twin (CCDT) developed for the previous GURI version as a pretrained model and fine-tunes it on data labelled by querying the new GURI version; a genetic algorithm selects an optimal subset of cancer messages from a candidate dataset for querying.
  • results: Evaluated across three evolution processes, EvoCLINICAL achieves precision, recall, and F1 scores all above 91%, demonstrating its effectiveness; replacing its active-learning component with random selection shows that active learning consistently improves overall performance.
    Abstract The Cancer Registry of Norway (CRN) collects information on cancer patients by receiving cancer messages from different medical entities (e.g., medical labs, and hospitals) in Norway. Such messages are validated by an automated cancer registry system: GURI. Its correct operation is crucial since it lays the foundation for cancer research and provides critical cancer-related statistics to its stakeholders. Constructing a cyber-cyber digital twin (CCDT) for GURI can facilitate various experiments and advanced analyses of the operational state of GURI without requiring intensive interactions with the real system. However, GURI constantly evolves due to novel medical diagnostics and treatment, technological advances, etc. Accordingly, CCDT should evolve as well to synchronize with GURI. A key challenge of achieving such synchronization is that evolving CCDT needs abundant data labelled by the new GURI. To tackle this challenge, we propose EvoCLINICAL, which considers the CCDT developed for the previous version of GURI as the pretrained model and fine-tunes it with the dataset labelled by querying a new GURI version. EvoCLINICAL employs a genetic algorithm to select an optimal subset of cancer messages from a candidate dataset and query GURI with it. We evaluate EvoCLINICAL on three evolution processes. The precision, recall, and F1 score are all greater than 91%, demonstrating the effectiveness of EvoCLINICAL. Furthermore, we replace the active learning part of EvoCLINICAL with random selection to study the contribution of transfer learning to the overall performance of EvoCLINICAL. Results show that employing active learning in EvoCLINICAL increases its performances consistently.
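
To make the message-selection step concrete, here is a generic genetic-algorithm sketch that evolves fixed-size subsets of a candidate pool under an arbitrary `fitness` function. The operators and the toy fitness are hypothetical; the paper's actual GA design may differ.

```python
import numpy as np

def ga_select_subset(pool_size, budget, fitness, generations=50, pop=40, seed=0):
    # evolve boolean masks that pick `budget` messages out of the candidate pool
    rng = np.random.default_rng(seed)
    def random_mask():
        m = np.zeros(pool_size, dtype=bool)
        m[rng.choice(pool_size, budget, replace=False)] = True
        return m
    population = [random_mask() for _ in range(pop)]
    for _ in range(generations):
        scores = np.array([fitness(m) for m in population])
        parents = [population[i] for i in np.argsort(scores)[-pop // 2:]]
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.choice(len(parents), size=2, replace=False)
            union = np.flatnonzero(parents[a] | parents[b])        # crossover
            child = np.zeros(pool_size, dtype=bool)
            child[rng.choice(union, budget, replace=False)] = True  # repair to budget
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# usage with a toy fitness (spread of selected indices)
best = ga_select_subset(1000, 50, fitness=lambda m: float(np.flatnonzero(m).std()))
```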

A hybrid quantum-classical fusion neural network to improve protein-ligand binding affinity predictions for drug discovery

  • paper_url: http://arxiv.org/abs/2309.03919
  • repo_url: None
  • paper_authors: S. Banerjee, S. He Yuxun, S. Konakanchi, L. Ogunfowora, S. Roy, S. Selvaras, L. Domingo, M. Chehimi, M. Djukic, C. Johnson
  • for: Accurate prediction of the binding affinity between candidate drug molecules and target proteins in drug discovery, especially for proteins that directly influence disease progression.
  • methods: A hybrid quantum machine learning (QML) model that integrates 3D and spatial graph convolutional neural networks (CNNs) within an optimized quantum architecture.
  • results: Simulation results show a 6% improvement in prediction accuracy over existing classical models, together with significantly more stable convergence.
    Abstract The field of drug discovery hinges on the accurate prediction of binding affinity between prospective drug molecules and target proteins, especially when such proteins directly influence disease progression. However, estimating binding affinity demands significant financial and computational resources. While state-of-the-art methodologies employ classical machine learning (ML) techniques, emerging hybrid quantum machine learning (QML) models have shown promise for enhanced performance, owing to their inherent parallelism and capacity to manage exponential increases in data dimensionality. Despite these advances, existing models encounter issues related to convergence stability and prediction accuracy. This paper introduces a novel hybrid quantum-classical deep learning model tailored for binding affinity prediction in drug discovery. Specifically, the proposed model synergistically integrates 3D and spatial graph convolutional neural networks within an optimized quantum architecture. Simulation results demonstrate a 6% improvement in prediction accuracy relative to existing classical models, as well as a significantly more stable convergence performance compared to previous classical approaches.

GroupEnc: encoder with group loss for global structure preservation

  • paper_url: http://arxiv.org/abs/2309.02917
  • repo_url: None
  • paper_authors: David Novak, Sofie Van Gassen, Yvan Saeys
  • for: Develops a deep learning model, based on the Variational Autoencoder (VAE) and the stochastic quartet loss from the SQuadMDS algorithm, for dimensionality reduction of high-dimensional data for downstream processing.
  • methods: Uses the notion of structure preservation at both local and global levels to build GroupEnc, an encoder model whose 'group loss' function creates embeddings with less global structure distortion than VAEs, while keeping the model parametric and the architecture flexible.
  • results: Validated on publicly available biological single-cell transcriptomic datasets using RNX curves, GroupEnc produces embeddings with less global structure distortion than VAEs.
    Abstract Recent advances in dimensionality reduction have achieved more accurate lower-dimensional embeddings of high-dimensional data. In addition to visualisation purposes, these embeddings can be used for downstream processing, including batch effect normalisation, clustering, community detection or trajectory inference. We use the notion of structure preservation at both local and global levels to create a deep learning model, based on a variational autoencoder (VAE) and the stochastic quartet loss from the SQuadMDS algorithm. Our encoder model, called GroupEnc, uses a 'group loss' function to create embeddings with less global structure distortion than VAEs do, while keeping the model parametric and the architecture flexible. We validate our approach using publicly available biological single-cell transcriptomic datasets, employing RNX curves for evaluation.

Ensemble DNN for Age-of-Information Minimization in UAV-assisted Networks

  • paper_url: http://arxiv.org/abs/2309.02913
  • repo_url: None
  • paper_authors: Mouhamed Naby Ndiaye, El Houcine Bergou, Hajar El Hammouti
  • for: Addresses the Age-of-Information (AoI) problem in UAV-assisted networks, minimizing the expected AoI across devices.
  • methods: First derives a closed-form expression for the expected AoI in terms of the device selection probabilities, then formulates the task as a non-convex minimization under quality-of-service constraints and solves it with an Ensemble Deep Neural Network (EDNN) approach; the DNNs in the ensemble are trained in an unsupervised manner using the Lagrangian function of the problem.
  • results: Experiments show that the proposed EDNN method effectively reduces the expected AoI, achieving a remarkable reduction of 29.5%.
    Abstract This paper addresses the problem of Age-of-Information (AoI) in UAV-assisted networks. Our objective is to minimize the expected AoI across devices by optimizing UAVs' stopping locations and device selection probabilities. To tackle this problem, we first derive a closed-form expression of the expected AoI that involves the probabilities of selection of devices. Then, we formulate the problem as a non-convex minimization subject to quality of service constraints. Since the problem is challenging to solve, we propose an Ensemble Deep Neural Network (EDNN) based approach which takes advantage of the dual formulation of the studied problem. Specifically, the Deep Neural Networks (DNNs) in the ensemble are trained in an unsupervised manner using the Lagrangian function of the studied problem. Our experiments show that the proposed EDNN method outperforms traditional DNNs in reducing the expected AoI, achieving a remarkable reduction of $29.5\%$.

A Multimodal Learning Framework for Comprehensive 3D Mineral Prospectivity Modeling with Jointly Learned Structure-Fluid Relationships

  • paper_url: http://arxiv.org/abs/2309.02911
  • repo_url: None
  • paper_authors: Yang Zheng, Hao Deng, Ruisheng Wang, Jingjie Wu
  • for: Develops a novel multimodal fusion model for three-dimensional mineral prospectivity mapping (3D MPM) that effectively integrates structural and fluid information through a deep network architecture.
  • methods: Combines Convolutional Neural Networks (CNNs) and Multilayer Perceptrons (MLPs), using canonical correlation analysis (CCA) to align and fuse the multimodal features.
  • results: Rigorous evaluation on the Jiaojia gold deposit dataset demonstrates superior performance in distinguishing ore-bearing instances and predicting mineral prospectivity, outperforming other models; ablation studies further reveal the benefits of joint feature utilization and CCA incorporation.
    Abstract This study presents a novel multimodal fusion model for three-dimensional mineral prospectivity mapping (3D MPM), effectively integrating structural and fluid information through a deep network architecture. Leveraging Convolutional Neural Networks (CNN) and Multilayer Perceptrons (MLP), the model employs canonical correlation analysis (CCA) to align and fuse multimodal features. Rigorous evaluation on the Jiaojia gold deposit dataset demonstrates the model's superior performance in distinguishing ore-bearing instances and predicting mineral prospectivity, outperforming other models in result analyses. Ablation studies further reveal the benefits of joint feature utilization and CCA incorporation. This research not only advances mineral prospectivity modeling but also highlights the pivotal role of data integration and feature alignment for enhanced exploration decision-making.
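
As a concrete illustration of the CCA-based alignment step, the sketch below aligns two feature blocks (stand-ins for CNN-derived structural features and MLP-derived fluid features; the dimensions and data are made up) with scikit-learn and concatenates the projected views into a fused representation.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# toy stand-ins for the two modality-specific feature extractors
X_struct = np.random.randn(500, 64)  # structural features (e.g., from a CNN)
X_fluid = np.random.randn(500, 32)   # fluid features (e.g., from an MLP)

cca = CCA(n_components=16)
Z_struct, Z_fluid = cca.fit_transform(X_struct, X_fluid)

# aligned multimodal representation for a downstream prospectivity classifier
fused = np.concatenate([Z_struct, Z_fluid], axis=1)
```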

Testing properties of distributions in the streaming model

  • paper_url: http://arxiv.org/abs/2309.03245
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Sampriti Roy, Yadu Vasudev
  • for: Studies distribution testing in the standard access model and the conditional access model when the memory available to the testing algorithm is bounded.
  • methods: Samples appear in an online fashion, and properties of the distribution are tested with an optimal number of samples subject to a constraint on how many samples can be stored at a given time; a trade-off between sample complexity and space complexity is derived for identity testing under the conditional access oracle.
  • results: Shows that a succinct representation of a monotone distribution can be learned efficiently under a nearly optimal memory constraint, and that the algorithm extends to a larger class of decomposable distributions.
    Abstract We study distribution testing in the standard access model and the conditional access model when the memory available to the testing algorithm is bounded. In both scenarios, the samples appear in an online fashion and the goal is to test the properties of distribution using an optimal number of samples subject to a memory constraint on how many samples can be stored at a given time. First, we provide a trade-off between the sample complexity and the space complexity for testing identity when the samples are drawn according to the conditional access oracle. We then show that we can learn a succinct representation of a monotone distribution efficiently with a memory constraint on the number of samples that are stored that is almost optimal. We also show that the algorithm for monotone distributions can be extended to a larger class of decomposable distributions.

Non-Clashing Teaching Maps for Balls in Graphs

  • paper_url: http://arxiv.org/abs/2309.02876
  • repo_url: None
  • paper_authors: Jérémie Chalopin, Victor Chepoi, Fionn Mc Inerney, Sébastien Ratel
  • for: This paper is written to study non-clashing teaching and its applications in machine learning.
  • methods: The paper uses techniques from teaching and learning, including the concept of non-clashing teaching maps and the decision problem of non-clashing teaching dimension.
  • results: The paper shows that the decision problem for the non-clashing teaching dimension of balls of a graph is NP-complete, and derives upper and lower bounds on the size of non-clashing teaching maps for various classes of graphs.
    Abstract Recently, Kirkpatrick et al. [ALT 2019] and Fallat et al. [JMLR 2023] introduced non-clashing teaching and showed it to be the most efficient machine teaching model satisfying the benchmark for collusion-avoidance set by Goldman and Mathias. A teaching map $T$ for a concept class $\cal{C}$ assigns a (teaching) set $T(C)$ of examples to each concept $C \in \cal{C}$. A teaching map is non-clashing if no pair of concepts are consistent with the union of their teaching sets. The size of a non-clashing teaching map (NCTM) $T$ is the maximum size of a $T(C)$, $C \in \cal{C}$. The non-clashing teaching dimension NCTD$(\cal{C})$ of $\cal{C}$ is the minimum size of an NCTM for $\cal{C}$. NCTM$^+$ and NCTD$^+(\cal{C})$ are defined analogously, except the teacher may only use positive examples. We study NCTMs and NCTM$^+$s for the concept class $\mathcal{B}(G)$ consisting of all balls of a graph $G$. We show that the associated decision problem {\sc B-NCTD$^+$} for NCTD$^+$ is NP-complete in split, co-bipartite, and bipartite graphs. Surprisingly, we even prove that, unless the ETH fails, {\sc B-NCTD$^+$} does not admit an algorithm running in time $2^{2^{o(vc)}} \cdot n^{O(1)}$, nor a kernelization algorithm outputting a kernel with $2^{o(vc)}$ vertices, where vc is the vertex cover number of $G$. These are extremely rare results: it is only the second (fourth, resp.) problem in NP to admit a double-exponential lower bound parameterized by vc (treewidth, resp.), and only one of very few problems to admit an ETH-based conditional lower bound on the number of vertices in a kernel. We complement these lower bounds with matching upper bounds. For trees, interval graphs, cycles, and trees of cycles, we derive NCTM$^+$s or NCTMs for $\mathcal{B}(G)$ of size proportional to its VC-dimension. For Gromov-hyperbolic graphs, we design an approximate NCTM$^+$ for $\mathcal{B}(G)$ of size 2.

Learning Hybrid Dynamics Models With Simulator-Informed Latent States

  • paper_url: http://arxiv.org/abs/2309.02873
  • repo_url: None
  • paper_authors: Katharina Ensinger, Sebastian Ziesche, Sebastian Trimpe
  • for: Proposes a new hybrid modeling approach that combines learned dynamics models with physics-based simulators to keep predictions physically meaningful.
  • methods: Informs the latent states of a learned model via a black-box simulator, using observers (a well-known concept from control theory) to infer unknown latent states from observations and dynamics over time; the dynamics and the observer are learned jointly, so the simulator constantly corrects the latent states and compensates for model mismatch introduced by learning.
  • results: The approach yields more accurate and physically meaningful predictions by preventing errors from accumulating, while an RNN-based residuum preserves the flexibility of the learned model.
    Abstract Dynamics model learning deals with the task of inferring unknown dynamics from measurement data and predicting the future behavior of the system. A typical approach to address this problem is to train recurrent models. However, predictions with these models are often not physically meaningful. Further, they suffer from deteriorated behavior over time due to accumulating errors. Often, simulators building on first principles are available being physically meaningful by design. However, modeling simplifications typically cause inaccuracies in these models. Consequently, hybrid modeling is an emerging trend that aims to combine the best of both worlds. In this paper, we propose a new approach to hybrid modeling, where we inform the latent states of a learned model via a black-box simulator. This allows to control the predictions via the simulator preventing them from accumulating errors. This is especially challenging since, in contrast to previous approaches, access to the simulator's latent states is not available. We tackle the task by leveraging observers, a well-known concept from control theory, inferring unknown latent states from observations and dynamics over time. In our learning-based setting, we jointly learn the dynamics and an observer that infers the latent states via the simulator. Thus, the simulator constantly corrects the latent states, compensating for modeling mismatch caused by learning. To maintain flexibility, we train an RNN-based residuum for the latent states that cannot be informed by the simulator.

On Reducing Undesirable Behavior in Deep Reinforcement Learning Models

  • paper_url: http://arxiv.org/abs/2309.02869
  • repo_url: None
  • paper_authors: Ophir M. Carmel, Guy Katz
  • for: Improving the reliability and interpretability of deep reinforcement learning (DRL) software by reducing the frequency of undesirable behavior, while maintaining performance.
  • methods: Proposes a framework that extracts decision tree classifiers from erroneous state-action pairs and integrates these trees into the DRL training loop, penalizing the system whenever it performs an error.
  • results: Across three significant case studies, the approach extends existing frameworks in a straightforward manner, incurs only a slight training-time overhead, and takes little or no performance hit (in some cases even improving performance) while significantly reducing the frequency of undesirable behavior.
    Abstract Deep reinforcement learning (DRL) has proven extremely useful in a large variety of application domains. However, even successful DRL-based software can exhibit highly undesirable behavior. This is due to DRL training being based on maximizing a reward function, which typically captures general trends but cannot precisely capture, or rule out, certain behaviors of the system. In this paper, we propose a novel framework aimed at drastically reducing the undesirable behavior of DRL-based software, while maintaining its excellent performance. In addition, our framework can assist in providing engineers with a comprehensible characterization of such undesirable behavior. Under the hood, our approach is based on extracting decision tree classifiers from erroneous state-action pairs, and then integrating these trees into the DRL training loop, penalizing the system whenever it performs an error. We provide a proof-of-concept implementation of our approach, and use it to evaluate the technique on three significant case studies. We find that our approach can extend existing frameworks in a straightforward manner, and incurs only a slight overhead in training time. Further, it incurs only a very slight hit to performance, or even in some cases - improves it, while significantly reducing the frequency of undesirable behavior.
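
The core loop of the proposed framework, extracting a decision tree from erroneous state-action pairs and then penalizing the agent whenever the tree flags its behavior, can be sketched as follows. The state dimensionality, logged labels, and penalty weight are all hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# hypothetical logged (state, action) pairs flagged as erroneous or acceptable
X = np.random.randn(1000, 4)           # states
a = np.random.randint(0, 2, 1000)      # discrete actions
y = np.random.randint(0, 2, 1000)      # 1 = undesirable outcome observed

# train a shallow, human-readable tree on the error data
clf = DecisionTreeClassifier(max_depth=5).fit(np.c_[X, a], y)

def shaped_reward(r, state, action, penalty=1.0):
    # penalize the agent inside the training loop whenever the tree
    # predicts that this state-action pair is error-prone
    flagged = clf.predict(np.r_[state, action].reshape(1, -1))[0]
    return r - penalty * flagged

print(shaped_reward(1.0, X[0], a[0]))
```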

Enhancing Event Sequence Modeling with Contrastive Relational Inference

  • paper_url: http://arxiv.org/abs/2309.02868
  • repo_url: None
  • paper_authors: Yan Wang, Zhixuan Chu, Tao Zhou, Caigao Jiang, Hongyan Hao, Minjie Zhu, Xindong Cai, Qing Cui, Longfei Li, James Y Zhang, Siqiao Xue, Jun Zhou
  • for: Modeling continuous-time event sequences, in particular capturing the interactions between events, for inference tasks such as event sequence forecasting.
  • methods: Leverages Neural Relational Inference (NRI) to learn a relation graph that infers event interactions while simultaneously learning the dynamics patterns from observational data; the Contrastive Relational Inference-based Hawkes Process (CRIHP) reasons about interactions under a variational inference framework and uses intensity-based learning to search for prototype paths that contrast relationship constraints.
  • results: Extensive experiments on three real-world datasets demonstrate the model's effectiveness in capturing event interactions for event sequence modeling tasks.
    Abstract Neural temporal point processes(TPPs) have shown promise for modeling continuous-time event sequences. However, capturing the interactions between events is challenging yet critical for performing inference tasks like forecasting on event sequence data. Existing TPP models have focused on parameterizing the conditional distribution of future events but struggle to model event interactions. In this paper, we propose a novel approach that leverages Neural Relational Inference (NRI) to learn a relation graph that infers interactions while simultaneously learning the dynamics patterns from observational data. Our approach, the Contrastive Relational Inference-based Hawkes Process (CRIHP), reasons about event interactions under a variational inference framework. It utilizes intensity-based learning to search for prototype paths to contrast relationship constraints. Extensive experiments on three real-world datasets demonstrate the effectiveness of our model in capturing event interactions for event sequence modeling tasks.

A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques

  • paper_url: http://arxiv.org/abs/2309.02854
  • repo_url: https://github.com/ait-aecid/anomaly-detection-log-datasets
  • paper_authors: Max Landauer, Florian Skopik, Markus Wurzenberger
  • for: Analyzes six publicly available log data sets used to evaluate anomaly detection on log data, with a focus on how anomalies manifest.
  • methods: Reviews deep-learning-based sequence anomaly detection techniques and evaluates simple detection techniques on the same data sets.
  • results: Finds that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
    Abstract Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes of sequential patterns within otherwise normal event traces. Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques, however, the appropriateness of these data sets has not been closely investigated in the past. In this paper we therefore analyze six publicly available log data sets with focus on the manifestations of anomalies and simple techniques for their detection. Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
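
The point that simple techniques already achieve high detection rates can be illustrated with a detector that flags any log sequence containing an event type never seen during training. This is a common simple baseline; whether it matches the paper's exact techniques is an assumption.

```python
def train_event_vocabulary(train_sequences):
    # collect every event type observed in (assumed normal) training logs
    vocab = set()
    for seq in train_sequences:
        vocab.update(seq)
    return vocab

def flag_anomalous(seq, vocab):
    # flag a sequence if it contains any event type unseen during training
    return any(event not in vocab for event in seq)

vocab = train_event_vocabulary([["open", "read", "close"], ["open", "close"]])
print(flag_anomalous(["open", "segfault", "close"], vocab))  # True
```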

Random postprocessing for combinatorial Bayesian optimization

  • paper_url: http://arxiv.org/abs/2309.02842
  • repo_url: None
  • paper_authors: Keisuke Morita, Yoshihiko Nishikawa, Masayuki Ohzeki
  • for: optimizing discrete "black-box" optimization problems
  • methods: Bayesian optimization with a postprocessing method that strictly prohibits duplicated samples in the dataset
  • results: significantly reduces the number of sequential steps needed to find the global optimum, especially when the acquisition function is maximum a posteriori estimation.
    Abstract Model-based sequential approaches to discrete "black-box" optimization, including Bayesian optimization techniques, often access the same points multiple times for a given objective function of interest, resulting in many steps to find the global optimum. Here, we numerically study the effect of a postprocessing method on Bayesian optimization that strictly prohibits duplicated samples in the dataset. We find the postprocessing method significantly reduces the number of sequential steps to find the global optimum, especially when the acquisition function is maximum a posteriori estimation. Our results provide a simple but general strategy to solve the slow convergence of Bayesian optimization for high-dimensional problems.
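
A sketch of the postprocessing idea on a discrete candidate set: take the acquisition arg-max as usual, but if that point has already been queried, replace it with a uniformly random unseen candidate. The function names and candidate-set interface are illustrative.

```python
import numpy as np

def propose_next(acq_values, candidates, seen, rng):
    # standard step: pick the acquisition maximizer
    best = int(np.argmax(acq_values))
    if tuple(candidates[best]) not in seen:
        return candidates[best]
    # postprocessing: never resubmit a duplicate; draw a random unseen point
    # (assumes at least one candidate is still unqueried)
    unseen = [c for c in candidates if tuple(c) not in seen]
    return unseen[rng.integers(len(unseen))]

rng = np.random.default_rng(0)
candidates = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(propose_next(np.array([0.9, 0.1, 0.3, 0.2]), candidates, {(0, 0)}, rng))
```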

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

  • paper_url: http://arxiv.org/abs/2309.02836
  • repo_url: https://github.com/sony/bigvsan_eval
  • paper_authors: Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji
  • for: High-fidelity audio waveform synthesis with generative adversarial network (GAN)-based vocoders.
  • methods: Applies the slicing adversarial network (SAN), an improved GAN training framework that can find the optimal projection in the feature space, and proposes a scheme to modify the least-squares GAN objective that most GAN-based vocoders adopt so that its loss functions satisfy the requirements of SAN.
  • results: Experiments show that SAN can improve the performance of GAN-based vocoders, including BigVGAN, with only small modifications.
    Abstract Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real-time. However, it has been reported that most GANs fail to obtain the optimal projection for discriminating between real and fake data in the feature space. In the literature, it has been demonstrated that slicing adversarial network (SAN), an improved GAN training framework that can find the optimal projection, is effective in the image generation task. In this paper, we investigate the effectiveness of SAN in the vocoding task. For this purpose, we propose a scheme to modify least-squares GAN, which most GAN-based vocoders adopt, so that their loss functions satisfy the requirements of SAN. Through our experiments, we demonstrate that SAN can improve the performance of GAN-based vocoders, including BigVGAN, with small modifications. Our code is available at https://github.com/sony/bigvsan.

Introducing Thermodynamics-Informed Symbolic Regression – A Tool for Thermodynamic Equations of State Development

  • paper_url: http://arxiv.org/abs/2309.02805
  • repo_url: https://github.com/scoop-group/tisr
  • paper_authors: Viktor Martinek, Ophelia Frotscher, Markus Richter, Roland Herzog
  • For: Researchers and developers who are interested in creating accurate thermodynamic equations of state (EOS) for various industries and academic applications.
  • Methods: The paper introduces a new symbolic regression (SR) tool called thermodynamics-informed symbolic regression (TiSR), which combines an SR base with extensions for working with scattered experimental data, different residual pre- and post-processing options, and additional features required for thermodynamic EOS development.
  • Results: The paper reports on TiSR's current state, showcases its progress, and discusses future directions.
    Abstract Thermodynamic equations of state (EOS) are essential for many industries as well as in academia. Even leaving aside the expensive and extensive measurement campaigns required for the data acquisition, the development of EOS is an intensely time-consuming process, which often still relies heavily on expert knowledge and iterative fine-tuning. To improve upon and accelerate the EOS development process, we introduce thermodynamics-informed symbolic regression (TiSR), a symbolic regression (SR) tool aimed at thermodynamic EOS modeling. TiSR is already a capable SR tool, which was used in the research of https://doi.org/10.1007/s10765-023-03197-z. It aims to combine an SR base with the extensions required to work with often strongly scattered experimental data, different residual pre- and post-processing options, and additional features required to consider thermodynamic EOS development. Although TiSR is not ready for end users yet, this paper is intended to report on its current state, showcase the progress, and discuss (distant and not so distant) future directions. TiSR is available at https://github.com/scoop-group/TiSR and can be cited as https://doi.org/10.5281/zenodo.8317547.

Dynamic Encoding and Decoding of Information for Split Learning in Mobile-Edge Computing: Leveraging Information Bottleneck Theory

  • paper_url: http://arxiv.org/abs/2309.02787
  • repo_url: None
  • paper_authors: Omar Alhussein, Moshi Wei, Arashmid Akhavain
  • for: Proposes a privacy-preserving distributed learning approach based on split learning for training network functions (such as traffic forecasting) in mobile-edge computing, where an encoder resides in the user equipment (UE) and a decoder resides in the edge network.
  • methods: Builds on the data processing inequality and information bottleneck (IB) theory to dynamically balance transmission resource consumption against the informativeness of the shared latent representations, which directly impacts predictive performance.
  • results: Proposes a training mechanism with an encoder-decoder neural network architecture featuring multiple modes of complexity-relevance tradeoffs, enabling tunable performance that can adapt to varying real-time network conditions and application requirements, potentially reducing operational expenditure and enhancing network agility.
    Abstract Split learning is a privacy-preserving distributed learning paradigm in which an ML model (e.g., a neural network) is split into two parts (i.e., an encoder and a decoder). The encoder shares so-called latent representation, rather than raw data, for model training. In mobile-edge computing, network functions (such as traffic forecasting) can be trained via split learning where an encoder resides in a user equipment (UE) and a decoder resides in the edge network. Based on the data processing inequality and the information bottleneck (IB) theory, we present a new framework and training mechanism to enable a dynamic balancing of the transmission resource consumption with the informativeness of the shared latent representations, which directly impacts the predictive performance. The proposed training mechanism offers an encoder-decoder neural network architecture featuring multiple modes of complexity-relevance tradeoffs, enabling tunable performance. The adaptability can accommodate varying real-time network conditions and application requirements, potentially reducing operational expenditure and enhancing network agility. As a proof of concept, we apply the training mechanism to a millimeter-wave (mmWave)-enabled throughput prediction problem. We also offer new insights and highlight some challenges related to recurrent neural networks from the perspective of the IB theory. Interestingly, we find a compression phenomenon across the temporal domain of the sequential model, in addition to the compression phase that occurs with the number of training epochs.
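
A minimal PyTorch sketch of the information-bottleneck flavor of this setup: a device-side encoder emits a stochastic latent whose KL "rate" term trades off transmission cost against informativeness. The module shapes and the beta weight are placeholder assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UEEncoder(nn.Module):
    # device-side half of the split model: emits mean/log-variance of a latent
    def __init__(self, d_in, d_z):
        super().__init__()
        self.net = nn.Linear(d_in, 2 * d_z)
    def forward(self, x):
        mu, logvar = self.net(x).chunk(2, dim=-1)
        return mu, logvar

def vib_loss(mu, logvar, pred, target, beta=1e-3):
    # task loss plus a KL rate term that caps how informative (and how
    # costly to transmit) the shared latent representation is
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return F.mse_loss(pred, target) + beta * kl

enc = UEEncoder(d_in=16, d_z=4)
head = nn.Linear(4, 1)                   # edge-side decoder stub
x, y = torch.randn(32, 16), torch.randn(32, 1)
mu, logvar = enc(x)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
loss = vib_loss(mu, logvar, head(z), y)
```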

CVE-driven Attack Technique Prediction with Semantic Information Extraction and a Domain-specific Language Model

  • paper_url: http://arxiv.org/abs/2309.02785
  • repo_url: None
  • paper_authors: Ehsan Aghaei, Ehab Al-Shaer
  • for: Bridging the knowledge gap in cybersecurity between vulnerability information captured by Common Vulnerabilities and Exposures (CVEs) and the resulting cyberattack actions.
  • methods: Introduces the TTPpredictor tool, which uses innovative techniques to analyze CVE description text and infer the plausible attack tactics, techniques, and procedures (TTPs) resulting from CVE exploitation.
  • results: Empirical assessment shows accuracy of approximately 98% and F1 scores of 95%-98% in classifying CVEs to ATT&CK techniques, outperforming state-of-the-art language model tools like ChatGPT.
    Abstract This paper addresses a critical challenge in cybersecurity: the gap between vulnerability information represented by Common Vulnerabilities and Exposures (CVEs) and the resulting cyberattack actions. CVEs provide insights into vulnerabilities, but often lack details on potential threat actions (tactics, techniques, and procedures, or TTPs) within the ATT&CK framework. This gap hinders accurate CVE categorization and proactive countermeasure initiation. The paper introduces the TTPpredictor tool, which uses innovative techniques to analyze CVE descriptions and infer plausible TTP attacks resulting from CVE exploitation. TTPpredictor overcomes challenges posed by limited labeled data and semantic disparities between CVE and TTP descriptions. It initially extracts threat actions from unstructured cyber threat reports using Semantic Role Labeling (SRL) techniques. These actions, along with their contextual attributes, are correlated with MITRE's attack functionality classes. This automated correlation facilitates the creation of labeled data, essential for categorizing novel threat actions into threat functionality classes and TTPs. The paper presents an empirical assessment, demonstrating TTPpredictor's effectiveness with accuracy rates of approximately 98% and F1-scores ranging from 95% to 98% in precise CVE classification to ATT&CK techniques. TTPpredictor outperforms state-of-the-art language model tools like ChatGPT. Overall, this paper offers a robust solution for linking CVEs to potential attack techniques, enhancing cybersecurity practitioners' ability to proactively identify and mitigate threats.

On the Effects of Heterogeneous Errors on Multi-fidelity Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2309.02771
  • repo_url: None
  • paper_authors: Zahra Zanjani Foumani, Amin Yousefpour, Mehdi Shishehbor, Ramin Bostanabad
  • for: This paper is written for researchers and practitioners who are interested in using multi-fidelity methods for Bayesian optimization in materials design.
  • methods: The paper proposes a new multi-fidelity emulation method that learns a noise model for each data source and enables the use of highly biased low-fidelity sources for Bayesian optimization.
  • results: The paper demonstrates the performance of the proposed method through analytical examples and engineering problems on materials design, showing that it can improve the efficiency and accuracy of Bayesian optimization compared to existing methods.
    Abstract Bayesian optimization (BO) is a sequential optimization strategy that is increasingly employed in a wide range of areas including materials design. In real world applications, acquiring high-fidelity (HF) data through physical experiments or HF simulations is the major cost component of BO. To alleviate this bottleneck, multi-fidelity (MF) methods are used to forgo the sole reliance on the expensive HF data and reduce the sampling costs by querying inexpensive low-fidelity (LF) sources whose data are correlated with HF samples. However, existing multi-fidelity BO (MFBO) methods operate under the following two assumptions that rarely hold in practical applications: (1) LF sources provide data that are well correlated with the HF data on a global scale, and (2) a single random process can model the noise in the fused data. These assumptions dramatically reduce the performance of MFBO when LF sources are only locally correlated with the HF source or when the noise variance varies across the data sources. In this paper, we dispense with these incorrect assumptions by proposing an MF emulation method that (1) learns a noise model for each data source, and (2) enables MFBO to leverage highly biased LF sources which are only locally correlated with the HF source. We illustrate the performance of our method through analytical examples and engineering problems on materials design.
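
The key modeling change, namely a separate noise model per data source instead of a single shared noise process, can be illustrated with per-source Gaussian processes in scikit-learn, each learning its own WhiteKernel noise level. This is a simplification of the paper's emulator, shown only to make the idea concrete.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_source(X, y):
    # RBF captures the signal; WhiteKernel learns this source's own noise variance
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# one emulator per fidelity level, each with its own learned, heterogeneous noise
X_hf, y_hf = np.random.rand(20, 2), np.random.rand(20)     # few expensive samples
X_lf, y_lf = np.random.rand(200, 2), np.random.rand(200)   # many cheap samples
gp_hf, gp_lf = fit_source(X_hf, y_hf), fit_source(X_lf, y_lf)
```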

Unifying over-smoothing and over-squashing in graph neural networks: A physics informed approach and beyond

  • paper_url: http://arxiv.org/abs/2309.02769
  • repo_url: None
  • paper_authors: Zhiqi Shao, Dai Shi, Andi Han, Yi Guo, Qibin Zhao, Junbin Gao
  • for: The paper aims to address critical computational challenges in graph neural networks (GNNs), such as over-smoothing and limited expressive power, by introducing the Multi-Scaled Heat Kernel based GNN (MHKG) and its generalization G-MHKG.
  • methods: The proposed method reverses the time direction of the graph heat equation to enhance the sharpness of graph node features, and leverages high-pass filtering functions to improve the performance of GNNs.
  • results: MHKG and G-MHKG outperform several GNN baseline models across graph datasets characterized by both homophily and heterophily; the trade-off between over-smoothing and over-squashing is analyzed, and the method is shown to handle both issues under mild conditions.
    Abstract Graph Neural Networks (GNNs) have emerged as one of the leading approaches for machine learning on graph-structured data. Despite their great success, critical computational challenges such as over-smoothing, over-squashing, and limited expressive power continue to impact the performance of GNNs. In this study, inspired from the time-reversal principle commonly utilized in classical and quantum physics, we reverse the time direction of the graph heat equation. The resulted reversing process yields a class of high pass filtering functions that enhance the sharpness of graph node features. Leveraging this concept, we introduce the Multi-Scaled Heat Kernel based GNN (MHKG) by amalgamating diverse filtering functions' effects on node features. To explore more flexible filtering conditions, we further generalize MHKG into a model termed G-MHKG and thoroughly show the roles of each element in controlling over-smoothing, over-squashing and expressive power. Notably, we illustrate that all aforementioned issues can be characterized and analyzed via the properties of the filtering functions, and uncover a trade-off between over-smoothing and over-squashing: enhancing node feature sharpness will make model suffer more from over-squashing, and vice versa. Furthermore, we manipulate the time again to show how G-MHKG can handle both two issues under mild conditions. Our conclusive experiments highlight the effectiveness of proposed models. It surpasses several GNN baseline models in performance across graph datasets characterized by both homophily and heterophily.
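
The time-reversal idea has a compact spectral form: the graph heat kernel exp(-tL) smooths node features for t > 0 (low-pass) and sharpens them when the time direction is reversed, i.e. t < 0 (high-pass). A dense-matrix sketch for a small graph Laplacian L (illustrating the idea, not the paper's full model):

```python
import numpy as np

def heat_kernel_filter(L, X, t):
    # spectral filter exp(-t * L): t > 0 smooths (low-pass); a negative t
    # reverses the time direction and sharpens node features (high-pass)
    lam, U = np.linalg.eigh(L)
    return U @ np.diag(np.exp(-t * lam)) @ U.T @ X

# toy Laplacian of a 5-node path graph
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
L = np.diag(A.sum(axis=1)) - A
X = np.random.randn(5, 3)
smoothed = heat_kernel_filter(L, X, 1.0)
sharpened = heat_kernel_filter(L, X, -1.0)
```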

Towards Unsupervised Graph Completion Learning on Graphs with Features and Structure Missing

  • paper_url: http://arxiv.org/abs/2309.02762
  • repo_url: None
  • paper_authors: Sichao Fu, Qinmu Peng, Yang He, Baokun Du, Xinge You
  • for: Improving the task performance of graph neural networks (GNNs) on graph analytical tasks when node features or structural relationships are partially missing.
  • methods: Proposes a more general graph completion learning (GCL) framework, termed unsupervised GCL (UGCL), which uses self-supervised learning to improve the task performance of existing GNN variants while avoiding the label reliance and reconstruction bias of existing GCL methods.
  • results: Extensive experiments on eight datasets, three GNN variants, and five missing rates demonstrate the effectiveness of the proposed method.
    Abstract In recent years, graph neural networks (GNN) have achieved significant developments in a variety of graph analytical tasks. Nevertheless, GNN's superior performance will suffer from serious damage when the collected node features or structure relationships are partially missing owning to numerous unpredictable factors. Recently emerged graph completion learning (GCL) has received increasing attention, which aims to reconstruct the missing node features or structure relationships under the guidance of a specifically supervised task. Although these proposed GCL methods have made great success, they still exist the following problems: the reliance on labels, the bias of the reconstructed node features and structure relationships. Besides, the generalization ability of the existing GCL still faces a huge challenge when both collected node features and structure relationships are partially missing at the same time. To solve the above issues, we propose a more general GCL framework with the aid of self-supervised learning for improving the task performance of the existing GNN variants on graphs with features and structure missing, termed unsupervised GCL (UGCL). Specifically, to avoid the mismatch between missing node features and structure during the message-passing process of GNN, we separate the feature reconstruction and structure reconstruction and design its personalized model in turn. Then, a dual contrastive loss on the structure level and feature level is introduced to maximize the mutual information of node representations from feature reconstructing and structure reconstructing paths for providing more supervision signals. Finally, the reconstructed node features and structure can be applied to the downstream node classification task. Extensive experiments on eight datasets, three GNN variants and five missing rates demonstrate the effectiveness of our proposed method.

Safe Neural Control for Non-Affine Control Systems with Differentiable Control Barrier Functions

  • paper_url: http://arxiv.org/abs/2309.04492
  • repo_url: None
  • paper_authors: Wei Xiao, Ross Allen, Daniela Rus
  • for: This paper addresses the problem of safety-critical control for non-affine control systems.
  • methods: The paper uses Control Barrier Functions (CBFs) to optimize quadratic costs subject to state and control constraints, and incorporates higher-order CBFs into neural ordinary differential equation-based learning models as differentiable CBFs to guarantee safety for non-affine control systems.
  • results: The proposed framework is capable of learning complex and optimal control policies that are usually intractable online, and can address the conservativeness of CBFs such that the system state will not stay unnecessarily far away from safe set boundaries. The effectiveness of the proposed framework is illustrated on LiDAR-based autonomous driving and compared with existing methods.
    Abstract This paper addresses the problem of safety-critical control for non-affine control systems. It has been shown that optimizing quadratic costs subject to state and control constraints can be sub-optimally reduced to a sequence of quadratic programs (QPs) by using Control Barrier Functions (CBFs). Our recently proposed High Order CBFs (HOCBFs) can accommodate constraints of arbitrary relative degree. The main challenges in this approach are that it requires affine control dynamics and the solution of the CBF-based QP is sub-optimal since it is solved point-wise. To address these challenges, we incorporate higher-order CBFs into neural ordinary differential equation-based learning models as differentiable CBFs to guarantee safety for non-affine control systems. The differentiable CBFs are trainable in terms of their parameters, and thus, they can address the conservativeness of CBFs such that the system state will not stay unnecessarily far away from safe set boundaries. Moreover, the imitation learning model is capable of learning complex and optimal control policies that are usually intractable online. We illustrate the effectiveness of the proposed framework on LiDAR-based autonomous driving and compare it with existing methods.

Improved Outlier Robust Seeding for k-means

  • paper_url: http://arxiv.org/abs/2309.02710
  • repo_url: None
  • paper_authors: Amit Deshpande, Rameshwar Pratap
  • for: Proposes an outlier-robust seeding for $k$-means that resists the influence of noise and outliers.
  • methods: Uses a simple variant of the $D^{2}$ sampling distribution that makes the seeding robust to outliers.
  • results: Yields an algorithm that runs in $O(ndk)$ time, outputs $O(k)$ clusters, discards marginally more points than the optimal number of outliers, and comes with a provable $O(1)$ approximation guarantee.
    Abstract The $k$-means is a popular clustering objective, although it is inherently non-robust and sensitive to outliers. Its popular seeding or initialization called $k$-means++ uses $D^{2}$ sampling and comes with a provable $O(\log k)$ approximation guarantee \cite{AV2007}. However, in the presence of adversarial noise or outliers, $D^{2}$ sampling is more likely to pick centers from distant outliers instead of inlier clusters, and therefore its approximation guarantee \textit{w.r.t.} the $k$-means solution on inliers does not hold. Assuming that the outliers constitute a constant fraction of the given data, we propose a simple variant in the $D^2$ sampling distribution, which makes it robust to the outliers. Our algorithm runs in $O(ndk)$ time, outputs $O(k)$ clusters, discards marginally more points than the optimal number of outliers, and comes with a provable $O(1)$ approximation guarantee. Our algorithm can also be modified to output exactly $k$ clusters instead of $O(k)$ clusters, while keeping its running time linear in $n$ and $d$. This is an improvement over previous results for robust $k$-means based on LP relaxation and rounding \cite{Charikar}, \cite{KrishnaswamyLS18} and \textit{robust $k$-means++} \cite{DeshpandeKP20}. Our empirical results show the advantage of our algorithm over $k$-means++ \cite{AV2007}, uniform random seeding, greedy sampling for $k$-means \cite{tkmeanspp}, and robust $k$-means++ \cite{DeshpandeKP20}, on standard real-world and synthetic data sets used in previous work. Our proposal is easily amenable to scalable, faster, parallel implementations of $k$-means++ \cite{Bahmani,BachemL017} and is of independent interest for coreset constructions in the presence of outliers \cite{feldman2007ptas,langberg2010universal,feldman2011unified}.
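
For intuition, one simple way to temper $D^2$ sampling against outliers is to mix the $D^2$ distribution with a uniform one, so that a few distant outliers cannot dominate the draw. The sketch below illustrates that idea; it is not the paper's exact sampling variant, and the mixing weight is an assumption.

```python
import numpy as np

def robust_d2_seeding(X, k, alpha=0.5, rng=None):
    # k-means++-style seeding where the D^2 distribution is mixed with a
    # uniform one; heavy outlier distances no longer dominate the draw
    rng = rng or np.random.default_rng()
    n = len(X)
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # squared distance of every point to its nearest chosen center
        d2 = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        p = alpha * d2 / d2.sum() + (1 - alpha) / n
        centers.append(X[rng.choice(n, p=p)])
    return np.array(centers)

centers = robust_d2_seeding(np.random.randn(500, 2), k=5)
```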

Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.02669
  • repo_url: None
  • paper_authors: Tianchi Cai, Jiyan Jiang, Wenpeng Zhang, Shiji Zhou, Xierui Song, Li Yu, Lihong Gu, Xiaodong Zeng, Jinjie Gu, Guannan Zhang
  • for: Studies the marketing budget allocation problem in online marketing campaigns, using previously collected offline data.
  • methods: A game-theoretic offline value-based reinforcement learning method using mixed policies, which reduces the need to store infinitely many policies down to constantly many, achieving near-optimal policy efficiency and making the approach practical for industrial use.
  • results: Experiments on a large-scale marketing campaign covering tens of millions of users and a budget of over one billion show that the method outperforms various baselines; unlike previous value-based reinforcement learning methods for this problem, it is guaranteed to converge to the optimal policy.
    Abstract We study the budget allocation problem in online marketing campaigns that utilize previously collected offline data. We first discuss the long-term effect of optimizing marketing budget allocation decisions in the offline setting. To overcome the challenge, we propose a novel game-theoretic offline value-based reinforcement learning method using mixed policies. The proposed method reduces the need to store infinitely many policies in previous methods to only constantly many policies, which achieves nearly optimal policy efficiency, making it practical and favorable for industrial usage. We further show that this method is guaranteed to converge to the optimal policy, which cannot be achieved by previous value-based reinforcement learning methods for marketing budget allocation. Our experiments on a large-scale marketing campaign with tens-of-millions users and more than one billion budget verify the theoretical results and show that the proposed method outperforms various baseline methods. The proposed method has been successfully deployed to serve all the traffic of this marketing campaign.
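
The abstract gives no implementation details, so the following is only one illustrative reading of a "constantly many" mixed policy in a budgeted setting: randomize between the two deterministic threshold policies that bracket the budget, so the expected spend matches it exactly. The value/cost setup and all names below are assumptions.

```python
import numpy as np

def mixed_budget_policy(values, costs, budget):
    """Illustrative sketch (not the paper's algorithm): given offline value
    and cost estimates per user, a deterministic threshold policy treats the
    users with the best value/cost ratio.  Mixing the two threshold policies
    that bracket the budget spends it exactly in expectation, while storing
    only a constant number of policies rather than one per budget level."""
    ratio = values / costs
    order = np.argsort(-ratio)                # users by value-for-money
    cum_cost = np.cumsum(costs[order])
    m = np.searchsorted(cum_cost, budget)     # users fully affordable in budget
    treat = np.zeros(len(values))
    treat[order[:m]] = 1.0                    # policy A: treat the top-m users
    if m < len(values):
        spent = cum_cost[m - 1] if m > 0 else 0.0
        # Policy B additionally treats user order[m]; mix A and B so the
        # expected spend equals the budget exactly.
        treat[order[m]] = (budget - spent) / costs[order[m]]
    return treat                              # per-user treatment probability

values = np.array([5.0, 3.0, 2.0, 1.0])       # estimated incremental value
costs = np.array([2.0, 2.0, 1.0, 1.0])        # estimated incremental cost
print(mixed_budget_policy(values, costs, budget=3.5))
```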

Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat

  • paper_url: http://arxiv.org/abs/2309.03237
  • repo_url: None
  • paper_authors: Erdong Hu, Yuxin Tang, Anastasios Kyrillidis, Chris Jermaine
  • for: Evaluates algorithms for learning over images in a federated environment, across a variety of image classification tasks.
  • methods: Compares multiple approaches, including vertically decomposing the neural network, using a pre-trained feature-extraction backbone, and standard reconciliation-based federated methods.
  • results: Vertically decomposing the neural network gives the best results, outperforming the more standard reconciliation-based methods.
    Abstract We carefully evaluate a number of algorithms for learning in a federated environment, and test their utility for a variety of image classification tasks. We consider many issues that have not been adequately considered before: whether learning over data sets that do not have diverse sets of images affects the results; whether to use a pre-trained feature extraction "backbone"; how to evaluate learner performance (we argue that classification accuracy is not enough), among others. Overall, across a wide variety of settings, we find that vertically decomposing a neural network seems to give the best results, and outperforms more standard reconciliation-used methods.
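
One plausible reading of "vertically decomposing a neural network" on top of a frozen pre-trained backbone, sketched here purely as an assumption, is that each client trains its own narrow slice of the model over shared backbone features, and the slices are concatenated at aggregation time:

```python
import torch
import torch.nn as nn

class VerticalSlice(nn.Module):
    """One client's slice of a vertically decomposed classification head.
    Each client trains its own narrow hidden layer over features from a
    shared frozen pre-trained backbone (an assumed reading of 'vertical
    decomposition'; the paper's exact construction may differ)."""
    def __init__(self, feat_dim=512, hidden=64, n_classes=10):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)  # local head, local training

    def forward(self, feats):
        return self.head(self.hidden(feats))

def merged_logits(slices, merged_head, feats):
    """At aggregation time, concatenate every client's hidden activations
    and score them with a jointly trained (or re-fit) linear head."""
    z = torch.cat([s.hidden(feats) for s in slices], dim=1)
    return merged_head(z)

# Illustrative sizes (assumptions): 4 clients, 512-dim backbone features.
clients = [VerticalSlice() for _ in range(4)]
merged_head = nn.Linear(4 * 64, 10)
feats = torch.randn(8, 512)                        # stand-in backbone outputs
print(merged_logits(clients, merged_head, feats).shape)  # torch.Size([8, 10])
```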

Contrastive Learning as Kernel Approximation

  • paper_url: http://arxiv.org/abs/2309.02651
  • repo_url: None
  • paper_authors: Konstantinos Christopher Tsiolis
  • for: Provides an overview of the theoretical understanding of contrastive learning, which extracts features from unlabelled data.
  • methods: Trains low-dimensional feature representations with a contrastive loss function, generating training pairs by sampling similar and dissimilar inputs rather than annotating each input individually.
  • results: High-quality features can be extracted from large unlabelled datasets and fed to supervised learners on much smaller labelled datasets; the minimizers of popular contrastive loss functions implicitly approximate a positive semidefinite (PSD) kernel.
    Abstract In standard supervised machine learning, it is necessary to provide a label for every input in the data. While raw data in many application domains is easily obtainable on the Internet, manual labelling of this data is prohibitively expensive. To circumvent this issue, contrastive learning methods produce low-dimensional vector representations (also called features) of high-dimensional inputs on large unlabelled datasets. This is done by training with a contrastive loss function, which enforces that similar inputs have high inner product and dissimilar inputs have low inner product in the feature space. Rather than annotating each input individually, it suffices to define a means of sampling pairs of similar and dissimilar inputs. Contrastive features can then be fed as inputs to supervised learning systems on much smaller labelled datasets to obtain high accuracy on end tasks of interest. The goal of this thesis is to provide an overview of the current theoretical understanding of contrastive learning, specifically as it pertains to the minimizers of contrastive loss functions and their relationship to prior methods for learning features from unlabelled data. We highlight popular contrastive loss functions whose minimizers implicitly approximate a positive semidefinite (PSD) kernel. The latter is a well-studied object in functional analysis and learning theory that formalizes a notion of similarity between elements of a space. PSD kernels provide an implicit definition of features through the theory of reproducing kernel Hilbert spaces.
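
To make the kernel view concrete: if features are trained so that inner products are large for similar pairs and small for dissimilar ones, then $K(x, y) = \langle f(x), f(y) \rangle$ is a PSD kernel by construction. The loss form, network, and pair-sampling scheme below are illustrative assumptions, not a specific method from the thesis.

```python
import torch
import torch.nn.functional as F

def simple_contrastive_loss(f_anchor, f_pos, f_neg):
    """A minimal contrastive loss: push inner products of similar pairs up
    and of dissimilar pairs down in feature space.  The Gram matrix of the
    learned features, K(x, y) = <f(x), f(y)>, is PSD by construction and can
    be read as an approximate kernel (illustrative form, not a specific
    loss from the thesis)."""
    pos = (f_anchor * f_pos).sum(dim=1)           # <f(x), f(x+)>
    neg = (f_anchor * f_neg).sum(dim=1)           # <f(x), f(x-)>
    return F.softplus(neg - pos).mean()           # logistic contrastive loss

# Illustrative setup (assumptions): 2-D inputs, positives by additive noise.
f = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.ReLU(),
                        torch.nn.Linear(32, 8))
opt = torch.optim.Adam(f.parameters(), lr=1e-2)
for _ in range(200):
    x = torch.randn(64, 2)
    x_pos = x + 0.1 * torch.randn_like(x)         # similar: small perturbation
    x_neg = torch.randn(64, 2)                    # dissimilar: independent draw
    loss = simple_contrastive_loss(f(x), f(x_pos), f(x_neg))
    opt.zero_grad(); loss.backward(); opt.step()

x, y = torch.randn(1, 2), torch.randn(1, 2)
kernel_value = (f(x) * f(y)).sum().item()         # implicit kernel K(x, y)
print(kernel_value)
```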