cs.SD - 2023-11-19

Encoding Performance Data in MEI with the Automatic Music Performance Analysis and Comparison Toolkit (AMPACT)

  • paper_url: http://arxiv.org/abs/2311.11363
  • repo_url: None
  • paper_authors: Johanna Devaney, Cecilia Beauchamp
  • for: This paper introduces a new method of encoding performance data in MEI using the recently added \texttt{} element.
  • methods: Performance data is extracted with the Automatic Music Performance Analysis and Comparison Toolkit (AMPACT), encoded as a JSON object, and linked to a specific note in the score via the \texttt{} element (see the sketch after this entry).
  • results: A set of pop music vocals is encoded to demonstrate both the range of descriptors that can be encoded and AMPACT's ability to extract performance data in the absence of a fully specified musical score.
    Abstract This paper presents a new method of encoding performance data in MEI using the recently added \texttt{} element. Performance data was extracted using the Automatic Music Performance Analysis and Comparison Toolkit (AMPACT) and encoded as a JSON object within an \texttt{} element linked to a specific musical note. A set of pop music vocals was encoded to demonstrate both the range of descriptors that can be encoded in \texttt{} and how AMPACT can be used for extracting performance data in the absence of a fully specified musical score.
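As a rough illustration of the encoding described above, here is a minimal Python sketch (using lxml) that stores a JSON object of performance descriptors inside a child element of an MEI note. The element name `extData` and the descriptor fields are placeholders, since the paper's exact element name and schema are not reproduced here.

```python
import json
from lxml import etree

MEI_NS = "http://www.music-encoding.org/ns/mei"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical performance descriptors for one note; the field names are
# illustrative, not the paper's actual schema.
perf_data = {
    "onset_sec": 12.034,
    "duration_sec": 0.412,
    "mean_f0_hz": 221.7,
    "vibrato_rate_hz": 5.8,
}

# Build a minimal MEI note and attach the JSON payload in a child element.
# "extData" is a placeholder for the recently added element the paper uses.
note = etree.Element("{%s}note" % MEI_NS, nsmap={None: MEI_NS})
note.set(XML_ID, "note_0001")
ext = etree.SubElement(note, "{%s}extData" % MEI_NS)
ext.text = etree.CDATA(json.dumps(perf_data))

print(etree.tostring(note, pretty_print=True).decode())
```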

M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.11255
  • repo_url: None
  • paper_authors: Atin Sakkeer Hussain, Shansong Liu, Chenshuo Sun, Ying Shan
  • for: This work aims to use large language models (LLMs) to both understand and generate music across different modalities.
  • methods: Pretrained MERT, ViT, and ViViT models are used to understand music, images, and videos, respectively; AudioLDM 2 and MusicGen are explored for music generation, with the LLaMA 2 model bridging multi-modal understanding and generation (a rough sketch of this bridging follows the entry).
  • results: By combining multi-modal understanding with music generation, the framework unlocks creative potential from diverse sources of inspiration; experiments show the model achieves or surpasses the performance of current state-of-the-art models.
    Abstract The current landscape of research leveraging large language models (LLMs) is experiencing a surge. Many works harness the powerful reasoning capabilities of these models to comprehend various modalities, such as text, speech, images, videos, etc. They also utilize LLMs to understand human intention and generate desired outputs like images, videos, and music. However, research that combines both understanding and generation using LLMs is still limited and in its nascent stage. To address this gap, we introduce a Multi-modal Music Understanding and Generation (M$^{2}$UGen) framework that integrates LLM's abilities to comprehend and generate music for different modalities. The M$^{2}$UGen framework is purpose-built to unlock creative potential from diverse sources of inspiration, encompassing music, image, and video through the use of pretrained MERT, ViT, and ViViT models, respectively. To enable music generation, we explore the use of AudioLDM 2 and MusicGen. Bridging multi-modal understanding and music generation is accomplished through the integration of the LLaMA 2 model. Furthermore, we make use of the MU-LLaMA model to generate extensive datasets that support text/image/video-to-music generation, facilitating the training of our M$^{2}$UGen framework. We conduct a thorough evaluation of our proposed framework. The experimental results demonstrate that our model achieves or surpasses the performance of the current state-of-the-art models.
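The abstract describes an adapter-style bridge between frozen modality encoders, LLaMA 2, and music decoders. The PyTorch sketch below shows the general shape of such a bridge; all dimensions, the pooling rule, and the callables `llm` and `music_decoder` are stand-ins, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ModalityToMusicBridge(nn.Module):
    """Rough sketch of the bridging idea: features from a frozen modality
    encoder (MERT/ViT/ViViT in the paper) are projected into the LLM's
    embedding space, and pooled LLM hidden states are projected again to
    condition a music decoder (AudioLDM 2 / MusicGen in the paper)."""

    def __init__(self, enc_dim=768, llm_dim=4096, cond_dim=1024):
        super().__init__()
        self.input_adapter = nn.Linear(enc_dim, llm_dim)   # encoder -> LLM space
        self.output_proj = nn.Linear(llm_dim, cond_dim)    # LLM -> decoder conditioning

    def forward(self, modality_feats, llm, music_decoder):
        # modality_feats: (batch, tokens, enc_dim) from a frozen encoder
        tokens = self.input_adapter(modality_feats)
        hidden = llm(tokens)                      # stand-in: returns (B, T, llm_dim)
        cond = self.output_proj(hidden.mean(dim=1))
        return music_decoder(cond)                # stand-in music generator
```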

eess.AS - 2023-11-19

Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition

  • paper_url: http://arxiv.org/abs/2311.11353
  • repo_url: None
  • paper_authors: Keqi Deng, Philip C. Woodland
  • for: Improving the generalisation of end-to-end automatic speech recognition (ASR) beyond the training data distribution.
  • methods: Proposes a label-synchronous neural transducer (LS-Transducer) that supports domain adaptation with text-only data. The LS-Transducer extracts a label-level encoder representation before combining it with the prediction network output; since blank tokens are no longer needed, the prediction network behaves as a standard language model that is easily adapted on text-only data. An Auto-regressive Integrate-and-Fire (AIF) mechanism generates the label-level encoder representation at low latency, enabling streaming (a simplified sketch follows the entry).
  • results: Compared to a standard neural transducer, the LS-Transducer achieved a 12.9% relative WER reduction (WERR) on intra-domain LibriSpeech data, and 21.4% and 24.6% relative WERRs on cross-domain TED-LIUM 2 and AESRC2020 data with an adapted prediction network.
    Abstract Although end-to-end (E2E) automatic speech recognition (ASR) has shown state-of-the-art recognition accuracy, it tends to be implicitly biased towards the training data distribution which can degrade generalisation. This paper proposes a label-synchronous neural transducer (LS-Transducer), which provides a natural approach to domain adaptation based on text-only data. The LS-Transducer extracts a label-level encoder representation before combining it with the prediction network output. Since blank tokens are no longer needed, the prediction network performs as a standard language model, which can be easily adapted using text-only data. An Auto-regressive Integrate-and-Fire (AIF) mechanism is proposed to generate the label-level encoder representation while retaining low latency operation that can be used for streaming. In addition, a streaming joint decoding method is designed to improve ASR accuracy while retaining synchronisation with AIF. Experiments show that compared to standard neural transducers, the proposed LS-Transducer gave a 12.9% relative WER reduction (WERR) for intra-domain LibriSpeech data, as well as 21.4% and 24.6% relative WERRs on cross-domain TED-LIUM 2 and AESRC2020 data with an adapted prediction network.
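The AIF mechanism builds on integrate-and-fire style aggregation of encoder frames into label-level representations. Below is a minimal, non-autoregressive CIF-style sketch of that aggregation in NumPy; the paper's AIF additionally predicts the weights autoregressively, which is omitted here.

```python
import numpy as np

def integrate_and_fire(enc, weights, threshold=1.0):
    """CIF-style integrate-and-fire: accumulate per-frame weights and emit a
    label-level vector each time the accumulator crosses the threshold.
    enc: (T, D) encoder frames; weights: (T,) non-negative firing weights."""
    acc, frame_sum, outputs = 0.0, np.zeros(enc.shape[1]), []
    for h, a in zip(enc, weights):
        if acc + a < threshold:
            acc += a
            frame_sum += a * h
        else:
            r = threshold - acc            # portion that completes this label
            outputs.append(frame_sum + r * h)
            acc = a - r                    # leftover weight starts the next label
            frame_sum = acc * h
    return np.stack(outputs) if outputs else np.empty((0, enc.shape[1]))
```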

cs.CV - 2023-11-19

Improved Defect Detection and Classification Method for Advanced IC Nodes by Using Slicing Aided Hyper Inference with Refinement Strategy

  • paper_url: http://arxiv.org/abs/2311.11439
  • repo_url: None
  • paper_authors: Vic De Ridder, Bappaditya Dey, Victor Blanco, Sandip Halder, Bartel Van Waeyenberge
  • for: This paper aims to improve existing defect inspection techniques to meet the need for detecting small stochastic defects in high-NA (Numerical Aperture) EUVL (Extreme-Ultraviolet Lithography) manufacturing.
  • methods: Uses the Slicing Aided Hyper Inference (SAHI) framework, performing inference on size-increased slices of SEM images so the object detector's receptive field is more effective at capturing small defect instances (a tiling sketch follows the entry).
  • results: On previously investigated semiconductor datasets, SAHI improves small-defect detection by approximately 2x; on a new test dataset with scenarios not encountered during training, it achieves flawless detection rates where previously trained models fail. An extension of SAHI is also proposed that eliminates false-positive predictions without significantly reducing true positives.
    Abstract In semiconductor manufacturing, lithography has often been the manufacturing step defining the smallest possible pattern dimensions. In recent years, progress has been made towards high-NA (Numerical Aperture) EUVL (Extreme-Ultraviolet-Lithography) paradigm, which promises to advance pattern shrinking (2 nm node and beyond). However, a significant increase in stochastic defects and the complexity of defect detection becomes more pronounced with high-NA. Present defect inspection techniques (both non-machine learning and machine learning based), fail to achieve satisfactory performance at high-NA dimensions. In this work, we investigate the use of the Slicing Aided Hyper Inference (SAHI) framework for improving upon current techniques. Using SAHI, inference is performed on size-increased slices of the SEM images. This leads to the object detector's receptive field being more effective in capturing small defect instances. First, the performance on previously investigated semiconductor datasets is benchmarked across various configurations, and the SAHI approach is demonstrated to substantially enhance the detection of small defects, by approx. 2x. Afterwards, we also demonstrate that applying SAHI leads to flawless detection rates on a new test dataset, with scenarios not encountered during training, whereas previously trained models failed. Finally, we formulate an extension of SAHI that does not significantly reduce true-positive predictions while eliminating false-positive predictions.
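To make the slicing idea concrete, here is a hedged NumPy sketch of SAHI-style tiled inference: the detector runs on overlapping slices so that small defects occupy a larger fraction of the receptive field, and per-slice detections are mapped back to full-image coordinates. `detector` is a stand-in callable, and the tile size, overlap, and merging step are illustrative.

```python
import numpy as np

def sliced_inference(img, detector, tile=512, overlap=0.25):
    """Run a detector on overlapping tiles and shift boxes back to global
    coordinates. detector(patch) is a stand-in returning an (N, 5) array of
    [x1, y1, x2, y2, score]."""
    step = max(1, int(tile * (1.0 - overlap)))
    h, w = img.shape[:2]
    all_boxes = []
    for y0 in range(0, max(h - tile, 0) + 1, step):
        for x0 in range(0, max(w - tile, 0) + 1, step):
            det = detector(img[y0:y0 + tile, x0:x0 + tile])
            if len(det):
                det = det.copy()
                det[:, [0, 2]] += x0   # shift x back to full-image coordinates
                det[:, [1, 3]] += y0   # shift y back
                all_boxes.append(det)
    # A final NMS over the concatenated boxes removes duplicates from overlaps.
    return np.concatenate(all_boxes) if all_boxes else np.empty((0, 5))
```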

DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model

  • paper_url: http://arxiv.org/abs/2311.11417
  • repo_url: None
  • paper_authors: Zhenghao Pan, Haijin Zeng, Jiezhang Cao, Kai Zhang, Yongyong Chen
  • for: DiffSCI leverages structural insights from deep-prior and optimization-based methodologies, complemented by the generative capabilities of the contemporary denoising diffusion model.
  • methods: First, a diffusion model pre-trained on a substantial corpus of RGB images is employed, for the first time, as the generative denoiser within the Plug-and-Play framework (a simplified loop follows the entry), enabling SCI reconstructions that current methods struggle to address effectively. Second, spectral band correlations are systematically accounted for, and a robust methodology mitigates wavelength mismatch, enabling seamless adaptation of the RGB diffusion model to MSIs.
  • results: Extensive testing shows that DiffSCI exhibits discernible performance enhancements over prevailing self-supervised and zero-shot approaches, surpassing even supervised transformer counterparts across both simulated and real datasets.
    Abstract This paper endeavors to advance the precision of snapshot compressive imaging (SCI) reconstruction for multispectral image (MSI). To achieve this, we integrate the advantageous attributes of established SCI techniques and an image generative model, and propose a novel structured zero-shot diffusion model, dubbed DiffSCI. DiffSCI leverages the structural insights from the deep prior and optimization-based methodologies, complemented by the generative capabilities offered by the contemporary denoising diffusion model. Specifically, firstly, we employ a pre-trained diffusion model, which has been trained on a substantial corpus of RGB images, as the generative denoiser within the Plug-and-Play framework for the first time. This integration allows for the successful completion of SCI reconstruction, especially in cases that current methods struggle to address effectively. Secondly, we systematically account for spectral band correlations and introduce a robust methodology to mitigate wavelength mismatch, thus enabling seamless adaptation of the RGB diffusion model to MSIs. Thirdly, an accelerated algorithm is implemented to expedite the resolution of the data subproblem. This augmentation not only accelerates the convergence rate but also elevates the quality of the reconstruction process. We present extensive testing to show that DiffSCI exhibits discernible performance enhancements over prevailing self-supervised and zero-shot approaches, surpassing even supervised transformer counterparts across both simulated and real datasets. Our code will be available.
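The Plug-and-Play integration alternates a data-fidelity update with a learned denoiser. The sketch below shows a greatly simplified version of such a loop for SCI; `denoiser` stands in for the pre-trained diffusion prior, and the update is a generic GAP-style step rather than the paper's accelerated algorithm.

```python
import numpy as np

def pnp_sci_sketch(y, Phi, denoiser, steps=30, rho=1.0):
    """Greatly simplified plug-and-play loop for snapshot compressive imaging.
    y: (H, W) snapshot measurement; Phi: (B, H, W) per-band sensing masks;
    denoiser: stand-in for the pre-trained diffusion prior."""
    x = np.stack([y] * Phi.shape[0])              # crude initialization
    for _ in range(steps):
        # Data-fidelity step (GAP-style): push the Phi-weighted sum toward y.
        resid = y - (Phi * x).sum(axis=0)
        x = x + Phi * resid / ((Phi ** 2).sum(axis=0) + rho)
        # Prior step: denoise the current estimate (a diffusion model in DiffSCI).
        x = denoiser(x)
    return x
```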

Enhancing Low-dose CT Image Reconstruction by Integrating Supervised and Unsupervised Learning

  • paper_url: http://arxiv.org/abs/2311.12071
  • repo_url: None
  • paper_authors: Ling Chen, Zhishen Huang, Yong Long, Saiprasad Ravishankar
  • for: This work aims to improve the accuracy and efficiency of low-dose computed tomography (CT) image reconstruction.
  • methods: Proposes a hybrid supervised-unsupervised learning framework that combines model-based image reconstruction (MBIR) solvers with deep network reconstructors to simulate a fixed-point iteration process (one block of the cascade is sketched after the entry).
  • results: Experiments show the proposed framework performs promisingly compared with recent low-dose CT reconstruction methods.
    Abstract Traditional model-based image reconstruction (MBIR) methods combine forward and noise models with simple object priors. Recent application of deep learning methods for image reconstruction provides a successful data-driven approach to addressing the challenges when reconstructing images with undersampled measurements or various types of noise. In this work, we propose a hybrid supervised-unsupervised learning framework for X-ray computed tomography (CT) image reconstruction. The proposed learning formulation leverages both sparsity or unsupervised learning-based priors and neural network reconstructors to simulate a fixed-point iteration process. Each proposed trained block consists of a deterministic MBIR solver and a neural network. The information flows in parallel through these two reconstructors and is then optimally combined. Multiple such blocks are cascaded to form a reconstruction pipeline. We demonstrate the efficacy of this learned hybrid model for low-dose CT image reconstruction with limited training data, where we use the NIH AAPM Mayo Clinic Low Dose CT Grand Challenge dataset for training and testing. In our experiments, we study combinations of supervised deep network reconstructors and MBIR solver with learned sparse representation-based priors or analytical priors. Our results demonstrate the promising performance of the proposed framework compared to recent low-dose CT reconstruction methods.
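As described above, each trained block runs a deterministic MBIR solver and a neural network in parallel and then combines their outputs. A minimal PyTorch sketch of one such block follows; the learned scalar weight is an illustrative combination rule, and `mbir_solver`/`network` are stand-ins.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """One block of a cascaded reconstruction pipeline: a deterministic MBIR
    update and a learned update run in parallel and are optimally combined
    (here via a single learned scalar, an illustrative simplification)."""

    def __init__(self, mbir_solver, network):
        super().__init__()
        self.mbir = mbir_solver          # callable: (image, measurements) -> image
        self.net = network               # nn.Module: image -> image
        self.w = nn.Parameter(torch.tensor(0.5))

    def forward(self, x, y):
        x_mbir = self.mbir(x, y)         # model-based update using measurements y
        x_net = self.net(x)              # learned update
        return self.w * x_net + (1 - self.w) * x_mbir
```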

FDDM: Unsupervised Medical Image Translation with a Frequency-Decoupled Diffusion Model

  • paper_url: http://arxiv.org/abs/2311.12070
  • repo_url: None
  • paper_authors: Yunxiang Li, Hua-Chieh Shao, Xiaoxue Qian, You Zhang
  • for: The paper aims to improve the quality and accuracy of medical image translation with a novel framework called the frequency-decoupled diffusion model (FDDM).
  • methods: FDDM decouples the frequency components of medical images in the Fourier domain during translation, allowing structure-preserved high-quality image conversion (the splitting idea is sketched after the entry). It applies an unsupervised frequency conversion module and uses frequency-specific information to guide a following diffusion model for the final source-to-target image translation.
  • results: FDDM outperforms GAN-, VAE-, and diffusion-based models in image quality and faithfulness to the original anatomical structures, achieving an FID of 29.88, less than half that of the second-best method.
    Abstract Diffusion models have demonstrated significant potential in producing high-quality images for medical image translation to aid disease diagnosis, localization, and treatment. Nevertheless, current diffusion models have limited success in achieving faithful image translations that can accurately preserve the anatomical structures of medical images, especially for unpaired datasets. The preservation of structural and anatomical details is essential to reliable medical diagnosis and treatment planning, as structural mismatches can lead to disease misidentification and treatment errors. In this study, we introduced a frequency-decoupled diffusion model (FDDM), a novel framework that decouples the frequency components of medical images in the Fourier domain during the translation process, to allow structure-preserved high-quality image conversion. FDDM applies an unsupervised frequency conversion module to translate the source medical images into frequency-specific outputs and then uses the frequency-specific information to guide a following diffusion model for final source-to-target image translation. We conducted extensive evaluations of FDDM using a public brain MR-to-CT translation dataset, showing its superior performance against other GAN-, VAE-, and diffusion-based models. Metrics including the Frechet inception distance (FID), the peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM) were assessed. FDDM achieves an FID of 29.88, less than half of the second best. These results demonstrated FDDM's prowess in generating highly-realistic target-domain images while maintaining the faithfulness of translated anatomical structures.
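The core operation in FDDM is separating frequency components in the Fourier domain. The NumPy sketch below splits an image into low- and high-frequency parts with a circular low-pass mask; the mask shape and radius are illustrative assumptions, not the paper's actual decoupling.

```python
import numpy as np

def split_frequencies(img, radius=0.1):
    """Split a 2-D image into low- and high-frequency components using a
    circular mask in the centered Fourier spectrum. radius is a fraction of
    the spectrum half-width (illustrative default)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    low_mask = dist <= radius * min(h, w) / 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(F * low_mask)))
    high = img - low                     # residual carries the fine structure
    return low, high
```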

A Survey of Emerging Applications of Diffusion Probabilistic Models in MRI

  • paper_url: http://arxiv.org/abs/2311.11383
  • repo_url: None
  • paper_authors: Yuheng Fan, Hanxi Liao, Shiqi Huang, Yimin Luo, Huazhu Fu, Haikun Qi
  • for: This article reviews the applications of diffusion probabilistic models (DPMs) in medical imaging, aiming to help researchers in the MRI community grasp the advances of DPMs across different applications.
  • methods: First introduces the theory of the two dominant kinds of DPMs, categorized by whether the diffusion time step is discrete or continuous (the standard formulations are recalled after the entry), then provides a comprehensive review of emerging DPMs in MRI, covering reconstruction, image generation, image translation, segmentation, anomaly detection, and further research topics.
  • results: Discusses the general limitations of DPMs as well as limitations specific to MRI tasks, and points out potential areas worth further exploration.
    Abstract Diffusion probabilistic models (DPMs) which employ explicit likelihood characterization and a gradual sampling process to synthesize data, have gained increasing research interest. Despite their huge computational burdens due to the large number of steps involved during sampling, DPMs are widely appreciated in various medical imaging tasks for their high-quality and diversity of generation. Magnetic resonance imaging (MRI) is an important medical imaging modality with excellent soft tissue contrast and superb spatial resolution, which possesses unique opportunities for diffusion models. Although there is a recent surge of studies exploring DPMs in MRI, a survey paper of DPMs specifically designed for MRI applications is still lacking. This review article aims to help researchers in the MRI community to grasp the advances of DPMs in different applications. We first introduce the theory of two dominant kinds of DPMs, categorized according to whether the diffusion time step is discrete or continuous, and then provide a comprehensive review of emerging DPMs in MRI, including reconstruction, image generation, image translation, segmentation, anomaly detection, and further research topics. Finally, we discuss the general limitations as well as limitations specific to the MRI tasks of DPMs and point out potential areas that are worth further exploration.
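For reference, the two families distinguished above are usually written in the following standard forms (generic DDPM and score-SDE formulations, not notation specific to this survey):

```latex
% Discrete-time forward process (DDPM):
q(\mathbf{x}_t \mid \mathbf{x}_{t-1})
  = \mathcal{N}\!\big(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\big)

% Continuous-time forward process (score-based SDE):
\mathrm{d}\mathbf{x} = f(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}
```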

Evidential Uncertainty Quantification: A Variance-Based Perspective

  • paper_url: http://arxiv.org/abs/2311.11367
  • repo_url: https://github.com/kerrydrx/evidentialada
  • paper_authors: Ruxiao Duan, Brian Caffo, Harrison X. Bai, Haris I. Sair, Craig Jones
  • for: This work provides a variance-based method for quantifying uncertainty in deep neural networks for classification, with applications such as active learning and active domain adaptation.
  • methods: Adapts the variance-based approach from regression to classification within evidential deep learning, quantifying classification uncertainty at the class level; the variance decomposition technique from regression is extended to class covariance decomposition based on the law of total covariance, from which class correlations are also derived (standard Dirichlet moments are sketched after the entry).
  • results: Experiments on cross-domain datasets show that the variance-based approach matches the accuracy of the entropy-based one in active domain adaptation while additionally providing class-wise uncertainties and between-class correlations.
    Abstract Uncertainty quantification of deep neural networks has become an active field of research and plays a crucial role in various downstream tasks such as active learning. Recent advances in evidential deep learning shed light on the direct quantification of aleatoric and epistemic uncertainties with a single forward pass of the model. Most traditional approaches adopt an entropy-based method to derive evidential uncertainty in classification, quantifying uncertainty at the sample level. However, the variance-based method that has been widely applied in regression problems is seldom used in the classification setting. In this work, we adapt the variance-based approach from regression to classification, quantifying classification uncertainty at the class level. The variance decomposition technique in regression is extended to class covariance decomposition in classification based on the law of total covariance, and the class correlation is also derived from the covariance. Experiments on cross-domain datasets are conducted to illustrate that the variance-based approach not only results in similar accuracy as the entropy-based one in active domain adaptation but also brings information about class-wise uncertainties as well as between-class correlations. The code is available at https://github.com/KerryDRX/EvidentialADA. This alternative means of evidential uncertainty quantification will give researchers more options when class uncertainties and correlations are important in their applications.
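For a Dirichlet evidential head, the class-level variances and between-class covariances referenced above follow from standard Dirichlet moments. The sketch below computes them; it illustrates the variance-based view rather than reproducing the paper's full decomposition.

```python
import numpy as np

def dirichlet_class_uncertainty(alpha):
    """Class-level uncertainty for a Dirichlet(alpha) evidential output.
    alpha: (K,) concentration parameters. Uses the standard moments:
    Var[p_k] = p_k (1 - p_k) / (S + 1), Cov[p_j, p_k] = -p_j p_k / (S + 1)."""
    S = alpha.sum()
    p = alpha / S                        # expected class probabilities
    var = p * (1 - p) / (S + 1)          # per-class variance
    cov = -np.outer(p, p) / (S + 1)      # between-class covariance
    np.fill_diagonal(cov, var)
    corr = cov / np.sqrt(np.outer(var, var))   # between-class correlation
    return p, var, cov, corr

p, var, cov, corr = dirichlet_class_uncertainty(np.array([5.0, 2.0, 1.0]))
print(var, corr)
```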

Scale-aware competition network for palmprint recognition

  • paper_url: http://arxiv.org/abs/2311.11354
  • repo_url: None
  • paper_authors: Chengrui Gao, Ziyuan Yang, Min Zhu, Andrew Beng Jin Teoh
  • for: This paper aims to improve the recognition performance of palmprint biometrics by addressing the limitation of prior methodologies that only focus on texture orientation and neglect the significant texture scale dimension.
  • methods: The proposed method, called SAC-Net, consists of two modules: Inner-Scale Competition Module (ISCM) and Across-Scale Competition Module (ASCM). ISCM integrates learnable Gabor filters and a self-attention mechanism to extract rich orientation data, while ASCM leverages a competitive strategy across various scales to effectively encapsulate competitive texture scale elements.
  • results: The proposed method was tested on three benchmark datasets and showed exceptional recognition performance and resilience relative to state-of-the-art alternatives.
    Abstract Palmprint biometrics garner heightened attention in palm-scanning payment and social security due to their distinctive attributes. However, prevailing methodologies singularly prioritize texture orientation, neglecting the significant texture scale dimension. We design an innovative network for concurrently extracting intra-scale and inter-scale features to redress this limitation. This paper proposes a scale-aware competitive network (SAC-Net), which includes the Inner-Scale Competition Module (ISCM) and the Across-Scale Competition Module (ASCM) to capture texture characteristics related to orientation and scale. ISCM efficiently integrates learnable Gabor filters and a self-attention mechanism to extract rich orientation data and discern textures with long-range discriminative properties. Subsequently, ASCM leverages a competitive strategy across various scales to effectively encapsulate the competitive texture scale elements. By synergizing ISCM and ASCM, our method adeptly characterizes palmprint features. Rigorous experimentation across three benchmark datasets unequivocally demonstrates our proposed approach's exceptional recognition performance and resilience relative to state-of-the-art alternatives.
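The ISCM module above builds on learnable Gabor filters. For orientation, here is the standard fixed 2-D Gabor kernel that such a module parameterizes; the default parameter values are illustrative.

```python
import numpy as np

def gabor_kernel(size=17, theta=0.0, sigma=4.0, lam=8.0, gamma=0.5):
    """Standard 2-D Gabor filter: a Gaussian envelope modulating a cosine
    carrier. theta: orientation; sigma: envelope width; lam: wavelength;
    gamma: spatial aspect ratio (all defaults are illustrative)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lam)
```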

MoVideo: Motion-Aware Video Generation with Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.11325
  • repo_url: None
  • paper_authors: Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc Van Gool, Rakesh Ranjan
  • for: The paper proposes a motion-aware video generation (MoVideo) framework to address the lack of explicit motion modeling in existing video generation methods.
  • methods: Motion is considered from two aspects: video depth, which regulates motion via per-frame object distances and spatial layouts, and optical flow, which describes motion via cross-frame correspondences that preserve fine details and improve temporal consistency.
  • results: Experiments show that MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, with promising prompt consistency, frame consistency, and visual quality.
    Abstract While recent years have witnessed great progress on using diffusion models for video generation, most of them are simple extensions of image generation frameworks, which fail to explicitly consider one of the key differences between videos and images, i.e., motion. In this paper, we propose a novel motion-aware video generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow. The former regulates motion by per-frame object distances and spatial layouts, while the later describes motion by cross-frame correspondences that help in preserving fine details and improving temporal consistency. More specifically, given a key frame that exists or generated from text prompts, we first design a diffusion model with spatio-temporal modules to generate the video depth and the corresponding optical flows. Then, the video is generated in the latent space by another spatio-temporal diffusion model under the guidance of depth, optical flow-based warped latent video and the calculated occlusion mask. Lastly, we use optical flows again to align and refine different frames for better video decoding from the latent space to the pixel space. In experiments, MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.

Discrete approximations of Gaussian smoothing and Gaussian derivatives

  • paper_url: http://arxiv.org/abs/2311.11317
  • repo_url: None
  • paper_authors: Tony Lindeberg
  • for: This study examines how to approximate the Gaussian smoothing and Gaussian derivative computations of scale-space theory when applied to discrete data.
  • methods: Three main discretization methods are considered: (i) sampling the Gaussian kernels and Gaussian derivative kernels, (ii) locally integrating the Gaussian kernels and Gaussian derivative kernels over each pixel support region, and (iii) basing the scale-space analysis on the discrete analogue of the Gaussian kernel and computing derivative approximations by applying small-support central difference operators to the spatially smoothed data (the first and third kernels are compared numerically after the entry).
  • results: The sampled and integrated Gaussian kernels and derivatives perform very poorly at very fine scales, where the discrete analogue of the Gaussian kernel with its corresponding discrete derivative approximations performs substantially better. The sampled Gaussian kernel and derivatives do, however, give numerically very good approximations of the continuous results once the scale parameter exceeds about 1 grid spacing.
    Abstract This paper develops an in-depth treatment concerning the problem of approximating the Gaussian smoothing and Gaussian derivative computations in scale-space theory for application on discrete data. With close connections to previous axiomatic treatments of continuous and discrete scale-space theory, we consider three main ways discretizing these scale-space operations in terms of explicit discrete convolutions, based on either (i) sampling the Gaussian kernels and the Gaussian derivative kernels, (ii) locally integrating the Gaussian kernels and the Gaussian derivative kernels over each pixel support region and (iii) basing the scale-space analysis on the discrete analogue of the Gaussian kernel, and then computing derivative approximations by applying small-support central difference operators to the spatially smoothed image data. We study the properties of these three main discretization methods both theoretically and experimentally, and characterize their performance by quantitative measures, including the results they give rise to with respect to the task of scale selection, investigated for four different use cases, and with emphasis on the behaviour at fine scales. The results show that the sampled Gaussian kernels and derivatives as well as the integrated Gaussian kernels and derivatives perform very poorly at very fine scales. At very fine scales, the discrete analogue of the Gaussian kernel with its corresponding discrete derivative approximations performs substantially better. The sampled Gaussian kernel and the sampled Gaussian derivatives do, on the other hand, lead to numerically very good approximations of the corresponding continuous results when the scale parameter is sufficiently large; in the experiments presented in the paper, when the scale parameter is greater than a value of about 1, in units of the grid spacing.
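The first and third discretizations are easy to compare numerically. The sketch below contrasts the sampled Gaussian kernel with the discrete analogue of the Gaussian kernel, $T(n; t) = e^{-t} I_n(t)$, using SciPy's exponentially scaled Bessel function; note how the sampled kernel's mass deviates from 1 at fine scales while the discrete analogue stays normalized.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel I_n

def sampled_gaussian(t, radius=6):
    """Sampled Gaussian kernel: g(n; t) = exp(-n^2 / (2t)) / sqrt(2*pi*t)."""
    n = np.arange(-radius, radius + 1)
    return np.exp(-n**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

def discrete_gaussian(t, radius=6):
    """Discrete analogue of the Gaussian kernel: T(n; t) = exp(-t) * I_n(t)."""
    n = np.arange(-radius, radius + 1)
    return ive(np.abs(n), t)  # ive(n, t) = exp(-t) * iv(n, t) for t > 0

for t in (0.25, 1.0, 4.0):
    s, d = sampled_gaussian(t), discrete_gaussian(t)
    print(f"t={t}: sampled sum = {s.sum():.4f}, discrete sum = {d.sum():.4f}")
```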

Optimizing rgb-d semantic segmentation through multi-modal interaction and pooling attention

  • paper_url: http://arxiv.org/abs/2311.11312
  • repo_url: None
  • paper_authors: Shuai Zhang, Minghong Xie
  • for: Improving the accuracy of RGB-D image semantic segmentation.
  • methods: Fuses RGB and depth information with a Multi-modal Interaction Fusion Module (MIM) in the deepest layers of the network and a Pooling Attention Module (PAM) at various stages of the encoder, whose outputs are integrated into the decoder in a targeted manner.
  • results: MIPANet outperforms existing methods on two indoor scene datasets, NYUDv2 and SUN-RGBD, improving RGB-D semantic segmentation accuracy.
    Abstract Semantic segmentation of RGB-D images involves understanding the appearance and spatial relationships of objects within a scene, which requires careful consideration of various factors. However, in indoor environments, the simple input of RGB and depth images often results in a relatively limited acquisition of semantic and spatial information, leading to suboptimal segmentation outcomes. To address this, we propose the Multi-modal Interaction and Pooling Attention Network (MIPANet), a novel approach designed to harness the interactive synergy between RGB and depth modalities, optimizing the utilization of complementary information. Specifically, we incorporate a Multi-modal Interaction Fusion Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Additionally, we introduce a Pooling Attention Module (PAM) at various stages of the encoder. This module serves to amplify the features extracted by the network and integrates the module's output into the decoder in a targeted manner, significantly improving semantic segmentation performance. Our experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYUDv2 and SUN-RGBD, underscoring its effectiveness in enhancing RGB-D semantic segmentation.

UMAAF: Unveiling Aesthetics via Multifarious Attributes of Images

  • paper_url: http://arxiv.org/abs/2311.11306
  • repo_url: None
  • paper_authors: Weijie Li, Yitian Wan, Xingjiao Wu, Junjie Xu, Cheng Jin, Liang He
  • for: The paper focuses on Image Aesthetic Assessment (IAA) and proposes a Unified Multi-attribute Aesthetic Assessment Framework (UMAAF) to better utilize image attributes in aesthetic assessment.
  • methods: The paper uses a combination of absolute-attribute perception modules and an absolute-attribute interacting network to extract and integrate absolute-attribute features, as well as a Relative-Relation Loss function to model the relative attributes of images.
  • results: The proposed UMAAF achieves state-of-the-art performance on the TAD66K and AVA datasets, and multiple experiments demonstrate the effectiveness of each module and the model's alignment with human preference.
    Abstract With the increasing prevalence of smartphones and websites, Image Aesthetic Assessment (IAA) has become increasingly crucial. While the significance of attributes in IAA is widely recognized, many attribute-based methods lack consideration for the selection and utilization of aesthetic attributes. Our initial step involves the acquisition of aesthetic attributes from both intra- and inter-perspectives. Within the intra-perspective, we extract the direct visual attributes of images, constituting the absolute attribute. In the inter-perspective, our focus lies in modeling the relative score relationships between images within the same sequence, forming the relative attribute. Then, to better utilize image attributes in aesthetic assessment, we propose the Unified Multi-attribute Aesthetic Assessment Framework (UMAAF) to model both absolute and relative attributes of images. For absolute attributes, we leverage multiple absolute-attribute perception modules and an absolute-attribute interacting network. The absolute-attribute perception modules are first pre-trained on several absolute-attribute learning tasks and then used to extract corresponding absolute attribute features. The absolute-attribute interacting network adaptively learns the weight of diverse absolute-attribute features, effectively integrating them with generic aesthetic features from various absolute-attribute perspectives and generating the aesthetic prediction. To model the relative attribute of images, we consider the relative ranking and relative distance relationships between images in a Relative-Relation Loss function, which boosts the robustness of the UMAAF. Furthermore, UMAAF achieves state-of-the-art performance on TAD66K and AVA datasets, and multiple experiments demonstrate the effectiveness of each module and the model's alignment with human preference.

Exchanging Dual Encoder-Decoder: A New Strategy for Change Detection with Semantic Guidance and Spatial Localization

  • paper_url: http://arxiv.org/abs/2311.11302
  • repo_url: None
  • paper_authors: Sijie Zhao, Xueliang Zhang, Pengfeng Xiao, Guangjun He
  • for: This paper targets binary change detection in earth observation, addressing bitemporal feature interference in feature-level fusion and the inapplicability of existing architectures to intraclass change detection and multiview building change detection.
  • methods: Proposes a new exchanging dual encoder-decoder structure that fuses bitemporal features at the decision level and determines changed areas using bitemporal semantic features.
  • results: The model matches or surpasses state-of-the-art methods on six datasets, with F1-scores of 97.77%, 83.07%, 94.86%, 92.33%, 91.39%, and 74.35% on CDD, SYSU, WHU, LEVIR-CD, LEVIR-CD+, and NJDS, respectively.
    Abstract Change detection is a critical task in earth observation applications. Recently, deep learning-based methods have shown promising performance and are quickly adopted in change detection. However, the widely used multiple encoder and single decoder (MESD) as well as dual encoder-decoder (DED) architectures still struggle to effectively handle change detection well. The former has problems of bitemporal feature interference in the feature-level fusion, while the latter is inapplicable to intraclass change detection and multiview building change detection. To solve these problems, we propose a new strategy with an exchanging dual encoder-decoder structure for binary change detection with semantic guidance and spatial localization. The proposed strategy solves the problems of bitemporal feature inference in MESD by fusing bitemporal features in the decision level and the inapplicability in DED by determining changed areas using bitemporal semantic features. We build a binary change detection model based on this strategy, and then validate and compare it with 18 state-of-the-art change detection methods on six datasets in three scenarios, including intraclass change detection datasets (CDD, SYSU), single-view building change detection datasets (WHU, LEVIR-CD, LEVIR-CD+) and a multiview building change detection dataset (NJDS). The experimental results demonstrate that our model achieves superior performance with high efficiency and outperforms all benchmark methods with F1-scores of 97.77%, 83.07%, 94.86%, 92.33%, 91.39%, 74.35% on CDD, SYSU, WHU, LEVIR-CD, LEVIR- CD+, and NJDS datasets, respectively. The code of this work will be available at https://github.com/NJU-LHRS/official-SGSLN.

Pair-wise Layer Attention with Spatial Masking for Video Prediction

  • paper_url: http://arxiv.org/abs/2311.11289
  • repo_url: https://github.com/mlvccn/pla_sm_videopred
  • paper_authors: Ping Li, Chenhan Zhang, Zheng Yang, Xianghua Xu, Mingli Song
  • for: Predicting future video frames with improved prediction quality.
  • methods: A Pair-wise Layer Attention (PLA) module enriches the texture details of predicted frames by coupling low-level visual cues with high-level features, and a Spatial Masking (SM) module masks partial encoder features during pretraining (a minimal masking sketch follows the entry).
  • results: The framework better captures spatiotemporal dynamics and improves prediction quality across five benchmarks.
    Abstract Video prediction yields future frames by employing the historical frames and has exhibited its great potential in many applications, e.g., meteorological prediction, and autonomous driving. Previous works often decode the ultimate high-level semantic features to future frames without texture details, which deteriorates the prediction quality. Motivated by this, we develop a Pair-wise Layer Attention (PLA) module to enhance the layer-wise semantic dependency of the feature maps derived from the U-shape structure in Translator, by coupling low-level visual cues and high-level features. Hence, the texture details of predicted frames are enriched. Moreover, most existing methods capture the spatiotemporal dynamics by Translator, but fail to sufficiently utilize the spatial features of Encoder. This inspires us to design a Spatial Masking (SM) module to mask partial encoding features during pretraining, which adds the visibility of remaining feature pixels by Decoder. To this end, we present a Pair-wise Layer Attention with Spatial Masking (PLA-SM) framework for video prediction to capture the spatiotemporal dynamics, which reflect the motion trend. Extensive experiments and rigorous ablation studies on five benchmarks demonstrate the advantages of the proposed approach. The code is available at GitHub.
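As a rough illustration of the Spatial Masking idea, the sketch below zeroes out a random subset of spatial positions in encoder feature maps so the decoder must rely on the remaining visible pixels; the masking granularity and ratio are illustrative assumptions.

```python
import torch

def spatial_mask(feats, ratio=0.5):
    """Randomly mask spatial positions of encoder features during pretraining.
    feats: (B, C, H, W); ratio: fraction of positions to drop (illustrative)."""
    B, _, H, W = feats.shape
    keep = (torch.rand(B, 1, H, W, device=feats.device) > ratio).float()
    return feats * keep   # masked positions are zeroed across all channels
```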

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

  • paper_url: http://arxiv.org/abs/2311.11284
  • repo_url: https://github.com/envision-research/luciddreamer
  • paper_authors: Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, Yingcong Chen
  • for: Improving the quality and efficiency of text-to-3D generation.
  • methods: Proposes Interval Score Matching (ISM), which employs deterministic diffusing trajectories and interval-based score matching to counteract the over-smoothing caused by Score Distillation Sampling (SDS), and integrates 3D Gaussian Splatting into the text-to-3D pipeline.
  • results: The model largely outperforms the state of the art in quality and training efficiency.
    Abstract The recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios. While recent advancements in text-to-3D generation have shown promise, they often fall short in rendering detailed and high-quality 3D models. This problem is especially prevalent as many methods base themselves on Score Distillation Sampling (SDS). This paper identifies a notable deficiency in SDS, that it brings inconsistent and low-quality updating direction for the 3D model, causing the over-smoothing effect. To address this, we propose a novel approach called Interval Score Matching (ISM). ISM employs deterministic diffusing trajectories and utilizes interval-based score matching to counteract over-smoothing. Furthermore, we incorporate 3D Gaussian Splatting into our text-to-3D generation pipeline. Extensive experiments show that our model largely outperforms the state-of-the-art in quality and training efficiency.

Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection

  • paper_url: http://arxiv.org/abs/2311.11278
  • repo_url: None
  • paper_authors: Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, Baoyuan Wu
  • for: This paper proposes a simple yet effective deepfake detection method that addresses the performance drop caused by mismatched training and testing distributions.
  • methods: LSDA (Latent Space Data Augmentation) is based on the heuristic that representations covering a wider variety of forgeries learn a more generalizable decision boundary, mitigating overfitting to method-specific artifacts. The forgery space is enlarged by constructing and simulating variations within and across forgery features in the latent space (a generic interpolation sketch follows the entry), and a binary classifier is then refined on the distilled knowledge from the enhanced features.
  • results: Experiments show the proposed method is surprisingly effective and surpasses state-of-the-art detectors on several widely used benchmarks.
    Abstract Deepfake detection faces a critical generalization hurdle, with performance deteriorating when there is a mismatch between the distributions of training and testing data. A broadly received explanation is the tendency of these detectors to be overfitted to forgery-specific artifacts, rather than learning features that are widely applicable across various forgeries. To address this issue, we propose a simple yet effective detector called LSDA (\underline{L}atent \underline{S}pace \underline{D}ata \underline{A}ugmentation), which is based on a heuristic idea: representations with a wider variety of forgeries should be able to learn a more generalizable decision boundary, thereby mitigating the overfitting of method-specific features (see Figure. 1). Following this idea, we propose to enlarge the forgery space by constructing and simulating variations within and across forgery features in the latent space. This approach encompasses the acquisition of enriched, domain-specific features and the facilitation of smoother transitions between different forgery types, effectively bridging domain gaps. Our approach culminates in refining a binary classifier that leverages the distilled knowledge from the enhanced features, striving for a generalizable deepfake detector. Comprehensive experiments show that our proposed method is surprisingly effective and transcends state-of-the-art detectors across several widely used benchmarks.
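A generic example of latent-space augmentation in the spirit of LSDA is interpolating between latent features of two forgery types to simulate new variants; the paper's actual within- and across-forgery transformations may differ.

```python
import torch

def latent_forgery_mix(z_a, z_b, alpha=0.5):
    """Interpolate between latent features of two forgery types to simulate
    intermediate variants. z_a, z_b: (B, D) latent features; alpha bounds
    the per-sample interpolation strength (illustrative rule)."""
    lam = torch.empty(z_a.size(0), 1, device=z_a.device).uniform_(0, alpha)
    return z_a + lam * (z_b - z_a)
```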

Generalization and Hallucination of Large Vision-Language Models through a Camouflaged Lens

  • paper_url: http://arxiv.org/abs/2311.11273
  • repo_url: None
  • paper_authors: Lv Tang, Peng-Tao Jiang, Zhihao Shen, Hao Zhang, Jinwei Chen, Bo Li
  • for: This work explores whether large vision-language models (LVLMs) can generalize, in a training-free manner, to the challenging camouflaged object detection (COD) scenario.
  • methods: Proposes the camo-perceptive vision-language framework (CPVLF). During generalization, hallucination within the LVLM can cause it to erroneously perceive objects in camouflaged scenes and produce counterfactual concepts, and since the LVLM is not specifically trained for precise localization of camouflaged objects, it exhibits uncertainty in pinpointing them. A chain of visual perception is therefore proposed that enhances the LVLM's perception of camouflaged scenes from both linguistic and visual perspectives, reducing hallucination and improving localization accuracy.
  • results: CPVLF is validated on three widely used COD datasets, and the experiments show the potential of LVLMs for the COD task.
    Abstract Large Vision-Language Model (LVLM) has seen burgeoning development and increasing attention recently. In this paper, we propose a novel framework, camo-perceptive vision-language framework (CPVLF), to explore whether LVLM can generalize to the challenging camouflaged object detection (COD) scenario in a training-free manner. During the process of generalization, we find that due to hallucination issues within LVLM, it can erroneously perceive objects in camouflaged scenes, producing counterfactual concepts. Moreover, as LVLM is not specifically trained for the precise localization of camouflaged objects, it exhibits a degree of uncertainty in accurately pinpointing these objects. Therefore, we propose chain of visual perception, which enhances LVLM's perception of camouflaged scenes from both linguistic and visual perspectives, reducing the hallucination issue and improving its capability in accurately locating camouflaged objects. We validate the effectiveness of CPVLF on three widely used COD datasets, and the experiments show the potential of LVLM in the COD task.

Radarize: Large-Scale Radar SLAM for Indoor Environments

  • paper_url: http://arxiv.org/abs/2311.11260
  • repo_url: None
  • paper_authors: Emerson Sie, Xinyu Wu, Heyu Guo, Deepak Vasisht
  • for: This paper addresses SLAM in indoor environments using only a low-cost commodity single-chip mmWave radar.
  • methods: A radar-native approach that leverages phenomena unique to radio frequencies, such as Doppler shift-based odometry, to improve performance (the basic Doppler-to-velocity relation is sketched after the entry).
  • results: Evaluated on a large-scale dataset of 146 trajectories spanning 4 campus buildings (approximately 4680 m of travel), the method outperforms state-of-the-art radar-based approaches by approximately 5x on odometry and 8x on end-to-end SLAM, as measured by absolute trajectory error (ATE), without additional sensors such as IMUs or wheel odometry.
    Abstract We present Radarize, a self-contained SLAM pipeline for indoor environments that uses only a low-cost commodity single-chip mmWave radar. Our radar-native approach leverages phenomena unique to radio frequencies, such as doppler shift-based odometry, to improve performance. We evaluate our method on a large-scale dataset of 146 trajectories spanning 4 campus buildings, totaling approximately 4680m of travel distance. Our results show that our method outperforms state-of-the-art radar-based approaches by approximately 5x in terms of odometry and 8x in terms of end-to-end SLAM, as measured by absolute trajectory error (ATE), without the need for additional sensors such as IMUs or wheel odometry.
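Doppler shift-based odometry rests on the standard relation between Doppler shift and radial velocity for a monostatic radar, v = f_d * λ / 2. A minimal sketch, with a typical automotive mmWave carrier used only as an illustrative default:

```python
def doppler_radial_velocity(f_d_hz: float, carrier_hz: float = 77e9,
                            c: float = 3.0e8) -> float:
    """Radial velocity from a Doppler shift for a monostatic radar:
    v = f_d * lambda / 2. The 77 GHz carrier is an illustrative default,
    not necessarily the radar used in the paper."""
    wavelength = c / carrier_hz
    return f_d_hz * wavelength / 2.0

# Example: a 1 kHz Doppler shift at 77 GHz corresponds to ~1.95 m/s.
print(doppler_radial_velocity(1000.0))
```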

Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design

  • paper_url: http://arxiv.org/abs/2311.12067
  • repo_url: None
  • paper_authors: Jia Yu, Lichao Zhang, Zijie Chen, Fayu Pan, MiaoMiao Wen, Yuming Yan, Fangsheng Weng, Shuai Zhang, Lili Pan, Zhenzhong Lan
  • for: This work aims to advance AI applications in fashion design by addressing the lack of extensive, interrelated data on clothing and try-on stages.
  • methods: The product of multiple years' effort, the Fashion-Diffusion dataset is the first of its kind: over a million high-quality fashion images paired with detailed text descriptions, sourced from diverse geographical locations and cultural backgrounds and meticulously annotated with fine-grained clothing and human attributes, simplifying the fashion design process into a Text-to-Image (T2I) task.
  • results: Beyond high-quality text-image pairs and diverse human-garment pairs, the work proposes a new benchmark comprising multiple datasets for evaluating fashion design models, setting a new standard for AI-driven fashion design research.
    Abstract The fusion of AI and fashion design has emerged as a promising research area. However, the lack of extensive, interrelated data on clothing and try-on stages has hindered the full potential of AI in this domain. Addressing this, we present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort. This dataset, the first of its kind, comprises over a million high-quality fashion images, paired with detailed text descriptions. Sourced from a diverse range of geographical locations and cultural backgrounds, the dataset encapsulates global fashion trends. The images have been meticulously annotated with fine-grained attributes related to clothing and humans, simplifying the fashion design process into a Text-to-Image (T2I) task. The Fashion-Diffusion dataset not only provides high-quality text-image pairs and diverse human-garment pairs but also serves as a large-scale resource about humans, thereby facilitating research in T2I generation. Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models. This work represents a significant leap forward in the realm of AI-driven fashion design, setting a new standard for future research in this field.
    摘要 人工智能(AI)和时尚设计的融合已成为一个有前途的研究领域。然而,由于缺乏广泛、相互关联的服装和试穿stage数据,AI在这个领域的潜力尚未得到完全发挥。为解决这个问题,我们现在提出了时尚扩散数据集(Fashion-Diffusion dataset),这是多年的辛苦努力的产物。这个数据集包含了大量高质量的时尚图像和详细的文本描述,从多个地理位置和文化背景中收集到。图像被精心标注了细化的服装和人体特征,使得时尚设计过程被简化为文本到图像(T2I)任务。Fashion-Diffusion dataset不仅提供了高质量的文本-图像对和多样化的人类-服装对,还可以作为大规模人类资料,促进人工智能驱动的时尚设计研究。此外,为促进时尚设计领域中T2I模型的标准化,我们提议了一个新的标准套件,该套件包括多个数据集用于评估时尚设计模型的性能。这项工作代表了人工智能驱动时尚设计领域的一个重要突破,为未来这个领域的研究提供了新的标准。

Submeter-level Land Cover Mapping of Japan

  • paper_url: http://arxiv.org/abs/2311.11252
  • repo_url: None
  • paper_authors: Naoto Yokoya, Junshi Xia, Clifford Broni-Bediako
  • for: 本研究的目的是提出一种低标注成本的人在回路(human-in-the-loop)深度学习框架,用于大规模亚米级地表覆盖图的自动生成。
  • methods: 我们使用 OpenEarthMap 基准数据集,提出基于 U-Net 模型的人在回路深度学习框架,以实现全国范围的亚米级地表覆盖制图。我们仅用少量额外标注数据,对在 OpenEarthMap 上训练的 U-Net 模型进行重新训练,在全国范围内达到了 80% 的总体精度。
  • results: 我们使用日本国土地理院提供的航空影像,生成了覆盖日本全国的八类亚米级地表覆盖图,总体精度达 80%。该框架可以降低标注成本并提供高精度的制图结果,有助于利用亚米级光学遥感数据自动更新国家级地表覆盖图。
    Abstract Deep learning has shown promising performance in submeter-level mapping tasks; however, the annotation cost of submeter-level imagery remains a challenge, especially when applied on a large scale. In this paper, we present the first submeter-level land cover mapping of Japan with eight classes, at a relatively low annotation cost. We introduce a human-in-the-loop deep learning framework leveraging OpenEarthMap, a recently introduced benchmark dataset for global submeter-level land cover mapping, with a U-Net model that achieves national-scale mapping with a small amount of additional labeled data. By adding a small amount of labeled data of areas or regions where a U-Net model trained on OpenEarthMap clearly failed and retraining the model, an overall accuracy of 80\% was achieved, which is a nearly 16 percentage point improvement after retraining. Using aerial imagery provided by the Geospatial Information Authority of Japan, we create land cover classification maps of eight classes for the entire country of Japan. Our framework, with its low annotation cost and high-accuracy mapping results, demonstrates the potential to contribute to the automatic updating of national-scale land cover mapping using submeter-level optical remote sensing data. The mapping results will be made publicly available.
    摘要 深度学习在亚米级制图任务中展现出良好的性能,但亚米级影像的标注成本仍然是一大挑战,尤其是在大规模应用场景下。在这篇论文中,我们以较低的标注成本,给出了首个覆盖日本全国、包含八个类别的亚米级地表覆盖制图。我们提出了一种人在回路深度学习框架,利用最近发布的全球亚米级地表覆盖制图基准数据集 OpenEarthMap,以 U-Net 模型在仅需少量额外标注数据的情况下实现全国尺度制图。通过在 OpenEarthMap 上训练的 U-Net 模型明显失效的区域补充少量标注数据并重新训练,总体精度达到 80%,相比重训练前提高了近 16 个百分点。我们使用日本国土地理院提供的航空影像,生成了日本全国的八类地表覆盖分类图。该框架凭借其低标注成本和高精度制图结果,展示了利用亚米级光学遥感数据自动更新国家级地表覆盖图的潜力。制图结果将公开发布。
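
The human-in-the-loop step can be sketched as a loop: train, find the tiles where the model clearly fails, request labels only for those, and retrain. The toy model, random tensors, and the lowest-pixel-accuracy failure criterion below are stand-ins, not the paper's U-Net or selection rule.

```python
import torch
import torch.nn as nn

# Toy stand-ins: a tiny conv "segmenter" and random tiles/labels in place of
# OpenEarthMap imagery and the paper's U-Net (8 land-cover classes).
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 8, 1))
tiles = torch.rand(64, 3, 64, 64)
labels = torch.randint(0, 8, (64, 64, 64))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train(idx, epochs=2):
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(tiles[idx]), labels[idx])
        loss.backward()
        opt.step()

labeled = list(range(16))           # small initial labeled pool
train(labeled)

# Human-in-the-loop step: find tiles where the model fails worst
# (lowest pixel accuracy), request labels for just those, retrain.
with torch.no_grad():
    acc = (model(tiles).argmax(1) == labels).float().mean(dim=(1, 2))
worst = acc.argsort()[:8].tolist()  # these would go to an annotator
labeled += worst                    # pretend the annotator labeled them
train(labeled)
```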

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

  • paper_url: http://arxiv.org/abs/2311.11243
  • repo_url: None
  • paper_authors: Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen
  • for: Story visualization system that can generate diverse, high-quality, and consistent sets of story images with minimal human interactions.
  • methods: Utilizes comprehension and planning capabilities of large language models for layout planning, and leverages large-scale text-to-image models to generate sophisticated story images based on the layout.
  • results: Improves image quality and allows easy and intuitive user interactions, and generates multi-view consistent character images without reliance on human labor.
  • for: 故事可视化系统,能以最少的人工交互生成多样、高质量且一致的故事图像。
  • methods: 利用大语言模型的理解与规划能力进行布局规划,再利用大规模文本到图像模型根据布局生成复杂的故事图像。
  • results: 提高图片质量,使用户交互更加容易和直观,同时自动生成多视图一致的人物图片。
    Abstract Story visualization aims to generate a series of images that match the story described in texts, and it requires the generated images to satisfy high quality, alignment with the text description, and consistency in character identities. Given the complexity of story visualization, existing methods drastically simplify the problem by considering only a few specific characters and scenarios, or requiring the users to provide per-image control conditions such as sketches. However, these simplifications render these methods incompetent for real applications. To this end, we propose an automated story visualization system that can effectively generate diverse, high-quality, and consistent sets of story images, with minimal human interactions. Specifically, we utilize the comprehension and planning capabilities of large language models for layout planning, and then leverage large-scale text-to-image models to generate sophisticated story images based on the layout. We empirically find that sparse control conditions, such as bounding boxes, are suitable for layout planning, while dense control conditions, e.g., sketches and keypoints, are suitable for generating high-quality image content. To obtain the best of both worlds, we devise a dense condition generation module to transform simple bounding box layouts into sketch or keypoint control conditions for final image generation, which not only improves the image quality but also allows easy and intuitive user interactions. In addition, we propose a simple yet effective method to generate multi-view consistent character images, eliminating the reliance on human labor to collect or draw character images.
    摘要 故事可视化旨在生成与文本描述相匹配的一系列图像,要求生成的图像具有高质量、与文本描述对齐,并保持人物身份的一致性。鉴于故事可视化的复杂性,现有方法通常大幅简化问题:仅考虑少数特定的人物和场景,或要求用户提供逐图控制条件(如素描)。然而,这些简化使得这些方法无法满足实际应用需求。为此,我们提出了一个自动化的故事可视化系统,能够在极少人工交互下生成多样、高质量且一致的故事图像。我们利用大语言模型的理解和规划能力进行布局规划,然后利用大规模文本到图像模型基于布局生成复杂的故事图像。我们的实证发现:边界框等稀疏控制条件适合布局规划,而素描、关键点等稠密控制条件适合生成高质量图像内容。为兼得两者之长,我们设计了一个稠密条件生成模块,将简单的边界框布局转换为素描或关键点控制条件用于最终图像生成,这不仅提高了图像质量,还支持简便直观的用户交互。此外,我们提出了一种简单而有效的方法来生成多视角一致的人物图像,免除了收集或绘制人物图像的人工成本。
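
The sparse-to-dense conversion the paper describes starts from a box layout; a minimal stand-in is rasterizing each planned box into a per-entity condition mask that a layout-conditioned generator could consume. The entity names and coordinates below are made up, and the paper's actual module produces sketches/keypoints rather than plain boxes.

```python
import numpy as np

def boxes_to_condition(boxes, size=(256, 256)):
    """Rasterize an LLM-planned box layout into per-entity binary masks.

    boxes: list of (name, x0, y0, x1, y1) in pixel coordinates.
    """
    masks = {}
    for name, x0, y0, x1, y1 in boxes:
        m = np.zeros(size, dtype=np.float32)
        m[y0:y1, x0:x1] = 1.0  # rows are y, columns are x
        masks[name] = m
    return masks

layout = [("hero", 30, 80, 120, 240), ("dragon", 140, 40, 250, 220)]
cond = boxes_to_condition(layout)
print({k: int(v.sum()) for k, v in cond.items()})  # per-entity mask areas
```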

Open-Vocabulary Camouflaged Object Segmentation

  • paper_url: http://arxiv.org/abs/2311.11241
  • repo_url: None
  • paper_authors: Youwei Pang, Xiaoqi Zhao, Jiaming Zuo, Lihe Zhang, Huchuan Lu
  • for: 该论文主要研究开放词汇伪装物体分割(open-vocabulary camouflaged object segmentation, OVCOS)这一新任务。
  • methods: 使用预训练的大规模视觉语言模型(CLIP),结合迭代语义引导与结构增强等方法,实现开放词汇伪装物体分割。
  • results: 在新构建的OVCamo数据集上取得最优结果,以较大优势超越此前的开放词汇语义图像分割方法。
    Abstract Recently, the emergence of the large-scale vision-language model (VLM), such as CLIP, has opened the way towards open-world object perception. Many works has explored the utilization of pre-trained VLM for the challenging open-vocabulary dense prediction task that requires perceive diverse objects with novel classes at inference time. Existing methods construct experiments based on the public datasets of related tasks, which are not tailored for open vocabulary and rarely involves imperceptible objects camouflaged in complex scenes due to data collection bias and annotation costs. To fill in the gaps, we introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS) and construct a large-scale complex scene dataset (\textbf{OVCamo}) which containing 11,483 hand-selected images with fine annotations and corresponding object classes. Further, we build a strong single-stage open-vocabulary \underline{c}amouflaged \underline{o}bject \underline{s}egmentation transform\underline{er} baseline \textbf{OVCoser} attached to the parameter-fixed CLIP with iterative semantic guidance and structure enhancement. By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects. Moreover, this effective framework also surpasses previous state-of-the-arts of open-vocabulary semantic image segmentation by a large margin on our OVCamo dataset. With the proposed dataset and baseline, we hope that this new task with more practical value can further expand the research on open-vocabulary dense prediction tasks.
    摘要 最近,CLIP等大规模视觉语言模型(VLM)的出现,为开放世界物体感知开辟了新途径。许多研究利用预训练VLM完成具有挑战性的开放词汇稠密预测任务,即在推理时感知多种新类别的物体。现有方法基于相关任务的公共数据集构建实验,这些数据集并非为开放词汇量身定制,且由于数据采集偏差与标注成本,很少涉及复杂场景中难以察觉的伪装物体。为填补这一空白,我们提出新任务——开放词汇伪装物体分割(OVCOS),并构建了大规模复杂场景数据集OVCamo,包含11,483张精挑细选、带精细标注及对应类别的图像。此外,我们在参数固定的CLIP之上构建了一个强大的单阶段开放词汇伪装物体分割Transformer基线OVCoser,并引入迭代语义引导与结构增强。通过融合类别语义知识的引导,并补充来自边缘与深度信息的视觉结构线索,所提方法能够高效捕捉伪装物体。此外,该框架在我们的OVCamo数据集上以较大优势超越了此前开放词汇语义图像分割的最优方法。借助所提数据集与基线,我们希望这一更具实用价值的新任务能进一步拓展开放词汇稠密预测任务的研究。
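
The open-vocabulary ingredient rests on CLIP's image–text similarity. A minimal sketch with Hugging Face's CLIP scores an image against free-form class names; the image path and class strings are placeholders, and this is only the scoring primitive, not the paper's OVCoser architecture.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")  # placeholder path to a camouflage photo
classes = ["a camouflaged owl", "a camouflaged crab", "background foliage"]

inputs = processor(text=classes, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
for name, p in zip(classes, probs[0].tolist()):
    print(f"{name}: {p:.3f}")  # which free-form label best fits the image
```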

Enhancing Radiology Diagnosis through Convolutional Neural Networks for Computer Vision in Healthcare

  • paper_url: http://arxiv.org/abs/2311.11234
  • repo_url: None
  • paper_authors: Keshav Kumar K., Dr N V S L Narasimham
  • for: 这个研究探讨了使用卷积神经网络(CNNs)在医学诊断中的转变力量,特别是其可解性、效果和伦理问题。
  • methods: 该研究使用改进的DenseNet架构,并通过对比分析表明其在特异性、敏感性和准确率方面表现出色。
  • results: 研究结果表明,CNN在放射诊断中优于传统方法,但需要解决可解释性问题并持续改进模型;同时,互操作性等集成问题与放射科医生的培训也需要纳入考虑。
    Abstract The transformative power of Convolutional Neural Networks (CNNs) in radiology diagnostics is examined in this study, with a focus on interpretability, effectiveness, and ethical issues. With an altered DenseNet architecture, the CNN performs admirably in terms of particularity, sensitivity, as well as accuracy. Its superiority over conventional methods is validated by comparative analyses, which highlight efficiency gains. Nonetheless, interpretability issues highlight the necessity of sophisticated methods in addition to continuous model improvement. Integration issues like interoperability and radiologists' training lead to suggestions for teamwork. Systematic consideration of the ethical implications is carried out, necessitating extensive frameworks. Refinement of architectures, interpretability, alongside ethical considerations need to be prioritized in future work for responsible CNN deployment in radiology diagnostics.
    摘要 本研究探讨了卷积神经网络(CNN)在放射诊断中的变革性力量,重点关注可解释性、有效性与伦理问题。通过改进的DenseNet架构,CNN在特异性、敏感性与准确率方面表现出色;对比分析凸显了效率提升,验证了其相对传统方法的优势。然而,可解释性问题表明,除了持续改进模型之外,还需要更精巧的方法。互操作性与放射科医生培训等集成问题也引出了关于协作的建议。研究系统地考虑了伦理影响,指出需要完善的伦理框架。未来工作应优先考虑架构优化、可解释性与伦理因素,以实现CNN在放射诊断中的负责任部署。
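
The standard recipe for adapting DenseNet to a radiology task is to swap the classifier head and evaluate with sensitivity/specificity. The sketch below uses torchvision's DenseNet-121 with a placeholder binary task and synthetic tensors; the paper's specific architectural alterations are not public, so this shows only the generic adaptation.

```python
import torch
import torch.nn as nn
from torchvision import models

# Adapt DenseNet-121 to a binary (normal vs. abnormal) radiograph task.
model = models.densenet121(weights=None)
model.classifier = nn.Linear(model.classifier.in_features, 2)

def sensitivity_specificity(pred, target):
    tp = float(((pred == 1) & (target == 1)).sum())
    tn = float(((pred == 0) & (target == 0)).sum())
    fp = float(((pred == 1) & (target == 0)).sum())
    fn = float(((pred == 0) & (target == 1)).sum())
    return tp / (tp + fn + 1e-8), tn / (tn + fp + 1e-8)

x = torch.rand(8, 3, 224, 224)      # stand-in for radiographs
target = torch.randint(0, 2, (8,))
pred = model(x).argmax(dim=1)
sens, spec = sensitivity_specificity(pred, target)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```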

GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise

  • paper_url: http://arxiv.org/abs/2311.11221
  • repo_url: None
  • paper_authors: Xinhai Li, Huaibin Wang, Kuo-Kun Tseng
  • for: 这篇论文旨在提出一种基于 Gaussian splatting 的文本到三维内容生成框架,以实现更加真实的三维图像生成。
  • methods: 该框架使用 Gaussian splatting 技术,通过控制单个高斯球体的透明度来调节图像饱和度,从而生成更真实的图像。此外,论文还提出了一种基于多视角噪声分布的方法,以解决多视角几何一致性问题。
  • results: 与传统的逐点采样技术相比,Gaussian splatting 能生成细节更丰富、饱和度更合理的图像。此外,论文还证明了所提的变分 Gaussian splatting 方法可以减少浮影、毛刺等伪影,提高三维生成的质量和稳定性。
    Abstract Text-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, the amalgamation of Nerf and 2D diffusion models frequently yields oversaturated images, posing severe limitations on downstream industrial applications due to the constraints of pixelwise rendering method. Gaussian splatting has recently superseded the traditional pointwise sampling technique prevalent in NeRF-based methodologies, revolutionizing various aspects of 3D reconstruction. This paper introduces a novel text to 3D content generation framework based on Gaussian splatting, enabling fine control over image saturation through individual Gaussian sphere transparencies, thereby producing more realistic images. The challenge of achieving multi-view consistency in 3D generation significantly impedes modeling complexity and accuracy. Taking inspiration from SJC, we explore employing multi-view noise distributions to perturb images generated by 3D Gaussian splatting, aiming to rectify inconsistencies in multi-view geometry. We ingeniously devise an efficient method to generate noise that produces Gaussian noise from diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts like floaters, burrs, or proliferative elements. To mitigate these issues, we propose the variational Gaussian splatting technique to enhance the quality and stability of 3D appearance. To our knowledge, our approach represents the first comprehensive utilization of Gaussian splatting across the entire spectrum of 3D content generation processes.
    摘要 文本到3D技术因其高效的生成方法和广阔的创作潜力,在AIGC领域引起了广泛关注。然而,将NeRF与2D扩散模型结合往往产生过度饱和的图像,由于像素级渲染方法的限制,严重制约了下游工业应用。Gaussian splatting近来取代了NeRF类方法中常用的传统逐点采样技术,革新了三维重建的诸多方面。本文提出一种基于Gaussian splatting的文本到3D内容生成框架,可通过单个高斯球体的透明度精细控制图像饱和度,从而生成更真实的图像。三维生成中实现多视角一致性的难题严重制约着建模的复杂度与精度。受SJC启发,我们探索使用多视角噪声分布来扰动3D Gaussian splatting生成的图像,以修正多视角几何上的不一致。我们设计了一种高效的噪声生成方法,可从同一个共享噪声源出发,为不同视角生成高斯噪声。此外,普通的基于3D高斯的生成容易使模型陷入局部极小值,产生浮影、毛刺或增生元素等伪影。为缓解这些问题,我们提出变分Gaussian splatting技术,以提升三维外观的质量与稳定性。据我们所知,本方法是首次在3D内容生成全流程中全面应用Gaussian splatting。
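
One standard way to give every view noise derived from a shared source, while keeping each view's noise exactly unit Gaussian, is the mixing ε_v = √ρ·ε_shared + √(1−ρ)·ε_private. This is an illustrative construction for the "shared noise source" idea, not necessarily the paper's exact scheme (which couples noise across viewpoints geometrically).

```python
import torch

def multiview_noise(n_views, shape, rho=0.8, generator=None):
    """Per-view Gaussian noise sharing a common component.

    Each view's noise stays unit Gaussian (variance rho + (1 - rho) = 1),
    while any two views are correlated with coefficient `rho` through the
    shared source -- one simple way to couple the noise that perturbs
    renders from different viewpoints.
    """
    shared = torch.randn(shape, generator=generator)
    views = []
    for _ in range(n_views):
        private = torch.randn(shape, generator=generator)
        views.append(rho ** 0.5 * shared + (1 - rho) ** 0.5 * private)
    return torch.stack(views)

noise = multiview_noise(4, (3, 64, 64), rho=0.8)
print(noise.shape, round(noise.std().item(), 3))  # per-view std stays ~1.0
```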

Infrared image identification method of substation equipment fault under weak supervision

  • paper_url: http://arxiv.org/abs/2311.11214
  • repo_url: None
  • paper_authors: Anjali Sharma, Priya Banerjee, Nikhil Singh
  • for: 本研究旨在提出一种弱监督方法,用于检测变电站设备的故障。
  • methods: 该方法通过修改模型的网络结构和参数,提高设备识别精度。
  • results: 研究表明,该方法可以准确识别多种设备类型的故障;与人工标注结果的对比验证了该算法的高精度。
    Abstract This study presents a weakly supervised method for identifying faults in infrared images of substation equipment. It utilizes the Faster RCNN model for equipment identification, enhancing detection accuracy through modifications to the model's network structure and parameters. The method is exemplified through the analysis of infrared images captured by inspection robots at substations. Performance is validated against manually marked results, demonstrating that the proposed algorithm significantly enhances the accuracy of fault identification across various equipment types.
    摘要 这项研究提出了一种弱监督方法,用于识别变电站设备红外图像中的故障。该方法利用Faster RCNN模型进行设备识别,并通过修改模型的网络结构和参数来提升检测精度。我们以变电站巡检机器人采集的红外图像为例进行分析,并与人工标注结果对比验证,结果表明所提出的算法能显著提升多种设备类型的故障识别精度。
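
Adapting Faster R-CNN to custom equipment classes follows torchvision's stock recipe: swap the box predictor head and train on (possibly noisy) boxes. The class count, image, and target below are placeholders; the paper's specific structural and parameter changes are not reproduced here.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 6  # placeholder: 5 equipment types + background
model = fasterrcnn_resnet50_fpn(weights=None)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# One training step from a (weak) box annotation on an infrared frame.
images = [torch.rand(3, 480, 640)]                # stand-in IR image
targets = [{"boxes": torch.tensor([[50.0, 60.0, 200.0, 220.0]]),
            "labels": torch.tensor([1])}]
model.train()
losses = model(images, targets)                   # dict of detection losses
total = sum(losses.values())
total.backward()
print({k: round(v.item(), 3) for k, v in losses.items()})
```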

HiH: A Multi-modal Hierarchy in Hierarchy Network for Unconstrained Gait Recognition

  • paper_url: http://arxiv.org/abs/2311.11210
  • repo_url: None
  • paper_authors: Lei Wang, Yinchi Ma, Peng Luan, Wei Yao, Congcong Li, Bo Liu
  • For: Robust gait recognition in unconstrained environments, addressing challenges such as view changes, occlusions, and varying walking speeds, as well as cross-modality incompatibility.* Methods: A multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences, featuring Hierarchical Gait Decomposer (HGD) modules and auxiliary branches for 2D joint sequences, including Deformable Spatial Enhancement (DSE) and Deformable Temporal Alignment (DTA) modules.* Results: State-of-the-art performance in gait recognition, with a well-balanced trade-off between accuracy and efficiency, demonstrated through extensive evaluations across diverse indoor and outdoor datasets.
    Abstract Gait recognition has achieved promising advances in controlled settings, yet it significantly struggles in unconstrained environments due to challenges such as view changes, occlusions, and varying walking speeds. Additionally, efforts to fuse multiple modalities often face limited improvements because of cross-modality incompatibility, particularly in outdoor scenarios. To address these issues, we present a multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences for robust gait recognition. HiH features a main branch that utilizes Hierarchical Gait Decomposer (HGD) modules for depth-wise and intra-module hierarchical examination of general gait patterns from silhouette data. This approach captures motion hierarchies from overall body dynamics to detailed limb movements, facilitating the representation of gait attributes across multiple spatial resolutions. Complementing this, an auxiliary branch, based on 2D joint sequences, enriches the spatial and temporal aspects of gait analysis. It employs a Deformable Spatial Enhancement (DSE) module for pose-guided spatial attention and a Deformable Temporal Alignment (DTA) module for aligning motion dynamics through learned temporal offsets. Extensive evaluations across diverse indoor and outdoor datasets demonstrate HiH's state-of-the-art performance, affirming a well-balanced trade-off between accuracy and efficiency.
    摘要 步态识别在受控环境中已取得可喜进展,但在无约束环境中仍面临视角变化、遮挡和行走速度变化等挑战;此外,多模态融合往往因跨模态不兼容(尤其是户外场景)而收效有限。为此,我们提出一种多模态层级嵌套网络(HiH),融合轮廓序列与姿态序列以实现鲁棒的步态识别。主分支利用层级步态分解器(HGD)模块,对轮廓数据中的一般步态模式进行深度方向与模块内的层级分析,从整体身体动态到细部肢体运动捕捉运动层级,在多个空间分辨率上表示步态属性。辅助分支基于二维关节序列,通过可变形空间增强(DSE)模块实现姿态引导的空间注意力,并通过可变形时间对齐(DTA)模块利用学习到的时间偏移对齐运动动态。在多个室内外数据集上的大量评估表明,HiH取得了最优性能,并在精度与效率之间实现了良好的平衡。

3D Guidewire Shape Reconstruction from Monoplane Fluoroscopic Images

  • paper_url: http://arxiv.org/abs/2311.11209
  • repo_url: None
  • paper_authors: Tudor Jianu, Baoru Huang, Pierre Berthet-Rayne, Sebastiano Fichera, Anh Nguyen
  • for: used to reconstruct 3D guidewires in endovascular interventions, reducing radiation exposure and improving accuracy.
  • methods: utilizes CathSim, a state-of-the-art endovascular simulator, and a 3D Fluoroscopy Guidewire Reconstruction Network (3D-FGRN) to reconstruct the 3D guidewire from simulated monoplane fluoroscopic images.
  • results: delivers results on par with conventional triangulation methods, demonstrating the efficiency and potential of the proposed network.
    Abstract Endovascular navigation, essential for diagnosing and treating endovascular diseases, predominantly hinges on fluoroscopic images due to the constraints in sensory feedback. Current shape reconstruction techniques for endovascular intervention often rely on either a priori information or specialized equipment, potentially subjecting patients to heightened radiation exposure. While deep learning holds potential, it typically demands extensive data. In this paper, we propose a new method to reconstruct the 3D guidewire by utilizing CathSim, a state-of-the-art endovascular simulator, and a 3D Fluoroscopy Guidewire Reconstruction Network (3D-FGRN). Our 3D-FGRN delivers results on par with conventional triangulation from simulated monoplane fluoroscopic images. Our experiments accentuate the efficiency of the proposed network, demonstrating it as a promising alternative to traditional methods.
    摘要 血管内导航是诊断和治疗血管内疾病的关键环节,由于感知反馈受限,主要依赖X射线透视图像。现有的血管内介入形状重建技术通常依赖先验信息或专用设备,可能使患者承受更高的辐射剂量。深度学习虽有潜力,但通常需要大量数据。本文提出一种新方法,利用最先进的血管内介入模拟器CathSim与三维透视导丝重建网络(3D-FGRN),从模拟的单平面透视图像中重建三维导丝。3D-FGRN取得了与传统三角测量相当的结果,实验突显了所提网络的效率,表明其是传统方法的有前景替代方案。

LogicNet: A Logical Consistency Embedded Face Attribute Learning Network

  • paper_url: http://arxiv.org/abs/2311.11208
  • repo_url: None
  • paper_authors: Haiyu Wu, Sicong Tian, Huayu Li, Kevin W. Bowyer
  • for: 提高多属性分类中逻辑一致性的可靠性
  • methods: 引入两个挑战:1) 当训练数据经过逻辑一致性检查后,如何确保模型给出逻辑一致的预测?2) 在数据未经过逻辑一致性检查时,又如何实现这一点?为此提出两个数据集(FH41K和CelebA-logic)以及对抗训练框架LogicNet,用于学习属性之间的逻辑关系。
  • results: LogicNet在FH37K、FH41K和CelebA-logic上的准确率分别比次优方法高出23.05%、9.96%和1.71%;在真实场景案例分析中,我们的方法可将平均失败案例数较其他方法减少50%以上。
    Abstract Ensuring logical consistency in predictions is a crucial yet overlooked aspect in multi-attribute classification. We explore the potential reasons for this oversight and introduce two pressing challenges to the field: 1) How can we ensure that a model, when trained with data checked for logical consistency, yields predictions that are logically consistent? 2) How can we achieve the same with data that hasn't undergone logical consistency checks? Minimizing manual effort is also essential for enhancing automation. To address these challenges, we introduce two datasets, FH41K and CelebA-logic, and propose LogicNet, an adversarial training framework that learns the logical relationships between attributes. Accuracy of LogicNet surpasses that of the next-best approach by 23.05%, 9.96%, and 1.71% on FH37K, FH41K, and CelebA-logic, respectively. In real-world case analysis, our approach can achieve a reduction of more than 50% in the average number of failed cases compared to other methods.
    摘要 确保预测的逻辑一致性是多属性分类中一个重要却被忽视的方面。我们探讨了造成这种忽视的可能原因,并提出该领域两个紧迫的挑战:1)当训练数据经过逻辑一致性检查后,如何确保模型给出逻辑一致的预测?2)当数据未经过逻辑一致性检查时,又如何实现这一点?同时,为了提升自动化程度,尽量减少人工投入也至关重要。为解决这些挑战,我们引入两个数据集FH41K和CelebA-logic,并提出LogicNet——一种学习属性间逻辑关系的对抗训练框架。LogicNet在FH37K、FH41K和CelebA-logic上的准确率分别比次优方法高出23.05%、9.96%和1.71%。在真实场景案例分析中,我们的方法可将平均失败案例数较其他方法减少50%以上。
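
A generic way to make implication rules between attributes differentiable is to penalize relu(p_a − p_b) for each rule "attribute a implies attribute b": the penalty is positive exactly when p(a) is high but p(b) is low. This illustrates the idea of learning logical relationships between attributes; LogicNet's actual adversarial training objective is more involved and not reproduced here.

```python
import torch

def implication_penalty(probs, rules):
    """Penalty for violated implications between predicted attributes.

    probs: (batch, n_attrs) sigmoid outputs.
    rules: list of (a, b) index pairs meaning "attribute a implies b".
    """
    penalty = probs.new_zeros(())
    for a, b in rules:
        # Violated when p(a) is high but p(b) is low.
        penalty = penalty + torch.relu(probs[:, a] - probs[:, b]).mean()
    return penalty

probs = torch.tensor([[0.9, 0.2, 0.8],
                      [0.1, 0.7, 0.6]])
rules = [(0, 1)]  # hypothetical rule: attribute 0 implies attribute 1
print(implication_penalty(probs, rules))  # 0.35: only row 0 violates it
```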

Shape-Sensitive Loss for Catheter and Guidewire Segmentation

  • paper_url: http://arxiv.org/abs/2311.11205
  • repo_url: None
  • paper_authors: Chayun Kongtongvattana, Baoru Huang, Jingxuan Kang, Hoan Nguyen, Olajide Olufemi, Anh Nguyen
  • for: 本论文旨在提升X射线图像中导管(catheter)与导丝(guidewire)的分割精度。
  • methods: 提出一种形状敏感的损失函数,并将其应用于视觉Transformer网络。
  • results: 在大规模X射线图像数据集上取得新的最优结果;方法将预测与真值转换为有符号距离图,经视觉Transformer得到高维特征向量,并基于余弦相似度衡量图像相似性。
    Abstract We introduce a shape-sensitive loss function for catheter and guidewire segmentation and utilize it in a vision transformer network to establish a new state-of-the-art result on a large-scale X-ray images dataset. We transform network-derived predictions and their corresponding ground truths into signed distance maps, thereby enabling any networks to concentrate on the essential boundaries rather than merely the overall contours. These SDMs are subjected to the vision transformer, efficiently producing high-dimensional feature vectors encapsulating critical image attributes. By computing the cosine similarity between these feature vectors, we gain a nuanced understanding of image similarity that goes beyond the limitations of traditional overlap-based measures. The advantages of our approach are manifold, ranging from scale and translation invariance to superior detection of subtle differences, thus ensuring precise localization and delineation of the medical instruments within the images. Comprehensive quantitative and qualitative analyses substantiate the significant enhancement in performance over existing baselines, demonstrating the promise held by our new shape-sensitive loss function for improving catheter and guidewire segmentation.
    摘要 我们提出了一种形状敏感的损失函数,用于导管与导丝分割,并将其应用于视觉Transformer网络,在大规模X射线图像数据集上取得了新的最优结果。我们将网络预测结果及其对应的真值转换为有符号距离图,使网络能够专注于关键边界,而不仅仅是整体轮廓。这些有符号距离图经视觉Transformer高效地转换为高维特征向量,其中封装了关键的图像属性。通过计算这些特征向量之间的余弦相似度,我们获得了超越传统基于重叠度量的、更细腻的图像相似性理解。该方法具有多方面优势,包括尺度与平移不变性,以及对细微差异更强的检测能力,从而确保医疗器械在图像中得到精确的定位与勾画。全面的定量与定性分析证实了其相较现有基线的显著性能提升,展示了这种新的形状敏感损失函数在改进导管与导丝分割方面的前景。
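
The two building blocks are easy to sketch: converting a binary mask into a signed distance map (positive outside the object, negative inside), and comparing flattened representations by cosine similarity. The toy masks below are placeholders, and the paper additionally passes the distance maps through a vision transformer before comparison.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask):
    """Signed distance map: negative inside the object, positive outside."""
    inside = distance_transform_edt(mask)        # distance to background
    outside = distance_transform_edt(1 - mask)   # distance to foreground
    return outside - inside

def cosine_similarity(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Two thin, slightly shifted "guidewire" masks (synthetic stand-ins).
pred = np.zeros((64, 64), dtype=np.uint8); pred[20:40, 30:34] = 1
gt   = np.zeros((64, 64), dtype=np.uint8); gt[21:41, 30:34] = 1
print(cosine_similarity(signed_distance_map(pred), signed_distance_map(gt)))
```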

Self-Supervised Versus Supervised Training for Segmentation of Organoid Images

  • paper_url: http://arxiv.org/abs/2311.11198
  • repo_url: None
  • paper_authors: Asmaa Haja, Eric Brouwer, Lambert Schomaker
  • for: 本研究旨在提高数字显微镜技术中数据标注的效率和可靠性,以便更好地利用深度学习算法进行图像分类和分割。
  • methods: 本研究使用自监督学习(SSL)技术:在与主任务相似的前置任务(pretext task)下学习内在特征,无需标注数据。研究使用ResNet50 U-Net模型,先在增强图像上进行图像恢复训练,再将权重迁移到图像分割任务。
  • results: 结果表明,在使用IoU损失时,采用25%像素丢弃或图像模糊增强的自监督模型优于其他增强策略。仅用114张图像训练主任务时,自监督方法的F1分数达0.85且更稳定,高于监督方法的0.78;当训练集扩大到1000张图像时,自监督方法仍然更优(F1为0.92对0.85)。
    Abstract The process of annotating relevant data in the field of digital microscopy can be both time-consuming and especially expensive due to the required technical skills and human-expert knowledge. Consequently, large amounts of microscopic image data sets remain unlabeled, preventing their effective exploitation using deep-learning algorithms. In recent years it has been shown that a lot of relevant information can be drawn from unlabeled data. Self-supervised learning (SSL) is a promising solution based on learning intrinsic features under a pretext task that is similar to the main task without requiring labels. The trained result is transferred to the main task - image segmentation in our case. A ResNet50 U-Net was first trained to restore images of liver progenitor organoids from augmented images using the Structural Similarity Index Metric (SSIM), alone, and using SSIM combined with L1 loss. Both the encoder and decoder were trained in tandem. The weights were transferred to another U-Net model designed for segmentation with frozen encoder weights, using Binary Cross Entropy, Dice, and Intersection over Union (IoU) losses. For comparison, we used the same U-Net architecture to train two supervised models, one utilizing the ResNet50 encoder as well as a simple CNN. Results showed that self-supervised learning models using a 25\% pixel drop or image blurring augmentation performed better than the other augmentation techniques using the IoU loss. When trained on only 114 images for the main task, the self-supervised learning approach outperforms the supervised method achieving an F1-score of 0.85, with higher stability, in contrast to an F1=0.78 scored by the supervised method. Furthermore, when trained with larger data sets (1,000 images), self-supervised learning is still able to perform better, achieving an F1-score of 0.92, contrasting to a score of 0.85 for the supervised method.
    摘要 数字显微镜领域中相关数据的标注过程既耗时,又因所需的技术能力与专家知识而代价高昂。因此,大量显微图像数据集仍处于无标注状态,难以被深度学习算法有效利用。近年来的研究表明,无标注数据中也能提取大量有用信息。自监督学习(SSL)是一种有前景的方案,其在与主任务相似的前置任务下学习内在特征,无需标签;训练结果随后迁移到主任务——在我们的场景中即图像分割。我们首先训练ResNet50 U-Net,分别使用结构相似性指标(SSIM)以及SSIM与L1损失的组合,从增强图像中恢复肝祖细胞类器官图像,编码器与解码器同时训练。随后将权重迁移到另一个用于分割的U-Net模型(冻结编码器权重),使用二元交叉熵、Dice与交并比(IoU)损失训练。作为对比,我们用同样的U-Net架构训练了两个监督模型,其一使用ResNet50编码器,另一为简单CNN。结果显示,使用25%像素丢弃或图像模糊增强的自监督模型在IoU损失下优于其他增强方式。仅用114张图像训练主任务时,自监督方法以F1=0.85超过监督方法的F1=0.78,且更为稳定;在更大的数据集(1000张图像)上训练时,自监督学习仍然表现更好,F1分数为0.92,而监督方法为0.85。
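
The restoration pretext loss can be sketched as (1 − SSIM) plus a weighted L1 term. The SSIM below uses a uniform 7×7 window for brevity; the paper's exact SSIM settings (window, weighting) may differ, and the input tensors are random stand-ins for augmented/clean organoid images.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, window=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Mean SSIM with a uniform window; images assumed scaled to [0, 1]."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, 1, pad)
    mu_y = F.avg_pool2d(y, window, 1, pad)
    var_x = F.avg_pool2d(x * x, window, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, 1, pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def restoration_loss(restored, original, l1_weight=1.0):
    return (1 - ssim(restored, original)) + l1_weight * F.l1_loss(restored, original)

restored, original = torch.rand(2, 1, 96, 96), torch.rand(2, 1, 96, 96)
print(restoration_loss(restored, original).item())
```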

cs.AI - 2023-11-19

LLM aided semi-supervision for Extractive Dialog Summarization

  • paper_url: http://arxiv.org/abs/2311.11462
  • repo_url: None
  • paper_authors: Nishant Mishra, Gaurav Sahu, Iacer Calixto, Ameen Abu-Hanna, Issam H. Laradji
  • for: 提升客服对话摘要的效果。
  • methods: 使用当前最优的大语言模型(LLM)为对话生成伪标签,再用这些伪标签微调模型,从而将知识迁移到更小的专用模型中。
  • results: 在TweetSumm数据集上,仅用10%的原始标注数据即可达到65.9/57.0/61.0的ROUGE-1/-2/-L,而使用全部训练数据的当前最优方法为65.16/55.81/64.37;即使在最差情况(ROUGE-L)下,仍保留了94.7%的性能。
    Abstract Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to fine-tune a chat summarization model, effectively transferring knowledge from the large LLM into a smaller specialized model. We demonstrate our method on the TweetSumm dataset, and show that using 10% of the original labelled data set we can achieve 65.9/57.0/61.0 ROUGE-1/-2/-L, whereas the current state-of-the-art trained on the entire training data set obtains 65.16/55.81/64.37 ROUGE-1/-2/-L. In other words, in the worst case (i.e., ROUGE-L) we still effectively retain 94.7% of the performance while using only 10% of the data.
    摘要 为客服对话生成高质量摘要通常需要大量标注数据。我们提出一种方法,高效利用无标注数据进行客服对话的抽取式摘要。在我们的方法中,我们将摘要任务视为问答问题,使用当前最优的大语言模型(LLM)为对话生成伪标签;然后使用这些伪标签微调一个对话摘要模型,从而将大型LLM中的知识迁移到更小的专用模型中。我们在TweetSumm数据集上验证该方法:仅使用10%的原始标注数据即可达到65.9/57.0/61.0的ROUGE-1/-2/-L,而在全部训练数据上训练的当前最优方法为65.16/55.81/64.37。换句话说,在最差情况(即ROUGE-L)下,我们仅用10%的数据仍有效保留了94.7%的性能。
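
The pseudo-labeling step can be pictured as QA-style prompting: ask an LLM which dialog turns are salient, then keep the selected indices as binary extractive labels for fine-tuning the smaller model. `ask_llm` below is a hypothetical stand-in for an actual LLM call, and the prompt wording is illustrative, not the paper's.

```python
import json

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a large language model."""
    return '{"salient_turns": [0, 2]}'  # canned response for this demo

def pseudo_label(dialog_turns):
    prompt = (
        "Which turn indices best summarize this customer-agent dialog? "
        'Answer as JSON {"salient_turns": [...]}.\n'
        + "\n".join(f"{i}: {t}" for i, t in enumerate(dialog_turns))
    )
    idx = set(json.loads(ask_llm(prompt))["salient_turns"])
    # Binary extractive labels for fine-tuning a smaller summarizer.
    return [1 if i in idx else 0 for i in range(len(dialog_turns))]

turns = ["Customer: my order never arrived.",
         "Agent: sorry to hear that, checking now.",
         "Agent: a replacement ships today, refund issued."]
print(pseudo_label(turns))  # [1, 0, 1]
```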

SecureBERT and LLAMA 2 Empowered Control Area Network Intrusion Detection and Classification

  • paper_url: http://arxiv.org/abs/2311.12074
  • repo_url: None
  • paper_authors: Xuemei Li, Huirong Fu
  • for: 本研究旨在评估预训练模型在控制区网络攻击检测中的适应性。
  • methods: 我们开发了两种不同的模型:CAN-SecureBERT和CAN-LLAMA2。CAN-LLAMA2模型在准确率、精度检测率、F1分数和假阳性率方面达到了最佳性能,其中假阳性率为3.10e-6,比前一代模型MTH-IDS(多层混合攻击检测系统)的假阳性率小52倍。
  • results: 我们的研究表明,使用大语言模型作为基本模型,并在其上添加适应器以满足其他计算机安全相关任务,可以保持模型的语言相关能力,同时提高检测性能。
    Abstract Numerous studies have proved their effective strength in detecting Control Area Network (CAN) attacks. In the realm of understanding the human semantic space, transformer-based models have demonstrated remarkable effectiveness. Leveraging pre-trained transformers has become a common strategy in various language-related tasks, enabling these models to grasp human semantics more comprehensively. To delve into the adaptability evaluation on pre-trained models for CAN intrusion detection, we have developed two distinct models: CAN-SecureBERT and CAN-LLAMA2. Notably, our CAN-LLAMA2 model surpasses the state-of-the-art models by achieving an exceptional performance 0.999993 in terms of balanced accuracy, precision detection rate, F1 score, and a remarkably low false alarm rate of 3.10e-6. Impressively, the false alarm rate is 52 times smaller than that of the leading model, MTH-IDS (Multitiered Hybrid Intrusion Detection System). Our study underscores the promise of employing a Large Language Model as the foundational model, while incorporating adapters for other cybersecurity-related tasks and maintaining the model's inherent language-related capabilities.
    摘要 多个研究证明他们在检测控制区网络(CAN)攻击方面的效力是非常高的。在人类语义空间理解方面,基于转换器的模型表现了非常出色的。利用预训练转换器变得成为了许多语言相关任务中的常见策略,使得这些模型能够更全面地捕捉人类语义。为了探索预训练模型在CAN攻击检测中的适应性,我们开发了两个不同的模型:CAN-SecureBERT和CAN-LLAMA2。需要注意的是,我们的CAN-LLAMA2模型在权衡准确率、检测精度率、F1分数和false alarm rate方面达到了非常出色的表现,其中false alarm rate为3.10e-6,与领先的模型MTH-IDS(多层混合攻击检测系统)的false alarm rate相比,下降了52倍。我们的研究证明了在基于大语言模型的同时,采用适应器进行其他Cybersecurity相关任务的可行性。

Unveiling Public Perceptions: Machine Learning-Based Sentiment Analysis of COVID-19 Vaccines in India

  • paper_url: http://arxiv.org/abs/2311.11435
  • repo_url: None
  • paper_authors: Milind Gupta, Abhishek Kaushik
  • for: 本研究旨在探讨印度人民对COVID-19疫苗的看法,以便帮助印度政府成功实施疫苗接种计划。
  • methods: 本研究使用数据挖掘技术分析Reddit平台上的评论,以评估印度用户对COVID-19疫苗的看法。 Python的Text Blob库用于注释评论,以评估总体情感。
  • results: 结果显示,大多数Reddit用户在印度表达中性或无关性对疫苗接种的看法,这对印度政府的疫苗接种计划 pose 一定的挑战。
    Abstract In March 2020, the World Health Organisation declared COVID-19 a global pandemic as it spread to nearly every country. By mid-2021, India had introduced three vaccines: Covishield, Covaxin, and Sputnik. To ensure successful vaccination in a densely populated country like India, understanding public sentiment was crucial. Social media, particularly Reddit with over 430 million users, played a vital role in disseminating information. This study employs data mining techniques to analyze Reddit data and gauge Indian sentiments towards COVID-19 vaccines. Using Python's Text Blob library, comments are annotated to assess general sentiments. Results show that most Reddit users in India expressed neutrality about vaccination, posing a challenge for the Indian government's efforts to vaccinate a significant portion of the population.
    摘要 2020年3月,世界卫生组织宣布COVID-19为全球大流行,疫情蔓延至几乎每个国家。到2021年中期,印度已推出三种疫苗:Covishield、Covaxin和Sputnik。在印度这样人口密集的国家,要确保疫苗接种顺利推进,了解公众情绪至关重要。社交媒体,特别是拥有超过4.3亿用户的Reddit,在信息传播中发挥了重要作用。本研究使用数据挖掘技术分析Reddit数据,评估印度民众对COVID-19疫苗的情绪。通过Python的TextBlob库对评论进行标注,以评估总体情绪。结果显示,印度的大多数Reddit用户对疫苗接种持中立态度,这给印度政府推进大规模接种的努力带来挑战。
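
The TextBlob annotation step is straightforward: each comment gets a polarity in [-1, 1], and comments near zero are labeled neutral. The ±0.05 neutrality band below is an illustrative choice, not necessarily the threshold used in the study.

```python
from textblob import TextBlob

def label_sentiment(comment: str, band: float = 0.05) -> str:
    """Label a Reddit comment via TextBlob polarity in [-1, 1].

    The +/- `band` neutrality threshold is an assumption for this sketch.
    """
    polarity = TextBlob(comment).sentiment.polarity
    if polarity > band:
        return "positive"
    if polarity < -band:
        return "negative"
    return "neutral"

for c in ["Covishield worked great for me!",
          "Got my second dose yesterday.",
          "The side effects were awful."]:
    print(label_sentiment(c), "-", c)
```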

Appearance Codes using Joint Embedding Learning of Multiple Modalities

  • paper_url: http://arxiv.org/abs/2311.11427
  • repo_url: https://github.com/edogariu/alex-zhang
  • paper_authors: Alex Zhang, Evan Dogariu
  • for: 该文章是为了解决现有的生成模型中的一个主要限制,即需要在推理时重新训练新的外观代码。
  • methods: 该文章提出了一种框架:通过在不同模态之间施加对比损失约束,学习场景外观与结构的联合嵌入空间。
  • results: 该文章将此框架应用于 RADIATE 数据集 \cite{sheeny2021radiate} 上的一个简单变分自编码器模型,并定性展示了无需额外优化迭代即可使用日间外观编码生成夜间照片。此外,与使用标准逐图外观编码的基线 VAE 相比,该方法在推理时无需为任何未见图像学习外观编码,即可生成质量相当的结果。
    Abstract The use of appearance codes in recent work on generative modeling has enabled novel view renders with variable appearance and illumination, such as day-time and night-time renders of a scene. A major limitation of this technique is the need to re-train new appearance codes for every scene on inference, so in this work we address this problem proposing a framework that learns a joint embedding space for the appearance and structure of the scene by enforcing a contrastive loss constraint between different modalities. We apply our framework to a simple Variational Auto-Encoder model on the RADIATE dataset \cite{sheeny2021radiate} and qualitatively demonstrate that we can generate new renders of night-time photos using day-time appearance codes without additional optimization iterations. Additionally, we compare our model to a baseline VAE that uses the standard per-image appearance code technique and show that our approach achieves generations of similar quality without learning appearance codes for any unseen images on inference.
    摘要 近期的生成建模工作使用外观编码(appearance code)实现了外观与光照可变的新视角渲染,例如同一场景的日间与夜间渲染。该技术的一个主要局限是推理时需要为每个场景重新训练新的外观编码。因此,本工作提出一个框架,通过在不同模态之间施加对比损失约束,学习场景外观与结构的联合嵌入空间。我们在RADIATE数据集\cite{sheeny2021radiate}上将该框架应用于一个简单的变分自编码器模型,并定性展示了无需额外优化迭代即可用日间外观编码生成夜间照片。此外,与使用标准逐图外观编码技术的基线VAE相比,我们的方法无需在推理时为任何未见图像学习外观编码,即可生成质量相当的结果。
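
The cross-modal contrastive constraint can be realized with a standard symmetric InfoNCE loss: appearance and structure embeddings from the same scene are pulled together, mismatched pairs pushed apart. This is the generic formulation, not necessarily the paper's exact loss or temperature.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_appearance, z_structure, temperature=0.07):
    """Symmetric InfoNCE between two modality embeddings of the same scenes.

    Row i of each batch is assumed to come from the same scene, so the
    diagonal of the similarity matrix holds the positive pairs.
    """
    za = F.normalize(z_appearance, dim=-1)
    zs = F.normalize(z_structure, dim=-1)
    logits = za @ zs.t() / temperature
    targets = torch.arange(za.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

za, zs = torch.randn(16, 128), torch.randn(16, 128)  # stand-in embeddings
print(contrastive_loss(za, zs).item())
```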

LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms

  • paper_url: http://arxiv.org/abs/2311.11420
  • repo_url: None
  • paper_authors: Young D. Kwon, Jagmohan Chauhan, Hong Jia, Stylianos I. Venieris, Cecilia Mascolo
  • for: 本研究旨在开发一种具有硬件意识的元 continual learning 系统,以提高资源受限的嵌入式设备上的学习效率和适应能力。
  • methods: 本研究结合元学习与重放(rehearsal)策略,以解决数据稀缺问题并保证高精度;同时使用无损与有损压缩技术,降低CL与重放样本的资源需求。
  • results: 结果显示,LifeLearner可实现接近最优的CL性能,精度仅比Oracle基线低2.8%。与SOTA元CL方法相比,LifeLearner将内存占用降低178.7倍、端到端延迟降低80.8-94.2%、能耗降低80.9-94.2%。此外,研究者成功将LifeLearner部署在两台边缘设备和一个微控制器单元上,实现了在资源受限平台上的高效CL部署。
    Abstract Continual Learning (CL) allows applications such as user personalization and household robots to learn on the fly and adapt to context. This is an important feature when context, actions, and users change. However, enabling CL on resource-constrained embedded systems is challenging due to the limited labeled data, memory, and computing capacity. In this paper, we propose LifeLearner, a hardware-aware meta continual learning system that drastically optimizes system resources (lower memory, latency, energy consumption) while ensuring high accuracy. Specifically, we (1) exploit meta-learning and rehearsal strategies to explicitly cope with data scarcity issues and ensure high accuracy, (2) effectively combine lossless and lossy compression to significantly reduce the resource requirements of CL and rehearsal samples, and (3) developed hardware-aware system on embedded and IoT platforms considering the hardware characteristics. As a result, LifeLearner achieves near-optimal CL performance, falling short by only 2.8% on accuracy compared to an Oracle baseline. With respect to the state-of-the-art (SOTA) Meta CL method, LifeLearner drastically reduces the memory footprint (by 178.7x), end-to-end latency by 80.8-94.2%, and energy consumption by 80.9-94.2%. In addition, we successfully deployed LifeLearner on two edge devices and a microcontroller unit, thereby enabling efficient CL on resource-constrained platforms where it would be impractical to run SOTA methods and the far-reaching deployment of adaptable CL in a ubiquitous manner. Code is available at https://github.com/theyoungkwon/LifeLearner.
    摘要 持续学习(CL)使用户个性化、家用机器人等应用能够即时学习并适应情境变化,这在情境、行为和用户不断变化时尤为重要。然而,受限于标注数据、内存与计算能力,在资源受限的嵌入式系统上实现CL颇具挑战。本文提出LifeLearner,一种硬件感知的元持续学习系统,在保证高精度的同时大幅优化系统资源(更低的内存、延迟与能耗)。具体而言,我们:1. 利用元学习与重放(rehearsal)策略,直接应对数据稀缺问题并保证高精度;2. 有效结合无损与有损压缩,大幅降低CL及重放样本的资源需求;3. 针对嵌入式与物联网平台的硬件特性开发硬件感知系统。因此,LifeLearner实现了接近最优的CL性能,精度仅比Oracle基线低2.8%。相比当前最优(SOTA)的元CL方法,LifeLearner将内存占用降低178.7倍,端到端延迟降低80.8-94.2%,能耗降低80.9-94.2%。此外,我们成功将LifeLearner部署在两台边缘设备和一个微控制器单元上,使得在运行SOTA方法不切实际的资源受限平台上也能高效地进行CL,从而推动可适应CL的普适部署。代码可以在 https://github.com/theyoungkwon/LifeLearner 上下载。
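
One memory-saving pattern in the same spirit (reducing what autograd has to track on constrained devices) is to rotate `requires_grad` over parameter groups so that only a subset of parameters is trained each epoch. This is an illustrative sketch, not LifeLearner's released implementation.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
params = list(model.parameters())
opt = torch.optim.SGD(params, lr=0.1)
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))

n_groups = 2
for epoch in range(4):
    # Freeze everything except this epoch's parameter subset; gradients
    # are neither computed nor stored for the frozen parameters.
    for i, p in enumerate(params):
        p.requires_grad_(i % n_groups == epoch % n_groups)
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()   # SGD skips parameters whose grad is None
    print(f"epoch {epoch}: loss={loss.item():.3f}")
```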

A Security Risk Taxonomy for Large Language Models

  • paper_url: http://arxiv.org/abs/2311.11415
  • repo_url: None
  • paper_authors: Erik Derner, Kristina Batistič, Jan Zahálka, Robert Babuška
  • for: 这篇论文旨在评估大语言模型(LLM)的安全风险,包括诈骗、数据泄露和声誉损害等。
  • methods: 本论文提出了一个基于用户-模型交互pipeline的安全风险分类方法,包括提示型攻击。
  • results: 研究发现了许多具体的攻击示例,以及它们在实际应用中的影响。这些攻击包括诈骗、数据泄露和声誉损害等。
    Abstract As large language models (LLMs) permeate more and more applications, an assessment of their associated security risks becomes increasingly necessary. The potential for exploitation by malicious actors, ranging from disinformation to data breaches and reputation damage, is substantial. This paper addresses a gap in current research by focusing on the security risks posed by LLMs, which extends beyond the widely covered ethical and societal implications. Our work proposes a taxonomy of security risks along the user-model communication pipeline, explicitly focusing on prompt-based attacks on LLMs. We categorize the attacks by target and attack type within a prompt-based interaction scheme. The taxonomy is reinforced with specific attack examples to showcase the real-world impact of these risks. Through this taxonomy, we aim to inform the development of robust and secure LLM applications, enhancing their safety and trustworthiness.
    摘要 随着大语言模型(LLM)渗透到越来越多的应用中,评估其相关安全风险变得日益必要。恶意行为者利用LLM的潜在空间相当大,从虚假信息到数据泄露和声誉损害不一而足。本文聚焦于LLM带来的安全风险,填补了当前研究中超出广泛讨论的伦理与社会影响之外的空白。我们沿用户-模型通信管线提出一种安全风险分类法,重点关注针对LLM的基于提示(prompt)的攻击,并在基于提示的交互框架内按攻击目标和攻击类型对攻击进行分类。该分类法辅以具体攻击示例,以展示这些风险的现实影响。通过这一分类法,我们希望为开发健壮且安全的LLM应用提供参考,提升其安全性与可信度。

Make me an Offer: Forward and Reverse Auctioning Problems in the Tourism Industry

  • paper_url: http://arxiv.org/abs/2311.11400
  • repo_url: None
  • paper_authors: Ioannis T. Christou, Dimitris Doukas, Konstantina Skouri, Gerasimos Meletiou
  • for: 帮助酒店经营者与游客应对常见的季节性问题,提升旅游业的经济效益与社会影响。
  • methods: 开发了两种拍卖系统:一是正向拍卖模型,允许知名度较低地区或处于淡季的酒店对其客房进行拍卖;二是由客户发起的反向拍卖模型,类似priceline.com的竞价概念,客户发起竞价流程后,该区域的酒店可就其客房向客户报价。
  • results: 通过数学规划模型显式定义了这两类拍卖,并证明在每一类中,酒店方与客户双方都能获得显著收益。
    Abstract Most tourist destinations are facing regular and consistent seasonality with significant economic and social impacts. This phenomenon is more pronounced in the post-covid era, where demand for travel has increased but unevenly among different geographic areas. To counter these problems that both customers and hoteliers are facing, we have developed two auctioning systems that allow hoteliers of lower popularity tier areas or during low season periods to auction their rooms in what we call a forward auction model, and also allows customers to initiate a bidding process whereby hoteliers in an area may make offers to the customer for their rooms, in what constitutes a reverse auction model initiated by the customer, similar to the bidding concept of priceline.com. We develop mathematical programming models that define explicitly both types of auctions, and show that in each type, there are significant benefits to be gained both on the side of the hotelier as well as on the side of the customer. We discuss algorithmic techniques for the approximate solution of these optimization problems, and present results using exact optimization solvers to solve them to guaranteed optimality. These techniques could be beneficial to both customer and hotelier reducing seasonality during middle and low season and providing the customer with attractive offers.
    摘要 多数旅游目的地都面临着规律而持续的季节性问题,带来显著的经济与社会影响。这一现象在后疫情时代更为突出:旅游需求有所增长,但在不同地理区域间分布不均。为应对客户与酒店经营者共同面临的这些问题,我们开发了两种拍卖系统:其一为正向拍卖模型,允许知名度较低地区或处于淡季的酒店拍卖其客房;其二为由客户发起的反向拍卖模型,类似priceline.com的竞价概念,客户发起竞价流程后,该区域的酒店可向客户就客房报价。我们建立了显式定义这两类拍卖的数学规划模型,并证明每一类对酒店方和客户双方都有显著收益。我们讨论了这些优化问题的近似求解算法,并给出了使用精确优化求解器求得保证最优解的结果。这些技术有助于在淡季和平季降低季节性影响,同时为客户提供有吸引力的报价。
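
A toy clearing rule makes the two auction directions concrete: in the forward auction a hotel sells K identical rooms to the highest bids at or above a reserve price; in the reverse auction the customer simply takes the cheapest qualifying offer. These greedy rules are stand-ins for the paper's full mathematical-programming formulations, which are not reproduced here.

```python
def clear_forward_auction(bids, rooms, reserve):
    """Allocate `rooms` identical rooms to the highest bids >= reserve.

    bids: list of (customer, price). For identical single-unit bids the
    greedy rule is optimal; the paper's models handle richer settings.
    """
    eligible = sorted((b for b in bids if b[1] >= reserve),
                      key=lambda b: b[1], reverse=True)
    return eligible[:rooms]

bids = [("alice", 80), ("bob", 55), ("carol", 95), ("dan", 60)]
print(clear_forward_auction(bids, rooms=2, reserve=60))
# [('carol', 95), ('alice', 80)]

# Reverse auction initiated by the customer: pick the cheapest hotel offer.
offers = [("hotel_a", 70), ("hotel_b", 64), ("hotel_c", 88)]
print(min(offers, key=lambda o: o[1]))  # ('hotel_b', 64)
```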

Enhancing Novel Object Detection via Cooperative Foundational Models

  • paper_url: http://arxiv.org/abs/2311.12068
  • repo_url: https://github.com/rohit901/cooperative-foundational-models
  • paper_authors: Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
  • For: 本研究旨在解决新物体检测(NOD)问题,即在推理阶段准确检测已知与新的物体类别。传统目标检测算法本质上是封闭集(closed-set)的,无法处理NOD。
  • Methods: 我们提出一种将现有封闭集检测器转化为开放集检测器的方法,通过协同机制利用CLIP与SAM两种预训练基础模型的互补优势;并与最先进的开放集检测器GDINO集成,以取得更高的检测性能。
  • Results: 我们在LVIS数据集上取得17.42 mAP的新物体检测精度,并在COCO OVD划分上以7.2 AP50的优势超越当前最优。我们的代码可以在 https://github.com/rohit901/cooperative-foundational-models 上下载。
    Abstract In this work, we address the challenging and emergent problem of novel object detection (NOD), focusing on the accurate detection of both known and novel object categories during inference. Traditional object detection algorithms are inherently closed-set, limiting their capability to handle NOD. We present a novel approach to transform existing closed-set detectors into open-set detectors. This transformation is achieved by leveraging the complementary strengths of pre-trained foundational models, specifically CLIP and SAM, through our cooperative mechanism. Furthermore, by integrating this mechanism with state-of-the-art open-set detectors such as GDINO, we establish new benchmarks in object detection performance. Our method achieves 17.42 mAP in novel object detection and 42.08 mAP for known objects on the challenging LVIS dataset. Adapting our approach to the COCO OVD split, we surpass the current state-of-the-art by a margin of 7.2 $ \text{AP}_{50} $ for novel classes. Our code is available at https://github.com/rohit901/cooperative-foundational-models .
    摘要 本文研究新物体检测(NOD)这一具有挑战性的新兴问题,重点在推理阶段准确检测已知与新的物体类别。传统目标检测算法本质上是封闭集的,难以处理NOD。我们提出一种新方法,借助预训练基础模型CLIP与SAM的互补优势,通过协同机制将现有封闭集检测器转化为开放集检测器;进一步将该机制与GDINO等最先进的开放集检测器结合,确立了目标检测性能的新基准。在具有挑战性的LVIS数据集上,我们的方法在新物体检测上取得17.42 mAP,在已知物体上取得42.08 mAP;在COCO OVD划分上,新类别的AP50超越当前最优方法7.2个百分点。代码见 https://github.com/rohit901/cooperative-foundational-models 。

Inspecting Explainability of Transformer Models with Additional Statistical Information

  • paper_url: http://arxiv.org/abs/2311.11378
  • repo_url: None
  • paper_authors: Hoang C. Nguyen, Haeil Lee, Junmo Kim
  • for: 这篇论文旨在有效地解释Transformer模型在视觉和多模态任务中的行为。
  • methods: 通过组合注意力层并结合额外统计信息,展示每个图像块的重要性。
  • results: 这篇论文表明了对Swin Transformer和ViT的解释性能力很强,并且可以准确地显示预测对象。
    Abstract Transformer becomes more popular in the vision domain in recent years so there is a need for finding an effective way to interpret the Transformer model by visualizing it. In recent work, Chefer et al. can visualize the Transformer on vision and multi-modal tasks effectively by combining attention layers to show the importance of each image patch. However, when applying to other variants of Transformer such as the Swin Transformer, this method can not focus on the predicted object. Our method, by considering the statistics of tokens in layer normalization layers, shows a great ability to interpret the explainability of Swin Transformer and ViT.
    摘要 近年来,Transformer在视觉领域日益流行,因此需要一种通过可视化来有效解读Transformer模型的方法。在近期工作中,Chefer等人通过组合注意力层来展示每个图像块的重要性,从而有效地可视化视觉与多模态任务中的Transformer。然而,当应用到Swin Transformer等其他Transformer变体时,该方法无法聚焦于被预测的物体。我们的方法通过考虑层归一化(layer normalization)层中token的统计信息,展现出对Swin Transformer和ViT出色的可解释能力。
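
A common baseline in this line of explainability work is attention rollout (Abnar & Zuidema): average the heads, add the residual identity, renormalize, and multiply the maps across layers. The sketch below shows only that baseline on random attention maps; the paper's layer-norm token statistics (and Chefer et al.'s relevancy rules) are refinements not reproduced here.

```python
import torch

def attention_rollout(attentions):
    """attentions: list of (heads, tokens, tokens) maps, one per layer.

    Returns a (tokens, tokens) map of how much each output token attends
    to each input token through the whole network.
    """
    n = attentions[0].size(-1)
    rollout = torch.eye(n)
    for attn in attentions:
        a = attn.mean(dim=0)                  # average over heads
        a = a + torch.eye(n)                  # account for residual path
        a = a / a.sum(dim=-1, keepdim=True)   # renormalize rows
        rollout = a @ rollout
    return rollout

# Random stand-ins for a 12-layer ViT with 8 heads and 50 tokens (CLS + 49).
layers = [torch.rand(8, 50, 50).softmax(dim=-1) for _ in range(12)]
relevance = attention_rollout(layers)[0, 1:]  # CLS token vs. image patches
print(relevance.shape)                         # torch.Size([49])
```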

SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints

  • paper_url: http://arxiv.org/abs/2311.11371
  • repo_url: None
  • paper_authors: Aditya Nalgunda Ganesh
  • for: 提出一种内存高效的方法,从单目图像进行三维语义占据预测,以改进现有方法在非结构化交通场景下的表现。
  • methods: 该方法使用密集预测Transformer进行预测,并通过半监督训练管线从非结构化交通数据集中学习;同时引入逐块(patch-wise)训练以应对内存限制。
  • results: 该方法在非结构化交通场景下表现出色,RMSE为9.1473,语义分割IoU达46.02%,并能以69.47 Hz的频率运行。
    Abstract We present SOccDPT, a memory-efficient approach for 3D semantic occupancy prediction from monocular image input using dense prediction transformers. To address the limitations of existing methods trained on structured traffic datasets, we train our model on unstructured datasets including the Indian Driving Dataset and Bengaluru Driving Dataset. Our semi-supervised training pipeline allows SOccDPT to learn from datasets with limited labels by reducing the requirement for manual labelling by substituting it with pseudo-ground truth labels to produce our Bengaluru Semantic Occupancy Dataset. This broader training enhances our model's ability to handle unstructured traffic scenarios effectively. To overcome memory limitations during training, we introduce patch-wise training where we select a subset of parameters to train each epoch, reducing memory usage during auto-grad graph construction. In the context of unstructured traffic and memory-constrained training and inference, SOccDPT outperforms existing disparity estimation approaches as shown by the RMSE score of 9.1473, achieves a semantic segmentation IoU score of 46.02% and operates at a competitive frequency of 69.47 Hz. We make our code and semantic occupancy dataset public.
    摘要 我们提出SOccDPT,一种内存高效的方法,利用密集预测Transformer从单目图像输入进行三维语义占据预测。为克服现有方法在结构化交通数据集上训练的局限,我们在印度驾驶数据集(Indian Driving Dataset)和Bengaluru驾驶数据集等非结构化数据集上训练模型。我们的半监督训练管线以伪真值标签替代人工标注,降低了对人工标注的需求,使SOccDPT能够从标签有限的数据集中学习,并由此构建了Bengaluru语义占据数据集。这种更广泛的训练增强了模型有效应对非结构化交通场景的能力。为克服训练时的内存限制,我们引入逐块(patch-wise)训练:每个epoch只选取一部分参数进行训练,从而减少自动求导图构建过程中的内存占用。在非结构化交通以及内存受限的训练与推理环境下,SOccDPT优于现有的视差估计方法,RMSE为9.1473,语义分割IoU达46.02%,并以69.47 Hz的具有竞争力的频率运行。我们公开了代码和语义占据数据集。

Using Causal Threads to Explain Changes in a Dynamic System

  • paper_url: http://arxiv.org/abs/2311.11334
  • repo_url: None
  • paper_authors: Robert B. Allen
  • for: 这篇论文主要探讨为系统构建丰富的语义模型,具体而言是对系统状态变化给出结构化的因果解释。
  • methods: 该论文使用结构化因果解释和基于过程的动态知识图来构建系统的语义模型。
  • results: 以 Snowball Earth 理论所提出的地质变化因果线索为例构建了模型,并给出了一个展示解释的早期图形界面原型。与大语言模型(LLM)等统计式摘要与解释方法不同,该方法的直接表示可以被直接检查和验证。
    Abstract We explore developing rich semantic models of systems. Specifically, we consider structured causal explanations about state changes in those systems. Essentially, we are developing process-based dynamic knowledge graphs. As an example, we construct a model of the causal threads for geological changes proposed by the Snowball Earth theory. Further, we describe an early prototype of a graphical interface to present the explanations. Unlike statistical approaches to summarization and explanation such as Large Language Models (LLMs), our approach of direct representation can be inspected and verified directly.
    摘要 我们探索构建丰富Semantic模型系统。特别是,我们考虑结构化 causal 解释系统状态变化。基本上,我们正在构建基于过程的动态知识图。例如,我们构建了 Snowball Earth 理论提出的地质变化 causal 线索模型。此外,我们描述了一个早期的图形用户界面来展示解释。与统计方法such as Large Language Models (LLMs)不同,我们的直接表示方法可以直接检查和验证。
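
A causal thread maps naturally onto a directed graph whose nodes are system states/processes and whose edges carry the causal relation; explaining a state then amounts to tracing the chain that leads to it. The sketch below uses networkx with a simplified Snowball Earth chain (the node wording is a condensed paraphrase of the theory's usual narrative, not the paper's exact model).

```python
import networkx as nx

# A causal thread from the Snowball Earth narrative as a directed graph.
G = nx.DiGraph()
steps = [
    ("reduced solar input / CO2 drawdown", "global cooling"),
    ("global cooling", "ice sheets reach low latitudes"),
    ("ice sheets reach low latitudes", "high albedo feedback"),
    ("high albedo feedback", "runaway glaciation"),
    ("volcanic CO2 accumulates under ice", "extreme greenhouse"),
    ("extreme greenhouse", "rapid deglaciation"),
]
for cause, effect in steps:
    G.add_edge(cause, effect, relation="causes")

# The explanation for a state is the causal chain leading to it.
chain = nx.shortest_path(G, "reduced solar input / CO2 drawdown",
                         "runaway glaciation")
print(" -> ".join(chain))
```

Unlike an LLM-generated summary, every edge here can be inspected and verified directly, which is the point the paper makes about direct representation.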

Portuguese FAQ for Financial Services

  • paper_url: http://arxiv.org/abs/2311.11331
  • repo_url: None
  • paper_authors: Paulo Finardi, Wanderley M. Melo, Edgard D. Medeiros Neto, Alex F. Mansano, Pablo B. Costa, Vinicius F. Caridá
  • for: 推动葡萄牙语金融领域自然语言处理(NLP)应用的发展;该领域特定数据的稀缺限制了NLP应用的研究与开发。
  • methods: 使用数据增强技术生成语义相似度各异的数据,并在监督与无监督任务中评估增强数据的影响。
  • results: 数据增强提升了NLP应用的性能,在语义相似度高低两类场景下均取得良好效果;所得数据集将公开发布在Hugging Face Datasets平台上,以便更广泛的学术社区参与。
    Abstract Scarcity of domain-specific data in the Portuguese financial domain has disfavored the development of Natural Language Processing (NLP) applications. To address this limitation, the present study advocates for the utilization of synthetic data generated through data augmentation techniques. The investigation focuses on the augmentation of a dataset sourced from the Central Bank of Brazil FAQ, employing techniques that vary in semantic similarity. Supervised and unsupervised tasks are conducted to evaluate the impact of augmented data on both low and high semantic similarity scenarios. Additionally, the resultant dataset will be publicly disseminated on the Hugging Face Datasets platform, thereby enhancing accessibility and fostering broader engagement within the NLP research community.
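A sketch of the augmentation step, bucketing generated pairs by semantic similarity as the study does; `paraphrase` and `embed` are assumed callables (for example a back-translation model and a sentence-embedding model), not interfaces from the paper.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def augment_faq(pairs, paraphrase, embed, low=0.6, high=0.9):
    """Generate paraphrased (question, answer) pairs and split them into
    low- and high-similarity buckets relative to the source question."""
    low_sim, high_sim = [], []
    for question, answer in pairs:
        candidate = paraphrase(question)
        sim = cosine(embed(question), embed(candidate))
        if sim >= high:
            high_sim.append((candidate, answer, sim))
        elif sim >= low:
            low_sim.append((candidate, answer, sim))
        # candidates below `low` are discarded as likely off-topic
    return low_sim, high_sim
```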

Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation

  • paper_url: http://arxiv.org/abs/2311.11321
  • repo_url: None
  • paper_authors: Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel
  • for: Proposes a new, representation-agnostic framework for bounding the representation-induced confounding bias that arises in conditional average treatment effect (CATE) estimation.
  • methods: The framework establishes theoretical conditions under which CATEs are non-identifiable given low-dimensional (constrained) representations, and provides a method for estimating lower and upper bounds on the representation-induced confounding bias.
  • results: A series of experiments demonstrates the effectiveness of the bounds; they quantify the bias introduced by dimensionality reduction and are of direct relevance wherever the validity of CATE estimation matters.
    Abstract State-of-the-art methods for conditional average treatment effect (CATE) estimation make widespread use of representation learning. Here, the idea is to reduce the variance of the low-sample CATE estimation by a (potentially constrained) low-dimensional representation. However, low-dimensional representations can lose information about the observed confounders and thus lead to bias, because of which the validity of representation learning for CATE estimation is typically violated. In this paper, we propose a new, representation-agnostic framework for estimating bounds on the representation-induced confounding bias that comes from dimensionality reduction (or other constraints on the representations) in CATE estimation. First, we establish theoretically under which conditions CATEs are non-identifiable given low-dimensional (constrained) representations. Second, as our remedy, we propose to perform partial identification of CATEs or, equivalently, aim at estimating lower and upper bounds of the representation-induced confounding bias. We demonstrate the effectiveness of our bounds in a series of experiments. In sum, our framework is of direct relevance in practice where the validity of CATE estimation is of importance.
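In standard potential-outcomes notation, the quantity being bounded can be written as below; this is a sketch of the setting as we read the abstract, not the paper's exact bounds.

```latex
% tau(x): target CATE; tau_phi(x): what a (constrained) representation
% phi identifies; Delta_phi(x): the representation-induced confounding
% bias for which lower and upper bounds are estimated.
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]
\tau_{\phi}(x) = \mathbb{E}\left[\, Y \mid \phi(X) = \phi(x),\, A = 1 \,\right]
             - \mathbb{E}\left[\, Y \mid \phi(X) = \phi(x),\, A = 0 \,\right]
\Delta_{\phi}(x) = \tau(x) - \tau_{\phi}(x), \qquad
\underline{B}(x) \le \Delta_{\phi}(x) \le \overline{B}(x)
```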

GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure

  • paper_url: http://arxiv.org/abs/2311.11319
  • repo_url: None
  • paper_authors: Rafi Ibn Sultan, Chengyin Li, Hui Zhu, Prashant Khanduri, Marco Brocanelli, Dongxiao Zhu
  • for: Addressing the difficulty of segmenting mobility infrastructure in geographical (aerial and satellite) imagery.
  • methods: Proposes Geographical SAM (GeoSAM), a SAM-based framework that applies a fine-tuning strategy using dense visual prompts from zero-shot learning and sparse visual prompts from a pre-trained CNN segmentation model.
  • results: GeoSAM outperforms existing approaches for geographical image segmentation by 20% on road infrastructure, 14.29% on pedestrian infrastructure, and 17.65% on average, a substantial step forward in segmenting mobility infrastructure in geographical images.
    Abstract The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians - all of which can disorient the model to produce inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using the dense visual prompt from zero-shot learning, and the sparse visual prompt from a pre-trained CNN segmentation model. The proposed GeoSAM outperforms existing approaches for geographical image segmentation, specifically by 20%, 14.29%, and 17.65% for road infrastructure, pedestrian infrastructure, and on average, respectively, representing a momentous leap in leveraging foundation models to segment mobility infrastructure including both road and pedestrian infrastructure in geographical images.
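One plausible way to realize the sparse visual prompt, sampling point prompts from a pre-trained CNN's probability map and handing them to a SAM-style promptable decoder, is sketched below; GeoSAM's exact prompt encoding may differ.

```python
import numpy as np

def sparse_point_prompts(prob_map: np.ndarray, n_points=5, thresh=0.8, seed=0):
    """Turn a CNN class-probability map (H, W) into sparse point prompts:
    (x, y) coordinates plus foreground labels for a promptable segmenter."""
    rng = np.random.default_rng(seed)
    ys, xs = np.where(prob_map >= thresh)       # confident foreground pixels
    if len(ys) == 0:
        return np.empty((0, 2), dtype=int), np.empty((0,), dtype=int)
    idx = rng.choice(len(ys), size=min(n_points, len(ys)), replace=False)
    points = np.stack([xs[idx], ys[idx]], axis=1)  # (x, y) convention
    labels = np.ones(len(idx), dtype=int)          # 1 = foreground prompt
    return points, labels

# The dense prompt can simply be the (resized) probability map itself,
# passed to the decoder alongside the sparse points.
```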

TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

  • paper_url: http://arxiv.org/abs/2311.11315
  • repo_url: None
  • paper_authors: Yilun Kong, Jingqing Ruan, Yihong Chen, Bin Zhang, Tianpeng Bao, Shiwei Shi, Guoqing Du, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao
  • for: Improving the task planning and tool usage abilities of LLM-based agents operating in real-world systems.
  • methods: Proposes a comprehensive framework comprising an API Retriever, an LLM Finetuner, and a Demo Selector to address the three main challenges posed by real-world systems.
  • results: Validation on a real-world commercial system and an open-sourced academic dataset shows that each component is effective and that the integrated framework improves task planning and tool usage.
    Abstract Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that necessitate a blend of task planning and the usage of external tools, such as APIs. However, real-world complex systems present three prevalent challenges concerning task planning and tool usage: (1) The real system usually has a vast array of APIs, so it is impossible to feed the descriptions of all APIs to the prompt of LLMs as the token length is limited; (2) the real system is designed for handling complex tasks, and the base LLMs can hardly plan a correct sub-task order and API-calling order for such tasks; (3) Similar semantics and functionalities among APIs in real systems create challenges for both LLMs and even humans in distinguishing between them. In response, this paper introduces a comprehensive framework aimed at enhancing the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents operating within real-world systems. Our framework comprises three key components designed to address these challenges: (1) the API Retriever selects the most pertinent APIs for the user task among the extensive array available; (2) LLM Finetuner tunes a base LLM so that the finetuned LLM can be more capable for task planning and API calling; (3) the Demo Selector adaptively retrieves different demonstrations related to hard-to-distinguish APIs, which is further used for in-context learning to boost the final performance. We validate our methods using a real-world commercial system as well as an open-sourced academic dataset, and the outcomes clearly showcase the efficacy of each individual component as well as the integrated framework.
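The API Retriever can be illustrated with a simple embedding-based top-k selection, so that only the most relevant API descriptions enter the limited LLM prompt; `embed` is an assumed text-embedding function, not an interface from the paper.

```python
import numpy as np

def retrieve_apis(task: str, api_docs: dict, embed, k: int = 5):
    """Return the k (name, description) pairs most similar to the task."""
    q = embed(task)
    qn = np.linalg.norm(q)
    scored = []
    for name, doc in api_docs.items():
        d = embed(doc)
        scored.append((float(np.dot(q, d) / (qn * np.linalg.norm(d))), name))
    scored.sort(reverse=True)
    return [(name, api_docs[name]) for _, name in scored[:k]]
```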

What Lies beyond the Pareto Front? A Survey on Decision-Support Methods for Multi-Objective Optimization

  • paper_url: http://arxiv.org/abs/2311.11288
  • repo_url: None
  • paper_authors: Zuzanna Osika, Jazmin Zatarain Salazar, Diederik M. Roijers, Frans A. Oliehoek, Pradeep K. Murukannaiah
  • for: Reviews and unifies decision-support methods for exploring the solution sets produced by multi-objective optimization (MOO) algorithms.
  • methods: Covers methods for visualization, solution-set mining, and uncertainty exploration, as well as emerging research directions such as interactivity, explainability, and ethics.
  • results: Synthesizes methods from different fields of research into a unified, application-independent approach, lowering the entry barrier to using MOO algorithms and suggesting novel research directions.
    Abstract We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms. As MOO is applied to solve diverse problems, approaches for analyzing the trade-offs offered by MOO algorithms are scattered across fields. We provide an overview of the advances on this topic, including methods for visualization, mining the solution set, and uncertainty exploration as well as emerging research directions, including interactivity, explainability, and ethics. We synthesize these methods drawing from different fields of research to build a unified approach, independent of the application. Our goals are to reduce the entry barrier for researchers and practitioners on using MOO algorithms and to provide novel research directions.
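For readers new to the setting, the solution sets these decision-support methods operate on are Pareto fronts; a minimal non-dominated filter (minimization convention) looks like this:

```python
import numpy as np

def pareto_front(points: np.ndarray) -> np.ndarray:
    """Keep the non-dominated rows: a point is dominated if some other
    point is no worse in every objective and strictly better in one."""
    keep = np.ones(len(points), dtype=bool)
    for i, p in enumerate(points):
        others = np.delete(points, i, axis=0)
        keep[i] = not np.any(np.all(others <= p, axis=1) &
                             np.any(others < p, axis=1))
    return points[keep]

pts = np.array([[1, 4], [2, 2], [3, 1], [3, 3]])
print(pareto_front(pts))  # [3, 3] is dropped: dominated by [2, 2]
```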

Tactile Active Inference Reinforcement Learning for Efficient Robotic Manipulation Skill Acquisition

  • paper_url: http://arxiv.org/abs/2311.11287
  • repo_url: None
  • paper_authors: Zihao Liu, Xing Liu, Yizhai Zhang, Zhengxiong Liu, Panfeng Huang
  • for: Proposes Tactile Active Inference Reinforcement Learning (Tactile-AIRL), a new method for efficient training of robotic manipulation skills.
  • methods: Integrates RL with active inference, combining model-based techniques and intrinsic curiosity, and uses a vision-based tactile sensor for detailed perception, improving training efficiency and adaptability to sparse rewards.
  • results: Simulations show significantly high training efficiency on non-prehensile object pushing tasks, excelling in both dense and sparse reward settings within just a few interaction episodes and surpassing the SAC baseline. Physical experiments on a gripper screwing task also demonstrate the algorithm's rapid learning capability and its potential for practical applications.
    Abstract Robotic manipulation holds the potential to replace humans in the execution of tedious or dangerous tasks. However, control-based approaches are not suitable due to the difficulty of formally describing open-world manipulation in reality, and the inefficiency of existing learning methods. Thus, applying manipulation in a wide range of scenarios presents significant challenges. In this study, we propose a novel method for skill learning in robotic manipulation called Tactile Active Inference Reinforcement Learning (Tactile-AIRL), aimed at achieving efficient training. To enhance the performance of reinforcement learning (RL), we introduce active inference, which integrates model-based techniques and intrinsic curiosity into the RL process. This integration improves the algorithm's training efficiency and adaptability to sparse rewards. Additionally, we utilize a vision-based tactile sensor to provide detailed perception for manipulation tasks. Finally, we employ a model-based approach to imagine and plan appropriate actions through free energy minimization. Simulation results demonstrate that our method achieves significantly high training efficiency in non-prehensile objects pushing tasks. It enables agents to excel in both dense and sparse reward tasks with just a few interaction episodes, surpassing the SAC baseline. Furthermore, we conduct physical experiments on a gripper screwing task using our method, which showcases the algorithm's rapid learning capability and its potential for practical applications.
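Intrinsic curiosity from forward-model prediction error is one common way to realize the model-based curiosity signal mentioned above; the sketch below shows that generic idea, not the paper's exact free-energy formulation.

```python
import torch
import torch.nn as nn

class CuriosityBonus(nn.Module):
    """Intrinsic reward = scaled prediction error of a learned forward
    model; added to the (possibly sparse) task reward during RL."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128,
                 scale: float = 0.1):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim))
        self.scale = scale

    def forward(self, obs, act, next_obs):
        pred = self.dynamics(torch.cat([obs, act], dim=-1))
        err = ((pred - next_obs) ** 2).mean(dim=-1)
        return self.scale * err.detach()  # bonus for poorly-modeled states
```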

Adversarial Prompt Tuning for Vision-Language Models

  • paper_url: http://arxiv.org/abs/2311.11261
  • repo_url: None
  • paper_authors: Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang
  • for: Improving the robustness of pre-trained vision-language models (VLMs) against adversarial attacks in the image modality.
  • methods: Introduces Adversarial Prompt Tuning (AdvPT), which aligns learnable text prompts with adversarial image embeddings to harden the image encoder, without extensive parameter training or changes to the model architecture.
  • results: Experiments show that AdvPT improves resistance to both white-box and black-box adversarial attacks, and combines synergistically with existing image-processing-based defenses to further boost robustness.
    Abstract With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities. However, these models remain vulnerable to adversarial attacks, particularly in the image modality, presenting considerable security risks. This paper introduces Adversarial Prompt Tuning (AdvPT), a novel technique to enhance the adversarial robustness of image encoders in VLMs. AdvPT innovatively leverages learnable text prompts and aligns them with adversarial image embeddings, to address the vulnerabilities inherent in VLMs without the need for extensive parameter training or modification of the model architecture. We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques, further boosting defensive capabilities. Comprehensive experimental analyses provide insights into adversarial prompt tuning, a novel paradigm devoted to improving resistance to adversarial images through textual input modifications, paving the way for future robust multimodal learning research. These findings open up new possibilities for enhancing the security of VLMs. Our code will be available upon publication of the paper.
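A minimal sketch of the prompt-tuning loop: only the context embeddings are trainable, and a contrastive loss pulls class text features toward adversarial image embeddings. The `text_encoder` call signature and the source of `adv_image_emb` are assumptions for illustration, standing in for a frozen CLIP-style VLM and an attack pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnablePrompt(nn.Module):
    """Learnable context vectors, the only parameters updated by
    AdvPT-style tuning; the image and text encoders stay frozen."""
    def __init__(self, n_ctx: int = 16, dim: int = 512):
        super().__init__()
        self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, dim))

def advpt_step(prompt, text_encoder, adv_image_emb, labels, opt, temp=0.07):
    text_feats = F.normalize(text_encoder(prompt.ctx), dim=-1)  # (C, d)
    img_feats = F.normalize(adv_image_emb, dim=-1)              # (B, d)
    logits = img_feats @ text_feats.t() / temp
    loss = F.cross_entropy(logits, labels)  # align adv. images with classes
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# opt = torch.optim.Adam(prompt.parameters(), lr=1e-3)  # typical setup
```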

Tensor networks for interpretable and efficient quantum-inspired machine learning

  • paper_url: http://arxiv.org/abs/2311.11258
  • repo_url: None
  • paper_authors: Shi-Ju Ran, Gang Su
  • for: Reviews the application and development of tensor networks in quantum-inspired machine learning.
  • methods: Treats the tensor network, a mathematical tool with solid foundations in quantum information and many-body physics, as the basis for highly interpretable "white-box" deep machine learning schemes.
  • results: Surveys the inspiring progress of tensor-network-based ML, including highly interpretable schemes and efficient computational techniques. With the rapid development of quantum computers, tensor networks are expected to yield novel schemes runnable on quantum hardware, heading towards future "quantum artificial intelligence".
    Abstract It is a critical challenge to simultaneously gain high interpretability and efficiency with the current schemes of deep machine learning (ML). Tensor network (TN), which is a well-established mathematical tool originating from quantum mechanics, has shown its unique advantages on developing efficient ``white-box'' ML schemes. Here, we give a brief review on the inspiring progresses made in TN-based ML. On one hand, interpretability of TN ML is accommodated with the solid theoretical foundation based on quantum information and many-body physics. On the other hand, high efficiency can be rendered from the powerful TN representations and the advanced computational techniques developed in quantum many-body physics. With the fast development on quantum computers, TN is expected to conceive novel schemes runnable on quantum hardware, heading towards the ``quantum artificial intelligence'' in the forthcoming future.
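To make the "white-box" flavor concrete, here is a toy matrix-product-state (MPS) scorer using a common TN-ML local feature map; it illustrates the contraction itself, not any particular model from the review.

```python
import numpy as np

def local_feature(x: float) -> np.ndarray:
    """Standard TN-ML encoding of a pixel in [0, 1]."""
    return np.array([np.cos(np.pi * x / 2), np.sin(np.pi * x / 2)])

def mps_score(pixels, tensors) -> float:
    """Contract an MPS with a product of local features, site by site.
    tensors[i] has shape (left_bond, 2, right_bond); boundary bonds are 1."""
    v = np.ones(1)
    for x, t in zip(pixels, tensors):
        site = np.einsum("p,lpr->lr", local_feature(x), t)  # absorb physical leg
        v = v @ site                                         # sweep left to right
    return float(v @ np.ones(tensors[-1].shape[-1]))

rng = np.random.default_rng(0)
shapes = [(1, 2, 4)] + [(4, 2, 4)] * 6 + [(4, 2, 1)]
tensors = [0.5 * rng.normal(size=s) for s in shapes]
print(mps_score(rng.random(8), tensors))  # scalar score for 8 "pixels"
```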

A Comprehensive Review on Sentiment Analysis: Tasks, Approaches and Applications

  • paper_url: http://arxiv.org/abs/2311.11250
  • repo_url: None
  • paper_authors: Sudhanshu Kumar, Partha Pratim Roy, Debi Prosad Dogra, Byung-Gyu Kim
  • for: Surveys research and development in sentiment analysis (SA) and its applications across domains.
  • methods: Covers lexicon-based approaches, machine learning, and deep learning methods for performing sentiment analysis.
  • results: Summarizes the challenges and opportunities of sentiment analysis and provides application examples across domains, including text, voice, images, and videos.
    Abstract Sentiment analysis (SA) is an emerging field in text mining. It is the process of computationally identifying and categorizing opinions expressed in a piece of text over different social media platforms. Social media plays an essential role in knowing the customer mindset towards a product, services, and the latest market trends. Most organizations depend on the customer's response and feedback to upgrade their offered products and services. SA or opinion mining seems to be a promising research area for various domains. It plays a vital role in analyzing big data generated daily in structured and unstructured formats over the internet. This survey paper defines sentiment and its recent research and development in different domains, including voice, images, videos, and text. The challenges and opportunities of sentiment analysis are also discussed in the paper. Keywords: Sentiment Analysis, Machine Learning, Lexicon-based approach, Deep Learning, Natural Language Processing

Open Set Dandelion Network for IoT Intrusion Detection

  • paper_url: http://arxiv.org/abs/2311.11249
  • repo_url: None
  • paper_authors: Jiashu Wu, Hao Dai, Kenneth B. Kent, Jerome Yen, Chengzhong Xu, Yang Wang
  • for: This paper aims to address the problem of intrusion detection in IoT devices, which is crucial due to the increasing use of IoT devices. However, traditional intrusion detection methods are not effective due to the data scarcity of IoT devices.
  • methods: The proposed method, Open-Set Dandelion Network (OSDN), uses unsupervised heterogeneous domain adaptation in an open-set manner to transfer intrusion knowledge from a knowledge-rich source network intrusion domain to a data-scarce target IoT intrusion domain. The OSDN model forms the source domain into a dandelion-like feature space, and uses a target membership mechanism, dandelion angular separation mechanism, and dandelion embedding alignment mechanism to achieve better inter-category separability and intra-category compactness.
  • results: The proposed OSDN model outperforms three state-of-the-art baseline methods by 16.9% in terms of intrusion detection accuracy. The comprehensive experiments on several intrusion datasets demonstrate the effectiveness of the OSDN model.
    Abstract As IoT devices become widely, it is crucial to protect them from malicious intrusions. However, the data scarcity of IoT limits the applicability of traditional intrusion detection methods, which are highly data-dependent. To address this, in this paper we propose the Open-Set Dandelion Network (OSDN) based on unsupervised heterogeneous domain adaptation in an open-set manner. The OSDN model performs intrusion knowledge transfer from the knowledge-rich source network intrusion domain to facilitate more accurate intrusion detection for the data-scarce target IoT intrusion domain. Under the open-set setting, it can also detect newly-emerged target domain intrusions that are not observed in the source domain. To achieve this, the OSDN model forms the source domain into a dandelion-like feature space in which each intrusion category is compactly grouped and different intrusion categories are separated, i.e., simultaneously emphasising inter-category separability and intra-category compactness. The dandelion-based target membership mechanism then forms the target dandelion. Then, the dandelion angular separation mechanism achieves better inter-category separability, and the dandelion embedding alignment mechanism further aligns both dandelions in a finer manner. To promote intra-category compactness, the discriminating sampled dandelion mechanism is used. Assisted by the intrusion classifier trained using both known and generated unknown intrusion knowledge, a semantic dandelion correction mechanism emphasises easily-confused categories and guides better inter-category separability. Holistically, these mechanisms form the OSDN model that effectively performs intrusion knowledge transfer to benefit IoT intrusion detection. Comprehensive experiments on several intrusion datasets verify the effectiveness of the OSDN model, outperforming three state-of-the-art baseline methods by 16.9%.
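The two geometric properties of the dandelion feature space, intra-category compactness and inter-category angular separability, can be encouraged with a generic loss like the one sketched below; OSDN's actual mechanisms are considerably more elaborate.

```python
import torch
import torch.nn.functional as F

def compact_separate_loss(feats, labels, margin=0.5):
    """Pull features toward their class center (compactness) and push
    class centers apart in cosine space (separability)."""
    feats = F.normalize(feats, dim=-1)
    classes = labels.unique()
    centers = F.normalize(torch.stack(
        [feats[labels == c].mean(0) for c in classes]), dim=-1)

    compact = sum((1 - feats[labels == c] @ centers[i]).mean()
                  for i, c in enumerate(classes))

    sim = centers @ centers.t()
    sim = sim - torch.eye(len(classes), device=sim.device)  # zero the diagonal
    separate = F.relu(sim - (1 - margin)).sum()  # penalize close centers
    return compact + separate
```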

AtomXR: Streamlined XR Prototyping with Natural Language and Immersive Physical Interaction

  • paper_url: http://arxiv.org/abs/2311.11238
  • repo_url: None
  • paper_authors: Alice Cai, Caine Ardayfio, AnhPhu Nguyen, Tica Lin, Elena Glassman
  • for: Lowering the barrier to XR development and improving developer experience, offering an accessible tool to developers unfamiliar with XR.
  • methods: Provides an immersive in-headset authoring environment driven by natural language, eye-gaze, and touch interactions, using large language models (LLMs) to generate AtomScript code.
  • results: Two user studies show that AtomXR improves prototyping speed and user experience compared to traditional systems.
    Abstract As technological advancements in extended reality (XR) amplify the demand for more XR content, traditional development processes face several challenges: 1) a steep learning curve for inexperienced developers, 2) a disconnect between 2D development environments and 3D user experiences inside headsets, and 3) slow iteration cycles due to context switching between development and testing environments. To address these challenges, we introduce AtomXR, a streamlined, immersive, no-code XR prototyping tool designed to empower both experienced and inexperienced developers in creating applications using natural language, eye-gaze, and touch interactions. AtomXR consists of: 1) AtomScript, a high-level human-interpretable scripting language for rapid prototyping, 2) a natural language interface that integrates LLMs and multimodal inputs for AtomScript generation, and 3) an immersive in-headset authoring environment. Empirical evaluation through two user studies offers insights into natural language-based and immersive prototyping, and shows AtomXR provides significant improvements in speed and user experience compared to traditional systems.

Implementation of AI Deep Learning Algorithm For Multi-Modal Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2311.11237
  • repo_url: None
  • paper_authors: Jiazhen Wang
  • for: Improving the accuracy and learning efficiency of multi-modal emotion recognition.
  • methods: Combines a two-channel convolutional neural network with a recurrent (BiSRU) network; words are vectorized with GloVe and fed into the CNN, and an attention mechanism with max pooling over the BiSRU channel captures local deep emotion and the preceding and following sequential emotion semantics.
  • results: Experiments show that the feature-fusion-based sentiment analysis method effectively improves recognition accuracy on emotion datasets and reduces learning time, and the model generalizes reasonably well.
    Abstract A multi-modal emotion recognition method was established by combining two-channel convolutional neural network with ring network. This method can extract emotional information effectively and improve learning efficiency. The words were vectorized with GloVe, and the word vector was input into the convolutional neural network. Combining attention mechanism and maximum pool converter BiSRU channel, the local deep emotion and pre-post sequential emotion semantics are obtained. Finally, multiple features are fused and input as the polarity of emotion, so as to achieve the emotion analysis of the target. Experiments show that the emotion analysis method based on feature fusion can effectively improve the recognition accuracy of emotion data set and reduce the learning time. The model has a certain generalization.

Unraveling the 'Anomaly' in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution

  • paper_url: http://arxiv.org/abs/2311.11235
  • repo_url: https://github.com/pseudo-Skye/TriAD
  • paper_authors: Yuting Sun, Guansong Pang, Guanhua Ye, Tong Chen, Xia Hu, Hongzhi Yin
  • for: Proposes a self-supervised Tri-domain Anomaly Detector (TriAD) to address the main challenges of time series anomaly detection (TSAD): the scarcity of anomaly labels and the variability of anomaly lengths and shapes.
  • methods: TriAD models features across three data domains (temporal, frequency, and residual) without relying on anomaly labels, learning common attributes of normal data through inter-domain and intra-domain contrastive losses.
  • results: On the UCR datasets, TriAD achieves a three-fold increase in PA%K-based F1 scores over SOTA deep learning models and a 50% increase in accuracy compared to SOTA discord discovery algorithms.
    Abstract The ongoing challenges in time series anomaly detection (TSAD), notably the scarcity of anomaly labels and the variability in anomaly lengths and shapes, have led to the need for a more efficient solution. As limited anomaly labels hinder traditional supervised models in TSAD, various SOTA deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue. However, they encounter difficulties handling variations in anomaly lengths and shapes, limiting their adaptability to diverse anomalies. Additionally, many benchmark datasets suffer from the problem of having explicit anomalies that even random functions can detect. This problem is exacerbated by ill-posed evaluation metrics, known as point adjustment (PA), which can result in inflated model performance. In this context, we propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD), which addresses these challenges by modeling features across three data domains - temporal, frequency, and residual domains - without relying on anomaly labels. Unlike traditional contrastive learning methods, TriAD employs both inter-domain and intra-domain contrastive loss to learn common attributes among normal data and differentiate them from anomalies. Additionally, our approach can detect anomalies of varying lengths by integrating with a discord discovery algorithm. It is worth noting that this study is the first to reevaluate the deep learning potential in TSAD, utilizing both rigorously designed datasets (i.e., UCR Archive) and evaluation metrics (i.e., PA%K and affiliation). Through experimental results on the UCR dataset, TriAD achieves an impressive three-fold increase in PA%K based F1 scores over SOTA deep learning models, and 50% increase of accuracy as compared to SOTA discord discovery algorithms.
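The three views such a tri-domain detector contrasts are straightforward to construct; the sketch below uses an FFT for the frequency domain and a simple seasonal estimate for the residual domain, which may differ from TriAD's exact construction.

```python
import numpy as np

def tri_domain_views(window: np.ndarray, period: int):
    """Temporal, frequency, and residual views of one time-series window."""
    temporal = window
    frequency = np.abs(np.fft.rfft(window))          # magnitude spectrum
    seasonal = np.array([window[i % period::period].mean()
                         for i in range(len(window))])
    residual = window - seasonal                     # what seasonality misses
    return temporal, frequency, residual
```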

FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients

  • paper_url: http://arxiv.org/abs/2311.11227
  • repo_url: https://github.com/leondada/fedra
  • paper_authors: Shangchao Su, Bin Li, Xiangyang Xue
  • for: Proposes FedRA, a new federated tuning algorithm for fine-tuning foundation models collaboratively across clients with heterogeneous data and computational resources.
  • methods: In each communication round, FedRA randomly generates an allocation matrix; resource-constrained clients reorganize a small number of layers from the original model according to the matrix and fine-tune them with LoRA. The server then aggregates the updated LoRA parameters from the clients into the corresponding layers of the original model.
  • results: Experiments on two large-scale image datasets (DomainNet and NICO++) under various non-iid settings show that FedRA significantly outperforms the compared methods.
    Abstract With the increasing availability of Foundation Models, federated tuning has garnered attention in the field of federated learning, utilizing data and computation resources from multiple clients to collaboratively fine-tune foundation models. However, in real-world federated scenarios, there often exist a multitude of heterogeneous clients with varying computation and communication resources, rendering them incapable of supporting the entire model fine-tuning process. In response to this challenge, we propose a novel federated tuning algorithm, FedRA. The implementation of FedRA is straightforward and can be seamlessly integrated into any transformer-based model without the need for further modification to the original model. Specifically, in each communication round, FedRA randomly generates an allocation matrix. For resource-constrained clients, it reorganizes a small number of layers from the original model based on the allocation matrix and fine-tunes using LoRA. Subsequently, the server aggregates the updated LoRA parameters from the clients according to the current allocation matrix into the corresponding layers of the original model. It is worth noting that FedRA also supports scenarios where none of the clients can support the entire global model, which is an impressive advantage. We conduct experiments on two large-scale image datasets, DomainNet and NICO++, under various non-iid settings. The results demonstrate that FedRA outperforms the compared methods significantly. The source code is available at \url{https://github.com/leondada/FedRA}.
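A minimal sketch of one round: sample a boolean allocation matrix, let each client fine-tune only its assigned layers (with LoRA adapters), then average the LoRA deltas layer-wise. The data layout of `lora_updates` is an illustrative assumption.

```python
import numpy as np

def random_allocation(n_layers: int, client_capacities: list, seed: int):
    """alloc[i, j] is True iff client j fine-tunes layer i this round."""
    rng = np.random.default_rng(seed)
    alloc = np.zeros((n_layers, len(client_capacities)), dtype=bool)
    for j, cap in enumerate(client_capacities):
        chosen = rng.choice(n_layers, size=min(cap, n_layers), replace=False)
        alloc[chosen, j] = True
    return alloc

def aggregate(lora_updates, alloc):
    """Average client LoRA deltas per layer; lora_updates[j][i] is
    client j's delta for layer i (ignored where not allocated)."""
    n_layers, n_clients = alloc.shape
    merged = {}
    for i in range(n_layers):
        deltas = [lora_updates[j][i] for j in range(n_clients) if alloc[i, j]]
        if deltas:
            merged[i] = sum(deltas) / len(deltas)
    return merged
```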

An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback

  • paper_url: http://arxiv.org/abs/2311.11226
  • repo_url: None
  • paper_authors: Kaustubh D. Dhole, Ramraj Chandradevan, Eugene Agichtein
  • for: Provides an assistant that helps users formulate more effective queries during search.
  • methods: Uses large language models (LLMs) interactively, together with other NLP techniques, letting users refine LLM-generated queries and give feedback on retrieved documents or passages, which is incorporated as prompts to generate better queries.
  • results: Tests over multiple languages and document collections suggest the assistant helps users formulate more effective queries.
    Abstract While search is the predominant method of accessing information, formulating effective queries remains a challenging task, especially for situations where the users are not familiar with a domain, or searching for documents in other languages, or looking for complex information such as events, which are not easily expressible as queries. Providing example documents or passages of interest, might be easier for a user, however, such query-by-example scenarios are prone to concept drift, and are highly sensitive to the query generation method. This demo illustrates complementary approaches of using LLMs interactively, assisting and enabling the user to provide edits and feedback at all stages of the query formulation process. The proposed Query Generation Assistant is a novel search interface which supports automatic and interactive query generation over a mono-linguial or multi-lingual document collection. Specifically, the proposed assistive interface enables the users to refine the queries generated by different LLMs, to provide feedback on the retrieved documents or passages, and is able to incorporate the users' feedback as prompts to generate more effective queries. The proposed interface is a valuable experimental tool for exploring fine-tuning and prompting of LLMs for query generation to qualitatively evaluate the effectiveness of retrieval and ranking models, and for conducting Human-in-the-Loop (HITL) experiments for complex search tasks where users struggle to formulate queries without such assistance.

SPLAIN: Augmenting Cybersecurity Warnings with Reasons and Data

  • paper_url: http://arxiv.org/abs/2311.11215
  • repo_url: None
  • paper_authors: Vera A. Kazakova, Jena D. Hwang, Bonnie J. Dorr, Yorick Wilks, J. Blake Gage, Alex Memory, Mark A. Clark
  • for: Presents a natural language generator that converts warning data into user-friendly cyber threat explanations.
  • methods: Uses a template-based approach with a hierarchically organized warning structure and vocabulary, ensuring consistent, readable warnings.
  • results: SPLAIN produces clear, actionable output, including hierarchically structured explanations of the input data and system functionality.
    Abstract Effective cyber threat recognition and prevention demand comprehensible forecasting systems, as prior approaches commonly offer limited and, ultimately, unconvincing information. We introduce Simplified Plaintext Language (SPLAIN), a natural language generator that converts warning data into user-friendly cyber threat explanations. SPLAIN is designed to generate clear, actionable outputs, incorporating hierarchically organized explanatory details about input data and system functionality. Given the inputs of individual sensor-induced forecasting signals and an overall warning from a fusion module, SPLAIN queries each signal for information on contributing sensors and data signals. This collected data is processed into a coherent English explanation, encompassing forecasting, sensing, and data elements for user review. SPLAIN's template-based approach ensures consistent warning structure and vocabulary. SPLAIN's hierarchical output structure allows each threat and its components to be expanded to reveal underlying explanations on demand. Our conclusions emphasize the need for designers to specify the "how" and "why" behind cyber warnings, advocate for simple structured templates in generating consistent explanations, and recognize that direct causal links in Machine Learning approaches may not always be identifiable, requiring some explanations to focus on general methodologies, such as model and training data.

Can We Utilize Pre-trained Language Models within Causal Discovery Algorithms?

  • paper_url: http://arxiv.org/abs/2311.11212
  • repo_url: None
  • paper_authors: Chanhui Lee, Juhyeon Kim, Yongjun Jeong, Juhyun Lyu, Junghee Kim, Sangmin Lee, Sangjun Han, Hyeokjun Choe, Soyeon Park, Woohyung Lim, Sungbin Lim, Sanghack Lee
  • for: Investigates whether pre-trained language models (PLMs) can perform causal reasoning and how they might be applied to discovering causal relationships.
  • methods: Tests PLM-based causal reasoning by aggregating the outcomes of repeated causal reasoning elicited through specifically designed prompts.
  • results: PLM-based causal reasoning has notable limitations, including strong dependence on prompt design and the risk of false predictions. Experiments on physics-inspired synthetic data demonstrate these limitations, and a new framework is proposed that integrates PLM-derived prior knowledge with a causal discovery algorithm; it improves performance and suggests how PLM-extracted knowledge can be combined with existing causal discovery algorithms.
    Abstract Scaling laws have allowed Pre-trained Language Models (PLMs) into the field of causal reasoning. Causal reasoning of PLM relies solely on text-based descriptions, in contrast to causal discovery which aims to determine the causal relationships between variables utilizing data. Recently, there has been current research regarding a method that mimics causal discovery by aggregating the outcomes of repetitive causal reasoning, achieved through specifically designed prompts. It highlights the usefulness of PLMs in discovering cause and effect, which is often limited by a lack of data, especially when dealing with multiple variables. Conversely, the characteristics of PLMs which are that PLMs do not analyze data and they are highly dependent on prompt design leads to a crucial limitation for directly using PLMs in causal discovery. Accordingly, PLM-based causal reasoning deeply depends on the prompt design and carries out the risk of overconfidence and false predictions in determining causal relationships. In this paper, we empirically demonstrate the aforementioned limitations of PLM-based causal reasoning through experiments on physics-inspired synthetic data. Then, we propose a new framework that integrates prior knowledge obtained from PLM with a causal discovery algorithm. This is accomplished by initializing an adjacency matrix for causal discovery and incorporating regularization using prior knowledge. Our proposed framework not only demonstrates improved performance through the integration of PLM and causal discovery but also suggests how to leverage PLM-extracted prior knowledge with existing causal discovery algorithms.

Leveraging Generative AI for Clinical Evidence Summarization Needs to Achieve Trustworthiness

  • paper_url: http://arxiv.org/abs/2311.11211
  • repo_url: None
  • paper_authors: Gongbo Zhang, Qiao Jin, Denis Jered McInerney, Yong Chen, Fei Wang, Curtis L. Cole, Qian Yang, Yanshan Wang, Bradley A. Malin, Mor Peleg, Byron C. Wallace, Zhiyong Lu, Chunhua Weng, Yifan Peng
  • for: Improving healthcare quality by grounding medical decisions and practice in the best available evidence.
  • methods: Uses large language models to automatically summarize medical evidence, easing the collection, appraisal, and synthesis of evidential information.
  • results: Argues that developing trustworthy generative AI models can improve the efficiency and accuracy of clinical evidence summarization.
    Abstract Evidence-based medicine aims to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.

On the Noise Scheduling for Generating Plausible Designs with Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.11207
  • repo_url: None
  • paper_authors: Jiajie Fan, Laure Vuaille, Thomas Bäck, Hao Wang
  • for: Investigates deep generative models (DGMs), specifically diffusion models, for generating novel structural designs, where outputs must be visually high-quality and structurally plausible (e.g., no floating material or missing parts).
  • methods: Studies how the noise schedule of a diffusion model affects the plausibility of the generated designs; proposes two techniques to determine the decisive noise range for a given image set and devises a novel parametric noise schedule for better plausibility.
  • results: Applied to the training and sampling of the well-known EDM diffusion model, the proposed schedule raises the rate of plausible designs from 83.4% to 93.5% and lowers the Fréchet Inception Distance (FID) from 7.84 to 4.87, indicating a solid structural understanding of the model.
    Abstract Deep Generative Models (DGMs) are widely used to create innovative designs across multiple industries, ranging from fashion to the automotive sector. In addition to generating images of high visual quality, the task of structural design generation imposes more stringent constrains on the semantic expression, e.g., no floating material or missing part, which we refer to as plausibility in this work. We delve into the impact of noise schedules of diffusion models on the plausibility of the outcome: there exists a range of noise levels at which the model's performance decides the result plausibility. Also, we propose two techniques to determine such a range for a given image set and devise a novel parametric noise schedule for better plausibility. We apply this noise schedule to the training and sampling of the well-known diffusion model EDM and compare it to its default noise schedule. Compared to EDM, our schedule significantly improves the rate of plausible designs from 83.4% to 93.5% and Fr\'echet Inception Distance (FID) from 7.84 to 4.87. Further applications of advanced image editing tools demonstrate the model's solid understanding of structure.
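For context, EDM's sigma schedule is itself parametric; the exponent `rho` controls where sampling steps concentrate along the noise range, which is the kind of lever the paper tunes (the paper's schedule is a different parameterization, shown here only to make the idea concrete).

```python
import numpy as np

def edm_sigma_schedule(n_steps: int, sigma_min: float, sigma_max: float,
                       rho: float = 7.0) -> np.ndarray:
    """Karras et al. (EDM) schedule: interpolate between sigma_max and
    sigma_min in sigma^(1/rho) space. Larger rho spends more steps at
    low noise; tuning it shifts effort toward the plausibility-critical
    noise levels."""
    i = np.arange(n_steps) / max(n_steps - 1, 1)
    return (sigma_max ** (1 / rho)
            + i * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
```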

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models

  • paper_url: http://arxiv.org/abs/2311.11202
  • repo_url: https://github.com/docta-ai/docta
  • paper_authors: Zhaowei Zhu, Jialu Wang, Hao Cheng, Yang Liu
  • for: Improving the credibility (label correctness) of real-world datasets used for training and aligning harmless language models.
  • methods: Proposes a systematic framework for evaluating dataset credibility, identifying label errors, and assessing the influence of noisy labels, with a focus on unsafe comments and conversation classification, combining automatic labeling with human review.
  • results: The framework finds and fixes an average of 6.16% label errors across 11 real-world datasets; directly fixing the label errors markedly improves data credibility and downstream learning performance.
    Abstract Language models have shown promise in various tasks but can be affected by undesired data during training, fine-tuning, or alignment. For example, if some unsafe conversations are wrongly annotated as safe ones, the model fine-tuned on these samples may be harmful. Therefore, the correctness of annotations, i.e., the credibility of the dataset, is important. This study focuses on the credibility of real-world datasets, including the popular benchmarks Jigsaw Civil Comments, Anthropic Harmless & Red Team, PKU BeaverTails & SafeRLHF, that can be used for training a harmless language model. Given the cost and difficulty of cleaning these datasets by humans, we introduce a systematic framework for evaluating the credibility of datasets, identifying label errors, and evaluating the influence of noisy labels in the curated language data, specifically focusing on unsafe comments and conversation classification. With the framework, we find and fix an average of 6.16% label errors in 11 datasets constructed from the above benchmarks. The data credibility and downstream learning performance can be remarkably improved by directly fixing label errors, indicating the significance of cleaning existing real-world datasets. Open-source: https://github.com/Docta-ai/docta.

Assessing AI Impact Assessments: A Classroom Study

  • paper_url: http://arxiv.org/abs/2311.11193
  • repo_url: None
  • paper_authors: Nari Johnson, Hoda Heidari
  • for: This paper is written to evaluate the effectiveness of existing AI impact assessments (AIIAs) and to provide recommendations for future work on developing and validating AIIAs.
  • methods: The paper uses a classroom study with 38 students at a large research-intensive university to evaluate the impact of AIIAs on participants’ perceptions of the potential risks of generative AI systems and the level of responsibility held by AI experts in addressing potential harm.
  • results: The study finds preliminary evidence that impact assessments can influence participants’ perceptions of the potential risks of generative AI systems, and identifies a consistent set of limitations shared by several existing AIIA instruments.
    Abstract Artificial Intelligence Impact Assessments ("AIIAs"), a family of tools that provide structured processes to imagine the possible impacts of a proposed AI system, have become an increasingly popular proposal to govern AI systems. Recent efforts from government or private-sector organizations have proposed many diverse instantiations of AIIAs, which take a variety of forms ranging from open-ended questionnaires to graded score-cards. However, to date that has been limited evaluation of existing AIIA instruments. We conduct a classroom study (N = 38) at a large research-intensive university (R1) in an elective course focused on the societal and ethical implications of AI. We assign students to different organizational roles (for example, an ML scientist or product manager) and ask participant teams to complete one of three existing AI impact assessments for one of two imagined generative AI systems. In our thematic analysis of participants' responses to pre- and post-activity questionnaires, we find preliminary evidence that impact assessments can influence participants' perceptions of the potential risks of generative AI systems, and the level of responsibility held by AI experts in addressing potential harm. We also discover a consistent set of limitations shared by several existing AIIA instruments, which we group into concerns about their format and content, as well as the feasibility and effectiveness of the activity in foreseeing and mitigating potential harms. Drawing on the findings of this study, we provide recommendations for future work on developing and validating AIIAs.

Attention-Based Real-Time Defenses for Physical Adversarial Attacks in Vision Applications

  • paper_url: http://arxiv.org/abs/2311.11191
  • repo_url: None
  • paper_authors: Giulio Rossolini, Alessandro Biondi, Giorgio Buttazzo
  • for: Defending deep neural networks against physical adversarial attacks in the real world, a prerequisite for deployment in safety-critical domains.
  • methods: Uses a channel-attention mechanism to quickly identify and track malicious objects in shallow network layers and mask their adversarial effects in multi-frame settings.
  • results: Improves existing over-activation techniques so they become usable in real time, and introduces an efficient multi-frame defense framework whose effectiveness is demonstrated through extensive experiments.
    Abstract Deep neural networks exhibit excellent performance in computer vision tasks, but their vulnerability to real-world adversarial attacks, achieved through physical objects that can corrupt their predictions, raises serious security concerns for their application in safety-critical domains. Existing defense methods focus on single-frame analysis and are characterized by high computational costs that limit their applicability in multi-frame scenarios, where real-time decisions are crucial. To address this problem, this paper proposes an efficient attention-based defense mechanism that exploits adversarial channel-attention to quickly identify and track malicious objects in shallow network layers and mask their adversarial effects in a multi-frame setting. This work advances the state of the art by enhancing existing over-activation techniques for real-world adversarial attacks to make them usable in real-time applications. It also introduces an efficient multi-frame defense framework, validating its efficacy through extensive experiments aimed at evaluating both defense performance and computational cost.

Few-Shot Classification & Segmentation Using Large Language Models Agent

  • paper_url: http://arxiv.org/abs/2311.12065
  • repo_url: None
  • paper_authors: Tian Meng, Yang Tao, Wuliang Yin
  • for: Solving few-shot image classification and segmentation (FS-CS), which requires classifying and segmenting target objects in a query image given only a few examples of the target classes.
  • methods: Uses a large language model (LLM) as an agent to address FS-CS in a training-free manner: the LLM acts as the task planner, while off-the-shelf vision models (such as the Segment Anything Model and GPT-4Vision) help it understand spatial and semantic information. Chain-of-thought prompting and in-context learning guide the LLM to classify and segment target objects in the query image.
  • results: The proposed method achieves state-of-the-art performance on the Pascal-5i dataset.
    Abstract The task of few-shot image classification and segmentation (FS-CS) requires the classification and segmentation of target objects in a query image, given only a few examples of the target classes. We introduce a method that utilises large language models (LLM) as an agent to address the FS-CS problem in a training-free manner. By making the LLM the task planner and off-the-shelf vision models the tools, the proposed method is capable of classifying and segmenting target objects using only image-level labels. Specifically, chain-of-thought prompting and in-context learning guide the LLM to observe support images like human; vision models such as Segment Anything Model (SAM) and GPT-4Vision assist LLM understand spatial and semantic information at the same time. Ultimately, the LLM uses its summarizing and reasoning capabilities to classify and segment the query image. The proposed method's modular framework makes it easily extendable. Our approach achieves state-of-the-art performance on the Pascal-5i dataset.

cs.CL - 2023-11-19

Spot the Bot: Distinguishing Human-Written and Bot-Generated Texts Using Clustering and Information Theory Techniques

  • paper_url: http://arxiv.org/abs/2311.11441
  • repo_url: None
  • paper_authors: Vasilii Gromov, Quynh Nhu Dang
  • for: Developing a bot-identification algorithm based on unsupervised learning that does not require large amounts of labelled data or prior knowledge of the bot's model architecture.
  • methods: Combines semantic analysis via clustering (crisp and fuzzy) with information-theoretic techniques to build a robust model that detects generated text from different types of bots.
  • results: Generated texts tend to be more chaotic, while literary works are more complex; clustering human texts yields fuzzier clusters, whereas bot-generated texts form more compact, well-separated clusters.
    Abstract With the development of generative models like GPT-3, it is increasingly more challenging to differentiate generated texts from human-written ones. There is a large number of studies that have demonstrated good results in bot identification. However, the majority of such works depend on supervised learning methods that require labelled data and/or prior knowledge about the bot-model architecture. In this work, we propose a bot identification algorithm that is based on unsupervised learning techniques and does not depend on a large amount of labelled data. By combining findings in semantic analysis by clustering (crisp and fuzzy) and information techniques, we construct a robust model that detects a generated text for different types of bot. We find that the generated texts tend to be more chaotic while literary works are more complex. We also demonstrate that the clustering of human texts results in fuzzier clusters in comparison to the more compact and well-separated clusters of bot-generated texts.

ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding

  • paper_url: http://arxiv.org/abs/2311.11375
  • repo_url: None
  • paper_authors: Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou
  • for: Improving the robustness of spoken language understanding (SLU) to automatic speech recognition (ASR) errors.
  • methods: Proposes Mutual Learning and Large-Margin Contrastive Learning (ML-LMCL): during fine-tuning, two SLU models are trained on manual transcripts and ASR transcripts respectively, iteratively sharing knowledge. A distance polarization regularizer avoids pushing intra-cluster pairs apart.
  • results: Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.
    Abstract Spoken language understanding (SLU) is a fundamental task in the task-oriented dialogue systems. However, the inevitable errors from automatic speech recognition (ASR) usually impair the understanding performance and lead to error propagation. Although there are some attempts to address this problem through contrastive learning, they (1) treat clean manual transcripts and ASR transcripts equally without discrimination in fine-tuning; (2) neglect the fact that the semantically similar pairs are still pushed away when applying contrastive learning; (3) suffer from the problem of Kullback-Leibler (KL) vanishing. In this paper, we propose Mutual Learning and Large-Margin Contrastive Learning (ML-LMCL), a novel framework for improving ASR robustness in SLU. Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models. We also introduce a distance polarization regularizer to avoid pushing away the intra-cluster pairs as much as possible. Moreover, we use a cyclical annealing schedule to mitigate KL vanishing issue. Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.

CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies

  • paper_url: http://arxiv.org/abs/2311.11301
  • repo_url: https://github.com/ariecattan/champ
  • paper_authors: Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan
  • for: Annotating complex hierarchical structures over clusters of items in text.
  • methods: Uses an incremental approach that constructs clusters and the hierarchy simultaneously.
  • results: Builds hierarchies quickly while guaranteeing transitivity at the cluster and hierarchy levels, and supports comparing and consolidating multiple annotations.
    Abstract Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, annotating event and subevent relations, etc. To enable efficient annotation of such hierarchical structures, we release CHAMP, an open source tool allowing to incrementally construct both clusters and hierarchy simultaneously over any type of texts. This incremental approach significantly reduces annotation time compared to the common pairwise annotation approach and also guarantees maintaining transitivity at the cluster and hierarchy levels. Furthermore, CHAMP includes a consolidation mode, where an adjudicator can easily compare multiple cluster hierarchy annotations and resolve disagreements.

A Cross-Attention Augmented Model for Event-Triggered Context-Aware Story Generation

  • paper_url: http://arxiv.org/abs/2311.11271
  • repo_url: https://github.com/tonywenuon/dialog-coherence-metric
  • paper_authors: Chen Tang, Tyler Loakman, Chenghua Lin
  • for: 提高生成的故事质量,更好地包括上下文和事件特征。
  • methods: 采用交叉注意力机制,通过残差映射将上下文特征映射到事件序列上,以更好地利用事件之间的逻辑关系。
  • results: 与基线模型相比,自动指标提升约5%,人工评价提升超过10%。
    Abstract Despite recent advancements, existing story generation systems continue to encounter difficulties in effectively incorporating contextual and event features, which greatly influence the quality of generated narratives. To tackle these challenges, we introduce a novel neural generation model, EtriCA, that enhances the relevance and coherence of generated stories by employing a cross-attention mechanism to map context features onto event sequences through residual mapping. This feature capturing mechanism enables our model to exploit logical relationships between events more effectively during the story generation process. To further enhance our proposed model, we employ a post-training framework for knowledge enhancement (KeEtriCA) on a large-scale book corpus. This allows EtriCA to adapt to a wider range of data samples. This results in approximately 5\% improvement in automatic metrics and over 10\% improvement in human evaluation. We conduct extensive experiments, including comparisons with state-of-the-art (SOTA) baseline models, to evaluate the performance of our framework on story generation. The experimental results, encompassing both automated metrics and human assessments, demonstrate the superiority of our model over existing state-of-the-art baselines. These results underscore the effectiveness of our model in leveraging context and event features to improve the quality of generated narratives.
    摘要

Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters

  • paper_url: http://arxiv.org/abs/2311.11268
  • repo_url: https://github.com/THUKElab/Visual-C3
  • paper_authors: Yinghui Li, Zishan Xu, Shaoshen Chen, Haojing Huang, Yangning Li, Yong Jiang, Zhongli Li, Qingyu Zhou, Hai-Tao Zheng, Ying Shen
  • for: 提高中文输入文本的正确性和质量,尤其是在手写输入中检测和修正错误字符。
  • methods: 构建了人工标注的视觉中文字符检查数据集 Visual-C$^3$,并在其上提出并评估了新的基线方法。
  • results: 大量实验与分析表明,Visual-C$^3$ 是一个高质量且具有挑战性的数据集,新的基线方法在该数据集上取得了可观的性能。
    Abstract Writing assistance is an application closely related to human life and is also a fundamental Natural Language Processing (NLP) research field. Its aim is to improve the correctness and quality of input texts, with character checking being crucial in detecting and correcting wrong characters. From the perspective of the real world where handwriting occupies the vast majority, characters that humans get wrong include faked characters (i.e., untrue characters created due to writing errors) and misspelled characters (i.e., true characters used incorrectly due to spelling errors). However, existing datasets and related studies only focus on misspelled characters mainly caused by phonological or visual confusion, thereby ignoring faked characters which are more common and difficult. To break through this dilemma, we present Visual-C$^3$, a human-annotated Visual Chinese Character Checking dataset with faked and misspelled Chinese characters. To the best of our knowledge, Visual-C$^3$ is the first real-world visual and the largest human-crafted dataset for the Chinese character checking scenario. Additionally, we also propose and evaluate novel baseline methods on Visual-C$^3$. Extensive empirical results and analyses show that Visual-C$^3$ is high-quality yet challenging. The Visual-C$^3$ dataset and the baseline methods will be publicly available to facilitate further research in the community.
    摘要 文本协助是人类生活中非常重要的应用,同时也是自然语言处理(NLP)研究的基础领域之一。其目标是提高输入文本的正确性和质量,特别是在检查和更正错误字符时,字符检查具有关键性。在现实世界中,手写占了大多数情况下,人们通常会出现写字错误所导致的伪字和拼写错误。然而,现有的数据集和相关研究都主要关注了由语音或视觉混淆导致的拼写错误,而忽略了由写字错误引起的伪字,这些伪字更为普遍和困难。为了突破这种僵局,我们提出了Visual-C$^3$,一个人工标注的中文字符检查数据集,包括伪字和拼写错误的中文字符。到目前为止,Visual-C$^3$ 是实际世界中最大的人工制作的中文字符检查数据集,同时也是首个真实世界的视觉中文字符检查数据集。此外,我们还提出了一些基线方法,并进行了评估。广泛的实验结果和分析显示,Visual-C$^3$ 具有高质量又具有挑战性。Visual-C$^3$ 数据集和基线方法将在未来公开,以便更多的研究者参与进来。

Rethinking Large Language Models in Mental Health Applications

  • paper_url: http://arxiv.org/abs/2311.11267
  • repo_url: None
  • paper_authors: Shaoxiong Ji, Tianlin Zhang, Kailai Yang, Sophia Ananiadou, Erik Cambria
  • for: 这篇论文探讨了大语言模型在心理健康应用中的可能性,讨论了生成模型用于预测时的不稳定性及其可能产生的幻觉输出,强调需要持续审核与评估以保证其可靠性与可信度。
  • methods: 论文主张以审慎的态度使用大语言模型,并区分了常被混用的“可解释性”(explainability)与“可解读性”(interpretability)两个概念,强调应开发本质上可解读的方法,而不是依赖大语言模型可能幻觉出的自我解释。
  • results: 论文指出,虽然大语言模型在心理健康应用中展现出良好前景,但人类咨询师的共情理解、细腻解读与情境意识仍不可替代;应将大语言模型视为辅助人类专家的工具,而非其替代者。
    Abstract Large Language Models (LLMs) have become valuable assets in mental health, showing promise in both classification tasks and counseling applications. This paper offers a perspective on using LLMs in mental health applications. It discusses the instability of generative models for prediction and the potential for generating hallucinatory outputs, underscoring the need for ongoing audits and evaluations to maintain their reliability and dependability. The paper also distinguishes between the often interchangeable terms ``explainability'' and ``interpretability'', advocating for developing inherently interpretable methods instead of relying on potentially hallucinated self-explanations generated by LLMs. Despite the advancements in LLMs, human counselors' empathetic understanding, nuanced interpretation, and contextual awareness remain irreplaceable in the sensitive and complex realm of mental health counseling. The use of LLMs should be approached with a judicious and considerate mindset, viewing them as tools that complement human expertise rather than seeking to replace it.
    摘要 大语言模型(LLM)已成为心理健康领域的宝贵工具,在分类任务和咨询应用中均展现出良好前景。本文就大语言模型在心理健康应用中的使用提出一种观点:讨论了生成模型用于预测时的不稳定性及其可能产生的幻觉输出,强调需要持续审核与评估以保证其可靠性与可信度。文章还区分了常被混用的“可解释性”与“可解读性”两个概念,主张开发本质上可解读的方法,而不是依赖大语言模型可能幻觉出的自我解释。尽管大语言模型不断进步,人类咨询师的共情理解、细腻解读与情境意识在敏感而复杂的心理健康咨询领域仍不可替代。应以审慎周全的心态使用大语言模型,将其视为辅助人类专业的工具,而非试图取而代之。

Causal ATE Mitigates Unintended Bias in Controlled Text Generation

  • paper_url: http://arxiv.org/abs/2311.11229
  • repo_url: None
  • paper_authors: Rahul Madhavan, Kahini Wadhawan
  • for: 该论文通过因果平均处理效应(Causal ATE)方法研究语言模型中的属性控制问题。
  • methods: 该论文使用 Causal ATE 方法解决语言模型中的属性控制问题,并给出理论基础,证明该方法可以减少假阳性。
  • results: 该论文表明,使用 Causal ATE 方法可以消除语言模型中虚假相关带来的问题,减少去毒化后对受保护群体产生的非预期偏见。
    Abstract We study attribute control in language models through the method of Causal Average Treatment Effect (Causal ATE). Existing methods for the attribute control task in Language Models (LMs) check for the co-occurrence of words in a sentence with the attribute of interest, and control for them. However, spurious correlation of the words with the attribute in the training dataset, can cause models to hallucinate the presence of the attribute when presented with the spurious correlate during inference. We show that the simple perturbation-based method of Causal ATE removes this unintended effect. Additionally, we offer a theoretical foundation for investigating Causal ATE in the classification task, and prove that it reduces the number of false positives -- thereby mitigating the issue of unintended bias. Specifically, we ground it in the problem of toxicity mitigation, where a significant challenge lies in the inadvertent bias that often emerges towards protected groups post detoxification. We show that this unintended bias can be solved by the use of the Causal ATE metric.
    摘要 我们通过因果平均处理效应(Causal ATE)方法研究语言模型中的属性控制。现有的语言模型属性控制方法检查句子中与目标属性共现的词语并加以控制。然而,训练数据中词语与属性之间的虚假相关,会使模型在推理时遇到虚假相关词时“幻觉”出该属性的存在。我们证明,简单的基于扰动的 Causal ATE 方法可以消除这种非预期效应。此外,我们为分类任务中的 Causal ATE 提供了理论基础,并证明它能减少假阳性,从而缓解非预期偏见问题。具体而言,我们将其落脚于毒性缓解问题:去毒化后往往会对受保护群体产生非预期偏见,而使用 Causal ATE 度量可以解决这种偏见。
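As a rough illustration of the perturbation-based Causal ATE idea, the sketch below estimates a single word's average treatment effect on a classifier's score by comparing each sentence with and without the word. The `score` callable and the word-removal perturbation are assumptions for illustration, not the paper's exact estimator:

```python
def causal_ate(sentences, word, score):
    """Perturbation-based estimate of a word's average treatment effect on a
    classifier score: the average change in score when `word` is removed.
    `score` is any callable mapping a sentence string to a probability."""
    effects = []
    for s in sentences:
        tokens = s.split()
        if word not in tokens:
            continue  # only sentences that actually contain the word
        perturbed = " ".join(t for t in tokens if t != word)
        effects.append(score(s) - score(perturbed))
    return sum(effects) / len(effects) if effects else 0.0
```

A word with a large ATE genuinely drives the attribute; a word that merely co-occurs with it in the training data should show an ATE near zero under this perturbation test.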

cs.LG - 2023-11-19

Physics-Enhanced TinyML for Real-Time Detection of Ground Magnetic Anomalies

  • paper_url: http://arxiv.org/abs/2311.11452
  • repo_url: None
  • paper_authors: Talha Siddique, MD Shaad Mahmud
  • for: 这篇论文旨在开发一个物理引导的微型机器学习(TinyML)框架,以提升空间天气预报中地磁异常实时检测的精度与可靠性。
  • methods: 论文提出一个在模型训练与压缩阶段融入基于物理的正则化的 TinyML 框架,并利用领域的物理特性设计剪枝方案,在模型大小与鲁棒性之间取得平衡。
  • results: 研究结果显示,所提出的框架能提升预测的精度与可靠性;与传统方法的对比凸显了该框架在实时空间天气预报中的应用前景。
    Abstract Space weather phenomena like geomagnetic disturbances (GMDs) and geomagnetically induced currents (GICs) pose significant risks to critical technological infrastructure. While traditional predictive models, grounded in simulation, hold theoretical robustness, they grapple with challenges, notably the assimilation of imprecise data and extensive computational complexities. In recent years, Tiny Machine Learning (TinyML) has been adopted to develop Machine Learning (ML)-enabled magnetometer systems for predicting real-time terrestrial magnetic perturbations as a proxy measure for GIC. While TinyML offers efficient, real-time data processing, its intrinsic limitations prevent the utilization of robust methods with high computational needs. This paper developed a physics-guided TinyML framework to address the above challenges. This framework integrates physics-based regularization at the stages of model training and compression, thereby augmenting the reliability of predictions. The developed pruning scheme within the framework harnesses the inherent physical characteristics of the domain, striking a balance between model size and robustness. The study presents empirical results, drawing a comprehensive comparison between the accuracy and reliability of the developed framework and its traditional counterpart. Such a comparative analysis underscores the prospective applicability of the developed framework in conceptualizing robust, ML-enabled magnetometer systems for real-time space weather forecasting.
    摘要
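The physics-based regularization named in the abstract follows a common pattern: add a penalty for violating a known physical relation to the ordinary data-fit loss, applied here at both training and compression time. A minimal sketch of that generic pattern only (the paper's actual regularizer is specific to geomagnetic dynamics; `physics_residual` is a hypothetical callable standing in for the domain constraint):

```python
import torch

def physics_guided_loss(model, x, y, physics_residual, lam=0.1):
    """Generic physics-guided training loss: a data-fit term plus a penalty
    on violating a known physical relation. `physics_residual(x, pred)`
    returns a tensor that is zero when the prediction obeys the physics."""
    pred = model(x)
    data_loss = torch.mean((pred - y) ** 2)       # fit the measurements
    phys_loss = torch.mean(physics_residual(x, pred) ** 2)  # obey the physics
    return data_loss + lam * phys_loss
```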

Weight Norm Control

  • paper_url: http://arxiv.org/abs/2311.11446
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Ilya Loshchilov
  • for: 本文指出,将权重的目标范数固定为 0(即权重衰减)未必最优,可以考虑其他目标范数值。
  • methods: 本文将解耦权重衰减正则化重新表述为目标范数为 0 的权重范数控制的特例,并将这种更一般的权重范数控制与 Adam 等优化器结合(AdamWN)。
  • results: 本文讨论了以权重范数控制取代权重衰减的多种影响,并指出任何使 AdamW 达到某一权重范数的训练,都可以由调度到相近权重范数的 AdamWN 来对照挑战。
    Abstract We note that decoupled weight decay regularization is a particular case of weight norm control where the target norm of weights is set to 0. Any optimization method (e.g., Adam) which uses decoupled weight decay regularization (respectively, AdamW) can be viewed as a particular case of a more general algorithm with weight norm control (respectively, AdamWN). We argue that setting the target norm of weights to 0 can be suboptimal and other target norm values can be considered. For instance, any training run where AdamW achieves a particular norm of weights can be challenged by AdamWN scheduled to achieve a comparable norm of weights. We discuss various implications of introducing weight norm control instead of weight decay.
    摘要 我们注意到,解耦权重衰减正则化是权重范数控制的一个特例,即权重的目标范数被设为 0。任何使用解耦权重衰减的优化方法(如 AdamW)都可以看作带权重范数控制的更一般算法(如 AdamWN)的特例。我们认为,将权重的目标范数设为 0 可能并非最优,可以考虑其他目标范数值。例如,任何使 AdamW 达到某一权重范数的训练,都可以由调度到相近权重范数的 AdamWN 来对照挑战。我们讨论了以权重范数控制取代权重衰减的各种影响。
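One plausible reading of the decay step under weight norm control is to pull each weight tensor toward a chosen target norm instead of toward zero; with a target norm of 0 the update collapses to decoupled weight decay as in AdamW. A hedged sketch of that decoupled step (the paper's AdamWN may differ in detail, and this omits the Adam gradient step it would be paired with):

```python
import torch

@torch.no_grad()
def weight_norm_control_step(param, lr, lam, target_norm):
    """Decoupled weight-norm-control update (sketch): decay the weight
    vector toward the nearest point with norm `target_norm`. Setting
    target_norm = 0 recovers decoupled weight decay: p <- p - lr*lam*p."""
    norm = param.norm()
    if norm > 0:
        direction = param / norm
    else:
        direction = torch.zeros_like(param)
    # Pull toward the point of norm `target_norm` along the current direction.
    param.add_(-lr * lam * (param - target_norm * direction))
```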

Duality of Bures and Shape Distances with Implications for Comparing Neural Representations

  • paper_url: http://arxiv.org/abs/2311.11436
  • repo_url: None
  • paper_authors: Sarah E. Harvey, Brett W. Larsen, Alex H. Williams
  • for: 这篇论文旨在统一两大类神经网络表示相似性度量方法,并探讨它们之间的关系。
  • methods: 论文证明第一类中的黎曼形状距离(Riemannian shape distance)的余弦等于第二类中的归一化 Bures 相似度(normalized Bures similarity, NBS)。
  • results: 这一等式关系带来了对形状距离与 NBS 的新解释,并为与 CKA 相似度度量的比较提供了依据。
    Abstract A multitude of (dis)similarity measures between neural network representations have been proposed, resulting in a fragmented research landscape. Most of these measures fall into one of two categories. First, measures such as linear regression, canonical correlations analysis (CCA), and shape distances, all learn explicit mappings between neural units to quantify similarity while accounting for expected invariances. Second, measures such as representational similarity analysis (RSA), centered kernel alignment (CKA), and normalized Bures similarity (NBS) all quantify similarity in summary statistics, such as stimulus-by-stimulus kernel matrices, which are already invariant to expected symmetries. Here, we take steps towards unifying these two broad categories of methods by observing that the cosine of the Riemannian shape distance (from category 1) is equal to NBS (from category 2). We explore how this connection leads to new interpretations of shape distances and NBS, and draw contrasts of these measures with CKA, a popular similarity measure in the deep learning literature.
    摘要 学界已提出大量衡量神经网络表示之间(不)相似性的度量,造成研究版图碎片化。这些度量大多可归入两类:第一类,如线性回归、典型相关分析(CCA)和形状距离,通过学习神经单元之间的显式映射来量化相似性,同时考虑预期的不变性;第二类,如表示相似性分析(RSA)、中心化核对齐(CKA)和归一化 Bures 相似度(NBS),在诸如刺激对刺激核矩阵等已对预期对称性保持不变的汇总统计量上量化相似性。本文朝统一这两大类方法迈出一步:我们观察到第一类中黎曼形状距离的余弦等于第二类中的 NBS。我们探讨这一联系如何带来对形状距离与 NBS 的新解释,并将这些度量与深度学习文献中流行的相似度度量 CKA 作对比。
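To make the duality concrete, the sketch below computes NBS between two response matrices from their stimulus-by-stimulus kernels, using one standard definition of NBS (quantum fidelity normalized by the kernel traces); per the paper's result, this value should equal the cosine of the Riemannian shape distance between the representations. Details such as centering are assumptions here, not prescribed by the abstract:

```python
import numpy as np
from scipy.linalg import sqrtm

def nbs(X, Y):
    """Normalized Bures similarity between two response matrices of shape
    (conditions, neurons), computed from their condition-by-condition
    kernels. One standard definition: fidelity / sqrt(tr(A) * tr(B))."""
    A, B = X @ X.T, Y @ Y.T                       # stimulus-by-stimulus kernels
    root_A = sqrtm(A)
    fidelity = np.trace(sqrtm(root_A @ B @ root_A)).real  # Bures fidelity
    return fidelity / np.sqrt(np.trace(A) * np.trace(B))
```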

Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training

  • paper_url: http://arxiv.org/abs/2311.11429
  • repo_url: None
  • paper_authors: Lianke Qin, Saayan Mitra, Zhao Song, Yuanyuan Yang, Tianyi Zhou
  • for: 本文研究重内积识别问题,它推广了灯泡问题(Light Bulb problem,\cite{prr89}):给定两个集合 $A \subset \{-1,+1\}^d$ 与 $B \subset \{-1,+1\}^d$,且 $|A|=|B|=n$,若恰有 $k$ 对样本的内积超过阈值,即存在 $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ 使得 $\forall i \in [k], \langle a_i,b_i \rangle \geq \rho \cdot d$(其中阈值 $\rho \in (0,1)$),则目标是找出这 $k$ 对重内积样本。
  • methods: 我们给出一个运行时间为 $O(n^{2 \omega / 3+ o(1)})$ 的算法($\omega$ 为当前矩阵乘法指数),能以高概率找出内积超过 $\rho \cdot d$ 的 $k$ 对样本。该算法基于矩阵乘法与随机采样技术。
  • results: 该算法可用于加速带 ReLU 激活函数的神经网络的训练。
    Abstract In this paper, we consider a heavy inner product identification problem, which generalizes the Light Bulb problem~(\cite{prr89}): Given two sets $A \subset \{-1,+1\}^d$ and $B \subset \{-1,+1\}^d$ with $|A|=|B| = n$, if there are exact $k$ pairs whose inner product passes a certain threshold, i.e., $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ such that $\forall i \in [k], \langle a_i,b_i \rangle \geq \rho \cdot d$, for a threshold $\rho \in (0,1)$, the goal is to identify those $k$ heavy inner products. We provide an algorithm that runs in $O(n^{2 \omega / 3+ o(1)})$ time to find the $k$ inner product pairs that surpass $\rho \cdot d$ threshold with high probability, where $\omega$ is the current matrix multiplication exponent. By solving this problem, our method speed up the training of neural networks with ReLU activation function.
    摘要 本文研究重内积识别问题,它推广了灯泡问题(\cite{prr89}):给定两个集合 $A \subset \{-1,+1\}^d$ 与 $B \subset \{-1,+1\}^d$,且 $|A|=|B|=n$,若恰有 $k$ 对样本的内积超过某一阈值,即存在 $\{(a_1, b_1), \ldots, (a_k, b_k)\} \subset A \times B$ 使得 $\forall i \in [k], \langle a_i, b_i \rangle \geq \rho \cdot d$(阈值 $\rho \in (0,1)$),则目标是识别这 $k$ 对重内积样本。我们给出一个运行时间为 $O(n^{2 \omega / 3+ o(1)})$ 的算法,能以高概率找出内积超过 $\rho \cdot d$ 阈值的 $k$ 对样本,其中 $\omega$ 为当前矩阵乘法指数。通过求解该问题,我们的方法可加速带 ReLU 激活函数的神经网络的训练。
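For reference, the naive baseline forms all $n^2$ inner products with a single matrix multiplication and filters by the threshold; the paper's contribution is beating this quadratic-in-$n$ cost. A runnable sketch of the baseline with a planted heavy pair (the planting is illustrative):

```python
import numpy as np

def heavy_pairs_bruteforce(A, B, rho):
    """Baseline for the heavy inner product problem: compute all n^2 inner
    products at once and return pairs with inner product >= rho * d.
    The paper's algorithm avoids forming all n^2 products."""
    n, d = A.shape
    G = A @ B.T                       # n x n matrix of inner products
    return [tuple(ij) for ij in np.argwhere(G >= rho * d)]

rng = np.random.default_rng(0)
A = rng.choice([-1, 1], size=(100, 64))
B = rng.choice([-1, 1], size=(100, 64))
B[7] = A[3]                           # plant a heavy pair: <A[3], B[7]> = d
print(heavy_pairs_bruteforce(A, B, rho=0.9))   # recovers (3, 7)
```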

Tensor-Aware Energy Accounting

  • paper_url: http://arxiv.org/abs/2311.11424
  • repo_url: https://github.com/project-smaragdine/smaragdine
  • paper_authors: Timur Babakol, Yu David Liu
  • for: This paper aims to introduce Smaragdine, a novel energy accounting system for tensor-based deep learning (DL) programs implemented with TensorFlow, to improve the energy efficiency of DL applications.
  • methods: Smaragdine uses a novel white-box methodology of energy accounting that is aware of the internal structure of the DL program, allowing for a detailed breakdown of energy consumption by units aligned with the logical hierarchical decomposition structure.
  • results: Smaragdine was applied to understand the energy behavior of BERT, a widely used language model, and was capable of identifying the highest energy/power-consuming components of BERT layer-by-layer and tensor-by-tensor. Additionally, two case studies demonstrate its effectiveness in supporting downstream toolchain building: one comparing the energy impact of hyperparameter tuning of BERT, the other analyzing the energy behavior evolution of BERT as it evolves to its next generation, ALBERT.
    Abstract With the rapid growth of Artificial Intelligence (AI) applications supported by deep learning (DL), the energy efficiency of these applications has an increasingly large impact on sustainability. We introduce Smaragdine, a new energy accounting system for tensor-based DL programs implemented with TensorFlow. At the heart of Smaragdine is a novel white-box methodology of energy accounting: Smaragdine is aware of the internal structure of the DL program, which we call tensor-aware energy accounting. With Smaragdine, the energy consumption of a DL program can be broken down into units aligned with its logical hierarchical decomposition structure. We apply Smaragdine for understanding the energy behavior of BERT, one of the most widely used language models. Layer-by-layer and tensor-by-tensor, Smaragdine is capable of identifying the highest energy/power-consuming components of BERT. Furthermore, we conduct two case studies on how Smaragdine supports downstream toolchain building, one on the comparative energy impact of hyperparameter tuning of BERT, the other on the energy behavior evolution when BERT evolves to its next generation, ALBERT.
    摘要 随着人工智能(AI)应用程序基于深度学习(DL)的快速发展,这些应用程序的能源效率已经对可持续发展产生了越来越大的影响。我们介绍了一种新的能源账务系统:Smaragdine,用于 tensor-based DL 程序实现的 TensorFlow。Smaragdine 的核心是一种新的白盒方法:tensor-aware energy accounting,它能够跟踪 tensor 级别的能源消耗。我们通过应用 Smaragdine 来理解 BERT 语言模型的能量行为。层次地和tensor地,Smaragdine 可以识别 BERT 中最高能耗/功率消耗的组件。此外,我们还进行了两个 случа研究,一是关于 BERT 的hyperparameter 优化对能源影响的比较,另一是关于 BERT 进化到下一代 ALBERT 时的能量行为演化。

Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets

  • paper_url: http://arxiv.org/abs/2311.11423
  • repo_url: None
  • paper_authors: Kun Yang, Cong Shen, Jing Yang, Shu-ping Yeh, Jerry Sydir
  • for: 这个论文的目的是研究将离线强化学习(offline RL)应用于无线网络优化,以解决无线电资源管理(RRM)问题。
  • methods: 论文评估了多种最先进的离线 RL 算法,包括行为约束 Q 学习(BCQ)、保守 Q 学习(CQL)和隐式 Q 学习(IQL),用于一个以用户调度最大化“和率与 5 百分位速率的线性组合”为目标的具体 RRM 问题。
  • results: 论文发现离线 RL 在 RRM 问题上的性能关键取决于收集数据所用的行为策略,并提出了一种新的离线 RL 方案,通过混合由不同行为策略收集的异构数据集,即使所有行为策略都高度次优,也能得到接近最优的 RL 策略。
    Abstract The recent development of reinforcement learning (RL) has boosted the adoption of online RL for wireless radio resource management (RRM). However, online RL algorithms require direct interactions with the environment, which may be undesirable given the potential performance loss due to the unavoidable exploration in RL. In this work, we first investigate the use of \emph{offline} RL algorithms in solving the RRM problem. We evaluate several state-of-the-art offline RL algorithms, including behavior constrained Q-learning (BCQ), conservative Q-learning (CQL), and implicit Q-learning (IQL), for a specific RRM problem that aims at maximizing a linear combination {of sum and} 5-percentile rates via user scheduling. We observe that the performance of offline RL for the RRM problem depends critically on the behavior policy used for data collection, and further propose a novel offline RL solution that leverages heterogeneous datasets collected by different behavior policies. We show that with a proper mixture of the datasets, offline RL can produce a near-optimal RL policy even when all involved behavior policies are highly suboptimal.
    摘要 近期强化学习(RL)的发展推动了在线 RL 在无线电资源管理(RRM)中的广泛应用。然而,在线 RL 算法需要与环境直接交互,而 RL 中不可避免的探索可能带来性能损失,因此这未必可取。在本工作中,我们首先研究使用离线 RL 算法求解 RRM 问题。我们评估了多种最先进的离线 RL 算法,包括行为约束 Q 学习(BCQ)、保守 Q 学习(CQL)和隐式 Q 学习(IQL),并考察它们在特定 RRM 问题上的表现。我们发现,离线 RL 在 RRM 问题上的性能关键取决于收集数据所用的行为策略,并据此提出一种新的离线 RL 方案,将由不同行为策略收集的多样化数据混合使用。我们表明,只要恰当混合这些数据集,即使所有参与的行为策略都高度次优,离线 RL 也能得到接近最优的 RL 策略。

Precision at the indistinguishability threshold: a method for evaluating classification algorithms

  • paper_url: http://arxiv.org/abs/2311.11422
  • repo_url: None
  • paper_authors: David J. T. Sumpter
  • for: 本文针对现有分类算法评价指标的不足,提出一个新的性能度量。
  • methods: 本文提出“不可区分阈值处的精确率”(precision at the indistinguishability threshold):先找到一个阈值,使得被标注为正的样本集与真实正样本集在得分上无法区分(随机各取一个比较得分,胜负概率各为 50%),再在该阈值下衡量被标注为正的样本中实际为正的比例。
  • results: 本文论证该指标避免了使用 AUC 等指标时可能出现的陷阱,并且比 F1 分数有更充分的动机;但与所有此类指标一样,它并不消解精确率与召回率之间的权衡。
    Abstract There exist a wide range of single number metrics for assessing performance of classification algorithms, including AUC and the F1-score (Wikipedia lists 17 such metrics, with 27 different names). In this article, I propose a new metric to answer the following question: when an algorithm is tuned so that it can no longer distinguish labelled cats from real cats, how often does a randomly chosen image that has been labelled as containing a cat actually contain a cat? The steps to construct this metric are as follows. First, we set a threshold score such that when the algorithm is shown two randomly-chosen images -- one that has a score greater than the threshold (i.e. a picture labelled as containing a cat) and another from those pictures that really does contain a cat -- the probability that the image with the highest score is the one chosen from the set of real cat images is 50\%. At this decision threshold, the set of positively labelled images are indistinguishable from the set of images which are positive. Then, as a second step, we measure performance by asking how often a randomly chosen picture from those labelled as containing a cat actually contains a cat. This metric can be thought of as {\it precision at the indistinguishability threshold}. While this new metric doesn't address the tradeoff between precision and recall inherent to all such metrics, I do show why this method avoids pitfalls that can occur when using, for example AUC, and it is better motivated than, for example, the F1-score.
    摘要 目前存在大量用于评估分类算法性能的单一数值指标,包括 AUC 和 F1 分数(维基百科列出 17 种此类指标,共 27 个不同名称)。本文提出一个新指标,用于回答如下问题:当算法被调到无法区分“被标注为猫的图片”与“真正含猫的图片”时,随机选取一张被标注为含猫的图片,它实际含猫的频率是多少?构造该指标的步骤如下。首先,设定一个阈值分数,使得当向算法展示两张随机选取的图片——一张得分高于阈值(即被标注为含猫的图片),另一张取自真正含猫的图片——时,得分较高者来自真实含猫图片集的概率为 50%。在这一决策阈值下,被标注为正的图片集与真正为正的图片集不可区分。其次,我们通过“随机选取一张被标注为含猫的图片,它实际含猫的频率”来衡量性能。该指标可以理解为“不可区分阈值处的精确率”。尽管这一新指标与所有此类指标一样,并不消解精确率与召回率之间的权衡,但本文说明了它为何能避免使用 AUC 等指标时可能出现的陷阱,并且比 F1 分数有更充分的动机。
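A direct way to compute the metric is to scan candidate thresholds, find the one where a randomly chosen positively-labelled image outscores a randomly chosen true positive exactly half the time, and report the precision at that threshold. A minimal sketch, assuming strict score comparisons and a finite grid of candidate thresholds:

```python
import numpy as np

def precision_at_indistinguishability(scores, is_cat):
    """Sketch of 'precision at the indistinguishability threshold'.
    For each candidate threshold t, the positively-labelled set is
    {i : scores[i] > t}; we pick the t at which a random member of that set
    outscores a random true positive with probability closest to 0.5,
    then return the fraction of the labelled set that is truly positive."""
    true_pos_scores = scores[is_cat]
    best_t, best_gap = None, np.inf
    for t in np.unique(scores):
        labelled = scores > t
        if not labelled.any():
            continue
        # P(random labelled image outscores random true-positive image)
        p = np.mean(scores[labelled][:, None] > true_pos_scores[None, :])
        if abs(p - 0.5) < best_gap:
            best_gap, best_t = abs(p - 0.5), t
    labelled = scores > best_t
    return np.mean(is_cat[labelled])    # precision at that threshold
```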

Large Pre-trained time series models for cross-domain Time series analysis tasks

  • paper_url: http://arxiv.org/abs/2311.11413
  • repo_url: None
  • paper_authors: Harshavardhan Kamarthi, B. Aditya Prakash
  • for: 这篇论文的目的是从多个异构时间序列数据集中预训练一个通用时间序列模型,以便在不同领域的时间序列分析任务中取得更高的性能。
  • methods: 论文提出一种新的自适应分段策略:将时间序列划分为片段后输入序列模型,并在预训练期间利用自监督学习损失,自动为各数据集确定最优分段方式。
  • results: 实验表明,所提出的自适应分段与自监督预训练可以在多个不同领域的时间序列分析任务中取得与领域专用模型相当或更好的性能,同时所需的数据量与训练时间更少。
    Abstract Large pre-trained models have been instrumental in significant advancements in domains like language and vision making model training for individual downstream tasks more efficient as well as provide superior performance. However, tackling time-series analysis tasks usually involves designing and training a separate model from scratch leveraging training data and domain expertise specific to the task. We tackle a significant challenge for pre-training a general time-series model from multiple heterogeneous time-series dataset: providing semantically useful inputs to models for modeling time series of different dynamics from different domains. We observe that partitioning time-series into segments as inputs to sequential models produces semantically better inputs and propose a novel model LPTM that automatically identifies optimal dataset-specific segmentation strategy leveraging self-supervised learning loss during pre-training. LPTM provides performance similar to or better than domain-specific state-of-art model and is significantly more data and compute efficient taking up to 40% less data as well as 50% less training time to achieve state-of-art performance in a wide range of time-series analysis tasks from multiple disparate domain.
    摘要 大型预训练模型在语言和视觉等领域取得了重要进展,使针对各下游任务的模型训练更加高效,并带来更优的性能。然而,时间序列分析任务通常仍需针对具体任务与领域,从零开始设计并训练单独的模型。要从多个异构时间序列数据集预训练一个通用时间序列模型,关键挑战在于:如何为模型提供语义上有意义的输入,以建模来自不同领域、具有不同动态的时间序列。我们发现,将时间序列划分为片段后输入序列模型能产生语义上更好的输入,并据此提出新模型 LPTM:它在预训练期间利用自监督学习损失,自动确定各数据集专属的最优分段策略。在来自多个不同领域的大量时间序列分析任务中,LPTM 取得了与领域专用最先进模型相当或更好的性能,且更省数据、更省算力:达到最先进性能所需的数据最多减少约 40%,训练时间最多减少约 50%。

Negotiated Representations for Machine Learning Application

  • paper_url: http://arxiv.org/abs/2311.11410
  • repo_url: https://github.com/nurikorhan/negotiated-representations
  • paper_authors: Nuri Korhan, Samet Bayram
  • for: 提高机器学习模型的分类精度和避免过拟合
  • methods: 通过模型与已知标签之间的谈判来增强模型的解释能力和泛化能力
  • results: 通过在 CIFAR 10、CIFAR 100 和 MNIST 等公开数据集上构造过拟合场景进行实验,实现了提高分类精度并降低过拟合的目标,并展示了该范式向其他研究领域推广的潜力。
    Abstract Overfitting is a phenomenon that occurs when a machine learning model is trained for too long and focused too much on the exact fitness of the training samples to the provided training labels and cannot keep track of the predictive rules that would be useful on the test data. This phenomenon is commonly attributed to memorization of particular samples, memorization of the noise, and forced fitness into a data set of limited samples by using a high number of neurons. While it is true that the model encodes various peculiarities as the training process continues, we argue that most of the overfitting occurs in the process of reconciling sharply defined membership ratios. In this study, we present an approach that increases the classification accuracy of machine learning models by allowing the model to negotiate output representations of the samples with previously determined class labels. By setting up a negotiation between the models interpretation of the inputs and the provided labels, we not only increased average classification accuracy but also decreased the rate of overfitting without applying any other regularization tricks. By implementing our negotiation paradigm approach to several low regime machine learning problems by generating overfitting scenarios from publicly available data sets such as CIFAR 10, CIFAR 100, and MNIST we have demonstrated that the proposed paradigm has more capacity than its intended purpose. We are sharing the experimental results and inviting the machine learning community to explore the limits of the proposed paradigm. We also aim to incentive the community to exploit the negotiation paradigm to overcome the learning related challenges in other research fields such as continual learning. The Python code of the experimental setup is uploaded to GitHub.
    摘要 过拟合是指机器学习模型训练过久、过度专注于训练样本与给定训练标签之间的精确拟合,以致无法把握对测试数据有用的预测规律的现象。这一现象通常被归因于对特定样本的记忆、对噪声的记忆,以及用过多的神经元强行拟合样本有限的数据集。虽然模型在训练过程中确实会编码各种细枝末节,但我们认为,大部分过拟合发生在调和过于鲜明的隶属度比例的过程中。本研究提出一种方法,允许模型就样本的输出表示与既定类别标签进行“协商”,从而提高机器学习模型的分类精度。通过在模型对输入的解读与给定标签之间建立协商机制,我们不仅提高了平均分类精度,还在不使用其他正则化技巧的情况下降低了过拟合率。我们利用 CIFAR 10、CIFAR 100 和 MNIST 等公开数据集构造过拟合场景,将协商范式应用于若干低数据量机器学习问题,结果表明该范式的能力超出了其最初的设计目标。我们公开实验结果,邀请机器学习社区探索该范式的边界,并希望推动社区利用协商范式克服持续学习等其他研究领域中与学习相关的挑战。实验的 Python 代码已上传至 GitHub。

Towards interpretable-by-design deep learning algorithms

  • paper_url: http://arxiv.org/abs/2311.11396
  • repo_url: None
  • paper_authors: Plamen Angelov, Dmitry Kangin, Ziyang Zhang
  • for: 这篇论文旨在提出一种设计上即可解读的深度学习框架,在大型深度模型的基础上构建可解释的模型。
  • methods: 该框架利用现有大型神经网络(如视觉 Transformer,ViT)构成的基础模型(Foundation Model)的潜在空间,将标准监督分类问题重新表述为与一组从训练数据导出的原型的相似度函数,从而获得可解释性。
  • results: 结果表明,可以用这种方式改造深度学习模型,得到通过原型即可解读的模型;并且无需用迭代监督方法在目标数据集上微调特征空间,即可完成类增量学习与迁移学习。
    Abstract The proposed framework named IDEAL (Interpretable-by-design DEep learning ALgorithms) recasts the standard supervised classification problem into a function of similarity to a set of prototypes derived from the training data, while taking advantage of existing latent spaces of large neural networks forming so-called Foundation Models (FM). This addresses the issue of explainability (stage B) while retaining the benefits from the tremendous achievements offered by DL models (e.g., visual transformers, ViT) pre-trained on huge data sets such as IG-3.6B + ImageNet-1K or LVD-142M (stage A). We show that one can turn such DL models into conceptually simpler, explainable-through-prototypes ones. The key findings can be summarized as follows: (1) the proposed models are interpretable through prototypes, mitigating the issue of confounded interpretations, (2) the proposed IDEAL framework circumvents the issue of catastrophic forgetting allowing efficient class-incremental learning, and (3) the proposed IDEAL approach demonstrates that ViT architectures narrow the gap between finetuned and non-finetuned models allowing for transfer learning in a fraction of time \textbf{without} finetuning of the feature space on a target dataset with iterative supervised methods.
    摘要 所提出的框架 IDEAL(Interpretable-by-design DEep learning ALgorithms)将标准监督分类问题重新表述为与一组从训练数据导出的原型的相似度函数,同时利用大型神经网络构成的基础模型(FM)的现有潜在空间。这在保留深度模型(如在 IG-3.6B + ImageNet-1K 或 LVD-142M 等海量数据集上预训练的视觉 Transformer,ViT)巨大成果(阶段 A)的同时,解决了可解释性问题(阶段 B)。我们证明,可以把这类深度模型改造成概念上更简单、可通过原型解读的模型。主要发现如下:(1)所提出的模型可以通过原型解读,缓解了解释混淆的问题;(2)IDEAL 框架规避了灾难性遗忘问题,支持高效的类增量学习;(3)IDEAL 方法表明 ViT 架构缩小了微调与非微调模型之间的差距,使得无需用迭代监督方法在目标数据集上微调特征空间,即可在极短时间内完成迁移学习。
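To illustrate the general prototype-similarity idea on frozen foundation-model features, the sketch below uses class-mean prototypes and cosine similarity; IDEAL derives its prototypes differently, so treat this as a generic stand-in rather than the paper's method:

```python
import numpy as np

def prototype_classifier(train_feats, train_labels, test_feats):
    """Prototype-based classification on frozen features: each class is
    summarized by the mean of its training features, and a test point is
    assigned to the class of its most similar prototype. The similarity
    scores themselves double as a prototype-based explanation."""
    classes = np.unique(train_labels)
    protos = np.stack([train_feats[train_labels == c].mean(0) for c in classes])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    feats = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = feats @ protos.T            # cosine similarity to each prototype
    return classes[sims.argmax(1)], sims
```

Note how class-incremental learning falls out naturally: adding a class only adds a prototype, with no retraining of the feature space.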

Addressing the speed-accuracy simulation trade-off for adaptive spiking neurons

  • paper_url: http://arxiv.org/abs/2311.11390
  • repo_url: https://github.com/webstorms/blocks
  • paper_authors: Luke Taylor, Andrew J King, Nicol S Harper
  • for: 这篇论文旨在解决自适应脉冲神经元模拟中的速度-精度权衡,在不损失精度的前提下提高模型的训练速度。
  • methods: 论文对 ALIF 模型进行算法上的重新表述,将顺序模拟的复杂度降到最低,并利用 GPU 实现更高效的并行化。
  • results: 该实现可以在使用小时间步长(DT)时获得约 50 倍的训练加速,并在多个监督分类任务上以远少于标准 ALIF 实现的训练时间达到与其相当的性能。此外,该模型还能快速且精确地拟合皮层神经元的真实电生理记录,其中亚毫秒级的细小 DT 对捕捉精确的脉冲发放时间至关重要。
    Abstract The adaptive leaky integrate-and-fire (ALIF) model is fundamental within computational neuroscience and has been instrumental in studying our brains $\textit{in silico}$. Due to the sequential nature of simulating these neural models, a commonly faced issue is the speed-accuracy trade-off: either accurately simulate a neuron using a small discretisation time-step (DT), which is slow, or more quickly simulate a neuron using a larger DT and incur a loss in simulation accuracy. Here we provide a solution to this dilemma, by algorithmically reinterpreting the ALIF model, reducing the sequential simulation complexity and permitting a more efficient parallelisation on GPUs. We computationally validate our implementation to obtain over a $50\times$ training speedup using small DTs on synthetic benchmarks. We also obtained a comparable performance to the standard ALIF implementation on different supervised classification tasks - yet in a fraction of the training time. Lastly, we showcase how our model makes it possible to quickly and accurately fit real electrophysiological recordings of cortical neurons, where very fine sub-millisecond DTs are crucial for capturing exact spike timing.
    摘要 自适应漏积分发放(ALIF)模型是计算神经科学的基础模型,在以计算机模拟研究大脑方面发挥了重要作用。由于这类神经元模型的模拟具有顺序性,常见的问题是速度与精度之间的权衡:要么用小的离散时间步长(DT)精确模拟神经元,但速度慢;要么用较大的 DT 更快地模拟,但损失模拟精度。本文给出了这一两难的解法:通过对 ALIF 模型进行算法上的重新表述,降低顺序模拟的复杂度,并允许在 GPU 上更高效地并行化。我们通过数值实验验证了该实现,在合成基准上使用小 DT 获得了超过 50 倍的训练加速。在多个监督分类任务上,我们也以远少于标准 ALIF 实现的训练时间取得了相当的性能。最后,我们展示了该模型如何快速且精确地拟合皮层神经元的真实电生理记录,其中亚毫秒级的细小 DT 对捕捉精确的脉冲发放时间至关重要。
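For context, the textbook ALIF dynamics being accelerated look roughly as follows: a leaky membrane potential is integrated step by step, and each spike raises an adaptation variable that temporarily lifts the threshold. A sequential reference sketch of the standard formulation (parameter values are illustrative, and this is not the paper's accelerated reformulation):

```python
import numpy as np

def simulate_alif(I, dt=1e-4, tau_m=20e-3, tau_a=200e-3, v_th=1.0, beta=0.5):
    """Textbook adaptive leaky integrate-and-fire (ALIF) neuron simulated
    sequentially with time step dt. Every spike increments the adaptation
    variable `a`, which raises the effective threshold and then decays."""
    alpha, rho = np.exp(-dt / tau_m), np.exp(-dt / tau_a)
    v, a, spikes = 0.0, 0.0, []
    for t, i_t in enumerate(I):
        v = alpha * v + (1 - alpha) * i_t      # leaky integration of input
        if v >= v_th + beta * a:               # adaptive threshold crossed?
            spikes.append(t)
            v = 0.0                            # reset membrane potential
            a += 1.0                           # strengthen adaptation
        a *= rho                               # adaptation decays over time
    return spikes
```

The per-step dependence of `v` and `a` on the previous step is exactly what forces small-DT simulations to be slow, which is the sequential bottleneck the paper's reformulation removes.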

Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts

  • paper_url: http://arxiv.org/abs/2311.11385
  • repo_url: None
  • paper_authors: Ahmed Hendawy, Jan Peters, Carlo D’Eramo
  • for: 解决多任务强化学习中智能体技能跨任务复用的问题
  • methods: 使用正交专家混合(MOORE)方法,通过 Gram-Schmidt 过程由专家混合生成的表示张成一个共享子空间;当给定任务特定信息时,再从该共享子空间中产生相应的任务表示
  • results: 在 MiniGrid 和 MetaWorld 两个多任务强化学习基准上,MOORE 优于相关基线,并在 MetaWorld 上创造了新的最优纪录
    Abstract Multi-Task Reinforcement Learning (MTRL) tackles the long-standing problem of endowing agents with skills that generalize across a variety of problems. To this end, sharing representations plays a fundamental role in capturing both unique and common characteristics of the tasks. Tasks may exhibit similarities in terms of skills, objects, or physical properties while leveraging their representations eases the achievement of a universal policy. Nevertheless, the pursuit of learning a shared set of diverse representations is still an open challenge. In this paper, we introduce a novel approach for representation learning in MTRL that encapsulates common structures among the tasks using orthogonal representations to promote diversity. Our method, named Mixture Of Orthogonal Experts (MOORE), leverages a Gram-Schmidt process to shape a shared subspace of representations generated by a mixture of experts. When task-specific information is provided, MOORE generates relevant representations from this shared subspace. We assess the effectiveness of our approach on two MTRL benchmarks, namely MiniGrid and MetaWorld, showing that MOORE surpasses related baselines and establishes a new state-of-the-art result on MetaWorld.
    摘要
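The Gram-Schmidt step at the heart of MOORE can be sketched as follows: the representation vectors produced by the mixture of experts are orthonormalized so that they span a diverse shared subspace. A minimal sketch of that step only (batch handling and MOORE's task-conditioned mixing of the subspace are omitted, and the function name is illustrative):

```python
import torch

def orthogonalize_experts(H, eps=1e-8):
    """Gram-Schmidt over expert outputs: H has shape (num_experts, dim).
    Each expert's vector is stripped of its components along the previous
    experts' directions and normalized, yielding an orthonormal set that
    encourages diversity among the experts' representations."""
    basis = []
    for h in H:
        for b in basis:
            h = h - (h @ b) * b        # remove overlap with earlier experts
        n = h.norm()
        if n > eps:                    # skip (near-)degenerate experts
            basis.append(h / n)
    return torch.stack(basis)
```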

Optimal Locally Private Nonparametric Classification with Public Data

  • paper_url: http://arxiv.org/abs/2311.11369
  • repo_url: https://github.com/karlmyh/lpct
  • paper_authors: Yuheng Ma, Hanfang Yang
  • for: investigate the problem of public data-assisted non-interactive LDP (Local Differential Privacy) learning with a focus on non-parametric classification.
  • methods: derive the mini-max optimal convergence rate with LDP constraint, and present a novel approach, the locally private classification tree, which attains the mini-max optimal convergence rate.
  • results: comprehensive experiments conducted on synthetic and real datasets show the superior performance of our proposed method, and both theoretical and experimental findings demonstrate the effectiveness of public data compared to private data, leading to practical suggestions for prioritizing non-private data collection.
    Abstract In this work, we investigate the problem of public data-assisted non-interactive LDP (Local Differential Privacy) learning with a focus on non-parametric classification. Under the posterior drift assumption, we for the first time derive the mini-max optimal convergence rate with LDP constraint. Then, we present a novel approach, the locally private classification tree, which attains the mini-max optimal convergence rate. Furthermore, we design a data-driven pruning procedure that avoids parameter tuning and produces a fast converging estimator. Comprehensive experiments conducted on synthetic and real datasets show the superior performance of our proposed method. Both our theoretical and experimental findings demonstrate the effectiveness of public data compared to private data, which leads to practical suggestions for prioritizing non-private data collection.
    摘要 在这项研究中,我们研究了公共数据协助非互动式LDP(本地差分隐私学习)问题,专注于非参数型分类。基于 posterior 漂移假设,我们首次 derivate 最优的 mini-max 收敛速率带 LDP 约束。然后,我们提出了一种新的方法, namely 地方隐私分类树,可以实现最优的收敛速率。此外,我们设计了一种基于数据驱动的剪裁过程,以避免参数调整并生成快速收敛的估计器。在 synthetic 和实际数据集上进行了广泛的实验,并证明了我们提出的方法的优越性。 both 我们的理论和实验结果表明,公共数据比私人数据更有效,这导致了实际中优先集集数据的建议。Note: Simplified Chinese is used here, as it is more commonly used in mainland China. If you prefer Traditional Chinese, I can also provide the translation.

Self-Supervised Pretraining for Heterogeneous Hypergraph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.11368
  • repo_url: None
  • paper_authors: Abdalgader Abubaker, Takanori Maehara, Madhav Nimishakavi, Vassilis Plachouras
  • for: 本研究的目的是提出一种面向异构超图神经网络(HyperGNN)的自监督预训练框架(SPHH)。
  • methods: 该框架包含两种自监督预训练任务,利用从超图结构导出的信息表示,同时学习超图中实体的局部与全局表示。
  • results: 实验结果表明,SPHH 框架能在不同的 HyperGNN 模型与下游任务(如节点分类与链接预测)上持续超越最先进基线,且在不同数据集上表现稳定。
    Abstract Recently, pretraining methods for the Graph Neural Networks (GNNs) have been successful at learning effective representations from unlabeled graph data. However, most of these methods rely on pairwise relations in the graph and do not capture the underling higher-order relations between entities. Hypergraphs are versatile and expressive structures that can effectively model higher-order relationships among entities in the data. Despite the efforts to adapt GNNs to hypergraphs (HyperGNN), there are currently no fully self-supervised pretraining methods for HyperGNN on heterogeneous hypergraphs. In this paper, we present SPHH, a novel self-supervised pretraining framework for heterogeneous HyperGNNs. Our method is able to effectively capture higher-order relations among entities in the data in a self-supervised manner. SPHH is consist of two self-supervised pretraining tasks that aim to simultaneously learn both local and global representations of the entities in the hypergraph by using informative representations derived from the hypergraph structure. Overall, our work presents a significant advancement in the field of self-supervised pretraining of HyperGNNs, and has the potential to improve the performance of various graph-based downstream tasks such as node classification and link prediction tasks which are mapped to hypergraph configuration. Our experiments on two real-world benchmarks using four different HyperGNN models show that our proposed SPHH framework consistently outperforms state-of-the-art baselines in various downstream tasks. The results demonstrate that SPHH is able to improve the performance of various HyperGNN models in various downstream tasks, regardless of their architecture or complexity, which highlights the robustness of our framework.
    摘要 近期,图神经网络(GNN)的预训练方法已能从无标注图数据中学习有效的表示。然而,这些方法大多依赖图中的成对关系,无法刻画实体之间潜在的高阶关系。超图(hypergraph)是一种灵活且表达力强的结构,能有效建模数据中实体之间的高阶关系。尽管已有将 GNN 适配到超图(HyperGNN)的尝试,但目前尚无面向异构超图的完全自监督的 HyperGNN 预训练方法。本文提出 SPHH,一种面向异构 HyperGNN 的新型自监督预训练框架,能以自监督方式有效刻画数据中实体之间的高阶关系。SPHH 由两种自监督预训练任务组成,利用从超图结构导出的信息表示,同时学习超图中实体的局部与全局表示。总体而言,本工作是 HyperGNN 自监督预训练领域的重要进展,有望提升映射到超图配置的各类下游任务(如节点分类与链接预测)的性能。我们在两个真实基准上用四种不同的 HyperGNN 模型进行实验,结果表明所提出的 SPHH 框架在各类下游任务中持续优于最先进基线。无论 HyperGNN 的架构或复杂度如何,SPHH 都能提升其在各类下游任务中的性能,这体现了该框架的鲁棒性。

Symmetry-invariant quantum machine learning force fields

  • paper_url: http://arxiv.org/abs/2311.11362
  • repo_url: None
  • paper_authors: Isabel Nha Minh Le, Oriel Kiss, Julian Schuhmacher, Ivano Tavernelli, Francesco Tacchino
  • for: Computing efficient and accurate force fields for atomistic simulations using machine learning techniques and quantum computational methods.
  • methods: Using variational quantum learning models to predict potential energy surfaces and atomic forces from ab initio training data, and incorporating physically relevant symmetries in quantum neural networks.
  • results: Outperforming generic quantum learning models on individual molecules of growing complexity, and demonstrating the versatility of the approach on a water dimer as a minimal example of a system with multiple components.
    Abstract Machine learning techniques are essential tools to compute efficient, yet accurate, force fields for atomistic simulations. This approach has recently been extended to incorporate quantum computational methods, making use of variational quantum learning models to predict potential energy surfaces and atomic forces from ab initio training data. However, the trainability and scalability of such models are still limited, due to both theoretical and practical barriers. Inspired by recent developments in geometric classical and quantum machine learning, here we design quantum neural networks that explicitly incorporate, as a data-inspired prior, an extensive set of physically relevant symmetries. We find that our invariant quantum learning models outperform their more generic counterparts on individual molecules of growing complexity. Furthermore, we study a water dimer as a minimal example of a system with multiple components, showcasing the versatility of our proposed approach and opening the way towards larger simulations. Our results suggest that molecular force fields generation can significantly profit from leveraging the framework of geometric quantum machine learning, and that chemical systems represent, in fact, an interesting and rich playground for the development and application of advanced quantum machine learning tools.
    摘要 机器学习技术是为原子尺度模拟计算高效且精确的力场的关键工具。最近,这一思路被扩展到量子计算方法:利用变分量子学习模型,从第一性原理训练数据预测势能面与原子受力。然而,受理论与实践双重障碍的限制,这类模型的可训练性与可扩展性仍然有限。受几何经典与量子机器学习最新进展的启发,我们设计了显式地将一整套物理相关对称性作为数据驱动先验纳入的量子神经网络。我们发现,在复杂度逐渐增加的单个分子上,这种对称不变的量子学习模型优于更通用的对应模型。此外,我们以水二聚体作为含多组分体系的最小示例,展示了所提方法的多样性,为更大规模的模拟铺平道路。我们的结果表明,分子力场生成可以显著受益于几何量子机器学习框架,而化学体系实际上为先进量子机器学习工具的开发与应用提供了有趣而丰富的试验场。

Coverage-Validity-Aware Algorithmic Recourse

  • paper_url: http://arxiv.org/abs/2311.11349
  • repo_url: None
  • paper_authors: Ngoc Bui, Duy Nguyen, Man-Chung Yue, Viet Anh Nguyen
  • for: 提高机器学习模型的可解释性、透明度与伦理性
  • methods: 提出一个新框架,生成对模型更新保持鲁棒的、与模型无关的算法补救(recourse)方案,以确保补救对未来模型依然有效
  • results: 证明通过设定不同的协方差鲁棒性,该框架可以恢复极小极大概率机(MPM)的多种常见正则化形式,包括 $\ell_2$ 正则化与类别重加权,同时能生成直观且可解释的补救方案
    Abstract Algorithmic recourse emerges as a prominent technique to promote the explainability, transparency and hence ethics of machine learning models. Existing algorithmic recourse approaches often assume an invariant predictive model; however, the predictive model is usually updated upon the arrival of new data. Thus, a recourse that is valid respective to the present model may become invalid for the future model. To resolve this issue, we propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts. Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model; then, the recourse is generated with respect to the linear surrogate. We establish a theoretical connection between our coverage-validity-aware linear surrogate and the minimax probability machines (MPM). We then prove that by prescribing different covariance robustness, the proposed framework recovers popular regularizations for MPM, including the $\ell_2$-regularization and class-reweighting. Furthermore, we show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses. The numerical results demonstrate the usefulness and robustness of our framework.
    摘要 算法补救(algorithmic recourse)已成为提升机器学习模型可解释性、透明度乃至伦理性的重要技术。现有的算法补救方法通常假设预测模型不变;但预测模型往往会随着新数据的到来而更新,因此对当前模型有效的补救方案,对未来模型可能失效。为解决这一问题,我们提出一个新框架,生成对模型漂移保持鲁棒的、与模型无关的补救方案。该框架首先为非线性(黑盒)模型构建一个顾及覆盖度与有效性的线性替代模型(surrogate),再基于该线性替代模型生成补救方案。我们建立了这种线性替代模型与极小极大概率机(MPM)之间的理论联系,并证明通过设定不同的协方差鲁棒性,所提框架可以恢复 MPM 的多种常见正则化形式,包括 $\ell_2$ 正则化与类别重加权。此外,我们表明该替代模型能直观地推移近似超平面,使补救方案不仅鲁棒而且可解释。数值结果验证了框架的有效性与鲁棒性。

A Generative Model for Accelerated Inverse Modelling Using a Novel Embedding for Continuous Variables

  • paper_url: http://arxiv.org/abs/2311.11343
  • repo_url: None
  • paper_authors: Sébastien Bompas and Stefan Sandfeld
  • for: 加速材料设计
  • methods: 使用生成式机器学习模型,并比较了一种基于浮点数二进制表示的新型生成模型嵌入策略
  • results: 提供了一个通用的嵌入空间用于条件化生成模型,可对生成的微观结构图像进行精细控制,从而促进材料设计的加速。
    Abstract In materials science, the challenge of rapid prototyping materials with desired properties often involves extensive experimentation to find suitable microstructures. Additionally, finding microstructures for given properties is typically an ill-posed problem where multiple solutions may exist. Using generative machine learning models can be a viable solution which also reduces the computational cost. This comes with new challenges because, e.g., a continuous property variable as conditioning input to the model is required. We investigate the shortcomings of an existing method and compare this to a novel embedding strategy for generative models that is based on the binary representation of floating point numbers. This eliminates the need for normalization, preserves information, and creates a versatile embedding space for conditioning the generative model. This technique can be applied to condition a network on any number, to provide fine control over generated microstructure images, thereby contributing to accelerated materials design.
    摘要 在材料科学中,快速原型设计具有目标性能的材料往往需要大量实验来寻找合适的微观结构。此外,为给定性能寻找微观结构通常是一个不适定问题,可能存在多个解。使用生成式机器学习模型是一种可行的方案,同时还能降低计算成本。但这也带来新的挑战,例如需要以连续的性能变量作为模型的条件输入。我们分析了一种现有方法的不足,并与一种基于浮点数二进制表示的新型生成模型嵌入策略进行比较。该策略无需归一化、不损失信息,并为条件化生成模型创建了一个通用的嵌入空间。这种技术可以用任意数值作为网络的条件,对生成的微观结构图像实现精细控制,从而助力加速材料设计。
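The binary embedding idea can be made concrete in a few lines: a conditioning value is encoded as the raw bits of its floating-point representation, which avoids normalization and loses no information. A minimal sketch assuming IEEE-754 float32 and big-endian byte order (both assumptions for illustration):

```python
import struct
import numpy as np

def float_to_bits(x):
    """Embed a continuous conditioning variable as the 32 bits of its
    IEEE-754 float32 representation: a length-32 {0,1} vector that needs
    no normalization and preserves the value exactly."""
    packed = struct.pack(">f", float(x))               # 4 big-endian bytes
    bits = np.unpackbits(np.frombuffer(packed, dtype=np.uint8))
    return bits.astype(np.float32)

print(float_to_bits(0.15625))  # exactly representable float -> clean bit pattern
```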

On the Communication Complexity of Decentralized Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2311.11342
  • repo_url: None
  • paper_authors: Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao
  • for: 提高去中心化双层优化在实际任务中的适用性
  • methods: 提出了一种异构设定下的去中心化随机双层梯度下降算法,每轮通信开销小且所需通信轮数少,并进一步扩展到更具挑战性的去中心化多层优化
  • results: 实验结果验证了该算法的有效性,其通信复杂度显著优于现有算法
    Abstract Decentralized bilevel optimization has been actively studied in the past few years since it has widespread applications in machine learning. However, existing algorithms suffer from large communication complexity caused by the estimation of stochastic hypergradient, limiting their application to real-world tasks. To address this issue, we develop a novel decentralized stochastic bilevel gradient descent algorithm under the heterogeneous setting, which enjoys a small communication cost in each round and small communication rounds. As such, it can achieve a much better communication complexity than existing algorithms. Moreover, we extend our algorithm to the more challenging decentralized multi-level optimization. To the best of our knowledge, this is the first time achieving these theoretical results under the heterogeneous setting. At last, the experimental results confirm the efficacy of our algorithm.
    摘要 去中心化双层优化因其在机器学习中的广泛应用,近年来受到积极研究。然而,现有算法在估计随机超梯度(hypergradient)时通信复杂度很高,限制了其在实际任务中的应用。为解决这一问题,我们在异构设定下开发了一种新的去中心化随机双层梯度下降算法,每轮通信开销小且所需通信轮数少,因而通信复杂度远优于现有算法。此外,我们将该算法扩展到更具挑战性的去中心化多层优化。据我们所知,这是在异构设定下首次取得这些理论结果。最后,实验结果验证了该算法的有效性。

Self-Distilled Representation Learning for Time Series

  • paper_url: http://arxiv.org/abs/2311.11335
  • repo_url: None
  • paper_authors: Felix Pieper, Konstantin Ditschuneit, Martin Genzel, Alexandra Lindt, Johannes Otterbach
  • for: 本研究旨在探讨自监督学习在时间序列数据上的潜力,并提出一种基于 data2vec 自蒸馏框架的非对比学习方法。
  • methods: 我们提出一种学生-教师模式的非对比学习方法:从同一时间序列的掩码视图出发,预测该时间序列的潜在表示。
  • results: 我们在 UCR、UEA、ETT 和 Electricity 数据集上以分类与预测为下游任务进行比较,证明了该方法的竞争力。
    Abstract Self-supervised learning for time-series data holds potential similar to that recently unleashed in Natural Language Processing and Computer Vision. While most existing works in this area focus on contrastive learning, we propose a conceptually simple yet powerful non-contrastive approach, based on the data2vec self-distillation framework. The core of our method is a student-teacher scheme that predicts the latent representation of an input time series from masked views of the same time series. This strategy avoids strong modality-specific assumptions and biases typically introduced by the design of contrastive sample pairs. We demonstrate the competitiveness of our approach for classification and forecasting as downstream tasks, comparing with state-of-the-art self-supervised learning methods on the UCR and UEA archives as well as the ETT and Electricity datasets.
    摘要 自监督学习在时间序列数据上的潜力,堪比其近来在自然语言处理与计算机视觉领域释放的能量。该领域现有工作大多聚焦于对比学习,而我们提出一种概念简单却强大的非对比方法,基于 data2vec 自蒸馏框架。方法的核心是学生-教师机制:从同一时间序列的掩码视图预测输入时间序列的潜在表示。这一策略避免了构造对比样本对时通常引入的强模态特定假设与偏置。我们以分类与预测作为下游任务,在 UCR 与 UEA 档案以及 ETT 和 Electricity 数据集上与最先进的自监督学习方法进行比较,证明了该方法的竞争力。
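The student-teacher scheme can be sketched in a few dozen lines: an EMA teacher encodes the clean series, the student encodes a masked copy, and the loss regresses the student's latents onto the teacher's at the masked steps. The encoder architecture, masking convention, and EMA rate below are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class SelfDistilledTS(nn.Module):
    """Minimal data2vec-style sketch for time series: a student encoder sees
    a masked series and regresses the latent targets an EMA teacher produces
    from the unmasked series."""
    def __init__(self, dim=64, tau=0.999):
        super().__init__()
        self.student = nn.GRU(1, dim, batch_first=True)
        self.teacher = nn.GRU(1, dim, batch_first=True)
        self.teacher.load_state_dict(self.student.state_dict())
        for p in self.teacher.parameters():
            p.requires_grad = False
        self.tau = tau

    def forward(self, x, mask):             # x: (B, T, 1), mask: (B, T) bool
        with torch.no_grad():
            target, _ = self.teacher(x)     # latents of the clean series
        masked = x.masked_fill(mask.unsqueeze(-1), 0.0)
        pred, _ = self.student(masked)
        return ((pred - target)[mask] ** 2).mean()  # loss on masked steps only

    @torch.no_grad()
    def update_teacher(self):               # EMA update after each optim step
        for ps, pt in zip(self.student.parameters(), self.teacher.parameters()):
            pt.mul_(self.tau).add_(ps, alpha=1 - self.tau)
```

Because the targets are the teacher's latents rather than hand-picked negatives, no contrastive pair construction (and hence no pair-design bias) is needed.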

LABCAT: Locally adaptive Bayesian optimization using principal component-aligned trust regions

  • paper_url: http://arxiv.org/abs/2311.11328
  • repo_url: https://github.com/aemiliusretiarius/labcat
  • paper_authors: E. Visser, C. E. van Daalen, J. C. Schoeman
  • for: 昂贵黑盒函数的优化问题
  • methods: 在基于信赖域的贝叶斯优化中引入主成分对齐的旋转,以及基于带自动相关性判定的局部高斯过程替代模型长度尺度的自适应重缩放策略,以克服 BO 的局限
  • results: 大量数值实验表明,LABCAT 算法优于多种最先进的 BO 及其他黑盒优化算法
    Abstract Bayesian optimization (BO) is a popular method for optimizing expensive black-box functions. BO has several well-documented shortcomings, including computational slowdown with longer optimization runs, poor suitability for non-stationary or ill-conditioned objective functions, and poor convergence characteristics. Several algorithms have been proposed that incorporate local strategies, such as trust regions, into BO to mitigate these limitations; however, none address all of them satisfactorily. To address these shortcomings, we propose the LABCAT algorithm, which extends trust-region-based BO by adding principal-component-aligned rotation and an adaptive rescaling strategy based on the length-scales of a local Gaussian process surrogate model with automatic relevance determination. Through extensive numerical experiments using a set of synthetic test functions and the well-known COCO benchmarking software, we show that the LABCAT algorithm outperforms several state-of-the-art BO and other black-box optimization algorithms.
    摘要
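One way to picture the principal-component-aligned rotation is as a local change of coordinates: center the trust region on the incumbent best point and rotate onto the principal axes of the recently evaluated points, so that rescaling can then act per axis. A hedged numpy sketch of that geometric step only (LABCAT's rescaling by GP length-scales and the rest of the algorithm are omitted; the function name is illustrative):

```python
import numpy as np

def pca_aligned_coords(X, y):
    """Local PCA-aligned coordinates for a trust region: center on the
    incumbent best point (minimization) and rotate onto the principal
    axes of the evaluated points X, shape (n_points, dim)."""
    center = X[np.argmin(y)]                       # incumbent best point
    _, _, Vt = np.linalg.svd(X - center, full_matrices=False)
    return (X - center) @ Vt.T, center, Vt         # coords, origin, rotation
```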

Large Learning Rates Improve Generalization: But How Large Are We Talking About?

  • paper_url: http://arxiv.org/abs/2311.11303
  • repo_url: None
  • paper_authors: Ekaterina Lobacheva, Eduard Pockonechnyy, Maxim Kodryan, Dmitry Vetrov
  • for: This paper examines the hypothesis that starting neural network training with a large learning rate (LR) achieves the best generalization.
  • methods: The study explores this hypothesis in detail and identifies the initial LR ranges that give optimal results for subsequent training with a small LR or weight averaging; the main experiments are conducted in a simplified setup, and the key findings are validated in a more practical setting.
  • results: We find that the optimal initial LR ranges are in fact significantly narrower than generally assumed.
    Abstract Inspired by recent research that recommends starting neural networks training with large learning rates (LRs) to achieve the best generalization, we explore this hypothesis in detail. Our study clarifies the initial LR ranges that provide optimal results for subsequent training with a small LR or weight averaging. We find that these ranges are in fact significantly narrower than generally assumed. We conduct our main experiments in a simplified setup that allows precise control of the learning rate hyperparameter and validate our key findings in a more practical setting.
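The training protocol under study, a large-LR phase followed by either small-LR training or weight averaging, can be written down in a few lines of PyTorch. The LR values and epoch counts below are placeholders, not the narrow optimal ranges the paper identifies.

```python
import copy
import torch
import torch.nn.functional as F

def train_one_epoch(model, loader, opt):
    for x, y in loader:
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()

def two_phase_training(model, loader, large_lr=1.0, small_lr=0.01,
                       warm_epochs=50, fine_epochs=50, average=False):
    opt = torch.optim.SGD(model.parameters(), lr=large_lr)
    for _ in range(warm_epochs):                 # phase 1: large initial LR
        train_one_epoch(model, loader, opt)

    if average:                                  # phase 2a: weight averaging
        avg, n = copy.deepcopy(model), 0
        for _ in range(fine_epochs):
            train_one_epoch(model, loader, opt)
            n += 1
            with torch.no_grad():
                for pa, p in zip(avg.parameters(), model.parameters()):
                    pa += (p - pa) / n           # running mean of weights
        return avg

    for g in opt.param_groups:                   # phase 2b: drop to a small LR
        g["lr"] = small_lr
    for _ in range(fine_epochs):
        train_one_epoch(model, loader, opt)
    return model
```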

From Categories to Classifier: Name-Only Continual Learning by Exploring the Web

  • paper_url: http://arxiv.org/abs/2311.11293
  • repo_url: None
  • paper_authors: Ameya Prabhu, Hasan Abed Al Kader Hammoud, Ser-Nam Lim, Bernard Ghanem, Philip H. S. Torr, Adel Bibi
  • for: Overcoming the limits of manual data annotation to make continual learning more feasible and efficient.
  • methods: Queries the internet for category names alone and downloads uncurated webly-supervised data, which is then used for classification.
  • results: The web data prove comparable, and in some cases superior, to manually annotated datasets; harnessing the web yields support sets that surpass state-of-the-art name-only classification built with generative models or LAION-5B image retrieval, with up to a 25% accuracy boost. Across varied continual learning settings, the method shows only a small performance gap relative to models trained on manually annotated data. The paper also presents EvoTrends, a web-built class-incremental dataset capturing real-world trends and created in just minutes. Overall, uncurated webly-supervised data can mitigate the challenges of manual labeling in continual learning.
    Abstract Continual Learning (CL) often relies on the availability of extensive annotated datasets, an assumption that is unrealistically time-consuming and costly in practice. We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation. In this scenario, learners adapt to new category shifts using only category names without the luxury of annotated training data. Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification. We investigate the reliability of our web data and find them comparable, and in some cases superior, to manually annotated datasets. Additionally, we show that by harnessing the web, we can create support sets that surpass state-of-the-art name-only classification methods that create support sets using generative models or image retrieval from LAION-5B, achieving up to a 25% boost in accuracy. When applied across varied continual learning contexts, our method consistently exhibits a small performance gap in comparison to models trained on manually annotated datasets. We present EvoTrends, a class-incremental dataset made from the web to capture real-world trends, created in just minutes. Overall, this paper underscores the potential of using uncurated webly-supervised data to mitigate the challenges associated with manual data labeling in continual learning.
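A name-only data-collection loop of the kind described can be sketched as below. The `image_search` helper is hypothetical, a stand-in for whatever web-search API or LAION-5B retrieval backend is used; the paper's actual crawling pipeline is not reproduced here.

```python
import urllib.request
from pathlib import Path

def image_search(query: str, k: int) -> list[str]:
    """Hypothetical helper returning up to k image URLs for a text query.
    Plug in any real web-search or LAION-index backend here."""
    return []   # stub: no backend wired up in this sketch

def build_webly_supervised_set(category_names, per_class=100, root="web_data"):
    """Download uncurated web images, labeled only by the query name."""
    for name in category_names:
        out = Path(root) / name.replace(" ", "_")
        out.mkdir(parents=True, exist_ok=True)
        for i, url in enumerate(image_search(f"a photo of a {name}", per_class)):
            try:
                urllib.request.urlretrieve(url, out / f"{i:04d}.jpg")
            except OSError:
                continue   # tolerate dead links; the data stays uncurated

# When a new task arrives as a list of category names, extend the pool:
build_webly_supervised_set(["sea otter", "red panda"])
```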

TimeSQL: Improving Multivariate Time Series Forecasting with Multi-Scale Patching and Smooth Quadratic Loss

  • paper_url: http://arxiv.org/abs/2311.11285
  • repo_url: None
  • paper_authors: Site Mo, Haoxin Wang, Bixiong Li, Songhai Fan, Yuankai Wu, Xianggen Liu
  • for: Proposes a simple and effective framework for multivariate time series forecasting.
  • methods: The framework uses multi-scale patching and a smooth quadratic loss (SQL) to handle the noise and the complicated local and global temporal dynamics that make real-world series difficult to forecast.
  • results: Supported by theoretical analysis and experiments, the framework achieves new state-of-the-art performance on eight real-world benchmark datasets.
    Abstract Time series is a special type of sequence data, a sequence of real-valued random variables collected at even intervals of time. The real-world multivariate time series comes with noises and contains complicated local and global temporal dynamics, making it difficult to forecast the future time series given the historical observations. This work proposes a simple and effective framework, coined as TimeSQL, which leverages multi-scale patching and smooth quadratic loss (SQL) to tackle the above challenges. The multi-scale patching transforms the time series into two-dimensional patches with different length scales, facilitating the perception of both locality and long-term correlations in time series. SQL is derived from the rational quadratic kernel and can dynamically adjust the gradients to avoid overfitting to the noises and outliers. Theoretical analysis demonstrates that, under mild conditions, the effect of the noises on the model with SQL is always smaller than that with MSE. Based on the two modules, TimeSQL achieves new state-of-the-art performance on the eight real-world benchmark datasets. Further ablation studies indicate that the key modules in TimeSQL could also enhance the results of other models for multivariate time series forecasting, standing as plug-and-play techniques.
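The abstract says SQL is derived from the rational quadratic kernel and dynamically adjusts gradients to resist noise and outliers. One plausible form consistent with that description is sketched below in PyTorch; the exact definition and constants in the paper may differ.

```python
import torch

def smooth_quadratic_loss(pred, target, alpha=1.0, c=1.0):
    """Robust loss built from the rational quadratic kernel
    k(e) = (1 + e^2 / (2*alpha*c^2))**(-alpha); the loss is 1 - k(e).

    Near e = 0 it behaves like a scaled squared error; for large |e|
    its gradient decays, so noisy points and outliers contribute less.
    """
    e2 = (pred - target) ** 2
    k = (1.0 + e2 / (2.0 * alpha * c * c)) ** (-alpha)
    return (1.0 - k).mean()

# Gradient comparison on an outlier: MSE keeps growing, SQL saturates.
p = torch.tensor([0.1, 5.0], requires_grad=True)
smooth_quadratic_loss(p, torch.zeros(2)).backward()
print(p.grad)   # the gradient on the outlier (5.0) is damped
```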

Multi-Timescale Control and Communications with Deep Reinforcement Learning – Part I: Communication-Aware Vehicle Control

  • paper_url: http://arxiv.org/abs/2311.11281
  • repo_url: None
  • paper_authors: Tong Liu, Lei Lei, Kan Zheng, Xuemin, Shen
  • for: Develops an intelligent decision-making system enabled by Vehicle-to-Everything (V2X) communications to achieve safe and efficient autonomous driving (AD).
  • methods: Proposes a joint multi-timescale control and communications (MTCC) framework based on deep reinforcement learning (DRL).
  • results: Experiments compare the MTCC-PC algorithm with baseline DRL algorithms and show that it improves platoon-control performance under random observation delay.
    Abstract An intelligent decision-making system enabled by Vehicle-to-Everything (V2X) communications is essential to achieve safe and efficient autonomous driving (AD), where two types of decisions have to be made at different timescales, i.e., vehicle control and radio resource allocation (RRA) decisions. The interplay between RRA and vehicle control necessitates their collaborative design. In this two-part paper (Part I and Part II), taking platoon control (PC) as an example use case, we propose a joint optimization framework of multi-timescale control and communications (MTCC) based on Deep Reinforcement Learning (DRL). In this paper (Part I), we first decompose the problem into a communication-aware DRL-based PC sub-problem and a control-aware DRL-based RRA sub-problem. Then, we focus on the PC sub-problem assuming an RRA policy is given, and propose the MTCC-PC algorithm to learn an efficient PC policy. To improve the PC performance under random observation delay, the PC state space is augmented with the observation delay and PC action history. Moreover, the reward function with respect to the augmented state is defined to construct an augmented state Markov Decision Process (MDP). It is proved that the optimal policy for the augmented state MDP is optimal for the original PC problem with observation delay. Different from most existing works on communication-aware control, the MTCC-PC algorithm is trained in a delayed environment generated by the fine-grained embedded simulation of C-V2X communications rather than by a simple stochastic delay model. Finally, experiments are performed to compare the performance of MTCC-PC with those of the baseline DRL algorithms.
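The state augmentation described above, appending the observation delay and recent action history so that the delayed problem becomes (approximately) Markov, can be sketched as a thin environment wrapper. The buffer length and the assumption that the environment reports its current delay are illustrative choices, not the paper's interface.

```python
import collections
import numpy as np

class DelayAwareWrapper:
    """Augment observations with the current observation delay and the
    last H actions, approximating a Markov state for a delayed POMDP."""

    def __init__(self, env, history_len=4, action_dim=1):
        self.env = env
        self.actions = collections.deque(
            [np.zeros(action_dim)] * history_len, maxlen=history_len)

    def _augment(self, obs, delay_steps):
        hist = np.concatenate(list(self.actions))
        return np.concatenate([obs, [float(delay_steps)], hist])

    def reset(self):
        obs, delay = self.env.reset()        # env reports its current delay
        return self._augment(obs, delay)

    def step(self, action):
        self.actions.append(np.asarray(action, dtype=float).ravel())
        obs, reward, done, delay = self.env.step(action)
        return self._augment(obs, delay), reward, done

class ToyEnv:                                # minimal stand-in environment
    def reset(self):
        return np.zeros(3), 0
    def step(self, a):
        return np.random.randn(3), 0.0, False, np.random.randint(0, 3)

env = DelayAwareWrapper(ToyEnv())
s = env.reset()
s2, r, done = env.step([0.1])
print(s.shape, s2.shape)   # (8,) (8,): 3 obs + 1 delay + 4 past actions
```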

Multi-Timescale Control and Communications with Deep Reinforcement Learning – Part II: Control-Aware Radio Resource Allocation

  • paper_url: http://arxiv.org/abs/2311.11280
  • repo_url: None
  • paper_authors: Lei Lei, Tong Liu, Kan Zheng, Xuemin, Shen
  • for: Solves the multi-timescale control and communications (MTCC) problem in Cellular Vehicle-to-Everything (C-V2X) systems.
  • methods: Uses DRL to solve MTCC by decomposing it into two coupled sub-problems: a platoon control (PC) problem, where DRL learns an optimal platoon-control policy, and a radio resource allocation (RRA) problem, where DRL learns an optimal resource-allocation policy.
  • results: Experiments using real driving data for the leading vehicle show that the proposed MTCC algorithm outperforms baseline DRL algorithms.
    Abstract In Part I of this two-part paper (Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control), we decomposed the multi-timescale control and communications (MTCC) problem in Cellular Vehicle-to-Everything (C-V2X) system into a communication-aware Deep Reinforcement Learning (DRL)-based platoon control (PC) sub-problem and a control-aware DRL-based radio resource allocation (RRA) sub-problem. We focused on the PC sub-problem and proposed the MTCC-PC algorithm to learn an optimal PC policy given an RRA policy. In this paper (Part II), we first focus on the RRA sub-problem in MTCC assuming a PC policy is given, and propose the MTCC-RRA algorithm to learn the RRA policy. Specifically, we incorporate the PC advantage function in the RRA reward function, which quantifies the amount of PC performance degradation caused by observation delay. Moreover, we augment the state space of RRA with PC action history for a more well-informed RRA policy. In addition, we utilize reward shaping and reward backpropagation prioritized experience replay (RBPER) techniques to efficiently tackle the multi-agent and sparse reward problems, respectively. Finally, a sample- and computational-efficient training approach is proposed to jointly learn the PC and RRA policies in an iterative process. In order to verify the effectiveness of the proposed MTCC algorithm, we performed experiments using real driving data for the leading vehicle, where the performance of MTCC is compared with those of the baseline DRL algorithms.
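A toy version of the control-aware reward, a communication term minus a penalty for the platoon-control degradation that observation delay causes, might look like this. The weights and the way the PC advantage is queried are illustrative assumptions, not the paper's formulation.

```python
def rra_reward(throughput, delay_steps, pc_advantage_fn, pc_state, pc_action,
               w_comm=1.0, w_ctrl=1.0):
    """Control-aware radio resource allocation reward (illustrative).

    pc_advantage_fn(state, action, delay) estimates the platoon-control
    advantage when acting on an observation that is `delay` steps stale;
    the drop relative to a fresh observation penalizes allocations that
    starve the control loop.
    """
    a_fresh = pc_advantage_fn(pc_state, pc_action, delay=0)
    a_stale = pc_advantage_fn(pc_state, pc_action, delay=delay_steps)
    pc_degradation = max(0.0, a_fresh - a_stale)
    return w_comm * throughput - w_ctrl * pc_degradation

# Toy usage with a made-up advantage model that decays with delay:
adv = lambda s, a, delay=0: 1.0 / (1.0 + 0.3 * delay)
print(rra_reward(throughput=5.0, delay_steps=3, pc_advantage_fn=adv,
                 pc_state=None, pc_action=None))
```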

Uncertainty quantification for noisy inputs-outputs in physics-informed neural networks and neural operators

  • paper_url: http://arxiv.org/abs/2311.11262
  • repo_url: None
  • paper_authors: Zongren Zou, Xuhui Meng, George Em Karniadakis
  • for: This paper addresses uncertainty quantification (UQ) in scientific machine learning (SciML) models, specifically the uncertainty caused by noisy inputs in physics-informed neural networks (PINNs) and neural operators (NOs).
  • methods: The paper proposes a Bayesian approach to quantify uncertainty arising from noisy inputs-outputs in PINNs and NOs. This approach seamlessly integrates into PINNs and NOs, allowing them to address problems where the observed coordinate or input functions are subject to noise.
  • results: The proposed approach enables PINNs and NOs to handle noisy measurements for both input and output functions, providing reliable and trustworthy deployment of these models in applications involving physical knowledge.
    Abstract Uncertainty quantification (UQ) in scientific machine learning (SciML) becomes increasingly critical as neural networks (NNs) are being widely adopted in addressing complex problems across various scientific disciplines. Representative SciML models are physics-informed neural networks (PINNs) and neural operators (NOs). While UQ in SciML has been increasingly investigated in recent years, very few works have focused on addressing the uncertainty caused by the noisy inputs, such as spatial-temporal coordinates in PINNs and input functions in NOs. The presence of noise in the inputs of the models can pose significantly more challenges compared to noise in the outputs of the models, primarily due to the inherent nonlinearity of most SciML algorithms. As a result, UQ for noisy inputs becomes a crucial factor for reliable and trustworthy deployment of these models in applications involving physical knowledge. To this end, we introduce a Bayesian approach to quantify uncertainty arising from noisy inputs-outputs in PINNs and NOs. We show that this approach can be seamlessly integrated into PINNs and NOs, when they are employed to encode the physical information. PINNs incorporate physics by including physics-informed terms via automatic differentiation, either in the loss function or the likelihood, and often take as input the spatial-temporal coordinate. Therefore, the present method equips PINNs with the capability to address problems where the observed coordinate is subject to noise. On the other hand, pretrained NOs are also commonly employed as equation-free surrogates in solving differential equations and Bayesian inverse problems, in which they take functions as inputs. The proposed approach enables them to handle noisy measurements for both input and output functions with UQ.
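One generic way to realize the Bayesian treatment of noisy inputs is to keep the latent (clean) coordinates as unknowns with a Gaussian prior centered on the noisy measurements, and fit them jointly with the network. The PyTorch sketch below shows a MAP version for brevity (posterior sampling would replace the optimizer, and any physics-informed residual term is omitted); it illustrates the idea, not the authors' exact formulation.

```python
import torch

def map_with_noisy_inputs(model, x_obs, y_obs, sigma_x=0.05, sigma_y=0.1,
                          steps=500, lr=1e-3):
    """Jointly fit network weights and latent clean inputs x.

    log posterior ~ -||y_obs - f(x)||^2/(2*sigma_y^2)
                    -||x_obs - x||^2/(2*sigma_x^2)
    """
    x = x_obs.clone().requires_grad_(True)       # latent clean coordinates
    opt = torch.optim.Adam([{"params": model.parameters()},
                            {"params": [x], "lr": lr}], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        data_term = ((model(x) - y_obs) ** 2).sum() / (2 * sigma_y ** 2)
        input_term = ((x - x_obs) ** 2).sum() / (2 * sigma_x ** 2)
        (data_term + input_term).backward()
        opt.step()
    return x.detach()

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
x_noisy = torch.linspace(0, 1, 50).unsqueeze(1) + 0.05 * torch.randn(50, 1)
x_hat = map_with_noisy_inputs(net, x_noisy, torch.sin(6 * x_noisy))
print(x_hat.shape)
```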

BOIS: Bayesian Optimization of Interconnected Systems

  • paper_url: http://arxiv.org/abs/2311.11254
  • repo_url: None
  • paper_authors: Leonardo D. González, Victor M. Zavala
  • for: An effective method for the global optimization of expensive-to-sample systems.
  • methods: Uses Bayesian optimization (BO) with Gaussian processes (GPs) to characterize model uncertainty and guide the learning and search process.
  • results: Introduces BOIS, a new BO method that makes composite functions practical by using adaptive linearizations to obtain closed-form expressions for the composite function's statistical moments. Evaluated on a chemical process optimization case study against standard BO and sampling approaches, BOIS achieves performance gains and accurately captures the statistics of composite functions.
    Abstract Bayesian optimization (BO) has proven to be an effective paradigm for the global optimization of expensive-to-sample systems. One of the main advantages of BO is its use of Gaussian processes (GPs) to characterize model uncertainty which can be leveraged to guide the learning and search process. However, BO typically treats systems as black-boxes and this limits the ability to exploit structural knowledge (e.g., physics and sparse interconnections). Composite functions of the form $f(x, y(x))$, wherein GP modeling is shifted from the performance function $f$ to an intermediate function $y$, offer an avenue for exploiting structural knowledge. However, the use of composite functions in a BO framework is complicated by the need to generate a probability density for $f$ from the Gaussian density of $y$ calculated by the GP (e.g., when $f$ is nonlinear it is not possible to obtain a closed-form expression). Previous work has handled this issue using sampling techniques; these are easy to implement and flexible but are computationally intensive. In this work, we introduce a new paradigm which allows for the efficient use of composite functions in BO; this uses adaptive linearizations of $f$ to obtain closed-form expressions for the statistical moments of the composite function. We show that this simple approach (which we call BOIS) enables the exploitation of structural knowledge, such as that arising in interconnected systems as well as systems that embed multiple GP models and combinations of physics and GP models. Using a chemical process optimization case study, we benchmark the effectiveness of BOIS against standard BO and sampling approaches. Our results indicate that BOIS achieves performance gains and accurately captures the statistics of composite functions.
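The linearization at the heart of BOIS is essentially a delta-method computation: linearize f around the GP posterior mean of y to obtain closed-form moments of f(x, y(x)). A minimal NumPy sketch with a finite-difference gradient follows; the paper's adaptive choice of linearization point is not reproduced.

```python
import numpy as np

def composite_moments(f, x, y_mean, y_cov, eps=1e-6):
    """First-order moments of f(x, y) with y ~ N(y_mean, y_cov).

    Linearizing f around y_mean gives
        E[f]   ~ f(x, y_mean)
        Var[f] ~ g^T y_cov g,   g = df/dy at y_mean.
    """
    base = f(x, y_mean)
    g = np.array([(f(x, y_mean + eps * np.eye(len(y_mean))[i]) - base) / eps
                  for i in range(len(y_mean))])
    return base, g @ y_cov @ g

# Example: f(x, y) = sin(y0) + x * y1, with a GP belief over y at one x.
f = lambda x, y: np.sin(y[0]) + x * y[1]
mean, var = composite_moments(f, 0.5, np.array([0.3, 1.2]),
                              np.diag([0.04, 0.01]))
print(mean, var)
```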

A Universal Framework for Accurate and Efficient Geometric Deep Learning of Molecular Systems

  • paper_url: http://arxiv.org/abs/2311.11228
  • repo_url: https://github.com/XieResearchGroup/Physics-aware-Multiplex-GNN
  • paper_authors: Shuo Zhang, Yang Liu, Lei Xie
  • for: Accurate and efficient representation learning for three-dimensional molecules of varying sizes and types in any molecular system.
  • methods: Inspired by molecular mechanics, induces a physics-informed bias that explicitly models local and non-local interactions and their combined effects.
  • results: Outperforms state-of-the-art baselines in both accuracy and efficiency across three diverse tasks (small molecule properties, RNA 3D structures, and protein-ligand binding affinities) while remaining time- and memory-efficient.
    Abstract Molecular sciences address a wide range of problems involving molecules of different types and sizes and their complexes. Recently, geometric deep learning, especially Graph Neural Networks, has shown promising performance in molecular science applications. However, most existing works often impose targeted inductive biases to a specific molecular system, and are inefficient when applied to macromolecules or large-scale tasks, thereby limiting their applications to many real-world problems. To address these challenges, we present PAMNet, a universal framework for accurately and efficiently learning the representations of three-dimensional (3D) molecules of varying sizes and types in any molecular system. Inspired by molecular mechanics, PAMNet induces a physics-informed bias to explicitly model local and non-local interactions and their combined effects. As a result, PAMNet can reduce expensive operations, making it time and memory efficient. In extensive benchmark studies, PAMNet outperforms state-of-the-art baselines regarding both accuracy and efficiency in three diverse learning tasks: small molecule properties, RNA 3D structures, and protein-ligand binding affinities. Our results highlight the potential for PAMNet in a broad range of molecular science applications.

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

  • paper_url: http://arxiv.org/abs/2311.11225
  • repo_url: https://github.com/ai-secure/textguard
  • paper_authors: Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song
  • for: A provable defense against backdoor attacks on text classification models.
  • methods: Divides the (potentially backdoored) training data into sub-training sets by splitting each training sentence into sub-sentences, trains a base classifier on each sub-training set, and takes their ensemble as the final prediction.
  • results: Achieves certified accuracy on three benchmark text classification tasks that surpasses existing certified defenses; additional strategies further improve TextGuard's empirical performance.
    Abstract Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain empirical defense efficacy, none of these techniques could provide a formal and provable security guarantee against arbitrary attacks. As a result, they can be easily broken by strong adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. In particular, TextGuard first divides the (backdoored) training data into sub-training sets, achieved by splitting each training sentence into sub-sentences. This partitioning ensures that a majority of the sub-training sets do not contain the backdoor trigger. Subsequently, a base classifier is trained from each sub-training set, and their ensemble provides the final prediction. We theoretically prove that when the length of the backdoor trigger falls within a certain threshold, TextGuard guarantees that its prediction will remain unaffected by the presence of the triggers in training and testing inputs. In our evaluation, we demonstrate the effectiveness of TextGuard on three benchmark text classification tasks, surpassing the certification accuracy of existing certified defenses against backdoor attacks. Furthermore, we propose additional strategies to enhance the empirical performance of TextGuard. Comparisons with state-of-the-art empirical defenses validate the superiority of TextGuard in countering multiple backdoor attacks. Our code and data are available at https://github.com/AI-secure/TextGuard.
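The partition-and-ensemble construction is easy to sketch: deterministically split each sentence's words into g groups, train one base classifier per group, and predict by majority vote; a trigger of length t can then corrupt at most t of the g votes. The round-robin grouping below is a simplified stand-in for the paper's exact splitting scheme.

```python
from collections import Counter

def split_into_groups(sentence: str, g: int = 3):
    """Deterministically partition a sentence's words into g sub-sentences.
    A short trigger touches few groups, which is what the majority
    vote's certificate relies on."""
    groups = [[] for _ in range(g)]
    for i, word in enumerate(sentence.split()):
        groups[i % g].append(word)
    return [" ".join(ws) for ws in groups]

def ensemble_predict(classifiers, sentence):
    """classifiers[i] was trained only on group-i sub-sentences."""
    subs = split_into_groups(sentence, len(classifiers))
    votes = Counter(clf(sub) for clf, sub in zip(classifiers, subs))
    return votes.most_common(1)[0][0]

# Toy usage with trivial stand-in classifiers:
clfs = [lambda s: "positive" if "good" in s else "negative"] * 3
print(ensemble_predict(clfs, "the movie was good but too long overall"))
```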

Robust Network Slicing: Multi-Agent Policies, Adversarial Attacks, and Defensive Strategies

  • paper_url: http://arxiv.org/abs/2311.11206
  • repo_url: None
  • paper_authors: Feng Wang, M. Cenk Gursoy, Senem Velipasalar
  • for: Proposes a multi-agent deep reinforcement learning (deep RL) approach for network slicing in a dynamic environment with multiple base stations and multiple users.
  • methods: A novel deep RL framework with multiple actors and a centralized critic (MACC), in which the actors are implemented as pointer networks to fit the varying dimension of the input.
  • results: Simulations show the proposed deep RL algorithm is effective. A deep-RL-based jammer with limited prior information and a limited power budget can significantly reduce the slicing agents' transmission rates and thus degrade their performance. Finally, a Nash-equilibrium-supervised policy ensemble mixed strategy profile is devised for both network slicing (as a defensive measure) and jamming, and simulations applying it to the slicing and jammer agents demonstrate its effectiveness.
    Abstract In this paper, we present a multi-agent deep reinforcement learning (deep RL) framework for network slicing in a dynamic environment with multiple base stations and multiple users. In particular, we propose a novel deep RL framework with multiple actors and centralized critic (MACC) in which actors are implemented as pointer networks to fit the varying dimension of input. We evaluate the performance of the proposed deep RL algorithm via simulations to demonstrate its effectiveness. Subsequently, we develop a deep RL based jammer with limited prior information and limited power budget. The goal of the jammer is to minimize the transmission rates achieved with network slicing and thus degrade the network slicing agents' performance. We design a jammer with both listening and jamming phases and address jamming location optimization as well as jamming channel optimization via deep RL. We evaluate the jammer at the optimized location, generating interference attacks in the optimized set of channels by switching between the jamming phase and listening phase. We show that the proposed jammer can significantly reduce the victims' performance without direct feedback or prior knowledge on the network slicing policies. Finally, we devise a Nash-equilibrium-supervised policy ensemble mixed strategy profile for network slicing (as a defensive measure) and jamming. We evaluate the performance of the proposed policy ensemble algorithm by applying on the network slicing agents and the jammer agent in simulations to show its effectiveness.

Scale-free networks: improved inference

  • paper_url: http://arxiv.org/abs/2311.11200
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Nixon Jerez-Lillo, Francisco A. Rodrigues, Pedro L. Ramos
  • for: Investigates whether a network's degree distribution follows a power-law distribution, and proposes improved Bayesian inference to obtain accurate parameter estimates and precise credibility intervals.
  • methods: Bayesian inferential methods are derived for both continuous and discrete distributions, yielding nearly unbiased estimates of the model parameters; in the continuous case, an explicit posterior distribution is identified.
  • results: Applied to degree distributions of more than 5,000 synthetic networks and over 3,000 real networks, the method proves more suitable in practice, with an acceptance frequency close to the specified nominal level.
    Abstract The power-law distribution plays a crucial role in complex networks as well as various applied sciences. Investigating whether the degree distribution of a network follows a power-law distribution is an important concern. The commonly used inferential methods for estimating the model parameters often yield biased estimates, which can lead to the rejection of the hypothesis that a model conforms to a power-law. In this paper, we discuss improved methods that utilize Bayesian inference to obtain accurate estimates and precise credibility intervals. The inferential methods are derived for both continuous and discrete distributions. These methods reveal that objective Bayesian approaches return nearly unbiased estimates for the parameters of both models. Notably, in the continuous case, we identify an explicit posterior distribution. This work enhances the power of goodness-of-fit tests, enabling us to accurately discern whether a network or any other dataset adheres to a power-law distribution. We apply the proposed approach to fit degree distributions for more than 5,000 synthetic networks and over 3,000 real networks. The results indicate that our method is more suitable in practice, as it yields a frequency of acceptance close to the specified nominal level.
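For the continuous case, an explicit posterior is indeed available. Writing the power law as p(x) proportional to x^(-alpha) on [x_min, inf) and L = alpha - 1, the likelihood is L^n exp(-L*S) with S = sum(log(x_i/x_min)), so the objective prior 1/L gives L | x ~ Gamma(n, rate=S). The sketch below uses this standard result; the paper's particular choice of objective prior may differ.

```python
import numpy as np
from scipy import stats

def powerlaw_posterior(x, x_min):
    """Posterior over L = alpha - 1 for a continuous power law on [x_min, inf).

    Likelihood L^n exp(-L*S) with S = sum(log(x_i/x_min)); prior 1/L
    gives the conjugate posterior Gamma(a=n, rate=S).
    """
    x = np.asarray(x)
    x = x[x >= x_min]
    S = np.log(x / x_min).sum()
    return stats.gamma(a=len(x), scale=1.0 / S)

# Simulate alpha = 2.5 data and recover a 95% credibility interval.
rng = np.random.default_rng(1)
x = rng.pareto(1.5, size=5000) + 1.0   # classical Pareto, alpha = 2.5
post = powerlaw_posterior(x, x_min=1.0)
lo, hi = post.ppf([0.025, 0.975]) + 1.0   # shift back from L to alpha
print(f"alpha in [{lo:.3f}, {hi:.3f}]")
```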

Testing with Non-identically Distributed Samples

  • paper_url: http://arxiv.org/abs/2311.11194
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant
  • for: This paper studies property testing and estimation when samples are independent but not identically distributed.
  • methods: A distribution testing framework in which $c$ independent samples are drawn from each of $T$ distributions $\textbf{p}_1, \ldots, \textbf{p}_T$ over a discrete support of size $k$, and the goal is to learn or test a property of the average distribution $\textbf{p}_{\mathrm{avg}}$.
  • results: For $c=1$, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within $\varepsilon$ in TV distance, while testing uniformity or identity requires a number of samples linear in $k$. For $c \ge 2$, the sublinear sample complexity of the i.i.d. setting is recovered: $O(\sqrt{k}/\varepsilon^2 + 1/\varepsilon^4)$ samples suffice, matching the optimal i.i.d. complexity when $\varepsilon \ge k^{-1/4}$. Moreover, for $c=2$ there is a constant $\rho > 0$ such that even with $\rho k$ samples, no tester that only sees the multiset of samples (ignoring which samples came from the same $\textbf{p}_i$) can perform uniformity testing.
    Abstract We examine the extent to which sublinear-sample property testing and estimation applies to settings where samples are independently but not identically distributed. Specifically, we consider the following distributional property testing framework: Suppose there is a set of distributions over a discrete support of size $k$, $\textbf{p}_1, \textbf{p}_2,\ldots,\textbf{p}_T$, and we obtain $c$ independent draws from each distribution. Suppose the goal is to learn or test a property of the average distribution, $\textbf{p}_{\mathrm{avg}}$. This setup models a number of important practical settings where the individual distributions correspond to heterogeneous entities -- either individuals, chronologically distinct time periods, spatially separated data sources, etc. From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance. To test uniformity or identity -- distinguishing the case that $\textbf{p}_{\mathrm{avg}}$ is equal to some reference distribution, versus has $\ell_1$ distance at least $\varepsilon$ from the reference distribution, we show that a linear number of samples in $k$ is necessary given $c=1$ samples from each distribution. In contrast, for $c \ge 2$, we recover the usual sublinear sample testing of the i.i.d. setting: we show that $O(\sqrt{k}/\varepsilon^2 + 1/\varepsilon^4)$ samples are sufficient, matching the optimal sample complexity in the i.i.d. case in the regime where $\varepsilon \ge k^{-1/4}$. Additionally, we show that in the $c=2$ case, there is a constant $\rho > 0$ such that even in the linear regime with $\rho k$ samples, no tester that considers the multiset of samples (ignoring which samples were drawn from the same $\textbf{p}_i$) can perform uniformity testing.

eess.IV - 2023-11-19

Classification of Radio Galaxies with trainable COSFIRE filters

  • paper_url: http://arxiv.org/abs/2311.11286
  • repo_url: None
  • paper_authors: Steven Ndungu, Trienko Grobler, Stefan J. Wijnholds, Dimka Karastoyanova, George Azzopardi
  • for: radio galaxy classification
  • methods: COSFIRE filters (explainable, learning-free, rotation-tolerant, efficient)
  • results: Achieved an average accuracy rate of 93.36%, outperforming contemporary deep learning models and setting the best result ever reported on this dataset, with better computational performance (roughly 20$\times$ fewer operations than the DenseNet-based competing method at the same accuracy).
    Abstract Radio galaxies exhibit a rich diversity of characteristics and emit radio emissions through a variety of radiation mechanisms, making their classification into distinct types based on morphology a complex challenge. To address this challenge effectively, we introduce an innovative approach for radio galaxy classification using COSFIRE filters. These filters possess the ability to adapt to both the shape and orientation of prototype patterns within images. The COSFIRE approach is explainable, learning-free, rotation-tolerant, efficient, and does not require a huge training set. To assess the efficacy of our method, we conducted experiments on a benchmark radio galaxy data set comprising of 1180 training samples and 404 test samples. Notably, our approach achieved an average accuracy rate of 93.36\%. This achievement outperforms contemporary deep learning models, and it is the best result ever achieved on this data set. Additionally, COSFIRE filters offer better computational performance, $\sim$20$\times$ fewer operations than the DenseNet-based competing method (when comparing at the same accuracy). Our findings underscore the effectiveness of the COSFIRE filter-based approach in addressing the complexities associated with radio galaxy classification. This research contributes to advancing the field by offering a robust solution that transcends the orientation challenges intrinsic to radio galaxy observations. Our method is versatile in that it is applicable to various image classification approaches.

Wireless Regional Imaging through Reconfigurable Intelligent Surfaces: Passive Mode

  • paper_url: http://arxiv.org/abs/2311.11222
  • repo_url: None
  • paper_authors: Fuhai Wang, Chun Wang, Rujing Xiong, Zhengyu Wang, Tiebin Mi, Robert Caiming Qiu
  • for: Proposes a multi-RIS-aided wireless imaging framework in 3D for the distributed placement of multi-sensor networks.
  • methods: The system creates randomized reflection patterns by adjusting the phase shifts of multiple reconfigurable intelligent surfaces (RISs), enabling the receiver to capture signals within a designated space of interest (SoI).
  • results: A multi-RIS-aided linear imaging channel model is proposed, together with a computational imaging framework that recovers the signal strength distribution of the SoI, and the impact of multiple parameters on imaging performance is analyzed. Furthermore, an amplitude-only imaging algorithm is proposed to mitigate the problem of phase unpredictability. Finally, the imaging algorithm's performance is verified by proof-of-concept experiments under reasonable parameter settings.
    Abstract In this paper, we propose a multi-RIS-aided wireless imaging framework in 3D facing the distributed placement of multi-sensor networks. The system creates a randomized reflection pattern by adjusting the RIS phase shift, enabling the receiver to capture signals within the designated space of interest (SoI). Firstly, a multi-RIS-aided linear imaging channel modeling is proposed. We introduce a theoretical framework of computational imaging to recover the signal strength distribution of the SOI. For the RIS-aided imaging system, the impact of multiple parameters on the performance of the imaging system is analyzed. The simulation results verify the correctness of the proposal. Furthermore, we propose an amplitude-only imaging algorithm for the RIS-aided imaging system to mitigate the problem of phase unpredictability. Finally, the performance verification of the imaging algorithm is carried out by proof of concept experiments under reasonable parameter settings.

eess.SP - 2023-11-19

Rethinking Integrated Sensing and Communication: When Near Field Meets Wideband

  • paper_url: http://arxiv.org/abs/2311.11416
  • repo_url: None
  • paper_authors: Zhaolin Wang, Xidong Mu, Yuanwei Liu
  • for: This work re-examines integrated sensing and communication (ISAC) systems operating in the near-field region of a large antenna array while exploiting a large bandwidth.
  • methods: We first reveal the fundamental characteristics of wideband sensing and communication (S&C) channels and highlight two fundamental changes that occur in the transition from the far-field to the near-field region: strong angular-delay correlation and element-specific Doppler frequencies.
  • results: The near-field region can enable wideband-like S&C functionalities, such as signal multiplexing and range sensing, thanks to the strong angular-delay correlation, allowing large antenna arrays to be traded for large bandwidths; it also enables functionalities unattainable with bandwidth alone in high-mobility S&C scenarios by leveraging the element-specific Doppler frequencies.
    Abstract This article re-examines integrated sensing and communication (ISAC) systems operating in the near-field region of a large antenna array while exploiting a large bandwidth. We first reveal the fundamental characteristics of wideband sensing and communication (S&C) channels and highlight the key changes that occur during the transition from the far-field to the near-field region. Specifically, there are two fundamental changes in the near-field region: strong angular-delay correlation and element-specific Doppler frequencies. It is highlighted that the near-field effect can enable the wideband-like S&C functionalities in terms of signal multiplexing and range sensing due to the strong angular-delay correlation, thus allowing the trading of large antenna arrays for large bandwidths. Furthermore, it also introduces the wideband-unattainable functionalities in high mobility S&C scenarios by leveraging the element-specific Doppler frequencies. We then delineate certain paradigm shifts in thinking required to advance toward near-field wideband ISAC systems, with a particular emphasis on resource allocation, antenna array arrangement, and transceiver architecture. Lastly, some other promising directions are discussed.

Vital Signs Estimation Using a 26 GHz Multi-Beam Communication Testbed

  • paper_url: http://arxiv.org/abs/2311.11275
  • repo_url: None
  • paper_authors: Miquel Sellés Valls, Sofie Pollin, Ying Wang, Rizqi Hersyandika, Andre Kokkeler, Yang Miao
  • for: Develops a vital-sign monitoring pipeline based on a 26 GHz multi-beam communication testbed, enabling contact-free monitoring of human vital signs.
  • methods: The system leverages spatially orthogonal multi-beam communication to estimate breath rate and heart rate from raw Channel State Information samples. In the single-person monostatic scenario, phase time-frequency calibration and the Discrete Wavelet Transform improve performance over conventional FFT-based methods; in the multi-person bistatic scenario, K-means clustering extracts the vital signs of multiple persons, exploiting the distinct frequency-domain signal features of single- versus multi-person cases.
  • results: The estimated breath rate and heart rate are within 2 beats per minute (bpm) of an on-body reference sensor for the single-person monostatic scenario with body-transceiver distance up to 2 m, and for the two-person bistatic scenario with BS-UE distance up to 4 m, demonstrating accurate contact-free vital-sign detection across scenarios.
    Abstract This paper presents a novel pipeline for vital sign monitoring using a 26 GHz multi-beam communication testbed. In context of Joint Communication and Sensing (JCAS), the advanced communication capability at millimeter-wave bands is comparable to the radio resource of radars and is promising to sense the surrounding environment. Being able to communicate and sense the vital sign of humans present in the environment will enable new vertical services of telecommunication, i.e., remote health monitoring. The proposed processing pipeline leverages spatially orthogonal beams to estimate the vital sign - breath rate and heart rate - of single and multiple persons in static scenarios from the raw Channel State Information samples. We consider both monostatic and bistatic sensing scenarios. For monostatic scenario, we employ the phase time-frequency calibration and Discrete Wavelet Transform to improve the performance compared to the conventional Fast Fourier Transform based methods. For bistatic scenario, we use K-means clustering algorithm to extract multi-person vital signs due to the distinct frequency-domain signal feature between single and multi-person scenarios. The results show that the estimated breath rate and heart rate reach below 2 beats per minute (bpm) error compared to the reference captured by on-body sensor for the single-person monostatic sensing scenario with body-transceiver distance up to 2 m, and the two-person bistatic sensing scenario with BS-UE distance up to 4 m. The presented work does not optimize the OFDM waveform parameters for sensing; it demonstrates a promising JCAS proof-of-concept in contact-free vital sign monitoring using mmWave multi-beam communication systems.
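The basic rate-estimation step, locating the dominant low-frequency component of a CSI phase trace, can be sketched with plain FFT peak-picking. The paper's phase calibration, wavelet processing, and clustering are not reproduced here; the sampling rate and frequency band below are illustrative.

```python
import numpy as np

def estimate_rate_bpm(csi_phase, fs, band=(0.1, 0.5)):
    """Estimate a periodic vital rate (e.g. breathing) from a CSI phase
    trace sampled at fs Hz, by locating the spectral peak inside `band`
    (0.1-0.5 Hz covers roughly 6-30 breaths per minute)."""
    x = np.unwrap(np.asarray(csi_phase))
    x = x - x.mean()
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spec = np.abs(np.fft.rfft(x))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return 60.0 * freqs[mask][np.argmax(spec[mask])]

# Synthetic check: a 0.25 Hz (15 bpm) breathing-like phase oscillation.
fs = 20.0
t = np.arange(0, 60, 1 / fs)
phase = 0.2 * np.sin(2 * np.pi * 0.25 * t) + 0.01 * np.random.randn(t.size)
print(estimate_rate_bpm(phase, fs))   # approximately 15 bpm
```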

  • paper_url: http://arxiv.org/abs/2311.11187
  • repo_url: None
  • paper_authors: Esteban Bautista, Matthieu Latapy
  • for: This paper develops the formalism of link streams, specifically by showing that link streams also generalize time series.
  • methods: Building on the extension of graph theory to link streams, the paper extends time series to a relational dimension.
  • results: The paper shows that a link stream corresponds to a time series extended to a relational dimension, and develops extensions of numerous signal processing concepts to link streams, from elementary ones like energy, correlation, and differentiation to more advanced ones like the Fourier transform and filters.
    Abstract A link stream is a set of possibly weighted triplets (t, u, v) modeling that u and v interacted at time t. Link streams offer an effective model for datasets containing both temporal and relational information, making their proper analysis crucial in many applications. They are commonly regarded as sequences of graphs or collections of time series. Yet, a recent seminal work demonstrated that link streams are more general objects of which graphs are only particular cases. It therefore started the construction of a dedicated formalism for link streams by extending graph theory. In this work, we contribute to the development of this formalism by showing that link streams also generalize time series. In particular, we show that a link stream corresponds to a time-series extended to a relational dimension, which opens the door to also extend the framework of signal processing to link streams. We therefore develop extensions of numerous signal concepts to link streams: from elementary ones like energy, correlation, and differentiation, to more advanced ones like Fourier transform and filters.
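Viewing a link stream as a time series extended to a relational dimension (one weighted series per node pair) makes the elementary extensions immediate. A minimal sketch follows; the normalizations are chosen for brevity rather than taken from the paper's formal definitions.

```python
import numpy as np

def to_signal(links, pairs, T):
    """Represent a link stream {(t, u, v, w)} as a (|pairs| x T) array:
    one weighted time series per node pair."""
    idx = {p: i for i, p in enumerate(pairs)}
    X = np.zeros((len(pairs), T))
    for t, u, v, w in links:
        X[idx[(u, v)], t] += w
    return X

links = [(0, "a", "b", 1.0), (2, "a", "b", 2.0), (1, "b", "c", 1.0)]
X = to_signal(links, [("a", "b"), ("b", "c")], T=4)

energy = (X ** 2).sum()                     # energy over pairs and time
diff = np.diff(X, axis=1)                   # per-pair differentiation
spectrum = np.abs(np.fft.rfft(X, axis=1))   # per-pair Fourier transform
corr = X @ X.T                              # correlation between pairs
print(energy, spectrum.shape)
```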

cs.CV - 2023-11-18

Diverse Shape Completion via Style Modulated Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2311.11184
  • repo_url: None
  • paper_authors: Wesley Khademi, Li Fuxin
  • For: This paper proposes a novel conditional generative adversarial network for completing the full 3D shape of a partially observed object.
  • Methods: The network introduces stochasticity via style modulation to produce multiple diverse completions of the same partial input. Style codes extracted from complete shapes during training, together with a learned distribution over them, explicitly carry shape-category information for better completions; multi-scale diversity penalties and discriminators prevent conditional mode collapse and remove the need for multiple ground-truth completions per partial input.
  • Results: Evaluations on several synthetic and real datasets show the method respects the partial observations while achieving greater diversity in completions.
    Abstract Shape completion aims to recover the full 3D geometry of an object from a partial observation. This problem is inherently multi-modal since there can be many ways to plausibly complete the missing regions of a shape. Such diversity would be indicative of the underlying uncertainty of the shape and could be preferable for downstream tasks such as planning. In this paper, we propose a novel conditional generative adversarial network that can produce many diverse plausible completions of a partially observed point cloud. To enable our network to produce multiple completions for the same partial input, we introduce stochasticity into our network via style modulation. By extracting style codes from complete shapes during training, and learning a distribution over them, our style codes can explicitly carry shape category information leading to better completions. We further introduce diversity penalties and discriminators at multiple scales to prevent conditional mode collapse and to train without the need for multiple ground truth completions for each partial input. Evaluations across several synthetic and real datasets demonstrate that our method achieves significant improvements in respecting the partial observations while obtaining greater diversity in completions.

Active Prompt Learning in Vision Language Models

  • paper_url: http://arxiv.org/abs/2311.11178
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee
  • for: This study adapts pre-trained vision-language models (VLMs) within an active learning framework.
  • methods: We propose PCB, a novel active learning framework for pre-trained VLMs that uses the knowledge in the VLM to counter the class imbalance among labeling candidates before labeling.
  • results: Experiments on seven real-world datasets show that PCB outperforms conventional active learning and random sampling methods.
    Abstract Pre-trained Vision Language Models (VLMs) have demonstrated notable progress in various zero-shot tasks, such as classification and retrieval. Despite their performance, because improving performance on new tasks requires task-specific knowledge, their adaptation is essential. While labels are needed for the adaptation, acquiring them is typically expensive. To overcome this challenge, active learning, a method of achieving a high performance by obtaining labels for a small number of samples from experts, has been studied. Active learning primarily focuses on selecting unlabeled samples for labeling and leveraging them to train models. In this study, we pose the question, "how can the pre-trained VLMs be adapted under the active learning framework?" In response to this inquiry, we observe that (1) simply applying a conventional active learning framework to pre-trained VLMs even may degrade performance compared to random selection because of the class imbalance in labeling candidates, and (2) the knowledge of VLMs can provide hints for achieving the balance before labeling. Based on these observations, we devise a novel active learning framework for VLMs, denoted as PCB. To assess the effectiveness of our approach, we conduct experiments on seven different real-world datasets, and the results demonstrate that PCB surpasses conventional active learning and random sampling methods.
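The class-imbalance issue PCB targets can be illustrated with a simple balance-aware acquisition rule: use the VLM's zero-shot predictions as pseudo-labels and prefer uncertain samples from classes under-represented in the labeled pool. This is a generic sketch of the idea, not the paper's exact acquisition function.

```python
import numpy as np

def balanced_uncertain_selection(probs, labeled_counts, budget):
    """probs:          (n, C) zero-shot class probabilities from the VLM
       labeled_counts: (C,)   how many labels each class already has
       Returns indices of `budget` samples to send to the annotator."""
    pseudo = probs.argmax(axis=1)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    # Prioritize classes with few labels so far; break ties by uncertainty.
    need = 1.0 / (1.0 + labeled_counts)
    score = entropy * need[pseudo]
    return np.argsort(-score)[:budget]

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(5), size=100)
picked = balanced_uncertain_selection(p, np.array([10, 10, 0, 2, 8]), budget=8)
print(picked)
```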

LOSTU: Fast, Scalable, and Uncertainty-Aware Triangulation

  • paper_url: http://arxiv.org/abs/2311.11171
  • repo_url: None
  • paper_authors: Sébastien Henry, John A. Christian
  • for: Provides a fast, scalable, and statistically optimal triangulation method (LOSTU) for point triangulation in structure-from-motion (SfM) pipelines.
  • methods: The method leverages recent results and, unlike conventional $L_2$ triangulation, accounts for 3D point uncertainty arising from errors in camera parameters and poses.
  • results: LOSTU consistently produces lower 3D reconstruction errors than conventional $L_2$ triangulation, often successfully triangulating more points, and can be substantially faster than Levenberg-Marquardt (or similar) optimization schemes.
    Abstract Triangulation algorithms often aim to minimize the reprojection ($L_2$) error, but this only provides the maximum likelihood estimate when there are no errors in the camera parameters or camera poses. Although recent advancements have yielded techniques to estimate camera parameters accounting for 3D point uncertainties, most structure from motion (SfM) pipelines still use older triangulation algorithms. This work leverages recent discoveries to provide a fast, scalable, and statistically optimal way to triangulate called LOSTU. Results show that LOSTU consistently produces lower 3D reconstruction errors than conventional $L_2$ triangulation methods -- often allowing LOSTU to successfully triangulate more points. Moreover, in addition to providing a better 3D reconstruction, LOSTU can be substantially faster than Levenberg-Marquardt (or similar) optimization schemes.
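For context, the conventional baseline that uncertainty-aware triangulation is measured against can be written in a few lines: linear (DLT) two-view triangulation, which minimizes an algebraic rather than statistically weighted error. This is the standard textbook method, not LOSTU itself.

```python
import numpy as np

def triangulate_dlt(P1, P2, u1, u2):
    """Linear triangulation of one point from two views.

    P1, P2 : (3, 4) camera projection matrices
    u1, u2 : (2,)   pixel observations in each view
    Returns the 3D point minimizing the algebraic error.
    """
    A = np.vstack([
        u1[0] * P1[2] - P1[0],
        u1[1] * P1[2] - P1[1],
        u2[0] * P2[2] - P2[0],
        u2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two toy cameras and a known point; check that we recover it.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 5.0, 1.0])
u1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
u2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(triangulate_dlt(P1, P2, u1, u2))   # approximately [0.2, -0.1, 5.0]
```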

Benchmarking Feature Extractors for Reinforcement Learning-Based Semiconductor Defect Localization

  • paper_url: http://arxiv.org/abs/2311.11145
  • repo_url: None
  • paper_authors: Enrique Dehaerne, Bappaditya Dey, Sandip Halder, Stefan De Gendt
  • for: Defect inspection and localization in semiconductor manufacturing from SEM images.
  • methods: A deep reinforcement learning (RL) approach to defect localization that iteratively extracts features from increasingly smaller regions of the input image; 18 agents trained with different feature extractors are evaluated.
  • results: The localization performance of the 18 agents is compared across feature extractors, and the advantages and disadvantages of the different extractors and of the RL-based framework for semiconductor defect localization are discussed.
    Abstract As semiconductor patterning dimensions shrink, more advanced Scanning Electron Microscopy (SEM) image-based defect inspection techniques are needed. Recently, many Machine Learning (ML)-based approaches have been proposed for defect localization and have shown impressive results. These methods often rely on feature extraction from a full SEM image and possibly a number of regions of interest. In this study, we propose a deep Reinforcement Learning (RL)-based approach to defect localization which iteratively extracts features from increasingly smaller regions of the input image. We compare the results of 18 agents trained with different feature extractors. We discuss the advantages and disadvantages of different feature extractors as well as the RL-based framework in general for semiconductor defect localization.

Estimating Uncertainty in Landslide Segmentation Models

  • paper_url: http://arxiv.org/abs/2311.11138
  • repo_url: None
  • paper_authors: Savinay Nagendra, Chaopeng Shen, Daniel Kifer
  • for: This paper works toward high-quality, large-scale datasets covering global landslide-prone areas, which would aid landslide preparation and mitigation efforts.
  • methods: Deep learning models are used for landslide segmentation (pixel labeling) from satellite imagery, and several methods for estimating pixel-level uncertainty that require no architectural changes are evaluated: pre-threshold activations, Monte-Carlo Dropout, and Test-Time Augmentation.
  • results: Experiments show that Test-Time Augmentation consistently provides the highest-quality uncertainty estimates across a variety of models and metrics.
    Abstract Landslides are a recurring, widespread hazard. Preparation and mitigation efforts can be aided by a high-quality, large-scale dataset that covers global at-risk areas. Such a dataset currently does not exist and is impossible to construct manually. Recent automated efforts focus on deep learning models for landslide segmentation (pixel labeling) from satellite imagery. However, it is also important to characterize the uncertainty or confidence levels of such segmentations. Accurate and robust uncertainty estimates can enable low-cost (in terms of manual labor) oversight of auto-generated landslide databases to resolve errors, identify hard negative examples, and increase the size of labeled training data. In this paper, we evaluate several methods for assessing pixel-level uncertainty of the segmentation. Three methods that do not require architectural changes were compared, including Pre-Threshold activations, Monte-Carlo Dropout and Test-Time Augmentation -- a method that measures the robustness of predictions in the face of data augmentation. Experimentally, the quality of the latter method was consistently higher than the others across a variety of models and metrics in our dataset.
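Test-Time Augmentation, the best-performing estimator in the study, is straightforward to sketch: run the model on several augmented views, undo each augmentation, and use the per-pixel variance of the de-augmented predictions as the uncertainty map. The flip-based augmentation set and toy model below are illustrative choices.

```python
import torch

def tta_uncertainty(model, image):
    """Per-pixel mean prediction and variance under flip augmentations.

    image: (1, C, H, W) tensor; the model outputs per-pixel logits.
    The variance across de-augmented predictions is the uncertainty map.
    """
    views = [
        (lambda x: x,                lambda y: y),
        (lambda x: x.flip((-1,)),    lambda y: y.flip((-1,))),
        (lambda x: x.flip((-2,)),    lambda y: y.flip((-2,))),
        (lambda x: x.flip((-2, -1)), lambda y: y.flip((-2, -1))),
    ]
    preds = []
    with torch.no_grad():
        for fwd, inv in views:
            preds.append(inv(torch.sigmoid(model(fwd(image)))))
    preds = torch.stack(preds)               # (4, 1, 1, H, W)
    return preds.mean(0), preds.var(0)

model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)  # stand-in segmenter
mean_map, unc_map = tta_uncertainty(model, torch.randn(1, 3, 64, 64))
print(unc_map.shape)   # torch.Size([1, 1, 64, 64])
```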

Invariant-based Mapping of Space During General Motion of an Observer

  • paper_url: http://arxiv.org/abs/2311.11130
  • repo_url: None
  • paper_authors: Juan D. Yepes, Daniel Raviv
  • for: This paper explores visual motion-based invariants, yielding a new instantaneous domain in which the stationary environment is perceived as unchanged even though the 2D images change continuously under camera motion, obstacles can be detected and potentially avoided in specific subspaces, and moving objects can potentially be detected.
  • methods: The approach uses nonlinear functions derived from measurable optical flow that are linked to geometric 3D invariants.
  • results: Simulations of a camera translating and rotating relative to a 3D object show the object appears unchanged in the new domain over time. On real data from the KITTI dataset, the method segments space to identify free navigational regions and detect obstacles within a predetermined subspace, with preliminary results on identifying and segmenting moving objects and on visualizing shape constancy.
    Abstract This paper explores visual motion-based invariants, resulting in a new instantaneous domain where: a) the stationary environment is perceived as unchanged, even as the 2D images undergo continuous changes due to camera motion, b) obstacles can be detected and potentially avoided in specific subspaces, and c) moving objects can potentially be detected. To achieve this, we make use of nonlinear functions derived from measurable optical flow, which are linked to geometric 3D invariants. We present simulations involving a camera that translates and rotates relative to a 3D object, capturing snapshots of the camera projected images. We show that the object appears unchanged in the new domain over time. We process real data from the KITTI dataset and demonstrate how to segment space to identify free navigational regions and detect obstacles within a predetermined subspace. Additionally, we present preliminary results, based on the KITTI dataset, on the identification and segmentation of moving objects, as well as the visualization of shape constancy. This representation is straightforward, relying on functions for the simple de-rotation of optical flow. This representation only requires a single camera, it is pixel-based, making it suitable for parallel processing, and it eliminates the necessity for 3D reconstruction techniques.

SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation

  • paper_url: http://arxiv.org/abs/2311.11125
  • repo_url: None
  • paper_authors: Yamei Chen, Yan Di, Guangyao Zhai, Fabian Manhardt, Chenyangguang Zhang, Ruida Zhang, Federico Tombari, Nassir Navab, Benjamin Busam
  • for: Category-level estimation of the 6D pose and 3D size of objects, particularly under large intra-class shape variation.
  • methods: SecondPose integrates object-specific geometric features with semantic category priors from DINOv2: two types of SE(3)-invariant geometric features are hierarchically extracted and point-aligned with DINOv2's SE(3)-consistent semantic features, establishing a consistent object representation under SE(3) transformations.
  • results: Experiments show SecondPose achieves a 12.4% improvement over the previous state of the art on NOCS-REAL275 and still surpasses competitors by a large margin on the more challenging, photometrically difficult HouseCat6D dataset.
    Abstract Category-level object pose estimation, aiming to predict the 6D pose and 3D size of objects from known categories, typically struggles with large intra-class shape variation. Existing works utilizing mean shapes often fall short of capturing this variation. To address this issue, we present SecondPose, a novel approach integrating object-specific geometric features with semantic category priors from DINOv2. Leveraging the advantage of DINOv2 in providing SE(3)-consistent semantic features, we hierarchically extract two types of SE(3)-invariant geometric features to further encapsulate local-to-global object-specific information. These geometric features are then point-aligned with DINOv2 features to establish a consistent object representation under SE(3) transformations, facilitating the mapping from camera space to the pre-defined canonical space, thus further enhancing pose estimation. Extensive experiments on NOCS-REAL275 demonstrate that SecondPose achieves a 12.4% leap forward over the state-of-the-art. Moreover, on a more complex dataset HouseCat6D which provides photometrically challenging objects, SecondPose still surpasses other competitors by a large margin. The code will be released soon.

ShapeMaker: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation

  • paper_url: http://arxiv.org/abs/2311.11106
  • repo_url: None
  • paper_authors: Yan Di, Chenyangguang Zhang, Chaowei Wang, Ruida Zhang, Guangyao Zhai, Yanyan Li, Bowen Fu, Xiangyang Ji, Shan Gao
  • for: This paper proposes ShapeMaker, a self-supervised learning framework that jointly performs four highly related processes: shape canonicalization, segmentation, retrieval, and deformation.
  • methods: Point-wise affine-invariant features are first extracted from a partially observed object and used to predict semantically consistent part segmentation and the corresponding part centers. A lightweight retrieval module then aggregates the features within each part into a retrieval token and compares the tokens against source shapes in a pre-established database to find the geometrically closest shape. Finally, a part-center-guided neural cage deformation module deforms the retrieved shape to tightly fit the input object.
  • results: Experiments on the synthetic datasets PartNet and ComplementMe and the real-world dataset Scan2CAD show that ShapeMaker outperforms competitors by a large margin.
    Abstract In this paper, we present ShapeMaker, a unified self-supervised learning framework for joint shape canonicalization, segmentation, retrieval and deformation. Given a partially-observed object in an arbitrary pose, we first canonicalize the object by extracting point-wise affine-invariant features, disentangling inherent structure of the object with its pose and size. These learned features are then leveraged to predict semantically consistent part segmentation and corresponding part centers. Next, our lightweight retrieval module aggregates the features within each part as its retrieval token and compare all the tokens with source shapes from a pre-established database to identify the most geometrically similar shape. Finally, we deform the retrieved shape in the deformation module to tightly fit the input object by harnessing part center guided neural cage deformation. The key insight of ShapeMaker is the simultaneous training of the four highly-associated processes: canonicalization, segmentation, retrieval, and deformation, leveraging cross-task consistency losses for mutual supervision. Extensive experiments on synthetic datasets PartNet, ComplementMe, and real-world dataset Scan2CAD demonstrate that ShapeMaker surpasses competitors by a large margin. Codes will be released soon.
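A minimal sketch of the retrieval step described above: part tokens are formed by averaging point features inside each predicted part, and the query is scored against a database of source shapes with cosine similarity. The mean-pooling aggregation and the mean-over-parts scoring are assumptions, not the paper's exact design:

```python
import torch
import torch.nn.functional as F

def part_retrieval_tokens(point_feats, part_labels, num_parts):
    """Average the point-wise features inside each predicted part to get one
    retrieval token per part. point_feats: (N, C), part_labels: (N,)."""
    tokens = torch.zeros(num_parts, point_feats.shape[1])
    for p in range(num_parts):
        mask = part_labels == p
        if mask.any():
            tokens[p] = point_feats[mask].mean(dim=0)
    return tokens  # (P, C)

def retrieve_nearest_shape(query_tokens, database_tokens):
    """Compare a query's part tokens against every database shape's tokens
    with cosine similarity; database_tokens: (S, P, C) for S source shapes."""
    q = F.normalize(query_tokens, dim=-1)            # (P, C)
    db = F.normalize(database_tokens, dim=-1)        # (S, P, C)
    sims = (db * q.unsqueeze(0)).sum(-1).mean(-1)    # (S,) mean over parts
    return int(sims.argmax())

db = torch.randn(100, 4, 64)   # 100 source shapes, 4 parts, 64-dim tokens
query = torch.randn(4, 64)
print(retrieve_nearest_shape(query, db))
```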

On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2311.11096
  • repo_url: None
  • paper_authors: Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert
  • for: This work investigates the robustness of foundation models to distribution shifts in medical image segmentation.
  • methods: Various pre-trained models, including foundation models and ViT/DeiT baselines, are fine-tuned on the same in-distribution dataset and their generalization to unseen domains is compared.
  • results: Foundation-based models show better robustness to domain shifts than other architectures, and a new Bayesian uncertainty estimation method for frozen models is developed as an indicator of performance on out-of-distribution data.
    Abstract Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. Foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach: they showcase impressive learning abilities across different tasks while requiring only a limited amount of annotated samples. While numerous techniques have focused on developing better fine-tuning strategies to adapt these models to specific domains, we instead examine their robustness to domain shifts in the medical image segmentation task. To this end, we compare the generalization performance on unseen domains of various pre-trained models after fine-tuning on the same in-distribution dataset and show that foundation-based models enjoy better robustness than other architectures. From here, we further develop a new Bayesian uncertainty estimation for frozen models and use it as an indicator to characterize the model's performance on out-of-distribution (OOD) data, which proves particularly beneficial for real-world applications. Our experiments not only reveal the limitations of current indicators like accuracy-on-the-line or agreement-on-the-line commonly used in natural image applications but also emphasize the promise of the introduced Bayesian uncertainty: lower uncertainty predictions usually correspond to higher out-of-distribution (OOD) performance.
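The paper's exact Bayesian estimator is not specified in the abstract. As a stand-in, the sketch below shows a common Monte-Carlo recipe for a frozen segmentation model: keep only the dropout layers stochastic, average the softmax maps over several passes, and use the entropy of the mean prediction as per-pixel uncertainty. Everything here is an illustrative assumption, not the paper's method:

```python
import torch

@torch.no_grad()
def predictive_uncertainty(model, x, n_samples=10):
    """Monte-Carlo uncertainty for a frozen segmentation model that returns
    logits of shape (B, C, H, W). Only dropout layers are left stochastic;
    weights and normalization statistics stay frozen."""
    model.eval()
    for m in model.modules():                      # re-enable dropout only
        if isinstance(m, torch.nn.Dropout):
            m.train()
    probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_samples)])
    mean_p = probs.mean(dim=0)                                 # (B, C, H, W)
    entropy = -(mean_p * (mean_p + 1e-8).log()).sum(dim=1)     # (B, H, W)
    return mean_p, entropy

# Lower mean entropy would then serve as the indicator of better OOD behaviour.
```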

LightBTSeg: A lightweight breast tumor segmentation model using ultrasound images via dual-path joint knowledge distillation

  • paper_url: http://arxiv.org/abs/2311.11086
  • repo_url: None
  • paper_authors: Hongjiang Guo, Shengwen Wang, Hao Dang, Kangle Xiao, Yaru Yang, Wenpei Liu, Tongtong Liu, Yiying Wan
  • for: This work aims to improve the accuracy of breast tumor segmentation in ultrasound images, supporting earlier diagnosis and treatment of breast cancer.
  • methods: A dual-path joint knowledge distillation framework, LightBTSeg, uses two category-specific teacher models to represent the fine-grained features of benign and malignant breast tumors and distills them into a lightweight student.
  • results: Experiments show that LightBTSeg segments breast tumors more accurately than its counterparts.
    Abstract The accurate segmentation of breast tumors is an important prerequisite for lesion detection, which has significant clinical value for breast tumor research. Mainstream deep learning-based methods have achieved a breakthrough. However, these high-performance segmentation methods are difficult to deploy in clinical scenarios since they typically involve high computational complexity, massive parameter counts, slow inference speed, and huge memory consumption. To tackle this problem, we propose LightBTSeg, a dual-path joint knowledge distillation framework, for lightweight breast tumor segmentation. Concretely, we design a double-teacher model to represent the fine-grained features of breast ultrasound according to the different semantic feature realignments of benign and malignant breast tumors. Specifically, we leverage the bottleneck architecture to reconstruct the original Attention U-Net into a lightweight student model named Simplified U-Net. Then, the prior knowledge of benign and malignant categories is utilized to design a teacher network with dual-path joint knowledge distillation, which distills the knowledge from the cumbersome benign and malignant teachers into the lightweight student model. Extensive experiments conducted on the breast ultrasound images (Dataset BUSI) and Breast Ultrasound Dataset B (Dataset B) datasets demonstrate that LightBTSeg outperforms various counterparts.
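One way to picture the dual-path distillation is to route each sample to the teacher matching its benign/malignant label and distill the chosen teacher's soft segmentation maps into the student. The routing rule, temperature, and loss form below are assumptions for illustration, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def dual_path_kd_loss(student_logits, benign_t_logits, malignant_t_logits,
                      is_malignant: torch.Tensor, T: float = 4.0):
    """Distill a lightweight student from two category-specific teachers.
    All logits are (B, C, H, W) segmentation maps; is_malignant is a (B,)
    0/1 tensor selecting which teacher supervises each sample."""
    teacher = torch.where(is_malignant.view(-1, 1, 1, 1).bool(),
                          malignant_t_logits, benign_t_logits)
    p_t = F.softmax(teacher / T, dim=1)               # soft teacher targets
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

# Toy usage with 2 classes on 64x64 maps.
s = torch.randn(2, 2, 64, 64)
print(dual_path_kd_loss(s, torch.randn_like(s), torch.randn_like(s),
                        torch.tensor([0, 1])))
```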

Enhancing Transformer-Based Segmentation for Breast Cancer Diagnosis using Auto-Augmentation and Search Optimisation Techniques

  • paper_url: http://arxiv.org/abs/2311.11065
  • repo_url: None
  • paper_authors: Leon Hamnett, Mary Adewunmi, Modinat Abayomi, Kayode Raheem, Fahad Ahmed
  • for: This paper aims to improve the accuracy and robustness of breast cancer cell segmentation in histology slides using automated image augmentation selection and search optimisation strategies.
  • methods: The proposed methodology combines RandAugment with a Tree-based Parzen Estimator to identify optimal values for the image augmentations and their associated parameters, leading to enhanced segmentation performance.
  • results: The resulting segmentation models are more resilient to variations in histology slides while maintaining high segmentation performance, with improved segmentation of the tumour class compared to previous research; the best result after applying the augmentations is a Dice score of 84.08 and an IoU score of 72.54 when segmenting the tumour class.
    Abstract Breast cancer remains a critical global health challenge, necessitating early and accurate detection for effective treatment. This paper introduces a methodology that combines automated image augmentation selection (RandAugment) with search optimisation strategies (Tree-based Parzen Estimator) to identify optimal values for the number of image augmentations and the magnitude of their associated augmentation parameters, leading to enhanced segmentation performance. We empirically validate our approach on breast cancer histology slides, focusing on the segmentation of cancer cells. A comparative analysis of state-of-the-art transformer-based segmentation models is conducted, including SegFormer, PoolFormer, and MaskFormer models, to establish a comprehensive baseline, before applying the augmentation methodology. Our results show that the proposed methodology leads to segmentation models that are more resilient to variations in histology slides whilst maintaining high levels of segmentation performance, and show improved segmentation of the tumour class when compared to previous research. Our best result after applying the augmentations is a Dice Score of 84.08 and an IoU score of 72.54 when segmenting the tumour class. The primary contribution of this paper is the development of a methodology that enhances segmentation performance while ensuring model robustness to data variances. This has significant implications for medical practitioners, enabling the development of more effective machine learning models for clinical applications to identify breast cancer cells from histology slides. Furthermore, the codebase accompanying this research will be released upon publication. This will facilitate further research and application development based on our methodology, thereby amplifying its impact.
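The search the abstract describes maps naturally onto Optuna, whose default sampler is the Tree-based Parzen Estimator. A hedged sketch, where train_and_validate is a hypothetical helper that trains a segmentation model with the given transform and returns its validation Dice score:

```python
import optuna
from torchvision import transforms

def objective(trial):
    # Let TPE search the number of augmentations and their magnitude.
    num_ops = trial.suggest_int("num_ops", 1, 4)
    magnitude = trial.suggest_int("magnitude", 1, 15)
    train_tf = transforms.Compose([
        transforms.RandAugment(num_ops=num_ops, magnitude=magnitude),
        transforms.ToTensor(),
    ])
    dice = train_and_validate(train_tf)  # hypothetical helper: trains a
    return dice                          # segmentation model, returns Dice

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=30)
print(study.best_params)
```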

HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment

  • paper_url: http://arxiv.org/abs/2311.11059
  • repo_url: None
  • paper_authors: Shreshth Saini, Avinab Saha, Alan C. Bovik
  • for: This paper presents a no-reference video quality assessment (VQA) model for High Dynamic Range (HDR) videos.
  • methods: A self-supervised contrastive fine-tuning strategy transfers quality-aware features from the SDR domain to the HDR domain using unlabeled HDR videos.
  • results: The transferred features achieve state-of-the-art performance on the LIVE-HDR VQA database.
    Abstract We introduce HIDRO-VQA, a no-reference (NR) video quality assessment model designed to provide precise quality evaluations of High Dynamic Range (HDR) videos. HDR videos exhibit a broader spectrum of luminance, detail, and color than Standard Dynamic Range (SDR) videos. As HDR content becomes increasingly popular, there is a growing demand for video quality assessment (VQA) algorithms that effectively address distortions unique to HDR content. To address this challenge, we propose a self-supervised contrastive fine-tuning approach to transfer quality-aware features from the SDR to the HDR domain, utilizing unlabeled HDR videos. Our findings demonstrate that self-supervised pre-trained neural networks on SDR content can be further fine-tuned in a self-supervised setting using limited unlabeled HDR videos to achieve state-of-the-art performance on the only publicly available VQA database for HDR content, the LIVE-HDR VQA database. Moreover, our algorithm can be extended to the Full Reference VQA setting, also achieving state-of-the-art performance. Our code is available publicly at https://github.com/avinabsaha/HIDRO-VQA.

Hyperbolic Space with Hierarchical Margin Boosts Fine-Grained Learning from Coarse Labels

  • paper_url: http://arxiv.org/abs/2311.11019
  • repo_url: None
  • paper_authors: Shu-Lin Xu, Yifan Sun, Faen Zhang, Anqi Xu, Xiu-Shen Wei, Yi Yang
  • for: This paper proposes a new method for learning fine-grained embeddings from coarse labels.
  • methods: Visual embeddings are mapped into a hyperbolic space, where a hierarchical cosine margin scheme enhances their discriminative ability.
  • results: Extensive experiments on five benchmark datasets show state-of-the-art results that surpass competing methods.
    Abstract Learning fine-grained embeddings from coarse labels is a challenging task due to limited label granularity supervision, i.e., lacking the detailed distinctions required for fine-grained tasks. The task becomes even more demanding when attempting few-shot fine-grained recognition, which holds practical significance in various applications. To address these challenges, we propose a novel method that embeds visual embeddings into a hyperbolic space and enhances their discriminative ability with a hierarchical cosine margins manner. Specifically, the hyperbolic space offers distinct advantages, including the ability to capture hierarchical relationships and increased expressive power, which favors modeling fine-grained objects. Based on the hyperbolic space, we further enforce relatively large/small similarity margins between coarse/fine classes, respectively, yielding the so-called hierarchical cosine margins manner. While enforcing similarity margins in the regular Euclidean space has become popular for deep embedding learning, applying it to the hyperbolic space is non-trivial and validating the benefit for coarse-to-fine generalization is valuable. Extensive experiments conducted on five benchmark datasets showcase the effectiveness of our proposed method, yielding state-of-the-art results surpassing competing methods.
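The hierarchical margin idea can be illustrated with a CosFace-style loss in ordinary Euclidean space (the paper's hyperbolic embedding is omitted here for brevity): a large margin separates coarse classes and a smaller one separates the finer groupings under them. The prototypes, margin values, and the use of discovered subclass assignments as fine pseudo-labels are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def cosine_margin_loss(feats, prototypes, labels, margin, scale=30.0):
    """CosFace-style loss: subtract `margin` from the target-class cosine so
    classes must be separated by at least that margin."""
    cos = F.linear(F.normalize(feats), F.normalize(prototypes))
    onehot = F.one_hot(labels, cos.shape[1]).float()
    return F.cross_entropy(scale * (cos - onehot * margin), labels)

# Hierarchical usage: a large margin between coarse classes, a small one
# between the fine pseudo-classes grouped under them.
feats = torch.randn(8, 64)
coarse_protos, fine_protos = torch.randn(5, 64), torch.randn(20, 64)
y_coarse, y_fine = torch.randint(0, 5, (8,)), torch.randint(0, 20, (8,))
loss = (cosine_margin_loss(feats, coarse_protos, y_coarse, margin=0.4)
        + cosine_margin_loss(feats, fine_protos, y_fine, margin=0.1))
print(loss)
```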

Improving Adversarial Transferability by Stable Diffusion

  • paper_url: http://arxiv.org/abs/2311.11017
  • repo_url: None
  • paper_authors: Jiayang Liu, Siyu Zhu, Siyuan Liang, Jie Zhang, Han Fang, Weiming Zhang, Ee-Chien Chang
  • for: The paper explores the potential of leveraging data generated by Stable Diffusion to boost adversarial transferability in the black-box scenario.
  • methods: The paper introduces a novel attack method called the Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images, along with a fast variant of SDAM that reduces computational overhead while preserving high adversarial transferability.
  • results: The proposed method outperforms state-of-the-art baselines by a substantial margin and is compatible with existing transfer-based attacks, further enhancing adversarial transferability.
    Abstract Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving DNN predictions. While some attack methods excel in the white-box setting, they often struggle in the black-box scenario, particularly against models fortified with defense mechanisms. Various techniques have emerged to enhance the transferability of adversarial attacks for the black-box scenario. Among these, input transformation-based attacks have demonstrated their effectiveness. In this paper, we explore the potential of leveraging data generated by Stable Diffusion to boost adversarial transferability. This approach draws inspiration from recent research that harnessed synthetic data generated by Stable Diffusion to enhance model generalization. In particular, previous work has highlighted the correlation between the presence of both real and synthetic data and improved model generalization. Building upon this insight, we introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images. Furthermore, we propose a fast variant of SDAM to reduce computational overhead while preserving high adversarial transferability. Our extensive experimental results demonstrate that our method outperforms state-of-the-art baselines by a substantial margin. Moreover, our approach is compatible with existing transfer-based attacks to further enhance adversarial transferability.
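In the spirit of input-transformation attacks, the core of SDAM can be sketched as averaging adversarial gradients over diffusion-augmented copies of the input. Here sd_variants is a hypothetical helper returning images that mix x with Stable-Diffusion-generated content, and the sign-gradient step is a generic FGSM-style update, not the paper's exact algorithm:

```python
import torch

def sdam_gradient(model, x, y, sd_variants, loss_fn, n_aug=4):
    """Average the adversarial gradient over n_aug Stable-Diffusion-augmented
    copies of the input and return its sign, ready for an (I-)FGSM update.
    sd_variants(x, n_aug) is a hypothetical helper yielding augmented images."""
    grad = torch.zeros_like(x)
    for x_aug in sd_variants(x, n_aug):
        x_aug = x_aug.clone().requires_grad_(True)
        loss = loss_fn(model(x_aug), y)
        grad += torch.autograd.grad(loss, x_aug)[0]
    return (grad / n_aug).sign()
```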

Implicit Event-RGBD Neural SLAM

  • paper_url: http://arxiv.org/abs/2311.11013
  • repo_url: None
  • paper_authors: Delin Qu, Chi Yan, Dong Wang, Jie Yin, Dan Xu, Bin Zhao, Xuelong Li
  • for: This paper presents an event-RGBD implicit neural SLAM framework that handles non-ideal scenarios such as motion blur and lighting variation, improving the accuracy and robustness of tracking and mapping.
  • methods: A differentiable CRF rendering technique generates distinct RGB and event camera data from a shared radiance field, optimized by learning a unified implicit representation under event and RGBD supervision. Exploiting the temporal difference property of events, a temporal aggregating optimization strategy uses consecutive difference constraints to improve tracking accuracy and robustness.
  • results: Experiments on 6 scenes and 17 sequences show the method handles motion blur and lighting variation effectively and surpasses state-of-the-art methods in both tracking ATE and mapping accuracy.
    Abstract Implicit neural SLAM has achieved remarkable progress recently. Nevertheless, existing methods face significant challenges in non-ideal scenarios, such as motion blur or lighting variation, which often leads to issues like convergence failures, localization drifts, and distorted mapping. To address these challenges, we propose $\textbf{EN-SLAM}$, the first event-RGBD implicit neural SLAM framework, which effectively leverages the high rate and high dynamic range advantages of event data for tracking and mapping. Specifically, EN-SLAM proposes a differentiable CRF (Camera Response Function) rendering technique to generate distinct RGB and event camera data via a shared radiance field, which is optimized by learning a unified implicit representation with the captured event and RGBD supervision. Moreover, based on the temporal difference property of events, we propose a temporal aggregating optimization strategy for the event joint tracking and global bundle adjustment, capitalizing on the consecutive difference constraints of events, significantly enhancing tracking accuracy and robustness. Finally, we construct the simulated dataset $\textbf{DEV-Indoors}$ and real captured dataset $\textbf{DEV-Reals}$ containing 6 scenes, 17 sequences with practical motion blur and lighting changes for evaluations. Experimental results show that our method outperforms the SOTA methods in both tracking ATE and mapping ACC with a real-time $17$ FPS in various challenging environments. The code and dataset will be released upon the paper publication.

Learning Scene Context Without Images

  • paper_url: http://arxiv.org/abs/2311.10998
  • repo_url: None
  • paper_authors: Amirreza Rouhi, David Han
  • for: Teaching machines scene contextual knowledge so they can interact with the environment more effectively and anticipate or predict objects that are not immediately visible in the perceptual field.
  • methods: A novel transformer-based approach, LMOD (Label-based Missing Object Detection), teaches scene context through an attention mechanism using only labels from image datasets, with no need for the images themselves.
  • results: Scene-wide relationships among objects can be learned from labels alone via self-attention, and the resulting contextual knowledge improves the performance of vision-based object detection algorithms.
    Abstract Teaching machines of scene contextual knowledge would enable them to interact more effectively with the environment and to anticipate or predict objects that may not be immediately apparent in their perceptual field. In this paper, we introduce a novel transformer-based approach called $LMOD$ ( Label-based Missing Object Detection) to teach scene contextual knowledge to machines using an attention mechanism. A distinctive aspect of the proposed approach is its reliance solely on labels from image datasets to teach scene context, entirely eliminating the need for the actual image itself. We show how scene-wide relationships among different objects can be learned using a self-attention mechanism. We further show that the contextual knowledge gained from label based learning can enhance performance of other visual based object detection algorithm.
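A toy version of learning scene context from labels alone: embed a scene's object labels, run self-attention over them, and predict a masked-out object from the rest. The masked-prediction objective and all dimensions are assumptions for illustration:

```python
import torch
import torch.nn as nn

class LabelSceneModel(nn.Module):
    """Embed the object labels of a scene, apply transformer self-attention,
    and predict the identity of a masked-out object from the remaining ones."""
    def __init__(self, num_labels, dim=128, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(num_labels + 1, dim)  # +1 for a [MASK] id
        enc = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(dim, num_labels)

    def forward(self, label_ids):          # (B, L), one [MASK] token per row
        h = self.encoder(self.embed(label_ids))
        return self.head(h)                # (B, L, num_labels)

model = LabelSceneModel(num_labels=80)
logits = model(torch.randint(0, 81, (2, 6)))   # id 80 acts as [MASK]
print(logits.shape)  # torch.Size([2, 6, 80])
```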

Towards Robust and Accurate Visual Prompting

  • paper_url: http://arxiv.org/abs/2311.10992
  • repo_url: None
  • paper_authors: Qi Li, Liangzhi Li, Zhouqiang Jiang, Bowen Wang
  • for: This paper studies visual prompting (VP) from robust source models and asks whether a visual prompt derived from a robust model inherits its robustness while suffering a generalization decline on downstream datasets.
  • methods: A new technique, Prompt Boundary Loose (PBL), improves the standard accuracy of visual prompts without losing (or even while improving) adversarial robustness when a robust model is used as the source.
  • results: Extensive experiments across various datasets show the findings are universal and demonstrate the significant benefits of the proposed method.
    Abstract Visual prompting, an efficient method for transfer learning, has shown its potential in vision tasks. However, previous works focus exclusively on VP from standard source models, and it remains unknown how it performs under a robust source model: can a visual prompt derived from a robust model inherit the robustness while suffering a decline in generalization performance on a downstream dataset that differs from the source dataset? In this work, we give an affirmative answer to the above question and explain it at the visual representation level. Moreover, we introduce a novel technique named Prompt Boundary Loose (PBL) that effectively mitigates the suboptimal standard accuracy of visual prompts without losing (or even while significantly improving) their adversarial robustness when using a robust model as the source model. Extensive experiments across various datasets show that our findings are universal and demonstrate the significant benefits of our proposed method.
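For readers unfamiliar with visual prompting itself, the sketch below shows the classic padding-style prompt such work builds on: a learnable frame of pixel perturbations added to the input while the source model stays frozen. This is the generic VP baseline, not PBL; the prompt width and image size are illustrative:

```python
import torch
import torch.nn as nn

class PadPrompt(nn.Module):
    """Padding-style visual prompt: a learnable border of pixels added to the
    image; only the prompt parameters are trained, the source model is frozen."""
    def __init__(self, image_size=224, pad=16):
        super().__init__()
        mask = torch.zeros(1, 3, image_size, image_size)
        mask[..., :pad, :] = 1
        mask[..., -pad:, :] = 1
        mask[..., :, :pad] = 1
        mask[..., :, -pad:] = 1
        self.register_buffer("mask", mask)
        self.prompt = nn.Parameter(torch.zeros(1, 3, image_size, image_size))

    def forward(self, x):                  # x: (B, 3, H, W) in model input space
        return x + self.prompt * self.mask

prompted = PadPrompt()(torch.randn(4, 3, 224, 224))
print(prompted.shape)  # torch.Size([4, 3, 224, 224])
```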

Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention

  • paper_url: http://arxiv.org/abs/2311.10988
  • repo_url: None
  • paper_authors: Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Changwen Chen
  • for: Scene graph generation offers a structured representation critical to many computer vision applications.
  • methods: A transformer-based architecture learns visual-concept alignment for both nodes and edges, extending recognition beyond predefined object and relation categories.
  • results: The proposed fully open-vocabulary SGG method recognizes unseen object and relation categories on the Visual Genome benchmark.
    Abstract Scene Graph Generation (SGG) offers a structured representation critical in many computer vision applications. Traditional SGG approaches, however, are limited by a closed-set assumption, restricting their ability to recognize only predefined object and relation categories. To overcome this, we categorize SGG scenarios into four distinct settings based on the node and edge: Closed-set SGG, Open Vocabulary (object) Detection-based SGG (OvD-SGG), Open Vocabulary Relation-based SGG (OvR-SGG), and Open Vocabulary Detection + Relation-based SGG (OvD+R-SGG). While object-centric open vocabulary SGG has been studied recently, the more challenging problem of relation-involved open-vocabulary SGG remains relatively unexplored. To fill this gap, we propose a unified framework named OvSGTR towards fully open vocabulary SGG from a holistic view. The proposed framework is an end-toend transformer architecture, which learns a visual-concept alignment for both nodes and edges, enabling the model to recognize unseen categories. For the more challenging settings of relation-involved open vocabulary SGG, the proposed approach integrates relation-aware pre-training utilizing image-caption data and retains visual-concept alignment through knowledge distillation. Comprehensive experimental results on the Visual Genome benchmark demonstrate the effectiveness and superiority of the proposed framework.

Make Pixels Dance: High-Dynamic Video Generation

  • paper_url: http://arxiv.org/abs/2311.10982
  • repo_url: https://github.com/makepixelsdance/makepixelsdance.github.io
  • paper_authors: Yan Zeng, Guoqiang Wei, Jiani Zheng, Jiaxin Zou, Yang Wei, Yuchen Zhang, Hang Li
  • for: This work aims to advance video generation, particularly the synthesis of videos with complex scenes and intricate motions.
  • methods: PixelDance, a novel diffusion-based approach, combines image instructions for the first and last frames with text instructions for video generation.
  • results: Trained on public data, PixelDance synthesizes videos with complex scenes and intricate motions markedly better than prior methods, setting a new standard for video generation.
    Abstract Creating high-dynamic videos such as motion-rich actions and sophisticated visual effects poses a significant challenge in the field of artificial intelligence. Unfortunately, current state-of-the-art video generation methods, primarily focusing on text-to-video generation, tend to produce video clips with minimal motions despite maintaining high fidelity. We argue that relying solely on text instructions is insufficient and suboptimal for video generation. In this paper, we introduce PixelDance, a novel approach based on diffusion models that incorporates image instructions for both the first and last frames in conjunction with text instructions for video generation. Comprehensive experimental results demonstrate that PixelDance trained with public data exhibits significantly better proficiency in synthesizing videos with complex scenes and intricate motions, setting a new standard for video generation.

Structure-Aware Sparse-View X-ray 3D Reconstruction

  • paper_url: http://arxiv.org/abs/2311.10959
  • repo_url: None
  • paper_authors: Yuanhao Cai, Jiahao Wang, Alan Yuille, Zongwei Zhou, Angtian Wang
  • for: Improving the accuracy and efficiency of sparse-view X-ray 3D reconstruction.
  • methods: A Line Segment-based Transformer (Lineformer) backbone combined with a Masked Local-Global (MLG) ray sampling strategy.
  • results: On the X3D dataset, SAX-NeRF outperforms previous NeRF-based methods by 12.56 and 2.49 dB on novel view synthesis and CT reconstruction, respectively.
    Abstract X-ray, known for its ability to reveal internal structures of objects, is expected to provide richer information for 3D reconstruction than visible light. Yet, existing neural radiance fields (NeRF) algorithms overlook this important nature of X-ray, leading to their limitations in capturing structural contents of imaged objects. In this paper, we propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF), for sparse-view X-ray 3D reconstruction. Firstly, we design a Line Segment-based Transformer (Lineformer) as the backbone of SAX-NeRF. Linefomer captures internal structures of objects in 3D space by modeling the dependencies within each line segment of an X-ray. Secondly, we present a Masked Local-Global (MLG) ray sampling strategy to extract contextual and geometric information in 2D projection. Plus, we collect a larger-scale dataset X3D covering wider X-ray applications. Experiments on X3D show that SAX-NeRF surpasses previous NeRF-based methods by 12.56 and 2.49 dB on novel view synthesis and CT reconstruction. Code, models, and data will be released at https://github.com/caiyuanhao1998/SAX-NeRF

  • paper_url: http://arxiv.org/abs/2311.10952
  • repo_url: None
  • paper_authors: Zhenrong Wang, Bin Li, Weifeng Li, Shuanlong Niu, Wang Miao, Tongzhi Niu
  • for: This work automatically generates neural network architectures suited to surface defect detection, improving detection accuracy and efficiency in industrial scenarios.
  • methods: Neural architecture search (NAS) over a refined, industry-appropriate search space of repeatedly stacked basic novel cells with searchable attention operations, explored with a progressive search strategy and a deep supervision mechanism for faster and better search.
  • results: Experiments show the method designs high-performance, lightweight defect detection networks that outperform both manually designed and NAS-based competitors while using smaller models.
    Abstract Deep convolutional neural networks (CNNs) have been widely used in surface defect detection. However, no single CNN architecture is suitable for all detection tasks, and designing effective task-specific architectures requires considerable effort. Neural architecture search (NAS) technology makes it possible to automatically generate adaptive, data-driven networks. Here, we propose a new method called NAS-ASDet to adaptively design networks for surface defect detection. First, a refined and industry-appropriate search space that can adaptively adjust the feature distribution is designed, consisting of repeatedly stacked basic novel cells with searchable attention operations. Then, a progressive search strategy with a deep supervision mechanism is used to explore the search space faster and better. This method can design high-performance and lightweight defect detection networks under the data scarcity common in industrial scenarios. Experimental results on four datasets demonstrate that the proposed method achieves superior performance and a relatively lighter model size compared to other competitive methods, including both manual and NAS-based approaches.

Single-shot Phase Retrieval from a Fractional Fourier Transform Perspective

  • paper_url: http://arxiv.org/abs/2311.10950
  • repo_url: None
  • paper_authors: Yixiao Yang, Ran Tao, Kaixuan Wei, Jun Shi
  • for: Classical phase retrieval: recovering a signal from its Fourier magnitude measurements.
  • methods: An FrFT-based physical measurement model integrated with a self-supervised reconstruction scheme.
  • results: Single-shot phase retrieval that recovers high-quality images.
    Abstract The realm of classical phase retrieval concerns itself with the arduous task of recovering a signal from its Fourier magnitude measurements, which are fraught with inherent ambiguities. A single-exposure intensity measurement is commonly deemed insufficient for the reconstruction of the primal signal, given that the absent phase component is imperative for the inverse transformation. In this work, we present a novel single-shot phase retrieval paradigm from a fractional Fourier transform (FrFT) perspective, which involves integrating the FrFT-based physical measurement model within a self-supervised reconstruction scheme. Specifically, the proposed FrFT-based measurement model addresses the aliasing artifacts problem in the numerical calculation of Fresnel diffraction, featuring adaptability to both short-distance and long-distance propagation scenarios. Moreover, the intensity measurement in the FrFT domain proves highly effective in alleviating the ambiguities of phase retrieval and relaxing the previous conditions on oversampled or multiple measurements in the Fourier domain. Furthermore, the proposed self-supervised reconstruction approach harnesses the fast discrete algorithm of FrFT alongside untrained neural network priors, thereby attaining preeminent results. Through numerical simulations, we demonstrate that both amplitude and phase objects can be effectively retrieved from a single-shot intensity measurement using the proposed approach and provide a promising technique for support-free coherent diffraction imaging.

Jenga Stacking Based on 6D Pose Estimation for Architectural Form Finding Process

  • paper_url: http://arxiv.org/abs/2311.10918
  • repo_url: None
  • paper_authors: Zixun Huang
  • for: This paper reviews the current state of the art in 6D pose estimation and discusses which pose estimation method should be used in two types of architectural design scenarios.
  • methods: Taking the recent Gen6D work as an example, current open-set methods are assessed qualitatively in terms of application level, prediction speed, robustness to occlusion, accuracy, and resistance to environmental interference.
  • results: The assessment identifies room for improvement in application level and prediction speed, with occlusion and environmental interference remaining challenging; combining 6D pose estimation with building wind environment assessment yields a tangible architectural design approach and points to directions for future progress.
    Abstract This paper reviews current state-of-the-art 6D pose estimation methods and discusses which pose estimation method should be used in two types of architectural design scenarios. Taking the latest pose estimation research Gen6D as an example, we make a qualitative assessment of current open-set methods in terms of application level, prediction speed, resistance to occlusion, accuracy, and resistance to environmental interference. In addition, we combine 6D pose estimation with building wind environment assessment to create a tangible architectural design approach; we discuss the limitations of the method and point out the directions in which 6D pose estimation needs to progress in this scenario.

cs.AI - 2023-11-18

Morphology-Enhanced CAM-Guided SAM for weakly supervised Breast Lesion Segmentation

  • paper_url: http://arxiv.org/abs/2311.11176
  • repo_url: https://github.com/yuexin18/morseg-cam-sam
  • paper_authors: Xin Yue, Qing Zhao, Jianqiang Li, Xiaoling Liu, Changwei Song, Suqin Liu, Guanghui Fu
  • for: This work proposes a new weakly supervised breast lesion segmentation approach to aid early identification of breast cancer.
  • methods: A morphology-enhanced model performs a preliminary segmentation, and a Class Activation Map (CAM)-based heatmap localizes lesions precisely; the two are fused as a prompt that guides SAM to refine the segmentation.
  • results: The method achieves a Dice score of 74.39%, comparable to supervised methods, and a lower Hausdorff distance of 24.27 than the supervised baseline.
    Abstract Breast cancer diagnosis challenges both patients and clinicians, with early detection being crucial for effective treatment. Ultrasound imaging plays a key role in this, but its utility is hampered by the need for precise lesion segmentation, a task that is both time-consuming and labor-intensive. To address these challenges, we propose a new framework: a morphology-enhanced, Class Activation Map (CAM)-guided model, which is optimized using a computer vision foundation model known as SAM. This innovative framework is specifically designed for weakly supervised lesion segmentation in early-stage breast ultrasound images. Our approach uniquely leverages image-level annotations, which removes the requirement for detailed pixel-level annotation. Initially, we perform a preliminary segmentation using breast lesion morphology knowledge. Following this, we accurately localize lesions by extracting semantic information through a CAM-based heatmap. These two elements are then fused together, serving as a prompt to guide the SAM in performing refined segmentation. Subsequently, post-processing techniques are employed to rectify topological errors made by the SAM. Our method not only simplifies the segmentation process but also attains accuracy comparable to supervised learning methods that rely on pixel-level annotation. Our framework achieves a Dice score of 74.39% on the test set, demonstrating comparable performance with supervised learning methods. Additionally, it outperforms a supervised learning model in terms of the Hausdorff distance, scoring 24.27 compared to Deeplabv3+'s 32.22. These experimental results showcase its feasibility and superior performance in integrating weakly supervised learning with SAM. The code is made available at: https://github.com/YueXin18/MorSeg-CAM-SAM.
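The CAM-to-SAM handoff can be sketched as converting the thresholded heatmap into SAM-style prompts: a bounding box around the activation plus the peak point. The threshold and prompt format below are illustrative assumptions, and the fusion with morphology cues is omitted:

```python
import numpy as np

def cam_to_sam_prompts(cam: np.ndarray, threshold: float = 0.6):
    """Turn a CAM heatmap (H, W, values in [0, 1]) into SAM-style prompts:
    a bounding box around the thresholded activation and the peak point."""
    ys, xs = np.where(cam >= threshold)
    if len(xs) == 0:
        return None, None
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])   # x0, y0, x1, y1
    peak = np.unravel_index(cam.argmax(), cam.shape)           # (y, x)
    point = np.array([[peak[1], peak[0]]])                     # SAM wants (x, y)
    return box, point

# These prompts would then be passed to SamPredictor.predict(box=box,
# point_coords=point, point_labels=np.array([1])) from the segment-anything
# package, which performs the refined segmentation.
```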

Best uses of ChatGPT and Generative AI for computer science research

  • paper_url: http://arxiv.org/abs/2311.11175
  • repo_url: None
  • paper_authors: Eduardo C. Garrido-Merchan
  • for: This paper explores the diverse applications of ChatGPT and other generative AI technologies in computer science academic research, with a focus on using these tools to boost the productivity of computer research scientists.
  • methods: The paper highlights innovative uses of generative AI, such as brainstorming research ideas, aiding in the drafting and styling of academic papers, and assisting in the synthesis of state-of-the-art sections.
  • results: The paper makes recommendations for using generative AI to improve the productivity of computer research scientists, including using these tools for synthetic data creation, research methodology, and mentorship, as well as for task organization and article quality assessment. Additionally, the paper explores the capabilities of generative AI in disseminating ideas, generating images and audio, text transcription, and engaging with editors.
    Abstract Generative Artificial Intelligence (AI), particularly tools like OpenAI's popular ChatGPT, is reshaping the landscape of computer science research. Used wisely, these tools can boost the productivity of a computer research scientist. This paper provides an exploration of the diverse applications of ChatGPT and other generative AI technologies in computer science academic research, making recommendations about the use of Generative AI to make more productive the role of the computer research scientist, with the focus of writing new research papers. We highlight innovative uses such as brainstorming research ideas, aiding in the drafting and styling of academic papers and assisting in the synthesis of state-of-the-art section. Further, we delve into using these technologies in understanding interdisciplinary approaches, making complex texts simpler, and recommending suitable academic journals for publication. Significant focus is placed on generative AI's contributions to synthetic data creation, research methodology, and mentorship, as well as in task organization and article quality assessment. The paper also addresses the utility of AI in article review, adapting texts to length constraints, constructing counterarguments, and survey development. Moreover, we explore the capabilities of these tools in disseminating ideas, generating images and audio, text transcription, and engaging with editors. We also describe some non-recommended uses of generative AI for computer science research, mainly because of the limitations of this technology.

Deep Coherence Learning: An Unsupervised Deep Beamformer for High Quality Single Plane Wave Imaging in Medical Ultrasound

  • paper_url: http://arxiv.org/abs/2311.11169
  • repo_url: None
  • paper_authors: Hyunwoo Cho, Seongjun Park, Jinbum Kang, Yangmo Yoo
  • for: This work targets high-quality single plane wave imaging (PWI), which offers high frame rates and enables new clinical applications in medical ultrasound.
  • methods: A new unsupervised learning approach, deep coherence learning (DCL)-based DL beamforming (DL-DCL), trains a network to improve single plane wave image quality without ground truth images.
  • results: Simulation, phantom, and in vivo studies show that DL-DCL matches DMAS with 1 PW and DAS with 75 PWs in spatial resolution and outperforms all comparison methods in contrast resolution.
    Abstract Plane wave imaging (PWI) in medical ultrasound is becoming an important reconstruction method with high frame rates and new clinical applications. Recently, single PWI based on deep learning (DL) has been studied to overcome lowered frame rates of traditional PWI with multiple PW transmissions. However, due to the lack of appropriate ground truth images, DL-based PWI still remains challenging for performance improvements. To address this issue, in this paper, we propose a new unsupervised learning approach, i.e., deep coherence learning (DCL)-based DL beamformer (DL-DCL), for high-quality single PWI. In DL-DCL, the DL network is trained to predict highly correlated signals with a unique loss function from a set of PW data, and the trained DL model encourages high-quality PWI from low-quality single PW data. In addition, the DL-DCL framework based on complex baseband signals enables a universal beamformer. To assess the performance of DL-DCL, simulation, phantom and in vivo studies were conducted with public datasets, and it was compared with traditional beamformers (i.e., DAS with 75-PWs and DMAS with 1-PW) and other DL-based methods (i.e., supervised learning approach with 1-PW and generative adversarial network (GAN) with 1-PW). From the experiments, the proposed DL-DCL showed comparable results with DMAS with 1-PW and DAS with 75-PWs in spatial resolution, and it outperformed all comparison methods in contrast resolution. These results demonstrated that the proposed unsupervised learning approach can address the inherent limitations of traditional PWIs based on DL, and it also showed great potential in clinical settings with minimal artifacts.

Mitigating Exposure Bias in Discriminator Guided Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.11164
  • repo_url: None
  • paper_authors: Eleftherios Tsonis, Paraskevi Tzouveli, Athanasios Voulodimos
  • for: Improving the quality of images generated by diffusion models.
  • methods: Incorporating an auxiliary term derived from a discriminator network and modifying the sampling approach.
  • results: Achieving an FID score of 1.73 on the unconditional CIFAR-10 dataset, outperforming the current state-of-the-art.
    Abstract Diffusion Models have demonstrated remarkable performance in image generation. However, their demanding computational requirements for training have prompted ongoing efforts to enhance the quality of generated images through modifications in the sampling process. A recent approach, known as Discriminator Guidance, seeks to bridge the gap between the model score and the data score by incorporating an auxiliary term, derived from a discriminator network. We show that despite significantly improving sample quality, this technique has not resolved the persistent issue of Exposure Bias and we propose SEDM-G++, which incorporates a modified sampling approach, combining Discriminator Guidance and Epsilon Scaling. Our proposed approach outperforms the current state-of-the-art, by achieving an FID score of 1.73 on the unconditional CIFAR-10 dataset.
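Schematically, discriminator guidance shifts the model's noise estimate by the gradient of the discriminator's log density ratio, while epsilon scaling tempers the estimate by a factor slightly above one to counter exposure bias. The sketch below omits the time-dependent coefficients a real sampler needs; disc and eps_model are hypothetical callables, and the constants are illustrative:

```python
import torch

def guided_scaled_eps(eps_model, disc, x, t, guidance_w=1.0, eps_scale=1.004):
    """One ingredient of a discriminator-guided sampler with epsilon scaling:
    divide the predicted noise by a scale slightly above 1 (the exposure-bias
    correction) and shift it by the gradient of the discriminator's log
    density ratio (the guidance correction)."""
    x = x.detach().requires_grad_(True)
    d = disc(x, t).clamp(1e-6, 1 - 1e-6)            # P(sample looks real)
    log_ratio = (d / (1 - d)).log().sum()
    grad = torch.autograd.grad(log_ratio, x)[0]     # guidance direction
    eps = eps_model(x, t) / eps_scale               # tempered noise estimate
    return (eps - guidance_w * grad).detach()       # plugged into the sampler
```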

Contextualizing Internet Memes Across Social Media Platforms

  • paper_url: http://arxiv.org/abs/2311.11157
  • repo_url: None
  • paper_authors: Saurav Joshi, Filip Ilievski, Luca Luceri
  • for: This work aims to holistically track, identify, and map internet memes posted on social media.
  • methods: Candidate meme posts collected from Reddit and Discord are matched against a semantic repository of knowledge, the IMKG knowledge graph of internet memes, using vision transformer-based similarity.
  • results: Memes published online can be identified by mapping them to IMKG; this grounding reveals platform differences and popular memes, surfaces common meme channels and subreddits, and shows how the knowledge graph can provide context for memes on social media.
    Abstract Internet memes have emerged as a novel format for communication and expressing ideas on the web. Their fluidity and creative nature are reflected in their widespread use, often across platforms and occasionally for unethical or harmful purposes. While computational work has already analyzed their high-level virality over time and developed specialized classifiers for hate speech detection, there have been no efforts to date that aim to holistically track, identify, and map internet memes posted on social media. To bridge this gap, we investigate whether internet memes across social media platforms can be contextualized by using a semantic repository of knowledge, namely, a knowledge graph. We collect thousands of potential internet meme posts from two social media platforms, namely Reddit and Discord, and perform an extract-transform-load procedure to create a data lake with candidate meme posts. By using vision transformer-based similarity, we match these candidates against the memes cataloged in a recently released knowledge graph of internet memes, IMKG. We provide evidence that memes published online can be identified by mapping them to IMKG. We leverage this grounding to study the prevalence of memes on different platforms, discover popular memes, and select common meme channels and subreddits. Finally, we illustrate how the grounding can enable users to get context about memes on social media thanks to their link to the knowledge graph.

A Principled Framework for Knowledge-enhanced Large Language Model

  • paper_url: http://arxiv.org/abs/2311.11135
  • repo_url: None
  • paper_authors: Saizhuo Wang, Zhihan Liu, Zhaoran Wang, Jian Guo
  • for: Improving the depth and reliability of LLM reasoning so the models can be applied in critical scenarios.
  • methods: A rigorously designed framework that anchors knowledge and employs a closed-loop reasoning process to enhance the analytical capability of LLMs.
  • results: Dissecting the framework shows the contribution of each component to LLM performance, with a theoretical assurance of improved reasoning under well-defined assumptions.
    Abstract Large Language Models (LLMs) are versatile, yet they often falter in tasks requiring deep and reliable reasoning due to issues like hallucinations, limiting their applicability in critical scenarios. This paper introduces a rigorously designed framework for creating LLMs that effectively anchor knowledge and employ a closed-loop reasoning process, enhancing their capability for in-depth analysis. We dissect the framework to illustrate the contribution of each component to the LLMs' performance, offering a theoretical assurance of improved reasoning under well-defined assumptions.

Bayesian Neural Networks: A Min-Max Game Framework

  • paper_url: http://arxiv.org/abs/2311.11126
  • repo_url: None
  • paper_authors: Junping Hong, Ercan Engin Kuruoglu
  • for: This paper explores Bayesian neural networks and their connections to related models.
  • methods: Bayesian neural networks trained by variational inference are formulated as a min-max game problem.
  • results: Experiments on MNIST give results comparable to the existing closed-loop transcription neural network, offering another view of Bayesian neural networks.
    Abstract Bayesian neural networks describe their weights with random variables rather than deterministic values and are mostly trained by variational inference, which updates the mean and variance simultaneously. Here, we formulate Bayesian neural networks as a min-max game problem. We run experiments on the MNIST dataset, and the primary result is comparable to the existing closed-loop transcription neural network. Finally, we reveal the connections between Bayesian neural networks and closed-loop transcription neural networks, show that our framework is practical, and provide another view of Bayesian neural networks.
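The variational-inference setup the abstract refers to, where a mean and a variance are learned for each weight, can be sketched as a reparameterised linear layer (Bayes-by-backprop style; the initialisations are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalLinear(nn.Module):
    """Linear layer with a Gaussian weight posterior: both the mean and the
    (softplus-parameterised) standard deviation are learned, and each forward
    pass samples fresh weights via the reparameterisation trick."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w_mu = nn.Parameter(torch.randn(d_out, d_in) * 0.05)
        self.w_rho = nn.Parameter(torch.full((d_out, d_in), -4.0))
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x):
        std = F.softplus(self.w_rho)
        w = self.w_mu + std * torch.randn_like(std)   # sampled weights
        return F.linear(x, w, self.bias)

layer = VariationalLinear(784, 10)
print(layer(torch.randn(32, 784)).shape)  # torch.Size([32, 10])
```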

An Improved Neural Network Model Based On CNN Using For Fruit Sugar Degree Detection

  • paper_url: http://arxiv.org/abs/2311.11120
  • repo_url: None
  • paper_authors: Boyang Deng, Xin Wen, Zhan Gao
  • for: This paper develops a method for detecting the sugar degree of fruit.
  • methods: An artificial neural network whose low layers form a multilayer perceptron (MLP), whose middle layer is a 2-dimensional correlation matrix layer, and whose high layers are convolutional neural network (CNN) layers; wavelet decomposition (WD) reduces the feature dimensionality and a genetic algorithm (GA) selects strong features.
  • results: Processing and analyzing fruit spectra with the neural network detects fruit sugar degree accurately and outperforms traditional parameter-selection approaches; a new evaluation standard derived from the dataset standard deviation (STD) is proposed for assessing detection performance.
    Abstract Artificial Intelligence (AI) is widely applied in image classification and recognition, text understanding, and natural language processing, where it has made great progress. In this paper, we introduce AI into the field of fruit quality detection. We design a fruit sugar degree regression model using an artificial neural network based on fruit spectra within the visible/near-infrared (V/NIR) range. After analyzing the fruit spectra, we propose a new neural network structure: the low layers consist of a multilayer perceptron (MLP), the middle layer is a 2-dimensional correlation matrix layer, and the high layers consist of several convolutional neural network (CNN) layers. Using fruit sugar value as the detection target, we collect two fruits, Gan Nan Navel and Tian Shan Pear, as samples, run experiments on each, and compare their results. We use Analysis of Variance (ANOVA) to evaluate the reliability of the collected dataset and try multiple strategies for processing the spectrum data, evaluating their effects. In particular, we apply Wavelet Decomposition (WD) to reduce the feature dimensionality and a Genetic Algorithm (GA) to find strong features. We then compare the neural network models with traditional Partial Least Squares (PLS)-based models and compare our designed structure (MLP-CNN) with other traditional neural network structures. Finally, we propose a new evaluation standard derived from the dataset standard deviation (STD) for assessing detection performance, validating the viability of using an artificial neural network model for nondestructive fruit sugar degree detection.
    摘要 人工智能(AI)广泛应用于图像分类与识别、自然语言处理和文本理解等领域,并取得了很大进步。在本文中,我们将 AI 引入水果质量检测领域,设计了一种基于人工神经网络的水果糖度回归模型,利用水果在可见/近红外(V/NIR)波段的光谱进行分析。在分析水果光谱后,我们创新地提出了一种新的神经网络结构:低层为多层感知机(MLP),中层为二维相关矩阵层,高层为若干卷积神经网络(CNN)层。在本研究中,我们以水果糖度值为检测目标,采集了赣南脐橙和天山梨两种水果样本,分别进行实验并比较结果。我们使用方差分析(ANOVA)评估所采集数据集的可靠性,随后尝试了多种光谱数据处理策略并评估其效果,包括利用小波分解(WD)降低特征维度,以及利用遗传算法(GA)筛选优质特征。然后,我们将神经网络模型与传统的偏最小二乘(PLS)模型进行比较,并将我们设计的神经网络结构(MLP-CNN)与其他传统神经网络结构进行比较。最后,我们提出了一种基于数据集标准差(STD)的新评价标准,用于评估检测性能。这些结果验证了使用人工神经网络模型进行水果糖度无损检测的可行性。
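The digest does not spell out the exact layer definitions of the MLP-CNN model, so the following is only a minimal PyTorch sketch of one plausible reading: the MLP embeds the V/NIR spectrum, the middle layer forms a 2D correlation matrix as the outer product of the embedding with itself, and a small CNN regresses the sugar degree from that map. All layer sizes are made-up assumptions.

```python
import torch
import torch.nn as nn

class MLPCorrCNN(nn.Module):
    """Hypothetical MLP -> 2D correlation-matrix -> CNN regressor."""
    def __init__(self, n_bands: int = 256, feat_dim: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(               # low layers: MLP on the spectrum
            nn.Linear(n_bands, 128), nn.ReLU(),
            nn.Linear(128, feat_dim), nn.ReLU(),
        )
        self.cnn = nn.Sequential(               # high layers: CNN on the 2D map
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, 1)            # regression: fruit sugar degree

    def forward(self, x):                        # x: (batch, n_bands)
        h = self.mlp(x)                          # (batch, feat_dim)
        corr = torch.einsum("bi,bj->bij", h, h)  # middle layer: outer product as 2D correlation matrix
        return self.head(self.cnn(corr.unsqueeze(1))).squeeze(-1)

model = MLPCorrCNN()
pred = model(torch.randn(4, 256))               # 4 fake V/NIR spectra
print(pred.shape)                               # torch.Size([4])
```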

Utilizing Speech Emotion Recognition and Recommender Systems for Negative Emotion Handling in Therapy Chatbots

  • paper_url: http://arxiv.org/abs/2311.11116
  • repo_url: None
  • paper_authors: Farideh Majidi, Marzieh Bahrami
  • for: 提升治疗聊天机器人对用户的情绪支持能力,使其能够提供类人的共情。
  • methods: 为聊天机器人引入听觉感知能力,采用语音情感识别(SER)技术,基于卷积神经网络(CNN)模型和 ShEMO 数据集,准确检测并分类愤怒、恐惧和悲伤等负面情绪(SER 分类器的极简示意见本条目末尾的代码)。同时,利用 SER 模型的输出构建推荐系统,结合 GloVe 词向量与 LSTM 模型,为用户生成个性化的负面情绪管理建议。
  • results: SER 模型在从语音信号中检测和分类负面情绪时的验证准确率达 88%,推荐模型在生成个性化情绪管理建议方面的准确率达 98%。通过集成文本转语音模型 GlowTTS,系统能够以英语和波斯语语音向用户传达类人的共情与建议。
    Abstract Emotional well-being significantly influences mental health and overall quality of life. As therapy chatbots become increasingly prevalent, their ability to comprehend and respond empathetically to users' emotions remains limited. This paper addresses this limitation by proposing an approach to enhance therapy chatbots with auditory perception, enabling them to understand users' feelings and provide human-like empathy. The proposed method incorporates speech emotion recognition (SER) techniques using Convolutional Neural Network (CNN) models and the ShEMO dataset to accurately detect and classify negative emotions, including anger, fear, and sadness. The SER model achieves a validation accuracy of 88%, demonstrating its effectiveness in recognizing emotional states from speech signals. Furthermore, a recommender system is developed, leveraging the SER model's output to generate personalized recommendations for managing negative emotions, for which a new bilingual dataset was generated as well since there is no such dataset available for this task. The recommender model achieves an accuracy of 98% by employing a combination of global vectors for word representation (GloVe) and LSTM models. To provide a more immersive and empathetic user experience, a text-to-speech model called GlowTTS is integrated, enabling the therapy chatbot to audibly communicate the generated recommendations to users in both English and Persian. The proposed approach offers promising potential to enhance therapy chatbots by providing them with the ability to recognize and respond to users' emotions, ultimately improving the delivery of mental health support for both English and Persian-speaking users.
    摘要 情绪健康对心理健康和整体生活质量具有重要影响。随着治疗聊天机器人的日益普及,它们理解并共情回应用户情绪的能力仍然有限。本文针对这一问题,提出一种为治疗聊天机器人增加听觉感知的方法,使其能够理解用户的情绪状态并提供类人的共情。该方法采用语音情感识别(SER)技术,基于卷积神经网络(CNN)模型和 ShEMO 数据集,准确检测并分类愤怒、恐惧和悲伤等负面情绪。SER 模型的验证准确率达 88%,证明其能够从语音信号中有效识别情绪状态。此外,我们还开发了一个推荐系统,利用 SER 模型的输出为用户生成个性化的负面情绪管理建议;由于该任务尚无现成数据集,我们还构建了一个新的双语数据集用于训练。推荐模型结合 GloVe 词向量表示与 LSTM 模型,实现了 98% 的准确率。为提供更具沉浸感和共情的用户体验,我们集成了名为 GlowTTS 的文本转语音模型,使治疗聊天机器人能够以英语和波斯语语音向用户传达生成的建议。该方法有望增强治疗聊天机器人识别并回应用户情绪的能力,最终改善面向英语和波斯语用户的心理健康支持。
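As a concrete illustration of the CNN-based SER front end, here is a minimal PyTorch sketch that classifies MFCC features into a small set of emotions; the label set, feature shapes, and layer sizes are all assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

NEG_EMOTIONS = ["anger", "fear", "sadness", "neutral"]  # label set is an assumption

class SERCnn(nn.Module):
    """Toy CNN emotion classifier over MFCC frames (n_mfcc x time)."""
    def __init__(self, n_mfcc: int = 40, n_classes: int = len(NEG_EMOTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, mfcc):                    # mfcc: (batch, 1, n_mfcc, frames)
        return self.net(mfcc)

logits = SERCnn()(torch.randn(2, 1, 40, 200))
emotion = NEG_EMOTIONS[logits.argmax(dim=1)[0]]  # predicted class feeds the recommender
print(logits.shape, emotion)
```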

Environment-Aware Dynamic Graph Learning for Out-of-Distribution Generalization

  • paper_url: http://arxiv.org/abs/2311.11114
  • repo_url: https://github.com/ringbdstack/eagle
  • paper_authors: Haonan Yuan, Qingyun Sun, Xingcheng Fu, Ziwei Zhang, Cheng Ji, Hao Peng, Jianxin Li
  • for: This paper focuses on improving the out-of-distribution (OOD) generalization of dynamic graph neural networks (DGNNs) by modeling complex coupled environments and exploiting spatio-temporal invariant patterns.
  • methods: The proposed Environment-Aware dynamic Graph LEarning (EAGLE) framework includes an environment-aware EA-DGNN to model environments, an environment instantiation mechanism to diversify environments, and an invariant pattern recognition mechanism to discriminate spatio-temporal invariant patterns for OOD prediction.
  • results: The proposed EAGLE framework achieves superior performance compared to state-of-the-art baselines under distribution shifts, demonstrating its effectiveness in improving OOD generalization on dynamic graphs.
    Abstract Dynamic graph neural networks (DGNNs) are increasingly pervasive in exploiting spatio-temporal patterns on dynamic graphs. However, existing works fail to generalize under distribution shifts, which are common in real-world scenarios. As the generation of dynamic graphs is heavily influenced by latent environments, investigating their impacts on the out-of-distribution (OOD) generalization is critical. However, it remains unexplored with the following two major challenges: (1) How to properly model and infer the complex environments on dynamic graphs with distribution shifts? (2) How to discover invariant patterns given inferred spatio-temporal environments? To solve these challenges, we propose a novel Environment-Aware dynamic Graph LEarning (EAGLE) framework for OOD generalization by modeling complex coupled environments and exploiting spatio-temporal invariant patterns. Specifically, we first design the environment-aware EA-DGNN to model environments by multi-channel environments disentangling. Then, we propose an environment instantiation mechanism for environment diversification with inferred distributions. Finally, we discriminate spatio-temporal invariant patterns for out-of-distribution prediction by the invariant pattern recognition mechanism and perform fine-grained causal interventions node-wisely with a mixture of instantiated environment samples. Experiments on real-world and synthetic dynamic graph datasets demonstrate the superiority of our method against state-of-the-art baselines under distribution shifts. To the best of our knowledge, we are the first to study OOD generalization on dynamic graphs from the environment learning perspective.
    摘要 动态图神经网络(DGNNs)在挖掘动态图上的时空模式方面日益普及。然而,现有工作在分布偏移下难以泛化,而分布偏移在现实场景中十分常见。由于动态图的生成深受潜在环境的影响,研究环境对分布外(OOD)泛化的影响至关重要,但这一方向尚未被探索,主要面临两大挑战:(1)如何在分布偏移下正确地建模并推断动态图上复杂的环境?(2)如何在推断出的时空环境中发现不变模式?为解决这些挑战,我们提出了一种新颖的环境感知动态图学习(EAGLE)框架,通过建模复杂耦合的环境并利用时空不变模式来实现 OOD 泛化。具体而言,我们首先设计了环境感知的 EA-DGNN,通过多通道环境解耦来建模环境;随后提出环境实例化机制,基于推断出的分布实现环境多样化;最后,通过不变模式识别机制判别用于 OOD 预测的时空不变模式,并利用实例化环境样本的混合在节点级别进行细粒度因果干预。在真实与合成动态图数据集上的实验表明,我们的方法在分布偏移下优于最先进的基线方法。据我们所知,这是首个从环境学习视角研究动态图上 OOD 泛化的工作。

$\varepsilon$-fractional Core Stability in Hedonic Games

  • paper_url: http://arxiv.org/abs/2311.11101
  • repo_url: None
  • paper_authors: Simone Fioravanti, Michele Flammini, Bojana Kodric, Giovanna Varricchio
  • for: This paper focuses on the problem of coalition formation in hedonic games, where agents are strategic and have individual preferences. The goal is to find a coalition structure that satisfies some form of stability, such as core-stability.
  • methods: The paper proposes a new notion of $\varepsilon$-fractional core-stability, which allows at most an $\varepsilon$-fraction of coalitions to core-block, and designs efficient algorithms that find such partitions for two fundamental classes of hedonic games. The paper also explores probabilistic sampling to learn valuations and compute $\varepsilon$-fractional core-stable outcomes (a small sampling sketch follows the abstract below).
  • results: The paper shows that $\varepsilon$-fractional core-stability can guarantee both existence and polynomial-time computation, provides efficient algorithms for two fundamental classes of hedonic games, and gives positive and negative results on which distributions allow the efficient computation of outcomes that are $\varepsilon$-fractional core-stable with arbitrarily high confidence in a PAC-learning fashion.
    Abstract Hedonic Games (HGs) are a classical framework modeling coalition formation of strategic agents guided by their individual preferences. According to these preferences, it is desirable that a coalition structure (i.e. a partition of agents into coalitions) satisfies some form of stability. The most well-known and natural of such notions is arguably core-stability. Informally, a partition is core-stable if no subset of agents would like to deviate by regrouping in a so-called core-blocking coalition. Unfortunately, core-stable partitions seldom exist and even when they do, it is often computationally intractable to find one. To circumvent these problems, we propose the notion of $\varepsilon$-fractional core-stability, where at most an $\varepsilon$-fraction of all possible coalitions is allowed to core-block. It turns out that such a relaxation may guarantee both existence and polynomial-time computation. Specifically, we design efficient algorithms returning an $\varepsilon$-fractional core-stable partition, with $\varepsilon$ exponentially decreasing in the number of agents, for two fundamental classes of HGs: Simple Fractional and Anonymous. From a probabilistic point of view, being the definition of $\varepsilon$-fractional core equivalent to requiring that uniformly sampled coalitions core-block with probability lower than $\varepsilon$, we further extend the definition to handle more complex sampling distributions. Along this line, when valuations have to be learned from samples in a PAC-learning fashion, we give positive and negative results on which distributions allow the efficient computation of outcomes that are $\varepsilon$-fractional core-stable with arbitrarily high confidence.
    摘要 享乐博弈(Hedonic Games, HGs)是一个经典框架,用于建模由个体偏好引导的策略性智能体的联盟形成。根据这些偏好,希望联盟结构(即智能体到联盟的划分)满足某种形式的稳定性,其中最著名、最自然的概念当属核稳定性(core-stability)。非正式地说,若没有任何智能体子集愿意通过重新组成所谓的核阻塞(core-blocking)联盟而偏离,则该划分是核稳定的。不幸的是,核稳定划分很少存在;即使存在,找到它通常也是计算上难以处理的。为绕过这些问题,我们提出了 ε-分数核稳定性(ε-fractional core-stability)的概念,即至多允许所有可能联盟中 ε 比例的联盟发生核阻塞。事实证明,这种松弛既能保证存在性,又能保证多项式时间可计算性。具体而言,对于 Simple Fractional 与 Anonymous 这两类基本的享乐博弈,我们设计了高效算法,返回 ε-分数核稳定划分,其中 ε 随智能体数量呈指数下降。从概率角度看,ε-分数核的定义等价于要求均匀采样的联盟以低于 ε 的概率发生核阻塞;我们进一步将该定义扩展到更复杂的采样分布。沿此思路,当估值必须以 PAC 学习方式从样本中学习时,我们给出了正面与负面结果,说明哪些分布允许以任意高的置信度高效计算 ε-分数核稳定的结果。
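To make the probabilistic reading of the definition concrete, here is a small self-contained Python sketch that Monte-Carlo-estimates the probability that a uniformly sampled coalition core-blocks a given partition; the partition is ε-fractionally core-stable exactly when this probability stays below ε. The toy valuation (average weight assigned to the other members) and the strict-preference blocking test are illustrative assumptions, not the paper's algorithms.

```python
import random

def blocks(S, partition, value):
    """S core-blocks the partition if every member strictly prefers S
    to the coalition it currently belongs to (a simplifying assumption)."""
    current = {i: C for C in partition for i in C}
    return all(value(i, S) > value(i, current[i]) for i in S)

def est_blocking_prob(agents, partition, value, n_samples=20000, seed=0):
    """Monte-Carlo estimate of Pr[uniformly sampled coalition core-blocks]:
    the partition is eps-fractionally core-stable iff this stays below eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        S = frozenset(i for i in agents if rng.random() < 0.5)  # uniform over subsets
        if S and blocks(S, partition, value):
            hits += 1
    return hits / n_samples

# Tiny fractional-HG-style example: agent i values coalition S by the
# average weight it assigns to the other members (weights are made up).
agents = range(5)
w = {(i, j): ((i * 7 + j * 3) % 5) - 2 for i in agents for j in agents if i != j}
value = lambda i, S: sum(w[i, j] for j in S if j != i) / len(S)
partition = [frozenset({0, 1, 2}), frozenset({3, 4})]
print(est_blocking_prob(agents, partition, value))
```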

Introducing NCL-SM: A Fully Annotated Dataset of Images from Human Skeletal Muscle Biopsies

  • paper_url: http://arxiv.org/abs/2311.11099
  • repo_url: https://github.com/atifkhanncl/ncl-sm
  • paper_authors: Atif Khan, Conor Lawless, Amy Vincent, Charlotte Warren, Valeria Di Leo, Tiago Gomes, A. Stephen McGough
  • for: 这篇论文旨在提供一个高质量的生物成像数据集,用于开发对骨骼肌(SM)组织图像进行自动化、精准、可重复分析的方法。
  • methods: 论文采集了来自健康对照者与经基因诊断患有肌肉病变患者的 46 份人体组织切片的显微图像,对其中的肌纤维进行了人工分割与高质量审核,并标注了剔除低质量肌纤维和低质量区域的原因。
  • results: 论文发布了全新的高质量生物成像数据集 NCL-SM,其中包含超过 50,000 个手动分割的肌纤维(myofibres),且每个肌纤维都经过质量检查和标注,可直接用于开发自动、精准、可重复的 SM 组织图像分析流程。
    Abstract Single cell analysis of skeletal muscle (SM) tissue is a fundamental tool for understanding many neuromuscular disorders. For this analysis to be reliable and reproducible, identification of individual fibres within microscopy images (segmentation) of SM tissue should be precise. There is currently no tool or pipeline that makes automatic and precise segmentation and curation of images of SM tissue cross-sections possible. Biomedical scientists in this field rely on custom tools and general machine learning (ML) models, both followed by labour intensive and subjective manual interventions to get the segmentation right. We believe that automated, precise, reproducible segmentation is possible by training ML models. However, there are currently no good quality, publicly available annotated imaging datasets available for ML model training. In this paper we release NCL-SM: a high quality bioimaging dataset of 46 human tissue sections from healthy control subjects and from patients with genetically diagnosed muscle pathology. These images include $>$ 50k manually segmented muscle fibres (myofibres). In addition we also curated high quality myofibres and annotated reasons for rejecting low quality myofibres and regions in SM tissue images, making this data completely ready for downstream analysis. This, we believe, will pave the way for development of a fully automatic pipeline that identifies individual myofibres within images of tissue sections and, in particular, also classifies individual myofibres that are fit for further analysis.
    摘要 骨骼肌(SM)组织的单细胞分析是理解许多神经肌肉疾病的基本工具。为使该分析可靠且可重复,对 SM 组织显微图像中单个肌纤维的识别(分割)必须精准。目前尚无任何工具或流程能够对 SM 组织横截面图像进行自动且精准的分割与整理。该领域的生物医学研究者依赖于自定义工具和通用机器学习(ML)模型,随后还需进行劳动密集且主观的人工干预,才能得到正确的分割结果。我们认为,通过训练 ML 模型可以实现自动、精准、可重复的分割;然而,目前缺乏高质量、公开可用的标注影像数据集用于 ML 模型训练。在本文中,我们发布了 NCL-SM:一个高质量生物成像数据集,包含来自健康对照者与经基因诊断患有肌肉病变患者的 46 份人体组织切片图像。这些图像包含超过 50,000 个手动分割的肌纤维(myofibres)。此外,我们还整理了高质量肌纤维,并标注了剔除低质量肌纤维和低质量区域的原因,使这些数据完全可用于下游分析。我们相信,这将为开发全自动流程铺平道路,使其能够在组织切片图像中识别单个肌纤维,并进一步分类出适合后续分析的肌纤维。

Radiology Report Generation Using Transformers Conditioned with Non-imaging Data

  • paper_url: http://arxiv.org/abs/2311.11097
  • repo_url: None
  • paper_authors: Nurbanu Aksoy, Nishant Ravikumar, Alejandro F Frangi
  • for: 该研究旨在提高医学影像解读的效率,利用多模态数据增强放射学报告生成。
  • methods: 研究提出了一种新颖的多模态 Transformer 网络,将胸部 X 光影像与对应患者的人口统计信息相结合,生成患者特定的放射学报告(一个简化的示意代码见本条目末尾)。
  • results: 评估结果表明,与仅使用胸部 X 光影像的基线网络相比,引入患者人口统计信息能够提升所生成放射学报告的质量。
    Abstract Medical image interpretation is central to most clinical applications such as disease diagnosis, treatment planning, and prognostication. In clinical practice, radiologists examine medical images and manually compile their findings into reports, which can be a time-consuming process. Automated approaches to radiology report generation, therefore, can reduce radiologist workload and improve efficiency in the clinical pathway. While recent deep-learning approaches for automated report generation from medical images have seen some success, most studies have relied on image-derived features alone, ignoring non-imaging patient data. Although a few studies have included the word-level contexts along with the image, the use of patient demographics is still unexplored. This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information, to synthesise patient-specific radiology reports. The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information, to synthesise full-text radiology reports. Data from two public databases were used to train and evaluate the proposed approach. CXRs and reports were extracted from the MIMIC-CXR database and combined with corresponding patients' data MIMIC-IV. Based on the evaluation metrics used including patient demographic information was found to improve the quality of reports generated using the proposed approach, relative to a baseline network trained using CXRs alone. The proposed approach shows potential for enhancing radiology report generation by leveraging rich patient metadata and combining semantic text embeddings derived thereof, with medical image-derived visual features.
    摘要 医学影像解读是疾病诊断、治疗规划和预后评估等大多数临床应用的核心。在临床实践中,放射科医生需要人工检查医学影像并将发现整理成报告,这一过程十分耗时。因此,自动化的放射学报告生成方法可以减轻放射科医生的工作负担,提高临床流程的效率。尽管近期基于深度学习的医学影像报告自动生成方法取得了一定成功,但大多数研究仅依赖图像特征,忽略了非影像的患者数据;少数研究在图像之外引入了词级上下文,但患者人口统计信息的利用仍是未探索的领域。本文提出了一种新颖的多模态 Transformer 网络,将胸部 X 光(CXR)影像与对应患者的人口统计信息相结合,以合成患者特定的放射学报告。该网络使用卷积神经网络从 CXR 影像中提取视觉特征,并通过基于 Transformer 的编码器-解码器网络,将视觉特征与患者人口统计信息的语义文本嵌入相结合,生成全文放射学报告。训练和评估使用了两个公共数据库:从 MIMIC-CXR 数据库中提取 CXR 影像与报告,并与 MIMIC-IV 中对应患者的数据相结合。评估指标显示,与仅使用 CXR 影像训练的基线网络相比,引入患者人口统计信息能够提升所生成报告的质量。该方法表明,利用丰富的患者元数据,将由此得到的语义文本嵌入与医学影像的视觉特征相结合,有望增强放射学报告生成。
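The paper's exact architecture is not given in this digest, so the following PyTorch sketch only illustrates the general idea: visual tokens from a CNN backbone and embedded demographic tokens are concatenated into one memory that a small transformer decoder attends over while generating report tokens. The backbone choice, shared embedding, vocabulary size, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DemogConditionedReportGen(nn.Module):
    """CXR visual tokens + embedded demographic tokens -> transformer decoder."""
    def __init__(self, vocab_size=8000, d_model=256):
        super().__init__()
        cnn = resnet18(weights=None)                                # recent torchvision API
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])   # (B,512,7,7) feature map
        self.vis_proj = nn.Linear(512, d_model)
        self.tok_emb = nn.Embedding(vocab_size, d_model)            # shared for demographics + report
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image, demog_ids, report_ids):
        v = self.backbone(image).flatten(2).transpose(1, 2)   # (B,49,512) visual tokens
        v = self.vis_proj(v)
        d = self.tok_emb(demog_ids)                           # (B,Ld,d) demographic tokens
        memory = torch.cat([v, d], dim=1)                     # condition on both modalities
        tgt = self.tok_emb(report_ids)
        # causal mask (static helper in recent PyTorch versions)
        mask = nn.Transformer.generate_square_subsequent_mask(report_ids.size(1))
        return self.lm_head(self.decoder(tgt, memory, tgt_mask=mask))

model = DemogConditionedReportGen()
logits = model(torch.randn(2, 3, 224, 224),
               torch.randint(0, 8000, (2, 12)),    # e.g. "age 63 male ..." token ids
               torch.randint(0, 8000, (2, 40)))
print(logits.shape)                                 # (2, 40, 8000)
```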

Deep Tensor Network

  • paper_url: http://arxiv.org/abs/2311.11091
  • repo_url: https://github.com/carpedm20/DCGAN-tensorflow
  • paper_authors: Yifan Zhang
  • for: 本文探讨张量范畴的基本原理,借助张量积的泛性质,为深度网络架构开辟新的方法论。
  • methods: 本文的主要贡献是提出了张量注意力与张量交互机制,这是一种利用张量范畴提升深度网络计算效率和表达能力的新方法,甚至可以推广到量子领域(下方附有一个张量交互的示意代码)。
  • results: 本文表明,借助张量注意力与张量交互机制,可以增强深度网络的计算效率和表达能力,并可推广到量子领域。
    Abstract In this paper, we delve into the foundational principles of tensor categories, harnessing the universal property of the tensor product to pioneer novel methodologies in deep network architectures. Our primary contribution is the introduction of the Tensor Attention and Tensor Interaction Mechanism, a groundbreaking approach that leverages the tensor category to enhance the computational efficiency and the expressiveness of deep networks, and can even be generalized into the quantum realm.
    摘要 在本文中,我们深入探讨张量范畴的基本原理,借助张量积的泛性质,为深度网络架构开辟新的方法论。我们的主要贡献是提出了张量注意力与张量交互机制,这是一种利用张量范畴提升深度网络计算效率和表达能力的开创性方法,甚至可以推广到量子领域。
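The abstract does not define the mechanism concretely, so the sketch below is only one plausible, heavily simplified reading: a third-order tensor scores all token pairs (a bilinear "tensor interaction"), and one slice of that score is reused as attention logits. Everything here, including the shapes and softmax scaling, is an assumption rather than the paper's construction.

```python
import torch

def tensor_interaction(x, y, W):
    """Bilinear 'tensor interaction' between two token sets via a 3rd-order tensor.
    x: (B, n, d), y: (B, m, d), W: (d, d, k)
    out[b, n, m, k] = sum_{p, q} x[b, n, p] * W[p, q, k] * y[b, m, q]
    """
    return torch.einsum("bnp,pqk,bmq->bnmk", x, W, y)

def tensor_attention(x, y, W):
    """Use one slice of the interaction tensor as attention logits (k = 1)."""
    logits = tensor_interaction(x, y, W).squeeze(-1)          # (B, n, m)
    attn = torch.softmax(logits / x.size(-1) ** 0.5, dim=-1)  # row-normalize
    return attn @ y                                           # (B, n, d)

B, n, m, d = 2, 5, 7, 16
out = tensor_attention(torch.randn(B, n, d), torch.randn(B, m, d),
                       torch.randn(d, d, 1))
print(out.shape)  # torch.Size([2, 5, 16])
```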

Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation

  • paper_url: http://arxiv.org/abs/2311.11090
  • repo_url: None
  • paper_authors: Nurbanu Aksoy, Serge Sharoff, Selcuk Baser, Nishant Ravikumar, Alejandro F Frangi
  • for: 该论文旨在自动生成描述医学影像所见的放射学报告。
  • methods: 论文提出了一种多模态深度学习网络,将结构化患者数据(如生命体征和症状)与非结构化临床记录相结合,生成胸部 X 光报告。作者引入了一种条件化交叉多头注意力模块来融合这些异构数据模态,以弥合视觉与文本数据之间的语义鸿沟(该模块的示意代码见本条目末尾)。
  • results: 实验表明,与仅依赖图像数据相比,结合多模态数据可带来显著提升;该模型在 ROUGE-L 指标上超过了文献中相关的最先进模型。此外,作者还在词重叠指标之外引入了人工评估与临床语义相似度度量,以深化定量分析。
    Abstract Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images. Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists. In this paper, we present a novel multi-modal deep neural network framework for generating chest X-rays reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.We introduce a conditioned cross-multi-head attention module to fuse these heterogeneous data modalities, bridging the semantic gap between visual and textual data. Experiments demonstrate substantial improvements from using additional modalities compared to relying on images alone. Notably, our model achieves the highest reported performance on the ROUGE-L metric compared to relevant state-of-the-art models in the literature. Furthermore, we employed both human evaluation and clinical semantic similarity measurement alongside word-overlap metrics to improve the depth of quantitative analysis. A human evaluation, conducted by a board-certified radiologist, confirms the model's accuracy in identifying high-level findings, however, it also highlights that more improvement is needed to capture nuanced details and clinical context.
    摘要 影像到文本的放射学报告生成旨在自动生成描述医学影像所见的报告。现有方法大多仅关注图像数据,忽略了放射科医生可获取的其他患者信息。本文提出了一种新颖的多模态深度神经网络框架,将结构化患者数据(如生命体征和症状)与非结构化临床记录相结合,用于生成胸部 X 光报告。我们引入了一种条件化交叉多头注意力模块来融合这些异构数据模态,弥合视觉与文本数据之间的语义鸿沟。实验表明,相比仅使用图像,引入额外模态可带来显著提升;特别地,我们的模型在 ROUGE-L 指标上取得了超过文献中相关最先进模型的最高表现。此外,我们在词重叠指标之外还采用了人工评估与临床语义相似度度量,以深化定量分析。由一位获得委员会认证的放射科医生进行的人工评估证实了模型在识别高层次所见方面的准确性,但也指出在捕捉细微细节和临床上下文方面仍需进一步改进。
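Below is a minimal PyTorch sketch of what such a conditioned cross-attention fusion could look like: image tokens act as queries over embedded non-imaging tokens, with a residual connection preserving the visual signal. It is a generic cross-attention block under assumed dimensions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ConditionedCrossAttention(nn.Module):
    """Image tokens attend over non-imaging tokens (vitals/symptoms/notes),
    conditioning the visual representation on clinical context."""
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, img_tokens, ctx_tokens):
        # queries come from the image; keys/values from structured + text data
        fused, _ = self.attn(img_tokens, ctx_tokens, ctx_tokens)
        return self.norm(img_tokens + fused)    # residual keeps the visual signal

img = torch.randn(2, 49, 256)      # e.g. a 7x7 CNN feature map, flattened
ctx = torch.randn(2, 20, 256)      # embedded vitals, symptoms, note tokens
print(ConditionedCrossAttention()(img, ctx).shape)   # (2, 49, 256)
```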

Combining EEG and NLP Features for Predicting Students’ Lecture Comprehension using Ensemble Classification

  • paper_url: http://arxiv.org/abs/2311.11088
  • repo_url: None
  • paper_authors: Phantharach Natnithikarat, Theerawit Wilaiprasitporn, Supavit Kongwudhikunakorn
  • for: 该研究旨在度量学生在课堂讲授中的理解水平,并提出一个分类框架,预测学生在两个任务中的表现:听讲后的困惑程度与课后测验的答题正确性。
  • methods: 该研究结合脑电图(EEG)与自然语言处理(NLP)技术:EEG 用于记录学生的脑电活动,NLP 用于句子级句法分析。二者提取的特征被融合为集成特征,并采用集成堆叠(ensemble stacking)分类方法进行预测(一个最小的堆叠分类示例见本条目末尾)。
  • results: 实验结果显示,该分类框架的准确率高于基线方法,预测困惑与答题正确性的 F1 分数分别最高达 0.65 和 0.78。此外,研究还将学生自评的困惑评分作为一项集成特征,进一步提升了分类性能。
    Abstract Electroencephalography (EEG) and Natural Language Processing (NLP) can be applied for education to measure students' comprehension in classroom lectures; currently, the two measures have been used separately. In this work, we propose a classification framework for predicting students' lecture comprehension in two tasks: (i) students' confusion after listening to the simulated lecture and (ii) the correctness of students' responses to the post-lecture assessment. The proposed framework includes EEG and NLP feature extraction, processing, and classification. EEG and NLP features are extracted to construct integrated features obtained from recorded EEG signals and sentence-level syntactic analysis, which provide information about specific biomarkers and sentence structures. An ensemble stacking classification method -- a combination of multiple individual models that produces an enhanced predictive model -- is studied to learn from the features to make predictions accurately. Furthermore, we also utilized subjective confusion ratings as another integrated feature to enhance classification performance. By doing so, experiment results show that this framework performs better than the baselines, which achieved F1 up to 0.65 for predicting confusion and 0.78 for predicting correctness, highlighting that utilizing this has helped improve the classification performance.
    摘要 脑电图(EEG)和自然语言处理(NLP)可应用于教育领域,以度量学生在课堂讲授中的理解水平;目前这两种度量通常被分开使用。在本工作中,我们提出一个分类框架,用于预测学生在两个任务中的课堂理解水平:(i)学生听完模拟课程后的困惑程度,以及(ii)学生在课后测验中的答题正确性。所提框架包括 EEG 与 NLP 特征的提取、处理和分类。EEG 与 NLP 特征分别来自记录的脑电信号和句子级句法分析,提供了特定生物标志与句子结构的信息,并被构造为集成特征。我们研究了集成堆叠分类方法——组合多个单独模型以得到增强的预测模型——从这些特征中学习并做出准确预测。此外,我们还将学生主观的困惑评分作为另一项集成特征,以提升分类性能。实验结果显示,该框架优于基线方法,预测困惑与答题正确性的 F1 分数分别最高达 0.65 和 0.78,表明上述做法有助于提升分类性能。
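As an illustration of the ensemble-stacking idea (not the paper's exact pipeline), here is a runnable scikit-learn sketch that stacks two base classifiers over fake fused EEG + NLP features, with a self-reported confusion rating appended as one extra integrated feature; all feature dimensions and model choices are assumptions.

```python
import numpy as np
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Fake fused features: EEG band powers + sentence-level NLP features + a
# subjective confusion rating appended as one extra integrated feature.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(200, 10)),          # EEG features
               rng.normal(size=(200, 5)),           # NLP features
               rng.integers(1, 6, (200, 1))])       # self-reported confusion (1-5)
y = rng.integers(0, 2, 200)                         # confused vs. not confused

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),           # meta-learner over base outputs
    cv=5,
)
stack.fit(X[:150], y[:150])
print("held-out accuracy:", stack.score(X[150:], y[150:]))
```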

ECLM: Efficient Edge-Cloud Collaborative Learning with Continuous Environment Adaptation

  • paper_url: http://arxiv.org/abs/2311.11083
  • repo_url: None
  • paper_authors: Yan Zhuang, Zhenzhe Zheng, Yunfeng Shao, Bingshuai Li, Fan Wu, Guihai Chen
  • For: The paper develops an edge-cloud collaborative learning framework for rapid model adaptation in dynamic edge environments.
  • Methods: The paper proposes a novel block-level model decomposition design that decomposes the original large cloud model into multiple combinable modules, plus an end-to-end learning framework that incorporates the modular model design into an efficient model adaptation pipeline (a toy sketch of the module composition follows the abstract below).
  • Results: By efficiently collaborating the edge and cloud models, ECLM achieves significant improvements in model performance (an 18.89% accuracy increase) and resource efficiency (a 7.12x communication cost reduction) when adapting models to dynamic edge environments.
    Abstract Pervasive mobile AI applications primarily employ one of the two learning paradigms: cloud-based learning (with powerful large models) or on-device learning (with lightweight small models). Despite their own advantages, neither paradigm can effectively handle dynamic edge environments with frequent data distribution shifts and on-device resource fluctuations, inevitably suffering from performance degradation. In this paper, we propose ECLM, an edge-cloud collaborative learning framework for rapid model adaptation for dynamic edge environments. We first propose a novel block-level model decomposition design to decompose the original large cloud model into multiple combinable modules. By flexibly combining a subset of the modules, this design enables the derivation of compact, task-specific sub-models for heterogeneous edge devices from the large cloud model, and the seamless integration of new knowledge learned on these devices into the cloud model periodically. As such, ECLM ensures that the cloud model always provides up-to-date sub-models for edge devices. We further propose an end-to-end learning framework that incorporates the modular model design into an efficient model adaptation pipeline including an offline on-cloud model prototyping and training stage, and an online edge-cloud collaborative adaptation stage. Extensive experiments over various datasets demonstrate that ECLM significantly improves model performance (e.g., 18.89% accuracy increase) and resource efficiency (e.g., 7.12x communication cost reduction) in adapting models to dynamic edge environments by efficiently collaborating the edge and the cloud models.
    摘要 普适移动 AI 应用主要采用两种学习范式之一:基于云的学习(使用强大的大模型)或端上学习(使用轻量级小模型)。尽管各有优势,但这两种范式都无法有效应对数据分布频繁变化、设备资源波动的动态边缘环境,不可避免地出现性能下降。本文提出 ECLM,一种面向动态边缘环境快速模型适配的边云协同学习框架。我们首先提出一种新颖的块级模型分解设计,将原始的大型云端模型分解为多个可组合的模块。通过灵活组合模块子集,该设计能够从大型云端模型中为异构边缘设备派生出紧凑的、面向特定任务的子模型,并周期性地将这些设备上学到的新知识无缝整合回云端模型,从而保证云端模型始终能为边缘设备提供最新的子模型。我们进一步提出一个端到端学习框架,将模块化模型设计融入高效的模型适配流程,包括离线的云端模型原型设计与训练阶段,以及在线的边云协同适配阶段。在多个数据集上的大量实验表明,通过边云模型的高效协同,ECLM 在将模型适配到动态边缘环境时显著提升了模型性能(如准确率提升 18.89%)和资源效率(如通信开销降低 7.12 倍)。
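The block-level decomposition is the framework's core idea; the toy PyTorch sketch below shows one way to express it: a dictionary of interchangeable blocks from which a compact sub-model is composed per device, with an edge-tuned block's weights copied back into the cloud model. The block shapes and the selection policy are made-up stand-ins for ECLM's actual mechanisms.

```python
import torch
import torch.nn as nn

class ModularNet(nn.Module):
    """Cloud model decomposed into combinable blocks; an edge device receives
    only the subset of blocks selected for its task and resource budget."""
    def __init__(self, blocks: nn.ModuleDict, head: nn.Module):
        super().__init__()
        self.blocks, self.head = blocks, head

    def forward(self, x, selected):
        for name in selected:            # compose only the chosen modules
            x = self.blocks[name](x)
        return self.head(x)

blocks = nn.ModuleDict({f"b{i}": nn.Sequential(nn.Linear(32, 32), nn.ReLU())
                        for i in range(6)})
cloud = ModularNet(blocks, head=nn.Linear(32, 10))

x = torch.randn(4, 32)
full = cloud(x, selected=["b0", "b1", "b2", "b3", "b4", "b5"])  # full cloud model
edge = cloud(x, selected=["b0", "b3"])   # compact sub-model for a weak device
print(full.shape, edge.shape)

# Periodic knowledge integration: copy an edge-tuned block back to the cloud.
edge_tuned_b3 = blocks["b3"].state_dict()        # stand-in for weights trained on-device
cloud.blocks["b3"].load_state_dict(edge_tuned_b3)
```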

DSCom: A Data-Driven Self-Adaptive Community-Based Framework for Influence Maximization in Social Networks

  • paper_url: http://arxiv.org/abs/2311.11080
  • repo_url: None
  • paper_authors: Yuxin Zuo, Haojia Sun, Yongyi Hu, Jianxiong Guo, Xiaofeng Gao
    for: This paper aims to address the data-driven version of influence maximization, where the diffusion model is not given and needs to be inferred from the history cascades.methods: The paper proposes a machine learning-based framework called DSCom, which leverages node attributes to estimate the closeness between connected nodes and overcome the influence overlap problem.results: The proposed algorithm is evaluated through empirical experiments with parameterized diffusion models based on real-world social networks, showing its efficiency and effectiveness.Here’s the Chinese version:for: 这篇论文主要解决了数据驱动版本的影响最大化问题,其中diffusion模型未提供,需要从历史扩散中推断。methods: 该论文提出了基于机器学习的DSCom框架,利用节点特征来估计连接节点的相互关系,并通过自similarity matrix来解决因果重叠问题。results: 该算法经验测试了基于实际社交网络的参数化扩散模型,证明其效率和有效性。
    Abstract Influence maximization aims to find a subset of seeds that maximize the influence spread under a given budget. In this paper, we mainly address the data-driven version of this problem, where the diffusion model is not given but needs to be inferred from the history cascades. Several previous works have addressed this topic in a statistical way and provided efficient algorithms with theoretical guarantee. However, in their settings, though the diffusion parameters are inferred, they still need users to preset the diffusion model, which can be an intractable problem in real-world practices. In this paper, we reformulate the problem on the attributed network and leverage the node attributes to estimate the closeness between the connected nodes. Specifically, we propose a machine learning-based framework, named DSCom, to address this problem in a heuristic way. Under this framework, we first infer the users' relationship from the diffusion dataset through attention mechanism and then leverage spectral clustering to overcome the influence overlap problem in the lack of exact diffusion formula. Compared to the previous theoretical works, we carefully designed empirical experiments with parameterized diffusion models based on real-world social networks, which prove the efficiency and effectiveness of our algorithm.
    摘要 影响力最大化的目标是在给定预算下找到使影响传播最大化的种子集合。在本文中,我们主要研究该问题的数据驱动版本:扩散模型并未给定,而需要从历史级联中推断。此前已有若干工作从统计角度研究这一课题,提出了带有理论保证的高效算法;然而在这些设定中,虽然扩散参数可以被推断,用户仍需预先设定扩散模型,这在现实应用中往往难以做到。在本文中,我们将该问题重新表述在属性网络上,并利用节点属性来估计相连节点之间的紧密程度。具体而言,我们提出了一个基于机器学习的框架 DSCom,以启发式方式求解该问题:首先通过注意力机制从扩散数据集中推断用户之间的关系,随后利用谱聚类在缺乏精确扩散公式的情况下克服影响力重叠问题。与此前的理论工作相比,我们基于真实社交网络精心设计了带参数化扩散模型的实证实验,验证了算法的效率与有效性。
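The digest only names the ingredients (learned closeness, spectral clustering, community-wise seeding), so the following is a runnable but heavily simplified Python sketch: spectral clustering over a "closeness"-weighted graph, then proportional per-community seed picking by weighted degree as a stand-in for the paper's influence estimation. The random edge weights and the degree heuristic are assumptions.

```python
import numpy as np
import networkx as nx
from sklearn.cluster import SpectralClustering

def dscom_style_seeds(G, k_communities=3, budget=5, weight="closeness"):
    """Cluster the learned closeness graph, then pick seeds per community by
    weighted degree -- a toy stand-in for the paper's influence estimation."""
    nodes = list(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodes, weight=weight)
    labels = SpectralClustering(n_clusters=k_communities,
                                affinity="precomputed",
                                random_state=0).fit_predict(A)
    seeds = []
    for c in range(k_communities):
        members = [n for n, l in zip(nodes, labels) if l == c]
        quota = max(1, budget * len(members) // len(nodes))  # proportional budget
        members.sort(key=lambda n: G.degree(n, weight=weight), reverse=True)
        seeds += members[:quota]
    return seeds[:budget]

G = nx.karate_club_graph()
for u, v in G.edges():                 # fake attention-derived closeness scores
    G[u][v]["closeness"] = np.random.default_rng(u * 100 + v).random()
print(dscom_style_seeds(G))
```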

Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning

  • paper_url: http://arxiv.org/abs/2311.11077
  • repo_url: None
  • paper_authors: Clifton Poth, Hannah Sterz, Indraneil Paul, Sukannya Purkayastha, Leon Engländer, Timo Imhof, Ivan Vulić, Sebastian Ruder, Iryna Gurevych, Jonas Pfeiffer
  • For: 这篇论文旨在推广更高效的迁移学习方法,提供一个开源库(Adapters),实现参数高效且模块化的迁移学习。
  • Methods: 该库将 10 种多样的 adapter 方法集成到统一接口中,提供易用且灵活的配置方式,并支持通过组合块设计复杂的 adapter 配置(一个假定 API 的用法示例见本条目末尾)。
  • Results: 论文在多个 NLP 任务上将 adapter 方法与完整微调进行对比评估,证明了该库的可行性与有效性。
    Abstract We introduce Adapters, an open-source library that unifies parameter-efficient and modular transfer learning in large language models. By integrating 10 diverse adapter methods into a unified interface, Adapters offers ease of use and flexible configuration. Our library allows researchers and practitioners to leverage adapter modularity through composition blocks, enabling the design of complex adapter setups. We demonstrate the library's efficacy by evaluating its performance against full fine-tuning on various NLP tasks. Adapters provides a powerful tool for addressing the challenges of conventional fine-tuning paradigms and promoting more efficient and modular transfer learning. The library is available via https://adapterhub.ml/adapters.
    摘要 我们介绍一个名为 Adapters 的开源库,它在大型语言模型中统一了参数高效且模块化的迁移学习。我们将 10 种不同的 adapter 方法集成到一个统一接口中,使研究者和从业者能够轻松地使用并灵活地配置 adapter,并通过组合块利用 adapter 的模块化特性,设计复杂的 adapter 配置。我们在多个 NLP 任务上将其与完整微调进行对比,验证了该库的有效性。Adapters 提供了一个强大的工具,用于应对传统微调范式的挑战,促进更高效、更模块化的迁移学习。该库可在 https://adapterhub.ml/adapters 获取。
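For flavor, here is a short usage sketch against the Adapters library; the method names (`AutoAdapterModel`, `add_adapter`, `train_adapter`) and the `"seq_bn"` config string follow the AdapterHub documentation at the time of writing, but exact names may differ across versions, so treat this as an assumption rather than the library's definitive API.

```python
# pip install adapters  (the library described in the paper; API names below
# follow the AdapterHub docs and may differ by version)
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# Add a bottleneck adapter and a matching classification head for one task.
model.add_adapter("sst2", config="seq_bn")
model.add_classification_head("sst2", num_labels=2)

# Freeze the backbone and train only the adapter's parameters.
model.train_adapter("sst2")
print(sum(p.numel() for p in model.parameters() if p.requires_grad),
      "trainable parameters (a small fraction of BERT's ~110M)")
```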

Community-Aware Efficient Graph Contrastive Learning via Personalized Self-Training

  • paper_url: http://arxiv.org/abs/2311.11073
  • repo_url: None
  • paper_authors: Yuecheng Li, Yanming Hu, Lele Fu, Chuan Chen, Lei Yang, Zibin Zheng
  • for: The paper is written for community detection tasks in graph-structured data, and it proposes a novel framework called Community-aware Efficient Graph Contrastive Learning (CEGCL) to jointly learn community partition and node representations in an end-to-end manner.
  • methods: The proposed CEGCL framework uses a personalized self-training (PeST) strategy for unsupervised scenarios, which enables the model to capture precise community-level personalized information in a graph. Additionally, the aligned graph clustering (AlGC) is employed to obtain the community partition.
  • results: The paper demonstrates the effectiveness of the proposed CEGCL model for community detection both theoretically and experimentally. Extensive experimental results show that CEGCL exhibits state-of-the-art performance on three benchmark datasets with different scales.
    Abstract In recent years, graph contrastive learning (GCL) has emerged as one of the optimal solutions for various supervised tasks at the node level. However, for unsupervised and structure-related tasks such as community detection, current GCL algorithms face difficulties in acquiring the necessary community-level information, resulting in poor performance. In addition, general contrastive learning algorithms improve the performance of downstream tasks by increasing the number of negative samples, which leads to severe class collision and unfairness of community detection. To address above issues, we propose a novel Community-aware Efficient Graph Contrastive Learning Framework (CEGCL) to jointly learn community partition and node representations in an end-to-end manner. Specifically, we first design a personalized self-training (PeST) strategy for unsupervised scenarios, which enables our model to capture precise community-level personalized information in a graph. With the benefit of the PeST, we alleviate class collision and unfairness without sacrificing the overall model performance. Furthermore, the aligned graph clustering (AlGC) is employed to obtain the community partition. In this module, we align the clustering space of our downstream task with that in PeST to achieve more consistent node embeddings. Finally, we demonstrate the effectiveness of our model for community detection both theoretically and experimentally. Extensive experimental results also show that our CEGCL exhibits state-of-the-art performance on three benchmark datasets with different scales.
    摘要 近年来,图对比学习(GCL)已成为节点级各类监督任务的最优解决方案之一。然而,对于社区检测等无监督且与结构相关的任务,现有 GCL 算法难以获取必要的社区级信息,导致性能不佳。此外,一般的对比学习算法通过增加负样本数量来提升下游任务性能,这会导致严重的类别冲突和社区检测的不公平性。针对上述问题,我们提出了一种新颖的社区感知高效图对比学习框架(CEGCL),以端到端方式联合学习社区划分与节点表示。具体而言,我们首先为无监督场景设计了个性化自训练(PeST)策略,使模型能够捕获图中精确的社区级个性化信息。借助 PeST,我们在不牺牲整体模型性能的前提下缓解了类别冲突与不公平性。此外,我们采用对齐图聚类(AlGC)来获得社区划分;在该模块中,我们将下游任务的聚类空间与 PeST 中的聚类空间对齐,以获得更一致的节点嵌入。最后,我们从理论和实验两方面验证了模型在社区检测上的有效性。大量实验结果还表明,CEGCL 在三个不同规模的基准数据集上均取得了最先进的性能。
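CEGCL builds on a standard graph-contrastive objective before adding PeST and AlGC; the snippet below is only a minimal PyTorch sketch of that underlying InfoNCE loss between two augmented views of the same nodes, not the paper's full method.

```python
import torch
import torch.nn.functional as F

def infonce(z1, z2, tau=0.5):
    """Standard InfoNCE between two augmented views of the same nodes; graph
    contrastive learners like CEGCL build on such objectives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau                        # (N, N) cosine similarities
    targets = torch.arange(z1.size(0))             # positives on the diagonal
    return F.cross_entropy(sim, targets)

z_view1, z_view2 = torch.randn(64, 128), torch.randn(64, 128)
print(infonce(z_view1, z_view2))
```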

SBTRec- A Transformer Framework for Personalized Tour Recommendation Problem with Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2311.11071
  • repo_url: None
  • paper_authors: Ngai Lam Ho, Roy Ka-Wei Lee, Kwan Hui Lim
  • for: 提供个性化的景点(POI)行程推荐,帮助旅行者规划旅游路线并探索当地热门景点。
  • methods: 提出基于 BERT 的 POI 序列推荐算法 SBTRec,利用用户签到和上传的照片理解 POI 访问与距离之间的关系,并通过情感分析理解用户对不同 POI 的偏好与满意度,从而提高推荐准确性。
  • results: 与基线算法相比,SBTRec 实现了 61.45% 的平均 F1 分数,表明其在序列预测任务中表现出色。
    摘要 旅行者在前往陌生城市度假时,通常依靠旅游指南、旅游网站或推荐系统来规划日程并探索热门兴趣点(POIs)。然而,这些方法在时间可行性、地点和用户偏好方面可能缺乏优化。在本文中,我们提出 SBTRec 算法:一种基于 BERT、结合情感分析的轨迹推荐方法,用于推荐个性化的 POI 行程序列。本工作的主要贡献包括分析用户签到和上传的照片,以理解 POI 访问与距离之间的关系。SBTRec 引入情感分析,通过用户对不同 POI 的评论与评价理解其偏好和满意度,从而提高推荐准确性。我们在 8 个城市的数据集上将所提算法与其他序列预测方法进行比较,结果显示 SBTRec 取得 61.45% 的平均 F1 分数,优于基线算法。论文还讨论了 SBTRec 算法的灵活性:它无需修改即可适应不同场景和城市,并可通过引入额外信息进行扩展,以获得更可靠的预测。总体而言,SBTRec 提供了个性化且相关的 POI 推荐,提升了旅行者的整体出行体验。未来工作包括为用户微调个性化嵌入,并结合用户对 POI 的评论进行评估,以进一步提高预测准确性。

AIMS-EREA – A framework for AI-accelerated Innovation of Materials for Sustainability – for Environmental Remediation and Energy Applications

  • paper_url: http://arxiv.org/abs/2311.11060
  • repo_url: None
  • paper_authors: Sudarson Roy Pratihar, Deepesh Pai, Manaswita Nag
  • For: 可以用于综合考虑多种可能性和结构,快速找到适合的绿色材料,以满足可持续发展的能源和环境修复应用。* Methods: 基于密度函数理论(DFT)和其他理论,以及人工智能技术,可以快速和效率地对可能性进行筛选和预测,从而降低实验室synthesis和分析过程中的努力和成本。* Results: 通过 combing best of breed of Material Science theory with the power of Generative AI, 可以快速和高效地找到适合的绿色材料,并且可以避免生产危险副产品的可能性。
    Abstract Many environmental remediation and energy applications (conversion and storage) for sustainability need design and development of green novel materials. Discovery processes of such novel materials are time taking and cumbersome due to large number of possible combinations and permutations of materials structures. Often theoretical studies based on Density Functional Theory (DFT) and other theories, coupled with Simulations are conducted to narrow down sample space of candidate materials, before conducting laboratory-based synthesis and analytical process. With the emergence of artificial intelligence (AI), AI techniques are being tried in this process too to ease out simulation time and cost. However tremendous values of previously published research from various parts of the world are still left as labor-intensive manual effort and discretion of individual researcher and prone to human omissions. AIMS-EREA is our novel framework to blend best of breed of Material Science theory with power of Generative AI to give best impact and smooth and quickest discovery of material for sustainability. This also helps to eliminate the possibility of production of hazardous residues and bye-products of the reactions. AIMS-EREA uses all available resources -- Predictive and Analytical AI on large collection of chemical databases along with automated intelligent assimilation of deep materials knowledge from previously published research works through Generative AI. We demonstrate use of our own novel framework with an example, how this framework can be successfully applied to achieve desired success in development of thermoelectric material for waste heat conversion.
    摘要 多种环境恢复和能源应用(转化和存储)需要设计和开发绿色新材料。发现这些新材料的过程是时间consuming和复杂,因为有很多可能的组合和排序结构。经常通过密度函数理论(DFT)和其他理论,加上计算机模拟,来缩小实验室合成和分析过程中的样本空间。随着人工智能(AI)的出现,AI技术也在这个过程中使用,以减少计算时间和成本。然而,大量前期发表的研究成果仍然受到劳动密集和个人研究者的主观性的影响,容易出现人类缺失。我们的AIMS-EREA框架通过融合材料科学理论和生成AI的力量,为可持续发展提供了最佳影响和最快速的材料发现。此外,它还可以消除生产过程中可能产生的危险副产品。AIMS-EREA利用了所有可用资源——预测和分析AI在大量化学数据库中,以及自动智能吸收深入材料知识从前期发表的研究作品中。我们示例如如何使用我们的框架成功应用于废热电转换材料的开发。

Designing Interpretable ML System to Enhance Trustworthy AI in Healthcare: A Systematic Review of the Last Decade to A Proposed Robust Framework

  • paper_url: http://arxiv.org/abs/2311.11055
  • repo_url: None
  • paper_authors: Elham Nasarian, Roohallah Alizadehsani, U. Rajendra Acharyac, d Kwok-Leung Tsui
  • for: This paper reviews the processes and challenges of interpretable machine learning (IML) and explainable AI (XAI) in healthcare, with a focus on quality control and the importance of robust interpretability.
  • methods: The paper follows a systematic literature review approach (PRISMA and PICO), searching the PubMed, Scopus, and Web of Science databases with specific strings to identify relevant studies. The IML process is classified into three stages: data pre-processing interpretability, interpretable modeling, and post-processing interpretability.
  • results: The paper establishes the importance of robust interpretability in healthcare through experimental results, offers insights for creating communicable clinician-AI tools, and introduces a step-by-step roadmap for implementing XAI in clinical applications, addressing existing gaps and acknowledging XAI model limitations.
    Abstract AI-based medical technologies, including wearables, telemedicine, LLMs, and digital care twins, significantly impact healthcare. Ensuring AI results are accurate and interpretable is crucial, especially for clinicians. This paper reviews processes and challenges of interpretable ML (IML) and explainable AI (XAI) in healthcare. Objectives include reviewing XAI processes, methods, applications, and challenges, with a focus on quality control. The IML process is classified into data pre-processing interpretability, interpretable modeling, and post-processing interpretability. The paper aims to establish the importance of robust interpretability in healthcare through experimental results, providing insights for creating communicable clinician-AI tools. Research questions, eligibility criteria, and goals were identified following PRISMA and PICO methods. PubMed, Scopus, and Web of Science were systematically searched using specific strings. The survey introduces a step-by-step roadmap for implementing XAI in clinical applications, addressing existing gaps and acknowledging XAI model limitations.
    摘要 基于 AI 的医疗技术,包括可穿戴设备、远程医疗、大型语言模型和数字护理孪生,正深刻影响医疗行业。确保 AI 结果准确且可解释至关重要,对临床医生尤其如此。本文综述了可解释机器学习(IML)与可解释人工智能(XAI)在医疗领域的流程与挑战,目标包括回顾 XAI 的流程、方法、应用与挑战,并重点关注质量控制。IML 流程被划分为数据预处理可解释性、可解释建模和后处理可解释性三个阶段。本文旨在通过实验结果确立稳健可解释性在医疗领域的重要性,为构建可沟通的临床医生-AI 工具提供见解。研究问题、纳入标准与目标依照 PRISMA 和 PICO 方法确定,并使用特定检索串对 PubMed、Scopus 和 Web of Science 进行了系统检索。该综述给出了在临床应用中实施 XAI 的分步路线图,以弥补现有差距,并指出 XAI 模型的局限性。

Orca 2: Teaching Small Language Models How to Reason

  • paper_url: http://arxiv.org/abs/2311.11045
  • repo_url: None
  • paper_authors: Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agrawal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour, Ahmed Awadallah
  • for: 本研究旨在探讨如何通过改进训练信号来提高小型语言模型(LM)的逻辑能力。
  • methods: 研究人员使用了不同的解释Trace来训练小型LM,并研究了不同的解释策略(如步骤解释、记忆然后生成、记忆然后解释生成等),以帮助模型选择最佳解释策略以适应不同任务。
  • results: Orca 2 在 15 个多样化的基准(约 100 个任务、逾 36,000 个独特提示)上的零样本表现显著优于同等规模的模型,并在考验高级推理能力的复杂任务上达到与体量大 5-10 倍的模型相当或更好的水平。
    Abstract Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs' reasoning abilities. Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks, potentially different from the one used by the larger model. For example, while larger models might provide a direct answer to a complex task, smaller models may not have the same capacity. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. We evaluate Orca 2 using a comprehensive set of 15 diverse benchmarks (corresponding to approximately 100 tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of similar size and attains performance levels similar or better to those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. We open-source Orca 2 to encourage further research on the development, evaluation, and alignment of smaller LMs.
    摘要 Orca 1 从解释轨迹等丰富信号中学习,使其在 BigBench Hard 和 AGIEval 等基准上超越传统的指令微调模型。在 Orca 2 中,我们继续探索改进的训练信号如何提升小型语言模型的推理能力。以往针对小型语言模型的训练研究通常依赖模仿学习来复制更强模型的输出;我们认为,过分强调模仿可能会限制小模型的潜力。我们试图教会小模型针对不同任务采用不同的求解策略,这些策略可能与大模型所用的不同。例如,大模型可能会对复杂任务直接给出答案,而小模型未必具备同样的能力。在 Orca 2 中,我们教授模型多种推理技巧(逐步推理、先回忆再生成、回忆-推理-生成、直接回答等);更重要的是,我们致力于帮助模型学会为每个任务选择最有效的求解策略。我们使用 15 个多样化的基准(约 100 个任务、逾 36,000 个独特提示)评估 Orca 2。结果显示,Orca 2 显著超越同等规模的模型,并在零样本设定下考验高级推理能力的复杂任务上,达到与体量大 5-10 倍的模型相当或更好的水平。我们开源了 Orca 2,以鼓励对小型语言模型的开发、评估与对齐的进一步研究。

Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment

  • paper_url: http://arxiv.org/abs/2311.11039
  • repo_url: None
  • paper_authors: Parth Rawal, Mrunal Sompura, Wolfgang Hintze
  • for: 这篇论文的目的是提出一种基于模拟数据生成的方法,用于帮助机器人在生产环境中使用人工智能。
  • methods: 论文使用的方法包括基于模拟数据的生成和组合,以帮助减少实际环境和模拟环境之间的差异。
  • results: 试验结果表明,使用这些基本方法的组合可以在生产环境中减少模拟到现实之间的差异,从而提高机器人在生产中的表现。
    Abstract Synthetic data is being used lately for training deep neural networks in computer vision applications such as object detection, object segmentation and 6D object pose estimation. Domain randomization hereby plays an important role in reducing the simulation to reality gap. However, this generalization might not be effective in specialized domains like a production environment involving complex assemblies. Either the individual parts, trained with synthetic images, are integrated in much larger assemblies making them indistinguishable from their counterparts and result in false positives or are partially occluded just enough to give rise to false negatives. Domain knowledge is vital in these cases and if conceived effectively while generating synthetic data, can show a considerable improvement in bridging the simulation to reality gap. This paper focuses on synthetic data generation procedures for parts and assemblies used in a production environment. The basic procedures for synthetic data generation and their various combinations are evaluated and compared on images captured in a production environment, where results show up to 15% improvement using combinations of basic procedures. Reducing the simulation to reality gap in this way can aid to utilize the true potential of robot assisted production using artificial intelligence.
    摘要 近年来,合成数据被用于训练计算机视觉应用(如目标检测、目标分割和 6D 物体姿态估计)中的深度神经网络,其中域随机化在缩小模拟与现实差距方面发挥着重要作用。然而,在涉及复杂装配的生产环境等专业领域中,这种泛化可能并不有效:用合成图像训练的单个零件被集成到更大的装配体中,与周围部件难以区分而产生误检,或恰好被部分遮挡而产生漏检。在这些情况下,领域知识至关重要;若在生成合成数据时有效地融入领域知识,则可显著缩小模拟与现实的差距。本文聚焦于生产环境中零件与装配体的合成数据生成流程,在生产环境采集的图像上评估并比较了各种基本生成流程及其组合,结果表明组合使用这些基本流程可带来最高 15% 的提升。以这种方式缩小模拟与现实的差距,有助于充分发挥人工智能辅助机器人生产的真正潜力。

Data Center Audio/Video Intelligence on Device (DAVID) – An Edge-AI Platform for Smart-Toys

  • paper_url: http://arxiv.org/abs/2311.11030
  • repo_url: None
  • paper_authors: Gabriel Cosache, Francisco Salgado, Cosmin Rotariu, George Sterpu, Rishabh Jain, Peter Corcoran
  • for: 这份论文主要用于介绍一种 Edge AI 平台,即 DAVID Smart-Toy 平台,该平台包含了高级低功耗数据处理的神经推理模型,并与相关的图像或音频感知器一起嵌入在设备中。
  • methods: 该平台使用了神经推理模型进行数据处理,并提供了在设备内部进行文本识别和语音生成功能。
  • results: 该平台提供语音驱动的用户界面,可借助计算机视觉传感节点观察并解读用户的动作与面部表情;同时,个人可识别信息不会越过神经推理节点,从而内建地符合数据保护法规。
    Abstract An overview is given of the DAVID Smart-Toy platform, one of the first Edge AI platform designs to incorporate advanced low-power data processing by neural inference models co-located with the relevant image or audio sensors. There is also on-board capability for in-device text-to-speech generation. Two alternative embodiments are presented: a smart Teddy-bear, and a roving dog-like robot. The platform offers a speech-driven user interface and can observe and interpret user actions and facial expressions via its computer vision sensor node. A particular benefit of this design is that no personally identifiable information passes beyond the neural inference nodes thus providing inbuilt compliance with data protection regulations.
    摘要 本文概述了 DAVID 智能玩具平台,它是最早将先进低功耗神经推理模型与相应的图像或语音传感器协同集成于设备端的 Edge AI 平台设计之一,并具备设备内文本转语音生成能力。文中给出了两种实现形态:一只智能泰迪熊和一个可移动的狗形机器人。该平台提供语音驱动的用户界面,可通过计算机视觉传感节点观察并解读用户的动作与面部表情。这一设计的一大优点在于,任何个人可识别信息都不会越过神经推理节点,从而内建地符合数据保护法规。

Geometric Data Augmentations to Mitigate Distribution Shifts in Pollen Classification from Microscopic Images

  • paper_url: http://arxiv.org/abs/2311.11029
  • repo_url: None
  • paper_authors: Nam Cao, Olga Saukh
  • for: 本研究旨在解决机器学习模型在真实应用场景中的精度下降问题,具体针对用低成本相机传感器在野外采集的花粉颗粒显微图像分类任务。
  • methods: 我们利用领域知识——几何特征对精准的花粉识别至关重要——提出了两种新颖的几何图像增强技术,以显著缩小模型在训练集与测试集之间的精度差距。特别地,我们表明 Tenengrad 与 ImageToSketch 滤波器能够很好地平衡形状与纹理信息,同时滤除可能干扰模型的无关细节(两种滤波器的一种可能实现见本条目末尾的代码)。
  • results: 大量评估表明,与多种标准图像增强方法相比,几何增强技术可在不同模型架构上将模型对野外数据的泛化性能提升最高 14%。我们还通过利用花粉水化实验恢复干燥花粉颗粒形状的消融研究验证了该方法;此外,所提几何增强在文献中的亲和度与多样性度量上也获得了最高评分。
    Abstract Distribution shifts are characterized by differences between the training and test data distributions. They can significantly reduce the accuracy of machine learning models deployed in real-world scenarios. This paper explores the distribution shift problem when classifying pollen grains from microscopic images collected in the wild with a low-cost camera sensor. We leverage the domain knowledge that geometric features are highly important for accurate pollen identification and introduce two novel geometric image augmentation techniques to significantly narrow the accuracy gap between the model performance on the train and test datasets. In particular, we show that Tenengrad and ImageToSketch filters are highly effective to balance the shape and texture information while leaving out unimportant details that may confuse the model. Extensive evaluations on various model architectures demonstrate a consistent improvement of the model generalization to field data of up to 14% achieved by the geometric augmentation techniques when compared to a wide range of standard image augmentations. The approach is validated through an ablation study using pollen hydration tests to recover the shape of dry pollen grains. The proposed geometric augmentations also receive the highest scores according to the affinity and diversity measures from the literature.
    摘要 分布偏移指训练数据与测试数据分布之间的差异,它可能显著降低机器学习模型在真实场景中部署时的精度。本文研究对用低成本相机传感器在野外采集的花粉颗粒显微图像进行分类时的分布偏移问题。我们利用领域知识——几何特征对精准的花粉识别至关重要——提出了两种新颖的几何图像增强技术,以显著缩小模型在训练集与测试集之间的精度差距。特别地,我们表明 Tenengrad 与 ImageToSketch 滤波器能够很好地平衡形状与纹理信息,同时滤除可能干扰模型的无关细节。在多种模型架构上的大量评估表明,与各种标准图像增强方法相比,几何增强技术可将模型对野外数据的泛化性能稳定提升最高 14%。我们通过利用花粉水化实验恢复干燥花粉颗粒形状的消融研究验证了该方法。所提几何增强在文献中的亲和度与多样性度量上也获得了最高评分。
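For intuition, here is a small OpenCV/NumPy sketch of the two filters as this digest understands them: a Tenengrad-style Sobel gradient-magnitude map that emphasizes shape, and a common dodge-blend "image-to-sketch" recipe as one plausible reading of the ImageToSketch filter. The synthetic input image and parameter choices are placeholders, and the paper's exact implementations may differ.

```python
import cv2
import numpy as np

def tenengrad_map(gray: np.ndarray, ksize: int = 3) -> np.ndarray:
    """Sobel gradient-magnitude image: keeps the shape/edge structure of a
    pollen grain while suppressing fine texture that may not transfer."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=ksize)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=ksize)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def image_to_sketch(gray: np.ndarray) -> np.ndarray:
    """A common 'image-to-sketch' recipe (dodge blend of the image with its
    inverted blur) -- one plausible reading of the ImageToSketch filter."""
    blur = cv2.GaussianBlur(255 - gray, (21, 21), 0)
    return cv2.divide(gray, 255 - blur, scale=256)

gray = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # stand-in for a pollen micrograph
augmented = [gray, tenengrad_map(gray), image_to_sketch(gray)] # extra training views
print([a.shape for a in augmented])
```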

Lesion Search with Self-supervised Learning

  • paper_url: http://arxiv.org/abs/2311.11014
  • repo_url: None
  • paper_authors: Kristin Qi, Jiali Cheng, Daniel Haehn
  • for: 帮助临床专业人员更快地查找相似图像,不需要手动标注。
  • methods: 采用基于自监督对比学习(SimCLR)的内容图像检索(CBIR),并引入广义均值(GeM)池化与 L2 归一化,对病灶类型进行分类并在临床分析前检索相似图像(GeM 池化的示意见本条目末尾的代码)。
  • results: 实现了性能提升。此外,还开发了一个开源的图像分析与检索应用,易于集成,可减少人工工作量,有望支持临床医生的日常工作。
    Abstract Content-based image retrieval (CBIR) with self-supervised learning (SSL) accelerates clinicians' interpretation of similar images without manual annotations. We develop a CBIR from the contrastive learning SimCLR and incorporate a generalized-mean (GeM) pooling followed by L2 normalization to classify lesion types and retrieve similar images before clinicians' analysis. Results have shown improved performance. We additionally build an open-source application for image analysis and retrieval. The application is easy to integrate, relieving manual efforts and suggesting the potential to support clinicians' everyday activities.
    摘要 基于内容的图像检索(CBIR)结合自监督学习(SSL),可在无需人工标注的情况下加速临床医生对相似图像的判读。我们基于对比学习方法 SimCLR 构建了 CBIR 系统,并引入广义均值(GeM)池化与 L2 归一化,在临床医生分析之前对病灶类型进行分类并检索相似图像。结果显示性能得到提升。此外,我们还开发了一个用于图像分析与检索的开源应用;该应用易于集成,可减少人工工作量,显示出支持临床医生日常工作的潜力。
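GeM pooling has a standard closed form, so the following PyTorch module is a faithful generic implementation of GeM followed by L2 normalization; the backbone feeding it and the initial p = 3 are assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized-mean pooling over a CNN feature map, followed by L2
    normalization: p=1 is average pooling, p -> inf approaches max pooling."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))   # p is usually learned
        self.eps = eps

    def forward(self, x):                        # x: (B, C, H, W), non-negative
        x = x.clamp(min=self.eps).pow(self.p)
        x = F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p).flatten(1)
        return F.normalize(x, dim=1)             # unit-length retrieval descriptor

feats = torch.randn(2, 512, 7, 7).abs()          # e.g. a SimCLR backbone's feature map
desc = GeM()(feats)
print(desc.shape, desc.norm(dim=1))              # (2, 512), norms ~ 1
```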

Multiple View Geometry Transformers for 3D Human Pose Estimation

  • paper_url: http://arxiv.org/abs/2311.10983
  • repo_url: None
  • paper_authors: Ziwei Liao, Jialiang Zhu, Chunyu Wang, Han Hu, Steven L. Waslander
  • for: 提高多视图3D人姿估计中Transformers的3D理解能力
  • methods: 提出了一种新的混合模型 MVGFormer,包括免学习的几何模块与可学习的外观模块:几何模块以纯几何方式处理所有依赖视角的 3D 任务(此类任务的一个典型例子——多视角三角测量——见本条目末尾的代码),外观模块则端到端地从图像信号估计 2D 姿态,即使发生遮挡也能给出准确估计
  • results: 在域内与域外设定下,该模型均优于当前最先进方法,尤其在域外设定下优势明显,并能泛化到新的相机与几何配置。
    Abstract In this work, we aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation. Recent works have focused on end-to-end learning-based transformer designs, which struggle to resolve geometric information accurately, particularly during occlusion. Instead, we propose a novel hybrid model, MVGFormer, which has a series of geometric and appearance modules organized in an iterative manner. The geometry modules are learning-free and handle all viewpoint-dependent 3D tasks geometrically which notably improves the model's generalization ability. The appearance modules are learnable and are dedicated to estimating 2D poses from image signals end-to-end which enables them to achieve accurate estimates even when occlusion occurs, leading to a model that is both accurate and generalizable to new cameras and geometries. We evaluate our approach for both in-domain and out-of-domain settings, where our model consistently outperforms state-of-the-art methods, and especially does so by a significant margin in the out-of-domain setting. We will release the code and models: https://github.com/XunshanMan/MVGFormer.
    摘要 在本工作中,我们旨在提升 Transformer 在多视角 3D 人体姿态估计中的 3D 推理能力。近期工作多聚焦于端到端学习式的 Transformer 设计,它们难以准确解析几何信息,在遮挡时尤甚。与之不同,我们提出一种新的混合模型 MVGFormer,它由一系列以迭代方式组织的几何模块与外观模块构成。几何模块免于学习,以纯几何方式处理所有依赖视角的 3D 任务,从而显著提升模型的泛化能力;外观模块是可学习的,专门负责端到端地从图像信号估计 2D 姿态,即使发生遮挡也能给出准确估计,使模型既准确又能泛化到新的相机与几何配置。我们在域内与域外两种设定下评估了该方法:我们的模型始终优于最先进方法,尤其在域外设定下以显著优势胜出。我们将发布代码与模型:https://github.com/XunshanMan/MVGFormer。
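To illustrate the kind of viewpoint-dependent task a learning-free geometry module can solve in closed form, here is a self-contained NumPy sketch of classic DLT triangulation, recovering a 3D joint from its 2D detections in two calibrated toy cameras. This is a textbook routine offered as intuition, not MVGFormer's actual module.

```python
import numpy as np

def triangulate_dlt(points_2d, proj_mats):
    """Learning-free DLT triangulation: recover one 3D point from its 2D
    detections in several calibrated views."""
    A = []
    for (u, v), P in zip(points_2d, proj_mats):   # P: 3x4 camera projection
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                                     # null vector of A
    return X[:3] / X[3]                            # dehomogenize

# Two toy cameras sharing orientation, offset along the x-axis.
K = np.array([[500., 0, 320], [0, 500., 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.array([[0.], [0.], [0.]])])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.], [0.], [0.]])])
X_true = np.array([0.2, -0.1, 4.0, 1.0])           # homogeneous 3D joint
uv = [(P @ X_true)[:2] / (P @ X_true)[2] for P in (P1, P2)]
print(triangulate_dlt(uv, [P1, P2]))               # ~ [0.2, -0.1, 4.0]
```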

HungerGist: An Interpretable Predictive Model for Food Insecurity

  • paper_url: http://arxiv.org/abs/2311.10953
  • repo_url: None
  • paper_authors: Yongsu Ahn, Muheng Yan, Yu-Ru Lin, Zian Wang
  • for: This paper addresses the critical need for advanced early warning systems to combat escalating food insecurity in Africa, which is driven by factors such as war, climate change, and poverty.
  • methods: The paper introduces "HungerGist", a multi-task deep learning model that analyzes and predicts food insecurity from news texts using natural language processing (NLP) techniques.
  • results: Trained solely on news data, the model outperforms the baseline method trained on both traditional risk factors and human-curated keywords, and can detect critical texts containing interpretable signals known as "gists"; the approach also has the potential to reveal latent factors that would otherwise remain concealed in unstructured texts.
    Abstract The escalating food insecurity in Africa, caused by factors such as war, climate change, and poverty, demonstrates the critical need for advanced early warning systems. Traditional methodologies, relying on expert-curated data encompassing climate, geography, and social disturbances, often fall short due to data limitations, hindering comprehensive analysis and potential discovery of new predictive factors. To address this, this paper introduces "HungerGist", a multi-task deep learning model utilizing news texts and NLP techniques. Using a corpus of over 53,000 news articles from nine African countries over four years, we demonstrate that our model, trained solely on news data, outperforms the baseline method trained on both traditional risk factors and human-curated keywords. In addition, our method has the ability to detect critical texts that contain interpretable signals known as "gists." Moreover, our examination of these gists indicates that this approach has the potential to reveal latent factors that would otherwise remain concealed in unstructured texts.
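As a hedged sketch of what a multi-task news-text model of this kind can look like, the snippet below pairs a shared text encoder with one prediction head per country-level task. The layer sizes, the bag-of-tokens encoder, and the class names are placeholder assumptions, not details from the paper.

```python
# Illustrative multi-task text classifier: a shared encoder with one
# binary risk head per task (the paper covers nine African countries).
import torch
import torch.nn as nn

class MultiTaskNewsModel(nn.Module):
    def __init__(self, vocab_size=30000, hidden=256, n_tasks=9):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, hidden)  # bag-of-tokens encoder
        self.shared = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_tasks))

    def forward(self, token_ids, offsets):
        h = self.shared(self.embed(token_ids, offsets))
        # One logit per task; training shares the encoder across all tasks.
        return torch.cat([head(h) for head in self.heads], dim=1)
```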

RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability

  • paper_url: http://arxiv.org/abs/2311.10947
  • repo_url: None
  • paper_authors: Yuxuan Lei, Jianxun Lian, Jing Yao, Xu Huang, Defu Lian, Xing Xie
  • for: The paper proposes a new model interpretation approach that uses large language models (LLMs) as surrogate models for recommendation models, in order to better understand and explain recommender behavior.
  • methods: The approach uses three alignment methods: behavior alignment, intention alignment, and hybrid alignment. Behavior alignment operates in the language space, representing user preferences and item information as text to learn the recommendation model's behavior; intention alignment works in the recommendation model's latent space, using user and item representations to understand the model's behavior; hybrid alignment combines the language and latent spaces for alignment training.
  • results: Experiments show that the approach effectively enables LLMs to comprehend the behavior of recommendation models and to generate highly credible recommendation explanations.
    Abstract Recommender systems are widely used in various online services, with embedding-based models being particularly popular due to their expressiveness in representing complex signals. However, these models often lack interpretability, making them less reliable and transparent for both users and developers. With the emergence of large language models (LLMs), we find that their capabilities in language expression, knowledge-aware reasoning, and instruction following are exceptionally powerful. Based on this, we propose a new model interpretation approach for recommender systems, by using LLMs as surrogate models and learning to mimic and comprehend target recommender models. Specifically, we introduce three alignment methods: behavior alignment, intention alignment, and hybrid alignment. Behavior alignment operates in the language space, representing user preferences and item information as text to learn the recommendation model's behavior; intention alignment works in the latent space of the recommendation model, using user and item representations to understand the model's behavior; hybrid alignment combines both language and latent spaces for alignment training. To demonstrate the effectiveness of our methods, we conduct evaluation from two perspectives: alignment effect, and explanation generation ability on three public datasets. Experimental results indicate that our approach effectively enables LLMs to comprehend the patterns of recommendation models and generate highly credible recommendation explanations.
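To illustrate the behavior-alignment idea in code, here is a sketch of how one training example might be constructed: the user history and a candidate item are verbalized as a prompt, and the target recommender's decision becomes the completion the surrogate LLM is tuned to reproduce. The function and field names are assumptions for illustration, not the paper's API.

```python
# Sketch of behavior-alignment data construction: verbalize the signal,
# label it with the black-box recommender's own decision.
def make_alignment_example(user_history, candidate, recommender_score, threshold=0.5):
    prompt = (
        "The user recently interacted with: "
        + ", ".join(item["title"] for item in user_history)
        + f". Would the user enjoy '{candidate['title']}'? Answer yes or no."
    )
    # The surrogate LLM is fine-tuned to mimic the recommender's behavior.
    label = "yes" if recommender_score >= threshold else "no"
    return {"prompt": prompt, "completion": label}
```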

An Empirical Bayes Framework for Open-Domain Dialogue Generation

  • paper_url: http://arxiv.org/abs/2311.10945
  • repo_url: None
  • paper_authors: Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan
  • for: Developing an open-domain dialogue agent that can hold meaningful conversations with human users.
  • methods: Builds a Bayesian open-domain dialogue agent using pretrained language models and an empirical Bayes approach.
  • results: The BODEB framework achieves better results than variational frameworks in terms of both diversity and coherence.
    Abstract To engage human users in meaningful conversation, open-domain dialogue agents are required to generate diverse and contextually coherent dialogue. Despite recent advancements, which can be attributed to the use of pretrained language models, the generation of diverse and coherent dialogue remains an open research problem. A popular approach to address this issue involves the adaptation of variational frameworks. However, while these approaches successfully improve diversity, they tend to compromise on contextual coherence. Hence, we propose the Bayesian Open-domain Dialogue with Empirical Bayes (BODEB) framework, an empirical Bayes framework for constructing a Bayesian open-domain dialogue agent by leveraging pretrained parameters to inform the prior and posterior parameter distributions. Empirical results show that BODEB achieves better results in terms of both diversity and coherence compared to variational frameworks.
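A minimal sketch of the empirical-Bayes construction, assuming a mean-field Gaussian variational treatment of a single linear layer: the weight prior is centered on pretrained parameters rather than zero, and the posterior mean is initialized there as well. This is a generic rendering of the idea, not the paper's exact architecture.

```python
# Variational linear layer with a prior informed by pretrained weights.
import torch
import torch.nn as nn

class EmpiricalBayesLinear(nn.Module):
    def __init__(self, pretrained_weight, prior_std=0.1):
        super().__init__()
        self.prior_mean = pretrained_weight.detach()       # informed (empirical) prior
        self.prior_std = prior_std
        self.mu = nn.Parameter(pretrained_weight.clone())  # posterior mean
        self.log_sigma = nn.Parameter(torch.full_like(pretrained_weight, -3.0))

    def forward(self, x):
        sigma = self.log_sigma.exp()
        w = self.mu + sigma * torch.randn_like(sigma)      # reparameterization trick
        return x @ w.t()

    def kl(self):
        # KL(q || p) between diagonal Gaussians, summed over weights;
        # penalizes drifting away from the pretrained prior.
        sigma = self.log_sigma.exp()
        return (torch.log(self.prior_std / sigma)
                + (sigma**2 + (self.mu - self.prior_mean)**2) / (2 * self.prior_std**2)
                - 0.5).sum()
```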

Practical Estimation of Ensemble Accuracy

  • paper_url: http://arxiv.org/abs/2311.10940
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Simi Haber, Yonatan Wexler
  • for: The paper presents a practical method for estimating the joint power of several classifiers without relying on labels, enabling its use in unsupervised settings over huge datasets.
  • methods: The method rests on a new combinatorial bound on the ensemble's error rate; the bound can be approximated efficiently in time linear in the number of samples, enabling an efficient search for classifier combinations with high joint accuracy.
  • results: Experiments on popular large-scale face recognition datasets demonstrate the feasibility and practicality of the method, which provides an accurate, label-free measure in unsupervised settings.
    Abstract Ensemble learning combines several individual models to obtain better generalization performance. In this work we present a practical method for estimating the joint power of several classifiers which, unlike existing approaches, does not rely on labels, hence enabling work in the unsupervised setting of huge datasets. It also differs from existing methods that define a "diversity measure". The heart of the method is a combinatorial bound on the number of mistakes the ensemble is likely to make. The bound can be efficiently approximated in time linear in the number of samples, allowing an efficient search for a combination of classifiers that are likely to produce higher joint accuracy. Moreover, because the bound is applicable to unlabeled data, the method is both accurate and practical in the modern setting of unsupervised learning. We demonstrate the method on popular large-scale face recognition datasets, which provide a useful playground for fine-grained classification tasks using noisy data over many classes. The proposed framework fits neatly into trending practices of unsupervised learning: it is a measure of the inherent independence of a set of classifiers that does not rely on extra information such as another classifier or labeled data.
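The bound itself is the paper's contribution; the sketch below only illustrates the label-free setting it operates in, measuring pairwise disagreement rates of classifiers on unlabeled data in time linear in the number of samples per pair. Disagreement of this kind is the sort of independence signal a combinatorial bound can exploit.

```python
# Label-free pairwise disagreement rates for a set of classifiers.
import numpy as np

def pairwise_disagreement(predictions):
    """predictions: (n_classifiers, n_samples) array of predicted labels."""
    k, _ = predictions.shape
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            # Fraction of samples where classifiers i and j differ.
            d[i, j] = d[j, i] = np.mean(predictions[i] != predictions[j])
    return d  # higher off-diagonal values suggest more independent errors
```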

Case Repositories: Towards Case-Based Reasoning for AI Alignment

  • paper_url: http://arxiv.org/abs/2311.10934
  • repo_url: None
  • paper_authors: K. J. Kevin Feng, Quan Ze Chen, Inyoung Cheong, King Xia, Amy X. Zhang
  • for: The paper proposes an approach to AI alignment grounded in case-based reasoning (CBR), addressing how AI systems should align when faced with complex and ambiguous societal questions across diverse individuals and communities.
  • methods: The proposed process assembles a case repository by: gathering "seed" cases (questions one might ask an AI system) from online community discussions, eliciting domain-specific key dimensions through workshops with domain experts, using LLMs to generate case variations not seen in the wild, and engaging the public to judge and improve cases.
  • results: The paper argues that such a case repository can assist AI alignment, both by serving directly as precedents that ground acceptable behaviors and by giving individuals and communities a medium for moral reasoning around AI.
    Abstract Case studies commonly form the pedagogical backbone in law, ethics, and many other domains that face complex and ambiguous societal questions informed by human values. Similar complexities and ambiguities arise when we consider how AI should be aligned in practice: when faced with vast quantities of diverse (and sometimes conflicting) values from different individuals and communities, with whose values is AI to align, and how should AI do so? We propose a complementary approach to constitutional AI alignment, grounded in ideas from case-based reasoning (CBR), that focuses on the construction of policies through judgments on a set of cases. We present a process to assemble such a case repository by: 1) gathering a set of "seed" cases (questions one may ask an AI system) in a particular domain from discussions in online communities, 2) eliciting domain-specific key dimensions for cases through workshops with domain experts, 3) using LLMs to generate variations of cases not seen in the wild, and 4) engaging with the public to judge and improve cases. We then discuss how such a case repository could assist in AI alignment, both by acting directly as precedents to ground acceptable behaviors and by serving as a medium for individuals and communities to engage in moral reasoning around AI.
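As an illustration of step 3 of the pipeline (generating case variations with LLMs), here is a sketch of a variation prompt; `llm_generate` stands in for any text-generation callable, and the prompt wording is an assumption, not the authors'.

```python
# Sketch of the case-variation step: vary a seed case along an
# expert-identified dimension using any LLM text-generation callable.
def vary_case(seed_case, dimension, llm_generate):
    prompt = (
        f"Here is a question someone asked an AI system:\n{seed_case}\n"
        f"Rewrite it so that it differs along this dimension: {dimension}. "
        "Keep the underlying ethical tension intact."
    )
    return llm_generate(prompt)

# Example usage (with any callable mapping prompt text to generated text):
# variant = vary_case("Should I report a coworker?", "severity of harm", llm_generate)
```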

Representing visual classification as a linear combination of words

  • paper_url: http://arxiv.org/abs/2311.10933
  • repo_url: https://github.com/lotterlab/task_word_explainability
  • paper_authors: Shobhit Agarwal, Yevgeniy R. Semenov, William Lotter
  • for: The paper aims to explain the decision process of deep learning models in high-stakes domains such as healthcare, providing an explanation strategy that combines vision and language.
  • methods: A vision-language model is used to identify language-based descriptors of a visual classification task, and these descriptors are used to describe the task's decision process.
  • results: The resulting descriptors largely align with clinical knowledge, and the AI-identified words can help non-experts perform a specialized medical task at a non-trivial level. The study also surfaces potential "shortcut connections" in the public datasets used.
    Abstract Explainability is a longstanding challenge in deep learning, especially in high-stakes domains like healthcare. Common explainability methods highlight image regions that drive an AI model's decision. Humans, however, heavily rely on language to convey explanations of not only "where" but "what". Additionally, most explainability approaches focus on explaining individual AI predictions, rather than describing the features used by an AI model in general. The latter would be especially useful for model and dataset auditing, and potentially even knowledge generation as AI is increasingly being used in novel tasks. Here, we present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task. By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a linear combination of words, resulting in a weight for each word that indicates its alignment with the vision-based classifier. We assess our approach using two medical imaging classification tasks, where we find that the resulting descriptors largely align with clinical knowledge despite a lack of domain-specific language training. However, our approach also identifies the potential for 'shortcut connections' in the public datasets used. Towards a functional measure of explainability, we perform a pilot reader study where we find that the AI-identified words can enable non-expert humans to perform a specialized medical task at a non-trivial level. Altogether, our results emphasize the potential of using multimodal foundational models to deliver intuitive, language-based explanations of visual tasks.
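A minimal sketch of the central construction, assuming precomputed, L2-normalized embeddings from a joint image-text model such as CLIP: the vision-based classifier direction is regressed onto candidate word embeddings, yielding one weight per word. The ridge term and function names are illustrative choices, not the authors' exact recipe.

```python
# Express a vision-based classifier direction as a linear combination
# of word embeddings in a shared image-text embedding space.
import numpy as np

def word_weights(classifier_direction, word_embeddings, ridge=1e-3):
    """classifier_direction: (d,) task direction in the joint space.
    word_embeddings: (n_words, d) text embeddings of candidate words.
    Returns one weight per word indicating alignment with the task."""
    E = word_embeddings
    # Ridge-regularized least squares: min_w ||E^T w - c||^2 + ridge * ||w||^2,
    # solved via the normal equations (E E^T + ridge * I) w = E c.
    A = E @ E.T + ridge * np.eye(E.shape[0])
    return np.linalg.solve(A, E @ classifier_direction)
```

Words with large positive weights then serve as language-based descriptors of what the visual classifier attends to.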

Cognitive bias in large language models: Cautious optimism meets anti-Panglossian meliorism

  • paper_url: http://arxiv.org/abs/2311.10932
  • repo_url: None
  • paper_authors: David Thorstad
  • for: The paper examines bias in large language models, which has traditionally been discussed in terms of unfairness, especially as it affects marginalized groups.
  • methods: The paper draws on recent work assessing the outputs of large language models for cognitive biases familiar from research on judgment and decision-making.
  • results: The paper argues for cautious optimism about the prevalence of bias in current models combined with an anti-Panglossian willingness to concede some genuine biases and work to reduce them, and it draws out philosophical implications for the rationality of human cognitive biases and the role of unrepresentative data in driving model biases.
    Abstract Traditional discussions of bias in large language models focus on a conception of bias closely tied to unfairness, especially as affecting marginalized groups. Recent work raises the novel possibility of assessing the outputs of large language models for a range of cognitive biases familiar from research in judgment and decisionmaking. My aim in this paper is to draw two lessons from recent discussions of cognitive bias in large language models: cautious optimism about the prevalence of bias in current models coupled with an anti-Panglossian willingness to concede the existence of some genuine biases and work to reduce them. I draw out philosophical implications of this discussion for the rationality of human cognitive biases as well as the role of unrepresentative data in driving model biases.

CAMRA: Copilot for AMR Annotation

  • paper_url: http://arxiv.org/abs/2311.10928
  • repo_url: None
  • paper_authors: Jon Z. Cai, Shafiuddin Rehan Ahmed, Julia Bonn, Kristin Wright-Bettner, Martha Palmer, James H. Martin
  • for: The paper develops CAMRA (a programming-language-style AMR editor), a web-based tool for constructing Abstract Meaning Representations (AMR) from natural language text.
  • methods: CAMRA takes a novel approach that treats AMR annotation as coding in a programming language, leveraging the familiarity of programming paradigms to help users understand and apply AMR annotation, and integrates PropBank roleset lookup as an autocomplete feature alongside AMR parser models acting as coding co-pilots.
  • results: CAMRA enables fast and accurate AMR annotation and helps users better understand and use PropBank rolesets.
    Abstract In this paper, we introduce CAMRA (Copilot for AMR Annotations), a cutting-edge web-based tool designed for constructing Abstract Meaning Representation (AMR) from natural language text. CAMRA offers a novel approach to deep lexical semantics annotation such as AMR, treating AMR annotation akin to coding in programming languages. Leveraging the familiarity of programming paradigms, CAMRA encompasses all essential features of existing AMR editors, including example lookup, while going a step further by integrating PropBank roleset lookup as an autocomplete feature within the tool. Notably, CAMRA incorporates AMR parser models as coding co-pilots, greatly enhancing the efficiency and accuracy of AMR annotators. To demonstrate the tool's capabilities, we provide a live demo accessible at: https://camra.colorado.edu
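As a toy illustration of the PropBank roleset autocomplete the abstract mentions, the snippet below prefix-matches a typed predicate against a small in-line roleset table; the entries are placeholders, whereas CAMRA queries the actual PropBank inventory.

```python
# Toy roleset autocomplete: prefix-match a typed predicate against a
# placeholder PropBank-style dictionary of rolesets and role glosses.
ROLESETS = {
    "want-01": "ARG0: wanter, ARG1: thing wanted",
    "want-02": "ARG1: lacking, ARG2: thing lacked",
    "run-01":  "ARG0: runner",
}

def autocomplete(prefix):
    return {k: v for k, v in ROLESETS.items() if k.startswith(prefix)}

# autocomplete("want") returns both want-01 and want-02 with their glosses,
# letting the annotator pick the intended sense as they type.
```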

Explainable Product Classification for Customs

  • paper_url: http://arxiv.org/abs/2311.10922
  • repo_url: None
  • paper_authors: Eunji Lee, Sihyeon Kim, Sundong Kim, Soyeon Jung, Heeja Kim, Meeyoung Cha
  • for: The study presents an explainable decision-support model to help customs officers assign internationally accepted commodity codes (HS codes).
  • methods: The model uses machine learning to suggest the most likely HS subheadings and provides interpretable reasoning documents that help officers understand and adopt the suggestions.
  • results: The model's top-3 suggestions reach 93.9% accuracy on 925 challenging subheadings, and a user study shows that the explainable suggestions and reasoning documents can substantially reduce the time and effort customs officers spend on classification reviews.
    Abstract The task of assigning internationally accepted commodity codes (aka HS codes) to traded goods is a critical function of customs offices. Like court decisions made by judges, this task follows the doctrine of precedent and can be nontrivial even for experienced officers. Together with the Korea Customs Service (KCS), we propose a first-ever explainable decision-supporting model that suggests the most likely subheadings (i.e., the first six digits) of the HS code. The model also provides reasoning for its suggestion in the form of a document that is interpretable by customs officers. We evaluated the model using 5,000 cases that recently received a classification request. The results showed that the top-3 suggestions made by our model had an accuracy of 93.9% when classifying 925 challenging subheadings. A user study with 32 customs experts further confirmed that our algorithmic suggestions, accompanied by explainable reasonings, can substantially reduce the time and effort taken by customs officers for classification reviews.
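To make the top-3 suggestion step concrete, here is a hedged sketch that ranks six-digit subheadings by model probability; the function and variable names are assumptions, and the reasoning-document generation that accompanies each suggestion is not shown.

```python
# Rank six-digit HS subheadings by model probability and return the
# k best candidates with scores for the officer to review.
import numpy as np

def top_k_subheadings(logits, subheading_labels, k=3):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over all subheadings
    idx = np.argsort(probs)[::-1][:k]         # indices of the k largest probabilities
    return [(subheading_labels[i], float(probs[i])) for i in idx]
```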

Compact and Intuitive Airfoil Parameterization Method through Physics-aware Variational Autoencoder

  • paper_url: http://arxiv.org/abs/2311.10921
  • repo_url: None
  • paper_authors: Yu-Eop Kang, Dawoon Lee, Kwanjung Yee
  • for: optimize the design of high-performance aircraft airfoils
  • methods: uses physics-aware variational autoencoder to parameterize airfoil shape
  • results: produces smooth and non-intersecting airfoils with improved feasibility and intuitiveness
    Abstract Airfoil shape optimization plays a critical role in the design of high-performance aircraft. However, the high-dimensional nature of airfoil representation causes the challenging problem known as the "curse of dimensionality". To overcome this problem, numerous airfoil parameterization methods have been developed, which can be broadly classified as polynomial-based and data-driven approaches. Each of these methods has desirable characteristics such as flexibility, parsimony, feasibility, and intuitiveness, but a single approach that encompasses all of these attributes has yet to be found. For example, polynomial-based methods struggle to balance parsimony and flexibility, while data-driven methods lack in feasibility and intuitiveness. In recent years, generative models, such as generative adversarial networks and variational autoencoders, have shown promising potential in airfoil parameterization. However, these models still face challenges related to intuitiveness due to their black-box nature. To address this issue, we developed a novel airfoil parameterization method using physics-aware variational autoencoder. The proposed method not only explicitly separates the generation of thickness and camber distributions to produce smooth and non-intersecting airfoils, thereby improving feasibility, but it also directly aligns its latent dimensions with geometric features of the airfoil, significantly enhancing intuitiveness. Finally, extensive comparative studies were performed to demonstrate the effectiveness of our approach.
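A hedged sketch of the decoder idea, assuming a PyTorch implementation: one latent code is decoded by two separate heads, a camber line and a non-negative thickness distribution, whose combination guarantees non-intersecting upper and lower surfaces. Layer sizes are placeholders, and the paper's physics-aware training details go beyond this.

```python
# Decoder with explicitly separated camber and thickness heads; Softplus
# keeps thickness non-negative, so the surfaces cannot intersect.
import torch
import torch.nn as nn

class AirfoilDecoder(nn.Module):
    def __init__(self, latent_dim=8, n_points=100):
        super().__init__()
        self.camber = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                    nn.Linear(128, n_points))
        self.thickness = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                       nn.Linear(128, n_points), nn.Softplus())

    def forward(self, z):
        c = self.camber(z)        # camber line y_c(x)
        t = self.thickness(z)     # thickness t(x) >= 0 by construction
        upper = c + 0.5 * t       # non-intersecting surfaces follow
        lower = c - 0.5 * t       # directly from t >= 0 at every station
        return upper, lower
```

Tying each latent dimension to a geometric quantity in this way is what makes the parameterization intuitive as well as feasible.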

Understanding and Mitigating Classification Errors Through Interpretable Token Patterns

  • paper_url: http://arxiv.org/abs/2311.10920
  • repo_url: None
  • paper_authors: Michael A. Hedderich, Jonas Fischer, Dietrich Klakow, Jilles Vreeken
  • for: The paper characterizes the errors of NLP classifiers in globally interpretable terms, giving insight for improving them.
  • methods: The paper proposes a method based on the Minimum Description Length principle to find token patterns that distinguish correct from erroneous predictions.
  • results: Experiments show the method recovers the ground truth even on large, highly imbalanced datasets over large vocabularies; in VQA and NER case studies it provides clear and actionable descriptions of systematic errors.
    Abstract State-of-the-art NLP methods achieve human-like performance on many tasks, but make errors nevertheless. Characterizing these errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors, but also gives a way to act and improve the classifier. We propose to discover those patterns of tokens that distinguish correct and erroneous predictions, to obtain global and interpretable descriptions for arbitrary NLP classifiers. We formulate the problem of finding a succinct and non-redundant set of such patterns in terms of the Minimum Description Length principle. Through an extensive set of experiments, we show that our method, Premise, performs well in practice. Unlike existing solutions, it recovers ground truth, even on highly imbalanced data over large vocabularies. In VQA and NER case studies, we confirm that it gives clear and actionable insight into the systematic errors made by NLP classifiers.
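Premise scores candidate pattern sets with a proper Minimum Description Length objective; the toy sketch below substitutes a simplified smoothed log-odds score to show the kind of signal being searched for, namely token patterns that occur far more often in erroneous than in correct predictions. All names are illustrative.

```python
# Simplified stand-in for the search signal: how strongly a token
# pattern separates erroneous from correct predictions.
import math

def pattern_score(pattern, error_texts, correct_texts, eps=1.0):
    def rate(texts):
        hits = sum(all(tok in t.split() for tok in pattern) for t in texts)
        return (hits + eps) / (len(texts) + 2 * eps)   # smoothed frequency
    p_err, p_ok = rate(error_texts), rate(correct_texts)
    return math.log(p_err / p_ok)  # > 0: pattern characterizes errors

# pattern_score(("not", "never"), errors, corrects) would rank a
# double-negation pattern by how strongly it co-occurs with mistakes.
```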