cs.AI - 2023-09-07

Evaluation of large language models for discovery of gene set function

  • paper_url: http://arxiv.org/abs/2309.04019
  • repo_url: https://github.com/idekerlab/llm_evaluation_for_gene_set_interpretation
  • paper_authors: Mengzhou Hu, Sahar Alkhairy, Ingoo Lee, Rudolf T. Pillich, Robin Bachelder, Trey Ideker, Dexter Pratt
  • for: This paper evaluates whether OpenAI's GPT-4 can develop hypotheses about common gene functions from its embedded biomedical knowledge.
  • methods: The authors build a GPT-4 pipeline that labels gene sets with names summarizing their consensus functions, substantiated by analysis text and supporting citations.
  • results: Benchmarked against named gene sets in the Gene Ontology, GPT-4 generated very similar names in about half of the cases, and for gene sets discovered in 'omics data it produced more informative names than gene set enrichment, with supporting statements and citations that were largely verified in human review.
    Abstract Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI's GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. Benchmarking against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, while in most remaining cases it recovered the name of a more general concept. In gene sets discovered in 'omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants.
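A minimal sketch of how such a gene-set naming step might be invoked with the 2023-era `openai` Python client; the prompt wording, temperature, and output handling are illustrative assumptions rather than the authors' pipeline (their implementation is in the linked repository).

```python
# Illustrative sketch only: prompt text and parameters are assumptions, not the paper's pipeline.
# Assumes the pre-1.0 `openai` package and an OPENAI_API_KEY set in the environment.
import openai

def name_gene_set(genes):
    """Ask GPT-4 for a concise name summarizing the consensus function of a gene set."""
    prompt = (
        "You are a functional genomics assistant. Propose a short name summarizing the most "
        "likely common biological function of the following genes, then give a brief analysis "
        "with supporting citations:\n" + ", ".join(genes)
    )
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,  # deterministic output simplifies human review
    )
    return response["choices"][0]["message"]["content"]

print(name_gene_set(["TP53", "MDM2", "CDKN1A", "ATM"]))
```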

ConDA: Contrastive Domain Adaptation for AI-generated Text Detection

  • paper_url: http://arxiv.org/abs/2309.03992
  • repo_url: https://github.com/amritabh/conda-gen-text-detection
  • paper_authors: Amrita Bhattacharjee, Tharindu Kumarage, Raha Moraffah, Huan Liu
  • for: This paper aims to build a detector for AI-generated text that requires no labeled training data, in order to counter the spread of disinformation at scale.
  • methods: The work proposes a Contrastive Domain Adaptation (ConDA) framework that blends standard domain adaptation techniques with the representational power of contrastive learning, learning domain-invariant representations from unlabeled target data for the final unsupervised detection task.
  • results: Experiments show average performance gains of 31.7% over the best-performing baselines and a gap of only 0.8% to a fully supervised detector. All code and data are available at https://github.com/AmritaBh/ConDA-gen-text-detection.
    Abstract Large language models (LLMs) are increasingly being used for generating text in a variety of use cases, including journalistic news articles. Given the potential malicious nature in which these LLMs can be used to generate disinformation at scale, it is important to build effective detectors for such AI-generated text. Given the surge in development of new LLMs, acquiring labeled training data for supervised detectors is a bottleneck. However, there might be plenty of unlabeled text data available, without information on which generator it came from. In this work we tackle this data problem, in detecting AI-generated news text, and frame the problem as an unsupervised domain adaptation task. Here the domains are the different text generators, i.e. LLMs, and we assume we have access to only the labeled source data and unlabeled target data. We develop a Contrastive Domain Adaptation framework, called ConDA, that blends standard domain adaptation techniques with the representation power of contrastive learning to learn domain invariant representations that are effective for the final unsupervised detection task. Our experiments demonstrate the effectiveness of our framework, resulting in average performance gains of 31.7% from the best performing baselines, and within 0.8% margin of a fully supervised detector. All our code and data is available at https://github.com/AmritaBh/ConDA-gen-text-detection.
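A rough sketch of the kind of objective such a framework combines: a supervised detection loss on labeled source-generator data, a domain-alignment term (here an MMD penalty), and a contrastive NT-Xent-style term on unlabeled target-generator data. The loss weights, kernel bandwidth, and temperature below are illustrative assumptions, not the paper's settings.

```python
# Illustrative ConDA-style training objective; weights, kernel, and temperature are assumptions.
import torch
import torch.nn.functional as F

def mmd_rbf(x, y, sigma=1.0):
    """Maximum mean discrepancy with an RBF kernel between two feature batches."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def nt_xent(z1, z2, tau=0.5):
    """Simplified contrastive loss over two augmented views of the same unlabeled batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                      # similarity of view 1 to all of view 2
    targets = torch.arange(z1.size(0))              # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

def conda_step(encoder, classifier, src_x, src_y, tgt_v1, tgt_v2, lam=0.1, mu=0.1):
    fs, ft1, ft2 = encoder(src_x), encoder(tgt_v1), encoder(tgt_v2)
    loss = F.cross_entropy(classifier(fs), src_y)   # labeled source detection loss
    loss = loss + lam * mmd_rbf(fs, ft1)            # align source and target features
    loss = loss + mu * nt_xent(ft1, ft2)            # contrastive term on target views
    return loss
```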

Noisy Computing of the $\mathsf{OR}$ and $\mathsf{MAX}$ Functions

  • paper_url: http://arxiv.org/abs/2309.03986
  • repo_url: None
  • paper_authors: Banghua Zhu, Ziao Wang, Nadim Ghaddar, Jiantao Jiao, Lele Wang
  • For: Computing a function of $n$ variables using noisy queries, where each query is incorrect with some fixed and known probability $p \in (0,1/2)$.
  • Methods: The paper studies noisy computation of the $\mathsf{OR}$ function of $n$ bits (queries are noisy readings of the bits) and the $\mathsf{MAX}$ function of $n$ real numbers (queries are noisy pairwise comparisons), with an expected number of queries of $(1 \pm o(1)) \frac{n\log \frac{1}{\delta}}{D_{\mathsf{KL}}(p \| 1-p)}$.
  • Results: This expected number of queries is shown to be both sufficient and necessary to compute both functions with a vanishing error probability $\delta = o(1)$, tightening the dependence on $p$ in both the upper and lower bounds.
    Abstract We consider the problem of computing a function of $n$ variables using noisy queries, where each query is incorrect with some fixed and known probability $p \in (0,1/2)$. Specifically, we consider the computation of the $\mathsf{OR}$ function of $n$ bits (where queries correspond to noisy readings of the bits) and the $\mathsf{MAX}$ function of $n$ real numbers (where queries correspond to noisy pairwise comparisons). We show that an expected number of queries of \[ (1 \pm o(1)) \frac{n\log \frac{1}{\delta}}{D_{\mathsf{KL}}(p \| 1-p)} \] is both sufficient and necessary to compute both functions with a vanishing error probability $\delta = o(1)$, where $D_{\mathsf{KL}}(p \| 1-p)$ denotes the Kullback-Leibler divergence between $\mathsf{Bern}(p)$ and $\mathsf{Bern}(1-p)$ distributions. Compared to previous work, our results tighten the dependence on $p$ in both the upper and lower bounds for the two functions.
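A small simulation of the noisy-query model for $\mathsf{OR}$: each read of a bit is flipped with probability $p$, and a per-bit sequential log-likelihood-ratio test stops once the evidence clears $\log(1/\delta)$. This is only meant to illustrate why the query count scales like $n\log\frac{1}{\delta}/D_{\mathsf{KL}}(p\|1-p)$; it is not the paper's (optimal) algorithm.

```python
# Noisy OR via repeated noisy bit readings; illustrates the KL-divergence query scaling.
import math, random

def noisy_query(bit, p):
    return bit ^ (random.random() < p)                 # reading is flipped with probability p

def estimate_bit(bit, p, delta):
    """Sequentially read one bit until the log-likelihood ratio clears log(1/delta)."""
    llr, step, queries = 0.0, math.log((1 - p) / p), 0
    threshold = math.log(1 / delta)
    while abs(llr) < threshold:
        llr += step if noisy_query(bit, p) else -step
        queries += 1
    return (llr > 0), queries

def noisy_or(bits, p=0.1, delta=1e-3):
    total, result = 0, False
    for b in bits:
        est, q = estimate_bit(b, p, delta)
        total += q
        result = result or est
    return result, total

bits = [0] * 999 + [1]
value, queries = noisy_or(bits)
d_kl = 0.1 * math.log(0.1 / 0.9) + 0.9 * math.log(0.9 / 0.1)   # D_KL(Bern(p) || Bern(1-p))
print(value, queries, len(bits) * math.log(1e3) / d_kl)        # observed vs. predicted scale
```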

DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

  • paper_url: http://arxiv.org/abs/2309.03893
  • repo_url: None
  • paper_authors: Manlin Zhang, Jie Wu, Yuxi Ren, Ming Li, Jie Qin, Xuefeng Xiao, Wei Liu, Rui Wang, Min Zheng, Andy J. Ma
  • for: This paper aims to provide a scalable data engine for generating object detection training data.
  • methods: The paper proposes DiffusionEngine, a data scaling-up engine consisting of a pre-trained diffusion model and an effective Detection-Adapter, which together generate large, diverse, and generalizable sets of detection training pairs in a single stage.
  • results: Experiments show significant improvements in diverse scenarios such as different detection algorithms, self-supervised pre-training, data-sparse, label-scarce, cross-domain, and semi-supervised learning; for example, scaling up data with DiffusionEngine and a DINO-based adapter improves mAP by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.
    Abstract Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, which are costly, complex, or lacking diversity. To address these issues, we present DiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, contributing to generating scalable, diverse and generalizable detection data in a plug-and-play manner. Detection-Adapter is learned to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals to make better bounding-box predictions. Additionally, we contribute two datasets, i.e., COCO-DE and VOC-DE, to scale up existing detection benchmarks for facilitating follow-up research. Extensive experiments demonstrate that data scaling-up via DE can achieve significant improvements in diverse scenarios, such as various detection algorithms, self-supervised pre-training, data-sparse, label-scarce, cross-domain, and semi-supervised learning. For example, when using DE with a DINO-based adapter to scale up data, mAP is improved by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.

A Function Interpretation Benchmark for Evaluating Interpretability Methods

  • paper_url: http://arxiv.org/abs/2309.03886
  • repo_url: https://github.com/multimodal-interpretability/find
  • paper_authors: Sarah Schwettmann, Tamar Rott Shaham, Joanna Materzynska, Neil Chowdhury, Shuang Li, Jacob Andreas, David Bau, Antonio Torralba
  • for: The purpose of this paper is to evaluate the building blocks of automated interpretability methods.
  • methods: The paper evaluates methods that use language models (LMs) to produce code-based and natural-language descriptions of function behavior.
  • results: The study finds that an off-the-shelf LM with only black-box access to functions can sometimes infer their structure, forming hypotheses and proposing experiments like a scientist; however, LM-based descriptions tend to capture global function behavior while missing local corruptions. These results suggest that FIND will be useful for characterizing more sophisticated interpretability methods before they are applied to real-world models.
    Abstract Labeling neural network submodules with human-legible descriptions is useful for many downstream tasks: such descriptions can surface failures, guide interventions, and perhaps even explain important model behaviors. To date, most mechanistic descriptions of trained networks have involved small models, narrowly delimited phenomena, and large amounts of human labor. Labeling all human-interpretable sub-computations in models of increasing size and complexity will almost certainly require tools that can generate and validate descriptions automatically. Recently, techniques that use learned models in-the-loop for labeling have begun to gain traction, but methods for evaluating their efficacy are limited and ad-hoc. How should we validate and compare open-ended labeling tools? This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating the building blocks of automated interpretability methods. FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate. The functions are procedurally constructed across textual and numeric domains, and involve a range of real-world complexities, including noise, composition, approximation, and bias. We evaluate new and existing methods that use language models (LMs) to produce code-based and language descriptions of function behavior. We find that an off-the-shelf LM augmented with only black-box access to functions can sometimes infer their structure, acting as a scientist by forming hypotheses, proposing experiments, and updating descriptions in light of new data. However, LM-based descriptions tend to capture global function behavior and miss local corruptions. These results show that FIND will be useful for characterizing the performance of more sophisticated interpretability methods before they are applied to real-world models.

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

  • paper_url: http://arxiv.org/abs/2309.03883
  • repo_url: https://github.com/voidism/dola
  • paper_authors: Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James Glass, Pengcheng He
  • for: Reducing hallucinations in large language models (LLMs), i.e., the generation of content that deviates from facts.
  • methods: The paper proposes a simple decoding strategy that requires neither conditioning on retrieved external knowledge nor additional fine-tuning, and that better surfaces the factual knowledge stored in LLMs.
  • results: The approach improves truthfulness across multiple-choice and open-ended generation tasks, for example improving the performance of LLaMA-family models on TruthfulQA by 12-17 absolute percentage points.
    Abstract Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that does not require conditioning on retrieved external knowledge nor additional fine-tuning. Our approach obtains the next-token distribution by contrasting the differences in logits obtained from projecting the later layers versus earlier layers to the vocabulary space, exploiting the fact that factual knowledge in LLMs has generally been shown to be localized to particular transformer layers. We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts. DoLa consistently improves the truthfulness across multiple-choice tasks and open-ended generation tasks, for example improving the performance of LLaMA family models on TruthfulQA by 12-17% absolute points, demonstrating its potential in making LLMs reliably generate truthful facts.
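A hedged sketch of the core contrast using Hugging Face transformers: project an early ("premature") layer and the final ("mature") layer through the LM head and score the next token by the difference of their log-probabilities, restricted to tokens the mature layer finds plausible. The fixed early-layer index and the omission of the paper's dynamic layer selection are simplifications; see the linked repository for the full method.

```python
# Simplified DoLa-style next-token scoring; the fixed premature layer is an assumption.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True).eval()

@torch.no_grad()
def dola_next_token(prompt, premature_layer=6):
    out = model(**tok(prompt, return_tensors="pt"))
    hidden = out.hidden_states                       # embeddings + one entry per transformer block
    head, ln_f = model.get_output_embeddings(), model.transformer.ln_f
    mature = torch.log_softmax(head(hidden[-1][:, -1]), dim=-1)                    # final layer
    premature = torch.log_softmax(head(ln_f(hidden[premature_layer][:, -1])), dim=-1)
    # adaptive plausibility constraint: only keep tokens the mature layer finds plausible
    plausible = mature >= mature.max(dim=-1, keepdim=True).values + math.log(0.1)
    contrast = torch.where(plausible, mature - premature, torch.tensor(float("-inf")))
    return tok.decode(contrast.argmax(dim=-1))       # token emphasized by the later layers

print(dola_next_token("The capital city of France is"))
```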

OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs

  • paper_url: http://arxiv.org/abs/2309.03876
  • repo_url: None
  • paper_authors: Patrick Haller, Ansar Aynetdinov, Alan Akbik
  • for: The goal of this paper is to let people examine the biases of language models by viewing answers generated under different explicit biases.
  • methods: The underlying model is fine-tuned on text written by members of specific demographic groups, and then answers questions under the biases selected by the user, making those biases explicit and comparable.
  • results: The result is OpinionGPT, a web demo in which users can ask a question, select the biases they wish to investigate, and compare the resulting answers side by side.
    Abstract Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable ability to generate fitting responses to natural language instructions. However, an open research question concerns the inherent biases of trained models and their responses. For instance, if the data used to tune an LLM is dominantly written by persons with a specific political bias, we might expect generated answers to share this bias. Current research work seeks to de-bias such models, or suppress potentially biased answers. With this demonstration, we take a different view on biases in instruction-tuning: Rather than aiming to suppress them, we aim to make them explicit and transparent. To this end, we present OpinionGPT, a web demo in which users can ask questions and select all biases they wish to investigate. The demo will answer this question using a model fine-tuned on text representing each of the selected biases, allowing side-by-side comparison. To train the underlying model, we identified 11 different biases (political, geographic, gender, age) and derived an instruction-tuning corpus in which each answer was written by members of one of these demographics. This paper presents OpinionGPT, illustrates how we trained the bias-aware model and showcases the web application (available at https://opiniongpt.informatik.hu-berlin.de).

FLM-101B: An Open LLM and How to Train It with $100K Budget

  • paper_url: http://arxiv.org/abs/2309.03852
  • repo_url: None
  • paper_authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang
  • for: This paper proposes a solution for significantly reducing the training cost of large language models (LLMs) and demonstrates its effectiveness experimentally.
  • methods: The work uses a growth strategy to reduce LLM training cost and additionally evaluates the resulting model with a range of IQ-style tests on top of standard knowledge-oriented evaluations.
  • results: Experiments show that the resulting FLM-101B model, trained on a budget of 100K US dollars, achieves performance comparable to powerful, well-known models, especially on the IQ-style evaluations.
    Abstract Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks, among others. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost, and (ii) fair and objective evaluations. In this paper, we report a solution to significantly reduce LLM training cost through a growth strategy. We demonstrate that a 101B-parameter LLM with 0.31T tokens can be trained with a budget of 100K US dollars. Inspired by IQ tests, we also consolidate an additional range of evaluations on top of existing evaluations that focus on knowledge-oriented abilities. These IQ evaluations include symbolic mapping, rule understanding, pattern mining, and anti-interference. Such evaluations minimize the potential impact of memorization. Experimental results show that our model, named FLM-101B, trained with a budget of 100K US dollars, achieves performance comparable to powerful and well-known models, e.g., GPT-3 and GLM-130B, especially on the additional range of IQ evaluations. The checkpoint of FLM-101B is released at https://huggingface.co/CofeAI/FLM-101B.

Uncovering Drift in Textual Data: An Unsupervised Method for Detecting and Mitigating Drift in Machine Learning Models

  • paper_url: http://arxiv.org/abs/2309.03831
  • repo_url: None
  • paper_authors: Saeed Khaki, Akhouri Abhinav Aditya, Zohar Karnin, Lan Ma, Olivia Pan, Samarth Marudheri Chandrashekar
  • For: This work proposes an unsupervised drift detection method that requires no human annotation, so that potential degradations in machine learning model performance can be detected and mitigated proactively.
  • Methods: A two-step approach is used: first, a sample of production data is encoded as the target distribution and the model training data as the reference distribution; second, a kernel-based statistical test using the maximum mean discrepancy (MMD) distance compares the reference and target distributions and estimates any potential drift.
  • Results: The method detects drift in production data and identifies the subset of production data that is the root cause of the drift; models retrained on these identified high-drift samples show improved performance on online customer-experience quality metrics.
    Abstract Drift in machine learning refers to the phenomenon where the statistical properties of data or context, in which the model operates, change over time leading to a decrease in its performance. Therefore, maintaining a constant monitoring process for machine learning model performance is crucial in order to proactively prevent any potential performance regression. However, supervised drift detection methods require human annotation and consequently lead to a longer time to detect and mitigate the drift. In our proposed unsupervised drift detection method, we follow a two step process. Our first step involves encoding a sample of production data as the target distribution, and the model training data as the reference distribution. In the second step, we employ a kernel-based statistical test that utilizes the maximum mean discrepancy (MMD) distance metric to compare the reference and target distributions and estimate any potential drift. Our method also identifies the subset of production data that is the root cause of the drift. The models retrained using these identified high drift samples show improved performance on online customer experience quality metrics.
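A compact sketch of the second step, the kernel two-sample comparison: an RBF-kernel MMD statistic between embedded reference (training) and target (production) samples. The encoder is omitted, and the bandwidth and synthetic data below are placeholders rather than the paper's settings.

```python
# Kernel MMD between reference and target feature batches; bandwidth and data are placeholders.
import numpy as np

def rbf_kernel(a, b, sigma=2.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(reference, target, sigma=2.0):
    """Squared maximum mean discrepancy between two embedded samples."""
    return (rbf_kernel(reference, reference, sigma).mean()
            + rbf_kernel(target, target, sigma).mean()
            - 2 * rbf_kernel(reference, target, sigma).mean())

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(300, 8))   # embeddings of model training data
stable = rng.normal(0.0, 1.0, size=(300, 8))      # production data, same distribution
drifted = rng.normal(0.5, 1.0, size=(300, 8))     # production data with a mean shift
print("no drift:", mmd2(reference, stable))
print("drift   :", mmd2(reference, drifted))      # noticeably larger statistic
```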

Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization

  • paper_url: http://arxiv.org/abs/2309.03824
  • repo_url: None
  • paper_authors: Habib Hajimolahoseini, Walid Ahmed, Yang Liu
  • for: Accelerating the training and inference of deep learning models without resorting to very small decomposition ranks.
  • methods: The paper proposes two techniques for accelerating low-rank decomposed models: rank optimization and sequential freezing of the decomposed layers.
  • results: Experiments show that, combined, these techniques improve model throughput by up to 60% during training and 37% during inference while keeping accuracy close to that of the original models.
    Abstract Low Rank Decomposition (LRD) is a model compression technique applied to the weight tensors of deep learning models in order to reduce the number of trainable parameters and computational complexity. However, due to high number of new layers added to the architecture after applying LRD, it may not lead to a high training/inference acceleration if the decomposition ranks are not small enough. The issue is that using small ranks increases the risk of significant accuracy drop after decomposition. In this paper, we propose two techniques for accelerating low rank decomposed models without requiring to use small ranks for decomposition. These methods include rank optimization and sequential freezing of decomposed layers. We perform experiments on both convolutional and transformer-based models. Experiments show that these techniques can improve the model throughput up to 60% during training and 37% during inference when combined together while preserving the accuracy close to that of the original models
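A sketch of the basic building block: low-rank decomposition of a fully connected layer via truncated SVD, plus the sequential-freezing idea of fixing an already-adapted decomposed layer so later layers keep training. The rank and the freezing point are illustrative; the paper's rank-optimization procedure for choosing them is not reproduced here.

```python
# Low-rank decomposition of nn.Linear plus sequential freezing; rank and schedule are illustrative.
import torch
import torch.nn as nn

def decompose_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace W (out x in) with two factors of shapes (rank x in) and (out x rank)."""
    W = layer.weight.data
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = (torch.diag(S[:rank]) @ Vh[:rank]).contiguous()
    second.weight.data = U[:, :rank].contiguous()
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

def freeze(module: nn.Module):
    """Sequential freezing: stop gradient flow through a decomposed block once it has adapted."""
    for p in module.parameters():
        p.requires_grad = False

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model[0] = decompose_linear(model[0], rank=64)
freeze(model[0])   # e.g. after its warm-up epochs, freeze the first decomposed block
```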

AnthroNet: Conditional Generation of Humans via Anthropometrics

  • paper_url: http://arxiv.org/abs/2309.03812
  • repo_url: https://github.com/Unity-Technologies/AnthroNet
  • paper_authors: Francesco Picetti, Shrinath Deshpande, Jonathan Leban, Soroosh Shahtalebi, Jay Patel, Peifeng Jing, Chunpu Wang, Charles Metze III, Cameron Sun, Cera Laidlaw, James Warren, Kathy Huynh, River Page, Jonathan Hogins, Adam Crespi, Sujoy Ganguly, Salehe Erfanian Ebadi
  • for: The paper is written for the purpose of presenting a novel human body model that can generate a wide range of human body shapes and poses.
  • methods: The paper uses a deep generative architecture to train the model end-to-end using only synthetically generated data, which provides highly accurate human mesh representations and allows for precise anthropometry of the body.
  • results: The model is capable of producing humans in any arbitrary pose and can be used to generate millions of unique human identities and poses for non-commercial academic research purposes.
    Abstract We present a novel human body model formulated by an extensive set of anthropocentric measurements, which is capable of generating a wide range of human body shapes and poses. The proposed model enables direct modeling of specific human identities through a deep generative architecture, which can produce humans in any arbitrary pose. It is the first of its kind to have been trained end-to-end using only synthetically generated data, which not only provides highly accurate human mesh representations but also allows for precise anthropometry of the body. Moreover, using a highly diverse animation library, we articulated our synthetic humans' body and hands to maximize the diversity of the learnable priors for model training. Our model was trained on a dataset of $100k$ procedurally-generated posed human meshes and their corresponding anthropometric measurements. Our synthetic data generator can be used to generate millions of unique human identities and poses for non-commercial academic research purposes.

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck

  • paper_url: http://arxiv.org/abs/2309.03800
  • repo_url: None
  • paper_authors: Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
  • for: This work investigates nuanced algorithm design choices for deep learning in the presence of computational-statistical gaps.
  • methods: The paper considers offline sparse parity learning, a supervised classification problem that admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be read as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses).
  • results: Theory and experiments show that, in this setting, sparse initialization and increased network width yield significant sample-efficiency gains. Width acts as parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Wide, sparsely-initialized MLPs also show improved sample efficiency on tabular classification benchmarks, in some cases outperforming tuned random forests.
    Abstract This work investigates the nuanced algorithm design choices for deep learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests.
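A toy sketch of the offline sparse parity task and a wide, sparsely initialized MLP in which each neuron starts from only a few input coordinates, so that width acts as parallel search for lucky neurons. Sizes, sparsity level, and the training loop are illustrative choices, not the paper's experimental configuration.

```python
# Sparse parity data plus a wide, sparsely initialized MLP; all sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sparse_parity_data(n_samples, n_bits=30, k=3, seed=0):
    g = torch.Generator().manual_seed(seed)
    x = torch.randint(0, 2, (n_samples, n_bits), generator=g).float()
    support = torch.randperm(n_bits, generator=g)[:k]       # hidden relevant coordinates
    y = (x[:, support].sum(dim=1) % 2).long()               # parity of the k hidden bits
    return x * 2 - 1, y                                      # +/-1 encoding of the bits

def wide_sparse_mlp(n_bits=30, width=1024, keep_prob=0.1):
    model = nn.Sequential(nn.Linear(n_bits, width), nn.ReLU(), nn.Linear(width, 2))
    with torch.no_grad():                                     # sparse first-layer initialization:
        mask = torch.rand_like(model[0].weight) < keep_prob   # each neuron starts from few inputs
        model[0].weight *= mask
    return model

x, y = sparse_parity_data(5000)
model = wide_sparse_mlp()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(300):
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    opt.step()
print("train accuracy:", (model(x).argmax(dim=1) == y).float().mean().item())
```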

FisheyePP4AV: A privacy-preserving method for autonomous vehicles on fisheye camera images

  • paper_url: http://arxiv.org/abs/2309.03799
  • repo_url: None
  • paper_authors: Linh Trinh, Bach Ha, Tu Tran
  • for: Protecting the privacy of pedestrian faces and nearby car license plates captured by autonomous-vehicle cameras in actual road-driving scenarios.
  • methods: The paper proposes a framework that extracts face and license-plate identification knowledge from multiple teacher models, and transforms both images and labels from regular camera data into fisheye-like data using a varied and realistic fisheye transformation.
  • results: On the open-source PP4AV dataset, the proposed model outperforms baseline methods when trained on data from autonomous vehicles, even when the data are only softly labeled.
    Abstract In many parts of the world, the use of vast amounts of data collected on public roadways for autonomous driving has increased. In order to detect and anonymize pedestrian faces and nearby car license plates in actual road-driving scenarios, there is an urgent need for effective solutions. As more data is collected, privacy concerns regarding it increase, including but not limited to pedestrian faces and surrounding vehicle license plates. Normal and fisheye cameras are the two common camera types that are typically mounted on collection vehicles. With complex camera distortion models, fisheye camera images were deformed in contrast to regular images. It causes computer vision tasks to perform poorly when using numerous deep learning models. In this work, we pay particular attention to protecting privacy while yet adhering to several laws for fisheye camera photos taken by driverless vehicles. First, we suggest a framework for extracting face and plate identification knowledge from several teacher models. Our second suggestion is to transform both the image and the label from a regular image to fisheye-like data using a varied and realistic fisheye transformation. Finally, we run a test using the open-source PP4AV dataset. The experimental findings demonstrated that our model outperformed baseline methods when trained on data from autonomous vehicles, even when the data were softly labeled. The implementation code is available at our github: https://github.com/khaclinh/FisheyePP4AV.
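A sketch of applying a fisheye-like radial warp to a regular image; the same coordinate mapping would be applied to bounding-box corners so labels follow the pixels. The distortion model and its strength are generic assumptions, not the paper's exact transformation family.

```python
# Generic fisheye-like warp of a rectilinear image with OpenCV remap.
import cv2
import numpy as np

def fisheye_warp(image, strength=0.6):
    h, w = image.shape[:2]
    ys, xs = np.indices((h, w), dtype=np.float32)
    nx, ny = (xs - w / 2) / (w / 2), (ys - h / 2) / (h / 2)   # normalized coords around center
    r = np.sqrt(nx ** 2 + ny ** 2) + 1e-8
    r_src = np.tan(r * strength) / np.tan(strength)           # source radius for each output pixel
    map_x = ((nx * r_src / r) * (w / 2) + w / 2).astype(np.float32)
    map_y = ((ny * r_src / r) * (h / 2) + h / 2).astype(np.float32)
    return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# Synthetic stand-in for a frame from a regular (rectilinear) camera.
frame = np.full((480, 640, 3), 200, np.uint8)
cv2.rectangle(frame, (220, 160), (420, 320), (0, 0, 255), 3)   # e.g. a face/plate region
cv2.imwrite("frame_fisheye.png", fisheye_warp(frame))
```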

CPU frequency scheduling of real-time applications on embedded devices with temporal encoding-based deep reinforcement learning

  • paper_url: http://arxiv.org/abs/2309.03779
  • repo_url: https://github.com/coladog/tinyagent
  • paper_authors: Ti Zhou, Man Lin
  • for: This paper develops efficient power-management methods for periodic tasks with soft deadlines on small devices.
  • methods: The authors first study the limitations of the Linux built-in methods used on small devices and describe three typical workload/system patterns that are challenging for the built-in solutions. They then develop a reinforcement-learning technique with temporal encoding to derive an effective DVFS governor; the derived governor uses only one performance counter, the same as the built-in Linux mechanism, and does not require an explicit task model for the workload.
  • results: A prototype was implemented on the Nvidia Jetson Nano board and evaluated with six applications, including two self-designed and four benchmark applications. Under different deadline constraints, the approach quickly derives a DVFS governor that adapts to performance requirements and outperforms the built-in Linux approach in energy saving. On Mibench workloads with performance slack ranging from 0.04 s to 0.4 s, the proposed method saves 3%-11% more energy than Ondemand, and the AudioReg and FaceReg applications show 5%-14% energy-saving improvements. The implementation of the in-kernel quantized neural network engine is open-sourced at https://github.com/coladog/tinyagent.
    Abstract Small devices are frequently used in IoT and smart-city applications to perform periodic dedicated tasks with soft deadlines. This work focuses on developing methods to derive efficient power-management methods for periodic tasks on small devices. We first study the limitations of the existing Linux built-in methods used in small devices. We illustrate three typical workload/system patterns that are challenging to manage with Linux's built-in solutions. We develop a reinforcement-learning-based technique with temporal encoding to derive an effective DVFS governor even with the presence of the three system patterns. The derived governor uses only one performance counter, the same as the built-in Linux mechanism, and does not require an explicit task model for the workload. We implemented a prototype system on the Nvidia Jetson Nano Board and experimented with it with six applications, including two self-designed and four benchmark applications. Under different deadline constraints, our approach can quickly derive a DVFS governor that can adapt to performance requirements and outperform the built-in Linux approach in energy saving. On Mibench workloads, with performance slack ranging from 0.04 s to 0.4 s, the proposed method can save 3% - 11% more energy compared to Ondemand. AudioReg and FaceReg applications tested have 5%- 14% energy-saving improvement. We have open-sourced the implementation of our in-kernel quantized neural network engine. The codebase can be found at: https://github.com/coladog/tinyagent.

Extending Transductive Knowledge Graph Embedding Models for Inductive Logical Relational Inference

  • paper_url: http://arxiv.org/abs/2309.03773
  • repo_url: https://github.com/tgebhart/sheaf_kg_transind
  • paper_authors: Thomas Gebhart, John Cobb
  • for: bridging the gap between transductive and inductive knowledge graph embedding methods
  • methods: leveraging representations learned through transductive embedding methods to infer representations of new entities in the inductive setting
  • results: competitive with or outperforming state-of-the-art models derived explicitly for inductive tasks in experiments on large-scale knowledge graph embedding benchmarks
    Abstract Many downstream inference tasks for knowledge graphs, such as relation prediction, have been handled successfully by knowledge graph embedding techniques in the transductive setting. To address the inductive setting wherein new entities are introduced into the knowledge graph at inference time, more recent work opts for models which learn implicit representations of the knowledge graph through a complex function of a network's subgraph structure, often parametrized by graph neural network architectures. These come at the cost of increased parametrization, reduced interpretability and limited generalization to other downstream inference tasks. In this work, we bridge the gap between traditional transductive knowledge graph embedding approaches and more recent inductive relation prediction models by introducing a generalized form of harmonic extension which leverages representations learned through transductive embedding methods to infer representations of new entities introduced at inference time as in the inductive setting. This harmonic extension technique provides the best such approximation, can be implemented via an efficient iterative scheme, and can be employed to answer a family of conjunctive logical queries over the knowledge graph, further expanding the capabilities of transductive embedding methods. In experiments on a number of large-scale knowledge graph embedding benchmarks, we find that this approach for extending the functionality of transductive knowledge graph embedding models to perform knowledge graph completion and answer logical queries in the inductive setting is competitive with--and in some scenarios outperforms--several state-of-the-art models derived explicitly for such inductive tasks.
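A stripped-down NumPy sketch of harmonic extension on a plain, relation-agnostic graph: the embedding of a new entity minimizes the Dirichlet energy given the fixed, transductively learned embeddings of known entities, via the Laplacian block system $L_{UU} x_U = -L_{UK} x_K$. The paper's construction is richer (it respects knowledge-graph structure and supports logical queries); this only shows the core linear-algebra step.

```python
# Harmonic extension of fixed embeddings to new graph nodes (simplified, relation-agnostic).
import numpy as np

def harmonic_extension(adj, known_idx, new_idx, known_emb):
    """Solve L_UU x_U = -L_UK x_K for the embeddings x_U of the new entities."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                                   # graph Laplacian
    L_UU = lap[np.ix_(new_idx, new_idx)]
    L_UK = lap[np.ix_(new_idx, known_idx)]
    return np.linalg.solve(L_UU, -L_UK @ known_emb)

# Toy graph: entities 0-3 are known, entity 4 is introduced at inference time.
adj = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4)]:
    adj[i, j] = adj[j, i] = 1.0
known_emb = np.random.default_rng(0).normal(size=(4, 8))   # transductively learned embeddings
new_emb = harmonic_extension(adj, [0, 1, 2, 3], [4], known_emb)
print(new_emb.shape)    # (1, 8): inferred embedding for the new entity
```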

Hybrid of representation learning and reinforcement learning for dynamic and complex robotic motion planning

  • paper_url: http://arxiv.org/abs/2309.03758
  • repo_url: None
  • paper_authors: Chengmin Zhou, Xin Lu, Jiapeng Dai, Bingding Huang, Xiaoxu Liu, Pasi Fränti
  • for: This paper proposes a hybrid algorithm for robotic motion planning that combines long short-term memory (LSTM) pooling and skip connection for attention-based discrete soft actor critic (LSA-DSAC).
  • methods: The proposed algorithm uses a graph network and attention network to interpret the environmental state, and integrates skip connection to mitigate overfitting and improve convergence speed.
  • results: The proposed LSA-DSAC algorithm outperforms the state-of-the-art in training and most evaluations, and is successfully implemented and tested on a physical robot in the real world.
    Abstract Motion planning is the soul of robot decision making. Classical planning algorithms like graph search and reaction-based algorithms face challenges in cases of dense and dynamic obstacles. Deep learning algorithms generate suboptimal one-step predictions that cause many collisions. Reinforcement learning algorithms generate optimal or near-optimal time-sequential predictions. However, they suffer from slow convergence, suboptimal converged results, and overfitting. This paper introduces a hybrid algorithm for robotic motion planning: long short-term memory (LSTM) pooling and skip connection for attention-based discrete soft actor critic (LSA-DSAC). First, graph network (relational graph) and attention network (attention weight) interpret the environmental state for the learning of the discrete soft actor critic algorithm. The expressive power of attention network outperforms that of graph in our task by difference analysis of these two representation methods. However, attention based DSAC faces the overfitting problem in training. Second, the skip connection method is integrated into attention based DSAC to mitigate overfitting and improve convergence speed. Third, LSTM pooling is used to replace the sum operator of attention weight and eliminate overfitting by slightly sacrificing convergence speed at early-stage training. Experiments show that LSA-DSAC outperforms the state-of-the-art in training and most evaluations. The physical robot is also implemented and tested in the real world.

TSGBench: Time Series Generation Benchmark

  • paper_url: http://arxiv.org/abs/2309.03755
  • repo_url: None
  • paper_authors: Yihao Ang, Qiang Huang, Yifan Bao, Anthony K. H. Tung, Zhiyong Huang
  • for: The goal of this work is to provide a unified and comprehensive benchmark for evaluating time series generation (TSG) methods.
  • methods: The benchmark evaluates ten advanced TSG methods with twelve evaluation measures, including standard measures and new distance-based assessments.
  • results: TSGBench delivers a unified and comprehensive assessment of TSG methods, including a statistical breakdown of method rankings that reveals performance variations across datasets and measures and offers nuanced insight into the effectiveness of each method.
    Abstract Synthetic Time Series Generation (TSG) is crucial in a range of applications, including data augmentation, anomaly detection, and privacy preservation. Although significant strides have been made in this field, existing methods exhibit three key limitations: (1) They often benchmark against similar model types, constraining a holistic view of performance capabilities. (2) The use of specialized synthetic and private datasets introduces biases and hampers generalizability. (3) Ambiguous evaluation measures, often tied to custom networks or downstream tasks, hinder consistent and fair comparison. To overcome these limitations, we introduce \textsf{TSGBench}, the inaugural TSG Benchmark, designed for a unified and comprehensive assessment of TSG methods. It comprises three modules: (1) a curated collection of publicly available, real-world datasets tailored for TSG, together with a standardized preprocessing pipeline; (2) a comprehensive evaluation measures suite including vanilla measures, new distance-based assessments, and visualization tools; (3) a pioneering generalization test rooted in Domain Adaptation (DA), compatible with all methods. We have conducted extensive experiments across ten real-world datasets from diverse domains, utilizing ten advanced TSG methods and twelve evaluation measures, all gauged through \textsf{TSGBench}. The results highlight its remarkable efficacy and consistency. More importantly, \textsf{TSGBench} delivers a statistical breakdown of method rankings, illuminating performance variations across different datasets and measures, and offering nuanced insights into the effectiveness of each method.

Enhancing Pipeline-Based Conversational Agents with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.03748
  • repo_url: None
  • paper_authors: Mina Foosherian, Hendrik Purwins, Purna Rathnayake, Touhidul Alam, Rui Teimao, Klaus-Dieter Thoben
  • for: This paper investigates how large language models (LLMs) can enhance pipeline-based conversational agents.
  • methods: The paper examines LLM capabilities in two phases: in the design and development phase, LLMs can help generate training data, extract entities and synonyms, support localization, and aid persona design; during operations, they can assist with contextualization, intent classification to prevent conversational breakdown and handle out-of-scope questions, auto-correction of utterances, rephrasing of responses, formulation of disambiguation questions, summarization, and closed question-answering.
  • results: The scenarios above are demonstrated with informal GPT-4 experiments in the private banking domain. Because of privacy concerns and the need for deep integration with existing ecosystems, companies may hesitate to replace pipeline-based agents entirely; a hybrid approach that integrates LLMs into pipeline-based agents saves the time and cost of building and running agents while retaining the integration and privacy safeguards of existing systems.
    Abstract The latest advancements in AI and deep learning have led to a breakthrough in large language model (LLM)-based agents such as GPT-4. However, many commercial conversational agent development tools are pipeline-based and have limitations in holding a human-like conversation. This paper investigates the capabilities of LLMs to enhance pipeline-based conversational agents during two phases: 1) in the design and development phase and 2) during operations. In 1) LLMs can aid in generating training data, extracting entities and synonyms, localization, and persona design. In 2) LLMs can assist in contextualization, intent classification to prevent conversational breakdown and handle out-of-scope questions, auto-correcting utterances, rephrasing responses, formulating disambiguation questions, summarization, and enabling closed question-answering capabilities. We conducted informal experiments with GPT-4 in the private banking domain to demonstrate the scenarios above with a practical example. Companies may be hesitant to replace their pipeline-based agents with LLMs entirely due to privacy concerns and the need for deep integration within their existing ecosystems. A hybrid approach in which LLMs' are integrated into the pipeline-based agents allows them to save time and costs of building and running agents by capitalizing on the capabilities of LLMs while retaining the integration and privacy safeguards of their existing systems.

A Natural Gas Consumption Forecasting System for Continual Learning Scenarios based on Hoeffding Trees with Change Point Detection Mechanism

  • paper_url: http://arxiv.org/abs/2309.03720
  • repo_url: https://github.com/rasvob/hoeffding-trees-with-cpd-multistep-forecasing
  • paper_authors: Radek Svoboda, Sebastian Basterrech, Jędrzej Kozal, Jan Platoš, Michał Woźniak
  • for: Forecasting natural gas consumption, accounting for seasonality and trends, is crucial for industrial entities planning supply and consumption and optimizing procurement costs; in times of supply threats it is also a critical element of society's energy security.
  • methods: The paper introduces a multistep-ahead forecasting method for natural gas consumption that integrates change point detection for data-stream processing, using Hoeffding tree predictors as forecasting models and the Pruned Exact Linear Time (PELT) algorithm for change point detection; the detected change points drive the selection of a different model collection for successive time frames.
  • results: Experiments show that forecasting models with change point detection outperform change-point-agnostic baselines, that fewer detected change points correspond to lower forecasting error regardless of the model-collection selection procedure, and that simpler selection procedures omitting forecasting-error feedback yield more robust models better suited to continual learning tasks.
    Abstract Forecasting natural gas consumption, considering seasonality and trends, is crucial in planning its supply and consumption and optimizing the cost of obtaining it, mainly by industrial entities. However, in times of threats to its supply, it is also a critical element that guarantees the supply of this raw material to meet individual consumers' needs, ensuring society's energy security. This article introduces a novel multistep ahead forecasting of natural gas consumption with change point detection integration for model collection selection with continual learning capabilities using data stream processing. The performance of the forecasting models based on the proposed approach is evaluated in a complex real-world use case of natural gas consumption forecasting. We employed Hoeffding tree predictors as forecasting models and the Pruned Exact Linear Time (PELT) algorithm for the change point detection procedure. The change point detection integration enables selecting a different model collection for successive time frames. Thus, three model collection selection procedures (with and without an error feedback loop) are defined and evaluated for forecasting scenarios with various densities of detected change points. These models were compared with change point agnostic baseline approaches. Our experiments show that fewer change points result in a lower forecasting error regardless of the model collection selection procedure employed. Also, simpler model collection selection procedures omitting forecasting error feedback leads to more robust forecasting models suitable for continual learning tasks.
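A rough sketch of the two ingredients using commonly available libraries: PELT change point detection via `ruptures` and an incrementally trained Hoeffding tree regressor via `river` on lagged features. The library choices, penalty, lag count, and the "one model per segment" simplification are assumptions about a typical setup, not the authors' exact pipeline (their code is in the linked repository).

```python
# Change point detection (PELT) plus per-segment Hoeffding tree forecasters; an assumed setup.
import numpy as np
import ruptures as rpt
from river import tree

consumption = np.concatenate([                       # synthetic daily gas consumption
    np.random.default_rng(0).normal(100, 5, 200),
    np.random.default_rng(1).normal(140, 5, 200),    # regime change at t = 200
])

# 1) Detect change points so each segment can be handled by its own model collection.
breakpoints = rpt.Pelt(model="rbf").fit(consumption.reshape(-1, 1)).predict(pen=10)
print("change points:", breakpoints)

# 2) Train a Hoeffding tree per segment on lagged features, one sample at a time.
def lagged(series, start, end, n_lags=7):
    for t in range(start + n_lags, end):
        x = {f"lag_{k}": float(series[t - k]) for k in range(1, n_lags + 1)}
        yield x, float(series[t])

models, prev = [], 0
for bp in breakpoints:
    model = tree.HoeffdingTreeRegressor()
    for x, y in lagged(consumption, prev, bp):
        model.learn_one(x, y)
    models.append(model)
    prev = bp

x_last = {f"lag_{k}": float(consumption[-k]) for k in range(1, 8)}
print("next-step forecast:", models[-1].predict_one(x_last))
```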

PyGraft: Configurable Generation of Schemas and Knowledge Graphs at Your Fingertips

  • paper_url: http://arxiv.org/abs/2309.03685
  • repo_url: https://github.com/nicolas-hbt/pygraft
  • paper_authors: Nicolas Hubert, Pierre Monnin, Mathieu d’Aquin, Armelle Brun, Davy Monticolo
  • for: This work provides a tool for generating customized schemas and knowledge graphs (KGs) so that graph-based machine learning (ML) models can be benchmarked on a more diverse array of resources.
  • methods: The authors develop PyGraft, a Python-based tool that generates highly customized, domain-agnostic schemas and KGs of varying characteristics and scale, with the logical consistency of the generated resources ensured by a description logic (DL) reasoner.
  • results: The KGs generated with PyGraft enable a broader and more holistic evaluation of graph-based ML models and of KG processing in general, going beyond the limited collection of existing benchmarks.
    Abstract Knowledge graphs (KGs) have emerged as a prominent data representation and management paradigm. Being usually underpinned by a schema (e.g. an ontology), KGs capture not only factual information but also contextual knowledge. In some tasks, a few KGs established themselves as standard benchmarks. However, recent works outline that relying on a limited collection of datasets is not sufficient to assess the generalization capability of an approach. In some data-sensitive fields such as education or medicine, access to public datasets is even more limited. To remedy the aforementioned issues, we release PyGraft, a Python-based tool that generates highly customized, domain-agnostic schemas and knowledge graphs. The synthesized schemas encompass various RDFS and OWL constructs, while the synthesized KGs emulate the characteristics and scale of real-world KGs. Logical consistency of the generated resources is ultimately ensured by running a description logic (DL) reasoner. By providing a way of generating both a schema and KG in a single pipeline, PyGraft's aim is to empower the generation of a more diverse array of KGs for benchmarking novel approaches in areas such as graph-based machine learning (ML), or more generally KG processing. In graph-based ML in particular, this should foster a more holistic evaluation of model performance and generalization capability, thereby going beyond the limited collection of available benchmarks. PyGraft is available at: https://github.com/nicolas-hbt/pygraft.

Dataset Generation and Bonobo Classification from Weakly Labelled Videos

  • paper_url: http://arxiv.org/abs/2309.03671
  • repo_url: None
  • paper_authors: Pierre-Etienne Martin
  • for: This work develops a bonobo detection and classification pipeline built from commonly used machine learning methods, motivated by the need to test bonobos in their enclosure using touch-screen devices without human assistance.
  • methods: The work introduces a newly acquired, weakly labelled dataset of semi-automatically generated bonobo recordings, and investigates handcrafted features combined with different classification algorithms as well as deep learning methods based on a ResNet architecture for bonobo identification.
  • results: With a meaningful split of the data, the best classification performance is obtained by a fine-tuned ResNet model at 75% accuracy; the work also demonstrates the importance of data preparation and shows how an incorrect data split can lead to deceptively good results.
    Abstract This paper presents a bonobo detection and classification pipeline built from the commonly used machine learning methods. Such application is motivated by the need to test bonobos in their enclosure using touch screen devices without human assistance. This work introduces a newly acquired dataset based on bonobo recordings generated semi-automatically. The recordings are weakly labelled and fed to a macaque detector in order to spatially detect the individual present in the video. Handcrafted features coupled with different classification algorithms and deep-learning methods using a ResNet architecture are investigated for bonobo identification. Performance is compared in terms of classification accuracy on the splits of the database using different data separation methods. We demonstrate the importance of data preparation and how a wrong data separation can lead to false good results. Finally, after a meaningful separation of the data, the best classification performance is obtained using a fine-tuned ResNet model and reaches 75% of accuracy.
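A minimal sketch of the kind of fine-tuning setup the paper reports as strongest (a fine-tuned ResNet), using torchvision; the dataset path, number of identities, and hyperparameters are placeholders, not the paper's configuration.

```python
# Fine-tuning a torchvision ResNet for individual identification; paths and sizes are placeholders.
import torch
import torch.nn as nn
from torchvision import models, datasets, transforms

n_individuals = 5                                     # placeholder: number of bonobos to identify
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("bonobo_crops/train", transform=tfm)   # hypothetical layout
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, n_individuals)               # new identity head
opt = torch.optim.Adam(model.parameters(), lr=1e-4)                     # fine-tune all layers

model.train()
for images, labels in loader:
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    opt.step()
```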

How adversarial attacks can disrupt seemingly stable accurate classifiers

  • paper_url: http://arxiv.org/abs/2309.03665
  • repo_url: None
  • paper_authors: Oliver J. Sutton, Qinghua Zhou, Ivan Y. Tyukin, Alexander N. Gorban, Alexander Bastounis, Desmond J. Higham
  • for: This paper studies adversarial attacks, in which a seemingly inconsequential modification of the input data dramatically changes the output of an otherwise accurate learning system.
  • methods: The paper introduces a simple, generic and generalisable framework in which the key behaviours observed in practical systems arise with high probability, notably the simultaneous susceptibility to small, easily constructed adversarial perturbations and robustness to large random perturbations of the input.
  • results: The results show that even models robust to large additive random noise remain susceptible to small, easily constructed adversarial perturbations, and that small margins between the decision surface and the training and testing data can hide this susceptibility from detection via randomly sampled perturbations; adding random noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.
    Abstract Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.
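A toy illustration of the paper's central observation: on a trained classifier, a large random perturbation typically changes few predictions, while a much smaller perturbation aligned with the loss gradient degrades accuracy sharply. The synthetic data, architecture, and perturbation sizes are arbitrary stand-ins chosen to make the contrast visible, not the paper's experimental setup.

```python
# Contrast between large random noise and a small gradient-aligned perturbation on a toy model.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d = 200
x = torch.randn(512, d)
y = (x[:, 0] > 0).long()                              # simple ground-truth rule in one coordinate
model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(300):                                  # fit an (apparently) accurate classifier
    opt.zero_grad(); F.cross_entropy(model(x), y).backward(); opt.step()

def accuracy(inputs):
    with torch.no_grad():
        return (model(inputs).argmax(dim=1) == y).float().mean().item()

x_adv = x.clone().requires_grad_(True)
F.cross_entropy(model(x_adv), y).backward()
g = x_adv.grad
adv_dir = g / (g.norm(dim=1, keepdim=True) + 1e-12)   # per-sample unit vector along the loss gradient
noise = F.normalize(torch.randn_like(x), dim=1)       # per-sample unit random direction

print("clean               :", accuracy(x))
print("random, norm 5.0    :", accuracy(x + 5.0 * noise))
print("adversarial, norm 1.0:", accuracy(x + 1.0 * adv_dir))
```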

Towards Comparable Knowledge Distillation in Semantic Image Segmentation

  • paper_url: http://arxiv.org/abs/2309.03659
  • repo_url: None
  • paper_authors: Onno Niemann, Christopher Vox, Thorben Werner
  • for: This work addresses knowledge distillation (KD) as a solution to large model sizes and slow inference in semantic segmentation, and aims to make comparisons between distillation methods fair and reproducible.
  • methods: The study surveys 25 proposed distillation loss terms from 14 publications of the last 4 years and re-evaluates widely accepted frameworks under sufficiently tuned hyperparameters.
  • results: Using the same models and dataset, Structural and Statistical Texture Distillation (SSTKD) reports a student mIoU increase of 4.54 and a final performance of 29.19, while Adaptive Perspective Distillation (APD) improves the student by only 2.06 percentage points yet reaches a final performance of 39.25; such extreme differences usually stem from suboptimal hyperparameters and an underperforming reference student. The authors show that the distillation improvements of the SKD and IFVD frameworks vanish once hyperparameters are tuned sufficiently, establish a solid baseline for three datasets and two student models with extensive hyperparameter-tuning information, and find that only two out of eight techniques can compete with their simple baseline on the ADE20K dataset.
    Abstract Knowledge Distillation (KD) is one proposed solution to large model sizes and slow inference speed in semantic segmentation. In our research we identify 25 proposed distillation loss terms from 14 publications in the last 4 years. Unfortunately, a comparison of terms based on published results is often impossible, because of differences in training configurations. A good illustration of this problem is the comparison of two publications from 2022. Using the same models and dataset, Structural and Statistical Texture Distillation (SSTKD) reports an increase of student mIoU of 4.54 and a final performance of 29.19, while Adaptive Perspective Distillation (APD) only improves student performance by 2.06 percentage points, but achieves a final performance of 39.25. The reason for such extreme differences is often a suboptimal choice of hyperparameters and a resulting underperformance of the student model used as reference point. In our work, we reveal problems of insufficient hyperparameter tuning by showing that distillation improvements of two widely accepted frameworks, SKD and IFVD, vanish when hyperparameters are optimized sufficiently. To improve comparability of future research in the field, we establish a solid baseline for three datasets and two student models and provide extensive information on hyperparameter tuning. We find that only two out of eight techniques can compete with our simple baseline on the ADE20K dataset.
    摘要 知识蒸馏(KD)是解决语义分割中模型体量大、推理速度慢问题的一种方案。在我们的研究中,我们从最近 4 年的 14 篇论文中整理出 25 种被提出的蒸馏损失项。遗憾的是,由于各论文的训练配置不同,基于已发表结果对这些损失项进行比较往往是不可能的。2022 年的两篇论文很好地说明了这个问题:在使用相同模型和数据集的情况下,结构与统计纹理蒸馏(SSTKD)报告学生 mIoU 提升 4.54、最终性能为 29.19,而自适应视角蒸馏(APD)仅将学生性能提升 2.06 个百分点,最终性能却达到 39.25。造成如此悬殊差异的原因通常是超参数选择不当,导致作为参照的学生模型表现偏低。在我们的工作中,我们通过展示 SKD 和 IFVD 这两个被广泛接受的框架带来的蒸馏改进在超参数充分优化后会消失,揭示了超参数调优不足的问题。为了提高该领域未来研究的可比性,我们为三个数据集和两个学生模型建立了可靠的基线,并提供了详尽的超参数调优信息。我们发现,在 ADE20K 数据集上,八种技术中只有两种能够与我们的简单基线相竞争。
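For reference, the kind of term these distillation losses build on is a per-pixel KL divergence between temperature-softened teacher and student class distributions. The sketch below is a generic, response-based baseline under assumed logit shapes, not any of the specific SSTKD/APD/SKD/IFVD losses from the paper.

```python
import torch
import torch.nn.functional as F

def pixelwise_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Generic response-based distillation term for semantic segmentation.

    Both logit tensors are assumed to have shape (N, C, H, W). The loss is the KL
    divergence between temperature-softened class distributions, averaged over all
    pixels and rescaled by T^2, as is conventional for distillation.
    """
    s = F.log_softmax(student_logits / temperature, dim=1)
    t = F.softmax(teacher_logits / temperature, dim=1)
    kl = F.kl_div(s, t, reduction="none").sum(dim=1)  # per-pixel KL, shape (N, H, W)
    return kl.mean() * temperature ** 2

# Toy usage with random logits for a 19-class segmentation task.
student = torch.randn(2, 19, 64, 64)
teacher = torch.randn(2, 19, 64, 64)
print(pixelwise_kd_loss(student, teacher).item())
```

The paper's point is that the hyperparameters of the undistilled student baseline (and the weight given to a term like this) matter more than the term itself, so any fair comparison has to tune the plain baseline as carefully as the distilled runs.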

Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection

  • paper_url: http://arxiv.org/abs/2309.03652
  • repo_url: https://github.com/mic-dkfz/anatomy_informed_da
  • paper_authors: Balint Kovacs, Nils Netzer, Michael Baumgartner, Carolin Eith, Dimitrios Bounias, Clara Meinzer, Paul F. Jaeger, Kevin S. Zhang, Ralf Floca, Adrian Schrader, Fabian Isensee, Regula Gnirs, Magdalena Goertz, Viktoria Schuetz, Albrecht Stenzinger, Markus Hohenfellner, Heinz-Peter Schlemmer, Ivo Wolf, David Bonekamp, Klaus H. Maier-Hein
  • for: 这篇研究旨在改进医学影像分析中的数据增强(DA)方法,以提高磁共振影像上前列腺癌(PCa)检测的精度。
  • methods: 本研究提出了一种新的基于解剖学信息的增强变换,利用邻近器官的信息来模拟前列腺典型的生理形变,在不改变标签的情况下生成独特的病灶形状;该增强方法计算开销很小,可以方便地集成到常见的 DA 框架中。
  • results: 本研究在 774 例活检确诊的检查上,以不同的增强设置评估了一种最先进的 PCa 检测方法,结果显示这种新的增强方法能够提升 PCa 检测的效果。
    Abstract Data augmentation (DA) is a key factor in medical image analysis, such as in prostate cancer (PCa) detection on magnetic resonance images. State-of-the-art computer-aided diagnosis systems still rely on simplistic spatial transformations to preserve the pathological label post transformation. However, such augmentations do not substantially increase the organ as well as tumor shape variability in the training set, limiting the model's ability to generalize to unseen cases with more diverse localized soft-tissue deformations. We propose a new anatomy-informed transformation that leverages information from adjacent organs to simulate typical physiological deformations of the prostate and generates unique lesion shapes without altering their label. Due to its lightweight computational requirements, it can be easily integrated into common DA frameworks. We demonstrate the effectiveness of our augmentation on a dataset of 774 biopsy-confirmed examinations, by evaluating a state-of-the-art method for PCa detection with different augmentation settings.
    摘要 数据增强(DA)是医学影像分析中的关键因素,例如在磁共振影像上进行前列腺癌(PCa)检测。目前最先进的计算机辅助诊断系统仍然依赖简单的空间变换,以保证变换后病理标签不变。然而,这类增强并不能显著增加训练集中器官及肿瘤形状的多样性,限制了模型对具有更多样局部软组织形变的未见样本的泛化能力。我们提出了一种新的基于解剖学信息的变换,利用邻近器官的信息来模拟前列腺的典型生理形变,在不改变标签的情况下生成独特的病灶形状。由于其计算开销很小,它可以方便地集成到常见的 DA 框架中。我们在 774 例活检确诊的检查上,通过以不同增强设置评估一种最先进的 PCa 检测方法,证明了该增强的有效性。

Learning of Generalizable and Interpretable Knowledge in Grid-Based Reinforcement Learning Environments

  • paper_url: http://arxiv.org/abs/2309.03651
  • repo_url: https://github.com/manueleberhardinger/ec-rl
  • paper_authors: Manuel Eberhardinger, Johannes Maucher, Setareh Maghsudi
  • for: 本文旨在理解深度强化学习训练的 Agent 之间的交互,以便在游戏或真实世界中部署 Agent。在游戏中,不合理的行为会让玩家感到困惑。在真实世界中,这种效果更加严重,因为不期望的行为可能会导致严重和长期的后果。
  • methods: 本文使用程序生成来模拟强化学习策略,以便更好地理解 Agent 的行为。程序具有可读性和可验证性,可以帮助我们更好地理解 Agent 学习的概念。我们使用 DreamCoder 系统,这是目前最佳的程序生成系统,在网格环境中进行学习概念,包括导航任务和两个小型的 Atari 游戏 Space Invaders 和 Asterix。
  • results: 我们通过观察生成的库来理解 Agent 学习的概念,并通过视觉化 Agent 决策过程来更好地理解 Agent 的行为。我们使用不同类型的程序生成器,包括搜索方法、神经网络引导搜索和语言模型精心调整代码,来评估我们的方法。
    Abstract Understanding the interactions of agents trained with deep reinforcement learning is crucial for deploying agents in games or the real world. In the former, unreasonable actions confuse players. In the latter, that effect is even more significant, as unexpected behavior cause accidents with potentially grave and long-lasting consequences for the involved individuals. In this work, we propose using program synthesis to imitate reinforcement learning policies after seeing a trajectory of the action sequence. Programs have the advantage that they are inherently interpretable and verifiable for correctness. We adapt the state-of-the-art program synthesis system DreamCoder for learning concepts in grid-based environments, specifically, a navigation task and two miniature versions of Atari games, Space Invaders and Asterix. By inspecting the generated libraries, we can make inferences about the concepts the black-box agent has learned and better understand the agent's behavior. We achieve the same by visualizing the agent's decision-making process for the imitated sequences. We evaluate our approach with different types of program synthesizers based on a search-only method, a neural-guided search, and a language model fine-tuned on code.
    摘要 理解通过深度强化学习训练的智能体之间的交互,对于将其部署到游戏或现实世界中至关重要。在前者中,不合理的行为会让玩家感到困惑;在后者中,这种影响更为严重,因为意外的行为可能造成事故,给相关人员带来严重而持久的后果。在这项工作中,我们提出在观察到动作序列的轨迹之后,使用程序合成来模仿强化学习策略。程序的优势在于它们本质上是可解释且可验证正确性的。我们将最先进的程序合成系统 DreamCoder 适配到网格环境中的概念学习,具体包括一个导航任务以及 Space Invaders 和 Asterix 两个小型 Atari 游戏。通过检查生成的程序库,我们可以推断黑盒智能体学到的概念,从而更好地理解其行为;我们还通过可视化智能体在被模仿序列上的决策过程达到同样的目的。我们使用不同类型的程序合成器来评估该方法,包括纯搜索方法、神经网络引导的搜索以及在代码上微调的语言模型。

Large-Scale Automatic Audiobook Creation

  • paper_url: http://arxiv.org/abs/2309.03926
  • repo_url: None
  • paper_authors: Brendan Walsh, Mark Hamilton, Greg Newby, Xi Wang, Serena Ruan, Sheng Zhao, Lei He, Shaofei Zhang, Eric Dettinger, William T. Freeman, Markus Weimer
  • for: This paper aims to improve the accessibility and engagement of literature by automatically generating high-quality audiobooks from online e-books.
  • methods: The authors use recent advances in neural text-to-speech to create and release thousands of human-quality, open-license audiobooks from the Project Gutenberg e-book collection. They identify the proper subset of e-book content to read and can operate on hundreds of books in parallel, allowing users to customize the speaking speed, style, and emotional intonation of the audiobooks.
  • results: The authors contributed over five thousand open-license audiobooks and an interactive demo that allows users to quickly create their own customized audiobooks. To listen to the audiobook collection, visit \url{https://aka.ms/audiobook}.
    Abstract An audiobook can dramatically improve a work of literature's accessibility and improve reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to create and release thousands of human-quality, open-license audiobooks from the Project Gutenberg e-book collection. Our method can identify the proper subset of e-book content to read for a wide collection of diversely structured books and can operate on hundreds of books in parallel. Our system allows users to customize an audiobook's speaking speed and style, emotional intonation, and can even match a desired voice using a small amount of sample audio. This work contributed over five thousand open-license audiobooks and an interactive demo that allows users to quickly create their own customized audiobooks. To listen to the audiobook collection visit \url{https://aka.ms/audiobook}.
    摘要 有声书可以大幅提升文学作品的可及性和读者的参与度。然而,制作、编辑和发布一本有声书可能需要数百小时的人力。在这项工作中,我们介绍了一个能够从在线电子书自动生成高质量有声书的系统。特别地,我们利用神经文本转语音技术的最新进展,从 Project Gutenberg 电子书集中创建并发布了数千本达到人类朗读质量的开放许可有声书。我们的方法能够为结构各异的大量书籍确定应朗读的电子书内容子集,并可并行处理数百本书。该系统允许用户自定义有声书的语速、风格和情感语调,甚至可以仅凭少量示例音频来匹配目标音色。这项工作贡献了五千多本开放许可有声书,以及一个让用户快速创建自定义有声书的交互式演示。要收听该有声书集,请访问 \url{https://aka.ms/audiobook}。

Promoting Fairness in GNNs: A Characterization of Stability

  • paper_url: http://arxiv.org/abs/2309.03648
  • repo_url: None
  • paper_authors: Yaning Jia, Chunhui Zhang
  • for: 本研究旨在提出一种用于稳定 Graph Neural Networks(GNN)输出的方法,以满足在非欧几何数据上进行公平训练。
  • methods: 本研究使用了 Lipschitz 约束来限制 GNN 输出变动,并对输出变动进行分析,以确定输出变动的最大值。
  • results: 研究表明,使用 Lipschitz 约束可以有效地限制 GNN 输出变动,并且可以在训练过程中更好地平衡准确性和公平性。
    Abstract The Lipschitz bound, a technique from robust statistics, can limit the maximum changes in the output with respect to the input, taking into account associated irrelevant biased factors. It is an efficient and provable method for examining the output stability of machine learning models without incurring additional computation costs. Recently, Graph Neural Networks (GNNs), which operate on non-Euclidean data, have gained significant attention. However, no previous research has investigated the GNN Lipschitz bounds to shed light on stabilizing model outputs, especially when working on non-Euclidean data with inherent biases. Given the inherent biases in common graph data used for GNN training, it poses a serious challenge to constraining the GNN output perturbations induced by input biases, thereby safeguarding fairness during training. Recently, despite the Lipschitz constant's use in controlling the stability of Euclidean neural networks, the calculation of the precise Lipschitz constant remains elusive for non-Euclidean neural networks like GNNs, especially within fairness contexts. To narrow this gap, we begin with the general GNNs operating on an attributed graph, and formulate a Lipschitz bound to limit the changes in the output regarding biases associated with the input. Additionally, we theoretically analyze how the Lipschitz constant of a GNN model could constrain the output perturbations induced by biases learned from data for fairness training. We experimentally validate the Lipschitz bound's effectiveness in limiting biases of the model output. Finally, from a training dynamics perspective, we demonstrate why the theoretical Lipschitz bound can effectively guide the GNN training to better trade-off between accuracy and fairness.
    摘要 利普希茨界是一种源自稳健统计的技术,可以在考虑相关无关偏差因素的情况下,限制输出相对于输入的最大变化。它是一种高效且可证明的方法,能够在不增加额外计算成本的前提下检查机器学习模型输出的稳定性。最近,处理非欧几里得数据的图神经网络(GNN)受到了广泛关注。然而,此前没有研究探讨 GNN 的利普希茨界,以说明如何稳定模型输出,尤其是在带有固有偏差的非欧几里得数据上。鉴于 GNN 训练常用的图数据本身存在偏差,如何约束由输入偏差引起的 GNN 输出扰动、从而在训练中保障公平性,成为一个严峻的挑战。尽管利普希茨常数已被用于控制欧几里得神经网络的稳定性,但对于 GNN 这类非欧几里得神经网络,尤其是在公平性场景下,其精确利普希茨常数的计算仍是未解难题。为缩小这一差距,我们从作用于属性图的一般 GNN 出发,推导出一个利普希茨界,以限制输出相对于输入相关偏差的变化。此外,我们从理论上分析了 GNN 模型的利普希茨常数如何约束由数据中学习到的偏差引起的输出扰动,从而服务于公平性训练。我们通过实验验证了该利普希茨界在限制模型输出偏差方面的有效性。最后,从训练动力学的角度,我们说明了为什么该理论利普希茨界能够有效引导 GNN 训练,在准确性与公平性之间取得更好的平衡。
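A concrete ingredient of such bounds is the operator norm of each learned weight matrix: a single graph-convolution layer of the standard form X -> sigma(A_hat X W) is Lipschitz in X with constant at most ||A_hat||_2 * ||W||_2 for 1-Lipschitz activations. The power-iteration sketch below estimates those spectral norms and multiplies them into a (loose) per-model bound; it is an illustrative building block under that assumed layer form, not the paper's exact bound.

```python
import numpy as np

def spectral_norm(m, iters=100, seed=0):
    """Estimate the largest singular value of a matrix by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=m.shape[1])
    for _ in range(iters):
        u = m @ v
        u /= np.linalg.norm(u)
        v = m.T @ u
        v /= np.linalg.norm(v)
    return float(u @ (m @ v))

# Toy two-layer GCN weights and a stand-in normalized adjacency matrix.
rng = np.random.default_rng(1)
A_hat = np.eye(5) * 0.5 + 0.1
W1, W2 = rng.normal(size=(16, 32)), rng.normal(size=(32, 7))

# Product of per-layer constants: an upper bound on output change per unit
# change in the input features, assuming 1-Lipschitz activations.
bound = spectral_norm(A_hat) ** 2 * spectral_norm(W1) * spectral_norm(W2)
print("Lipschitz upper bound w.r.t. input features:", bound)
```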

VideolandGPT: A User Study on a Conversational Recommender System

  • paper_url: http://arxiv.org/abs/2309.03645
  • repo_url: None
  • paper_authors: Mateo Gutierrez Granada, Dina Zilbershtein, Daan Odijk, Francesco Barile
  • for: 这个论文探讨了如何使用大语言模型(LLMs)提高推荐系统,特别是基于对话的推荐系统,该系统利用用户偏好和个性化候选选择来优化推荐结果。
  • methods: 该论文提出了一种基于ChatGPT的视频在线推荐系统,称为VideolandGPT,该系统使用ChatGPT选择 predetermined 集合中的内容,考虑用户与对话界面的互动提供的额外上下文。
  • results: 我们在用户研究中对两个版本的系统进行了比较,一个是个性化版本,另一个是非个性化版本。结果显示个性化版本在准确性和总体用户满意度方面表现出色,而两个版本都提高了不在推荐列表的ITEMS的可见性。然而,两个版本在公平性方面存在不一致的行为,系统可能生成不在Videoland上的推荐。
    Abstract This paper investigates how large language models (LLMs) can enhance recommender systems, with a specific focus on Conversational Recommender Systems that leverage user preferences and personalised candidate selections from existing ranking models. We introduce VideolandGPT, a recommender system for a Video-on-Demand (VOD) platform, Videoland, which uses ChatGPT to select from a predetermined set of contents, considering the additional context indicated by users' interactions with a chat interface. We evaluate ranking metrics, user experience, and fairness of recommendations, comparing a personalised and a non-personalised version of the system, in a between-subject user study. Our results indicate that the personalised version outperforms the non-personalised in terms of accuracy and general user satisfaction, while both versions increase the visibility of items which are not in the top of the recommendation lists. However, both versions present inconsistent behavior in terms of fairness, as the system may generate recommendations which are not available on Videoland.
    摘要

Beyond XAI:Obstacles Towards Responsible AI

  • paper_url: http://arxiv.org/abs/2309.03638
  • repo_url: None
  • paper_authors: Yulu Pi
  • for: 这篇论文主要是为了探讨Explainable Artificial Intelligence(XAI)领域的发展,并提出了一些用于使AI系统更加透明和理解的技术。
  • methods: 本论文使用了一些现有的解释性技术,并评估了这些技术在实际应用中的局限性。
  • results: 本论文发现了许多解释性技术和评估策略在实际应用中存在一些限制,并讨论了这些限制对负责任AI的扩展发展的影响。
    Abstract The rapidly advancing domain of Explainable Artificial Intelligence (XAI) has sparked significant interest in developing techniques to make AI systems more transparent and understandable. Nevertheless, in real-world contexts, the methods of explainability and their evaluation strategies present numerous limitations. Moreover, the scope of responsible AI extends beyond explainability alone. In this paper, we explore these limitations and discuss their implications in a broader context of responsible AI, considering other important aspects including privacy, fairness and contestability.
    摘要 快速发展的可解释人工智能(XAI)领域引起了广泛的关注,旨在开发使人工智能系统更加透明、更易理解的技术。然而,在实际应用场景中,可解释性方法及其评估策略存在诸多局限。此外,负责任人工智能的范围不仅限于可解释性。在本文中,我们探讨这些局限,并在更广泛的负责任人工智能框架下,结合隐私、公平性和可申诉性等重要方面,讨论它们的意义。

NeuroCodeBench: a plain C neural network benchmark for software verification

  • paper_url: http://arxiv.org/abs/2309.03617
  • repo_url: None
  • paper_authors: Edoardo Manino, Rafael Sá Menezes, Fedor Shmarov, Lucas C. Cordeiro
  • for: 这篇论文旨在为含有神经网络组件的安全关键系统的验证提供基准,以支持对其给出强有力的保证。
  • methods: 论文构建了用纯 C 语言编写的神经网络代码验证基准 NeuroCodeBench,包含 32 个神经网络和 607 条安全属性。
  • results: 初步评估表明,现有的软件验证工具难以对这些神经网络实现给出正确的判定。
    Abstract Safety-critical systems with neural network components require strong guarantees. While existing neural network verification techniques have shown great progress towards this goal, they cannot prove the absence of software faults in the network implementation. This paper presents NeuroCodeBench - a verification benchmark for neural network code written in plain C. It contains 32 neural networks with 607 safety properties divided into 6 categories: maths library, activation functions, error-correcting networks, transfer function approximation, probability density estimation and reinforcement learning. Our preliminary evaluation shows that state-of-the-art software verifiers struggle to provide correct verdicts, due to their incomplete support of the standard C mathematical library and the complexity of larger neural networks.
    摘要 含有神经网络组件的安全关键系统需要强有力的保证。现有的神经网络验证技术已朝这一目标取得了很大进展,但它们无法证明网络实现中不存在软件缺陷。本文介绍 NeuroCodeBench——一个面向纯 C 语言神经网络代码的验证基准。它包含 32 个神经网络和 607 条安全属性,分为 6 个类别:数学库、激活函数、纠错网络、传递函数逼近、概率密度估计和强化学习。我们的初步评估表明,由于对标准 C 数学库支持不完整以及较大神经网络的复杂性,目前最先进的软件验证器难以给出正确的判定。

Evaluating ChatGPT as a Recommender System: A Rigorous Approach

  • paper_url: http://arxiv.org/abs/2309.03613
  • repo_url: https://github.com/sisinflab/Recommender-ChatGPT
  • paper_authors: Dario Di Palma, Giovanni Maria Biancofiore, Vito Walter Anelli, Fedelucio Narducci, Tommaso Di Noia, Eugenio Di Sciascio
  • for: 这种研究旨在探索ChatGPT作为零次推荐系统的可能性,以评估其根据用户喜好进行推荐、重新排序现有推荐列表、利用类似用户的信息和冷启动情况下的表现。
  • methods: 该研究使用了MovieLens Small、Last.FM和Facebook Book三个数据集,对ChatGPT的表现进行了广泛的实验,并与标准推荐算法和其他大语言模型进行比较,如GPT-3.5和PaLM-2。用于评估推荐效果的评价指标包括MAP、Recall、Precision、F1、nDCG、Item Coverage、EPC、ACLT和ARP等。
  • results: 研究发现ChatGPT在推荐领域的表现很出色,具有较高的MAP、Recall和Precision值,同时也具有较好的 Item Coverage、EPC、ACLT和ARP值。与标准推荐算法和其他大语言模型进行比较,ChatGPT的表现也很出色。
    Abstract Recent popularity surrounds large AI language models due to their impressive natural language capabilities. They contribute significantly to language-related tasks, including prompt-based learning, making them valuable for various specific tasks. This approach unlocks their full potential, enhancing precision and generalization. Research communities are actively exploring their applications, with ChatGPT receiving recognition. Despite extensive research on large language models, their potential in recommendation scenarios still needs to be explored. This study aims to fill this gap by investigating ChatGPT's capabilities as a zero-shot recommender system. Our goals include evaluating its ability to use user preferences for recommendations, reordering existing recommendation lists, leveraging information from similar users, and handling cold-start situations. We assess ChatGPT's performance through comprehensive experiments using three datasets (MovieLens Small, Last.FM, and Facebook Book). We compare ChatGPT's performance against standard recommendation algorithms and other large language models, such as GPT-3.5 and PaLM-2. To measure recommendation effectiveness, we employ widely-used evaluation metrics like Mean Average Precision (MAP), Recall, Precision, F1, normalized Discounted Cumulative Gain (nDCG), Item Coverage, Expected Popularity Complement (EPC), Average Coverage of Long Tail (ACLT), Average Recommendation Popularity (ARP), and Popularity-based Ranking-based Equal Opportunity (PopREO). Through thoroughly exploring ChatGPT's abilities in recommender systems, our study aims to contribute to the growing body of research on the versatility and potential applications of large language models. Our experiment code is available on the GitHub repository: https://github.com/sisinflab/Recommender-ChatGPT
    摘要 近来,大型人工智能语言模型因其出色的自然语言能力而广受关注。它们在语言相关任务(包括基于提示的学习)中贡献显著,因而对各类特定任务都很有价值;这种方式能够充分释放其潜力,提高精度与泛化能力。研究社区正在积极探索其应用,其中 ChatGPT 获得了广泛认可。尽管针对大语言模型的研究已经很多,但其在推荐场景中的潜力仍有待探索。本研究旨在填补这一空白,考察 ChatGPT 作为零样本推荐系统的能力:评估其利用用户偏好进行推荐、对已有推荐列表重新排序、利用相似用户信息以及应对冷启动场景的表现。我们在 MovieLens Small、Last.FM 和 Facebook Book 三个数据集上进行了全面实验,将 ChatGPT 与标准推荐算法以及 GPT-3.5、PaLM-2 等其他大语言模型进行比较,并采用 MAP、Recall、Precision、F1、nDCG、Item Coverage、EPC、ACLT、ARP 和 PopREO 等常用指标衡量推荐效果。实验代码见 GitHub 仓库:https://github.com/sisinflab/Recommender-ChatGPT
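Two of the ranking metrics cited above are straightforward to compute per user. The sketch below gives plain implementations of average precision and nDCG with binary relevance; it is illustrative only and is not the paper's evaluation code, and the example item IDs are hypothetical.

```python
import numpy as np

def average_precision(ranked_items, relevant, k=10):
    """AP@k for one user: mean of the precision values at each relevant hit."""
    hits, precisions = 0, []
    for i, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / i)
    return float(np.mean(precisions)) if precisions else 0.0

def ndcg(ranked_items, relevant, k=10):
    """nDCG@k with binary gains: DCG of the list divided by the ideal DCG."""
    dcg = sum(1.0 / np.log2(i + 1)
              for i, item in enumerate(ranked_items[:k], start=1)
              if item in relevant)
    ideal = sum(1.0 / np.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0

recommended = ["m3", "m7", "m1", "m9", "m4"]   # hypothetical ranked list for one user
ground_truth = {"m1", "m4", "m8"}              # items the user actually interacted with
print(average_precision(recommended, ground_truth), ndcg(recommended, ground_truth))
```

Mean Average Precision (MAP) is then just the mean of `average_precision` over all evaluated users.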

Spatial encoding of BOLD fMRI time series for categorizing static images across visual datasets: A pilot study on human vision

  • paper_url: http://arxiv.org/abs/2309.03590
  • repo_url: https://github.com/kancharlavamshi/Spatial-encoding-of-BOLD-fmri-time-series-for-categorical-static-images-across-visual-dataset
  • paper_authors: Vamshi K. Kancharala, Debanjali Bhattacharya, Neelam Sinha
  • for: 这个研究用于了解人脑如何处理不同复杂度的图像,以便更好地理解视觉功能。
  • methods: 这个研究使用功能磁共振成像(fMRI)时间序列(TS),通过格拉姆角场(GAF)和马尔可夫转移场(MTF)进行空间编码得到二维表示,并使用卷积神经网络(CNN)进行分类。
  • results: 研究发现,在对来自 COCO、ImageNet 和 SUN 三个标准计算机视觉数据集的图像进行分类时,并行 CNN 模型表现出色,多分类精度比其他网络模型提高了 7%。
    Abstract Functional MRI (fMRI) is widely used to examine brain functionality by detecting alteration in oxygenated blood flow that arises with brain activity. In this study, complexity specific image categorization across different visual datasets is performed using fMRI time series (TS) to understand differences in neuronal activities related to vision. Publicly available BOLD5000 dataset is used for this purpose, containing fMRI scans while viewing 5254 images of diverse categories, drawn from three standard computer vision datasets: COCO, ImageNet and SUN. To understand vision, it is important to study how brain functions while looking at different images. To achieve this, spatial encoding of fMRI BOLD TS has been performed that uses classical Gramian Angular Field (GAF) and Markov Transition Field (MTF) to obtain 2D BOLD TS, representing images of COCO, Imagenet and SUN. For classification, individual GAF and MTF features are fed into regular CNN. Subsequently, parallel CNN model is employed that uses combined 2D features for classifying images across COCO, Imagenet and SUN. The result of 2D CNN models is also compared with 1D LSTM and Bi-LSTM that utilizes raw fMRI BOLD signal for classification. It is seen that parallel CNN model outperforms other network models with an improvement of 7% for multi-class classification. Clinical relevance- The obtained result of this analysis establishes a baseline in studying how differently human brain functions while looking at images of diverse complexities.
    摘要 功能磁共振成像(fMRI)通过检测伴随大脑活动而产生的含氧血流变化,被广泛用于研究大脑功能。本研究利用 fMRI 时间序列(TS),对来自不同视觉数据集、复杂度各异的图像进行分类,以理解与视觉相关的神经活动差异。为此使用了公开的 BOLD5000 数据集,其中包含受试者观看 5254 幅图像时的 fMRI 扫描,这些图像取自 COCO、ImageNet 和 SUN 三个标准计算机视觉数据集。要理解视觉,就需要研究大脑在观看不同图像时的工作方式。为此,我们使用经典的格拉姆角场(GAF)和马尔可夫转移场(MTF)对 fMRI BOLD 时间序列进行空间编码,得到表示 COCO、ImageNet 和 SUN 图像的二维 BOLD 表示。分类时,先将各自的 GAF 和 MTF 特征送入普通 CNN;随后采用并行 CNN 模型,利用组合后的二维特征对三个数据集的图像进行分类。我们还将二维 CNN 模型的结果与直接利用原始 fMRI BOLD 信号的一维 LSTM 和 Bi-LSTM 进行了比较。结果显示,并行 CNN 模型优于其他网络模型,多分类精度提升 7%。临床意义:该分析结果为研究人脑在观看不同复杂度图像时的差异性工作方式建立了基线。
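The Gramian Angular Field encoding used here maps a 1-D BOLD time series to a 2-D image: the series is rescaled to [-1, 1], each value is read as the cosine of an angle, and entry (i, j) of the summation field is cos(phi_i + phi_j). A minimal numpy version is sketched below (the `pyts` library offers an equivalent transformer); the toy signal is purely illustrative.

```python
import numpy as np

def gramian_angular_field(ts):
    """Gramian Angular Summation Field of a 1-D time series."""
    ts = np.asarray(ts, dtype=float)
    lo, hi = ts.min(), ts.max()
    # Rescale to [-1, 1] so arccos is defined; constant series map to zeros.
    x = 2 * (ts - lo) / (hi - lo) - 1 if hi > lo else np.zeros_like(ts)
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # GASF[i, j] = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])

bold_ts = np.sin(np.linspace(0, 6, 50)) + 0.1 * np.random.randn(50)  # toy BOLD signal
image = gramian_angular_field(bold_ts)
print(image.shape)  # (50, 50): can be stacked with an MTF image and fed to a CNN
```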

Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning

  • paper_url: http://arxiv.org/abs/2309.03581
  • repo_url: https://github.com/automl/interactive-mo-ml
  • paper_authors: Joseph Giovanelli, Alexander Tornede, Tanja Tornede, Marius Lindauer
  • for: 本文主要用于解决多目标机器学习(MO-ML)中的超参数优化问题,即在多个目标之间找到最佳的超参数配置。
  • methods: 本文提出了一种人类中心的交互式超参数优化方法,利用喜好学习提取用户需求,而不是让用户手动选择合适的指标。
  • results: 实验研究表明,与基于用户预先误选的指标进行优化相比,该方法能够得到明显更好的 Pareto 前沿;而当高级用户知道应选择哪个指标时,该方法的表现与之相当。
    Abstract Hyperparameter optimization (HPO) is important to leverage the full potential of machine learning (ML). In practice, users are often interested in multi-objective (MO) problems, i.e., optimizing potentially conflicting objectives, like accuracy and energy consumption. To tackle this, the vast majority of MO-ML algorithms return a Pareto front of non-dominated machine learning models to the user. Optimizing the hyperparameters of such algorithms is non-trivial as evaluating a hyperparameter configuration entails evaluating the quality of the resulting Pareto front. In literature, there are known indicators that assess the quality of a Pareto front (e.g., hypervolume, R2) by quantifying different properties (e.g., volume, proximity to a reference point). However, choosing the indicator that leads to the desired Pareto front might be a hard task for a user. In this paper, we propose a human-centered interactive HPO approach tailored towards multi-objective ML leveraging preference learning to extract desiderata from users that guide the optimization. Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator. Concretely, we leverage pairwise comparisons of distinct Pareto fronts to learn such an appropriate quality indicator. Then, we optimize the hyperparameters of the underlying MO-ML algorithm towards this learned indicator using a state-of-the-art HPO approach. In an experimental study targeting the environmental impact of ML, we demonstrate that our approach leads to substantially better Pareto fronts compared to optimizing based on a wrong indicator pre-selected by the user, and performs comparable in the case of an advanced user knowing which indicator to pick.
    摘要 在这篇文章中,我们提出了一种面向多目标机器学习、以人为中心的交互式超参数优化方法,利用偏好学习从用户处提取需求来引导优化。不同于依赖用户猜测最适合其需求的指标,我们的方法会自动学习一个合适的指标。具体来说,我们利用对不同 Pareto 前沿的成对比较来学习这一质量指标,然后使用最先进的超参数优化方法,针对该学习到的指标优化底层多目标机器学习算法的超参数。在一项关注机器学习环境影响的实验研究中,我们表明,与基于用户预先误选的指标进行优化相比,我们的方法能得到明显更好的 Pareto 前沿;而在高级用户知道应选择哪个指标的情况下,其表现与之相当。
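One of the quality indicators mentioned in the abstract, the hypervolume, measures the region dominated by a Pareto front relative to a reference point. The helper below computes it for a bi-objective minimization problem; it is a self-contained illustration of the indicator, not the paper's preference-learning code, and the example objectives are hypothetical.

```python
def hypervolume_2d(front, reference):
    """Hypervolume of a 2-D Pareto front for minimization.

    `front` is a list of (f1, f2) points assumed mutually non-dominated;
    `reference` is a point dominated by all of them (e.g. worst-case values).
    """
    rx, ry = reference
    pts = sorted(front)                 # ascending in f1, hence descending in f2
    hv, prev_y = 0.0, ry
    for x, y in pts:
        hv += (rx - x) * (prev_y - y)   # rectangle contributed by this point
        prev_y = y
    return hv

# Example trade-off: error rate vs. energy consumption, both to be minimized.
pareto_front = [(0.10, 9.0), (0.15, 6.0), (0.25, 4.0)]
print(hypervolume_2d(pareto_front, reference=(1.0, 10.0)))
```

In the interactive setting described above, the role of such a fixed indicator is taken over by a quality measure learned from the user's pairwise comparisons of fronts.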

DTW+S: Shape-based Comparison of Time-series with Ordered Local Trend

  • paper_url: http://arxiv.org/abs/2309.03579
  • repo_url: https://github.com/scc-usc/DTW_S_apps
  • paper_authors: Ajitesh Srivastava
  • for: 本研究旨在开发一种可以识别时间序列数据中相似的趋势的度量方法,用于应用领域中的分类和归类。
  • methods: 本研究提出 DTW+S 方法,该方法首先将时间序列转换为可解释的、保持“相近性”的矩阵表示,其中每一列表示局部趋势,然后应用动态时间规整(DTW)计算这些矩阵之间的距离。
  • results: 研究表明,DTW+S 方法能更好地识别时间序列中相似的趋势,特别是在局部趋势比幅度更具决定性的情况下;此外,DTW+S 在集成构建和流行病曲线聚类中也能取得更好的结果。
    Abstract Measuring distance or similarity between time-series data is a fundamental aspect of many applications including classification and clustering. Existing measures may fail to capture similarities due to local trends (shapes) and may even produce misleading results. Our goal is to develop a measure that looks for similar trends occurring around similar times and is easily interpretable for researchers in applied domains. This is particularly useful for applications where time-series have a sequence of meaningful local trends that are ordered, such as in epidemics (a surge to an increase to a peak to a decrease). We propose a novel measure, DTW+S, which creates an interpretable "closeness-preserving" matrix representation of the time-series, where each column represents local trends, and then it applies Dynamic Time Warping to compute distances between these matrices. We present a theoretical analysis that supports the choice of this representation. We demonstrate the utility of DTW+S in ensemble building and clustering of epidemic curves. We also demonstrate that our approach results in better classification compared to Dynamic Time Warping for a class of datasets, particularly when local trends rather than scale play a decisive role.
    摘要 度量时间序列数据之间的距离或相似性是分类、聚类等许多应用的基础。现有度量可能因局部趋势(形状)而无法捕捉相似性,甚至给出误导性的结果。我们的目标是开发一种度量,它寻找在相近时间出现的相似趋势,并且便于应用领域的研究者理解。这对于具有一系列有序且有意义局部趋势的时间序列尤其有用,例如流行病(上升、加速、达到峰值、随后下降)。我们提出了一种新的度量 DTW+S:它先为时间序列构建一个可解释的、保持“相近性”的矩阵表示,其中每一列表示局部趋势,然后应用动态时间规整(DTW)计算这些矩阵之间的距离。我们给出了支持这种表示选择的理论分析,并展示了 DTW+S 在流行病曲线的集成构建与聚类中的实用性。我们还表明,对于一类局部趋势(而非幅度)起决定性作用的数据集,该方法的分类效果优于动态时间规整。
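The second stage of DTW+S is ordinary dynamic time warping over the columns of the shape-coefficient matrices. The function below is a textbook DTW between two sequences of vectors with Euclidean column distance; it illustrates only the alignment step, not the shape representation itself, and the epidemic-like toy curves are made up.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of vectors.

    `a` has shape (n, d) and `b` has shape (m, d); columns are compared with the
    Euclidean distance and aligned with the classic O(n*m) recurrence.
    """
    a, b = np.atleast_2d(a), np.atleast_2d(b)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Toy example: two surge-shaped curves, one shifted in time.
t = np.linspace(0, 1, 30)
curve1 = np.exp(-((t - 0.40) ** 2) / 0.01)[:, None]
curve2 = np.exp(-((t - 0.55) ** 2) / 0.01)[:, None]
print(dtw_distance(curve1, curve2))
```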

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

  • paper_url: http://arxiv.org/abs/2309.03549
  • repo_url: None
  • paper_authors: Jiaxi Gu, Shicong Wang, Haoyu Zhao, Tianyi Lu, Xing Zhang, Zuxuan Wu, Songcen Xu, Wei Zhang, Yu-Gang Jiang, Hang Xu
  • for: 本研究旨在应用Latent Diffusion Models(LDM)于文本到视频生成,这是一项复杂的挑战,因为模型训练和推断过程中的计算和存储限制。
  • methods: 我们提出了一个名为“Reuse and Diffuse”的框架,称为 $\textit{VidRD}$,用于生成更多的视频帧。以带有少量帧的初始视频片段为条件,通过复用原有的潜在特征并沿用之前的扩散过程,迭代地生成更多的帧。此外,我们在用于像素空间与潜在空间转换的自编码器解码器中加入了时间层,并对这些层进行微调以提高时间一致性。
  • results: 我们的方法在量化和质量评估中都达到了良好的结果。我们的项目页面可以在 $\href{https://anonymous0x233.github.io/ReuseAndDiffuse/}{here}$ 上找到。
    Abstract Inspired by the remarkable success of Latent Diffusion Models (LDMs) for image synthesis, we study LDM for text-to-video generation, which is a formidable challenge due to the computational and memory constraints during both model training and inference. A single LDM is usually only capable of generating a very limited number of video frames. Some existing works focus on separate prediction models for generating more video frames, which suffer from additional training cost and frame-level jittering, however. In this paper, we propose a framework called "Reuse and Diffuse" dubbed $\textit{VidRD}$ to produce more frames following the frames already generated by an LDM. Conditioned on an initial video clip with a small number of frames, additional frames are iteratively generated by reusing the original latent features and following the previous diffusion process. Besides, for the autoencoder used for translation between pixel space and latent space, we inject temporal layers into its decoder and fine-tune these layers for higher temporal consistency. We also propose a set of strategies for composing video-text data that involve diverse content from multiple existing datasets including video datasets for action recognition and image-text datasets. Extensive experiments show that our method achieves good results in both quantitative and qualitative evaluations. Our project page is available $\href{https://anonymous0x233.github.io/ReuseAndDiffuse/}{here}$.
    摘要 受潜在扩散模型(LDM)在图像合成上显著成功的启发,我们研究将 LDM 用于文本到视频生成。由于模型训练和推理过程中的计算与显存限制,这是一项艰巨的挑战:单个 LDM 通常只能生成数量非常有限的视频帧。一些现有工作转而使用单独的预测模型来生成更多的视频帧,但这会带来额外的训练成本和帧间抖动。在本文中,我们提出名为“Reuse and Diffuse”的框架(记作 VidRD),在 LDM 已生成的帧之后继续生成更多的帧:以带有少量帧的初始视频片段为条件,通过复用原有的潜在特征并沿用之前的扩散过程,迭代地生成后续帧。此外,对于在像素空间与潜在空间之间转换的自编码器,我们在其解码器中注入时间层并进行微调,以提高时间一致性。我们还提出了一组组合视频-文本数据的策略,涵盖来自多个现有数据集的多样内容,包括动作识别视频数据集和图文数据集。大量实验表明,我们的方法在定量和定性评估中均取得了良好的结果。项目页面见 $\href{https://anonymous0x233.github.io/ReuseAndDiffuse/}{here}$。

DGC: Training Dynamic Graphs with Spatio-Temporal Non-Uniformity using Graph Partitioning by Chunks

  • paper_url: http://arxiv.org/abs/2309.03523
  • repo_url: None
  • paper_authors: Fahao Chen, Peng Li, Celimuge Wu
  • for: 这个研究旨在提高动态图神经网络(DGNN)的训练效率,建立一个分布式系统来加速DGNN训练。
  • methods: 本研究提出了一种基于图粗化的划分策略,将动态图划分成较小的块,以便更好地把工作负载分配到多个 GPU 上;此外,还提出了块融合与自适应陈旧聚合技术来提高训练效率。
  • results: 实验表明,在测试环境中,与最先进的系统相比,DGC 可以实现 1.25 倍至 7.52 倍的加速;此外,DGC 还拥有高效的运行时,能够快速处理大规模图。
    Abstract Dynamic Graph Neural Network (DGNN) has shown a strong capability of learning dynamic graphs by exploiting both spatial and temporal features. Although DGNN has recently received considerable attention by AI community and various DGNN models have been proposed, building a distributed system for efficient DGNN training is still challenging. It has been well recognized that how to partition the dynamic graph and assign workloads to multiple GPUs plays a critical role in training acceleration. Existing works partition a dynamic graph into snapshots or temporal sequences, which only work well when the graph has uniform spatio-temporal structures. However, dynamic graphs in practice are not uniformly structured, with some snapshots being very dense while others are sparse. To address this issue, we propose DGC, a distributed DGNN training system that achieves a 1.25x - 7.52x speedup over the state-of-the-art in our testbed. DGC's success stems from a new graph partitioning method that partitions dynamic graphs into chunks, which are essentially subgraphs with modest training workloads and few inter connections. This partitioning algorithm is based on graph coarsening, which can run very fast on large graphs. In addition, DGC has a highly efficient run-time, powered by the proposed chunk fusion and adaptive stale aggregation techniques. Extensive experimental results on 3 typical DGNN models and 4 popular dynamic graph datasets are presented to show the effectiveness of DGC.
    摘要 动态图神经网络(DGNN)通过同时利用空间和时间特征,展现了学习动态图的强大能力。尽管 DGNN 近来受到人工智能社区的广泛关注,并已提出多种 DGNN 模型,但构建高效的分布式 DGNN 训练系统仍然具有挑战性。人们普遍认识到,如何划分动态图并将工作负载分配给多个 GPU,对训练加速至关重要。现有工作将动态图划分为快照或时间序列,这只有在图具有均匀的时空结构时才有效;而实际中的动态图并不均匀,有些快照非常稠密,有些则很稀疏。为了解决这个问题,我们提出了分布式 DGNN 训练系统 DGC,在我们的测试环境中相比最先进系统实现了 1.25 倍至 7.52 倍的加速。DGC 的成功源于一种新的图划分方法,它将动态图划分为“块”,即训练负载适中且相互连接很少的子图;该划分算法基于图粗化,在大规模图上可以非常快速地运行。此外,DGC 还拥有高效的运行时,由我们提出的块融合与自适应陈旧聚合技术提供支持。我们在 3 种典型的 DGNN 模型和 4 个常用的动态图数据集上给出了大量实验结果,证明了 DGC 的有效性。

Parameterized Aspects of Distinct Kemeny Rank Aggregation

  • paper_url: http://arxiv.org/abs/2309.03517
  • repo_url: None
  • paper_authors: Koustav De, Harshil Mittal, Palash Dey, Neeldhara Misra
  • for: 本文研究使用凯梅尼(Kemeny)方法进行排名聚合的计算问题,特别是在不同参数下的计算复杂性。
  • methods: 本文在参数化复杂性框架下研究该问题在不同参数下的复杂性,并给出了一系列 FPT 算法。
  • results: 本文发现,在所考虑的各个参数下,都可以用 FPT 算法计算凯梅尼排名,并且可以在不显著增加运行时间的情况下找到任意所需数量的不同凯梅尼排名;此外,还针对这些参数给出了凯梅尼排名聚合的 FPT 近似算法。
    Abstract The Kemeny method is one of the popular tools for rank aggregation. However, computing an optimal Kemeny ranking is NP-hard. Consequently, the computational task of finding a Kemeny ranking has been studied under the lens of parameterized complexity with respect to many parameters. We first present a comprehensive relationship, both theoretical and empirical, among these parameters. Further, we study the problem of computing all distinct Kemeny rankings under the lens of parameterized complexity. We consider the target Kemeny score, number of candidates, average distance of input rankings, maximum range of any candidate, and unanimity width as our parameters. For all these parameters, we already have FPT algorithms. We find that any desirable number of Kemeny rankings can also be found without substantial increase in running time. We also present FPT approximation algorithms for Kemeny rank aggregation with respect to these parameters.
    摘要 凯梅尼(Kemeny)方法是常用的排名聚合工具之一。然而,计算最优的凯梅尼排名是 NP 困难的。因此,寻找凯梅尼排名这一计算任务已针对多种参数在参数化复杂性的视角下得到研究。我们首先从理论和实验两方面给出这些参数之间的完整关系。进一步地,我们在参数化复杂性视角下研究计算所有不同凯梅尼排名的问题,所考虑的参数包括目标凯梅尼得分、候选者数量、输入排名的平均距离、任一候选者的最大跨度以及一致宽度。对于所有这些参数,我们都已有 FPT 算法;我们发现,在不显著增加运行时间的情况下,也可以找到任意所需数量的凯梅尼排名。我们还针对这些参数给出了凯梅尼排名聚合的 FPT 近似算法。
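The Kemeny score of a candidate ranking is the total number of pairwise disagreements (Kendall tau distances) between it and the input rankings; a Kemeny ranking minimizes this sum. The brute-force sketch below only illustrates the objective on a tiny made-up instance; realistic instances require the FPT machinery discussed above.

```python
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of candidate pairs ordered differently by the two rankings."""
    pos1 = {c: i for i, c in enumerate(r1)}
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum((pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
               for a, b in combinations(r1, 2))

def kemeny_score(candidate, votes):
    """Sum of Kendall tau distances from `candidate` to every input ranking."""
    return sum(kendall_tau(candidate, v) for v in votes)

votes = [("a", "b", "c", "d"),
         ("a", "c", "b", "d"),
         ("b", "a", "c", "d")]

# Brute force over all orderings (feasible only for a handful of candidates).
best = min(permutations("abcd"), key=lambda r: kemeny_score(r, votes))
print(best, kemeny_score(best, votes))
```

Enumerating all orderings with the minimum score would, in the same spirit, list all distinct Kemeny rankings on such a toy instance.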

Towards Robust Natural-Looking Mammography Lesion Synthesis on Ipsilateral Dual-Views Breast Cancer Analysis

  • paper_url: http://arxiv.org/abs/2309.03506
  • repo_url: None
  • paper_authors: Thanh-Huy Nguyen, Quang Hien Kha, Thai Ngoc Toan Truong, Ba Thinh Lam, Ba Hung Ngo, Quang Vinh Dinh, Nguyen Quoc Khanh Le
  • for: 提高癌症分类任务的精度和效率
  • methods: 利用同侧双视图信息,在学习高层特征之前用辅助视图的低层特征增强被检查的主视图;同时提出一个简单、无需训练的恶性乳腺 X 线影像合成框架,用于对少数类样本进行上采样。
  • results: 在 VinDr-Mammo 和 CMMD 数据集上的实验表明,这两个新框架在多视图训练和乳腺影像合成上都有效,在我们的实验设置中优于以往的传统方法。
    Abstract In recent years, many mammographic image analysis methods have been introduced to improve cancer classification tasks. Two major issues in mammogram classification are leveraging multi-view mammographic information and handling class imbalance. For the first problem, many multi-view methods have been released that concatenate features of two or more views for the training and inference stages. That said, most existing multi-view methods are not explainable in terms of feature fusion and treat all views equally for diagnosis. Our work proposes a simple but novel method for enhancing the examined view (main view) by leveraging low-level feature information from the auxiliary (ipsilateral) view before learning the high-level features that contain the cancerous signal. For the second issue, we also propose a simple but novel malignant mammogram synthesis framework for upsampling minority-class samples. Our easy-to-implement, training-free framework eliminates the current limitations of the CutMix algorithm: unreliable synthesized images from randomly pasted patches, hard-contour problems, and domain shift. Our results on the VinDr-Mammo and CMMD datasets show the effectiveness of our two new frameworks for both multi-view training and synthesizing mammographic images, outperforming previous conventional methods in our experimental settings.
    摘要

InteractionNet: Joint Planning and Prediction for Autonomous Driving with Transformers

  • paper_url: http://arxiv.org/abs/2309.03475
  • repo_url: None
  • paper_authors: Jiawei Fu, Yanqing Shen, Zhiqiang Jian, Shitao Chen, Jingmin Xin, Nanning Zheng
  • for: 本研究旨在提高自动驾驶车辆的规划和预测模块,以便更好地处理交通场景中的互动和动态变化。
  • methods: 本研究使用 transformer 来共享全局上下文推理,并将规划和预测融合在一起,以实现联合推理。此外,模型还使用另一个 transformer 来增强对感知区域中的车辆的注意力。
  • results: 相比其他基线模型,InteractionNet 在多个测试 benchmark 中表现出色,特别是在安全性方面,这主要归功于规划和预测的联合考虑。模型的代码将于 GitHub 上公开。
    Abstract Planning and prediction are two important modules of autonomous driving and have experienced tremendous advancement recently. Nevertheless, most existing methods regard planning and prediction as independent and ignore the correlation between them, leading to the lack of consideration for interaction and dynamic changes of traffic scenarios. To address this challenge, we propose InteractionNet, which leverages transformer to share global contextual reasoning among all traffic participants to capture interaction and interconnect planning and prediction to achieve joint. Besides, InteractionNet deploys another transformer to help the model pay extra attention to the perceived region containing critical or unseen vehicles. InteractionNet outperforms other baselines in several benchmarks, especially in terms of safety, which benefits from the joint consideration of planning and forecasting. The code will be available at https://github.com/fujiawei0724/InteractionNet.
    摘要 规划和预测是自动驾驶中的两个重要模块,近来都取得了巨大进展。然而,大多数现有方法将规划和预测视为相互独立,忽视了两者之间的关联,从而缺乏对交通场景中交互和动态变化的考虑。为了应对这一挑战,我们提出了 InteractionNet。它利用 Transformer 在所有交通参与者之间共享全局上下文推理以捕捉交互,并将规划与预测相互连接以实现联合推理。此外,InteractionNet 还部署了另一个 Transformer,帮助模型对包含关键或未被察觉车辆的感知区域给予额外关注。InteractionNet 在多个基准上优于其他基线,尤其在安全性方面,这得益于对规划和预测的联合考虑。代码将在 https://github.com/fujiawei0724/InteractionNet 上公开。

Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences

  • paper_url: http://arxiv.org/abs/2309.06578
  • repo_url: None
  • paper_authors: Sai Koneru, Jian Wu, Sarah Rajtmajer
  • for: 本研究的目的是使用大型自然语言模型(LLM)来探索科学Abstract中支持或驳斥特定假设的证据。
  • methods: 本研究利用社会科学领域研究的社区驱动标注构建了一个新的数据集,并将大语言模型与多种最先进的基准方法进行比较。
  • results: 研究评估了大语言模型根据科学摘要文本判断其支持或反驳特定假设的能力,并与多种最先进的基准进行了比较;同时指出了该方向未来研究的机会,例如面向不同学科领域的研究和更多数据集的构建等。
    Abstract Hypothesis formulation and testing are central to empirical research. A strong hypothesis is a best guess based on existing evidence and informed by a comprehensive view of relevant literature. However, with exponential increase in the number of scientific articles published annually, manual aggregation and synthesis of evidence related to a given hypothesis is a challenge. Our work explores the ability of current large language models (LLMs) to discern evidence in support or refute of specific hypotheses based on the text of scientific abstracts. We share a novel dataset for the task of scientific hypothesis evidencing using community-driven annotations of studies in the social sciences. We compare the performance of LLMs to several state-of-the-art benchmarks and highlight opportunities for future research in this area. The dataset is available at https://github.com/Sai90000/ScientificHypothesisEvidencing.git
    摘要

Fast FixMatch: Faster Semi-Supervised Learning with Curriculum Batch Size

  • paper_url: http://arxiv.org/abs/2309.03469
  • repo_url: None
  • paper_authors: John Chen, Chen Dun, Anastasios Kyrillidis
  • for: 本研究的目的是提出一种名为快速匹配(Fast FixMatch)的新ssl算法,以提高 semi-supervised learning(ssl)的效率和性能。
  • methods: 本研究使用了一种名为批大小课程(Curriculum Batch Size,CBS)的方法,即在训练过程中逐渐增大无标签批的大小,以降低训练计算量;此外,还结合使用了强标注增强(strong labeled augmentation)和课程伪标签(CPL)等技术。
  • results: 结果表明,强标注增强和/或 CPL 本身并不会显著减少训练计算量,但与 CBS 协同作用时可以达到最佳性能。具体来说,在 CIFAR-10 上,当仅保留 40、250 或 4000 个标签时,Fast FixMatch 相比原版 FixMatch 可将训练计算量减少 2.1 至 3.4 倍,同时达到相同的已报道最新错误率;在 CIFAR-100、SVHN 和 STL-10 上也取得了类似结果。此外,Fast FixMatch 在联邦 SSL 任务和在线/流式学习 SSL 任务中可将训练计算量减少 2.6 至 3.3 倍。
    Abstract Advances in Semi-Supervised Learning (SSL) have almost entirely closed the gap between SSL and Supervised Learning at a fraction of the number of labels. However, recent performance improvements have often come \textit{at the cost of significantly increased training computation}. To address this, we propose Curriculum Batch Size (CBS), \textit{an unlabeled batch size curriculum which exploits the natural training dynamics of deep neural networks.} A small unlabeled batch size is used in the beginning of training and is gradually increased to the end of training. A fixed curriculum is used regardless of dataset, model or number of epochs, and reduced training computations is demonstrated on all settings. We apply CBS, strong labeled augmentation, Curriculum Pseudo Labeling (CPL) \citep{FlexMatch} to FixMatch \citep{FixMatch} and term the new SSL algorithm Fast FixMatch. We perform an ablation study to show that strong labeled augmentation and/or CPL do not significantly reduce training computations, but, in synergy with CBS, they achieve optimal performance. Fast FixMatch also achieves substantially higher data utilization compared to previous state-of-the-art. Fast FixMatch achieves between $2.1\times$ - $3.4\times$ reduced training computations on CIFAR-10 with all but 40, 250 and 4000 labels removed, compared to vanilla FixMatch, while attaining the same cited state-of-the-art error rate \citep{FixMatch}. Similar results are achieved for CIFAR-100, SVHN and STL-10. Finally, Fast FixMatch achieves between $2.6\times$ - $3.3\times$ reduced training computations in federated SSL tasks and online/streaming learning SSL tasks, which further demonstrates the generalizability of Fast FixMatch to different scenarios and tasks.
    摘要 半监督学习(SSL)技术的进步已经几乎在只用一小部分标签的情况下消除了其与监督学习之间的差距,但最近的性能提升往往以显著增加训练计算量为代价。为解决这个问题,我们提出了批大小课程(Curriculum Batch Size,CBS),它利用深度神经网络的自然训练动态:在训练初期使用较小的无标签批大小,并逐渐增大到训练末期。无论数据集、模型或训练轮数如何,都使用同一个固定的课程,并在所有设置下都展示了训练计算量的减少。我们将 CBS、强标注增强和课程伪标签(CPL)应用于 FixMatch,并将新的 SSL 算法命名为 Fast FixMatch。消融实验表明,强标注增强和/或 CPL 本身并不会显著减少训练计算量,但与 CBS 协同作用时可以达到最佳性能;Fast FixMatch 的数据利用率也明显高于此前的最新方法。在 CIFAR-10 上,当仅保留 40、250 或 4000 个标签时,Fast FixMatch 相比原版 FixMatch 可将训练计算量减少 2.1 倍至 3.4 倍,同时达到相同的已报道最新错误率;在 CIFAR-100、SVHN 和 STL-10 上也取得了类似结果。最后,Fast FixMatch 在联邦 SSL 任务和在线/流式学习 SSL 任务中可将训练计算量减少 2.6 倍至 3.3 倍,进一步证明了其在不同场景和任务上的通用性。
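The curriculum batch size idea is simply a schedule on the unlabeled batch size: start small and grow toward the full size over training. A possible schedule is sketched below; the linear growth rule and the 1/8 starting fraction are assumptions for illustration, not the exact schedule used in Fast FixMatch.

```python
def curriculum_batch_size(step, total_steps, final_batch_size, start_fraction=0.125):
    """Unlabeled batch size that grows linearly from a small fraction to the full size.

    The linear shape and the 1/8 starting fraction are illustrative choices only.
    """
    start = max(1, int(final_batch_size * start_fraction))
    frac = min(1.0, step / max(1, total_steps))
    return int(round(start + frac * (final_batch_size - start)))

# 448 = 7 * 64 is a typical FixMatch unlabeled batch (labeled batch 64, ratio mu = 7).
total_steps, final_bs = 100_000, 448
for s in (0, 25_000, 50_000, 100_000):
    print(s, curriculum_batch_size(s, total_steps, final_bs))
```

Because early steps touch far fewer unlabeled examples, the total forward/backward computation over a run drops even though the final batches match the usual FixMatch setting.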

Cross-Image Context Matters for Bongard Problems

  • paper_url: http://arxiv.org/abs/2309.03468
  • repo_url: https://github.com/nraghuraman/bongard-context
  • paper_authors: Nikhil Raghuraman, Adam W. Harley, Leonidas Guibas
  • for: 本研究旨在解决现代机器学习方法在Bongard问题上的缺陷,Bongard问题是一种类型的智能测试,需要从一组正例和负例图像中抽出抽象的概念,并将新的查询图像分类为是否符合该概念。
  • methods: 本研究使用了一些简单的方法来考虑跨图像上下文信息,包括使用多个正例和负例图像来分别提取概念的特征,并将这些特征组合在一起以提高分类精度。
  • results: 本研究实现了substantial的提升,在Bongard-LOGO和Bongard-HOI上达到了新的状态码性能(75.3%和72.45%),并在原始Bongard问题集上实现了strong的性能(60.84%)。
    Abstract Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, existing methods have only reached 66% accuracy (where chance is 50%). Low accuracy is often attributed to neural nets' lack of ability to find human-like symbolic rules. In this work, we point out that many existing methods are forfeiting accuracy due to a much simpler problem: they do not incorporate information contained in the support set as a whole, and rely instead on information extracted from individual supports. This is a critical issue, because unlike in few-shot learning tasks concerning object classification, the "key concept" in a typical Bongard problem can only be distinguished using multiple positives and multiple negatives. We explore a variety of simple methods to take this cross-image context into account, and demonstrate substantial gains over prior methods, leading to new state-of-the-art performance on Bongard-LOGO (75.3%) and Bongard-HOI (72.45%) and strong performance on the original Bongard problem set (60.84%).
    摘要 当前的机器学习方法难以解决邦加德(Bongard)问题。这是一类智力测验,要求从一组正例和负例“支持”图像中归纳出抽象“概念”,然后判断新的查询图像是否体现该关键概念。在自然图像邦加德问题基准 Bongard-HOI 上,现有方法的准确率仅达到 66%(随机水平为 50%)。低准确率通常被归因于神经网络缺乏发现类似人类的符号规则的能力。在这项工作中,我们指出许多现有方法丧失准确率的原因其实更为简单:它们没有利用支持集作为整体所包含的信息,而只依赖从单个支持图像中提取的信息。这是一个关键问题,因为与面向物体分类的少样本学习任务不同,典型邦加德问题中的“关键概念”只有同时利用多个正例和多个负例才能区分出来。我们探索了多种考虑这种跨图像上下文的简单方法,并展示了相对于此前方法的显著提升,在 Bongard-LOGO(75.3%)和 Bongard-HOI(72.45%)上取得了新的最佳性能,并在原始邦加德问题集上表现强劲(60.84%)。

Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation

  • paper_url: http://arxiv.org/abs/2309.03467
  • repo_url: None
  • paper_authors: Zhuqiang Lu, Kun Hu, Chaoyue Wang, Lei Bai, Zhiyong Wang
  • for: 这篇论文旨在提出一种自回归方法,用于从窄视场(NFoV)图像生成 360 度全景图像。
  • methods: 该方法提出自回归全景感知生成网络(AOG-Net),在 NFoV 图像和文本引导(联合或单独)的条件下,逐步外推补全不完整的全景图像;并设计了全局-局部条件机制,将文本引导、全景视觉线索、NFoV 输入与全景几何编码后融合进条件生成主干模型。
  • results: 在室内和室外两个常用的全景图像数据集上,该方法达到了当前最佳性能;由于兼容大规模模型作为条件编码器和生成先验,它还能够利用广泛的开放词汇文本引导。
    Abstract A 360-degree (omni-directional) image provides an all-encompassing spherical view of a scene. Recently, there has been an increasing interest in synthesising 360-degree images from conventional narrow field of view (NFoV) images captured by digital cameras and smartphones, for providing immersive experiences in various scenarios such as virtual reality. Yet, existing methods typically fall short in synthesizing intricate visual details or ensure the generated images align consistently with user-provided prompts. In this study, autoregressive omni-aware generative network (AOG-Net) is proposed for 360-degree image generation by out-painting an incomplete 360-degree image progressively with NFoV and text guidances joinly or individually. This autoregressive scheme not only allows for deriving finer-grained and text-consistent patterns by dynamically generating and adjusting the process but also offers users greater flexibility to edit their conditions throughout the generation process. A global-local conditioning mechanism is devised to comprehensively formulate the outpainting guidance in each autoregressive step. Text guidances, omni-visual cues, NFoV inputs and omni-geometry are encoded and further formulated with cross-attention based transformers into a global stream and a local stream into a conditioned generative backbone model. As AOG-Net is compatible to leverage large-scale models for the conditional encoder and the generative prior, it enables the generation to use extensive open-vocabulary text guidances. Comprehensive experiments on two commonly used 360-degree image datasets for both indoor and outdoor settings demonstrate the state-of-the-art performance of our proposed method. Our code will be made publicly available.
    摘要 360 度(全向)图像提供场景的完整球面视图。近来,从数码相机和智能手机拍摄的常规窄视场(NFoV)图像合成 360 度图像,以便在虚拟现实等场景中提供沉浸式体验,受到越来越多的关注。然而,现有方法往往难以合成精细的视觉细节,也难以保证生成结果与用户给出的提示保持一致。本研究提出自回归全景感知生成网络(AOG-Net),在 NFoV 图像和文本引导(联合或单独)的条件下,逐步外推补全不完整的 360 度图像。这种自回归方案不仅能够通过动态生成与调整过程得到更精细、与文本一致的图案,还让用户可以在生成过程中灵活地修改条件。我们设计了全局-局部条件机制,在每个自回归步骤中全面地构建外推引导:将文本引导、全景视觉线索、NFoV 输入与全景几何进行编码,并通过基于交叉注意力的 Transformer 融合为全局流和局部流,输入条件生成主干模型。由于 AOG-Net 兼容大规模模型作为条件编码器和生成先验,生成过程可以利用广泛的开放词汇文本引导。在室内和室外两个常用的 360 度图像数据集上进行的全面实验表明了该方法的最先进性能。我们的代码将公开发布。

MIRA: Cracking Black-box Watermarking on Deep Neural Networks via Model Inversion-based Removal Attacks

  • paper_url: http://arxiv.org/abs/2309.03466
  • repo_url: None
  • paper_authors: Yifan Lu, Wenxuan Li, Mi Zhang, Xudong Pan, Min Yang
  • for: 为了保护训练良好的深度学习模型的知识产权,黑盒深度神经网络水印在学术界和工业界得到了越来越广泛的应用。
  • methods: 我们提出了一种名为模型反演移除攻击(\textsc{Mira})的新型攻击方法,它与具体水印方案无关,能够有效移除大多数主流黑盒 DNN 水印。
  • results: 我们在三个基准数据集和多种 DNN 架构上对 \textsc{Mira} 进行了广泛评估,结果表明它对所覆盖的水印具有很强的移除效果,能保留被窃模型至少 90% 的效用,并且对数据集可用性的假设更加宽松甚至无需数据。
    Abstract To protect the intellectual property of well-trained deep neural networks (DNNs), black-box DNN watermarks, which are embedded into the prediction behavior of DNN models on a set of specially-crafted samples, have gained increasing popularity in both academy and industry. Watermark robustness is usually implemented against attackers who steal the protected model and obfuscate its parameters for watermark removal. Recent studies empirically prove the robustness of most black-box watermarking schemes against known removal attempts. In this paper, we propose a novel Model Inversion-based Removal Attack (\textsc{Mira}), which is watermark-agnostic and effective against most of mainstream black-box DNN watermarking schemes. In general, our attack pipeline exploits the internals of the protected model to recover and unlearn the watermark message. We further design target class detection and recovered sample splitting algorithms to reduce the utility loss caused by \textsc{Mira} and achieve data-free watermark removal on half of the watermarking schemes. We conduct comprehensive evaluation of \textsc{Mira} against ten mainstream black-box watermarks on three benchmark datasets and DNN architectures. Compared with six baseline removal attacks, \textsc{Mira} achieves strong watermark removal effects on the covered watermarks, preserving at least $90\%$ of the stolen model utility, under more relaxed or even no assumptions on the dataset availability.
    摘要 为了保护训练良好的深度神经网络(DNN)的知识产权,嵌入在模型对一组特制样本的预测行为中的黑盒 DNN 水印,在学术界和工业界日益流行。水印的鲁棒性通常针对这样的攻击者:窃取受保护的模型并混淆其参数以去除水印。近期研究通过实验证明,大多数黑盒水印方案都能抵抗已知的移除尝试。在本文中,我们提出了一种新的基于模型反演的移除攻击(\textsc{Mira}),它与具体水印方案无关,且对大多数主流黑盒 DNN 水印方案有效。总体而言,我们的攻击流程利用受保护模型的内部信息来恢复并“遗忘”水印信息。我们进一步设计了目标类检测和恢复样本划分算法,以降低 \textsc{Mira} 带来的效用损失,并在一半的水印方案上实现了无需数据的水印移除。我们在三个基准数据集和多种 DNN 架构上,针对十种主流黑盒水印对 \textsc{Mira} 进行了全面评估。与六种基线移除攻击相比,\textsc{Mira} 在所覆盖的水印上取得了很强的移除效果,在对数据集可用性的假设更宽松甚至完全不依赖数据的情况下,仍能保留被窃模型至少 90% 的效用。

Automatic Algorithm Selection for Pseudo-Boolean Optimization with Given Computational Time Limits

  • paper_url: http://arxiv.org/abs/2309.03924
  • repo_url: None
  • paper_authors: Catalina Pezo, Dorit Hochbaum, Julio Godoy, Roberto Asin-Acha
  • for: 本研究旨在设计一个可靠的时间限制选择器,以解决NP困难优化问题中的 Pseudo-Boolean Optimization (PBO) 问题。
  • methods: 本研究使用了机器学习技术,特别是Anytime选择器,来自动选择最佳的解决方案。Anytime选择器会根据给定的时间限制,预测最佳的解决方案,并在该时间限制内执行该解决方案。
  • results: 研究表明,使用 Anytime 选择器可以大幅提高解决PBO问题的性能,比如在 Gurobi 优化软件失败时,我们的Anytime meta-solver可以为47%的情况提供可行的解决方案。
    Abstract Machine learning (ML) techniques have been proposed to automatically select the best solver from a portfolio of solvers, based on predicted performance. These techniques have been applied to various problems, such as Boolean Satisfiability, Traveling Salesperson, Graph Coloring, and others. These methods, known as meta-solvers, take an instance of a problem and a portfolio of solvers as input. They then predict the best-performing solver and execute it to deliver a solution. Typically, the quality of the solution improves with a longer computational time. This has led to the development of anytime selectors, which consider both the instance and a user-prescribed computational time limit. Anytime meta-solvers predict the best-performing solver within the specified time limit. Constructing an anytime meta-solver is considerably more challenging than building a meta-solver without the "anytime" feature. In this study, we focus on the task of designing anytime meta-solvers for the NP-hard optimization problem of Pseudo-Boolean Optimization (PBO), which generalizes Satisfiability and Maximum Satisfiability problems. The effectiveness of our approach is demonstrated via extensive empirical study in which our anytime meta-solver improves dramatically on the performance of Mixed Integer Programming solver Gurobi, which is the best-performing single solver in the portfolio. For example, out of all instances and time limits for which Gurobi failed to find feasible solutions, our meta-solver identified feasible solutions for 47% of these.
    摘要 机器学习(ML)技术已被用于根据预测性能,从一组求解器中自动选择最佳求解器。这些技术已被应用于布尔可满足性、旅行商、图着色等多种问题。这类方法被称为元求解器:它们以一个问题实例和一组求解器作为输入,预测表现最好的求解器并执行它以给出解。通常,解的质量会随计算时间的增加而提高,这催生了“随时(anytime)”选择器,它同时考虑问题实例和用户给定的计算时间限制,预测在该时间限制内表现最好的求解器。构建随时元求解器比构建不带“随时”特性的元求解器要困难得多。在本研究中,我们关注为 NP 困难优化问题——伪布尔优化(PBO)设计随时元求解器,PBO 是可满足性和最大可满足性问题的推广。我们通过大量实验证明了方法的有效性:我们的随时元求解器显著优于组合中表现最好的单个求解器——混合整数规划求解器 Gurobi。例如,在 Gurobi 未能找到可行解的所有实例和时间限制中,我们的元求解器为其中 47% 找到了可行解。
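An anytime selector of the kind described can be framed as a supervised model mapping (instance features, time budget) to the solver expected to perform best within that budget. The sketch below uses a random-forest classifier on synthetic data purely to illustrate the interface; the portfolio names, feature definitions and labels are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
SOLVERS = ["sat-based", "mip", "local-search"]   # hypothetical PBO portfolio

# Synthetic training data: 5 instance features plus the time limit (seconds),
# labeled with whichever solver happened to perform best within that limit.
X = np.hstack([rng.normal(size=(500, 5)), rng.uniform(10, 600, size=(500, 1))])
y = rng.integers(0, len(SOLVERS), size=500)      # placeholder best-solver labels

selector = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def choose_solver(instance_features, time_limit):
    """Predict the solver expected to do best on this instance under the budget."""
    query = np.append(instance_features, time_limit).reshape(1, -1)
    return SOLVERS[int(selector.predict(query)[0])]

print(choose_solver(rng.normal(size=5), time_limit=120))
```

The "anytime" aspect comes from including the time limit as an input feature, so the same selector can recommend different solvers for short and long budgets.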

SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

  • paper_url: http://arxiv.org/abs/2309.03453
  • repo_url: https://github.com/liuyuan-pal/SyncDreamer
  • paper_authors: Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, Wenping Wang
  • for: Generating multiview-consistent images from a single-view image.
  • methods: Builds on a pretrained large-scale 2D diffusion model and adds a 3D-aware feature attention mechanism that synchronizes the intermediate states of all views during generation.
  • results: Generates multiview images with high cross-view consistency, making the method well-suited to various 3D generation tasks.
    Abstract In this paper, we present a novel diffusion model called SyncDreamer that generates multiview-consistent images from a single-view image. Using pretrained large-scale 2D diffusion models, recent work Zero123 demonstrates the ability to generate plausible novel views from a single-view image of an object. However, maintaining consistency in geometry and colors for the generated images remains a challenge. To address this issue, we propose a synchronized multiview diffusion model that models the joint probability distribution of multiview images, enabling the generation of multiview-consistent images in a single reverse process. SyncDreamer synchronizes the intermediate states of all the generated images at every step of the reverse process through a 3D-aware feature attention mechanism that correlates the corresponding features across different views. Experiments show that SyncDreamer generates images with high consistency across different views, thus making it well-suited for various 3D generation tasks such as novel-view-synthesis, text-to-3D, and image-to-3D.
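The core idea, denoising all target views in lockstep and letting their intermediate features attend to one another, can be sketched as below. The `unet`, `cross_view_attn`, and `scheduler` objects are placeholders standing in for the actual components, so their interfaces are assumptions rather than SyncDreamer's real code.

```python
# Hedged sketch of synchronized multiview denoising with cross-view attention.
import torch

def synchronized_denoise_step(unet, cross_view_attn, scheduler, latents, t, cond):
    """One reverse-diffusion step applied jointly to all views.
    latents: (num_views, C, H, W), one latent per target view."""
    eps = unet(latents, t, cond)            # per-view noise prediction from the 2D backbone
    eps = cross_view_attn(eps)              # correlate corresponding features across views
    return scheduler.step(eps, t, latents)  # same update rule, applied to every view in lockstep

def generate_multiview(unet, cross_view_attn, scheduler, cond,
                       num_views=16, latent_shape=(4, 32, 32)):
    latents = torch.randn(num_views, *latent_shape)
    for t in scheduler.timesteps:           # assumed: a sequence of decreasing timesteps
        latents = synchronized_denoise_step(unet, cross_view_attn, scheduler,
                                            latents, t, cond)
    return latents
```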

XGen-7B Technical Report

  • paper_url: http://arxiv.org/abs/2309.03450
  • repo_url: None
  • paper_authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs’ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong
  • for: This report aims to improve the performance and accessibility of large language models (LLMs), particularly for tasks that require inference over long input contexts.
  • methods: The authors train XGen, a series of 7B-parameter models, with up to 8K sequence length on up to 1.5T tokens; they further finetune the models on public-domain instructional data to create the instruction-tuned XGen-Inst variants.
  • results: On standard benchmarks, the XGen models achieve results comparable to or better than current open-source LLMs, and the 8K-sequence models show clear advantages over 2K-sequence open-source LLMs on long-sequence modeling tasks.
    Abstract Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.
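If the checkpoints are published on the Hugging Face Hub, loading them would follow the usual `transformers` pattern sketched below; the repository id and the `trust_remote_code` flag are assumptions about the public release, not details stated in the abstract.

```python
# Hedged usage sketch; "Salesforce/xgen-7b-8k-base" is an assumed Hub id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/xgen-7b-8k-base"  # assumption: released base-model repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Summarize the following document:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```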

Large Language Models as Optimizers

  • paper_url: http://arxiv.org/abs/2309.03409
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen
  • for: This paper proposes a simple and effective way to use large language models (LLMs) as optimizers, addressing the many real-world optimization problems for which gradients are unavailable.
  • methods: At each optimization step, the LLM generates new candidate solutions from a prompt that contains previously generated solutions with their values; the new solutions are then evaluated and appended to the prompt for the next step.
  • results: OPRO is applied to linear regression and traveling salesman problems and then to prompt optimization with a variety of LLMs; the best OPRO-optimized prompts outperform human-designed prompts by up to 8% on GSM8K and by up to 50% on Big-Bench Hard tasks.
    Abstract Optimization is ubiquitous. While derivative-based algorithms have been powerful tools for various problems, the absence of gradient imposes challenges on many real-world applications. In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values, then the new solutions are evaluated and added to the prompt for the next optimization step. We first showcase OPRO on linear regression and traveling salesman problems, then move on to prompt optimization where the goal is to find instructions that maximize the task accuracy. With a variety of LLMs, we demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
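The optimization loop described above can be sketched in a few lines; `call_llm`, `evaluate`, and the meta-prompt wording are placeholders standing in for the paper's actual prompts and scorers.

```python
# Hedged sketch of an OPRO-style loop: the LLM proposes new solutions from a
# meta-prompt containing the best previous solutions and their scores.
def opro_optimize(call_llm, evaluate, task_description, num_steps=20, top_k=10):
    history = []  # (solution_text, score) pairs
    for _ in range(num_steps):
        shown = sorted(history, key=lambda pair: pair[1])[-top_k:]  # best solutions last
        meta_prompt = (
            f"{task_description}\n\n"
            "Previously proposed solutions and their scores:\n"
            + "\n".join(f"solution: {s}\nscore: {v}" for s, v in shown)
            + "\n\nPropose a new solution with a higher score."
        )
        candidate = call_llm(meta_prompt)                 # LLM acts as the optimizer
        history.append((candidate, evaluate(candidate)))  # objective evaluation
    return max(history, key=lambda pair: pair[1])
```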