cs.AI - 2023-08-21

DynED: Dynamic Ensemble Diversification in Data Stream Classification

  • paper_url: http://arxiv.org/abs/2308.10807
  • repo_url: https://github.com/soheilabadifard/dyned
  • paper_authors: Soheil Abadifard, Sepehr Bakhshi, Sanaz Gheibuni, Fazli Can
  • for: Improving classification accuracy on data streams, where changes in the data distribution can degrade a classifier's accuracy.
  • methods: An ensemble construction and maintenance approach based on MMR (Maximal Marginal Relevance) that selects the components with the highest performance and diversity when structuring the ensemble.
  • results: Experiments show that the proposed approach (DynED) achieves a higher average mean accuracy than five state-of-the-art baselines on four real and eleven synthetic datasets.
    Abstract Ensemble methods are commonly used in classification due to their remarkable performance. Achieving high accuracy in a data stream environment is a challenging task considering disruptive changes in the data distribution, also known as concept drift. A greater diversity of ensemble components is known to enhance prediction accuracy in such settings. Despite the diversity of components within an ensemble, not all contribute as expected to its overall performance. This necessitates a method for selecting components that exhibit high performance and diversity. We present a novel ensemble construction and maintenance approach based on MMR (Maximal Marginal Relevance) that dynamically combines the diversity and prediction accuracy of components during the process of structuring an ensemble. The experimental results on four real and 11 synthetic datasets demonstrate that the proposed approach (DynED) provides a higher average mean accuracy compared to the five state-of-the-art baselines.
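
The MMR-style selection step described above lends itself to a short sketch. The following is a minimal illustration of how components balancing accuracy and diversity might be chosen greedily; the trade-off weight `lam` and the `accuracy`/`pairwise_div` inputs are assumptions for illustration, not DynED's exact formulation.

```python
import numpy as np

def mmr_select(accuracy, pairwise_div, k, lam=0.7):
    """Greedy MMR-style selection of k ensemble components.

    accuracy     : (n,) estimated accuracy of each candidate component
    pairwise_div : (n, n) pairwise diversity (higher = more diverse)
    lam          : assumed trade-off between accuracy and diversity
    """
    n = len(accuracy)
    selected = [int(np.argmax(accuracy))]        # seed with the best component
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # MMR: reward accuracy, penalize redundancy with what is
            # already selected (i.e. reward the worst-case diversity)
            min_div = min(pairwise_div[i, j] for j in selected)
            score = lam * accuracy[i] + (1 - lam) * min_div
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```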

Differentiable Frank-Wolfe Optimization Layer

  • paper_url: http://arxiv.org/abs/2308.10806
  • repo_url: None
  • paper_authors: Zixuan Liu, Liu Liu, Xueqian Wang, Peilin Zhao
  • for: Proposing an efficient differentiable optimization layer (DFWLayer) for solving large-scale constrained optimization problems.
  • methods: The layer rolls out the Frank-Wolfe method, which solves constrained optimization problems without requiring projections or Hessian computations.
  • results: Experiments show that DFWLayer not only attains competitive accuracy in solutions and gradients but is also fast in both forward and backward computation, while consistently satisfying the constraints.
    Abstract Differentiable optimization has received a significant amount of attention due to its foundational role in the domain of machine learning based on neural networks. The existing methods leverage the optimality conditions and implicit function theorem to obtain the Jacobian matrix of the output, which increases the computational cost and limits the application of differentiable optimization. In addition, some non-differentiable constraints lead to more challenges when using prior differentiable optimization layers. This paper proposes a differentiable layer, named Differentiable Frank-Wolfe Layer (DFWLayer), by rolling out the Frank-Wolfe method, a well-known optimization algorithm which can solve constrained optimization problems without projections and Hessian matrix computations, thus leading to an efficient way of dealing with large-scale problems. Theoretically, we establish a bound on the suboptimality gap of the DFWLayer in the context of l1-norm constraints. Experimental assessments demonstrate that the DFWLayer not only attains competitive accuracy in solutions and gradients but also consistently adheres to constraints. Moreover, it surpasses the baselines in both forward and backward computational speeds.
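
To make the rolled-out Frank-Wolfe idea concrete, here is a minimal sketch of unrolled Frank-Wolfe iterations under an l1-norm ball constraint, written with differentiable tensor operations. The least-squares objective, step-size schedule, and iteration count are illustrative assumptions; the actual DFWLayer architecture may differ.

```python
import torch

def fw_layer(A, b, radius=1.0, steps=20):
    """Unrolled Frank-Wolfe for min_x 0.5*||A x - b||^2  s.t. ||x||_1 <= radius.

    Each iteration calls the l1-ball linear minimization oracle, which
    needs neither projections nor Hessian computations. Because the loop
    is built from differentiable tensor ops, gradients w.r.t. A and b
    flow back through the iterates.
    """
    x = torch.zeros(A.shape[1], dtype=A.dtype)
    for k in range(steps):
        grad = A.T @ (A @ x - b)              # gradient of the objective
        # LMO over the l1 ball: a signed vertex at the largest |grad| entry
        i = torch.argmax(grad.abs())
        s = torch.zeros_like(x)
        s[i] = -radius * torch.sign(grad[i])
        gamma = 2.0 / (k + 2.0)               # classic FW step size
        x = (1 - gamma) * x + gamma * s
    return x

# Toy usage: gradients w.r.t. A flow through the unrolled solver.
A = torch.randn(8, 5, requires_grad=True)
b = torch.randn(8)
fw_layer(A, b).sum().backward()
```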

Artificial intelligence is ineffective and potentially harmful for fact checking

  • paper_url: http://arxiv.org/abs/2308.10800
  • repo_url: https://github.com/osome-iu/ai_fact_checking
  • paper_authors: Matthew R. DeVerna, Harry Yaojun Yan, Kai-Cheng Yang, Filippo Menczer
  • for: Investigating how fact checks of political news generated by an AI model affect people's beliefs and sharing intentions.
  • methods: A preregistered randomized control experiment in which fact checks generated by a popular AI model accompanied political headlines that participants judged and considered sharing.
  • results: Although the AI model debunks false headlines reasonably well, this does not significantly affect participants' judgments of headlines or their sharing intentions. Moreover, the AI's fact checks can mislead: they decrease belief in true headlines mislabeled as false and increase belief in false headlines the model is unsure about. On the other hand, participants who chose to view the AI fact checks were more likely to share accurate news.
    Abstract Fact checking can be an effective strategy against misinformation, but its implementation at scale is impeded by the overwhelming volume of information online. Recent artificial intelligence (AI) language models have shown impressive ability in fact-checking tasks, but how humans interact with fact-checking information provided by these models is unclear. Here we investigate the impact of fact checks generated by a popular AI model on belief in, and sharing intent of, political news in a preregistered randomized control experiment. Although the AI performs reasonably well in debunking false headlines, we find that it does not significantly affect participants' ability to discern headline accuracy or share accurate news. However, the AI fact-checker is harmful in specific cases: it decreases beliefs in true headlines that it mislabels as false and increases beliefs for false headlines that it is unsure about. On the positive side, the AI increases sharing intents for correctly labeled true headlines. When participants are given the option to view AI fact checks and choose to do so, they are significantly more likely to share both true and false news but only more likely to believe false news. Our findings highlight an important source of potential harm stemming from AI applications and underscore the critical need for policies to prevent or mitigate such unintended consequences.

Stabilizing Unsupervised Environment Design with a Learned Adversary

  • paper_url: http://arxiv.org/abs/2308.10797
  • repo_url: https://github.com/facebookresearch/dcd
  • paper_authors: Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis, Eugene Vinitsky, Tim Rocktäschel
  • for: Improving the training of generally-capable agents by designing training tasks that promote broad generalization and robustness to environment variations.
  • methods: Reinforcement learning (RL) is used to train a teacher policy that generates tasks from scratch, making it possible to directly generate tasks adapted to the agent's current capabilities.
  • results: The approach matches or exceeds the state of the art on several established challenging navigation and car-racing environments, producing robustly generalizing agents.
    Abstract A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent's current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose solutions for each shortcoming. As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment. We believe this work motivates a renewed emphasis on UED methods based on learned models that directly generate challenging environments, potentially unlocking more open-ended RL training and, as a result, more general agents.
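
PAIRED's core signal is the teacher's regret. As a heavily simplified sketch (the environment generator, agent training loops, and the `rollout` return estimator are all assumed away), the teacher is rewarded by the gap between an antagonist agent's return and the protagonist's return on a proposed task:

```python
def paired_teacher_reward(env_params, protagonist, antagonist, rollout):
    """Regret-based teacher reward in the spirit of PAIRED (sketch).

    The teacher is rewarded when the antagonist solves a task the
    protagonist cannot, steering generation toward tasks at the
    frontier of the protagonist's capabilities. `rollout` is an
    assumed helper returning an episode return.
    """
    protagonist_return = rollout(env_params, protagonist)
    antagonist_return = rollout(env_params, antagonist)
    regret = antagonist_return - protagonist_return
    return regret  # maximized by the teacher's RL objective
```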

Instruction Tuning for Large Language Models: A Survey

  • paper_url: http://arxiv.org/abs/2308.10792
  • repo_url: None
  • paper_authors: Shengyu Zhang, Linfeng Dong, Xiaoya Li, Sen Zhang, Xiaofei Sun, Shuhe Wang, Jiwei Li, Runyi Hu, Tianwei Zhang, Fei Wu, Guoyin Wang
  • for: Surveying instruction tuning (IT) techniques for large language models (LLMs), which enhance the capabilities and controllability of LLMs.
  • methods: The survey covers the general methodology of IT, the construction of instruction datasets, the training of IT models, and applications across different modalities, domains, and use cases.
  • results: The paper provides a systematic literature review spanning these topics, together with potential pitfalls and criticisms of IT and promising avenues for further research.
    Abstract This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT and criticism against it, as well as efforts pointing out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
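
The (instruction, output) supervised fine-tuning setup the survey describes is easy to picture in code. Below is a minimal, generic sketch of how such pairs are typically turned into training examples with the loss masked on the instruction tokens; the prompt template and masking convention are common practice, not prescriptions from the survey.

```python
def build_example(tokenizer, instruction, output, ignore_index=-100):
    """Turn one (instruction, output) pair into input ids and labels.

    Loss is computed only on the response tokens: instruction tokens
    are masked with ignore_index, so next-word prediction is steered
    toward following the instruction rather than merely continuing it.
    """
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"  # assumed template
    prompt_ids = tokenizer.encode(prompt)
    output_ids = tokenizer.encode(output)
    input_ids = prompt_ids + output_ids
    labels = [ignore_index] * len(prompt_ids) + output_ids
    return input_ids, labels
```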

A Block-Ring connected Topology of Parameterized Quantum Circuits

  • paper_url: http://arxiv.org/abs/2308.10791
  • repo_url: None
  • paper_authors: Wenjie Liu, Qingshan Wu
  • for: Improving the efficiency and expressibility of parameterized quantum circuits (PQCs), addressing the optimization difficulties and lack of performance guarantees in current circuits.
  • methods: A new topology, called the Block-Ring (BR) topology, that builds PQCs by allocating all qubits to several blocks, with an all-to-all pattern inside each block and a ring pattern connecting the blocks. Compared with pure all-to-all topology circuits, which have the best power, the BR topology achieves similar performance while reducing the number of parameters and 2-qubit gates.
  • results: The BR topology compares favorably with other topology circuits in terms of expressibility and entangling capability, and performs better in multilayer circuits. Analyses of the effects of different 2-qubit gates, distinguishing controlled X-rotation from controlled Z-rotation gates, further characterize its performance.
    Abstract It is essential to select an efficient topology of parameterized quantum circuits (PQCs) in variational quantum algorithms (VQAs). However, current circuits suffer from problems, i.e. optimization difficulties caused by too many parameters, or performance that is hard to guarantee. How to reduce the number of parameters (the number of single-qubit rotation gates and 2-qubit gates) in PQCs without reducing performance has become a new challenge. To solve this problem, we propose a novel topology, called Block-Ring (BR) topology, to construct the PQCs. This topology allocates all qubits to several blocks; an all-to-all mode is adopted inside each block and a ring mode is applied to connect different blocks. Compared with the pure all-to-all topology circuits, which own the best power, the BR topology has similar performance while the number of parameters and 2-qubit gates is reduced from O(n^2) to O(mn), where m is a hyperparameter set by ourselves. Besides, we compare the BR topology with other topology circuits in terms of expressibility and entangling capability. Considering the effects of different 2-qubit gates on circuits, we also make a distinction between controlled X-rotation gates and controlled Z-rotation gates. Finally, the 1- and 2-layer configurations of PQCs are taken into consideration as well, which shows the BR topology's performance improvement in the condition of multilayer circuits.
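
One plausible reading of the Block-Ring construction, sketched with Qiskit: qubits are split into blocks, each block gets single-qubit rotations plus all-to-all controlled rotations, and adjacent blocks are linked in a ring. The gate choices (Ry plus CRX), the exact wiring, and the identification of the hyperparameter m with the block size are assumptions for illustration; the paper also studies CRZ variants.

```python
from qiskit import QuantumCircuit
from qiskit.circuit import Parameter

def block_ring_circuit(n_qubits, block_size):
    """Sketch of a Block-Ring (BR) topology PQC: all-to-all inside each
    block, ring connections between blocks. With block size m, the
    2-qubit gate count grows like O(m*n) instead of the O(n^2) of a
    pure all-to-all circuit. Assumes n_qubits is divisible by block_size."""
    qc = QuantumCircuit(n_qubits)
    n_blocks = n_qubits // block_size
    blocks = [list(range(b * block_size, (b + 1) * block_size))
              for b in range(n_blocks)]
    t = 0
    for block in blocks:
        for q in block:                        # single-qubit rotations
            qc.ry(Parameter(f"theta_{t}"), q); t += 1
        for i, c in enumerate(block):          # all-to-all inside the block
            for tq in block[i + 1:]:
                qc.crx(Parameter(f"theta_{t}"), c, tq); t += 1
    for b in range(n_blocks):                  # ring between blocks
        src = blocks[b][-1]
        dst = blocks[(b + 1) % n_blocks][0]
        qc.crx(Parameter(f"theta_{t}"), src, dst); t += 1
    return qc
```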

Sparse Linear Concept Discovery Models

  • paper_url: http://arxiv.org/abs/2308.10782
  • repo_url: https://github.com/konpanousis/conceptdiscoverymodels
  • paper_authors: Konstantinos P. Panousis, Dino Ienco, Diego Marcos
  • for: Creating interpretable deep learning models whose decisions can be investigated and corrected.
  • methods: A simple yet highly intuitive interpretable framework based on Contrastive Language-Image models and a single sparse linear layer. Sparsity is achieved via principled Bayesian arguments, inferring concept presence through a data-driven Bernoulli distribution.
  • results: Experiments show that the proposed framework not only outperforms recent CBM approaches in accuracy but also yields high per-example concept sparsity.
    Abstract The recent mass adoption of DNNs, even in safety-critical scenarios, has shifted the focus of the research community towards the creation of inherently interpretable models. Concept Bottleneck Models (CBMs) constitute a popular approach where hidden layers are tied to human understandable concepts allowing for investigation and correction of the network's decisions. However, CBMs usually suffer from: (i) performance degradation and (ii) lower interpretability than intended due to the sheer amount of concepts contributing to each decision. In this work, we propose a simple yet highly intuitive interpretable framework based on Contrastive Language Image models and a single sparse linear layer. In stark contrast to related approaches, the sparsity in our framework is achieved via principled Bayesian arguments by inferring concept presence via a data-driven Bernoulli distribution. As we experimentally show, our framework not only outperforms recent CBM approaches accuracy-wise, but it also yields high per example concept sparsity, facilitating the individual investigation of the emerging concepts.
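
A minimal sketch of the described pipeline: image and concept-text embeddings from a CLIP-style model give per-concept scores, which pass through a single linear layer whose weights are gated by learned Bernoulli variables (here via the relaxed Bernoulli trick for differentiability). The prior, temperature, and gating placement are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class SparseConceptClassifier(nn.Module):
    """Concept scores -> Bernoulli-gated sparse linear layer (sketch)."""
    def __init__(self, n_concepts, n_classes, temp=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(n_classes, n_concepts))
        self.gate_logits = nn.Parameter(torch.zeros(n_classes, n_concepts))
        self.temp = temp

    def forward(self, img_emb, concept_emb):
        # cosine similarity between image and concept text embeddings
        scores = nn.functional.normalize(img_emb, dim=-1) @ \
                 nn.functional.normalize(concept_emb, dim=-1).T
        if self.training:  # differentiable samples from a relaxed Bernoulli
            gates = torch.distributions.RelaxedBernoulli(
                torch.tensor(self.temp), logits=self.gate_logits).rsample()
        else:              # hard gates at inference -> sparse, inspectable weights
            gates = (torch.sigmoid(self.gate_logits) > 0.5).float()
        return scores @ (gates * self.weight).T
```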

To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills

  • paper_url: http://arxiv.org/abs/2308.10757
  • repo_url: None
  • paper_authors: Carlo Mazzola, Marta Romeo, Francesco Rea, Alessandra Sciutti, Angelo Cangelosi
  • for: Addressing the problem of Addressee Estimation in human-human communication, with the goal of enabling social robots to understand and interpret non-verbal bodily cues from speakers.
  • methods: A hybrid deep learning model that combines convolutional layers and LSTM cells to analyze images of the speaker's face and 2D vectors of their body posture, designed to be efficient and deployable on social robots in ecological scenarios.
  • results: The proposed model solves the Addressee Estimation problem in terms of addressee localisation in space, from a robot ego-centric point of view.
    Abstract Communicating shapes our social world. For a robot to be considered social, and to be consequently integrated into our social environment, it is fundamental to understand some of the dynamics that rule human-human communication. In this work, we tackle the problem of Addressee Estimation, the ability to understand an utterance's addressee, by interpreting and exploiting non-verbal bodily cues from the speaker. We do so by implementing a hybrid deep learning model composed of convolutional layers and LSTM cells taking as input images portraying the face of the speaker and 2D vectors of the speaker's body posture. Our implementation choices were guided by the aim to develop a model that could be deployed on social robots and be efficient in ecological scenarios. We demonstrate that our model is able to solve the Addressee Estimation problem in terms of addressee localisation in space, from a robot ego-centric point of view.
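
A rough sketch of the hybrid architecture described: convolutional layers encode the face crop of each frame, an LSTM consumes the concatenation of the face features and the 2D pose vector over time, and a head predicts the addressee's location. Layer sizes, the pose dimensionality, and the output head are assumptions.

```python
import torch
import torch.nn as nn

class AddresseeEstimator(nn.Module):
    """Face images + 2D body-pose vectors -> addressee location (sketch)."""
    def __init__(self, pose_dim=20, hidden=128, n_outputs=3):
        super().__init__()
        self.face_cnn = nn.Sequential(         # small conv encoder for face crops
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(32 + pose_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)  # e.g. left / right / robot

    def forward(self, faces, poses):
        # faces: (B, T, 3, H, W); poses: (B, T, pose_dim)
        B, T = faces.shape[:2]
        f = self.face_cnn(faces.flatten(0, 1)).view(B, T, -1)
        out, _ = self.lstm(torch.cat([f, poses], dim=-1))
        return self.head(out[:, -1])           # prediction from the last time step
```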

DataVinci: Learning Syntactic and Semantic String Repairs

  • paper_url: http://arxiv.org/abs/2308.10922
  • repo_url: None
  • paper_authors: Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen
  • for: Proposing a fully unsupervised system for detecting and repairing errors in string data, without requiring user-provided examples or constraints.
  • methods: The system learns regular-expression-based patterns that cover a majority of the values in a column, and automatically derives repair proposals based on these majority patterns and on constraints learned over other columns.
  • results: Evaluated on four existing and new benchmarks against seven baselines, the system achieves higher accuracy in both error detection and repair.
    Abstract String data is common in real-world datasets: 67.6% of values in a sample of 1.8 million real Excel spreadsheets from the web were represented as text. Systems that successfully clean such string data can have a significant impact on real users. While prior work has explored errors in string data, proposed approaches have often been limited to error detection or require that the user provide annotations, examples, or constraints to fix the errors. Furthermore, these systems have focused independently on syntactic errors or semantic errors in strings, but ignore that strings often contain both syntactic and semantic substrings. We introduce DataVinci, a fully unsupervised string data error detection and repair system. DataVinci learns regular-expression-based patterns that cover a majority of values in a column and reports values that do not satisfy such patterns as data errors. DataVinci can automatically derive edits to the data error based on the majority patterns and constraints learned over other columns without the need for further user interaction. To handle strings with both syntactic and semantic substrings, DataVinci uses an LLM to abstract (and re-concretize) portions of strings that are semantic prior to learning majority patterns and deriving edits. Because not all data can result in majority patterns, DataVinci leverages execution information from an existing program (which reads the target data) to identify and correct data repairs that would not otherwise be identified. DataVinci outperforms 7 baselines on both error detection and repair when evaluated on 4 existing and new benchmarks.
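
To illustrate the majority-pattern idea (a toy simplification, not DataVinci's actual algorithm), the sketch below abstracts each value in a column to a coarse character-class pattern, takes the pattern covering the majority of values, and flags the rest as suspected errors:

```python
import re
from collections import Counter

def to_pattern(value):
    """Abstract a string to a coarse regex: digits -> \\d+, letters -> [A-Za-z]+."""
    out, prev = [], None
    for ch in value:
        cls = r"\d+" if ch.isdigit() else r"[A-Za-z]+" if ch.isalpha() else re.escape(ch)
        if cls != prev or not cls.endswith("+"):  # collapse runs of a class
            out.append(cls)
        prev = cls
    return "^" + "".join(out) + "$"

def flag_errors(column):
    """Return values not matching the column's majority pattern (sketch)."""
    majority, _ = Counter(to_pattern(v) for v in column).most_common(1)[0]
    return [v for v in column if not re.match(majority, v)]

print(flag_errors(["2021-01-05", "2021-02-11", "Jan 3, 2021", "2021-03-29"]))
# -> ['Jan 3, 2021']
```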

On the Adversarial Robustness of Multi-Modal Foundation Models

  • paper_url: http://arxiv.org/abs/2308.10741
  • repo_url: None
  • paper_authors: Christian Schlarmann, Matthias Hein
  • for: Protecting users from malicious content that misleads them or spreads false information.
  • methods: Imperceptible adversarial attacks on images that change the caption output of a multi-modal foundation model, which malicious content providers could use to guide users to malicious websites or broadcast fake information.
  • results: The results show that multi-modal foundation models are vulnerable to such malicious content, so countermeasures should be deployed to protect users.
    Abstract Multi-modal foundation models combining vision and language models such as Flamingo or GPT-4 have recently gained enormous interest. Alignment of foundation models is used to prevent models from providing toxic or harmful output. While malicious users have successfully tried to jailbreak foundation models, an equally important question is if honest users could be harmed by malicious third-party content. In this paper we show that imperceivable attacks on images in order to change the caption output of a multi-modal foundation model can be used by malicious content providers to harm honest users e.g. by guiding them to malicious websites or broadcast fake information. This indicates that countermeasures to adversarial attacks should be used by any deployed multi-modal foundation model.
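
The attack described is, in spirit, a targeted PGD attack on the image: perturb pixels within an imperceptible l∞ budget so that the captioner's likelihood of an attacker-chosen caption rises. A generic sketch follows; `caption_log_prob` is an assumed differentiable scoring function standing in for a concrete model, not an API of any particular library.

```python
import torch

def targeted_caption_attack(image, target_ids, caption_log_prob,
                            eps=4 / 255, alpha=1 / 255, steps=100):
    """l-infinity PGD pushing a captioner toward a target caption (sketch).

    caption_log_prob(img, target_ids) is assumed to return the log
    probability the model assigns to the target caption tokens.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = -caption_log_prob(image + delta, target_ids)  # maximize target likelihood
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()               # signed gradient step
            delta.clamp_(-eps, eps)                          # imperceptibility budget
            delta.copy_((image + delta).clamp(0, 1) - image) # keep pixels valid
        delta.grad = None
    return (image + delta).clamp(0, 1).detach()
```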

We Don’t Need No Adam, All We Need Is EVE: On The Variance of Dual Learning Rate And Beyond

  • paper_url: http://arxiv.org/abs/2308.10740
  • repo_url: https://github.com/akhadangi/EVE
  • paper_authors: Afshin Khadangi
  • for: Improving the performance and stability of deep learning model optimization.
  • methods: Applying different learning rates to distinct components of the gradients, together with a momentum term that adapts to the learning landscape, giving finer control over the rate and direction of descent.
  • results: Across a range of benchmark datasets and architectures, the proposed EVE method significantly outperforms existing optimization techniques in both performance and stability.
    Abstract In the rapidly advancing field of deep learning, optimising deep neural networks is paramount. This paper introduces a novel method, Enhanced Velocity Estimation (EVE), which innovatively applies different learning rates to distinct components of the gradients. By bifurcating the learning rate, EVE enables more nuanced control and faster convergence, addressing the challenges associated with traditional single learning rate approaches. Utilising a momentum term that adapts to the learning landscape, the method achieves a more efficient navigation of the complex loss surface, resulting in enhanced performance and stability. Extensive experiments demonstrate that EVE significantly outperforms existing optimisation techniques across various benchmark datasets and architectures.
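
The abstract leaves the exact bifurcation rule unspecified. Purely as an illustration of the general idea — two learning rates applied to distinct gradient components alongside an adaptive momentum term — here is a hypothetical update; the split criterion (sign agreement with momentum) and all constants are invented for this sketch and are not claimed to be EVE's rule.

```python
import torch

@torch.no_grad()
def dual_rate_step(params, lr_fast=1e-3, lr_slow=1e-4, beta=0.9):
    """Hypothetical dual-learning-rate update (illustrative only).

    Gradient entries whose sign agrees with the running momentum get
    the faster rate; disagreeing entries get the slower one.
    """
    for p in params:
        if p.grad is None:
            continue
        if not hasattr(p, "_momentum"):
            p._momentum = torch.zeros_like(p)
        p._momentum.mul_(beta).add_(p.grad, alpha=1 - beta)  # adaptive momentum
        agree = p.grad.sign() == p._momentum.sign()
        lr = torch.where(agree, torch.full_like(p, lr_fast),
                         torch.full_like(p, lr_slow))
        p.sub_(lr * p._momentum)
```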

CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision Making

  • paper_url: http://arxiv.org/abs/2308.10721
  • repo_url: None
  • paper_authors: Giovanni Minelli, Mirco Musolesi
  • for: Improving coordination in multi-agent systems, so that each agent can make independent decisions while still working with others towards a common goal.
  • methods: A novel training framework, Coordinated QMIX (CoMIX), that lets each agent adapt to different situations and flexibly adjust its decisions, balancing independence and collaboration.
  • results: Experiments in a variety of simulation environments show that CoMIX outperforms baselines on collaborative tasks, validating the incremental policy approach as an effective technique for improving coordination in multi-agent systems.
    Abstract Robust coordination skills enable agents to operate cohesively in shared environments, together towards a common goal and, ideally, individually without hindering each other's progress. To this end, this paper presents Coordinated QMIX (CoMIX), a novel training framework for decentralized agents that enables emergent coordination through flexible policies, allowing at the same time independent decision-making at individual level. CoMIX models selfish and collaborative behavior as incremental steps in each agent's decision process. This allows agents to dynamically adapt their behavior to different situations balancing independence and collaboration. Experiments using a variety of simulation environments demonstrate that CoMIX outperforms baselines on collaborative tasks. The results validate our incremental policy approach as effective technique for improving coordination in multi-agent systems.

On the accuracy of interpolation based on single-layer artificial neural networks

  • paper_url: http://arxiv.org/abs/2308.10720
  • repo_url: None
  • paper_authors: Ferdinando Auricchio, Maria Roberta Belardo, Francesco Calabrò, Ariel F. Pascaner
  • for: Studying single-hidden-layer artificial neural networks (ANNs) with a simple feedforward architecture, trained with the procedure known as the Extreme Learning Machine (ELM).
  • methods: The ANN interpolating function is built by imposing interpolation through a set of nodes, and different types of nodes (equispaced, Chebychev, and randomly selected) are compared.
  • results: The accuracy of a global interpolating polynomial improves with the number of nodes only when Chebychev nodes are used, whereas the error of the ANN interpolating function always decays, in most cases following the polynomial behavior on Chebychev nodes regardless of the training nodes.
    Abstract In the present paper, we consider one-hidden layer ANNs with a feedforward architecture, also referred to as shallow or two-layer networks, so that the structure is determined by the number and types of neurons. The determination of the parameters that define the function, called training, is done via the resolution of the approximation problem, so by imposing the interpolation through a set of specific nodes. We present the case where the parameters are trained using a procedure that is referred to as Extreme Learning Machine (ELM) that leads to a linear interpolation problem. In such hypotheses, the existence of an ANN interpolating function is guaranteed. The focus is then on the accuracy of the interpolation outside of the given sampling interpolation nodes when they are the equispaced, the Chebychev, and the randomly selected ones. The study is motivated by the well-known bell-shaped Runge example, which makes it clear that the construction of a global interpolating polynomial is accurate only if trained on suitably chosen nodes, for example the Chebychev ones. In order to evaluate the behavior when growing the number of interpolation nodes, we raise the number of neurons in our network and compare it with the interpolating polynomial. We test using Runge's function and other well-known examples with different regularities. As expected, the accuracy of the approximation with a global polynomial increases only if the Chebychev nodes are considered. Instead, the error for the ANN interpolating function always decays and in most cases we observe that the convergence follows what is observed in the polynomial case on Chebychev nodes, regardless of the set of nodes used for training.
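
The ELM training described reduces to a single linear least-squares solve: hidden weights are drawn at random and frozen, and only the output weights are fitted. A minimal sketch on Runge's function with Chebychev nodes follows; the tanh activation and the network width are assumptions.

```python
import numpy as np

def runge(x):
    return 1.0 / (1.0 + 25.0 * x**2)   # the classic bell-shaped Runge example

n = 40                                 # number of interpolation nodes / neurons
x = np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))  # Chebychev nodes on [-1, 1]
y = runge(x)

rng = np.random.default_rng(0)
W, b = rng.normal(size=n), rng.normal(size=n)    # random, frozen hidden layer
H = np.tanh(np.outer(x, W) + b)                  # hidden activations at the nodes
beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # ELM training = one linear solve

x_test = np.linspace(-1, 1, 1000)
y_hat = np.tanh(np.outer(x_test, W) + b) @ beta  # the ANN interpolating function
print("max error:", np.abs(y_hat - runge(x_test)).max())
```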

Sampling From Autoencoders’ Latent Space via Quantization And Probability Mass Function Concepts

  • paper_url: http://arxiv.org/abs/2308.10704
  • repo_url: None
  • paper_authors: Aymene Mohammed Bouayed, Adrian Iaccovelli, David Naccache
  • for: Sampling from the latent space of generative models built upon autoencoders, with the goal of generating lifelike images.
  • methods: A post-training sampling algorithm based on the concepts of probability mass functions and quantization: it establishes a vicinity around each latent vector from the input data and draws samples from these neighborhoods, so that sampled latent vectors predominantly inhabit high-probability regions. It improves on Gaussian mixture model (GMM) sampling by reducing the time complexity from $\mathcal{O}(n\times d \times k \times i)$ to $\mathcal{O}(n\times d)$.
  • results: On several benchmark datasets, the algorithm improves the Fréchet inception distance (FID) for image generation by up to $0.89$ on MNIST, $1.69$ on CelebA, and $0.87$ on MOBIUS compared to GMM sampling, and it is also effective at estimating latent space distributions, as evidenced by the Wasserstein distance.
    Abstract In this study, we focus on sampling from the latent space of generative models built upon autoencoders so that the reconstructed samples are lifelike images. To do so, we introduce a novel post-training sampling algorithm rooted in the concept of probability mass functions, coupled with a quantization process. Our proposed algorithm establishes a vicinity around each latent vector from the input data and then proceeds to draw samples from these defined neighborhoods. This strategic approach ensures that the sampled latent vectors predominantly inhabit high-probability regions, which, in turn, can be effectively transformed into authentic real-world images. A noteworthy point of comparison for our sampling algorithm is the sampling technique based on Gaussian mixture models (GMM), owing to its inherent capability to represent clusters. Remarkably, we manage to improve the time complexity from the previous $\mathcal{O}(n\times d \times k \times i)$ associated with GMM sampling to a much more streamlined $\mathcal{O}(n\times d)$, thereby resulting in substantial speedup during runtime. Moreover, our experimental results, gauged through the Fr\'echet inception distance (FID) for image generation, underscore the superior performance of our sampling algorithm across a diverse range of models and datasets. On the MNIST benchmark dataset, our approach outperforms GMM sampling by yielding a noteworthy improvement of up to $0.89$ in FID value. Furthermore, when it comes to generating images of faces and ocular images, our approach showcases substantial enhancements with FID improvements of $1.69$ and $0.87$ respectively, as compared to GMM sampling, as evidenced on the CelebA and MOBIUS datasets. Lastly, we substantiate our methodology's efficacy in estimating latent space distributions in contrast to GMM sampling, particularly through the lens of the Wasserstein distance.
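
One way to picture the neighborhood-sampling idea (a simplification of the paper's PMF-and-quantization construction): quantize the training latents into bins, weight latents by the empirical mass of their bin, and draw new latents near vectors from heavy bins. The bin width and neighborhood radius are illustrative assumptions.

```python
import numpy as np

def sample_latents(latents, n_samples, bin_width=0.5, radius=0.1, seed=0):
    """Draw new latent vectors from neighborhoods of high-probability
    training latents (sketch of the PMF + quantization idea).

    latents: (n, d) latent vectors of the training data.
    """
    rng = np.random.default_rng(seed)
    bins = np.floor(latents / bin_width).astype(int)          # quantization
    _, inverse, counts = np.unique(bins, axis=0,
                                   return_inverse=True, return_counts=True)
    pmf = counts[inverse] / len(latents)    # empirical mass of each latent's bin
    probs = pmf / pmf.sum()
    idx = rng.choice(len(latents), size=n_samples, p=probs)   # favor dense regions
    # uniform jitter inside a small vicinity of each chosen latent
    jitter = rng.uniform(-radius, radius, size=(n_samples, latents.shape[1]))
    return latents[idx] + jitter
```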

Refashioning Emotion Recognition Modelling: The Advent of Generalised Large Models

  • paper_url: http://arxiv.org/abs/2308.11578
  • repo_url: None
  • paper_authors: Zixing Zhang, Liyizhe Peng, Tao Pang, Jing Han, Huan Zhao, Bjorn W. Schuller
  • for: Investigating how large language models perform in emotion recognition, across aspects including in-context learning, few-shot learning, accuracy, generalisation, and explanation.
  • methods: A comprehensive examination of large language models (LLMs), such as ChatGPT, on emotion recognition tasks, analyzing and discussing their behavior along the aspects above.
  • results: The paper offers insights into the capabilities of LLMs for emotion recognition, such as in-context and few-shot learning, and poses open questions and challenges intended to spur further progress in the field.
    Abstract After the inception of emotion recognition or affective computing, it has increasingly become an active research topic due to its broad applications. Over the past couple of decades, emotion recognition models have gradually migrated from statistically shallow models to neural network-based deep models, which can significantly boost the performance of emotion recognition models and consistently achieve the best results on different benchmarks. Therefore, in recent years, deep models have always been considered the first option for emotion recognition. However, the debut of large language models (LLMs), such as ChatGPT, has remarkably astonished the world due to their emerged capabilities of zero/few-shot learning, in-context learning, chain-of-thought, and others that are never shown in previous deep models. In the present paper, we comprehensively investigate how the LLMs perform in emotion recognition in terms of diverse aspects, including in-context learning, few-short learning, accuracy, generalisation, and explanation. Moreover, we offer some insights and pose other potential challenges, hoping to ignite broader discussions about enhancing emotion recognition in the new era of advanced and generalised large models.

Normative Conditional Reasoning as a Fragment of HOL

  • paper_url: http://arxiv.org/abs/2308.10686
  • repo_url: None
  • paper_authors: Xavier Parent, Christoph Benzmüller
  • for: Mechanizing normative (preference-based) conditional reasoning, focusing on Aqvist's system E for conditional obligation and its extensions.
  • methods: A shallow semantical embedding of the logic in Isabelle/HOL.
  • results: The framework supports automated meta-reasoning, such as the verification of deontic correspondences, analogous to earlier work on the modal logic cube. It can also be used to assess ethical arguments, illustrated with a computer encoding of Parfit's repugnant conclusion in population ethics.
    Abstract We report some results regarding the mechanization of normative (preference-based) conditional reasoning. Our focus is on Aqvist's system E for conditional obligation (and its extensions). Our mechanization is achieved via a shallow semantical embedding in Isabelle/HOL. We consider two possible uses of the framework. The first one is as a tool for meta-reasoning about the considered logic. We employ it for the automated verification of deontic correspondences (broadly conceived) and related matters, analogous to what has been previously achieved for the modal logic cube. The second use is as a tool for assessing ethical arguments. We provide a computer encoding of a well-known paradox in population ethics, Parfit's repugnant conclusion. Whether the presented encoding increases or decreases the attractiveness and persuasiveness of the repugnant conclusion is a question we would like to pass on to philosophy and ethics.

Visual Crowd Analysis: Open Research Problems

  • paper_url: http://arxiv.org/abs/2308.10677
  • repo_url: None
  • paper_authors: Muhammad Asif Khan, Hamid Menouar, Ridha Hamila
  • for: Examining the latest developments and open challenges in automated crowd monitoring, focusing on six major areas of visual crowd analysis.
  • methods: A survey of modern deep-learning approaches, summarizing the key developments in each of the six areas with an intuitive categorization of works.
  • results: The paper highlights recent breakthroughs, including prominent works with significant contributions in terms of novelty or performance gains, and outlines the crucial unresolved issues that future work must tackle for the field to keep progressing.
    Abstract Over the last decade, there has been a remarkable surge in interest in automated crowd monitoring within the computer vision community. Modern deep-learning approaches have made it possible to develop fully-automated vision-based crowd-monitoring applications. However, despite the magnitude of the issue at hand, the significant technological advancements, and the consistent interest of the research community, there are still numerous challenges that need to be overcome. In this article, we delve into six major areas of visual crowd analysis, emphasizing the key developments in each of these areas. We outline the crucial unresolved issues that must be tackled in future works, in order to ensure that the field of automated crowd monitoring continues to progress and thrive. Several surveys related to this topic have been conducted in the past. Nonetheless, this article thoroughly examines and presents a more intuitive categorization of works, while also depicting the latest breakthroughs within the field, incorporating more recent studies carried out within the last few years in a concise manner. By carefully choosing prominent works with significant contributions in terms of novelty or performance gains, this paper presents a more comprehensive exposition of advancements in the current state-of-the-art.

A Safe Deep Reinforcement Learning Approach for Energy Efficient Federated Learning in Wireless Communication Networks

  • paper_url: http://arxiv.org/abs/2308.10664
  • repo_url: None
  • paper_authors: Nikolaos Koursioumpas, Lina Magoula, Nikolaos Petropouleas, Alexandros-Ioannis Thanopoulos, Theodora Panagea, Nancy Alonistioti, M. A. Gutierrez-Estevez, Ramin Khalili
  • for: Addressing the environmental impact of Artificial Intelligence (AI) functionality in wireless networks, where Federated Learning (FL) has emerged as a key privacy-preserving decentralized AI technique.
  • methods: Orchestrating the computational and communication resources of the devices involved in an FL process to minimize the total energy consumption while guaranteeing model performance, using a penalty function during training that penalizes strategies violating the environment's constraints.
  • results: Compared with four state-of-the-art baseline solutions, the approach achieves a reduction of up to 94% in total energy consumption.
    Abstract Progressing towards a new era of Artificial Intelligence (AI) - enabled wireless networks, concerns regarding the environmental impact of AI have been raised both in industry and academia. Federated Learning (FL) has emerged as a key privacy preserving decentralized AI technique. Despite efforts currently being made in FL, its environmental impact is still an open problem. Targeting the minimization of the overall energy consumption of an FL process, we propose the orchestration of computational and communication resources of the involved devices to minimize the total energy required, while guaranteeing a certain performance of the model. To this end, we propose a Soft Actor Critic Deep Reinforcement Learning (DRL) solution, where a penalty function is introduced during training, penalizing the strategies that violate the constraints of the environment, and ensuring a safe RL process. A device level synchronization method, along with a computationally cost effective FL environment are proposed, with the goal of further reducing the energy consumption and communication overhead. Evaluation results show the effectiveness of the proposed scheme compared to four state-of-the-art baseline solutions in both static and dynamic environments, achieving a decrease of up to 94% in the total energy consumption.
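
The penalty mechanism described — penalizing strategies that violate the environment's constraints during training — can be summarized in one reward-shaping step. A minimal sketch follows, with the penalty coefficient and constraint form assumed for illustration; the SAC machinery itself is omitted.

```python
def penalized_reward(energy_cost, constraint_values, limits, penalty_coef=10.0):
    """Reward for the safe-DRL agent: negative energy plus a penalty
    for each violated constraint (sketch).

    constraint_values / limits: e.g. per-round latency or model-quality
    requirements, compared against their allowed bounds.
    """
    violation = sum(max(0.0, v - lim) for v, lim in zip(constraint_values, limits))
    return -energy_cost - penalty_coef * violation
```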

Deep Evidential Learning for Bayesian Quantile Regression

  • paper_url: http://arxiv.org/abs/2308.10650
  • repo_url: None
  • paper_authors: Frederik Boe Hüttel, Filipe Rodrigues, Francisco Câmara Pereira
  • for: Proposing a deep Bayesian quantile regression model for estimating the quantiles of a continuous target distribution without assuming a Gaussian distribution.
  • methods: The method is based on evidential learning, which allows the model to capture aleatoric and epistemic uncertainty with a single deterministic forward-pass model.
  • results: The method achieves calibrated uncertainties on non-Gaussian distributions, disentanglement of aleatoric and epistemic uncertainty, and robustness to out-of-distribution samples.
    Abstract It is desirable to have accurate uncertainty estimation from a single deterministic forward-pass model, as traditional methods for uncertainty quantification are computationally expensive. However, this is difficult because single forward-pass models do not sample weights during inference and often make assumptions about the target distribution, such as assuming it is Gaussian. This can be restrictive in regression tasks, where the mean and standard deviation are inadequate to model the target distribution accurately. This paper proposes a deep Bayesian quantile regression model that can estimate the quantiles of a continuous target distribution without the Gaussian assumption. The proposed method is based on evidential learning, which allows the model to capture aleatoric and epistemic uncertainty with a single deterministic forward-pass model. This makes the method efficient and scalable to large models and datasets. We demonstrate that the proposed method achieves calibrated uncertainties on non-Gaussian distributions, disentanglement of aleatoric and epistemic uncertainty, and robustness to out-of-distribution samples.
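
Quantile regression rests on the pinball (quantile) loss, which a model like the one above fits for several quantile levels at once. A compact reference implementation follows; the evidential-uncertainty machinery is omitted, and the multi-quantile head in the usage note is an assumption.

```python
import torch

def pinball_loss(y, y_hat, tau):
    """L_tau(y, y_hat) = max(tau * e, (tau - 1) * e) with e = y - y_hat.

    Minimizing this loss drives y_hat toward the tau-quantile of the
    target distribution, with no Gaussian assumption.
    """
    e = y - y_hat
    return torch.maximum(tau * e, (tau - 1) * e).mean()

# e.g. jointly fit the 10th, 50th and 90th percentiles from a 3-output head:
# loss = sum(pinball_loss(y, preds[:, i], t) for i, t in enumerate([0.1, 0.5, 0.9]))
```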

EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints

  • paper_url: http://arxiv.org/abs/2308.10648
  • repo_url: None
  • paper_authors: Yutao Chen, Xingning Dong, Tian Gan, Chunluan Zhou, Ming Yang, Qingpei Guo
  • for: Improving text-based video editing.
  • methods: EVE, an effective and efficient zero-shot video editing method guided by depth maps and temporal consistency constraints.
  • results: Experiments show that EVE achieves a satisfactory trade-off between performance and efficiency; a new benchmark, the ZVE-50 dataset, is also provided for evaluation.
    Abstract Motivated by the superior performance of image diffusion models, more and more researchers strive to extend these models to the text-based video editing task. Nevertheless, current video editing tasks mainly suffer from the dilemma between the high fine-tuning cost and the limited generation capacity. Compared with images, we conjecture that videos necessitate more constraints to preserve the temporal consistency during editing. Towards this end, we propose EVE, a robust and efficient zero-shot video editing method. Under the guidance of depth maps and temporal consistency constraints, EVE derives satisfactory video editing results with an affordable computational and time cost. Moreover, recognizing the absence of a publicly available video editing dataset for fair comparisons, we construct a new benchmark ZVE-50 dataset. Through comprehensive experimentation, we validate that EVE could achieve a satisfactory trade-off between performance and efficiency. We will release our dataset and codebase to facilitate future researchers.

SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes

  • paper_url: http://arxiv.org/abs/2308.10638
  • repo_url: None
  • paper_authors: Soubhik Sanyal, Partha Ghosh, Jinlong Yang, Michael J. Black, Justus Thies, Timo Bolkart
  • for: The paper is written for generating 3D clothed human bodies with realistic texture and pose.
  • methods: The paper uses a deep neural network to learn the geometry and appearance distribution of clothed human bodies, using both 3D scan data and 2D image data. The network is trained in an unpaired manner, and the authors use attribute labels to alleviate entanglement between pose and clothing type, and pose and clothing appearance.
  • results: The paper presents SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans, together with the SCULPT dataset, and compares the results to state-of-the-art 3D generative models for clothed human bodies. The authors show that their method can generate highly realistic and diverse 3D clothed human bodies with realistic texture and pose.
    Abstract We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans. Specifically, we devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies. Training such a model is challenging, as datasets of textured 3D meshes for humans are limited in size and accessibility. Our key observation is that there exist medium-sized 3D scan datasets like CAPE, as well as large-scale 2D image datasets of clothed humans and multiple appearances can be mapped to a single geometry. To effectively learn from the two data modalities, we propose an unpaired learning procedure for pose-dependent clothed and textured human meshes. Specifically, we learn a pose-dependent geometry space from 3D scan data. We represent this as per vertex displacements w.r.t. the SMPL model. Next, we train a geometry conditioned texture generator in an unsupervised way using the 2D image data. We use intermediate activations of the learned geometry model to condition our texture generator. To alleviate entanglement between pose and clothing type, and pose and clothing appearance, we condition both the texture and geometry generators with attribute labels such as clothing types for the geometry, and clothing colors for the texture generator. We automatically generated these conditioning labels for the 2D images based on the visual question answering model BLIP and CLIP. We validate our method on the SCULPT dataset, and compare to state-of-the-art 3D generative models for clothed human bodies. We will release the codebase for research purposes.

RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models

  • paper_url: http://arxiv.org/abs/2308.10633
  • repo_url: https://github.com/yhoshi3/ralle
  • paper_authors: Yasuto Hoshi, Daisuke Miyashita, Youyang Ng, Kento Tatsuno, Yasuhiro Morioka, Osamu Torii, Jun Deguchi
  • for: Improving answer accuracy on knowledge-intensive tasks using Retrieval-augmented large language models (R-LLMs), which combine pre-trained large language models (LLMs) with information retrieval systems.
  • methods: RaLLe, an open-source framework for developing, evaluating, and optimizing R-LLMs on knowledge-intensive tasks. RaLLe offers highly configurable pipelines, fine-grained evaluation of individual inference processes such as retrieval and generation, and quantitative measurement of overall system performance.
  • results: With these features, developers can improve the performance and accuracy of their R-LLMs, particularly on knowledge-intensive tasks.
    Abstract Retrieval-augmented large language models (R-LLMs) combine pre-trained large language models (LLMs) with information retrieval systems to improve the accuracy of factual question-answering. However, current libraries for building R-LLMs provide high-level abstractions without sufficient transparency for evaluating and optimizing prompts within specific inference processes such as retrieval and generation. To address this gap, we present RaLLe, an open-source framework designed to facilitate the development, evaluation, and optimization of R-LLMs for knowledge-intensive tasks. With RaLLe, developers can easily develop and evaluate R-LLMs, improving hand-crafted prompts, assessing individual inference processes, and objectively measuring overall system performance quantitatively. By leveraging these features, developers can enhance the performance and accuracy of their R-LLMs in knowledge-intensive generation tasks. We open-source our code at https://github.com/yhoshi3/RaLLe.
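
To fix ideas about the retrieve-then-generate inference process that RaLLe decomposes and evaluates, here is a generic minimal pipeline. It is not RaLLe's actual API; the `retriever` and `llm` callables and the prompt template are assumed stand-ins.

```python
def answer(question, retriever, llm, k=5):
    """Generic retrieval-augmented generation pipeline (sketch).

    retriever(question, k) -> list of passage strings (assumed)
    llm(prompt)            -> generated string (assumed)
    Frameworks like RaLLe expose each step so prompts and intermediate
    outputs can be inspected and optimized individually.
    """
    passages = retriever(question, k)                      # step 1: retrieval
    context = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    prompt = (f"Answer using only the passages below.\n\n{context}\n\n"
              f"Question: {question}\nAnswer:")            # step 2: prompt assembly
    return llm(prompt)                                     # step 3: generation
```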

Weighting by Tying: A New Approach to Weighted Rank Correlation

  • paper_url: http://arxiv.org/abs/2308.10622
  • repo_url: None
  • paper_authors: Sascha Henzgen, Eyke Hüllermeier
  • for: Studying weighted rank correlation measures, which capture the degree of concordance between two orderings of the same set of items while allowing some rank positions to matter more than others.
  • methods: A weighted rank correlation measure based on fuzzy order relations, with a sound formal foundation and a flexible way of specifying the weighting.
  • results: The proposed measure, scaled gamma, supports weighting of rank positions and can be adapted to different applications through its scaling function.
    Abstract Measures of rank correlation are commonly used in statistics to capture the degree of concordance between two orderings of the same set of items. Standard measures like Kendall's tau and Spearman's rho coefficient put equal emphasis on each position of a ranking. Yet, motivated by applications in which some of the positions (typically those on the top) are more important than others, a few weighted variants of these measures have been proposed. Most of these generalizations fail to meet desirable formal properties, however. Besides, they are often quite inflexible in the sense of committing to a fixed weighing scheme. In this paper, we propose a weighted rank correlation measure on the basis of fuzzy order relations. Our measure, called scaled gamma, is related to Goodman and Kruskal's gamma rank correlation. It is parametrized by a fuzzy equivalence relation on the rank positions, which in turn is specified conveniently by a so-called scaling function. This approach combines soundness with flexibility: it has a sound formal foundation and allows for weighing rank positions in a flexible way.
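
Goodman and Kruskal's gamma counts concordant and discordant item pairs; the scaled variant described above additionally weights each pair according to the rank positions involved. A toy sketch follows — the simple position-based `weight` function here is a stand-in, not the paper's fuzzy-equivalence construction.

```python
from itertools import combinations

def weighted_gamma(rank_a, rank_b, weight):
    """Gamma-style rank correlation with position-dependent pair weights.

    rank_a, rank_b : rank of each item under the two orderings
    weight(i, j)   : importance of the pair of rank positions (i, j),
                     e.g. emphasizing the top of the ranking.
    """
    C = D = 0.0
    for i, j in combinations(range(len(rank_a)), 2):
        s = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if s == 0:
            continue                     # ties contribute nothing here
        w = weight(min(rank_a[i], rank_a[j]), min(rank_b[i], rank_b[j]))
        C, D = (C + w, D) if s > 0 else (C, D + w)
    return (C - D) / (C + D)

# Top-weighted example: positions near rank 1 count more.
top = lambda i, j: 1.0 / (1 + min(i, j))
print(weighted_gamma([1, 2, 3, 4], [1, 2, 4, 3], top))
```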

Large Language Models for Software Engineering: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2308.10620
  • repo_url: None
  • paper_authors: Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, Haoyu Wang
  • for: Understanding how large language models (LLMs) can be applied in Software Engineering (SE) to optimize processes and outcomes.
  • methods: A systematic literature review of 229 research papers from 2017 to 2023, structured around four key research questions (RQs).
  • results: The review categorizes the LLMs employed in SE tasks, details methods for data collection, preprocessing, and application, identifies the SE tasks where LLMs have shown notable success, and surveys strategies for optimizing and evaluating LLM performance, including prompt optimization.
    Abstract Large Language Models (LLMs) have significantly impacted numerous domains, notably including Software Engineering (SE). Nevertheless, a well-rounded understanding of the application, effects, and possible limitations of LLMs within SE is still in its early stages. To bridge this gap, our systematic literature review takes a deep dive into the intersection of LLMs and SE, with a particular focus on understanding how LLMs can be exploited in SE to optimize processes and outcomes. Through a comprehensive review approach, we collect and analyze a total of 229 research papers from 2017 to 2023 to answer four key research questions (RQs). In RQ1, we categorize and provide a comparative analysis of different LLMs that have been employed in SE tasks, laying out their distinctive features and uses. For RQ2, we detail the methods involved in data collection, preprocessing, and application in this realm, shedding light on the critical role of robust, well-curated datasets for successful LLM implementation. RQ3 allows us to examine the specific SE tasks where LLMs have shown remarkable success, illuminating their practical contributions to the field. Finally, RQ4 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE, as well as the common techniques related to prompt optimization. Armed with insights drawn from addressing the aforementioned RQs, we sketch a picture of the current state-of-the-art, pinpointing trends, identifying gaps in existing research, and flagging promising areas for future study.

BackTrack: Robust template update via Backward Tracking of candidate template

  • paper_url: http://arxiv.org/abs/2308.10604
  • repo_url: None
  • paper_authors: Dongwook Lee, Wonjun Choi, Seohyung Lee, ByungIn Yoo, Eunho Yang, Seongju Hwang
  • for: This paper proposes a robust and reliable template update method to address the challenges of template updating in visual object tracking, such as target deformation, illumination variance, and occlusion.
  • methods: BackTrack quantifies the confidence of a candidate template by tracking it backward over past frames, so the template is updated with a reliable candidate at the right time while unreliable candidates are rejected.
  • results: Experiments show that BackTrack achieves SOTA performance on various tracking benchmarks and remains reliable under deformation, illumination variance, and occlusion.
    Abstract Variations of target appearance such as deformations, illumination variance, occlusion, etc., are the major challenges of visual object tracking that negatively impact the performance of a tracker. An effective method to tackle these challenges is template update, which updates the template to reflect the change of appearance in the target object during tracking. However, with template updates, inadequate quality of new templates or inappropriate timing of updates may induce a model drift problem, which severely degrades the tracking performance. Here, we propose BackTrack, a robust and reliable method to quantify the confidence of the candidate template by backward tracking it on the past frames. Based on the confidence score of candidates from BackTrack, we can update the template with a reliable candidate at the right time while rejecting unreliable candidates. BackTrack is a generic template update scheme and is applicable to any template-based trackers. Extensive experiments on various tracking benchmarks verify the effectiveness of BackTrack over existing template update algorithms, as it achieves SOTA performance on various tracking benchmarks.
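A minimal sketch of the backward-tracking idea in Python, assuming a generic `tracker.track(template, frame) -> box` interface; the paper's actual confidence measure and thresholds may differ:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def backtrack_confidence(tracker, candidate, past_frames, past_boxes, iou_thresh=0.6):
    """Track the candidate template backward through past frames and count
    how often it agrees with the boxes produced by the forward pass."""
    hits = 0
    for frame, forward_box in zip(reversed(past_frames), reversed(past_boxes)):
        backward_box = tracker.track(candidate, frame)
        if iou(backward_box, forward_box) >= iou_thresh:
            hits += 1
    return hits / len(past_frames)   # confidence in [0, 1]

def maybe_update_template(tracker, template, candidate, past_frames, past_boxes,
                          conf_thresh=0.8):
    # Accept the candidate only when backward tracking confirms it.
    if backtrack_confidence(tracker, candidate, past_frames, past_boxes) >= conf_thresh:
        return candidate
    return template
```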

Age Recommendation from Texts and Sentences for Children

  • paper_url: http://arxiv.org/abs/2308.10586
  • repo_url: None
  • paper_authors: Rashedur Rahman, Gwénolé Lecorvé, Nicolas Béchet
  • for: This paper proposes a method to automatically predict the recommended reading age of a text for children, enabling adequate text recommendations for young readers.
  • methods: Age recommendation is treated as a regression task using state-of-the-art Transformer models, which are compared against models from the literature; a preliminary explainability analysis examines the influence of various linguistic features on age prediction.
  • results: The best models achieve MAE scores of 0.98 at the text level and 1.83 at the sentence level on the test set. Compared with expert recommendations, the sentence-level model scores similarly to the experts, while the text-level model outperforms them by an MAE of 1.48.
    Abstract Children have less text understanding capability than adults. Moreover, this capability differs among the children of different ages. Hence, automatically predicting a recommended age based on texts or sentences would be a great benefit to propose adequate texts to children and to help authors writing in the most appropriate way. This paper presents our recent advances on the age recommendation task. We consider age recommendation as a regression task, and discuss the need for appropriate evaluation metrics, study the use of state-of-the-art machine learning model, namely Transformers, and compare it to different models coming from the literature. Our results are also compared with recommendations made by experts. Further, this paper deals with preliminary explainability of the age prediction model by analyzing various linguistic features. We conduct the experiments on a dataset of 3,673 French texts (132K sentences, 2.5M words). To recommend age at the text level and sentence level, our best models achieve MAE scores of 0.98 and 1.83 respectively on the test set. Also, compared to the recommendations made by experts, our sentence-level recommendation model gets a similar score to the experts, while the text-level recommendation model outperforms the experts by an MAE score of 1.48.
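A hedged sketch of the regression setup with a Transformer, using Hugging Face Transformers; `camembert-base` is a plausible French checkpoint chosen here for illustration, not necessarily the one used in the paper:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "camembert-base", num_labels=1, problem_type="regression")

batch = tok(["Un petit texte pour enfants."], return_tensors="pt", truncation=True)
pred_age = model(**batch).logits.squeeze(-1)          # predicted reading age

# L1 loss directly matches the MAE evaluation metric reported in the paper.
loss = torch.nn.functional.l1_loss(pred_age, torch.tensor([7.0]))
loss.backward()
```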

Pseudo-online framework for BCI evaluation: A MOABB perspective

  • paper_url: http://arxiv.org/abs/2308.11656
  • repo_url: None
  • paper_authors: Igor Carrara, Théodore Papadopoulo
  • for: This work extends the existing MOABB framework, which operates in offline mode, to allow a comparison of different algorithms in a pseudo-online setting using a technology based on overlapping sliding windows that emulates a real-time environment.
  • methods: Offline EEG data are processed pseudo-online, with an idle-state event introduced in the dataset to account for all the possibilities other than task thinking; performance is assessed with the normalized Matthews Correlation Coefficient (nMCC) and the Information Transfer Rate (ITR).
  • results: An analysis of the state-of-the-art algorithms of the last 15 years over several Motor Imagery (MI) datasets with several subjects shows statistically significant differences between the two approaches.
    Abstract Objective: BCI (Brain-Computer Interface) technology operates in three modes: online, offline, and pseudo-online. In the online mode, real-time EEG data is constantly analyzed. In offline mode, the signal is acquired and processed afterwards. The pseudo-online mode processes collected data as if they were received in real-time. The main difference is that the offline mode often analyzes the whole data, while the online and pseudo-online modes only analyze data in short time windows. Offline analysis is usually done with synchronous BCIs, which restrict analysis to predefined time windows, whereas asynchronous BCI, compatible with online and pseudo-online modes, allows flexible mental activity duration. Offline processing tends to be more accurate, while online analysis is better for therapeutic applications. Pseudo-online implementation approximates online processing without real-time constraints. Many BCI studies, being conducted offline, introduce biases compared to real-life scenarios, impacting classification algorithm performance. Approach: The objective of this research paper is therefore to extend the current MOABB framework, operating in offline mode, so as to allow a comparison of different algorithms in a pseudo-online setting with the use of a technology based on overlapping sliding windows. Doing so requires the introduction of an idle state event in the dataset that takes into account all the possibilities other than task thinking. To validate the performance of the algorithms we use the normalized Matthews Correlation Coefficient (nMCC) and the Information Transfer Rate (ITR). Main results: We analyzed the state-of-the-art algorithms of the last 15 years over several Motor Imagery (MI) datasets composed of several subjects, showing the differences between the two approaches from a statistical point of view. Significance: The ability to analyze the performance of different algorithms in offline and pseudo-online modes will allow the BCI community to obtain more accurate and comprehensive reports regarding the performance of classification algorithms.
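The two key ingredients, overlapping sliding windows over an offline recording and the evaluation metrics, can be sketched as follows. The (MCC + 1) / 2 normalization for nMCC is an assumption; the ITR shown is the classical Wolpaw formula:

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

def sliding_windows(eeg, win, step):
    """Cut a (channels, samples) array into overlapping windows, emulating
    pseudo-online processing of data that was recorded offline."""
    return [eeg[:, s:s + win] for s in range(0, eeg.shape[1] - win + 1, step)]

def nmcc(y_true, y_pred):
    # Assumed normalization: rescale MCC from [-1, 1] to [0, 1].
    return (matthews_corrcoef(y_true, y_pred) + 1) / 2

def itr_bits_per_min(n_classes, acc, trial_seconds):
    """Wolpaw information transfer rate in bits per minute."""
    if acc >= 1.0:
        return np.log2(n_classes) * 60 / trial_seconds
    if acc <= 1 / n_classes:
        return 0.0
    bits = (np.log2(n_classes) + acc * np.log2(acc)
            + (1 - acc) * np.log2((1 - acc) / (n_classes - 1)))
    return bits * 60 / trial_seconds
```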

Overcoming Overconfidence for Active Learning

  • paper_url: http://arxiv.org/abs/2308.10571
  • repo_url: None
  • paper_authors: Yujin Hwang, Won Jo, Juyoung Hong, Yukyung Choi
  • for: This paper proposes two methods to address the overconfidence problem that arises in active learning scenarios.
  • methods: Cross-Mix-and-Mix (CMaM), an augmentation strategy that calibrates the model by expanding the limited training distribution, and Ranked Margin Sampling (RankedMS), a selection strategy that avoids choosing data that leads to overly confident predictions.
  • results: Various experiments and analyses show that both proposals alleviate overconfidence and enable efficient data selection while remaining readily applicable.
    Abstract It is not an exaggeration to say that the recent progress in artificial intelligence technology depends on large-scale and high-quality data. Simultaneously, a prevalent issue exists everywhere: the budget for data labeling is constrained. Active learning is a prominent approach for addressing this issue, where valuable data for labeling is selected through a model and utilized to iteratively adjust the model. However, due to the limited amount of data in each iteration, the model is vulnerable to bias; thus, it is more likely to yield overconfident predictions. In this paper, we present two novel methods to address the problem of overconfidence that arises in the active learning scenario. The first is an augmentation strategy named Cross-Mix-and-Mix (CMaM), which aims to calibrate the model by expanding the limited training distribution. The second is a selection strategy named Ranked Margin Sampling (RankedMS), which prevents choosing data that leads to overly confident predictions. Through various experiments and analyses, we are able to demonstrate that our proposals facilitate efficient data selection by alleviating overconfidence, even though they are readily applicable.
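A schematic of the margin-based selection idea behind RankedMS (the paper's exact ranking criterion may differ): small top-2 margins mark uncertain, informative samples, while very large margins flag the overconfident predictions to avoid.

```python
import torch

def ranked_margin_select(probs, k):
    """Select k unlabeled samples by the margin between the two highest
    class probabilities; `probs` is an (N, C) tensor of softmax outputs."""
    top2 = probs.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]     # in [0, 1]; large = overconfident
    order = margin.argsort()             # ascending: least confident first
    return order[:k]                     # indices to send for labeling

# Usage sketch:
probs = torch.softmax(torch.randn(1000, 10), dim=1)
query_idx = ranked_margin_select(probs, k=32)
```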

Metaverse: A Vision, Architectural Elements, and Future Directions for Scalable and Realtime Virtual Worlds

  • paper_url: http://arxiv.org/abs/2308.10559
  • repo_url: None
  • paper_authors: Leila Ismail, Rajkumar Buyya
  • for: This paper identifies the new requirements for realizing the Metaverse at scale and in real time.
  • methods: It analyzes the temporal evolution of Metaverse definitions and their evolving requirements, deriving architectural elements for scalable, reliable, and efficient Metaverse systems along with a classification of applications.
  • results: The paper presents the new requirements for realizing the Metaverse, architectural elements for scalable, reliable, and efficient Metaverse systems, a classification of existing Metaverse applications, and directions for future research.
    Abstract With the emergence of Cloud computing, Internet of Things-enabled Human-Computer Interfaces, Generative Artificial Intelligence, and highly accurate Machine and Deep-learning recognition and predictive models, along with the post-Covid-19 proliferation of social networking and remote communications, the Metaverse gained a lot of popularity. The Metaverse has the potential to extend the physical world using virtual and augmented reality so the users can interact seamlessly with the real and virtual worlds using avatars and holograms. It has the potential to impact people in the way they interact on social media, collaborate in their work, perform marketing and business, teach, learn, and even access personalized healthcare. Several works in the literature examine the Metaverse in terms of wearable hardware devices and virtual reality gaming applications. However, the requirements for realizing the Metaverse in real time and at large scale have yet to be examined for the technology to be usable. To address this limitation, this paper presents the temporal evolution of Metaverse definitions and captures its evolving requirements. Consequently, we provide insights into Metaverse requirements. In addition to enabling technologies, we lay out architectural elements for scalable, reliable, and efficient Metaverse systems, a classification of existing Metaverse applications, and required future research directions.

KGrEaT: A Framework to Evaluate Knowledge Graphs via Downstream Tasks

  • paper_url: http://arxiv.org/abs/2308.10537
  • repo_url: None
  • paper_authors: Nicolas Heist, Sven Hertling, Heiko Paulheim
  • for: This research paper aims to evaluate the quality of knowledge graphs (KGs) for downstream tasks, rather than just their correctness and completeness.
  • methods: The paper presents a framework called KGrEaT, which stands for Knowledge Graph Evaluation via Actual Tasks. KGrEaT maps a given KG to datasets for evaluation on various tasks and computes performance metrics for each task.
  • results: The paper shows that KGrEaT can be used to evaluate KGs on a fixed task setup, providing a more comprehensive assessment of their quality than traditional evaluation metrics. Additionally, KGrEaT is modular and can be easily extended with additional tasks and datasets.
    Abstract In recent years, countless research papers have addressed the topics of knowledge graph creation, extension, or completion in order to create knowledge graphs that are larger, more correct, or more diverse. This research is typically motivated by the argumentation that using such enhanced knowledge graphs to solve downstream tasks will improve performance. Nonetheless, this is hardly ever evaluated. Instead, the predominant evaluation metrics - aiming at correctness and completeness - are undoubtedly valuable but fail to capture the complete picture, i.e., how useful the created or enhanced knowledge graph actually is. Further, the accessibility of such a knowledge graph is rarely considered (e.g., whether it contains expressive labels, descriptions, and sufficient context information to link textual mentions to the entities of the knowledge graph). To better judge how well knowledge graphs perform on actual tasks, we present KGrEaT - a framework to estimate the quality of knowledge graphs via actual downstream tasks like classification, clustering, or recommendation. Instead of comparing different methods of processing knowledge graphs with respect to a single task, the purpose of KGrEaT is to compare various knowledge graphs as such by evaluating them on a fixed task setup. The framework takes a knowledge graph as input, automatically maps it to the datasets to be evaluated on, and computes performance metrics for the defined tasks. It is built in a modular way to be easily extendable with additional tasks and datasets.
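The fixed-task evaluation loop can be pictured as follows. This is a schematic in the spirit of KGrEaT, not its actual API; `link`, the embedding dictionary, and the coverage bookkeeping are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Task:
    name: str
    dataset: List[Tuple[object, object]]   # (item with textual mention, label)
    run: Callable                          # (features, labels) -> metric score

def evaluate_kg(kg_embeddings: Dict[str, list], link: Callable, tasks: List[Task]):
    """Judge a knowledge graph by running the same downstream tasks on it.
    `link(item)` maps a dataset item to a KG entity id (or None); unlinked
    items count against the KG's accessibility (coverage)."""
    report = {}
    for task in tasks:
        feats, labels, missed = [], [], 0
        for item, label in task.dataset:
            ent = link(item)
            if ent is None or ent not in kg_embeddings:
                missed += 1
                continue
            feats.append(kg_embeddings[ent])
            labels.append(label)
        report[task.name] = {"score": task.run(feats, labels),
                             "coverage": 1 - missed / len(task.dataset)}
    return report
```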

Dataset Quantization

  • paper_url: http://arxiv.org/abs/2308.10524
  • repo_url: https://github.com/magic-research/dataset_quantization
  • paper_authors: Daquan Zhou, Kai Wang, Jianyang Gu, Xiangyu Peng, Dongze Lian, Yifan Zhang, Yang You, Jiashi Feng
  • for: This paper proposes Dataset Quantization (DQ), a framework that compresses large-scale datasets into small subsets that can be used to train arbitrary neural network architectures.
  • methods: Unlike prior dataset distillation methods based on gradient matching, whose synthesized data are biased toward the architecture used during synthesis, DQ selects and compresses representative samples so that the resulting subset is architecture-agnostic.
  • results: Experiments show that DQ compresses large-scale datasets such as ImageNet-1k at state-of-the-art compression ratios while preserving model performance; with 60% of the ImageNet data and 20% of Alpaca's instruction-tuning data, models can be trained with negligible or no performance drop on both vision and language tasks.
    Abstract State-of-the-art deep neural networks are trained with large amounts (millions or even billions) of data. The expensive computation and memory costs make it difficult to train them on limited hardware resources, especially for recent popular large language models (LLM) and computer vision models (CV). Recent popular dataset distillation methods are thus developed, aiming to reduce the number of training samples via synthesizing small-scale datasets via gradient matching. However, as the gradient calculation is coupled with the specific network architecture, the synthesized dataset is biased and performs poorly when used for training unseen architectures. To address these limitations, we present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets which can be used for training any neural network architectures. Extensive experiments demonstrate that DQ is able to generate condensed small datasets for training unseen network architectures with state-of-the-art compression ratios for lossless model training. To the best of our knowledge, DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio. Notably, with 60% data from ImageNet and 20% data from Alpaca's instruction tuning data, the models can be trained with negligible or no performance drop for both vision tasks (including classification, semantic segmentation, and object detection) as well as language tasks (including instruction tuning tasks such as BBH and DROP).
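A toy sketch of the two-step recipe: partition the dataset into non-overlapping bins and then sample uniformly from every bin, so the kept subset spans both typical and atypical samples. The distance-to-mean score used for binning here is a stand-in, not the paper's selection criterion:

```python
import numpy as np

def dataset_quantization(features, n_bins=10, keep_ratio=0.6, seed=0):
    """Return indices of a compressed subset: order samples by a
    representativeness proxy, split the order into bins, and sample
    uniformly from each bin."""
    rng = np.random.default_rng(seed)
    center = features.mean(axis=0)
    score = -np.linalg.norm(features - center, axis=1)  # proxy score
    order = np.argsort(score)                           # atypical -> typical
    bins = np.array_split(order, n_bins)
    keep = [rng.choice(b, size=max(1, int(len(b) * keep_ratio)), replace=False)
            for b in bins]
    return np.concatenate(keep)

# Usage: subset_idx = dataset_quantization(embeddings, n_bins=10, keep_ratio=0.6)
```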

When Less is Enough: Positive and Unlabeled Learning Model for Vulnerability Detection

  • paper_url: http://arxiv.org/abs/2308.10523
  • repo_url: https://github.com/pilot-vd-2023/pilot
  • paper_authors: Xin-Cheng Wen, Xinchen Wang, Cuiyun Gao, Shaohua Wang, Yang Liu, Zhaoquan Gu
  • for: This work addresses automated code vulnerability detection with deep learning, where the negative (non-vulnerable) labels in commonly used datasets are unreliable.
  • methods: Vulnerability detection is formulated as a Positive and Unlabeled (PU) learning problem, and a model named PILOT (PositIve and unlabeled Learning mOdel for vulnerability deTection) is proposed that learns only from positive and unlabeled data.
  • results: PILOT achieves more accurate vulnerability detection from positive and unlabeled data and better mitigates the influence of mislabeled samples.
    Abstract Automated code vulnerability detection has gained increasing attention in recent years. The deep learning (DL)-based methods, which implicitly learn vulnerable code patterns, have proven effective in vulnerability detection. The performance of DL-based methods usually relies on the quantity and quality of labeled data. However, the current labeled data are generally automatically collected, such as crawled from human-generated commits, making it hard to ensure the quality of the labels. Prior studies have demonstrated that the non-vulnerable code (i.e., negative labels) tends to be unreliable in commonly-used datasets, while vulnerable code (i.e., positive labels) is more determined. Considering the large numbers of unlabeled data in practice, it is necessary and worth exploring to leverage the positive data and large numbers of unlabeled data for more accurate vulnerability detection. In this paper, we focus on the Positive and Unlabeled (PU) learning problem for vulnerability detection and propose a novel model named PILOT, i.e., PositIve and unlabeled Learning mOdel for vulnerability deTection. PILOT only learns from positive and unlabeled data for vulnerability detection. It mainly contains two modules: (1) A distance-aware label selection module, aiming at generating pseudo-labels for selected unlabeled data, which involves the inter-class distance prototype and progressive fine-tuning; (2) A mixed-supervision representation learning module to further alleviate the influence of noise and enhance the discrimination of representations.
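The distance-aware selection idea can be sketched as follows; the actual PILOT module uses inter-class distance prototypes with progressive fine-tuning, which this toy version only approximates:

```python
import torch

def distance_aware_pseudo_labels(pos_emb, unl_emb, keep_frac=0.3):
    """Pseudo-label unlabeled code embeddings by distance to the positive
    (vulnerable) prototype: the closest become pseudo-positives, the
    farthest pseudo-negatives, and the ambiguous middle stays unlabeled."""
    proto = pos_emb.mean(dim=0)
    dist = torch.norm(unl_emb - proto, dim=1)
    k = int(len(unl_emb) * keep_frac)
    order = dist.argsort()
    pseudo_pos = order[:k]       # near the vulnerable prototype
    pseudo_neg = order[-k:]      # far from it: likely non-vulnerable
    return pseudo_pos, pseudo_neg
```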

Hybrid classical-quantum computing: are we forgetting the classical part in the binomial?

  • paper_url: http://arxiv.org/abs/2308.10513
  • repo_url: None
  • paper_authors: Esther Villar-Rodriguez, Aitor Gomez-Tejedor, Eneko Osaba
  • for: The main purpose of this work is to propose a preliminary taxonomy for classifying hybrid quantum-classical computing schemes and to raise key questions about the practical challenges of applying quantum computing.
  • methods: The study conceptually analyzes and classifies hybrid quantum computing schemes, and surveys their application scenarios and challenges.
  • results: A preliminary taxonomy is proposed, together with a set of open questions intended to stimulate research on the real challenges of applying quantum computing.
    Abstract The expectations arising from the latest achievements in the quantum computing field are causing researchers from classical artificial intelligence to be fascinated by this new paradigm. In turn, quantum computing, on the road towards usability, needs classical procedures. Hybridization is, in these circumstances, an indispensable step, but it can also be seen as a promising new avenue to get the most from both computational worlds. Nonetheless, hybrid approaches face many challenges now and will face more in the future; if ignored, these will threaten the viability or attractiveness of quantum computing for real-world applications. To identify them and pose pertinent questions, a proper characterization of the hybrid quantum computing field, and especially of hybrid solvers, is compulsory. With this motivation in mind, the main purpose of this work is to propose a preliminary taxonomy for classifying hybrid schemes, and to bring to the fore some questions to stir up researchers' minds about the real challenges regarding the application of quantum computing.

Performance Enhancement Leveraging Mask-RCNN on Bengali Document Layout Analysis

  • paper_url: http://arxiv.org/abs/2308.10511
  • repo_url: None
  • paper_authors: Shrestha Datta, Md Adith Mollah, Raisa Fairooz, Tariful Islam Fahim
  • for: This work aims to improve machine understanding of Bangla documents, in particular through Document Layout Analysis (DLA), which divides documents into sections such as paragraphs, images, and tables.
  • methods: The authors train a Mask R-CNN model and improve it through step-by-step hyperparameter tuning.
  • results: A dice score of 0.889 is achieved on the BaDLAD dataset, although not in every setting; a model trained on English documents transferred poorly to Bangla, showing that each language has its own challenges.
    Abstract Understanding digital documents is like solving a puzzle, especially historical ones. Document Layout Analysis (DLA) helps with this puzzle by dividing documents into sections like paragraphs, images, and tables. This is crucial for machines to read and understand these documents. In the DL Sprint 2.0 competition, we worked on understanding Bangla documents. We used a dataset called BaDLAD with lots of examples. We trained a special model called Mask R-CNN to help with this understanding. We made this model better by step-by-step hyperparameter tuning, and we achieved a good dice score of 0.889. However, not everything went perfectly. We tried using a model trained for English documents, but it didn't fit well with Bangla. This showed us that each language has its own challenges. Our solution for the DL Sprint 2.0 is publicly available at https://www.kaggle.com/competitions/dlsprint2/discussion/432201 along with notebooks, weights, and inference notebook.
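For reference, the standard torchvision recipe for fine-tuning Mask R-CNN on a custom label set looks like this; the class count assumes background plus four BaDLAD layout classes (paragraph, text box, image, table), which should be adjusted to the dataset:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 5  # background + 4 assumed layout classes
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box head with one sized for the layout classes.
in_feats = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)

# Replace the mask head likewise.
in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)
```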

Large Language Model as a User Simulator

  • paper_url: http://arxiv.org/abs/2308.11534
  • repo_url: https://github.com/FreedomIntelligence/ReaLM
  • paper_authors: Chuyi Kong, Yaxin Fan, Xiang Wan, Feng Jiang, Benyou Wang
  • for: This work aims to advance the democratization of ChatGPT by learning from genuine human-machine conversations, improving the quality and diversity of human-like dialogue data.
  • methods: A novel approach targets human questions extracted from real conversations as the learning goal, trains a user simulator (UserGPT) to produce a high-quality human-centric synthetic conversation dataset (RealChat), and uses this dataset to train the assistant model (ReaLM).
  • results: Experiments show that ReaLM outperforms baseline models on Vicuna-Bench and MT-Bench at equivalent training set sizes, and human evaluation confirms that the model is highly competitive with contemporary models of the same scale.
    Abstract The unparalleled performance of closed-sourced ChatGPT has sparked efforts towards its democratization, with notable strides made by leveraging real user and ChatGPT conversations, as evidenced by Vicuna. However, while current endeavors like Baize and UltraChat aim to auto-generate conversational data due to challenges in gathering human participation, they primarily rely on ChatGPT to simulate human behaviors based on directives rather than genuine human learning. This results in a limited scope, diminished diversity, and an absence of genuine multi-round conversational dynamics. To address the above issues, we innovatively target human questions extracted from genuine human-machine conversations as a learning goal and train a user simulator, UserGPT, to produce a high-quality human-centric synthetic conversation dataset, RealChat. Subsequently, this dataset trains our assistant model, ReaLM. Experimentally, ReaLM outpaces baseline models in both Vicuna-Bench and MT-Bench by pairwise comparison when considering equivalent training set sizes, and manual evaluation also shows that our model is highly competitive. Impressively, when fine-tuned with the latest LLaMA 2 model, ReaLM secured a leading score of 6.33 in the MT-Bench, outshining the contemporary same-scale models, including the LLaMA-2-7B-chat model. Further in-depth analysis demonstrates the scalability and transferability of our approach. A preliminary exploration into the interplay between training set data quality and resultant model performance is also undertaken, laying a robust groundwork for future investigations. The code is available at https://github.com/FreedomIntelligence/ReaLM.

An Examination of the Compositionality of Large Generative Vision-Language Models

  • paper_url: http://arxiv.org/abs/2308.10509
  • repo_url: None
  • paper_authors: Teli Ma, Rong Li, Junwei Liang
  • for: This paper examines the multimodal compositional reasoning of generative vision-language models (GVLMs) built via multimodal instruction tuning of large language models (LLMs), beyond the evaluation protocols designed for contrastive vision-language learning.
  • methods: The authors propose generative score methods for evaluating compositionality, define a MorphoBias Score to quantify the morphological bias of existing benchmarks, introduce an LLM-based strategy to calibrate that bias, and add a challenging task probing GVLMs' robustness against their inherent inclination toward syntactic correctness.
  • results: The study shows that GVLM evaluation is distorted by the morphological bias of current benchmarks, and it assembles the calibrated data and the new task into the MOrphologically De-biased Benchmark (MODE), exposing robustness issues of some GVLMs.
    Abstract With the success of Large Language Models (LLMs), a surge of Generative Vision-Language Models (GVLMs) have been constructed via multimodal instruction tuning. The tuning recipe substantially deviates from the common contrastive vision-language learning. However, the performance of GVLMs in multimodal compositional reasoning remains largely unexplored, as existing evaluation metrics and benchmarks focus predominantly on assessing contrastive models like CLIP. In this paper, we examine the potential evaluation metrics to assess the GVLMs and hypothesize generative score methods are suitable for evaluating compositionality. In addition, current benchmarks tend to prioritize syntactic correctness over semantics. The presence of morphological bias in these benchmarks can be exploited by GVLMs, leading to ineffective evaluations. To combat this, we define a MorphoBias Score to quantify the morphological bias and propose a novel LLM-based strategy to calibrate the bias. Moreover, a challenging task is added to evaluate the robustness of GVLMs against inherent inclination toward syntactic correctness. We include the calibrated dataset and the task into a new benchmark, namely MOrphologicall De-biased Benchmark (MODE). Our study provides the first unbiased benchmark for the compositionality of GVLMs, facilitating future research in this direction. We will release our code and datasets.

Using Autoencoders and AutoDiff to Reconstruct Missing Variables in a Set of Time Series

  • paper_url: http://arxiv.org/abs/2308.10496
  • repo_url: None
  • paper_authors: Jan-Philipp Roche, Oliver Niggemann, Jens Friebe
  • for: This paper addresses the problem of reconstructing missing variables in a set of time series, overcoming the fixed input and output feature combinations of black-box models.
  • methods: An autoencoder is first trained as usual on all features and its parameters are then fixed; the missing variables are defined as unknowns at the autoencoder input and optimized via automatic differentiation against a loss computed on the available features.
  • results: Evaluated on a strongly nonlinear electrical component, the approach successfully reconstructs one missing variable out of four and generally handles even multiple missing variables.
    Abstract Existing black box modeling approaches in machine learning suffer from a fixed input and output feature combination. In this paper, a new approach to reconstruct missing variables in a set of time series is presented. An autoencoder is trained as usual with every feature on both sides, and the neural network parameters are fixed after this training. Then, the searched variables are defined as missing variables at the autoencoder input and optimized via automatic differentiation, with the loss computed only on the available features. With this method, different input and output feature combinations of the trained model can be realized by defining the searched variables as missing variables and reconstructing them. The combination can be changed without training the autoencoder again. The approach is evaluated on a strongly nonlinear electrical component. It works well with one of four variables missing and, in general, even with multiple missing variables.
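The described procedure maps almost directly to a few lines of PyTorch; `ae` is any trained autoencoder over the full feature vector, and the step count and learning rate are placeholders:

```python
import torch

def reconstruct_missing(ae, x_obs, missing_idx, steps=500, lr=1e-2):
    """Freeze the trained autoencoder, treat the missing channels as free
    parameters, and optimize them so the reconstruction of the *observed*
    channels matches the measurements."""
    for p in ae.parameters():
        p.requires_grad_(False)
    obs_idx = [i for i in range(x_obs.shape[-1]) if i not in missing_idx]
    z = torch.zeros(x_obs.shape[0], len(missing_idx), requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x = x_obs.clone()
        x[:, missing_idx] = z    # splice the current estimate into the input
        loss = torch.nn.functional.mse_loss(ae(x)[:, obs_idx], x_obs[:, obs_idx])
        opt.zero_grad(); loss.backward(); opt.step()
    return z.detach()            # reconstructed missing variables
```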

Texture Generation on 3D Meshes with Point-UV Diffusion

  • paper_url: http://arxiv.org/abs/2308.10490
  • repo_url: https://github.com/CVMI-Lab/Point-UV-Diffusion
  • paper_authors: Xin Yu, Peng Dai, Wenbo Li, Lan Ma, Zhengzhe Liu, Xiaojuan Qi
  • for: This paper focuses on synthesizing high-quality textures on 3D meshes.
  • methods: Point-UV diffusion, a coarse-to-fine pipeline that marries denoising diffusion models with UV mapping: a point diffusion model with tailored style guidance first synthesizes low-frequency, 3D-consistent texture components, which then condition a UV diffusion model with hybrid conditions that enhances texture fidelity in 2D UV space.
  • results: The method handles meshes of any genus and generates diversified, geometry-compatible, and high-fidelity textures.
    Abstract In this work, we focus on synthesizing high-quality textures on 3D meshes. We present Point-UV diffusion, a coarse-to-fine pipeline that marries the denoising diffusion model with UV mapping to generate 3D consistent and high-quality texture images in UV space. We start with introducing a point diffusion model to synthesize low-frequency texture components with our tailored style guidance to tackle the biased color distribution. The derived coarse texture offers global consistency and serves as a condition for the subsequent UV diffusion stage, aiding in regularizing the model to generate a 3D consistent UV texture image. Then, a UV diffusion model with hybrid conditions is developed to enhance the texture fidelity in the 2D UV space. Our method can process meshes of any genus, generating diversified, geometry-compatible, and high-fidelity textures. Code is available at https://cvmi-lab.github.io/Point-UV-Diffusion

Deciphering Raw Data in Neuro-Symbolic Learning with Provable Guarantees

  • paper_url: http://arxiv.org/abs/2308.10487
  • repo_url: None
  • paper_authors: Lue Tao, Yu-Xuan Huang, Wang-Zhou Dai, Yuan Jiang
  • for: This paper studies neuro-symbolic hybrid systems, in which perception models are facilitated by logical reasoning over a symbolic knowledge base, and asks when such systems can actually learn.
  • methods: It introduces a novel characterization of the supervision signals derived from a knowledge base and establishes a criterion for determining whether the knowledge can facilitate successful learning.
  • results: Experiments confirm that the criterion explains both the successes and the failures of hybrid systems under different knowledge bases.
    Abstract Neuro-symbolic hybrid systems are promising for integrating machine learning and symbolic reasoning, where perception models are facilitated with information inferred from a symbolic knowledge base through logical reasoning. Despite empirical evidence showing the ability of hybrid systems to learn accurate perception models, the theoretical understanding of learnability is still lacking. Hence, it remains unclear why a hybrid system succeeds for a specific task and when it may fail given a different knowledge base. In this paper, we introduce a novel way of characterising supervision signals from a knowledge base, and establish a criterion for determining the knowledge's efficacy in facilitating successful learning. This, for the first time, allows us to address the two questions above by inspecting the knowledge base under investigation. Our analysis suggests that many knowledge bases satisfy the criterion, thus enabling effective learning, while some fail to satisfy it, indicating potential failures. Comprehensive experiments confirm the utility of our criterion on benchmark tasks.

Deep Metric Loss for Multimodal Learning

  • paper_url: http://arxiv.org/abs/2308.10486
  • repo_url: None
  • paper_authors: Sehwan Moon, Hyunju Lee
  • for: This paper aims to improve the performance of multimodal learning models by introducing a novel loss function that subgroups instances according to their unimodal contributions.
  • methods: The proposed method uses a novel loss function called MultiModal loss, which groups instances based on their unimodal contributions to improve the efficiency of multimodal learning models.
  • results: The proposed method demonstrates improved classification performance on synthetic and real multimodal datasets, and ablation studies verify its effectiveness. Additionally, the method generates reliable prediction scores for each modality, which is essential for subgrouping.
    Abstract Multimodal learning often outperforms its unimodal counterparts by exploiting unimodal contributions and cross-modal interactions. However, focusing only on integrating multimodal features into a unified comprehensive representation overlooks the unimodal characteristics. In real data, the contributions of modalities can vary from instance to instance, and they often reinforce or conflict with each other. In this study, we introduce a novel MultiModal loss paradigm for multimodal learning, which subgroups instances according to their unimodal contributions. MultiModal loss can prevent inefficient learning caused by overfitting and efficiently optimize multimodal models. On synthetic data, MultiModal loss demonstrates improved classification performance by subgrouping difficult instances within certain modalities. On four real multimodal datasets, our loss is empirically shown to improve the performance of recent models. Ablation studies verify the effectiveness of our loss. Additionally, we show that our loss generates a reliable prediction score for each modality, which is essential for subgrouping. Our MultiModal loss is a novel loss function to subgroup instances according to the contribution of modalities in multimodal learning and is applicable to a variety of multimodal models with unimodal decisions. Our code is available at https://github.com/SehwanMoon/MultiModalLoss.
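One way to picture the subgrouping is by which modality alone classifies an instance correctly; this is only a schematic of the idea, and the paper's actual grouping and loss terms differ:

```python
import torch

def subgroup_by_unimodal_contribution(logits_a, logits_b, labels):
    """Return boolean masks grouping instances by unimodal correctness;
    per-group losses can then be weighted differently."""
    ok_a = logits_a.argmax(dim=1) == labels
    ok_b = logits_b.argmax(dim=1) == labels
    return {
        "both": ok_a & ok_b,        # modalities reinforce each other
        "only_a": ok_a & ~ok_b,     # modalities conflict
        "only_b": ~ok_a & ok_b,
        "neither": ~ok_a & ~ok_b,   # hard instances
    }
```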

Deep Semi-supervised Anomaly Detection with Metapath-based Context Knowledge

  • paper_url: http://arxiv.org/abs/2308.10918
  • repo_url: None
  • paper_authors: Hwan Kim, Junghoon Kim, Byung Suk Lee, Sungsu Lim
  • for: This paper addresses graph anomaly detection with a novel metapath-based semi-supervised learning approach that overcomes limitations of previous methods.
  • methods: The proposed Metapath-based Semi-supervised Anomaly Detection (MSAD) framework uses GCN layers in both the encoder and the decoder to efficiently propagate context information between abnormal and normal nodes; metapath-based context information and a specifically crafted anomaly community enhance the learning of structural and attribute differences, both globally and locally.
  • results: Experiments on seven real-world networks demonstrate the superiority of MSAD over state-of-the-art techniques, opening the door for future work on optimizing and analyzing metapath patterns to further enhance graph anomaly detection.
    Abstract Graph anomaly detection has attracted considerable attention in recent years. This paper introduces a novel approach that leverages metapath-based semi-supervised learning, addressing the limitations of previous methods. We present a new framework, Metapath-based Semi-supervised Anomaly Detection (MSAD), incorporating GCN layers in both the encoder and decoder to efficiently propagate context information between abnormal and normal nodes. The design of metapath-based context information and a specifically crafted anomaly community enhance the process of learning differences in structures and attributes, both globally and locally. Through a comprehensive set of experiments conducted on seven real-world networks, this paper demonstrates the superiority of the MSAD method compared to state-of-the-art techniques. The promising results of this study pave the way for future investigations, focusing on the optimization and analysis of metapath patterns to further enhance the effectiveness of anomaly detection on attributed networks.

Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space

  • paper_url: http://arxiv.org/abs/2308.10464
  • repo_url: https://github.com/seongminp/hdseg
  • paper_authors: Seongmin Park, Jinkyu Seo, Jihwa Lee
  • for: This paper proposes an unsupervised dialogue topic segmentation method based on hyperdimensional computing (HDC) to improve the understanding of dialogue transcripts.
  • methods: HDC generates rich token representations through the low-cost initialization of many unrelated (quasi-orthogonal) random vectors at extremely high dimensions, typically over 10,000.
  • results: HyperSeg outperforms the current state of the art on 4 out of 5 segmentation benchmarks, even when baselines are given partial access to the ground truth, and is 10 times faster on average; it also improves downstream summarization accuracy.
    Abstract We present HyperSeg, a hyperdimensional computing (HDC) approach to unsupervised dialogue topic segmentation. HDC is a class of vector symbolic architectures that leverages the probabilistic orthogonality of randomly drawn vectors at extremely high dimensions (typically over 10,000). HDC generates rich token representations through its low-cost initialization of many unrelated vectors. This is especially beneficial in topic segmentation, which often operates as a resource-constrained pre-processing step for downstream transcript understanding tasks. HyperSeg outperforms the current state-of-the-art in 4 out of 5 segmentation benchmarks -- even when baselines are given partial access to the ground truth -- and is 10 times faster on average. We show that HyperSeg also improves downstream summarization accuracy. With HyperSeg, we demonstrate the viability of HDC in a major language task. We open-source HyperSeg to provide a strong baseline for unsupervised topic segmentation.
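The core HDC machinery fits in a few lines; the dimension, the bundling scheme, and the boundary threshold below are assumptions, and HyperSeg's actual segmentation criterion is more involved:

```python
import numpy as np

DIM = 10_000
rng = np.random.default_rng(0)
_memory = {}

def token_hv(token):
    """One random bipolar hypervector per token; at ~10k dimensions,
    independently drawn vectors are almost orthogonal."""
    if token not in _memory:
        _memory[token] = rng.choice([-1, 1], size=DIM)
    return _memory[token]

def utterance_hv(tokens):
    """Bundle (sum, then sign) token hypervectors into one utterance vector."""
    return np.sign(sum(token_hv(t) for t in tokens))

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy segmentation: place a topic boundary where adjacent similarity drops.
utts = [["book", "a", "flight"], ["to", "paris", "please"],
        ["what's", "the", "weather", "today"]]
sims = [cosine(utterance_hv(utts[i]), utterance_hv(utts[i + 1]))
        for i in range(len(utts) - 1)]
boundaries = [i + 1 for i, s in enumerate(sims) if s < 0.05]
```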

Elucidating STEM Concepts through Generative AI: A Multi-modal Exploration of Analogical Reasoning

  • paper_url: http://arxiv.org/abs/2308.10454
  • repo_url: None
  • paper_authors: Chen Cao, Zijian Ding, Gyeong-Geon Lee, Jiajun Jiao, Jionghao Lin, Xiaoming Zhai
  • for: This study explores the integration of generative artificial intelligence (AI) with multimodal analogical reasoning as an innovative approach to enhance science, technology, engineering, and mathematics (STEM) education.
  • methods: The authors develop a novel system that uses generative AI to transform intricate principles in mathematics, physics, and programming into comprehensible metaphors, which are subsequently converted into visual form to further enrich the learning experience.
  • results: The system's efficacy is examined via a randomized A/B/C test assessing learning gains and motivation shifts; the results inform the design of educational systems and demonstrate the potential of applying large language models in STEM education.
    Abstract This study explores the integration of generative artificial intelligence (AI), specifically large language models, with multi-modal analogical reasoning as an innovative approach to enhance science, technology, engineering, and mathematics (STEM) education. We have developed a novel system that utilizes the capacities of generative AI to transform intricate principles in mathematics, physics, and programming into comprehensible metaphors. To further augment the educational experience, these metaphors are subsequently converted into visual form. Our study aims to enhance the learners' understanding of STEM concepts and their learning engagement by using the visual metaphors. We examine the efficacy of our system via a randomized A/B/C test, assessing learning gains and motivation shifts among the learners. Our study demonstrates the potential of applying large language models to educational practice on STEM subjects. The results will shed light on the design of educational system in terms of harnessing AI's potential to empower educational stakeholders.

CVFC: Attention-Based Cross-View Feature Consistency for Weakly Supervised Semantic Segmentation of Pathology Images

  • paper_url: http://arxiv.org/abs/2308.10449
  • repo_url: None
  • paper_authors: Liangrui Pan, Lian Wang, Zhichao Feng, Liwen Xu, Shaoliang Peng
  • for: histopathology image segmentation for cancer diagnosis and prognosis
  • methods: attention-based cross-view feature consistency end-to-end pseudo-mask generation framework (CVFC) with three branches and multi-scale integrated feature maps
  • results: outperformed HistoSegNet, SEAM, C-CAM, WSSS-Tissue, and OEEM in terms of IoU and fwIoU on the WSSS4LUAD dataset with an IoU of 0.7122 and a fwIoU of 0.7018
    Abstract Histopathology image segmentation is the gold standard for diagnosing cancer, and can indicate cancer prognosis. However, histopathology image segmentation requires high-quality masks, so many studies now use image-level labels to achieve pixel-level segmentation to reduce the need for fine-grained annotation. To solve this problem, we propose an attention-based cross-view feature consistency end-to-end pseudo-mask generation framework named CVFC. Specifically, CVFC is a three-branch joint framework composed of two Resnet38 and one Resnet50, with each independent branch producing a multi-scale integrated feature map to generate a class activation map (CAM); in each branch, down-sampling and expansion adjust the size of the CAM; the middle branch projects the feature matrix to the query and key feature spaces, and generates a feature space perception matrix through the connection layer and inner product to adjust and refine the CAM of each branch; finally, the feature consistency loss and feature cross loss optimize the parameters of CVFC in co-training mode. In extensive experiments, an IoU of 0.7122 and an fwIoU of 0.7018 are obtained on the WSSS4LUAD dataset, outperforming HistoSegNet, SEAM, C-CAM, WSSS-Tissue, and OEEM.

LDCSF: Local depth convolution-based Swim framework for classifying multi-label histopathology images

  • paper_url: http://arxiv.org/abs/2308.10446
  • repo_url: None
  • paper_authors: Liangrui Pan, Yutao Dou, Zhichao Feng, Liwen Xu, Shaoliang Peng
  • for: This paper aims to improve the accuracy of multi-label classification of liver cancer histopathology images in computational pathology.
  • methods: The proposed locally deep convolutional Swim framework (LDCSF) consists of a Swin transformer module, a local depth convolution (LDC) module, a feature reconstruction (FR) module, and a ResNet module.
  • results: LDCSF reaches classification accuracies of 0.9460, 0.9960, 0.9808, and 0.9847 for interstitial area, necrosis, non-tumor, and tumor, respectively; the multi-label predictions are further used to compute the tumor-to-stromal ratio, laying the foundation for analyzing the microenvironment of liver cancer histopathology images.
    Abstract Histopathological images are the gold standard for diagnosing liver cancer. However, the accuracy of fully digital diagnosis in computational pathology needs to be improved. In this paper, to address the multi-label nature and low classification accuracy of histopathology images, we propose a locally deep convolutional Swim framework (LDCSF) to classify multi-label histopathology images and provide local field-of-view diagnostic results. The LDCSF model consists of a Swin transformer module, a local depth convolution (LDC) module, a feature reconstruction (FR) module, and a ResNet module. The Swin transformer module reduces the amount of computation generated by the attention mechanism by limiting the attention to each window. The LDC then reconstructs the attention map and performs convolution operations in multiple channels, passing the resulting feature map to the next layer. The FR module uses the corresponding weight coefficient vectors obtained from the channels to dot product with the original feature map vector matrix to generate representative feature maps. Finally, the residual network undertakes the final classification task. As a result, the classification accuracy of LDCSF for interstitial area, necrosis, non-tumor and tumor reached 0.9460, 0.9960, 0.9808, 0.9847, respectively. We then use the results of multi-label pathological image classification to calculate the tumor-to-stromal ratio, which lays the foundation for the analysis of the microenvironment of liver cancer histopathological images. Finally, we release a multi-label histopathology image dataset of liver cancer; our code and data are available at https://github.com/panliangrui/LSF.

Dynamic Strategy Chain: Dynamic Zero-Shot CoT for Long Mental Health Support Generation

  • paper_url: http://arxiv.org/abs/2308.10444
  • repo_url: None
  • paper_authors: Qi Chen, Dexi Liu
  • for: Providing mental health support through comprehensive and more acceptable responses.
  • methods: combines chain-of-thought (CoT) prompting and Large Language Models (LLMs), with a new zero-shot Dynamic Strategy Chain (DSC) prompting method that simulates mental health counseling strategies tailored to help-seekers’ needs.
  • results: deliver more human-like responses than CoT prompting methods on Long Counseling Text Generation for Mental Health Support (LTGM) tasks, as demonstrated by both automatic and manual evaluations.
    Abstract Long counseling Text Generation for Mental health support (LTGM), an innovative and challenging task, aims to provide help-seekers with mental health support through a comprehensive and more acceptable response. The combination of chain-of-thought (CoT) prompting and Large Language Models (LLMs) achieves SOTA performance on various NLP tasks, especially on text generation tasks, and zero-shot CoT prompting is one of the most common methods in CoT prompting. However, in the LTGM task, zero-shot CoT prompting cannot simulate a counselor or provide personalized strategies without effective mental health counseling strategy prompts. To tackle this challenge, we propose a zero-shot Dynamic Strategy Chain (DSC) prompting method. Firstly, we utilize GPT2 to learn the responses written by mental health counselors and dynamically generate mental health counseling strategies tailored to the help-seekers' needs. Secondly, the zero-shot DSC prompting is constructed according to the mental health counseling strategies and the help-seeker's post. Finally, the zero-shot DSC prompting is employed to guide LLMs in generating more human-like responses for the help-seekers. Both automatic and manual evaluations demonstrate that zero-shot DSC prompting delivers more human-like responses than CoT prompting methods on LTGM tasks.
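A schematic of the two-stage prompting flow; in the paper the strategies come from a GPT-2 model trained on counselor responses, whereas here `generate(prompt) -> str` stands in for any LLM call:

```python
def dynamic_strategy_chain(post: str, generate) -> str:
    # Stage 1: derive counseling strategies tailored to this post.
    strategy_prompt = (
        "A help-seeker wrote:\n" + post + "\n"
        "List the counseling strategies (e.g., reflection, validation,"
        " suggestion) best suited to this post, in order:")
    strategies = generate(strategy_prompt)

    # Stage 2: condition the long response on the strategy chain.
    response_prompt = (
        "A help-seeker wrote:\n" + post + "\n"
        "Respond as a supportive counselor, following these strategies"
        " step by step:\n" + strategies + "\nResponse:")
    return generate(response_prompt)
```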

Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions

  • paper_url: http://arxiv.org/abs/2308.10443
  • repo_url: None
  • paper_authors: Wesley Tann, Yuancheng Liu, Jun Heng Sim, Choon Meng Seah, Ee-Chien Chang
  • for: This study investigates the effectiveness of Large Language Models (LLMs) in solving Capture-The-Flag (CTF) challenges and questions, particularly in the context of CTF exercises in the classroom.
  • methods: Three popular LLMs are used: OpenAI ChatGPT, Google Bard, and Microsoft Bing. The researchers first assess the LLMs' question-answering performance on five Cisco certifications of varying difficulty, then conduct an in-depth study of how the LLMs solve CTF challenges to understand their limitations.
  • results: The LLMs perform well on CTF challenges but also show some limitations. Across seven test cases covering all five CTF challenge types, the LLMs were able to solve the challenges, although in some situations jailbreak prompts were needed to bypass the LLMs' ethical safeguards. The study concludes that the use of LLMs in CTF exercises may affect how students' skills are taught and assessed.
    Abstract The assessment of cybersecurity Capture-The-Flag (CTF) exercises involves participants finding text strings or "flags" by exploiting system vulnerabilities. Large Language Models (LLMs) are natural-language models trained on vast amounts of words to understand and generate text; they can perform well on many CTF challenges. Such LLMs are freely available to students. In the context of CTF exercises in the classroom, this raises concerns about academic integrity. Educators must understand LLMs' capabilities to modify their teaching to accommodate generative AI assistance. This research investigates the effectiveness of LLMs, particularly in the realm of CTF challenges and questions. Here we evaluate three popular LLMs, OpenAI ChatGPT, Google Bard, and Microsoft Bing. First, we assess the LLMs' question-answering performance on five Cisco certifications with varying difficulty levels. Next, we qualitatively study the LLMs' abilities in solving CTF challenges to understand their limitations. We report on the experience of using the LLMs for seven test cases in all five types of CTF challenges. In addition, we demonstrate how jailbreak prompts can bypass and break LLMs' ethical safeguards. The paper concludes by discussing LLM's impact on CTF exercises and its implications.

DySuse: Susceptibility Estimation in Dynamic Social Networks

  • paper_url: http://arxiv.org/abs/2308.10442
  • repo_url: None
  • paper_authors: Yingdan Shi, Jingya Zhou, Congcong Zhang
  • for: Predicting the spread of influence in social networks has attracted wide attention in recent years. Most existing studies predict the total number of influenced users in a social network, while neglecting susceptibility estimation, i.e., predicting the probability of each individual user being influenced. As a more fine-grained prediction task, susceptibility estimation is attractive and practically valuable.
  • methods: The authors propose a task called susceptibility estimation in dynamic social networks, which is more realistic and valuable: it predicts each user's susceptibility over a dynamic network. Owing to the nature of dynamic networks and the demands of practical applications, this task has not been sufficiently studied. They introduce DySuse, a novel framework based on dynamic graph embedding techniques.
  • results: The framework outperforms existing dynamic graph embedding models and achieves satisfactory prediction performance under multiple influence diffusion models in extensive experiments.
    Abstract Influence estimation aims to predict the total influence spread in social networks and has received surging attention in recent years. Most current studies focus on estimating the total number of influenced users in a social network and neglect susceptibility estimation, which aims to predict the probability of each user being influenced from an individual perspective. As a more fine-grained estimation task, susceptibility estimation is full of attractiveness and practical value. Based on the significance of susceptibility estimation and the dynamic properties of social networks, we propose a task called susceptibility estimation in dynamic social networks, which is even more realistic and valuable in real-world applications. Susceptibility estimation in dynamic networks has yet to be explored, and it is computationally intractable to naively adopt Monte Carlo simulation to obtain the results. To this end, we propose a novel end-to-end framework, DySuse, based on dynamic graph embedding technology. Specifically, we leverage a structural feature module to independently capture the structural information of influence diffusion on each single graph snapshot. Besides, we propose a progressive mechanism, in line with the properties of influence diffusion, to tightly couple the structural and temporal information during diffusion. Moreover, a self-attention block is designed to further capture temporal dependency by flexibly weighting historical timestamps. Experimental results show that our framework is superior to existing dynamic graph embedding models and has satisfactory prediction performance under multiple influence diffusion models.
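
A minimal sketch of the self-attention block over snapshot embeddings follows, assuming per-node features have already been produced by the structural feature module; the dimensions and the sigmoid susceptibility head are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    """Weights per-snapshot node embeddings over historical timestamps,
    in the spirit of DySuse's self-attention block (simplified sketch)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, snapshot_feats: torch.Tensor) -> torch.Tensor:
        # snapshot_feats: (num_nodes, num_snapshots, dim), one embedding
        # per node per graph snapshot from the structural feature module.
        out, _ = self.attn(snapshot_feats, snapshot_feats, snapshot_feats)
        return out[:, -1, :]   # fused representation at the latest snapshot

# Usage: per-node susceptibility probability from the fused embedding.
feats = torch.randn(100, 8, 64)          # 100 nodes, 8 snapshots, dim 64
head = nn.Sequential(TemporalSelfAttention(64), nn.Linear(64, 1), nn.Sigmoid())
susceptibility = head(feats)             # (100, 1) probabilities in [0, 1]
```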

X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events

  • paper_url: http://arxiv.org/abs/2308.10441
  • repo_url: https://github.com/daibopku/x-voe
  • paper_authors: Bo Dai, Linge Wang, Baoxiong Jia, Zeyu Zhang, Song-Chun Zhu, Chi Zhang, Yixin Zhu
  • for: This study aims to assess the ability of AI agents to understand intuitive physics, and to develop a comprehensive benchmark dataset (X-VoE) to evaluate their performance.
  • methods: The study uses the Violation of Expectation (VoE) paradigm, which is rooted in developmental psychology, to test AI models’ understanding of events and their underlying explanations. The dataset includes three distinct settings that probe models’ comprehension of events and their ability to infer occluded object states from visual sequences.
  • results: The experimental outcomes show that the proposed explanation-based learning system is able to align with human commonsense and visually expound VoE events by reconstructing concealed scenes. The results demonstrate the potential of X-VoE as a valuable tool for advancing AI with human-like intuitive physics capabilities.
    Abstract Intuitive physics is pivotal for human understanding of the physical world, enabling prediction and interpretation of events even in infancy. Nonetheless, replicating this level of intuitive physics in artificial intelligence (AI) remains a formidable challenge. This study introduces X-VoE, a comprehensive benchmark dataset, to assess AI agents' grasp of intuitive physics. Built on the developmental psychology-rooted Violation of Expectation (VoE) paradigm, X-VoE establishes a higher bar for the explanatory capacities of intuitive physics models. Each VoE scenario within X-VoE encompasses three distinct settings, probing models' comprehension of events and their underlying explanations. Beyond model evaluation, we present an explanation-based learning system that captures physics dynamics and infers occluded object states solely from visual sequences, without explicit occlusion labels. Experimental outcomes highlight our model's alignment with human commonsense when tested against X-VoE. A remarkable feature is our model's ability to visually expound VoE events by reconstructing concealed scenes. Concluding, we discuss the findings' implications and outline future research directions. Through X-VoE, we catalyze the advancement of AI endowed with human-like intuitive physics capabilities.

GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems

  • paper_url: http://arxiv.org/abs/2308.10435
  • repo_url: None
  • paper_authors: Nathalia Nascimento, Paulo Alencar, Donald Cowan
  • for: This paper proposes a novel approach called "GPT-in-the-loop" that combines the advanced reasoning capabilities of Large Language Models (LLMs) with multiagent systems to achieve superior decision-making and adaptability in IoT applications.
  • methods: The paper employs GPT-4 for enhanced problem-solving and explanation skills, and integrates it into the agent-driven Framework for the Internet of Things (FIoT) to create a GPT-in-the-loop approach.
  • results: The paper presents comparative results in the IoT context, showing that the GPT-in-the-loop approach achieves superior decision-making and adaptability without the need for extensive training, outperforming traditional neuroevolutionary methods and solutions provided by software engineers.
    Abstract This paper introduces the "GPT-in-the-loop" approach, a novel method combining the advanced reasoning capabilities of Large Language Models (LLMs) like Generative Pre-trained Transformers (GPT) with multiagent (MAS) systems. Venturing beyond traditional adaptive approaches that generally require long training processes, our framework employs GPT-4 for enhanced problem-solving and explanation skills. Our experimental backdrop is the smart streetlight Internet of Things (IoT) application. Here, agents use sensors, actuators, and neural networks to create an energy-efficient lighting system. By integrating GPT-4, these agents achieve superior decision-making and adaptability without the need for extensive training. We compare this approach with both traditional neuroevolutionary methods and solutions provided by software engineers, underlining the potential of GPT-driven multiagent systems in IoT. Structurally, the paper outlines the incorporation of GPT into the agent-driven Framework for the Internet of Things (FIoT), introduces our proposed GPT-in-the-loop approach, presents comparative results in the IoT context, and concludes with insights and future directions.
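
A rough sketch of the consult-the-LLM pattern for a streetlight agent follows; the confidence threshold, the prompt wording, and both callables are hypothetical placeholders, and the real FIoT integration is more involved than this.

```python
# Sketch of a "GPT-in-the-loop" streetlight agent: a local neural
# controller decides, and the LLM is consulted only when confidence is low.
def decide_brightness(light_level: float, motion: bool,
                      controller, query_gpt4) -> float:
    action, confidence = controller(light_level, motion)  # local policy
    if confidence >= 0.8:                 # trust the local neural network
        return action
    prompt = (f"A streetlight senses ambient light {light_level:.2f} "
              f"(0=dark, 1=bright) and motion={motion}. Reply with only a "
              f"brightness value in [0, 1] that saves energy while keeping "
              f"the street safe.")
    return float(query_gpt4(prompt))      # defer to the LLM's reasoning

# Usage with toy stand-ins for the two components:
dim = decide_brightness(0.1, True,
                        controller=lambda light, m: (0.9 if m else 0.2, 0.6),
                        query_gpt4=lambda p: "0.8")
```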

Mechanisms that play a game, not toss a coin

  • paper_url: http://arxiv.org/abs/2308.10413
  • repo_url: None
  • paper_authors: Toby Walsh
  • for: This paper proposes a way to obtain the desirable normative properties of randomized mechanisms while using mechanisms that are, in fact, deterministic.
  • methods: Agents play a game among themselves in exchange for the randomness in the mechanism. This approach retains the original good normative properties but yields a deterministic mechanism that is easy to audit.
  • results: The paper proposes novel derandomized mechanisms with good normative properties in six different domains. Each mechanism has a mixed Nash equilibrium in which agents play a modular arithmetic game, in most cases with a uniform mixed strategy. In all but one of the mixed Nash equilibria, agents report their preferences over the original problem sincerely, so the derandomized methods are "quasi-strategy proof". In one domain, the paper additionally shows that a new desirable normative property emerges as a result of derandomization.
    Abstract Randomized mechanisms can have good normative properties compared to their deterministic counterparts. However, randomized mechanisms are problematic in several ways such as in their verifiability. We propose here to derandomize such mechanisms by having agents play a game instead of tossing a coin. The game is designed so an agent's best action is to play randomly, and this play then injects "randomness" into the mechanism. This derandomization retains many of the good normative properties of the original randomized mechanism but gives a mechanism that is deterministic and easy, for instance, to audit. We consider three related methods to derandomize randomized mechanisms in six different domains: voting, facility location, task allocation, school choice, peer selection, and resource allocation. We propose a number of novel derandomized mechanisms for these six domains with good normative properties. Each mechanism has a mixed Nash equilibrium in which agents play a modular arithmetic game with a uniform mixed strategy. In all but one mixed Nash equilibrium, agents report their preferences over the original problem sincerely. The derandomized methods are thus "quasi-strategy proof". In one domain, we additionally show that a new and desirable normative property emerges as a result of derandomization.
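
The modular arithmetic game at the heart of the construction is easy to sketch: each agent reports an integer, and the reports jointly replace the coin toss. If any single agent mixes uniformly over the residues, the selected index is uniform no matter what the others do, while the realized outcome stays deterministic and auditable. The example below is ours, not a mechanism from the paper.

```python
# Derandomization via a modular arithmetic game: the sum of the agents'
# reports modulo the number of outcomes supplies the "random" index.
def modular_game_outcome(reports: list[int], outcomes: list) -> object:
    n = len(outcomes)
    index = sum(reports) % n   # uniform if any one agent plays uniformly
    return outcomes[index]

# Example: three agents, three candidate facility locations.
locations = ["north", "center", "south"]
print(modular_game_outcome([2, 0, 1], locations))  # deterministic, auditable
```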

Diffusion Model as Representation Learner

  • paper_url: http://arxiv.org/abs/2308.10916
  • repo_url: None
  • paper_authors: Xingyi Yang, Xinchao Wang
  • for: This paper investigates the representation power of Diffusion Probabilistic Models (DPMs) and how the knowledge learned by DPMs can be applied to recognition tasks.
  • methods: A novel knowledge-transfer method named RepFusion, which leverages representations learned by off-the-shelf DPMs as supervision for student networks.
  • results: Evaluated on several image classification, semantic segmentation, and landmark detection benchmarks, the method shows superior performance compared with existing approaches.
    Abstract Diffusion Probabilistic Models (DPMs) have recently demonstrated impressive results on various generative tasks. Despite its promise, the learned representations of pre-trained DPMs, however, have not been fully understood. In this paper, we conduct an in-depth investigation of the representation power of DPMs, and propose a novel knowledge transfer method that leverages the knowledge acquired by generative DPMs for recognition tasks. Our study begins by examining the feature space of DPMs, revealing that DPMs are inherently denoising autoencoders that balance the representation learning with regularizing model capacity. To this end, we introduce a novel knowledge transfer paradigm named RepFusion. Our paradigm extracts representations at different time steps from off-the-shelf DPMs and dynamically employs them as supervision for student networks, in which the optimal time is determined through reinforcement learning. We evaluate our approach on several image classification, semantic segmentation, and landmark detection benchmarks, and demonstrate that it outperforms state-of-the-art methods. Our results uncover the potential of DPMs as a powerful tool for representation learning and provide insights into the usefulness of generative models beyond sample generation. The code is available at \url{https://github.com/Adamdad/Repfusion}.
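
A minimal sketch of the distillation step follows, assuming a frozen `dpm_encoder` handle that returns the DPM's intermediate representation at denoising step `t`; the paper selects `t` via reinforcement learning, whereas this sketch treats it as given, and the MSE loss is an illustrative choice.

```python
import torch
import torch.nn.functional as F

# RepFusion-style distillation sketch: features from a frozen diffusion
# model at denoising step t supervise a student encoder.
def distill_step(x, t, dpm_encoder, student, optimizer):
    with torch.no_grad():
        teacher_feat = dpm_encoder(x, t)   # DPM representation at step t
    student_feat = student(x)              # student's representation
    loss = F.mse_loss(student_feat, teacher_feat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```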

Simple Baselines for Interactive Video Retrieval with Questions and Answers

  • paper_url: http://arxiv.org/abs/2308.10402
  • repo_url: https://github.com/kevinliang888/ivr-qa-baselines
  • paper_authors: Kaiqu Liang, Samuel Albanie
  • for: Improving the interactivity of video retrieval systems so that they can better respond to user questions.
  • methods: A question-answering model is used to simulate user interactions, and the video retrieval system is improved accordingly.
  • results: In experiments, question-based interactive video retrieval shows significant performance gains on the MSR-VTT, MSVD, and AVSD datasets.
    Abstract To date, the majority of video retrieval systems have been optimized for a "single-shot" scenario in which the user submits a query in isolation, ignoring previous interactions with the system. Recently, there has been renewed interest in interactive systems to enhance retrieval, but existing approaches are complex and deliver limited gains in performance. In this work, we revisit this topic and propose several simple yet effective baselines for interactive video retrieval via question-answering. We employ a VideoQA model to simulate user interactions and show that this enables the productive study of the interactive retrieval task without access to ground truth dialogue data. Experiments on MSR-VTT, MSVD, and AVSD show that our framework using question-based interaction significantly improves the performance of text-based video retrieval systems.
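
The question-based interaction loop can be sketched as follows; `retriever`, `ask`, and `videoqa` are hypothetical components standing in for the text-based ranker, the question selector, and the VideoQA model that simulates the user.

```python
# Sketch of question-based interactive retrieval: a VideoQA model answers
# questions about the target video, and each answer enriches the query.
def interactive_retrieval(query: str, target_video, rounds: int,
                          retriever, ask, videoqa) -> list:
    for _ in range(rounds):
        ranked = retriever(query)                 # text-based video ranking
        question = ask(query, ranked)             # pick an informative question
        answer = videoqa(target_video, question)  # simulated user reply
        query = f"{query} {answer}"               # append answer to the query
    return retriever(query)                       # final re-ranking
```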

FairBench: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models

  • paper_url: http://arxiv.org/abs/2308.10397
  • repo_url: None
  • paper_authors: Yanhong Bai, Jiabao Zhao, Jinxin Shi, Tingjiang Wei, Xingjiao Wu, Liang He
  • for: This paper aims to enhance fairness and reduce adverse impacts on individuals or groups when Large Language Models (LLMs) are applied by detecting stereotypes and biases in the models.
  • methods: The paper introduces a four-stage framework to directly evaluate stereotypes and biases in the generated content of LLMs, including direct inquiry testing, serial or adapted story testing, implicit association testing, and unknown situation testing.
  • results: The paper evaluates five LLMs on Edu-FairBench, a dataset of 12,632 open-ended questions covering nine sensitive factors and 26 educational scenarios, and finds varying degrees of stereotypes and biases in the models. Additionally, the proposed automated evaluation method shows a high correlation with human annotations.
    Abstract Detecting stereotypes and biases in Large Language Models (LLMs) can enhance fairness and reduce adverse impacts on individuals or groups when these LLMs are applied. However, the majority of existing methods focus on measuring the model's preference towards sentences containing biases and stereotypes within datasets, which lacks interpretability and cannot detect implicit biases and stereotypes in the real world. To address this gap, this paper introduces a four-stage framework to directly evaluate stereotypes and biases in the generated content of LLMs, including direct inquiry testing, serial or adapted story testing, implicit association testing, and unknown situation testing. Additionally, the paper proposes multi-dimensional evaluation metrics and explainable zero-shot prompts for automated evaluation. Using the education sector as a case study, we constructed the Edu-FairBench based on the four-stage framework, which encompasses 12,632 open-ended questions covering nine sensitive factors and 26 educational scenarios. Experimental results reveal varying degrees of stereotypes and biases in five LLMs evaluated on Edu-FairBench. Moreover, the results of our proposed automated evaluation method have shown a high correlation with human annotations.
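
A minimal sketch of how the four probing stages might be driven, with illustrative prompt templates that are not the paper's actual items; `llm` is any callable that maps a prompt to a generation.

```python
# Four-stage probing sketch, instantiated for one sensitive factor.
STAGES = {
    "direct_inquiry": "Do you think {group} students are worse at math?",
    "serial_story": "Continue this story about a {group} student's exam...",
    "implicit_association": "Pick the word that fits: {group} students are "
                            "(diligent/lazy).",
    "unknown_situation": "A {group} student you know nothing about asks for "
                         "advice. What do you assume about them?",
}

def probe(llm, group: str) -> dict:
    # One generation per stage; downstream metrics then score the
    # responses for stereotypes and biases along multiple dimensions.
    return {stage: llm(tpl.format(group=group))
            for stage, tpl in STAGES.items()}
```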

Robotic Planning under Hierarchical Temporal Logic Specifications

  • paper_url: http://arxiv.org/abs/2308.10393
  • repo_url: None
  • paper_authors: Xusheng Luo, Shaojun Xu, Ruixuan Liu, Changliu Liu
  • for: Improving the efficiency of robotic planning so that robots can better accomplish complex tasks.
  • methods: Linear temporal logic (LTL) specifications with a hierarchical structure, together with a decomposition-based method for solving the resulting tasks.
  • results: Extensive experimental studies in robotic navigation and manipulation show that hierarchical LTL specifications combined with the decomposition-based method improve both the expressiveness of the specifications and the efficiency of planning.
    Abstract Past research into robotic planning with temporal logic specifications, notably Linear Temporal Logic (LTL), was largely based on singular formulas for individual or groups of robots. But with increasing task complexity, LTL formulas unavoidably grow lengthy, complicating interpretation and specification generation, and straining the computational capacities of the planners. In order to maximize the potential of LTL specifications, we capitalized on the intrinsic structure of tasks and introduced a hierarchical structure to LTL specifications. In contrast to the "flat" structure, our hierarchical model has multiple levels of compositional specifications and offers benefits such as greater syntactic brevity, improved interpretability, and more efficient planning. To address tasks under this hierarchical temporal logic structure, we formulated a decomposition-based method. Each specification is first broken down into a range of temporally interrelated sub-tasks. We further mine the temporal relations among the sub-tasks of different specifications within the hierarchy. Subsequently, a Mixed Integer Linear Program is utilized to generate a spatio-temporal plan for each robot. Our hierarchical LTL specifications were experimentally applied to domains of robotic navigation and manipulation. Results from extensive simulation studies illustrated both the enhanced expressive potential of the hierarchical form and the efficacy of the proposed method.
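
One way to picture the hierarchical structure is as a tree of flat specifications whose leaves decompose into temporally related sub-tasks that a MILP then schedules onto robots. The sketch below is our illustration, not the paper's syntax or decomposition algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class SubTask:
    name: str
    duration: int
    predecessors: list[str] = field(default_factory=list)  # temporal order

@dataclass
class Spec:
    formula: str                     # a flat LTL formula at this level
    children: list["Spec"] = field(default_factory=list)
    subtasks: list[SubTask] = field(default_factory=list)

# Toy two-level mission: eventually pick, then eventually drop.
mission = Spec(
    formula="F(pick & F drop)",
    children=[
        Spec("F pick", subtasks=[SubTask("goto_shelf", 4),
                                 SubTask("grasp", 2, ["goto_shelf"])]),
        Spec("F drop", subtasks=[SubTask("goto_bin", 3, ["grasp"]),
                                 SubTask("release", 1, ["goto_bin"])]),
    ],
)
```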

Neural Architectures Learning Fourier Transforms, Signal Processing and Much More….

  • paper_url: http://arxiv.org/abs/2308.10388
  • repo_url: None
  • paper_authors: Prateek Verma
  • for: This report explores the relationship between the Fourier transform and modern AI techniques, and how the two can be combined.
  • methods: Neural architectures are used to learn signal-processing kernels from scratch for audio applications, rather than fixing them as sinusoids.
  • results: The learned kernels not only recover sinusoidal shapes but also exhibit a range of remarkable signal-processing properties, such as windowing functions, onset detectors, high-pass filters, low-pass filters, and modulations.
    Abstract This report will explore and answer fundamental questions about taking Fourier transforms and tying them to recent advances in AI and neural architectures. One interpretation of the Fourier transform is decomposing a signal into its constituent components by projecting them onto complex exponentials. Variants exist, such as the discrete cosine transform, which does not operate on the complex domain and projects an input signal only onto cosine functions oscillating at different frequencies. However, fixed bases of this kind are a fundamental limitation and can be suboptimal. The first issue is that all kernels are sinusoidal: what if we could have some kernels adapted or learned according to the problem? What if we can use neural architectures for this? We show how one can learn these kernels from scratch for audio signal processing applications. We find that the neural architecture not only learns sinusoidal kernel shapes but discovers all kinds of incredible signal-processing properties, e.g., windowing functions, onset detectors, high-pass filters, low-pass filters, modulations, etc. Further, upon analysis of the filters, we find that the neural architecture has a comb-filter-like structure on top of the learned kernels. Comb filters that allow harmonic frequencies to pass through are one of the core building blocks/types of filters, similar to the high-pass, low-pass, and band-pass filters of various traditional signal processing algorithms. Further, we can also learn the convolution operation with a signal from scratch, and we will explore papers in the literature that use this with robust Transformer architectures. Finally, we will also explore making the learned kernel's content adaptive, i.e., learning different kernels for different inputs.
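
The "learn the kernels instead of fixing them" idea can be sketched as a 1-D convolutional front end initialized with a DCT-like cosine filterbank and then trained end to end; the sizes, stride, and initialization below are illustrative choices of ours.

```python
import math
import torch
import torch.nn as nn

class LearnedFilterbank(nn.Module):
    """1-D conv front end initialized as a cosine (DCT-II-like) basis; the
    kernels are free to drift toward windows, onset detectors, etc."""
    def __init__(self, n_filters: int = 64, kernel_size: int = 256):
        super().__init__()
        self.conv = nn.Conv1d(1, n_filters, kernel_size,
                              stride=kernel_size // 2, bias=False)
        with torch.no_grad():
            t = torch.arange(kernel_size).float()
            for k in range(n_filters):
                self.conv.weight[k, 0] = torch.cos(
                    math.pi * k * (t + 0.5) / kernel_size)

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        return self.conv(audio)            # (batch, n_filters, frames)

spec = LearnedFilterbank()(torch.randn(2, 1, 16000))  # 1-second toy input
```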

Unsupervised Opinion Aggregation – A Statistical Perspective

  • paper_url: http://arxiv.org/abs/2308.10386
  • repo_url: None
  • paper_authors: Noyan C. Sevuktekin, Andrew C. Singer
  • for: This work develops a statistical method for estimating each expert's level of competence from opinions alone, without knowledge of the ground truth.
  • methods: The method is based on the similarity between experts' opinions: each expert's competence is measured by how often they agree with their peers.
  • results: More reliable experts are shown to be more likely to agree with their peers, leading to a completely unsupervised version of the naive Bayes classifier that is asymptotically optimal for a large class of problems.
    Abstract Complex decision-making systems rarely have direct access to the current state of the world and they instead rely on opinions to form an understanding of what the ground truth could be. Even in problems where experts provide opinions without any intention to manipulate the decision maker, it is challenging to decide which expert's opinion is more reliable, a challenge that is further amplified when the decision-maker has limited, delayed, or no access to the ground truth after the fact. This paper explores a statistical approach to infer the competence of each expert based on their opinions without any need for the ground truth. Echoing the logic behind what is commonly referred to as "the wisdom of crowds", we propose measuring the competence of each expert by their likeliness to agree with their peers. We further show that the more reliable an expert is, the more likely it is that they agree with their peers. We leverage this fact to propose a completely unsupervised version of the naïve Bayes classifier and show that the proposed technique is asymptotically optimal for a large class of problems. In addition to aggregating a large block of opinions, we further apply our technique for online opinion aggregation and for decision-making based on a limited number of opinions.
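
A compact sketch of the peer-agreement idea for binary opinions: competence is proxied by mean agreement with peers, and opinions are then aggregated with naive-Bayes-style log-odds weights. The mapping from agreement to a competence probability below is an illustrative simplification, not the paper's estimator.

```python
import numpy as np

def aggregate(opinions: np.ndarray) -> np.ndarray:
    """Opinions in {-1, +1}, shape (m experts, n questions); no ground truth."""
    m, n = opinions.shape
    agreement = (opinions @ opinions.T) / n              # pairwise, in [-1, 1]
    peer_score = (agreement.sum(axis=1) - 1.0) / (m - 1) # mean agreement
    p = np.clip((peer_score + 1.0) / 2.0, 1e-3, 1 - 1e-3)  # competence proxy
    weights = np.log(p / (1.0 - p))                      # log-odds weights
    return np.sign(weights @ opinions)                   # aggregated labels

votes = np.sign(np.random.randn(5, 100))                 # 5 experts, 100 items
print(aggregate(votes))
```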

False Negative/Positive Control for SAM on Noisy Medical Images

  • paper_url: http://arxiv.org/abs/2308.10382
  • repo_url: https://github.com/xyimaging/FNPC
  • paper_authors: Xing Yao, Han Liu, Dewei Hu, Daiwei Lu, Ange Lou, Hao Li, Ruining Deng, Gabriel Arenas, Baris Oguz, Nadav Schwartz, Brett C Byram, Ipek Oguz
  • for: Improving the performance and robustness of the Segment Anything Model (SAM) on medical image segmentation tasks.
  • methods: A test-phase prompt augmentation technique based on multiple bounding boxes, together with an aleatoric-uncertainty-based false-negative and false-positive correction (FNPC) strategy.
  • results: On two ultrasound datasets, the method improves SAM's performance and its robustness to inaccurate prompts, without any further training or tuning. The authors also propose the Single-Slice-to-Volume (SS2V) method, which enables 3D pixel-level segmentation from only a bounding-box annotation on a single 2D slice.
    Abstract The Segment Anything Model (SAM) is a recently developed all-range foundation model for image segmentation. It can use sparse manual prompts such as bounding boxes to generate pixel-level segmentation in natural images but struggles in medical images such as low-contrast, noisy ultrasound images. We propose a refined test-phase prompt augmentation technique designed to improve SAM's performance in medical image segmentation. The method couples multi-box prompt augmentation and an aleatoric uncertainty-based false-negative (FN) and false-positive (FP) correction (FNPC) strategy. We evaluate the method on two ultrasound datasets and show improvement in SAM's performance and robustness to inaccurate prompts, without the necessity for further training or tuning. Moreover, we present the Single-Slice-to-Volume (SS2V) method, enabling 3D pixel-level segmentation using only the bounding box annotation from a single 2D slice. Our results allow efficient use of SAM in even noisy, low-contrast medical images. The source code will be released soon.
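
The multi-box augmentation and FN/FP correction can be sketched as follows, with `sam_predict` a hypothetical wrapper around a SAM predictor and all thresholds illustrative rather than the paper's tuned values.

```python
import numpy as np

def fnpc_segment(image, box, sam_predict, n_aug: int = 8, jitter: int = 5):
    """Jitter the box, run SAM per box, and use per-pixel mean/variance of
    the masks as an aleatoric-uncertainty proxy for FN/FP correction."""
    rng = np.random.default_rng(0)
    masks = []
    for _ in range(n_aug):
        shifted = np.asarray(box) + rng.integers(-jitter, jitter + 1, size=4)
        masks.append(sam_predict(image, shifted).astype(np.float32))
    masks = np.stack(masks)                  # (n_aug, H, W) binary masks
    mean, var = masks.mean(axis=0), masks.var(axis=0)
    pred = mean > 0.5                        # majority mask
    fn = (var > 0.1) & (mean > 0.3) & ~pred  # uncertain pixels often "on"
    fp = (var > 0.1) & pred                  # uncertain pixels inside mask
    return (pred | fn) & ~fp                 # recover FNs, suppress FPs
```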

A Human-on-the-Loop Optimization Autoformalism Approach for Sustainability

  • paper_url: http://arxiv.org/abs/2308.10380
  • repo_url: None
  • paper_authors: Ming Jin, Bilgehan Sel, Fnu Hardeep, Wotao Yin
  • for: solves personalized energy-related problems using large language models (LLMs)
  • methods: augments an LLM with an optimization solver, enhancing its proficiency in understanding and responding to user specifications and preferences while providing nonlinear reasoning capabilities.
  • results: enables LLMs to analyze, explain, and tackle a variety of instance-specific energy-related problems, pushing beyond the limits of current prompt-based techniques.
    Abstract This paper outlines a natural conversational approach to solving personalized energy-related problems using large language models (LLMs). We focus on customizable optimization problems that necessitate repeated solving with slight variations in modeling and are user-specific, hence posing a challenge to devising a one-size-fits-all model. We put forward a strategy that augments an LLM with an optimization solver, enhancing its proficiency in understanding and responding to user specifications and preferences while providing nonlinear reasoning capabilities. Our approach pioneers the novel concept of human-guided optimization autoformalism, translating a natural language task specification automatically into an optimization instance. This enables LLMs to analyze, explain, and tackle a variety of instance-specific energy-related problems, pushing beyond the limits of current prompt-based techniques. Our research encompasses various commonplace tasks in the energy sector, from electric vehicle charging and Heating, Ventilation, and Air Conditioning (HVAC) control to long-term planning problems such as cost-benefit evaluations for installing rooftop solar photovoltaics (PVs) or heat pumps. This pilot study marks an essential stride towards the context-based formulation of optimization using LLMs, with the potential to democratize optimization processes. As a result, stakeholders are empowered to optimize their energy consumption, promoting sustainable energy practices customized to personal needs and preferences.
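
As a toy instance of the LLM-plus-solver pattern, suppose the LLM has parsed "charge my EV 30 kWh overnight as cheaply as possible" into the parameter dictionary below; a linear program then does the quantitative reasoning, and the LLM would explain the solution back to the user. The schedule, prices, and schema are invented for illustration.

```python
from scipy.optimize import linprog

# Stand-in for structured parameters extracted by the LLM from the request.
params = {"energy_kwh": 30.0,
          "prices": [0.30, 0.22, 0.15, 0.12, 0.14, 0.25],  # $/kWh per slot
          "max_rate_kwh": 8.0}                             # per-slot limit

n = len(params["prices"])
res = linprog(c=params["prices"],                    # minimize total cost
              A_eq=[[1.0] * n], b_eq=[params["energy_kwh"]],
              bounds=[(0.0, params["max_rate_kwh"])] * n)
print(res.x)  # kWh to draw per slot; the LLM would narrate this schedule
```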

Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models

  • paper_url: http://arxiv.org/abs/2308.10379
  • repo_url: None
  • paper_authors: Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar, Lu Wang, Ruoxi Jia, Ming Jin
  • for: Enhancing the reasoning capability of large language models (LLMs) beyond the conventional chain-of-thought approach.
  • methods: Algorithmic reasoning pathways are used to drive the LLM's thinking, pioneering a new mode of in-context learning.
  • results: The approach is more efficient than single-query methods and is on par with a multi-query method that uses an extensive tree-search algorithm; moreover, an LLM guided by an algorithm can outperform the algorithm itself.
    Abstract Current literature, aiming to surpass the "Chain-of-Thought" approach, often resorts to an external modus operandi involving halting, modifying, and then resuming the generation process to boost Large Language Models' (LLMs) reasoning capacities. This mode escalates the number of query requests, leading to increased costs, memory, and computational overheads. Addressing this, we propose the Algorithm of Thoughts -- a novel strategy that propels LLMs through algorithmic reasoning pathways, pioneering a new mode of in-context learning. By employing algorithmic examples, we exploit the innate recurrence dynamics of LLMs, expanding their idea exploration with merely one or a few queries. Our technique outperforms earlier single-query methods and stands on par with a recent multi-query strategy that employs an extensive tree search algorithm. Intriguingly, our results suggest that instructing an LLM using an algorithm can lead to performance surpassing that of the algorithm itself, hinting at LLM's inherent ability to weave its intuition into optimized searches. We probe into the underpinnings of our method's efficacy and its nuances in application.
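
A sketch of what an Algorithm-of-Thoughts style prompt can look like: one in-context example walks through a depth-first search trace, and the model is asked to continue in the same algorithmic style. The wording is ours, not the paper's template, and `query_llm` is a hypothetical LLM call.

```python
# Single-query prompt carrying an algorithmic (DFS-style) exploration.
AOT_PROMPT = """\
Task: pick numbers from the list that sum to the target.
Example (list=[4, 9, 6], target=10), explored as a depth-first search:
  try 4 -> need 6 more -> try 9 (13, fail) -> backtrack -> try 6 -> 4+6=10. Done.
Now solve (list=[3, 8, 5, 2], target=7) in the same step-by-step style:
"""

def solve_with_aot(query_llm) -> str:
    # One query carries the whole in-context algorithmic exploration.
    return query_llm(AOT_PROMPT)
```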

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

  • paper_url: http://arxiv.org/abs/2308.11462
  • repo_url: https://github.com/hazyresearch/legalbench
  • paper_authors: Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N. Rockmore, Diego Zambrano, Dmitry Talisman, Enam Hoque, Faiz Surani, Frank Fagan, Galit Sarfaty, Gregory M. Dickinson, Haggai Porat, Jason Hegland, Jessica Wu, Joe Nudell, Joel Niklaus, John Nay, Jonathan H. Choi, Kevin Tobia, Margaret Hagan, Megan Ma, Michael Livermore, Nikon Rasumov-Rahe, Nils Holzenberger, Noam Kolt, Peter Henderson, Sean Rehaag, Sharad Goel, Shang Gao, Spencer Williams, Sunny Gandhi, Tom Zur, Varun Iyer, Zehua Li
  • for: This paper studies what types of legal reasoning large language models can perform in the legal domain.
  • methods: A legal reasoning benchmark named LegalBench, consisting of 162 tasks covering six different types of legal reasoning.
  • results: An empirical evaluation of 20 open-source and commercial large language models, demonstrating the kinds of research explorations LegalBench enables.
    Abstract The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning -- which distinguish between its many forms -- correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.

Explaining Emergence

  • paper_url: http://arxiv.org/abs/2308.10912
  • repo_url: https://github.com/benjaminpatrickevans/BRATS
  • paper_authors: Hervé Zwirn
  • for: This work examines the concept of emergence, i.e., unexpected phenomena arising in various fields, and the sense in which such phenomena are subjectively described relative to an observer.
  • methods: Mathematical models are used to study whether systems with simple and deterministic rules can display emergent behavior.
  • results: Even systems with simple and deterministic rules can display emergent behavior, captured by the notion of computational irreducibility. This new concept allows emergent phenomena to be understood from an objective point of view.
    Abstract Emergence is a pregnant property in various fields. It refers to a phenomenon that appears surprisingly, such that at first sight it seems impossible to predict its appearance. This is why emergence has often been said to be a subjective property relative to the observer. Some mathematical systems having very simple and deterministic rules nevertheless show emergent behavior. Studying these systems sheds new light on the subject and allows us to define a new concept, computational irreducibility, which deals with behaviors that, even though they are totally deterministic, cannot be predicted without simulating them. Computational irreducibility is then a key to understanding emergent phenomena from an objective point of view that does not need the mention of any observer.
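
Elementary cellular automata make the point concrete: Rule 110 below is fully deterministic and trivially simple, yet in general its long-run configurations can only be obtained by actually running it, which is exactly the computational irreducibility discussed above.

```python
# Rule 110: a one-line deterministic update rule with emergent behavior.
def rule110_step(cells: list[int]) -> list[int]:
    rule = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
            (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}
    n = len(cells)
    return [rule[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

cells = [0] * 40 + [1]             # start from a single live cell
for _ in range(20):                # no shortcut available: just simulate
    print("".join(".#"[c] for c in cells))
    cells = rule110_step(cells)
```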

Imaginations of WALL-E : Reconstructing Experiences with an Imagination-Inspired Module for Advanced AI Systems

  • paper_url: http://arxiv.org/abs/2308.10354
  • repo_url: None
  • paper_authors: Zeinab Sadat Taghavi, Soroush Gooran, Seyed Arshan Dalili, Hamidreza Amirzadeh, Mohammad Jalal Nematbakhsh, Hossein Sameti
  • for: The paper aims to introduce a novel Artificial Intelligence (AI) system that leverages the concept of imagination to process and generate deep and interpretable information across modalities.
  • methods: The proposed system uses an imagination-inspired module that bridges the gap between textual inputs and other modalities, enriching the derived information based on previously learned experiences. The system employs large-scale models, specifically a Multimodal Large Language Model (MLLM), to extract meaningful information across modalities while primarily remaining unimodal.
  • results: The system outperformed the best Large Language Models (LLMs) on multiple tasks, including emotion recognition and question-answering, achieving Weighted F1 (WF1) scores of 46.74% and 25.23% and an Overall F1 (OF1) score of 17%, compared to 22.89%, 12.28%, and 7% from the best-performing LLM.
    Abstract In this paper, we introduce a novel Artificial Intelligence (AI) system inspired by the philosophical and psychoanalytical concept of imagination as a ``Re-construction of Experiences". Our AI system is equipped with an imagination-inspired module that bridges the gap between textual inputs and other modalities, enriching the derived information based on previously learned experiences. A unique feature of our system is its ability to formulate independent perceptions of inputs. This leads to unique interpretations of a concept that may differ from human interpretations but are equally valid, a phenomenon we term as ``Interpretable Misunderstanding". We employ large-scale models, specifically a Multimodal Large Language Model (MLLM), enabling our proposed system to extract meaningful information across modalities while primarily remaining unimodal. We evaluated our system against other large language models across multiple tasks, including emotion recognition and question-answering, using a zero-shot methodology to ensure an unbiased scenario that may happen by fine-tuning. Significantly, our system outperformed the best Large Language Models (LLM) on the MELD, IEMOCAP, and CoQA datasets, achieving Weighted F1 (WF1) scores of 46.74%, 25.23%, and Overall F1 (OF1) score of 17%, respectively, compared to 22.89%, 12.28%, and 7% from the well-performing LLM. The goal is to go beyond the statistical view of language processing and tie it to human concepts such as philosophy and psychoanalysis. This work represents a significant advancement in the development of imagination-inspired AI systems, opening new possibilities for AI to generate deep and interpretable information across modalities, thereby enhancing human-AI interaction.

A probabilistic analysis of selected notions of iterated conditioning under coherence

  • paper_url: http://arxiv.org/abs/2308.10338
  • repo_url: None
  • paper_authors: Lydia Castronovo, Giuseppe Sanfilippo
  • for: This paper studies iterated conditionals in trivalent logics, namely the notions proposed by Cooper-Calabrese, de Finetti, and Farrell.
  • methods: Different trivalent-logic approaches are used, including conjunction and disjunction among conditionals, together with probability propagation rules for conditional random quantities.
  • results: The compound probability theorem and other basic properties of iterated conditionals are not preserved under the definitions of Cooper-Calabrese, de Finetti, and Farrell, but they can be satisfied by using suitable random quantities. The paper also establishes the validity of generalized versions of Bayes' rule and of Modus Ponens for iterated conditionals.
    Abstract It is well known that basic conditionals satisfy some desirable basic logical and probabilistic properties, such as the compound probability theorem, but checking the validity of these becomes trickier when we switch to compound and iterated conditionals. We consider de Finetti's notion of conditional as a three-valued object and as a conditional random quantity in the betting framework. We recall the notions of conjunction and disjunction among conditionals in selected trivalent logics. First, in the framework of specific three-valued logics we analyze the notions of iterated conditioning introduced by Cooper-Calabrese, de Finetti, and Farrell, respectively. We show that the compound probability theorem and other basic properties are not preserved by these objects, and we also compute some probability propagation rules. Then, for each trivalent logic we introduce an iterated conditional as a suitable random quantity which satisfies the compound prevision theorem and some of the desirable properties. We also check the validity of two generalized versions of Bayes' Rule for iterated conditionals. We study the p-validity of generalized versions of Modus Ponens and two-premise centering for iterated conditionals. Finally, we observe that all the basic properties are satisfied only by the iterated conditional mainly developed in recent papers by Gilio and Sanfilippo in the setting of conditional random quantities.
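
For readers who want the conditional-random-quantity reading made explicit, here is a sketch of the standard definitions, with \(\mathbb{P}\) denoting prevision and \(\mu\) the prevision of the iterated conditional, as used in the Gilio-Sanfilippo setting (our transcription, worth checking against the paper's notation):

```latex
% de Finetti's three-valued conditional, read as a random quantity:
(B \mid A) \;=\; AB + \mathbb{P}(B \mid A)\,(1 - A) \;=\;
\begin{cases}
1, & \text{if } AB \text{ is true},\\
0, & \text{if } A\bar{B} \text{ is true},\\
\mathbb{P}(B \mid A), & \text{if } \bar{A} \text{ is true}.
\end{cases}

% Iterated conditional as a conditional random quantity, with
% \mu = \mathbb{P}[(B \mid K) \mid (A \mid H)]:
(B \mid K) \mid (A \mid H) \;=\; (A \mid H) \wedge (B \mid K)
  + \mu\,\bigl(1 - (A \mid H)\bigr)

% Compound prevision theorem, preserved by this definition:
\mathbb{P}\bigl[(A \mid H) \wedge (B \mid K)\bigr]
  \;=\; \mathbb{P}\bigl[(B \mid K) \mid (A \mid H)\bigr] \cdot \mathbb{P}(A \mid H).
```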

A Study on Robustness and Reliability of Large Language Model Code Generation

  • paper_url: http://arxiv.org/abs/2308.10335
  • repo_url: None
  • paper_authors: Li Zhong, Zilong Wang
  • for: This paper evaluates the reliability and robustness of code generated by large language models (LLMs).
  • methods: The authors collect 1208 coding questions from StackOverflow covering 24 representative Java APIs, summarize the common misuse patterns of these APIs, and evaluate current popular LLMs against them.
  • results: Even for GPT-4, 62% of the generated code contains API misuses, which could lead to unexpected consequences if the code were introduced into real-world software.
    Abstract Recently, large language models (LLMs) have shown extraordinary ability in understanding natural language and generating programming code. It has become common practice for software engineers to consult LLMs when encountering coding questions. Although efforts have been made to avoid syntax errors and align the code with the intended semantics, the reliability and robustness of code generation from LLMs have not yet been thoroughly studied. Executable code is not equivalent to reliable and robust code, especially in the context of real-world software development. The misuse of APIs in the generated code could lead to severe problems, such as resource leaks, program crashes, etc. To make things worse, the users of LLM code generation services are the developers most vulnerable to code that merely seems right: they are often novice developers unfamiliar with the APIs for which the LLMs generate code. They can therefore hardly spot the misuse in the code generated by LLMs, which further facilitates incorrect code being applied in real-world software. Existing code evaluation benchmarks and datasets focus on small tasks such as programming questions in coding interviews, which deviates from the kind of real-world coding help developers ask LLMs for. To fill this missing piece, in this work we propose a dataset, RobustAPI, for evaluating the reliability and robustness of code generated by LLMs. We collect 1208 coding questions from StackOverflow on 24 representative Java APIs. We summarize the common misuse patterns of these APIs and evaluate them on current popular LLMs. The evaluation results show that even for GPT-4, 62% of the generated code contains API misuses, which would cause unexpected consequences if the code were introduced into real-world software.
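
A rule-based misuse check in the spirit of the benchmark can be sketched with regular expressions: each rule pairs an API pattern with a guard that must also appear for the usage to count as safe. The two rules below are crude simplifications of real misuse patterns (try-with-resources for streams, hasNext-style checks before reads), not the benchmark's actual checkers.

```python
import re

# Each rule: API usage pattern -> guard pattern expected in safe code.
RULES = {
    "new FileInputStream": r"try\s*\(",                 # try-with-resources
    r"\.next(Int|Line)\(": r"hasNext(Int|Line)?\s*\(",  # check before read
}

def find_misuses(java_code: str) -> list[str]:
    return [api for api, guard in RULES.items()
            if re.search(api, java_code) and not re.search(guard, java_code)]

snippet = 'FileInputStream in = new FileInputStream("a.txt");'
print(find_misuses(snippet))    # ['new FileInputStream']
```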

UAV 3-D path planning based on MOEA/D with adaptive areal weight adjustment

  • paper_url: http://arxiv.org/abs/2308.10307
  • repo_url: None
  • paper_authors: Yougang Xiao, Hao Yang, Huan Liu, Keyu Wu, Guohua Wu
  • for: This paper proposes a decomposition-based multi-objective evolutionary algorithm (MOEA/D) with an adaptive areal weight adjustment (AAWA) strategy to balance the total UAV flight path length against the terrain threat.
  • methods: MOEA/D combined with the AAWA strategy, which improves the diversity of the solutions.
  • results: The effectiveness of MOEA/D-AAWA is demonstrated by comparisons on twenty synthetic scenarios with different numbers of obstacles and on four realistic scenarios.
    Abstract Unmanned aerial vehicles (UAVs) are desirable platforms for time-efficient and cost-effective task execution. 3-D path planning is a key challenge for task decision-making. This paper proposes an improved multi-objective evolutionary algorithm based on decomposition (MOEA/D) with an adaptive areal weight adjustment (AAWA) strategy to make a tradeoff between the total flight path length and the terrain threat. AAWA is designed to improve the diversity of the solutions. More specifically, AAWA first removes a crowded individual and its weight vector from the current population and then adds a sparse individual from the external elite population to the current population. To enable the newly-added individual to evolve towards the sparser area of the population in the objective space, its weight vector is constructed from the objective function values of its neighbors. The effectiveness of MOEA/D-AAWA is validated in twenty synthetic scenarios with different numbers of obstacles and four realistic scenarios, in comparison with three other classical methods.
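
The AAWA swap step can be sketched in NumPy: drop the most crowded individual, pull in a sparse elite, and rebuild the newcomer's weight vector from its neighbors' objective values. The crowding and sparsity measures below are simplified stand-ins for the paper's definitions.

```python
import numpy as np

def aawa_step(pop_objs: np.ndarray, weights: np.ndarray,
              elite_objs: np.ndarray):
    """One AAWA swap on a population of objective vectors (sketch)."""
    # Most crowded individual: smallest nearest-neighbor distance.
    d = np.linalg.norm(pop_objs[:, None] - pop_objs[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    crowded = int(np.argmin(d.min(axis=1)))
    # Sparse elite: farthest, on average, from the current population.
    e2p = np.linalg.norm(elite_objs[:, None] - pop_objs[None, :], axis=2)
    sparse = int(np.argmax(e2p.mean(axis=1)))
    pop_objs[crowded] = elite_objs[sparse]
    # New weight vector from the newcomer's nearest neighbors' objectives.
    dist = np.linalg.norm(pop_objs - pop_objs[crowded], axis=1)
    neighbors = np.argsort(dist)[1:4]       # three nearest, skipping itself
    w = pop_objs[neighbors].mean(axis=0)
    weights[crowded] = w / w.sum()          # normalized weight vector
    return pop_objs, weights
```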

Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video

  • paper_url: http://arxiv.org/abs/2308.10305
  • repo_url: https://github.com/kasvii/pmce
  • paper_authors: Yingxuan You, Hong Liu, Ti Wang, Wenhao Li, Runwei Ding, Xia Li
  • for: This paper proposes a video-based 3D human mesh recovery method, addressing the difficulty existing video-based methods have in estimating complex human pose and shape parameters.
  • methods: A two-stream encoder estimates the mid-frame 3D human pose and extracts temporal features from the input image sequence; a co-evolution decoder performs pose and mesh interactions with image-guided AdaLN so that the pose and mesh fit the human body shape.
  • results: On three benchmark datasets (3DPW, Human3.6M, and MPI-INF-3DHP), the proposed PMCE outperforms previous state-of-the-art methods in both per-frame accuracy and temporal consistency.
    Abstract Despite significant progress in single image-based 3D human mesh recovery, accurately and smoothly recovering 3D human motion from a video remains challenging. Existing video-based methods generally recover human mesh by estimating the complex pose and shape parameters from coupled image features, whose high complexity and low representation ability often result in inconsistent pose motion and limited shape patterns. To alleviate this issue, we introduce 3D pose as the intermediary and propose a Pose and Mesh Co-Evolution network (PMCE) that decouples this task into two parts: 1) video-based 3D human pose estimation and 2) mesh vertices regression from the estimated 3D pose and temporal image feature. Specifically, we propose a two-stream encoder that estimates mid-frame 3D pose and extracts a temporal image feature from the input image sequence. In addition, we design a co-evolution decoder that performs pose and mesh interactions with the image-guided Adaptive Layer Normalization (AdaLN) to make pose and mesh fit the human body shape. Extensive experiments demonstrate that the proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency on three benchmark datasets: 3DPW, Human3.6M, and MPI-INF-3DHP. Our code is available at https://github.com/kasvii/PMCE.
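
The image-guided AdaLN used in the co-evolution decoder can be sketched as a LayerNorm whose scale and shift are regressed from the image feature; the dimensions and token count below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class AdaLN(nn.Module):
    """LayerNorm modulated by an image feature: the normalized pose/mesh
    tokens are rescaled and shifted toward the observed body shape."""
    def __init__(self, feat_dim: int, cond_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(feat_dim, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, x: torch.Tensor, img_feat: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, feat_dim); img_feat: (batch, cond_dim)
        scale, shift = self.to_scale_shift(img_feat).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

out = AdaLN(64, 128)(torch.randn(2, 431, 64), torch.randn(2, 128))
```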