cs.AI - 2023-09-04

Efficient Defense Against Model Stealing Attacks on Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2309.01838
  • repo_url: https://github.com/kacemkhaled/defending-extraction
  • paper_authors: Kacem Khaled, Mouna Dhaouadi, Felipe Gohring de Magalhães, Gabriela Nicolescu
  • For: Proposes a simple yet effective and efficient defense against model stealing attacks on deep learning models.
  • Methods: Introduces a heuristic approach that perturbs the model's output probabilities; it can be integrated into existing models without additional training (a sketch follows the abstract).
  • Results: The proposed defense withstands three state-of-the-art stealing attacks and outperforms state-of-the-art defenses with a $\times37$ faster inference latency, without requiring any additional model and with low impact on the protected model's performance. The defense is also effective for quantized CNNs targeting edge devices.
    Abstract Model stealing attacks have become a serious concern for deep learning models, where an attacker can steal a trained model by querying its black-box API. This can lead to intellectual property theft and other security and privacy risks. The current state-of-the-art defenses against model stealing attacks suggest adding perturbations to the prediction probabilities. However, they suffer from heavy computations and make impracticable assumptions about the adversary. They often require the training of auxiliary models, which can be time-consuming and resource-intensive and hinders the deployment of these defenses in real-world applications. In this paper, we propose a simple yet effective and efficient defense alternative. We introduce a heuristic approach to perturb the output probabilities. The proposed defense can be easily integrated into models without additional training. We show that our defense is effective in defending against three state-of-the-art stealing attacks. We evaluate our approach on large and quantized (i.e., compressed) Convolutional Neural Networks (CNNs) trained on several vision datasets. Our technique outperforms the state-of-the-art defenses with a $\times37$ faster inference latency without requiring any additional model and with a low impact on the model's performance. We validate that our defense is also effective for quantized CNNs targeting edge devices.
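A minimal sketch of the general idea behind perturbation-based defenses of this kind: add noise to the served probabilities while keeping the top-1 label intact. The Dirichlet mixing below is an illustrative assumption, not the paper's exact heuristic.

```python
import numpy as np

def perturb_probs(probs: np.ndarray, eps: float = 0.2, rng=None) -> np.ndarray:
    """Perturb a probability vector while preserving its argmax.

    Illustrative stand-in for the paper's heuristic: mix the model's
    softmax output with Dirichlet noise, then restore the original
    top-1 class if the mixing changed it.
    """
    rng = rng or np.random.default_rng()
    noise = rng.dirichlet(np.ones_like(probs))
    mixed = (1.0 - eps) * probs + eps * noise
    top1 = probs.argmax()
    if mixed.argmax() != top1:
        # Swap so the served label stays correct for benign users.
        j = mixed.argmax()
        mixed[top1], mixed[j] = mixed[j], mixed[top1]
    return mixed / mixed.sum()

print(perturb_probs(np.array([0.7, 0.2, 0.1])))
```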

Smoothing ADMM for Sparse-Penalized Quantile Regression with Non-Convex Penalties

  • paper_url: http://arxiv.org/abs/2309.03094
  • repo_url: None
  • paper_authors: Reza Mirzaeifard, Naveen K. D. Venkategowda, Vinay Chakravarthi Gogineni, Stefan Werner
  • for: Investigates quantile regression in the presence of non-convex, non-smooth sparse penalties and proposes a novel single-loop smoothing ADMM algorithm, named SIAD, to accelerate convergence.
  • methods: Builds on iterative techniques such as coordinate descent and local linear approximation, and employs the alternating direction method of multipliers (ADMM) to facilitate convergence (a schematic single-loop sketch follows the abstract).
  • results: Numerical results show that SIAD is faster and more stable than existing methods, providing a better solution for sparse-penalized quantile regression.
    Abstract This paper investigates quantile regression in the presence of non-convex and non-smooth sparse penalties, such as the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD). The non-smooth and non-convex nature of these problems often leads to convergence difficulties for many algorithms. While iterative techniques like coordinate descent and local linear approximation can facilitate convergence, the process is often slow. This sluggish pace is primarily due to the need to run these approximation techniques until full convergence at each step, a requirement we term a "secondary convergence iteration". To accelerate the convergence speed, we employ the alternating direction method of multipliers (ADMM) and introduce a novel single-loop smoothing ADMM algorithm with an increasing penalty parameter, named SIAD, specifically tailored for sparse-penalized quantile regression. We first delve into the convergence properties of the proposed SIAD algorithm and establish the necessary conditions for convergence. Theoretically, we confirm a convergence rate of $o\big(k^{-\frac{1}{4}}\big)$ for the sub-gradient bound of the augmented Lagrangian. Subsequently, we provide numerical results to showcase the effectiveness of the SIAD algorithm. Our findings highlight that the SIAD method outperforms existing approaches, providing a faster and more stable solution for sparse-penalized quantile regression.
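A schematic single-loop smoothing-ADMM sketch for MCP-penalized quantile regression with an increasing penalty parameter, in the spirit of SIAD; the proximal steps and step-size/threshold scalings are simplified assumptions, not the paper's exact algorithm.

```python
import numpy as np

def prox_check(v, tau, alpha):
    # Proximal operator of alpha * quantile check loss rho_tau.
    return np.where(v > alpha * tau, v - alpha * tau,
           np.where(v < alpha * (tau - 1.0), v - alpha * (tau - 1.0), 0.0))

def prox_mcp(v, lam, gamma):
    # Firm thresholding: proximal operator of the MCP penalty (gamma > 1).
    soft = np.sign(v) * np.maximum(np.abs(v) - lam, 0.0) / (1.0 - 1.0 / gamma)
    return np.where(np.abs(v) <= gamma * lam, soft, v)

def siad_sketch(X, y, tau=0.5, lam=0.1, gamma=3.0, iters=500):
    n, p = X.shape
    beta, u, dual = np.zeros(p), y.copy(), np.zeros(n)
    sigma = 1.0
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # gradient step for the beta block
    for _ in range(iters):
        u = prox_check(y - X @ beta + dual / sigma, tau, 1.0 / sigma)  # residual block
        grad = -X.T @ (y - X @ beta - u + dual / sigma)
        beta = prox_mcp(beta - step * grad, lam, gamma)  # one prox-gradient step
        dual += sigma * (y - X @ beta - u)               # dual ascent
        sigma *= 1.005                                   # increasing penalty parameter
    return beta

X, y = np.random.randn(200, 20), np.random.randn(200)
print(siad_sketch(X, y)[:5])
```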

One Wide Feedforward is All You Need

  • paper_url: http://arxiv.org/abs/2309.01826
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Telmo Pessoa Pires, António V. Lopes, Yannick Assogba, Hendra Setiawan
  • for: Investigates the redundancy of the Feed Forward Network (FFN) in the Transformer architecture and how reducing the number of FFN parameters affects accuracy and latency.
  • methods: Explores the role of the FFN: the authors find it highly redundant, remove the FFN from the decoder layers, and share a single FFN across all encoder layers (a sketch of the shared-FFN encoder follows the abstract).
  • results: Removing the decoder FFNs and sharing one encoder FFN substantially reduces the parameter count with only a modest drop in accuracy; scaling the shared FFN's hidden dimension back up to the original parameter budget yields substantial gains in both accuracy and latency over the original Transformer Big.
    Abstract The Transformer architecture has two main non-embedding components: Attention and the Feed Forward Network (FFN). Attention captures interdependencies between words regardless of their position, while the FFN non-linearly transforms each input token independently. In this work we explore the role of the FFN, and find that despite taking up a significant fraction of the model's parameters, it is highly redundant. Concretely, we are able to substantially reduce the number of parameters with only a modest drop in accuracy by removing the FFN on the decoder layers and sharing a single FFN across the encoder. Finally we scale this architecture back to its original size by increasing the hidden dimension of the shared FFN, achieving substantial gains in both accuracy and latency with respect to the original Transformer Big.
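A hedged PyTorch sketch of the shared-FFN encoder idea: one wide FFN reused across all encoder layers. All hyperparameters (widths, layer counts) are illustrative.

```python
import torch
import torch.nn as nn

class SharedFFNEncoder(nn.Module):
    """Encoder stack where every layer reuses a single wide FFN, echoing
    the paper's design of dropping per-layer FFNs and widening the
    shared one to recover the original parameter budget."""
    def __init__(self, d_model=512, n_heads=8, n_layers=6, d_ff_wide=8192):
        super().__init__()
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_layers))
        self.norm1 = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))
        self.norm2 = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_layers))
        # One wide FFN shared by every layer.
        self.shared_ffn = nn.Sequential(
            nn.Linear(d_model, d_ff_wide), nn.ReLU(), nn.Linear(d_ff_wide, d_model))

    def forward(self, x):
        for attn, n1, n2 in zip(self.attn, self.norm1, self.norm2):
            h, _ = attn(x, x, x)
            x = n1(x + h)
            x = n2(x + self.shared_ffn(x))  # same FFN parameters at every layer
        return x

print(SharedFFNEncoder()(torch.randn(2, 16, 512)).shape)
```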

Towards Foundational AI Models for Additive Manufacturing: Language Models for G-Code Debugging, Manipulation, and Comprehension

  • paper_url: http://arxiv.org/abs/2309.02465
  • repo_url: https://github.com/idealab-isu/llm4g-code
  • paper_authors: Anushrut Jignasu, Kelly Marshall, Baskar Ganapathysubramanian, Aditya Balu, Chinmay Hegde, Adarsh Krishnamurthy
  • for: Describes how existing large language models (LLMs) can be used to comprehend and manipulate the G-code files that drive 3D printers.
  • methods: Evaluates six state-of-the-art LLMs, designing effective prompts that let the models understand and manipulate G-code files (an example prompt follows the abstract).
  • results: Provides a comprehensive evaluation of the six LLMs and analyzes their strengths and weaknesses in understanding complete G-code files.
    Abstract 3D printing or additive manufacturing is a revolutionary technology that enables the creation of physical objects from digital models. However, the quality and accuracy of 3D printing depend on the correctness and efficiency of the G-code, a low-level numerical control programming language that instructs 3D printers how to move and extrude material. Debugging G-code is a challenging task that requires a syntactic and semantic understanding of the G-code format and the geometry of the part to be printed. In this paper, we present the first extensive evaluation of six state-of-the-art foundational large language models (LLMs) for comprehending and debugging G-code files for 3D printing. We design effective prompts to enable pre-trained LLMs to understand and manipulate G-code and test their performance on various aspects of G-code debugging and manipulation, including detection and correction of common errors and the ability to perform geometric transformations. We analyze their strengths and weaknesses for understanding complete G-code files. We also discuss the implications and limitations of using LLMs for G-code comprehension.
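A sketch of the kind of prompt such an evaluation relies on; the wording and the G-code snippet are assumptions for illustration, not the paper's exact prompts.

```python
# Hypothetical G-code debugging prompt; `gcode_snippet` is a stand-in example.
gcode_snippet = """G28 ; home all axes
G1 X50 Y50 Z0.2 F1500
G1 X100 E5 F9999999 ; suspicious feed rate
"""

prompt = (
    "You are an expert in 3D printing and G-code.\n"
    "Below is a G-code excerpt. Identify any syntactic or semantic errors "
    "(e.g., invalid feed rates, missing extrusion setup), explain each one, "
    "and output a corrected version of the excerpt.\n\n"
    f"G-code:\n{gcode_snippet}"
)
print(prompt)  # send to any chat-style LLM API of your choice
```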

DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research

  • paper_url: http://arxiv.org/abs/2309.01808
  • repo_url: https://github.com/ynchuang/discoverpath
  • paper_authors: Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang, Kwei-Herng Lai, Daochen Zha, Ruixiang Tang, Fan Yang, Alfredo Costilla Reyes, Kaixiong Zhou, Xiaoqian Jiang, Xia Hu
  • for: Improving the efficiency of literature retrieval in biomedical research, especially across interdisciplinary fields, by using a knowledge graph to enhance the user experience.
  • methods: Uses named entity recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts and assemble them into a knowledge graph (a sketch of this step follows the abstract).
  • results: Provides an open-source graphical user interface that helps users find relevant articles and supports iterative knowledge exploration.
    Abstract The exponential growth in scholarly publications necessitates advanced tools for efficient article retrieval, especially in interdisciplinary fields where diverse terminologies are used to describe similar research. Traditional keyword-based search engines often fall short in assisting users who may not be familiar with specific terminologies. To address this, we present a knowledge graph-based paper search engine for biomedical research to enhance the user experience in discovering relevant queries and articles. The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG. To reduce information overload, DiscoverPath presents users with a focused subgraph containing the queried entity and its neighboring nodes and incorporates a query recommendation system, enabling users to iteratively refine their queries. The system is equipped with an accessible Graphical User Interface that provides an intuitive visualization of the KG, query recommendations, and detailed article information, enabling efficient article retrieval, thus fostering interdisciplinary knowledge exploration. DiscoverPath is open-sourced at https://github.com/ynchuang/DiscoverPath.
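A minimal sketch of the abstract-to-knowledge-graph step using spaCy and networkx; DiscoverPath uses biomedical NER, so the general-purpose model and the sentence co-occurrence edges here are simplifying assumptions.

```python
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")  # stand-in for a biomedical pipeline
abstract = ("Metformin reduces hepatic glucose production and improves "
            "insulin sensitivity in type 2 diabetes patients.")

doc = nlp(abstract)
kg = nx.Graph()
for sent in doc.sents:
    # Noun chunks become terminology nodes; co-occurrence in a
    # sentence becomes a (crude) relationship edge.
    terms = [chunk.text.lower() for chunk in sent.noun_chunks
             if chunk.root.pos_ in ("NOUN", "PROPN")]
    kg.add_nodes_from(terms)
    kg.add_edges_from((a, b) for i, a in enumerate(terms) for b in terms[i + 1:])

print(kg.nodes(), kg.edges())
```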

Marginalized Importance Sampling for Off-Environment Policy Evaluation

  • paper_url: http://arxiv.org/abs/2309.01807
  • repo_url: None
  • paper_authors: Pulkit Katdare, Nan Jiang, Katherine Driggs-Campbell
  • for: Evaluating the real-world performance of RL policies without deploying them in the real world.
  • methods: Uses the Marginalized Importance Sampling (MIS) framework, combining a simulator with real-world offline data to estimate a policy's real-world performance (the core estimator is sketched after the abstract).
  • results: Empirically evaluated across several Sim2Sim environments, target policies, and offline data-collection policies, and on a Sim2Real task validating the performance of a 7-DOF robotic arm.
    Abstract Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL policies in real-world robots. Even a robust policy trained in simulation requires a real-world deployment to assess its performance. This paper proposes a new approach to evaluate the real-world performance of agent policies without deploying them in the real world. The proposed approach incorporates a simulator along with real-world offline data to evaluate the performance of any policy using the framework of Marginalized Importance Sampling (MIS). Existing MIS methods face two challenges: (1) large density ratios that deviate from a reasonable range and (2) indirect supervision, where the ratio needs to be inferred indirectly, thus exacerbating estimation error. Our approach addresses these challenges by introducing the target policy's occupancy in the simulator as an intermediate variable and learning the density ratio as the product of two terms that can be learned separately. The first term is learned with direct supervision and the second term has a small magnitude, thus making it easier to run. We analyze the sample complexity as well as error propagation of our two-step procedure. Furthermore, we empirically evaluate our approach on Sim2Sim environments such as Cartpole, Reacher and Half-Cheetah. Our results show that our method generalizes well across a variety of Sim2Sim gaps, target policies and offline data collection policies. We also demonstrate the performance of our algorithm on a Sim2Real task of validating the performance of a 7 DOF robotic arm using offline data along with a Gazebo-based arm simulator.
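A sketch of the core MIS estimator the framework builds on; the factored ratio below (a simulator-occupancy term times a small sim-to-real correction term) mirrors the paper's two-term idea, with trivial stand-ins for the learned terms.

```python
import numpy as np

def mis_value_estimate(offline_batch, ratio_fn, gamma=0.99):
    """Marginalized importance sampling value estimate:
    V(pi) ~ (1 / (1 - gamma)) * E_offline[ w(s, a) * r ],
    where w is the ratio between the target policy's discounted
    state-action occupancy and the offline data distribution."""
    s, a, r = offline_batch
    return np.mean(ratio_fn(s, a) * r) / (1.0 - gamma)

# Hypothetical stand-ins for the two separately learned terms.
sim_occupancy_ratio = lambda s, a: np.ones(len(s))   # learned with direct supervision
sim_to_real_ratio = lambda s, a: np.ones(len(s))     # small-magnitude correction
ratio_fn = lambda s, a: sim_occupancy_ratio(s, a) * sim_to_real_ratio(s, a)

batch = (np.zeros((100, 4)), np.zeros((100, 1)), np.random.rand(100))
print(mis_value_estimate(batch, ratio_fn))
```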

Neural-Singular-Hessian: Implicit Neural Representation of Unoriented Point Clouds by Enforcing Singular Hessian

  • paper_url: http://arxiv.org/abs/2309.01793
  • repo_url: https://github.com/bearprin/Neural-Singular-Hessian
  • paper_authors: Zixiong Wang, Yunxiao Zhang, Rui Xu, Fan Zhang, Pengshuai Wang, Shuangmin Chen, Shiqing Xin, Wenping Wang, Changhe Tu
  • for: Addresses surface reconstruction from point clouds.
  • methods: Combines various regularization terms, such as the Eikonal and Laplacian energy terms, to force the learned neural function to behave like a Signed Distance Function (SDF). In addition, the approach enforces the Hessian of the neural implicit function to have a zero determinant for points near the surface, which aligns the gradients of a near-surface point and its on-surface projection, producing a rough but faithful shape (a minimal loss sketch follows the abstract).
  • results: Experiments show that the method effectively suppresses ghost geometry and recovers details from unoriented point clouds with better expressiveness than existing fitting-based methods.
    Abstract Neural implicit representation is a promising approach for reconstructing surfaces from point clouds. Existing methods combine various regularization terms, such as the Eikonal and Laplacian energy terms, to enforce the learned neural function to possess the properties of a Signed Distance Function (SDF). However, inferring the actual topology and geometry of the underlying surface from poor-quality unoriented point clouds remains challenging. In accordance with Differential Geometry, the Hessian of the SDF is singular for points within the differential thin-shell space surrounding the surface. Our approach enforces the Hessian of the neural implicit function to have a zero determinant for points near the surface. This technique aligns the gradients for a near-surface point and its on-surface projection point, producing a rough but faithful shape within just a few iterations. By annealing the weight of the singular-Hessian term, our approach ultimately produces a high-fidelity reconstruction result. Extensive experimental results demonstrate that our approach effectively suppresses ghost geometry and recovers details from unoriented point clouds with better expressiveness than existing fitting-based methods.
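A minimal PyTorch sketch of the singular-Hessian regularizer: drive det(Hessian of the neural SDF) toward zero for near-surface points. The tiny MLP and the per-point autograd Hessian are illustrative; a real implementation would batch this with double backprop.

```python
import torch
from torch.autograd.functional import hessian

# Toy neural SDF: R^3 -> R.
sdf = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Softplus(beta=100),
    torch.nn.Linear(64, 1))

def singular_hessian_loss(points):
    loss = 0.0
    for p in points:
        H = hessian(lambda x: sdf(x).squeeze(), p, create_graph=True)
        loss = loss + H.det().abs()  # push |det H| -> 0 near the surface
    return loss / len(points)

near_surface = torch.randn(8, 3)  # samples in the thin shell around the surface
print(singular_hessian_loss(near_surface))
```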

3D View Prediction Models of the Dorsal Visual Stream

  • paper_url: http://arxiv.org/abs/2309.01782
  • repo_url: None
  • paper_authors: Gabriel Sarch, Hsiao-Yu Fish Tung, Aria Wang, Jacob Prince, Michael Tarr
  • for: Tests whether a geometry-aware recurrent neural network (GRNN) trained to perceive 3D scene geometry aligns better with the functional properties of the brain's dorsal visual areas.
  • methods: Trains a self-supervised GRNN to predict novel camera views using a 3D feature memory (the encoding-model comparison is sketched after the abstract).
  • results: The GRNN predicts novel camera views and accounts for a greater proportion of variance in dorsal brain regions than self-supervised baselines.
    Abstract Deep neural network representations align well with brain activity in the ventral visual stream. However, the primate visual system has a distinct dorsal processing stream with different functional properties. To test if a model trained to perceive 3D scene geometry aligns better with neural responses in dorsal visual areas, we trained a self-supervised geometry-aware recurrent neural network (GRNN) to predict novel camera views using a 3D feature memory. We compared GRNN to self-supervised baseline models that have been shown to align well with ventral regions using the large-scale fMRI Natural Scenes Dataset (NSD). We found that while the baseline models accounted better for ventral brain regions, GRNN accounted for a greater proportion of variance in dorsal brain regions. Our findings demonstrate the potential for using task-relevant models to probe representational differences across visual streams.
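A sketch of the standard encoding-model comparison used to score such models against fMRI data: ridge-regress model features onto voxel responses and compare held-out variance explained. Random arrays stand in for GRNN features and NSD voxel responses.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

features = np.random.randn(1000, 256)   # model activations, one row per image
voxels = np.random.randn(1000, 50)      # dorsal-region voxel responses

Xtr, Xte, ytr, yte = train_test_split(features, voxels, test_size=0.2)
enc = Ridge(alpha=10.0).fit(Xtr, ytr)
r2 = enc.score(Xte, yte)                # held-out variance explained
print(f"held-out R^2: {r2:.3f}")
```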

On the size of irredundant propagation complete CNF formulas

  • paper_url: http://arxiv.org/abs/2309.01750
  • repo_url: None
  • paper_authors: Petr Savický
  • for: Investigates propagation complete (PC) CNF formulas for a symmetric definite Horn function of $n$ variables.
  • methods: Relates the minimum size of these formulas to specific covering numbers, namely the smallest number of $k$-subsets of an $n$-set covering all $(k-1)$-subsets, for a suitable $k$ (the covering number is written out after the abstract).
  • results: Exhibits an irredundant PC formula whose size is larger than the size of a smallest PC formula for the same function by a factor $\Omega(n/\ln n)$, complementing a known polynomial upper bound on this factor.
    Abstract We investigate propagation complete (PC) CNF formulas for a symmetric definite Horn function of $n$ variables and demonstrate that the minimum size of these formulas is closely related to specific covering numbers, namely, to the smallest number of $k$-subsets of an $n$-set covering all $(k-1)$-subsets for a suitable $k$. As a consequence, we demonstrate an irredundant PC formula whose size is larger than the size of a smallest PC formula for the same function by a factor $\Omega(n/\ln n)$. This complements a known polynomial upper bound on this factor.
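For reference, the covering number that the minimum PC formula size is tied to can be written in standard covering-design notation as

$$C(n, k, k-1) = \min\Big\{\, |\mathcal{F}| \;:\; \mathcal{F} \subseteq \tbinom{[n]}{k},\ \forall S \in \tbinom{[n]}{k-1}\ \exists F \in \mathcal{F},\ S \subseteq F \,\Big\},$$

and the paper's separation result says the ratio between the size of an irredundant PC formula and the minimum PC size can reach $\Omega(n/\ln n)$.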

Hybrid data driven/thermal simulation model for comfort assessment

  • paper_url: http://arxiv.org/abs/2309.01734
  • repo_url: None
  • paper_authors: Romain Barbedienne, Sara Yasmine Ouerk, Mouadh Yagoubi, Hassan Bouia, Aurelie Kaemmerlen, Benoit Charrier
  • for: Improving the speed and quality of physical models.
  • methods: Combines real data with simulated data to predict indoor thermal comfort (a minimal sketch follows the abstract).
  • results: Promising results, with an F1 score of 0.999 obtained using a random forest model.
    Abstract Machine learning models improve the speed and quality of physical models. However, they require a large amount of data, which is often difficult and costly to acquire. Predicting thermal comfort, for example, requires a controlled environment, with participants presenting various characteristics (age, gender, ...). This paper proposes a method for hybridizing real data with simulated data for thermal comfort prediction. The simulations are performed using the Modelica language. A benchmarking study is carried out to compare different machine learning methods. The results look promising, with an F1 score of 0.999 obtained using the random forest model.
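A minimal sketch of the hybrid idea, assuming random stand-ins for the measured and Modelica-simulated samples: pool both sources and fit a random forest comfort classifier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Features might be air temperature, humidity, etc.; labels a comfort class.
X_real, y_real = np.random.randn(200, 6), np.random.randint(0, 2, 200)
X_sim, y_sim = np.random.randn(2000, 6), np.random.randint(0, 2, 2000)

X = np.vstack([X_real, X_sim])            # hybrid real + simulated training set
y = np.concatenate([y_real, y_sim])
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, stratify=y)

clf = RandomForestClassifier(n_estimators=200).fit(Xtr, ytr)
print("F1:", f1_score(yte, clf.predict(Xte)))
```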

Softmax Bias Correction for Quantized Generative Models

  • paper_url: http://arxiv.org/abs/2309.01729
  • repo_url: None
  • paper_authors: Nilesh Prasad Pandey, Marios Fournarakis, Chirag Patel, Markus Nagel
  • for: Improving the runtime and power efficiency of large generative models, such as stable diffusion or large language models, on edge devices.
  • methods: Investigates why the softmax output is sensitive to quantization noise and proposes an offline bias correction, absorbed into the quantization parameters at deployment, that makes the softmax easier to quantize (a simplified sketch follows the abstract).
  • results: Demonstrates significant accuracy improvements for 8-bit quantized softmax on stable diffusion v1.5 and a 125M-parameter OPT language model.
    Abstract Post-training quantization (PTQ) is the go-to compression technique for large generative models, such as stable diffusion or large language models. PTQ methods commonly keep the softmax activation in higher precision as it has been shown to be very sensitive to quantization noise. However, this can lead to a significant runtime and power overhead during inference on resource-constrained edge devices. In this work, we investigate the source of the softmax sensitivity to quantization and show that the quantization operation leads to a large bias in the softmax output, causing accuracy degradation. To overcome this issue, we propose an offline bias correction technique that improves the quantizability of softmax without additional compute during deployment, as it can be readily absorbed into the quantization parameters. We demonstrate the effectiveness of our method on stable diffusion v1.5 and 125M-size OPT language model, achieving significant accuracy improvement for 8-bit quantized softmax.
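A simplified sketch of offline bias estimation and correction; the toy round-to-nearest quantizer below will exhibit only a tiny bias, whereas the paper analyzes the bias of actual fixed-point softmax implementations.

```python
import numpy as np

def fake_quant(x, n_bits=8):
    # Uniform quantization of values in [0, 1] (softmax outputs).
    scale = 1.0 / (2 ** n_bits - 1)
    return np.round(x / scale) * scale

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Offline: estimate the per-class gap between quantized and full-precision
# softmax on calibration data; this bias can be folded into the quantizer.
calib_logits = np.random.randn(4096, 10)
p_fp = softmax(calib_logits)
bias = (fake_quant(p_fp) - p_fp).mean(axis=0)

def corrected_quant_softmax(z):
    return fake_quant(softmax(z)) - bias  # correction costs nothing extra online

print(np.abs(bias).max())
```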

Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation

  • paper_url: http://arxiv.org/abs/2309.01717
  • repo_url: None
  • paper_authors: Meng Xiao, Min Wu, Ziyue Qiao, Yanjie Fu, Zhiyuan Ning, Yi Du, Yuanchun Zhou
  • for: Improving the fairness of automated topic inference systems and countering the bias introduced by manual topic filling.
  • methods: Implements a topic label inference system on a Transformer encoder-decoder architecture, and uses interpolation to generate pseudo-interdisciplinary proposals from non-interdisciplinary ones during training, reducing model bias (an interpolation sketch follows the abstract).
  • results: Extensive experiments on a real-world dataset show that the proposed method significantly mitigates the unfairness generated in the automated topic inference task.
    Abstract The objective of topic inference in research proposals aims to obtain the most suitable disciplinary division from the discipline system defined by a funding agency. The agency will subsequently find appropriate peer review experts from their database based on this division. Automated topic inference can reduce human errors caused by manual topic filling, bridge the knowledge gap between funding agencies and project applicants, and improve system efficiency. Existing methods focus on modeling this as a hierarchical multi-label classification problem, using generative models to iteratively infer the most appropriate topic information. However, these methods overlook the gap in scale between interdisciplinary research proposals and non-interdisciplinary ones, leading to an unjust phenomenon where the automated inference system categorizes interdisciplinary proposals as non-interdisciplinary, causing unfairness during the expert assignment. How can we address this data imbalance issue under a complex discipline system and hence resolve this unfairness? In this paper, we implement a topic label inference system based on a Transformer encoder-decoder architecture. Furthermore, we utilize interpolation techniques to create a series of pseudo-interdisciplinary proposals from non-interdisciplinary ones during training based on non-parametric indicators such as cross-topic probabilities and topic occurrence probabilities. This approach aims to reduce the bias of the system during model training. Finally, we conduct extensive experiments on a real-world dataset to verify the effectiveness of the proposed method. The experimental results demonstrate that our training strategy can significantly mitigate the unfairness generated in the topic inference task.
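A hedged, mixup-style sketch of the pseudo-interdisciplinary interpolation; pairing by cross-topic probability is replaced with a fixed pair, and the embeddings and label codes are illustrative.

```python
import numpy as np

def make_pseudo_proposal(emb_a, labels_a, emb_b, labels_b, lam=0.5):
    """Blend two non-interdisciplinary proposals (their text embeddings)
    and merge their topic-label sets, weighted by an interpolation factor."""
    emb = lam * emb_a + (1.0 - lam) * emb_b  # interpolated features
    labels = labels_a | labels_b             # union of topic labels
    return emb, labels

emb_a, emb_b = np.random.randn(768), np.random.randn(768)
pseudo = make_pseudo_proposal(emb_a, {"F02 computer science"},
                              emb_b, {"C01 mathematics"}, lam=0.6)
print(pseudo[1])
```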

On the Robustness of Post-hoc GNN Explainers to Label Noise

  • paper_url: http://arxiv.org/abs/2309.01706
  • repo_url: None
  • paper_authors: Zhiqiang Zhong, Yangqianzi Jiang, Davide Mottin
  • for: Studies the robustness of post-hoc graph neural network (GNN) explainers in the presence of label noise.
  • methods: Conducts a systematic empirical study of diverse post-hoc GNN explainers under varying degrees of label noise (a noise-injection helper is sketched after the abstract).
  • results: Post-hoc GNN explainers are susceptible to label perturbations: even minor label noise that leaves GNN performance intact substantially harms explanation quality. The paper also discusses the progressive recovery of explanation effectiveness at escalating noise levels.
    Abstract Proposed as a solution to the inherent black-box limitations of graph neural networks (GNNs), post-hoc GNN explainers aim to provide precise and insightful explanations of the behaviours exhibited by trained GNNs. Despite their recent notable advancements in academic and industrial contexts, the robustness of post-hoc GNN explainers remains unexplored when confronted with label noise. To bridge this gap, we conduct a systematic empirical investigation to evaluate the efficacy of diverse post-hoc GNN explainers under varying degrees of label noise. Our results reveal several key insights: Firstly, post-hoc GNN explainers are susceptible to label perturbations. Secondly, even minor levels of label noise, inconsequential to GNN performance, harm the quality of generated explanations substantially. Lastly, we engage in a discourse regarding the progressive recovery of explanation effectiveness with escalating noise levels.
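A small helper of the kind such a study needs, injecting uniform label noise at a chosen rate (an assumption; the paper may use other noise models):

```python
import numpy as np

def flip_labels(y, noise_rate, n_classes, rng=None):
    # Uniform label noise: flip a fraction of labels to a random other class.
    rng = rng or np.random.default_rng(0)
    y = y.copy()
    idx = rng.choice(len(y), size=int(noise_rate * len(y)), replace=False)
    y[idx] = (y[idx] + rng.integers(1, n_classes, size=len(idx))) % n_classes
    return y

y = np.random.randint(0, 7, 1000)          # node labels
y_noisy = flip_labels(y, noise_rate=0.1, n_classes=7)
print((y != y_noisy).mean())                # ~0.1
```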

No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets

  • paper_url: http://arxiv.org/abs/2309.01694
  • repo_url: None
  • paper_authors: Lorenzo Brigato, Stavroula Mougiakakou
  • for: Tackling image classification with small training sets, an open challenge in modern computer vision.
  • methods: Scales model size and training schedule, and uses a heuristic that selects (semi-)optimal learning rate and weight decay couples via the norm of the model parameters (a sketch of the selection loop follows the abstract).
  • results: Training on only 1% of the original CIFAR-10 training set (i.e., 50 images per class) and testing on ciFAIR-10 reaches a test accuracy of 66.5%, on par with the best state-of-the-art methods.
    Abstract Solving image classification tasks given small training datasets remains an open challenge for modern computer vision. Aggressive data augmentation and generative models are among the most straightforward approaches to overcoming the lack of data. However, the first fails to be agnostic to varying image domains, while the latter requires additional compute and careful design. In this work, we study alternative regularization strategies to push the limits of supervised learning on small image classification datasets. In particular, along with the model size and training schedule scaling, we employ a heuristic to select (semi) optimal learning rate and weight decay couples via the norm of model parameters. By training on only 1% of the original CIFAR-10 training set (i.e., 50 images per class) and testing on ciFAIR-10, a variant of the original CIFAR without duplicated images, we reach a test accuracy of 66.5%, on par with the best state-of-the-art methods.
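A hedged sketch of the norm-based selection heuristic: pick the (learning rate, weight decay) couple whose briefly-trained model has a parameter norm closest to a target. The target norm and the toy training routine are assumptions; the paper's criterion differs in detail.

```python
import numpy as np

def select_lr_wd(short_train, grid, target_norm):
    """Return the (lr, wd) couple whose resulting parameter norm is
    closest to `target_norm`; `short_train` runs a few cheap epochs."""
    scores = {}
    for lr, wd in grid:
        params = short_train(lr, wd)
        norm = np.sqrt(sum((p ** 2).sum() for p in params))
        scores[(lr, wd)] = abs(norm - target_norm)
    return min(scores, key=scores.get)

# Toy stand-in: "training" returns weights shrunk by weight decay.
toy_train = lambda lr, wd: [np.random.randn(100) * np.exp(-wd * 50) * lr * 10]
grid = [(lr, wd) for lr in (0.01, 0.1, 1.0) for wd in (1e-4, 1e-2)]
print(select_lr_wd(toy_train, grid, target_norm=5.0))
```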

Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models

  • paper_url: http://arxiv.org/abs/2309.01674
  • repo_url: https://github.com/hassanhajj910/prompt-me-a-dataset
  • paper_authors: Hassan El-Hajj, Matteo Valleriani
  • for: Presents a foundation-model-based pipeline for extracting images from historical documents and evaluates the effectiveness of text-image prompting on humanities datasets.
  • methods: The pipeline combines GroundDINO and Meta's Segment-Anything-Model (SAM) to retrieve a significant portion of visual data from historical documents, and evaluates the effect of different linguistic prompts on the resulting detections (a pipeline sketch follows the abstract).
  • results: Text-image prompting effectively retrieves visual elements across humanities datasets of varying levels of complexity.
    Abstract In this paper, we present a pipeline for image extraction from historical documents using foundation models, and evaluate text-image prompts and their effectiveness on humanities datasets of varying levels of complexity. The motivation for this approach stems from the high interest of historians in visual elements printed alongside historical texts on the one hand, and from the relative lack of well-annotated datasets within the humanities when compared to other domains. We propose a sequential approach that relies on GroundDINO and Meta's Segment-Anything-Model (SAM) to retrieve a significant portion of visual data from historical documents that can then be used for downstream development tasks and dataset creation, as well as evaluate the effect of different linguistic prompts on the resulting detections.
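A sketch of the two-stage pipeline using SAM's public API, with a stubbed text-prompted detector standing in for GroundDINO; the checkpoint path and the detector stub are assumptions.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def detect_boxes(image, text_prompt):
    # Stand-in for GroundingDINO inference; returns xyxy pixel boxes.
    return [np.array([40, 60, 400, 520])]

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # assumed path
predictor = SamPredictor(sam)

page = np.zeros((800, 600, 3), dtype=np.uint8)  # a scanned page image
predictor.set_image(page)
for box in detect_boxes(page, "illustration"):
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    # masks[0] is the extracted visual element, ready for dataset creation.
```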

Fine-grained Affective Processing Capabilities Emerging from Large Language Models

  • paper_url: http://arxiv.org/abs/2309.01664
  • repo_url: None
  • paper_authors: Joost Broekens, Bernhard Hilpert, Suzan Verberne, Kim Baraka, Patrick Gebhard, Aske Plaat
  • for: Explores ChatGPT's zero-shot ability to perform affective computing tasks, such as sentiment analysis, emotion representation, and emotion elicitation, using prompting alone.
  • methods: Prompts ChatGPT for sentiment analysis, emotion representation, and appraisal-based emotion elicitation, the latter via a prompt-based computational implementation of the OCC appraisal model (an example prompt follows the abstract).
  • results: ChatGPT performs meaningful sentiment analysis in the Valence, Arousal, and Dominance dimensions, has meaningful emotion representations in terms of emotion categories and these affective dimensions, and can perform basic appraisal-based emotion elicitation of situations. These findings matter for applications such as sentiment analysis, socially interactive agents, and social robotics.
    Abstract Large language models, in particular generative pre-trained transformers (GPTs), show impressive results on a wide variety of language-related tasks. In this paper, we explore ChatGPT's zero-shot ability to perform affective computing tasks using prompting alone. We show that ChatGPT a) performs meaningful sentiment analysis in the Valence, Arousal and Dominance dimensions, b) has meaningful emotion representations in terms of emotion categories and these affective dimensions, and c) can perform basic appraisal-based emotion elicitation of situations based on a prompt-based computational implementation of the OCC appraisal model. These findings are highly relevant: First, they show that the ability to solve complex affect processing tasks emerges from language-based token prediction trained on extensive data sets. Second, they show the potential of large language models for simulating, processing and analyzing human emotions, which has important implications for various applications such as sentiment analysis, socially interactive agents, and social robotics.
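A hypothetical prompt in the style the paper describes, asking for Valence-Arousal-Dominance ratings; the 1-10 scale and wording are assumptions, not the paper's exact prompts.

```python
text = "I finally got the job after months of interviews!"
prompt = (
    "Rate the following text on three affective dimensions, each from "
    "1 (very low) to 10 (very high): valence (pleasantness), arousal "
    "(intensity), dominance (feeling of control). Reply as "
    "'valence=V, arousal=A, dominance=D'.\n\n"
    f"Text: {text}"
)
print(prompt)  # send to a chat LLM and parse the three integers
```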

Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain

  • paper_url: http://arxiv.org/abs/2309.01660
  • repo_url: None
  • paper_authors: Mohsen Jamali, Ziv M. Williams, Jing Cai
  • for: Explores whether large language models (LLMs) exhibit a Theory of Mind (ToM), the cognitive capacity that allows us to infer another's beliefs and perspective.
  • methods: Drawing inspiration from the dorsal medial prefrontal cortex (dmPFC) neurons subserving human ToM, the authors apply a similar methodology to the hidden embeddings (artificial neurons) within LLMs to test whether they exhibit comparable characteristics (a decoding sketch follows the abstract).
  • results: The hidden embeddings exhibit significant responsiveness to true- and false-belief trials, suggesting an ability to represent another's perspective; these responses correlate with the LLMs' ToM task performance and depend on model size, and the other's beliefs can be accurately decoded from the entire embeddings, indicating a population-level ToM capability and offering initial evidence of a parallel between the artificial model and neurons in the human brain.
    Abstract With their recent development, large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and that allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including that of dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLM's capacity for ToM or their similarities with that of humans remains largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surprisingly, our analysis revealed a striking resemblance between the two, as hidden embeddings (artificial neurons) within LLMs started to exhibit significant responsiveness to either true- or false-belief trials, suggesting their ability to represent another's perspective. These artificial embedding responses were closely correlated with the LLMs' performance during the ToM tasks, a property that was dependent on the size of the models. Further, the other's beliefs could be accurately decoded using the entire embeddings, indicating the presence of the embeddings' ToM capability at the population level. Together, our findings revealed an emergent property of LLMs' embeddings that modified their activities in response to ToM features, offering initial evidence of a parallel between the artificial model and neurons in the human brain.
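A sketch of the population-level decoding analysis: fit a linear probe on hidden states from true- vs. false-belief trials and report cross-validated accuracy. Random arrays stand in for the extracted LLM embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

hidden = np.random.randn(200, 4096)    # one hidden-state vector per trial
belief = np.random.randint(0, 2, 200)  # 0 = true-belief, 1 = false-belief

probe = LogisticRegression(max_iter=1000)
acc = cross_val_score(probe, hidden, belief, cv=5).mean()
print(f"cross-validated decoding accuracy: {acc:.2f}")
```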

Which algorithm to select in sports timetabling?

  • paper_url: http://arxiv.org/abs/2309.03229
  • repo_url: https://github.com/robertomrosati/sa4stt
  • paper_authors: David Van Bulck, Dries Goossens, Jan-Patrick Clarner, Angelos Dimitsas, George H. G. Fonseca, Carlos Lamas-Fernandez, Martin Mariusz Lester, Jaap Pedersen, Antony E. Phillips, Roberto Maria Rosati
  • for: Sports timetabling.
  • methods: Machine learning techniques (the selection step is sketched after the abstract).
  • results: Proposes an algorithm selection system that predicts the best-performing algorithm from the characteristics of a sports timetabling problem instance; identifies which features matter for that prediction, providing insights into the algorithms' performance and suggestions for improving them; and empirically assesses the hardness of the instances.
    Abstract Any sports competition needs a timetable, specifying when and where teams meet each other. The recent International Timetabling Competition (ITC2021) on sports timetabling showed that, although it is possible to develop general algorithms, the performance of each algorithm varies considerably over the problem instances. This paper provides an instance space analysis for sports timetabling, resulting in powerful insights into the strengths and weaknesses of eight state-of-the-art algorithms. Based on machine learning techniques, we propose an algorithm selection system that predicts which algorithm is likely to perform best when given the characteristics of a sports timetabling problem instance. Furthermore, we identify which characteristics are important in making that prediction, providing insights in the performance of the algorithms, and suggestions to further improve them. Finally, we assess the empirical hardness of the instances. Our results are based on large computational experiments involving about 50 years of CPU time on more than 500 newly generated problem instances.
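A minimal sketch of the algorithm-selection idea: learn a map from instance features to the best-performing solver. Random data stands in for the ITC2021-style benchmark results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(500, 12)              # instance features (e.g., team and
best_algo = np.random.randint(0, 8, 500)  # constraint counts); winning solver id

selector = RandomForestClassifier(n_estimators=300).fit(X, best_algo)
new_instance = np.random.randn(1, 12)
print("recommended algorithm:", selector.predict(new_instance)[0])
```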

Design of Recognition and Evaluation System for Table Tennis Players’ Motor Skills Based on Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2309.07141
  • repo_url: None
  • paper_authors: Zhuo-yong Shi, Ye-tao Jia, Ke-xin Zhang, Ding-han Wang, Long-meng Ji, Yong Wu
  • for: Improving the ability of wearable devices to recognize and analyze the movements of a specific sport.
  • methods: Designs a device that collects table tennis players' motion data; divides the processed data into a feature database of six benchmark table tennis movements using a sliding window (sketched after the abstract); constructs motion features via feature engineering; and builds a hierarchical evaluation system of motor skills with loss functions for different evaluation indexes.
  • results: For recognizing table tennis players' motor skills, the proposed feature-based BP neural network achieves higher recognition accuracy and stronger generalization than a traditional convolutional neural network.
    Abstract With the rapid development of electronic science and technology, research on wearable devices is constantly advancing, but wearable devices cannot yet comprehensively recognize and analyze the movements of specific sports. On this basis, this paper improves wearable devices for table tennis and realizes the pattern recognition and evaluation of table tennis players' motor skills through artificial intelligence. Firstly, a device is designed to collect the movement information of table tennis players, and the actual movement data is processed. Secondly, a sliding window is used to divide the collected motion data into a feature database of six table tennis benchmark movements. Thirdly, motion features are constructed based on feature engineering, and motor skills are identified for different models after dimensionality reduction. Finally, a hierarchical evaluation system of motor skills is established with the loss functions of different evaluation indexes. The results show that, in recognizing table tennis players' motor skills, the feature-based BP neural network proposed in this paper has higher recognition accuracy and stronger generalization ability than the traditional convolutional neural network.
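A sketch of the sliding-window feature step, with assumed window sizes and simple per-window statistics standing in for the paper's engineered features.

```python
import numpy as np

def sliding_windows(signal, win=128, step=64):
    """Segment a multi-channel IMU recording into fixed-size windows and
    compute per-window features (mean, std, min, max per channel)."""
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        feats.append(np.concatenate([w.mean(0), w.std(0), w.min(0), w.max(0)]))
    return np.array(feats)

imu = np.random.randn(1024, 6)     # accel + gyro channels from the wearable
print(sliding_windows(imu).shape)  # -> (15, 24)
```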

Corgi^2: A Hybrid Offline-Online Approach To Storage-Aware Data Shuffling For SGD

  • paper_url: http://arxiv.org/abs/2309.01640
  • repo_url: None
  • paper_authors: Etay Livne, Gal Kaplun, Eran Malach, Shai Shalev-Shwartz
  • for: Improving data-access efficiency when training machine learning models with Stochastic Gradient Descent (SGD).
  • methods: Builds on CorgiPile, an online shuffling algorithm for large datasets in cloud storage that improves data-access efficiency at the cost of some performance loss, and proposes a novel two-step partial data shuffling strategy combining an offline iteration of the CorgiPile method with a subsequent online iteration (a toy sketch follows the abstract).
  • results: Provides a comprehensive theoretical analysis of the method's convergence properties and shows experimentally that it performs similarly to SGD with random access, even on homogeneous data, without compromising CorgiPile's data-access efficiency.
    Abstract When using Stochastic Gradient Descent (SGD) for training machine learning models, it is often crucial to provide the model with examples sampled at random from the dataset. However, for large datasets stored in the cloud, random access to individual examples is often costly and inefficient. A recent work \cite{corgi}, proposed an online shuffling algorithm called CorgiPile, which greatly improves efficiency of data access, at the cost of some performance loss, which is particularly apparent for large datasets stored in homogeneous shards (e.g., video datasets). In this paper, we introduce a novel two-step partial data shuffling strategy for SGD which combines an offline iteration of the CorgiPile method with a subsequent online iteration. Our approach enjoys the best of both worlds: it performs similarly to SGD with random access (even for homogenous data) without compromising the data access efficiency of CorgiPile. We provide a comprehensive theoretical analysis of the convergence properties of our method and demonstrate its practical advantages through experimental results.
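A toy sketch of the two-step strategy: one offline cross-shard reshuffle followed by an online CorgiPile-style buffered shuffle. Shard rewriting is simplified to an in-memory operation.

```python
import random

def corgi2_shuffle(shards, buffer_size=4):
    # Step 1 (offline, once): redistribute examples across shards.
    flat = [x for shard in shards for x in shard]
    random.shuffle(flat)
    size = len(shards[0])
    shards = [flat[i:i + size] for i in range(0, len(flat), size)]

    # Step 2 (online, per epoch): buffer a few shards, shuffle, emit.
    random.shuffle(shards)
    buf = []
    for shard in shards:
        buf.extend(shard)
        if len(buf) >= buffer_size * size:
            random.shuffle(buf)
            yield from buf
            buf = []
    random.shuffle(buf)
    yield from buf

shards = [[i * 10 + j for j in range(10)] for i in range(8)]
print(list(corgi2_shuffle(shards))[:10])
```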

Concepts is All You Need: A More Direct Path to AGI

  • paper_url: http://arxiv.org/abs/2309.01622
  • repo_url: None
  • paper_authors: Peter Voss, Mladjan Jovanovic
  • for: Expediting the development of Artificial General Intelligence (AGI), i.e., computers with human-level intelligence.
  • methods: Advocates a Cognitive AI approach, centered on the role of concepts in human-like cognition, rather than the currently favored statistical and generative efforts, in order to identify the core requirements of human-like intelligence.
  • results: Outlines an architecture and development plan, together with some preliminary results, that offers a more direct path to full Human-Level AI (HLAI)/AGI.
    Abstract Little demonstrable progress has been made toward AGI (Artificial General Intelligence) since the term was coined some 20 years ago. In spite of the fantastic breakthroughs in Statistical AI such as AlphaZero, ChatGPT, and Stable Diffusion, none of these projects have, or claim to have, a clear path to AGI. In order to expedite the development of AGI, it is crucial to understand and identify the core requirements of human-like intelligence as it pertains to AGI. From that, one can distill which particular development steps are necessary to achieve AGI, and which are a distraction. Such analysis highlights the need for a Cognitive AI approach rather than the currently favored statistical and generative efforts. More specifically it identifies the central role of concepts in human-like cognition. Here we outline an architecture and development plan, together with some preliminary results, that offers a much more direct path to full Human-Level AI (HLAI)/AGI.

DeViL: Decoding Vision features into Language

  • paper_url: http://arxiv.org/abs/2309.01617
  • repo_url: https://github.com/ExplainableML/DeViL
  • paper_authors: Meghal Dani, Isabel Rio-Torto, Stephan Alaniz, Zeynep Akata
  • for: Providing natural language descriptions of the decision-making process of deep vision networks, in particular of the features learned at different layers of a vision backbone.
  • methods: DeViL decodes vision features into language, both highlighting attribution locations and generating textual descriptions of visual features at different layers. A transformer network translates individual image features of any vision layer into a prompt that an off-the-shelf pre-trained language model decodes into natural language; per-layer and per-spatial-location dropout lets training on image-text pairs generalize to localized explanations (a prefix-decoding sketch follows the abstract).
  • results: DeViL generates textual descriptions relevant to the image content, surpassing previous lightweight captioning models on CC3M, produces open-vocabulary attribution maps uncovering the concepts learned by the vision backbone, and outperforms the current state of the art on the neuron-wise descriptions of the MILANNOTATIONS dataset.
    Abstract Post-hoc explanation methods have often been criticised for abstracting away the decision-making process of deep neural networks. In this work, we would like to provide natural language descriptions for what different layers of a vision backbone have learned. Our DeViL method decodes vision features into language, not only highlighting the attribution locations but also generating textual descriptions of visual features at different layers of the network. We train a transformer network to translate individual image features of any vision layer into a prompt that a separate off-the-shelf language model decodes into natural language. By employing dropout both per-layer and per-spatial-location, our model can generalize training on image-text pairs to generate localized explanations. As it uses a pre-trained language model, our approach is fast to train, can be applied to any vision backbone, and produces textual descriptions at different layers of the vision network. Moreover, DeViL can create open-vocabulary attribution maps corresponding to words or phrases even outside the training scope of the vision model. We demonstrate that DeViL generates textual descriptions relevant to the image content on CC3M surpassing previous lightweight captioning models and attribution maps uncovering the learned concepts of the vision backbone. Finally, we show DeViL also outperforms the current state-of-the-art on the neuron-wise descriptions of the MILANNOTATIONS dataset. Code available at https://github.com/ExplainableML/DeViL
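A hedged sketch of the prefix-decoding idea: a small translator maps a vision feature into prefix embeddings that a frozen GPT-2 decodes into text. The feature size, prefix length, and the untrained translator are assumptions.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

d_vis, k, d_lm = 2048, 8, lm.config.n_embd
translator = nn.Linear(d_vis, k * d_lm)  # would be trained on image-text pairs

feat = torch.randn(1, d_vis)             # a feature from some vision layer
prefix = translator(feat).view(1, k, d_lm)

# Greedy decoding conditioned on the prefix embeddings.
ids = tok.encode("This region shows", return_tensors="pt")
emb = torch.cat([prefix, lm.transformer.wte(ids)], dim=1)
with torch.no_grad():
    for _ in range(10):
        logits = lm(inputs_embeds=emb).logits[:, -1]
        nxt = logits.argmax(-1, keepdim=True)
        emb = torch.cat([emb, lm.transformer.wte(nxt)], dim=1)
        ids = torch.cat([ids, nxt], dim=1)
print(tok.decode(ids[0]))
```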

Deep Learning Overloaded Vehicle Identification for Long Span Bridges Based on Structural Health Monitoring Data

  • paper_url: http://arxiv.org/abs/2309.01593
  • repo_url: None
  • paper_authors: Yuqin Li, Jun Liu, Shengliang Zhong, Licheng Zhou, Shoubin Dong, Zejia Liu, Liqun Tang
  • for: Identifying overloaded vehicles on long-span bridges from structural health monitoring data.
  • methods: Proposes a deep-learning-based overloaded vehicle identification approach (DOVI) that uses a temporal convolutional architecture to extract spatial and temporal features from the input sequences, providing an end-to-end solution that needs neither influence lines nor prior velocity and wheelbase information (a minimal TCN sketch follows the abstract).
  • results: The proposed approach is more effective and robust than other machine learning and deep learning methods, including under the occurrence of multiple vehicles.
    Abstract Overloaded vehicles bring great harm to transportation infrastructures. The BWIM (bridge weigh-in-motion) method for overloaded vehicle identification is growing in popularity because it can be implemented without interrupting traffic. However, its application is still limited because its effectiveness largely depends on professional knowledge and extra information, and it is susceptible to the occurrence of multiple vehicles. In this paper, a deep learning based overloaded vehicle identification approach (DOVI) is proposed for long-span bridges, using structural health monitoring data. The proposed DOVI model uses temporal convolutional architectures to extract the spatial and temporal features of the input sequence data, thus providing an end-to-end overloaded vehicle identification solution that needs neither influence lines nor advance velocity and wheelbase information, and that can be applied under the occurrence of multiple vehicles. Model evaluations are conducted on a simply supported beam and a long-span cable-stayed bridge under random traffic flow. Results demonstrate that the proposed deep-learning overloaded vehicle identification approach has better effectiveness and robustness compared with other machine learning and deep learning approaches.
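A minimal causal temporal-convolutional classifier in the spirit of DOVI; channel counts, depth, and the binary overload head are assumptions.

```python
import torch
import torch.nn as nn

class CausalTCN(nn.Module):
    """Stacked dilated causal 1-D convolutions over monitoring-sensor
    sequences, followed by a binary overload head."""
    def __init__(self, in_ch=8, hid=64, levels=4):
        super().__init__()
        layers, ch = [], in_ch
        for i in range(levels):
            d = 2 ** i
            layers += [nn.ConstantPad1d(((3 - 1) * d, 0), 0.0),  # left pad => causal
                       nn.Conv1d(ch, hid, kernel_size=3, dilation=d), nn.ReLU()]
            ch = hid
        self.tcn = nn.Sequential(*layers)
        self.head = nn.Linear(hid, 2)    # overloaded vs. not

    def forward(self, x):                # x: (batch, channels, time)
        h = self.tcn(x)
        return self.head(h[:, :, -1])    # classify from the last time step

print(CausalTCN()(torch.randn(4, 8, 200)).shape)  # -> torch.Size([4, 2])
```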

Les Houches Lectures on Deep Learning at Large & Infinite Width

  • paper_url: http://arxiv.org/abs/2309.01592
  • repo_url: None
  • paper_authors: Yasaman Bahri, Boris Hanin, Antonin Brossollet, Vittorio Erba, Christian Keup, Rosalba Pacelli, James B. Simon
  • for: Lectures on the infinite-width limit and large-width regime of deep neural networks.
  • methods: Covers random deep neural networks; the connections between trained deep neural networks, linear models, kernels, and Gaussian processes that arise in the infinite-width limit; and perturbative and non-perturbative treatments of large but finite-width networks (the NNGP kernel recursion is written out after the abstract).
  • results: Discusses various statistical and dynamical properties of these networks, at initialization and after training.
    Abstract These lectures, presented at the 2022 Les Houches Summer School on Statistical Physics and Machine Learning, focus on the infinite-width limit and large-width regime of deep neural networks. Topics covered include various statistical and dynamical properties of these networks. In particular, the lecturers discuss properties of random deep neural networks; connections between trained deep neural networks, linear models, kernels, and Gaussian processes that arise in the infinite-width limit; and perturbative and non-perturbative treatments of large but finite-width networks, at initialization and after training.
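As a reference point for the infinite-width material, the NNGP correspondence for a fully-connected network with pointwise nonlinearity $\phi$, weight variance $\sigma_w^2$, and bias variance $\sigma_b^2$ gives the standard layerwise kernel recursion (textbook notation, not specific to these notes):

$$K^{(\ell+1)}(x, x') = \sigma_b^2 + \sigma_w^2\, \mathbb{E}_{f \sim \mathcal{N}(0,\, K^{(\ell)})}\big[\phi(f(x))\, \phi(f(x'))\big]$$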

Rail Crack Propagation Forecasting Using Multi-horizons RNNs

  • paper_url: http://arxiv.org/abs/2309.01569
  • repo_url: None
  • paper_authors: Sara Yasmine Ouerk, Olivier Vo Van, Mouadh Yagoubi
  • for: Forecasting rail crack length propagation, which is key to maintenance and to assessing the safety of materials and structures.
  • methods: Uses machine learning, in particular recurrent neural networks (RNNs), to forecast the time series while incorporating exogenous variables (a multi-horizon forecaster is sketched after the abstract).
  • results: The multi-horizons model outperforms state-of-the-art models such as LSTM and GRU.
    Abstract The prediction of rail crack length propagation plays a crucial role in the maintenance and safety assessment of materials and structures. Traditional methods rely on physical models and empirical equations such as Paris law, which often have limitations in capturing the complex nature of crack growth. In recent years, machine learning techniques, particularly Recurrent Neural Networks (RNNs), have emerged as promising methods for time series forecasting. They make it possible to model time series data and to incorporate exogenous variables into the model. The proposed approach involves collecting real data on the French rail network that includes historical crack length measurements, along with relevant exogenous factors that may influence crack growth. First, a pre-processing phase was performed to prepare a consistent data set for learning. Then, a suitable Bayesian multi-horizons recurrent architecture was designed to model the crack propagation phenomenon. Obtained results show that the Multi-horizons model outperforms state-of-the-art models such as LSTM and GRU.
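A sketch of a multi-horizons recurrent forecaster: a GRU encodes the crack-length history plus covariates, and a linear head emits one prediction per future horizon. The dropout layer only hints at the paper's Bayesian treatment; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class MultiHorizonGRU(nn.Module):
    def __init__(self, n_features=5, hidden=64, horizons=4):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.drop = nn.Dropout(0.2)
        self.head = nn.Linear(hidden, horizons)  # one output per horizon

    def forward(self, x):                   # x: (batch, time, features)
        _, h = self.gru(x)
        return self.head(self.drop(h[-1]))  # (batch, horizons)

model = MultiHorizonGRU()
print(model(torch.randn(16, 24, 5)).shape)  # -> torch.Size([16, 4])
```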

OutRank: Speeding up AutoML-based Model Search for Large Sparse Data sets with Cardinality-aware Feature Ranking

  • paper_url: http://arxiv.org/abs/2309.01552
  • repo_url: https://github.com/outbrain/outrank
  • paper_authors: Blaž Škrlj, Blaž Mramor
  • for: Improving the design of modern recommender systems by identifying which parts of the feature space are relevant for a given recommendation task.
  • methods: The paper introduces OutRank, a system for versatile feature ranking and data-quality-related anomaly detection. Built with categorical data in mind, OutRank uses a variant of mutual information normalized with regard to the noise produced by features of the same cardinality, making useful signals easier to discover; it further incorporates feature similarity and combined relevance information.
  • results: The authors demonstrate OutRank's feasibility on a synthetic dataset and show that it outperforms random-forest-based approaches on a real-life click-through-rate prediction dataset. OutRank enables exploration of feature spaces up to 300% larger, allowing better models to be found faster.
    Abstract The design of modern recommender systems relies on understanding which parts of the feature space are relevant for solving a given recommendation task. However, real-world data sets in this domain are often characterized by their large size, sparsity, and noise, making it challenging to identify meaningful signals. Feature ranking represents an efficient branch of algorithms that can help address these challenges by identifying the most informative features and facilitating the automated search for more compact and better-performing models (AutoML). We introduce OutRank, a system for versatile feature ranking and data quality-related anomaly detection. OutRank was built with categorical data in mind, utilizing a variant of mutual information that is normalized with regard to the noise produced by features of the same cardinality. We further extend the similarity measure by incorporating information on feature similarity and combined relevance. The proposed approach's feasibility is demonstrated by speeding up the state-of-the-art AutoML system on a synthetic data set with no performance loss. Furthermore, we considered a real-life click-through-rate prediction data set where it outperformed strong baselines such as random forest-based approaches. The proposed approach enables exploration of up to 300% larger feature spaces compared to AutoML-only approaches, enabling faster search for better models on off-the-shelf hardware.
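The cardinality-aware normalization can be sketched as follows: score each categorical feature by its mutual information with the target divided by the average mutual information of random features of the same cardinality (the noise floor). The exact normalization OutRank uses may differ; this is an assumed variant for illustration.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def noise_floor(cardinality, y, trials=30, rng=None):
    """Mean MI between the target and random features of this cardinality."""
    rng = rng or np.random.default_rng(0)
    fake = [mutual_info_score(rng.integers(0, cardinality, size=len(y)), y)
            for _ in range(trials)]
    return np.mean(fake) + 1e-12

def cardinality_aware_rank(features, y):
    scores = {}
    for name, col in features.items():
        card = len(set(col))
        scores[name] = mutual_info_score(col, y) / noise_floor(card, y)
    return sorted(scores.items(), key=lambda kv: -kv[1])

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 1000)
features = {
    "informative": (y + rng.integers(0, 2, 1000)) % 3,   # correlated with y
    "random_high_card": rng.integers(0, 200, 1000),      # noisy but high-card
}
print(cardinality_aware_rank(features, y))  # informative ranks first
```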

ChatRule: Mining Logical Rules with Large Language Models for Knowledge Graph Reasoning

  • paper_url: http://arxiv.org/abs/2309.01538
  • repo_url: None
  • paper_authors: Linhao Luo, Jiaxin Ju, Bo Xiong, Yuan-Fang Li, Gholamreza Haffari, Shirui Pan
  • for: mines logical rules over knowledge graphs (KGs) to improve reasoning performance and provide interpretable results.
  • methods: uses large language models (LLMs) to generate logical rules, leveraging both the semantic and structural information of KGs, and incorporates facts from existing KGs to refine the generated rules.
  • results: evaluates the effectiveness and scalability of the proposed method on four large-scale KGs, showing impressive performance and outperforming existing methods.
    Abstract Logical rules are essential for uncovering the logical connections between relations, which could improve the reasoning performance and provide interpretable results on knowledge graphs (KGs). Although there have been many efforts to mine meaningful logical rules over KGs, existing methods suffer from the computationally intensive searches over the rule space and a lack of scalability for large-scale KGs. Besides, they often ignore the semantics of relations which is crucial for uncovering logical connections. Recently, large language models (LLMs) have shown impressive performance in the field of natural language processing and various applications, owing to their emergent ability and generalizability. In this paper, we propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs. Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs to prompt LLMs to generate logical rules. To refine the generated rules, a rule ranking module estimates the rule quality by incorporating facts from existing KGs. Last, a rule validator harnesses the reasoning ability of LLMs to validate the logical correctness of ranked rules through chain-of-thought reasoning. ChatRule is evaluated on four large-scale KGs, w.r.t. different rule quality metrics and downstream tasks, showing the effectiveness and scalability of our method.

Are We Using Autoencoders in a Wrong Way?

  • paper_url: http://arxiv.org/abs/2309.01532
  • repo_url: https://github.com/GabMartino/icrst_trst_autoencoder
  • paper_authors: Gabriele Martino, Davide Moroni, Massimo Martinelli
  • for: This paper is written for revisiting the standard training for undercomplete autoencoders, specifically by modifying the shape of the latent space without using any explicit regularization term in the loss function.
  • methods: The paper uses the standard training for undercomplete autoencoders, but with a modified shape of the latent space. The model is trained to reconstruct not the same observation in input, but another one sampled from the same class distribution.
  • results: The paper explores the behavior of the latent space in the case of reconstructing a random sample from the whole dataset.
    Abstract Autoencoders are certainly among the most studied and used Deep Learning models: the idea behind them is to train a model in order to reconstruct the same input data. The peculiarity of these models is to compress the information through a bottleneck, creating what is called Latent Space. Autoencoders are generally used for dimensionality reduction, anomaly detection and feature extraction. These models have been extensively studied and updated, given their high simplicity and power. Examples are (i) the Denoising Autoencoder, where the model is trained to reconstruct an image from a noisy one; (ii) Sparse Autoencoder, where the bottleneck is created by a regularization term in the loss function; (iii) Variational Autoencoder, where the latent space is used to generate new consistent data. In this article, we revisited the standard training for the undercomplete Autoencoder modifying the shape of the latent space without using any explicit regularization term in the loss function. We forced the model to reconstruct not the same observation in input, but another one sampled from the same class distribution. We also explored the behaviour of the latent space in the case of reconstruction of a random sample from the whole dataset.
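A minimal sketch of the modified objective described above: an undercomplete autoencoder trained, with a plain reconstruction loss and no explicit regularization term, to reconstruct a different sample drawn from the same class as the input. Architecture sizes and the stand-in data are illustrative.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))
dec = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)

def same_class_target(labels, pool):
    """pool[c] holds all training samples of class c; draw one per input."""
    rows = [pool[int(c)][torch.randint(len(pool[int(c)]), (1,))] for c in labels]
    return torch.cat(rows)

pool = {c: torch.randn(100, 784) for c in range(10)}   # stand-in dataset
x = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))
# Target is *another* observation from the same class distribution.
loss = nn.functional.mse_loss(dec(enc(x)), same_class_target(labels, pool))
opt.zero_grad(); loss.backward(); opt.step()
```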

MultiWay-Adapter: Adapting large-scale multi-modal models for scalable image-text retrieval

  • paper_url: http://arxiv.org/abs/2309.01516
  • repo_url: https://github.com/longkukuhi/multiway-adapter
  • paper_authors: Zijun Long, George Killick, Richard McCreadie, Gerardo Aragon Camarasa
  • for: This work addresses the adaptation of Large Multi-Modal Models (LMMs), in particular efficient adaptation and knowledge transfer to new tasks.
  • methods: The authors propose an innovative framework, Multiway-Adapter, which includes an 'Alignment Enhancer' to deepen the alignment between modalities. The method adds fewer than 1.25% additional parameters to LMMs, performs strongly on zero-shot image-text retrieval, and reduces fine-tuning time by up to 57%.
  • results: The approach enables resource-efficient adaptation and knowledge transfer for LMMs, improving zero-shot image-text retrieval performance while reducing fine-tuning time.
    Abstract As the size of Large Multi-Modal Models (LMMs) increases consistently, the adaptation of these pre-trained models to specialized tasks has become a computationally and memory-intensive challenge. Traditional fine-tuning methods require isolated, exhaustive retuning for each new task, limiting the models' versatility. Moreover, current efficient adaptation techniques often overlook modality alignment, focusing only on the knowledge extraction of new tasks. To tackle these issues, we introduce Multiway-Adapter, an innovative framework incorporating an 'Alignment Enhancer' to deepen modality alignment, enabling high transferability without tuning pre-trained parameters. Our method adds fewer than 1.25\% of additional parameters to LMMs, exemplified by the BEiT-3 model in our study. This leads to superior zero-shot image-text retrieval performance compared to fully fine-tuned models, while achieving up to a 57\% reduction in fine-tuning time. Our approach offers a resource-efficient and effective adaptation pathway for LMMs, broadening their applicability. The source code is publicly available at: \url{https://github.com/longkukuhi/MultiWay-Adapter}.
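For intuition, here is a hedged sketch of a bottleneck adapter inserted next to a frozen backbone layer, the general mechanism Multiway-Adapter builds on; the 'Alignment Enhancer' is not reproduced here, and all module names and sizes are our simplifications rather than the paper's design.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual update

dim = 768
frozen_block = nn.Linear(dim, dim)          # stand-in for a pre-trained layer
for p in frozen_block.parameters():
    p.requires_grad = False                 # backbone stays untouched

adapter = Adapter(dim)                      # only these weights train
x = torch.randn(4, 16, dim)                 # (batch, tokens, dim)
out = adapter(frozen_block(x))

trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in frozen_block.parameters())
print(f"trainable share: {trainable / total:.2%}")   # small, adapter-only
```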

RGI-Net: 3D Room Geometry Inference from Room Impulse Responses in the Absence of First-order Echoes

  • paper_url: http://arxiv.org/abs/2309.01513
  • repo_url: None
  • paper_authors: Inmo Yeon, Jung-Woo Choi
  • for: This paper proposes a new room geometry inference method that requires neither the assumption of a convex room shape nor prior knowledge of the number of walls.
  • methods: The method uses a deep neural network, RGI-Net, to infer room geometry from room impulse responses (RIRs). RGI-Net learns and exploits the complex relationships among high-order reflections, so it can estimate room shapes even when the room is non-convex or first-order reflections are missing from the RIRs.
  • results: The method is practical for real-world scenarios, requiring only a compact device with a circular microphone array and a single loudspeaker. RGI-Net also includes an evaluation network that separately estimates the presence probability of each wall, so no prior knowledge of the number of walls is needed.
    Abstract Room geometry is important prior information for implementing realistic 3D audio rendering. For this reason, various room geometry inference (RGI) methods have been developed by utilizing the time of arrival (TOA) or time difference of arrival (TDOA) information in room impulse responses. However, conventional RGI techniques rely on several assumptions, such as convex room shapes, a priori knowledge of the number of walls, and the visibility of first-order reflections. In this work, we introduce the deep neural network (DNN), RGI-Net, which can estimate room geometries without the aforementioned assumptions. RGI-Net learns and exploits complex relationships between high-order reflections in room impulse responses (RIRs) and, thus, can estimate room shapes even when the shape is non-convex or first-order reflections are missing in the RIRs. The network takes RIRs measured from a compact audio device equipped with a circular microphone array and a single loudspeaker, which greatly improves its practical applicability. RGI-Net includes an evaluation network that separately evaluates the presence probability of walls, so geometry inference is possible without prior knowledge of the number of walls.

Neural Vector Fields: Generalizing Distance Vector Fields by Codebooks and Zero-Curl Regularization

  • paper_url: http://arxiv.org/abs/2309.01512
  • repo_url: None
  • paper_authors: Xianghui Yang, Guosheng Lin, Zhenghao Chen, Luping Zhou
  • for: The paper proposes a new 3D representation, Neural Vector Fields (NVF), that jointly enjoys the advantages of explicit and implicit representations.
  • methods: The method directly predicts displacements from surface queries rather than obtaining direction fields through network differentiation, which sidesteps the surface extraction step. It further introduces two shape codebooks, NVF (Lite or Ultra), to promote cross-category reconstruction.
  • results: NVF (Ultra) performs strongly across four surface reconstruction scenarios: watertight vs. non-watertight shapes, category-agnostic vs. category-unseen reconstruction, category-specific reconstruction, and cross-domain reconstruction.
    Abstract Recent neural-network-based surface reconstruction methods can be roughly divided into two categories: one warps templates explicitly, and the other represents 3D surfaces implicitly. To enjoy the advantages of both, we propose a novel 3D representation, Neural Vector Fields (NVF), which adopts the explicit learning process to manipulate meshes and the implicit unsigned distance function (UDF) representation to break the barriers in resolution and topology. This is achieved by directly predicting the displacements from surface queries and modeling shapes as Vector Fields, rather than relying on network differentiation to obtain direction fields as most existing UDF-based methods do. In this way, our approach is capable of encoding both the distance and the direction fields so that the calculation of direction fields is differentiation-free, circumventing the non-trivial surface extraction step. Furthermore, building upon NVFs, we propose to incorporate two types of shape codebooks, i.e., NVFs (Lite or Ultra), to promote cross-category reconstruction through encoding cross-object priors. Moreover, we propose a new regularization based on analyzing the zero-curl property of NVFs, and implement this through the fully differentiable framework of our NVF (Ultra). We evaluate both NVFs on four surface reconstruction scenarios, including watertight vs non-watertight shapes, category-agnostic reconstruction vs category-unseen reconstruction, category-specific, and cross-domain reconstruction.
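The zero-curl regularization can be sketched directly with autograd: compute the Jacobian of the predicted 3D vector field at sampled points and penalize the squared curl. The tiny network and weighting below are placeholders, not the NVF (Ultra) implementation.

```python
import torch
import torch.nn as nn

field = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 3))

def curl_penalty(points):
    points = points.requires_grad_(True)
    v = field(points)                                   # (N, 3) displacements
    grads = []
    for i in range(3):                                  # rows of the Jacobian
        g, = torch.autograd.grad(v[:, i].sum(), points, create_graph=True)
        grads.append(g)                                 # g[:, j] = d v_i / d x_j
    J = torch.stack(grads, dim=1)                       # (N, 3, 3)
    curl = torch.stack([J[:, 2, 1] - J[:, 1, 2],
                        J[:, 0, 2] - J[:, 2, 0],
                        J[:, 1, 0] - J[:, 0, 1]], dim=1)
    return (curl ** 2).sum(dim=1).mean()

loss = curl_penalty(torch.rand(256, 3))   # add to the reconstruction loss
loss.backward()
```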

Memory Efficient Optimizers with 4-bit States

  • paper_url: http://arxiv.org/abs/2309.01507
  • repo_url: https://github.com/thu-ml/low-bit-optimizers
  • paper_authors: Bingrui Li, Jianfei Chen, Jun Zhu
  • for: Reducing the memory footprint of optimizer states in neural network training, so that the largest possible model can be trained within a given memory budget.
  • methods: Through a detailed empirical analysis of the first and second moments, the effective bitwidth of optimizer states is pushed down to 4 bits. In particular, the authors find outlier patterns that existing block-wise quantization cannot accurately approximate; they use a smaller block size and exploit both row-wise and column-wise information for better quantization. They also address the zero-point problem of quantizing the second moment with a linear quantizer that excludes the zero point.
  • results: The 4-bit optimizers are evaluated on a wide range of benchmarks, including natural language understanding, machine translation, image classification, and instruction tuning. On all tasks they match their full-precision counterparts while being more memory efficient.
    Abstract Optimizer states are a major source of memory consumption for training neural networks, limiting the maximum trainable model within given memory budget. Compressing the optimizer states from 32-bit floating points to lower bitwidth is promising to reduce the training memory footprint, while the current lowest achievable bitwidth is 8-bit. In this work, we push optimizer states bitwidth down to 4-bit through a detailed empirical analysis of first and second moments. Specifically, we find that moments have complicated outlier patterns, that current block-wise quantization cannot accurately approximate. We use a smaller block size and propose to utilize both row-wise and column-wise information for better quantization. We further identify a zero point problem of quantizing the second moment, and solve this problem with a linear quantizer that excludes the zero point. Our 4-bit optimizer is evaluated on a wide variety of benchmarks including natural language understanding, machine translation, image classification, and instruction tuning. On all the tasks our optimizers can achieve comparable accuracy with their full-precision counterparts, while enjoying better memory efficiency.
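A toy sketch of the key quantizer idea: block-wise 4-bit linear quantization whose codes exclude zero, so a nonnegative second moment never collapses to exactly zero after dequantization. Block size and rounding are illustrative; the paper's row-and-column-wise scheme is more elaborate.

```python
import torch

def quantize_4bit(x, block=128):
    x = x.reshape(-1, block)
    scale = x.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)
    # Codes 1..15 map to scale/15 .. scale; no code maps to zero.
    q = torch.clamp(torch.round(x / scale * 15), 1, 15).to(torch.uint8)
    return q, scale

def dequantize_4bit(q, scale):
    return q.float() / 15 * scale

v = torch.rand(4, 128) ** 2                   # stand-in second moment (>= 0)
q, s = quantize_4bit(v)
v_hat = dequantize_4bit(q, s)
print((v_hat - v).abs().max().item())         # small, and v_hat > 0 everywhere
```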

BadSQA: Stealthy Backdoor Attacks Using Presence Events as Triggers in Non-Intrusive Speech Quality Assessment

  • paper_url: http://arxiv.org/abs/2309.01480
  • repo_url: None
  • paper_authors: Ying Ren, Kailai Shen, Zhe Ye, Diqun Yan
  • for: This work aims to attack non-intrusive speech quality assessment (NISQA) systems with highly stealthy backdoor attacks.
  • methods: A novel backdoor attack based on presence events as triggers is proposed, achieving high stealthiness on NISQA tasks.
  • results: Experiments on four benchmark datasets with two state-of-the-art NISQA models show that the proposed backdoor attack achieves an average attack success rate of up to 99% with a poisoning rate of only 3%.
    Abstract Non-Intrusive speech quality assessment (NISQA) has gained significant attention for predicting the mean opinion score (MOS) of speech without requiring the reference speech. In practical NISQA scenarios, untrusted third-party resources are often employed during deep neural network training to reduce costs. However, it would introduce a potential security vulnerability as specially designed untrusted resources can launch backdoor attacks against NISQA systems. Existing backdoor attacks primarily focus on classification tasks and are not directly applicable to NISQA which is a regression task. In this paper, we propose a novel backdoor attack on NISQA tasks, leveraging presence events as triggers to achieving highly stealthy attacks. To evaluate the effectiveness of our proposed approach, we conducted experiments on four benchmark datasets and employed two state-of-the-art NISQA models. The results demonstrate that the proposed backdoor attack achieved an average attack success rate of up to 99% with a poisoning rate of only 3%.

Pure Monte Carlo Counterfactual Regret Minimization

  • paper_url: http://arxiv.org/abs/2309.03084
  • repo_url: None
  • paper_authors: Ju Qi, Ting Feng, Falun Hei, Zhemei Fang, Yunfeng Luo
  • for: solves large-scale incomplete information games
  • methods: builds upon CFR and Fictitious Play, combines counterfactual regret and best response strategy
  • results: achieves better performance, reduces time and space complexity, and converges faster than MCCFR with a new warm-start algorithm
    Abstract Counterfactual Regret Minimization (CFR) and its variants are the best algorithms so far for solving large-scale incomplete information games. Building upon CFR, this paper proposes a new algorithm named Pure CFR (PCFR) for achieving better performance. PCFR can be seen as a combination of CFR and Fictitious Play (FP), inheriting the concept of counterfactual regret (value) from CFR and using the best response strategy instead of the regret matching strategy for the next iteration. Our theoretical proof that PCFR achieves Blackwell approachability enables it to be combined with any CFR variant, including Monte Carlo CFR (MCCFR). The resulting Pure MCCFR (PMCCFR) can significantly reduce time and space complexity. In particular, the convergence speed of PMCCFR is at least three times that of MCCFR. In addition, since PMCCFR does not pass through the path of strictly dominated strategies, we developed a new warm-start algorithm inspired by the strictly-dominated-strategies elimination method. Consequently, PMCCFR with the new warm-start algorithm converges two orders of magnitude faster than the CFR+ algorithm.
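To illustrate the update rule, the toy sketch below contrasts regret matching (CFR-style) with a pure one-hot best-response update (PCFR-style) on a zero-sum matrix game; in this normal-form setting the argmax of cumulative regret coincides with a fictitious-play best response. This is an illustration only, not the authors' extensive-form implementation.

```python
import numpy as np

A = np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])  # rock-paper-scissors

def solve(T=20000, pure=True):
    n = A.shape[0]
    R1, R2 = np.zeros(n), np.zeros(n)          # cumulative regrets
    avg1, avg2 = np.zeros(n), np.zeros(n)      # strategy sums
    s1 = s2 = np.ones(n) / n
    for _ in range(T):
        u1, u2 = A @ s2, -A.T @ s1             # expected action payoffs
        R1 += u1 - s1 @ u1
        R2 += u2 - s2 @ u2
        avg1 += s1; avg2 += s2
        if pure:                               # PCFR-style: one-hot best response
            s1 = np.eye(n)[np.argmax(R1)]
            s2 = np.eye(n)[np.argmax(R2)]
        else:                                  # CFR-style regret matching
            p1, p2 = np.maximum(R1, 0), np.maximum(R2, 0)
            s1 = p1 / p1.sum() if p1.sum() > 0 else np.ones(n) / n
            s2 = p2 / p2.sum() if p2.sum() > 0 else np.ones(n) / n
    return avg1 / T, avg2 / T

print(solve(pure=True)[0])   # average strategy approaches (1/3, 1/3, 1/3)
```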

Interactive Graph Convolutional Filtering

  • paper_url: http://arxiv.org/abs/2309.01453
  • repo_url: None
  • paper_authors: Jin Zhang, Defu Lian, Hong Xie, Yawen Li, Enhong Chen
  • for: This paper proposes a new method to address the cold-start and data-sparsity problems in interactive recommender systems.
  • methods: It uses a graph-based interactive graph convolutional filtering model, with variational inference techniques to overcome the computational difficulties of non-linear models. It further uses Bayesian meta-learning to effectively address the cold-start problem and derives theoretical regret bounds to guarantee the robustness of the method.
  • results: Experiments show that the method performs well on three real-world datasets, outperforming existing baselines.
    Abstract Interactive Recommender Systems (IRS) have been increasingly used in various domains, including personalized article recommendation, social media, and online advertising. However, IRS faces significant challenges in providing accurate recommendations under limited observations, especially in the context of interactive collaborative filtering. These problems are exacerbated by the cold start problem and data sparsity problem. Existing Multi-Armed Bandit methods, despite their carefully designed exploration strategies, often struggle to provide satisfactory results in the early stages due to the lack of interaction data. Furthermore, these methods are computationally intractable when applied to non-linear models, limiting their applicability. To address these challenges, we propose a novel method, the Interactive Graph Convolutional Filtering model. Our proposed method extends interactive collaborative filtering into the graph model to enhance the performance of collaborative filtering between users and items. We incorporate variational inference techniques to overcome the computational hurdles posed by non-linear models. Furthermore, we employ Bayesian meta-learning methods to effectively address the cold-start problem and derive theoretical regret bounds for our proposed method, ensuring a robust performance guarantee. Extensive experimental results on three real-world datasets validate our method and demonstrate its superiority over existing baselines.

Effective Multi-Graph Neural Networks for Illicit Account Detection on Cryptocurrency Transaction Networks

  • paper_url: http://arxiv.org/abs/2309.02460
  • repo_url: None
  • paper_authors: Zhihao Ding, Jieming Shi, Qing Li, Jiannong Cao
  • for: This work aims to detect illicit accounts on cryptocurrency transaction networks, protecting normal users from financial losses.
  • methods: The DIAM model comprises an Edge2Seq module and a Multigraph Discrepancy (MGD) module, automatically learning effective node representations and capturing the features of illicit nodes.
  • results: Compared against 14 existing solutions on 4 large cryptocurrency datasets (Bitcoin and Ethereum), DIAM consistently achieves the best performance in accurately and efficiently detecting illicit accounts. For example, on one Bitcoin dataset, DIAM achieves an F1 score of 96.55%, significantly higher than the best competitor's 83.92%.
    Abstract We study illicit account detection on transaction networks of cryptocurrencies, which are increasingly important in online financial markets. The surge of illicit activities on cryptocurrencies has resulted in billions in losses for normal users. Existing solutions either rely on tedious feature engineering to get handcrafted features, or are inadequate to fully utilize the rich semantics of cryptocurrency transaction data, and consequently, yield sub-optimal performance. In this paper, we formulate the illicit account detection problem as a classification task over directed multigraphs with edge attributes, and present DIAM, a novel multi-graph neural network model to effectively detect illicit accounts on large transaction networks. First, DIAM includes an Edge2Seq module that automatically learns effective node representations preserving intrinsic transaction patterns of parallel edges, by considering both edge attributes and directed edge sequence dependencies. Then utilizing the multigraph topology, DIAM employs a new Multigraph Discrepancy (MGD) module with a well-designed message passing mechanism to capture the discrepant features between normal and illicit nodes, supported by an attention mechanism. Assembling all techniques, DIAM is trained in an end-to-end manner. Extensive experiments, comparing against 14 existing solutions on 4 large cryptocurrency datasets of Bitcoin and Ethereum, demonstrate that DIAM consistently achieves the best performance to accurately detect illicit accounts, while being efficient. For instance, on a Bitcoin dataset with 20 million nodes and 203 million edges, DIAM achieves F1 score 96.55%, significantly higher than the F1 score 83.92% of the best competitor.

Social Factors in P2P Energy Trading Using Hedonic Games

  • paper_url: http://arxiv.org/abs/2309.01418
  • repo_url: None
  • paper_authors: Dan Mitrea, Viorica Chifu, Tudor Cioara, Ionut Anghel, Cristina Pop
  • for: This paper proposes a hedonic-game-based P2P energy trading model that helps prospective buyers and sellers within an energy community trade energy while accounting for social factors.
  • methods: The model uses hedonic game theory for coordination and cooperation, taking energy prices and social preferences within social relationships into account, and implements P2P energy trading on blockchain technology.
  • results: In evaluation, the model increased the total energy traded in a market session within the energy community by 5%, and under social dynamics it helped increase the amount of energy traded by more than 10%, while contributing to a better balance of energy demand and supply within the community.
    Abstract Lately, the energy communities have gained a lot of attention as they have the potential to significantly contribute to the resilience and flexibility of the energy system, facilitating widespread integration of intermittent renewable energy sources. Within these communities the prosumers can engage in peer-to-peer trading, fostering local collaborations and increasing awareness about energy usage and flexible consumption. However, even under these favorable conditions, prosumer engagement levels remain low, requiring trading mechanisms that are aligned with their social values and expectations. In this paper, we introduce an innovative hedonic game coordination and cooperation model for P2P energy trading among prosumers which considers the social relationships within an energy community to create energy coalitions and facilitate energy transactions among them. We defined a heuristic that optimizes the prosumers coalitions, considering their social and energy price preferences and balancing the energy demand and supply within the community. We integrated the proposed hedonic game model into a state-of-the-art blockchain-based P2P energy flexibility market and evaluated its performance within an energy community of prosumers. The evaluation results on a blockchain-based P2P energy flexibility market show the effectiveness in considering social factors when creating coalitions, increasing the total amount of energy transacted in a market session by 5% compared with other game theory-based solutions. Finally, it shows the importance of the social dimensions of P2P energy transactions, the positive social dynamics in the energy community increasing the amount of energy transacted by more than 10% while contributing to a more balanced energy demand and supply within the community.

Towards frugal unsupervised detection of subtle abnormalities in medical imaging

  • paper_url: http://arxiv.org/abs/2309.02458
  • repo_url: https://github.com/geoffroyo/onlineem
  • paper_authors: Geoffroy Oudoumanessah, Carole Lartizien, Michel Dojat, Florence Forbes
  • for: This paper proposes a mixture-model-based unsupervised anomaly detection method for analyzing subtle abnormalities in medical imaging.
  • methods: The method uses mixtures of probability distributions for anomaly detection; these can handle complex multivariate reference models while having far fewer parameters and better interpretability than neural networks. However, standard estimation procedures, such as the Expectation-Maximization algorithm, do not scale to large data volumes because of high memory usage, so inferential quantities are computed incrementally.
  • results: The approach achieves good detection performance and adapts to different medical imaging data. Experimentally, it detects brain abnormalities in Parkinsonian patients that are consistent with the Hoehn and Yahr disease progression scale.
    Abstract Anomaly detection in medical imaging is a challenging task in contexts where abnormalities are not annotated. This problem can be addressed through unsupervised anomaly detection (UAD) methods, which identify features that do not match with a reference model of normal profiles. Artificial neural networks have been extensively used for UAD but they do not generally achieve an optimal trade-off between accuracy and computational demand. As an alternative, we investigate mixtures of probability distributions whose versatility has been widely recognized for a variety of data and tasks, while not requiring excessive design effort or tuning. Their expressivity makes them good candidates to account for complex multivariate reference models. Their much smaller number of parameters makes them more amenable to interpretation and efficient learning. However, standard estimation procedures, such as the Expectation-Maximization algorithm, do not scale well to large data volumes as they require high memory usage. To address this issue, we propose to incrementally compute inferential quantities. This online approach is illustrated on the challenging detection of subtle abnormalities in MR brain scans for the follow-up of newly diagnosed Parkinsonian patients. The identified structural abnormalities are consistent with the disease progression, as accounted by the Hoehn and Yahr scale.
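A minimal sketch of the incremental computation described above, for a diagonal-covariance Gaussian mixture: sufficient statistics are updated with a decaying step size as mini-batches stream in, so the full data set never needs to reside in memory. The step-size schedule and the diagonal-covariance choice are our simplifications.

```python
import numpy as np

def online_em(batches, K=3, d=2, seed=0):
    rng = np.random.default_rng(seed)
    w = np.ones(K) / K
    mu = rng.standard_normal((K, d))
    var = np.ones((K, d))
    s0, s1, s2 = w.copy(), mu * w[:, None], (var + mu**2) * w[:, None]
    for t, X in enumerate(batches, start=1):
        # E-step on the current batch only.
        logp = -0.5 * (((X[:, None, :] - mu) ** 2) / var
                       + np.log(2 * np.pi * var)).sum(-1) + np.log(w)
        r = np.exp(logp - logp.max(1, keepdims=True))
        r /= r.sum(1, keepdims=True)                  # responsibilities
        # Stochastic update of the running sufficient statistics.
        eta = (t + 2.0) ** -0.6
        s0 = (1 - eta) * s0 + eta * r.mean(0)
        s1 = (1 - eta) * s1 + eta * (r.T @ X) / len(X)
        s2 = (1 - eta) * s2 + eta * (r.T @ X**2) / len(X)
        # M-step from the running statistics.
        w = s0 / s0.sum()
        mu = s1 / s0[:, None]
        var = np.maximum(s2 / s0[:, None] - mu**2, 1e-6)
    return w, mu, var

stream = (np.random.default_rng(i).standard_normal((256, 2)) + 3 * (i % 3)
          for i in range(50))
w, mu, var = online_em(stream)
```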

Metric Learning for Projections Bias of Generalized Zero-shot Learning

  • paper_url: http://arxiv.org/abs/2309.01390
  • repo_url: None
  • paper_authors: Chong Zhang, Mingyu Jin, Qinkai Yu, Haochen Xue, Xiaobo Jin
  • for: This work aims to improve the reliability and effectiveness of generalized zero-shot learning (GZSL) models so that they can correctly recognize unseen classes.
  • methods: The study builds on the VAEGAN (Variational Autoencoder & Generative Adversarial Networks) framework and proposes a new parameterized Mahalanobis distance representation to reduce bias at inference time. The VAEGAN network structure is also improved to use two branches that separately predict seen samples and the unseen samples generated from them.
  • results: Extensive evaluation on four datasets demonstrates the superiority of the method over state-of-the-art approaches. Code is available at https://anonymous.4open.science/r/111hxr.
    Abstract Generalized zero-shot learning models (GZSL) aim to recognize samples from seen or unseen classes using only samples from seen classes as training data. During inference, GZSL methods are often biased towards seen classes due to the visibility of seen class samples during training. Most current GZSL methods try to learn an accurate projection function (from visual space to semantic space) to avoid bias and ensure the effectiveness of GZSL methods. However, the distance computation at inference matters when we classify the projection of any sample to its nearest class, since the learned projection function may itself be biased. In our work, we attempt to learn a parameterized Mahalanobis distance within the framework of VAEGAN (Variational Autoencoder \& Generative Adversarial Networks), where the weight matrix depends on the network's output. In particular, we improved the network structure of VAEGAN to leverage the discriminative models of two branches to separately predict the seen samples and the unseen samples generated from them. We proposed a new loss function with two branches to help us learn the optimized Mahalanobis distance representation. Comprehensive evaluation benchmarks on four datasets demonstrate the superiority of our method over the state-of-the-art counterparts. Our codes are available at https://anonymous.4open.science/r/111hxr.
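The parameterized Mahalanobis distance can be sketched as $d(x,c)^2 = (x-c)^\top W^\top W (x-c)$ with a learnable $W$, which keeps the metric positive semi-definite; in the paper $W$ depends on the network's output, whereas the placeholder below learns it directly.

```python
import torch
import torch.nn as nn

class MahalanobisHead(nn.Module):
    def __init__(self, feat_dim, proj_dim=32):
        super().__init__()
        self.W = nn.Parameter(torch.randn(proj_dim, feat_dim) * 0.01)

    def forward(self, x, centers):
        # (x - c) W^T has squared norm (x-c)^T W^T W (x-c) = Mahalanobis^2.
        diff = x[:, None, :] - centers[None, :, :]     # (B, C, feat)
        return (diff @ self.W.T).pow(2).sum(-1)        # (B, C) distances

head = MahalanobisHead(feat_dim=128)
x = torch.randn(16, 128)                 # sample embeddings
centers = torch.randn(50, 128)           # class prototypes (seen + unseen)
pred = head(x, centers).argmin(dim=1)    # classify to the nearest class
```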

LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data

  • paper_url: http://arxiv.org/abs/2309.01383
  • repo_url: None
  • paper_authors: Shun-Wen Hsiao, Cheng-Yuan Sun
  • for: This work aims to develop a model that effectively detects deception in human videos while remaining interpretable.
  • methods: An attention-aware neural network is proposed that pinpoints deceptive cues by jointly assessing visual, audio, and text features; a multimodal fusion strategy further improves accuracy.
  • results: The approach achieves 92% accuracy on a real-life trial dataset. The model also reveals the attention focus within videos, providing valuable insight into deception cues.
    Abstract Recently, deception detection on human videos has become an eye-catching technique that can serve many applications. AI models in this domain demonstrate high accuracy, but they tend to be non-interpretable black boxes. We introduce an attention-aware neural network addressing challenges inherent in video data and deception dynamics. This model, through its continuous assessment of visual, audio, and text features, pinpoints deceptive cues. We employ a multimodal fusion strategy that enhances accuracy; our approach yields a 92\% accuracy rate on a real-life trial dataset. Most importantly, the model indicates the attention focus in the videos, providing valuable insights into deception cues. Hence, our method adeptly detects deceit and elucidates the underlying process. We further enriched our study with an experiment involving students answering questions either truthfully or deceitfully, resulting in a new dataset of 309 video clips, named ATSFace. Using this, we also introduced a calibration method, inspired by Low-Rank Adaptation (LoRA), to refine individual-based deception detection accuracy.
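For reference, here is a minimal LoRA-style low-rank update, the mechanism the calibration method is inspired by: the frozen weight is adjusted by a trainable rank-r product B A, so per-individual calibration touches very few parameters. Shapes and the wrapper name are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=4, alpha=8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # keep the pre-trained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Effective weight is W + scale * B A, but only A and B train.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(256, 256))
out = layer(torch.randn(4, 256))   # same output as the base at init (B = 0)
```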

Memory augment is All You Need for image restoration

  • paper_url: http://arxiv.org/abs/2309.01377
  • repo_url: https://github.com/zhangbaijin/memorynet
  • paper_authors: Xiao Feng Zhang, Chao Chen Gu, Shan Ying Zhu
  • for: This work proposes an image restoration method based on a three-granularity memory layer to improve restoration performance.
  • methods: The method, named MemoryNet, combines a three-granularity memory layer with a contrastive learning strategy that divides samples into positive, negative, and actual, shaping the learned features through contrast; the memory layer preserves deep image features.
  • results: Experiments show that the method improves restoration performance on deraining, shadow removal, and deblurring tasks, with significant PSNR and SSIM gains on three datasets with different degradation types, strong evidence that the recovered images are perceptually realistic.
    Abstract Image restoration is a low-level vision task, and most CNN methods are designed as black boxes, lacking transparency and internal aesthetics. Although some methods combining traditional optimization algorithms with DNNs have been proposed, they all have some limitations. In this paper, we propose a three-granularity memory layer and contrastive learning scheme named MemoryNet; specifically, the samples are divided into positive, negative, and actual for contrastive learning, where the memory layer is able to preserve the deep features of the image and the contrastive learning converges the learned features to a balance. Experiments on deraining/deshadowing/deblurring tasks demonstrate that these methods are effective in improving restoration performance. In addition, this paper's model obtains significant PSNR and SSIM gains on three datasets with different degradation types, which is strong proof that the recovered images are perceptually realistic. The source code of MemoryNet can be obtained from https://github.com/zhangbaijin/MemoryNet

ReOnto: A Neuro-Symbolic Approach for Biomedical Relation Extraction

  • paper_url: http://arxiv.org/abs/2309.01370
  • repo_url: https://github.com/kracr/reonto-relation-extraction
  • paper_authors: Monika Jain, Kuldeep Singh, Raghava Mutharaju
  • for: This work aims to improve relation extraction from biomedical text by using neuro-symbolic knowledge to handle the particular nature of biomedical relations.
  • methods: The new technique, called ReOnto, uses a graph neural network to obtain sentence representations and leverages publicly accessible ontologies as prior knowledge to identify the sentential relation between two entities.
  • results: Experimental results show that combining symbolic knowledge from ontologies with graph neural networks outperforms all baselines (by approximately 3%).
    Abstract Relation Extraction (RE) is the task of extracting semantic relationships between entities in a sentence and aligning them to relations defined in a vocabulary, which is generally in the form of a Knowledge Graph (KG) or an ontology. Various approaches have been proposed so far to address this task. However, applying these techniques to biomedical text often yields unsatisfactory results because it is hard to infer relations directly from sentences due to the nature of the biomedical relations. To address these issues, we present a novel technique called ReOnto, that makes use of neuro symbolic knowledge for the RE task. ReOnto employs a graph neural network to acquire the sentence representation and leverages publicly accessible ontologies as prior knowledge to identify the sentential relation between two entities. The approach involves extracting the relation path between the two entities from the ontology. We evaluate the effect of using symbolic knowledge from ontologies with graph neural networks. Experimental results on two public biomedical datasets, BioRel and ADE, show that our method outperforms all the baselines (approximately by 3\%).

Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation

  • paper_url: http://arxiv.org/abs/2309.01365
  • repo_url: https://github.com/hbing-l/rtpca
  • paper_authors: Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng, Xuansong Xie
  • for: Improving the accuracy and structure of 3D human pose estimators with the transformer-based Refined Temporal Pyramidal Compression-and-Amplification (RTPCA) model.
  • methods: Exploiting the temporal dimension, RTPCA enhances intra-block temporal modeling with a Temporal Pyramidal Compression-and-Amplification (TPCA) block and refines inter-block feature interaction with a Cross-Layer Refinement (XLR) module. The TPCA block uses a temporal pyramid paradigm to strengthen key and value representations and extract spatial semantics from motion sequences; XLR promotes rich semantic representations through continuous interaction of queries, keys, and values.
  • results: State-of-the-art results on the Human3.6M, HumanEva-I, and MPI-INF-3DHP benchmarks, with less computational overhead than other transformer-based methods.
    Abstract Accurately estimating the 3D pose of humans in video sequences requires both accuracy and a well-structured architecture. With the success of transformers, we introduce the Refined Temporal Pyramidal Compression-and-Amplification (RTPCA) transformer. Exploiting the temporal dimension, RTPCA extends intra-block temporal modeling via its Temporal Pyramidal Compression-and-Amplification (TPCA) structure and refines inter-block feature interaction with a Cross-Layer Refinement (XLR) module. In particular, TPCA block exploits a temporal pyramid paradigm, reinforcing key and value representation capabilities and seamlessly extracting spatial semantics from motion sequences. We stitch these TPCA blocks with XLR that promotes rich semantic representation through continuous interaction of queries, keys, and values. This strategy embodies early-stage information with current flows, addressing typical deficits in detail and stability seen in other transformer-based methods. We demonstrate the effectiveness of RTPCA by achieving state-of-the-art results on Human3.6M, HumanEva-I, and MPI-INF-3DHP benchmarks with minimal computational overhead. The source code is available at https://github.com/hbing-l/RTPCA.

Self-driven Grounding: Large Language Model Agents with Automatical Language-aligned Skill Learning

  • paper_url: http://arxiv.org/abs/2309.01352
  • repo_url: None
  • paper_authors: Shaohui Peng, Xing Hu, Qi Yi, Rui Zhang, Jiaming Guo, Di Huang, Zikang Tian, Ruizhi Chen, Zidong Du, Qi Guo, Yunji Chen, Ling Li
  • for: Improving the applicability of large language models in real-world environments.
  • methods: The framework automatically proposes sub-goals, verifies them through interaction with the environment, and learns skills through self-driven, language-aligned practice.
  • results: On the well-known instruction-following task suite, the method performs comparably to imitation learning approaches while requiring far fewer demonstrations, proving the learned skills are effective and demonstrating the feasibility and efficiency of the framework.
    Abstract Large language models (LLMs) show powerful automatic reasoning and planning capability, thanks to their wealth of semantic knowledge about the human world. However, the grounding problem still hinders the application of LLMs in real-world environments. Existing studies try to fine-tune the LLM or utilize pre-defined behavior APIs to bridge the LLM and the environment, which not only requires substantial human effort to customize for every single task but also weakens the generality of LLMs. To autonomously ground the LLM in the environment, we propose the Self-Driven Grounding (SDG) framework, which grounds the LLM automatically and progressively through self-driven skill learning. SDG first employs the LLM to propose hypotheses of sub-goals for achieving tasks and then verifies the feasibility of these hypotheses by interacting with the underlying environment. Once verified, SDG can learn generalized skills with the guidance of these successfully grounded sub-goals. These skills can be further utilized to accomplish more complex tasks that fail to pass the verification phase. Verified on the famous instruction-following task set BabyAI, SDG achieves performance on the most challenging tasks comparable to imitation learning methods that cost millions of demonstrations, proving the effectiveness of the learned skills and showing the feasibility and efficiency of our framework.

UniSA: Unified Generative Framework for Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2309.01339
  • repo_url: https://github.com/dawn0815/saeval-benchmark
  • paper_authors: Zaijing Li, Ting-En Lin, Yuchuan Wu, Meng Liu, Fengxiao Tang, Ming Zhao, Yongbin Li
  • for: This work addresses the coordination of the various sentiment analysis subtasks and improves multimodal sentiment analysis performance.
  • methods: It proposes a task-specific prompt method and introduces a multimodal generative framework named UniSA, along with a new sentiment analysis evaluation benchmark, SAEval.
  • results: Experimental results show that UniSA performs comparably to the state of the art across sentiment analysis subtasks and generalizes well among them.
    Abstract Sentiment analysis is a crucial task that aims to understand people's emotional states and predict emotional categories based on multimodal information. It consists of several subtasks, such as emotion recognition in conversation (ERC), aspect-based sentiment analysis (ABSA), and multimodal sentiment analysis (MSA). However, unifying all subtasks in sentiment analysis presents numerous challenges, including modality alignment, unified input/output forms, and dataset bias. To address these challenges, we propose a Task-Specific Prompt method to jointly model subtasks and introduce a multimodal generative framework called UniSA. Additionally, we organize the benchmark datasets of main subtasks into a new Sentiment Analysis Evaluation benchmark, SAEval. We design novel pre-training tasks and training methods to enable the model to learn generic sentiment knowledge among subtasks to improve the model's multimodal sentiment perception ability. Our experimental results show that UniSA performs comparably to the state-of-the-art on all subtasks and generalizes well to various subtasks in sentiment analysis.

Learning for Interval Prediction of Electricity Demand: A Cluster-based Bootstrapping Approach

  • paper_url: http://arxiv.org/abs/2309.01336
  • repo_url: None
  • paper_authors: Rohit Dube, Natarajan Gautam, Amarnath Banerjee, Harsha Nagarajan
  • for: This paper provides a residual-bootstrap-based method for interval estimation of day-ahead electricity demand, to support operations in small-aggregation load settings such as microgrids.
  • methods: A machine learning algorithm produces point estimates of day-ahead electricity demand, and these point estimates together with the corresponding residuals are used to generate interval estimates. Specifically, an unsupervised learning algorithm first groups days with similar demand patterns into clusters, which are then used to generate the interval estimates.
  • results: Evaluated on real electricity demand data, the method maintains the accuracy and stability of interval estimates better than other bootstrap methods across varying confidence intervals, avoiding errors caused by bias in the point estimates.
    Abstract Accurate predictions of electricity demand are necessary for managing operations in a small-aggregation load setting like a microgrid. Due to low aggregation, electricity demand can be highly stochastic, and point estimates would lead to inflated errors. Interval estimation in this scenario provides a range of values within which the future values might lie and helps quantify the errors around the point estimates. This paper introduces a residual bootstrap algorithm to generate interval estimates of day-ahead electricity demand. A machine learning algorithm is used to obtain the point estimates of electricity demand and the respective residuals on the training set. The obtained residuals are stored in memory, and the memory is then partitioned: days with similar demand patterns are grouped into clusters using an unsupervised learning algorithm, and these clusters partition the memory. The point estimates for a test day are used to find the closest cluster of similar days, and residuals are bootstrapped from the chosen cluster. The algorithm is evaluated on real electricity demand data from EULR (End Use Load Research) and compared to other bootstrapping methods for varying confidence intervals.
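The procedure above translates almost directly into code. The sketch below uses stand-in models and data: point forecasts and residuals come from any fitted regressor, days are clustered by demand profile, and the interval for a test day is formed from residuals resampled within the closest cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_days = rng.random((200, 24)) * 50 + 100      # daily demand profiles (kW)
residuals = rng.normal(0, 5, (200, 24))            # model residuals per day

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(train_days)

def interval(point_forecast, level=0.9, n_boot=1000):
    cluster = km.predict(point_forecast[None, :])[0]     # closest cluster
    pool = residuals[km.labels_ == cluster]              # its residual memory
    draws = pool[rng.integers(0, len(pool), n_boot)]     # bootstrap residuals
    sims = point_forecast + draws
    lo, hi = np.quantile(sims, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return lo, hi

lo, hi = interval(rng.random(24) * 50 + 100)
print(lo.shape, hi.shape)   # hourly lower/upper bounds for the test day
```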

Can I Trust Your Answer? Visually Grounded Video Question Answering

  • paper_url: http://arxiv.org/abs/2309.01327
  • repo_url: https://github.com/doc-doc/next-gqa
  • paper_authors: Junbin Xiao, Angela Yao, Yicong Li, Tat Seng Chua
  • for: This paper investigates the trend of using pretraining techniques for video-language understanding; specifically, it asks whether vision-language models (VLMs) can answer questions while providing visual evidence, to determine whether their predictions are genuinely supported by video content rather than spurious correlations from language or irrelevant visual context.
  • methods: The authors construct the NExT-GQA dataset to scrutinize state-of-the-art VLMs. Through post-hoc attention analysis, they find that these models are weak at substantiating their answers, indicating unreliable predictions. As a remedy, they propose a video grounding mechanism comprising Gaussian mask optimization and cross-modal learning.
  • results: Experiments with different backbones show that this grounding mechanism improves both video grounding and question answering.
    Abstract We study visually grounded VideoQA in response to the emerging trends of utilizing pretraining techniques for video-language understanding. Specifically, by forcing vision-language models (VLMs) to answer questions and simultaneously provide visual evidence, we seek to ascertain the extent to which the predictions of such techniques are genuinely anchored in relevant video content, versus spurious correlations from language or irrelevant visual context. Towards this, we construct NExT-GQA -- an extension of NExT-QA with 10.5$K$ temporal grounding (or location) labels tied to the original QA pairs. With NExT-GQA, we scrutinize a variety of state-of-the-art VLMs. Through post-hoc attention analysis, we find that these models are weak in substantiating the answers despite their strong QA performance. This exposes a severe limitation of these models in making reliable predictions. As a remedy, we further explore and suggest a video grounding mechanism via Gaussian mask optimization and cross-modal learning. Experiments with different backbones demonstrate that this grounding mechanism improves both video grounding and QA. Our dataset and code are released. With these efforts, we aim to push towards the reliability of deploying VLMs in VQA systems.
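The Gaussian-mask grounding idea can be sketched as follows: a predicted center and width define soft weights over frames, and answer features are pooled through the mask so grounding is learned end to end from the QA loss. All module names and sizes here are ours, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GaussianGrounder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.param_head = nn.Linear(dim, 2)   # predicts (center, width) in [0, 1]

    def forward(self, frame_feats):           # (B, T, dim)
        B, T, _ = frame_feats.shape
        p = torch.sigmoid(self.param_head(frame_feats.mean(1)))   # (B, 2)
        center, width = p[:, :1], p[:, 1:] * 0.5 + 1e-3
        t = torch.linspace(0, 1, T, device=frame_feats.device)[None, :]
        mask = torch.exp(-0.5 * ((t - center) / width) ** 2)      # (B, T)
        pooled = (mask[..., None] * frame_feats).sum(1) / mask.sum(1, keepdim=True)
        return pooled, mask            # answer from pooled; mask = grounding

pooled, mask = GaussianGrounder(512)(torch.randn(2, 32, 512))
```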

Learning a Patent-Informed Biomedical Knowledge Graph Reveals Technological Potential of Drug Repositioning Candidates

  • paper_url: http://arxiv.org/abs/2309.03227
  • repo_url: https://github.com/ysjegal/ysjegal-drug-repositioning
  • paper_authors: Yongseung Jegal, Jaewoong Choi, Jiho Lee, Ki-Su Park, Seyoung Lee, Janghyeok Yoon
  • for: This paper aims to present a novel protocol for identifying drug repositioning candidates with both technological potential and scientific evidence.
  • methods: The protocol involves constructing a scientific biomedical knowledge graph (s-BKG) and a patent-informed biomedical knowledge graph (p-BKG), and using a graph embedding protocol to evaluate the relevance scores of potential drug candidates.
  • results: The case study on Alzheimer's disease demonstrates the efficacy and feasibility of the proposed method, and the quantitative outcomes and systematic methods are expected to bridge the gap between computational discoveries and successful market applications in drug repositioning research.
    Abstract Drug repositioning-a promising strategy for discovering new therapeutic uses for existing drugs-has been increasingly explored in the computational science literature using biomedical databases. However, the technological potential of drug repositioning candidates has often been overlooked. This study presents a novel protocol to comprehensively analyse various sources such as pharmaceutical patents and biomedical databases, and identify drug repositioning candidates with both technological potential and scientific evidence. To this end, first, we constructed a scientific biomedical knowledge graph (s-BKG) comprising relationships between drugs, diseases, and genes derived from biomedical databases. Our protocol involves identifying drugs that exhibit limited association with the target disease but are closely located in the s-BKG, as potential drug candidates. We constructed a patent-informed biomedical knowledge graph (p-BKG) by adding pharmaceutical patent information. Finally, we developed a graph embedding protocol to ascertain the structure of the p-BKG, thereby calculating the relevance scores of those candidates with target disease-related patents to evaluate their technological potential. Our case study on Alzheimer's disease demonstrates its efficacy and feasibility, while the quantitative outcomes and systematic methods are expected to bridge the gap between computational discoveries and successful market applications in drug repositioning research.

Code Representation Pre-training with Complements from Program Executions

  • paper_url: http://arxiv.org/abs/2309.09980
  • repo_url: None
  • paper_authors: Jiabo Huang, Jianyu Zhao, Yuyang Rong, Yiwen Guo, Yifeng He, Hao Chen
  • for: Advancing code intelligence research with large language models (LLMs).
  • methods: Test cases generated with the help of a customized fuzzer are used to pre-train code representations.
  • results: Compared with other pre-training methods, FuzzPretrain improves mAP on code search by more than 6%/9% over counterparts trained with only source code or ASTs, respectively.
    Abstract Large language models (LLMs) for natural language processing have been grafted onto programming language modeling to advance code intelligence. Although it can be represented in text format, code is syntactically more rigorous, as it must be properly compiled or interpreted to perform a desired set of behaviors given any inputs. In this case, existing works benefit from syntactic representations, learning from code less ambiguously in the forms of abstract syntax trees, control-flow graphs, etc. However, programs with the same purpose can be implemented in various ways, showing different syntactic representations, while those with similar implementations can have distinct behaviors. Though trivially demonstrated during execution, such semantics about functionality are challenging to learn directly from code, especially in an unsupervised manner. Hence, in this paper, we propose FuzzPretrain to explore the dynamic information of programs revealed by their test cases and embed it into the feature representations of code as complements. The test cases are obtained with the assistance of a customized fuzzer and are only required during pre-training. FuzzPretrain yielded more than 6%/9% mAP improvements on code search over its counterparts trained with only source code or ASTs, respectively. Our extensive experimental results show the benefits of learning discriminative code representations from program executions.
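The core idea, pairing source code with behavior observed through fuzzed executions, can be illustrated with a toy sketch. The fuzzer below (random integer inputs only) and the text serialization of input/output pairs are simplifying assumptions, not the paper's actual fuzzing or pre-training pipeline.

```python
import random
import inspect

def toy_fuzz(fn, n_cases=5, lo=-100, hi=100, seed=0):
    """Generate random integer test cases for fn and record input/output
    pairs, crudely mimicking how execution behavior can complement source
    code during pre-training (the paper's fuzzer is far more elaborate)."""
    rng = random.Random(seed)
    n_args = len(inspect.signature(fn).parameters)
    cases = []
    for _ in range(n_cases):
        args = tuple(rng.randint(lo, hi) for _ in range(n_args))
        try:
            cases.append((args, fn(*args)))
        except Exception:
            continue  # skip inputs that crash the program
    return cases

def example(a, b):
    return abs(a - b)

source = inspect.getsource(example)
io_pairs = toy_fuzz(example)
# One possible serialization: source text followed by executed behavior,
# which a pre-training pipeline could tokenize as a single sequence.
pretrain_text = source + "\n# IO: " + "; ".join(f"{i}->{o}" for i, o in io_pairs)
print(pretrain_text)
```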

ExMobileViT: Lightweight Classifier Extension for Mobile Vision Transformer

  • paper_url: http://arxiv.org/abs/2309.01310
  • repo_url: None
  • paper_authors: Gyeongdong Yang, Yungwook Kwon, Hyunjin Kim
  • For: Improving the performance of mobile-friendly vision transformers while keeping the computational overhead small.
  • Methods: Average-pooling results are used to expand the channel count of the final classifier, reusing information from early attention stages.
  • Results: Noticeable accuracy improvements over the original MobileViT with only about 5% additional parameters.
    Abstract The paper proposes an efficient structure for enhancing the performance of mobile-friendly vision transformers with small computational overhead. The vision transformer (ViT) is attractive because it outperforms conventional convolutional neural networks (CNNs) in image classification. Because ViT demands high computational resources, MobileNet-based ViT models such as MobileViT-S have been developed; however, their performance cannot match that of the original ViT. The proposed structure relieves this weakness by storing information from early attention stages and reusing it in the final classifier, motivated by the idea that features from early attention stages can carry important meaning for the final classification. To reuse this early information, the average-pooling results of variously scaled features from early attention stages are used to expand the channels in the fully-connected layer of the final classifier. The inductive bias introduced by the averaged features is expected to enhance the final performance. Because the proposed structure needs only the average pooling of features from the attention stages and channel expansion in the final classifier, its computational and storage overheads are very small, keeping the benefits of the low-cost MobileNet-based ViT (MobileViT). Compared with the original MobileViTs on the ImageNet dataset, the proposed ExMobileViT achieves noticeable accuracy enhancements with only about 5% additional parameters.
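A PyTorch sketch of the described classifier extension follows: average-pooled feature maps from early attention stages are concatenated with the final features to widen the last fully-connected layer. The channel sizes and class count are illustrative placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ExMobileViTHead(nn.Module):
    """Sketch of the proposed extension: pooled features from early
    attention stages expand the channels of the final classifier.
    Channel sizes below are assumptions, not the paper's values."""
    def __init__(self, stage_channels=(64, 96, 128), final_channels=640,
                 num_classes=1000):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        in_features = final_channels + sum(stage_channels)
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, stage_feats, final_feat):
        # stage_feats: list of (B, C_i, H_i, W_i) maps from early stages.
        pooled = [self.pool(f).flatten(1) for f in stage_feats]
        pooled.append(self.pool(final_feat).flatten(1))
        # Concatenating along channels widens the FC layer's input.
        return self.fc(torch.cat(pooled, dim=1))

# Usage with random tensors standing in for backbone outputs:
head = ExMobileViTHead()
feats = [torch.randn(2, c, s, s) for c, s in [(64, 28), (96, 14), (128, 7)]]
logits = head(feats, torch.randn(2, 640, 7, 7))
print(logits.shape)  # torch.Size([2, 1000])
```

Since the extension adds only pooling and a wider linear layer, the parameter overhead is dominated by the extra FC input channels, which is consistent with the small (~5%) parameter increase the paper reports.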

Partial Proof of a Conjecture with Implications for Spectral Majorization

  • paper_url: http://arxiv.org/abs/2309.01302
  • repo_url: None
  • paper_authors: Jeffrey Uhlmann
  • For: This work investigates a conjecture concerning properties of $n\times n$ ($n\leq 6$) positive definite matrices.
  • Methods: Computer-assisted sum-of-squares (SoS) methods are used to prove polynomial nonnegativity.
  • Results: A new family of matrices whose diagonals majorize their spectra is identified; the family extends via Kronecker composition to $n>6$ while retaining the special majorization property.
    Abstract In this paper we report on new results relating to a conjecture regarding properties of $n\times n$, $n\leq 6$, positive definite matrices. The conjecture has been proven for $n\leq 4$ using computer-assisted sum-of-squares (SoS) methods for proving polynomial nonnegativity. Based on these proven cases, we report on the recent identification of a new family of matrices with the property that their diagonals majorize their spectra. We then present new results showing that this family can be extended via Kronecker composition to $n>6$ while retaining the special majorization property. We conclude with general considerations on the future of computer-assisted and AI-based proofs.
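The majorization property and its behavior under Kronecker composition can be spot-checked numerically. The sketch below assumes the standard definition of majorization via sorted partial sums; the diagonal test matrices are trivial illustrations, not members of the paper's newly identified family.

```python
import numpy as np

def majorizes(x, y, tol=1e-9):
    """True if vector x majorizes vector y: with both sorted in decreasing
    order, every partial sum of x dominates that of y, and totals agree."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    return (abs(xs.sum() - ys.sum()) < tol
            and bool(np.all(np.cumsum(xs) >= np.cumsum(ys) - tol)))

def diag_majorizes_spectrum(A, tol=1e-9):
    """Check the special property: the diagonal majorizes the spectrum
    (for symmetric A, trace equals the eigenvalue sum, so totals match)."""
    return majorizes(np.diag(A), np.linalg.eigvalsh(A), tol)

# Kronecker composition: if A and B each have the property, test whether
# np.kron(A, B) retains it. Here we merely spot-check on diagonal
# matrices, where the property holds trivially.
A = np.diag([3.0, 2.0, 1.0])
B = np.diag([2.0, 1.0])
print(diag_majorizes_spectrum(A), diag_majorizes_spectrum(np.kron(A, B)))
```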

AlphaZero Gomoku

  • paper_url: http://arxiv.org/abs/2309.01294
  • repo_url: https://github.com/suragnair/alpha-zero-general
  • paper_authors: Wen Liang, Chao Yu, Brian Whiteaker, Inyoung Huh, Hua Shao, Youzhi Liang
  • For: This paper explores the performance of the AlphaZero algorithm on the board game Gomoku.
  • Methods: AlphaZero, which combines deep learning with Monte Carlo tree search (MCTS), is applied to Gomoku.
  • Results: Tests show that AlphaZero performs strongly on Gomoku and adapts well to games other than Go.
    Abstract In the past few years, AlphaZero's exceptional capability in mastering intricate board games has garnered considerable interest. Initially designed for the game of Go, this revolutionary algorithm merges deep learning techniques with Monte Carlo tree search (MCTS) to surpass earlier top-tier methods. In our study, we broaden the use of AlphaZero to Gomoku, an age-old tactical board game also referred to as "Five in a Row." Intriguingly, Gomoku poses innate challenges due to a bias towards the initial player, who holds a theoretical advantage; to add value, we strive for balanced gameplay. Our tests demonstrate AlphaZero's versatility in adapting to games other than Go. MCTS has become a predominant algorithm for decision processes in intricate scenarios, especially board games: it builds a search tree by examining potential future actions and uses random sampling to predict possible outcomes. By leveraging the best of both worlds, AlphaZero fuses deep reinforcement learning with the exploration-exploitation balancing of MCTS, establishing a fresh standard in game-playing AI. Its triumph is notably evident in board games such as Go, chess, and shogi.
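The selection rule at the heart of AlphaZero's MCTS can be sketched compactly. In the toy below, a uniform prior and a random value function stand in for the policy and value heads of the trained network, and the "tree" is only one ply deep; a real Gomoku implementation would descend, expand, and back up values along full paths.

```python
import math
import random

class Node:
    def __init__(self, prior=1.0):
        self.prior = prior        # P(a|s) from the policy head
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}        # action -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct(parent, child, c_puct=1.5):
    # AlphaZero's selection rule: mean value Q plus an exploration bonus
    # weighted by the network prior and the parent's visit count.
    return child.value() + c_puct * child.prior * \
        math.sqrt(parent.visits) / (1 + child.visits)

def simulate(root, legal_actions, value_fn, n_sims=100):
    """Run n_sims one-step simulations from the root; value_fn is a stub
    for the value head (here random, in practice a neural network)."""
    actions = list(legal_actions)
    for a in actions:
        root.children.setdefault(a, Node(prior=1.0 / len(actions)))
    for _ in range(n_sims):
        root.visits += 1
        a = max(root.children, key=lambda a: puct(root, root.children[a]))
        child = root.children[a]
        v = value_fn(a)
        child.visits += 1
        child.value_sum += v
    # Move selection: most-visited child, as in AlphaZero's play phase.
    return max(root.children, key=lambda a: root.children[a].visits)

best = simulate(Node(), legal_actions=range(9),
                value_fn=lambda a: random.uniform(-1, 1))
print("chosen action:", best)
```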