cs.AI - 2023-12-03

Revisiting Non-separable Binary Classification and its Applications in Anomaly Detection

paper_url: http://arxiv.org/abs/2312.01541
repo_url: https://github.com/mattlaued/xor-is-linearly-classifiable
paper_authors: Matthew Lau, Ismaila Seck, Athanasios P Meliopoulos, Wenke Lee, Eugene Ndiaye
for: Investigates whether linear classification of XOR is possible, a long-standing problem that motivated much of deep learning.
methods: Proposes a different classification paradigm, equality separation, which adapts the support vector machine (SVM) objective to distinguish data within or outside the margin. The classifier can be integrated into neural network pipelines with a smooth approximation.
results: Equality separation is shown to suit anomaly detection. The authors introduce closing numbers, a quantitative measure that formalizes this notion, and supervised anomaly detection experiments show that equality separation detects both seen and unseen anomalies.

Abstract
The inability to linearly classify XOR has motivated much of deep learning. We revisit this age-old problem and show that linear classification of XOR is indeed possible. Instead of separating data between halfspaces, we propose a slightly different paradigm, equality separation, that adapts the SVM objective to distinguish data within or outside the margin. Our classifier can then be integrated into neural network pipelines with a smooth approximation. From its properties, we intuit that equality separation is suitable for anomaly detection. To formalize this notion, we introduce closing numbers, a quantitative measure on the capacity for classifiers to form closed decision regions for anomaly detection. Springboarding from this theoretical connection between binary classification and anomaly detection, we test our hypothesis on supervised anomaly detection experiments, showing that equality separation can detect both seen and unseen anomalies.
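
The idea of classifying by closeness to a hyperplane, rather than by which side of it a point falls on, can be illustrated on XOR itself. The sketch below is a minimal illustration and not the authors' implementation: the hyperplane `x1 + x2 - 1 = 0`, the tolerance `eps`, and the Gaussian `bump` used as a smooth surrogate are all assumptions chosen for this example.

```python
import math

# XOR data: the label is 1 when exactly one input is 1.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Hand-picked (hypothetical) hyperplane x1 + x2 - 1 = 0.
# It passes through both positive XOR points.
W, B = (1.0, 1.0), -1.0


def signed_distance(x, w=W, b=B):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.hypot(*w)
    return (w[0] * x[0] + w[1] * x[1] + b) / norm


def equality_separator(x, eps=0.1):
    """Hard classifier: 1 if x lies within eps of the hyperplane, else 0."""
    return 1 if abs(signed_distance(x)) <= eps else 0


def bump(x, sigma=0.25):
    """Assumed smooth surrogate: a Gaussian of the distance, equal to 1 on the hyperplane."""
    z = signed_distance(x)
    return math.exp(-0.5 * (z / sigma) ** 2)


predictions = [equality_separator(x) for x, _ in XOR]
labels = [y for _, y in XOR]
print(predictions == labels)  # a single hyperplane labels all four points correctly
```

A halfspace classifier cannot do this with one hyperplane, because the two positive points sit between the two negative ones along the direction of `W`. The smooth surrogate keeps a nonzero gradient away from the hyperplane, which is what would let such a unit be trained inside a network.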

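Summary
The inability to classify XOR linearly has long challenged and motivated deep learning. The paper revisits this old problem and shows that linear classification of XOR is possible. Rather than separating data into halfspaces, it proposes a slightly different paradigm, equality separation, which adapts the SVM objective to distinguish data inside the margin from data outside it. The classifier can be integrated into neural network pipelines through a smooth approximation. From its properties, the authors argue that equality separation suits anomaly detection, and they formalize this with closing numbers, a quantitative measure of a classifier's capacity to form closed decision regions. Building on this connection between binary classification and anomaly detection, supervised anomaly detection experiments show that equality separation detects both seen and unseen anomalies.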

Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents

paper_url: http://arxiv.org/abs/2312.01537
repo_url: https://github.com/feddg23/feddg-main
paper_authors: Yuqi Jia, Saeed Vahidian, Jingwei Sun, Jianyi Zhang, Vyacheslav Kungurtsev, Neil Zhenqiang Gong, Yiran Chen
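for: Addresses data heterogeneity in federated learning with an efficient server-side dataset distillation framework that reduces the computation and communication demands on local devices while enhancing client privacy.
methods: Performs dataset distillation on the server using pre-trained deep generative models, so that local devices train smaller surrogate models while a larger global model is trained on the server, minimizing resource utilization.
results: Experiments show accuracy gains of up to 40% over non-dataset-distillation techniques in highly heterogeneous FL settings and of 18% over existing dataset-distillation methods. The framework also converges faster than the baselines, because the server trains on one multi-modal distribution rather than on several sets of heterogeneous data distributions.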

Abstract
Data heterogeneity presents significant challenges for federated learning (FL). Recently, dataset distillation techniques have been introduced, and performed at the client level, to attempt to mitigate some of these challenges. In this paper, we propose a highly efficient FL dataset distillation framework on the server side, significantly reducing both the computational and communication demands on local devices while enhancing the clients' privacy. Unlike previous strategies that perform dataset distillation on local devices and upload synthetic data to the server, our technique enables the server to leverage prior knowledge from pre-trained deep generative models to synthesize essential data representations from a heterogeneous model architecture. This process allows local devices to train smaller surrogate models while enabling the training of a larger global model on the server, effectively minimizing resource utilization. We substantiate our claim with a theoretical analysis, demonstrating the asymptotic resemblance of the process to the hypothetical ideal of completely centralized training on a heterogeneous dataset. Empirical evidence from our comprehensive experiments indicates our method's superiority, delivering an accuracy enhancement of up to 40% over non-dataset-distillation techniques in highly heterogeneous FL contexts, and surpassing existing dataset-distillation methods by 18%. In addition to the high accuracy, our framework converges faster than the baselines because rather than the server trains on several sets of heterogeneous data distributions, it trains on a multi-modal distribution. Our code is available at https://github.com/FedDG23/FedDG-main.git

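Summary
Data heterogeneity poses significant challenges for federated learning (FL). Dataset distillation has recently been applied at the client level to mitigate some of them. This paper proposes a highly efficient FL dataset distillation framework that runs on the server, significantly reducing the computation and communication load on local devices while enhancing client privacy. Instead of distilling on local devices and uploading synthetic data, the server draws on prior knowledge from pre-trained deep generative models to synthesize essential data representations. Local devices can then train smaller surrogate models while the server trains a larger global model, which reduces resource use. A theoretical analysis shows that the process asymptotically resembles fully centralized training on a heterogeneous dataset. In highly heterogeneous FL settings the method improves accuracy by up to 40% over non-dataset-distillation techniques and by 18% over existing dataset-distillation methods. It also converges faster than the baselines, because the server trains on a single multi-modal distribution rather than on several sets of heterogeneous distributions. Code is available at https://github.com/FedDG23/FedDG-main.git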

NovoMol: Recurrent Neural Network for Orally Bioavailable Drug Design and Validation on PDGFRα Receptor

paper_url: http://arxiv.org/abs/2312.01527
repo_url: https://github.com/ishirraov/novomol
paper_authors: Ishir Rao
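for: Improves the efficiency of clinical trials by addressing the long timelines and low success rates of drug candidates in the pharmaceutical industry.
methods: Uses a recurrent neural network to mass-generate drug molecules, ranks them with the quantitative estimate of drug-likeness (QED), and retrains the network on the molecules that meet QED's oral bioavailability threshold.
results: After five training cycles, 76% of generated molecules passed the QED oral bioavailability threshold and 96% passed the traditionally used Lipinski's Rule of Five. When the trained model generated candidates for the PDGFRα receptor, 44% had better binding affinity than the current state-of-the-art drug Imatinib (receptor binding affinity of -9.4 kcal/mol).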

Abstract
Longer timelines and lower success rates of drug candidates limit the productivity of clinical trials in the pharmaceutical industry. Promising de novo drug design techniques help solve this by exploring a broader chemical space, efficiently generating new molecules, and providing improved therapies. However, optimizing for molecular characteristics found in approved oral drugs remains a challenge, limiting de novo usage. In this work, we propose NovoMol, a novel de novo method using recurrent neural networks to mass-generate drug molecules with high oral bioavailability, increasing clinical trial time efficiency. Molecules were optimized for desirable traits and ranked using the quantitative estimate of drug-likeness (QED). Generated molecules meeting QED's oral bioavailability threshold were used to retrain the neural network, and, after five training cycles, 76% of generated molecules passed this strict threshold and 96% passed the traditionally used Lipinski's Rule of Five. The trained model was then used to generate specific drug candidates for the cancer-related PDGFRα receptor and 44% of generated candidates had better binding affinity than the current state-of-the-art drug, Imatinib (with a receptor binding affinity of -9.4 kcal/mol), and the best-generated candidate at -12.9 kcal/mol. NovoMol provides a time/cost-efficient AI-based de novo method offering promising drug candidates for clinical trials.
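
The two filters named in the abstract can be sketched in a few lines. This is a hedged illustration, not NovoMol's code: the descriptor values, the QED scores, and the `qed_threshold` of 0.6 are hypothetical (the abstract does not state the threshold), and the convention of tolerating one Rule of Five violation is the commonly used one rather than something the abstract specifies.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (all values used below are hypothetical)."""
    mol_weight: float   # molecular weight in daltons
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def rule_of_five_violations(d):
    """Count how many of Lipinski's four limits the molecule exceeds."""
    return sum([d.mol_weight > 500, d.log_p > 5, d.h_donors > 5, d.h_acceptors > 10])


def passes_rule_of_five(d, max_violations=1):
    """Common convention: at most one violation is tolerated."""
    return rule_of_five_violations(d) <= max_violations


def select_for_retraining(scored, qed_threshold):
    """One filter step of a retraining cycle: keep molecules whose QED meets the threshold."""
    return [name for name, qed in scored if qed >= qed_threshold]


candidates = {
    "mol_a": Descriptors(350.4, 2.1, 2, 5),
    "mol_b": Descriptors(612.7, 5.8, 3, 9),
    "mol_c": Descriptors(520.0, 3.0, 1, 6),
}
lipinski_ok = {name: passes_rule_of_five(d) for name, d in candidates.items()}
kept = select_for_retraining([("mol_a", 0.82), ("mol_b", 0.31), ("mol_c", 0.67)], qed_threshold=0.6)
```

In a real pipeline the descriptors and QED scores would be computed by a cheminformatics toolkit from each generated structure. They are passed in here as plain numbers so the filtering logic stays visible.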

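Summary
Long timelines and low success rates of drug candidates limit the productivity of clinical trials in the pharmaceutical industry. De novo drug design techniques can help by exploring a broader chemical space, generating new molecules efficiently, and providing improved therapies. Optimizing for the molecular characteristics of approved oral drugs remains difficult, however, which limits de novo usage. The paper proposes NovoMol, a de novo method that uses recurrent neural networks to mass-generate drug molecules with high oral bioavailability, improving the time efficiency of clinical trials. Molecules are optimized for desirable traits and ranked by the quantitative estimate of drug-likeness (QED). After five training cycles, 76% of generated molecules met QED's oral bioavailability threshold and 96% passed the traditionally used Lipinski's Rule of Five. The trained model was then used to generate candidates for the PDGFRα receptor, and 44% of them had better binding affinity than the current state-of-the-art drug Imatinib (receptor binding affinity of -9.4 kcal/mol). NovoMol offers a time- and cost-efficient AI-based de novo method that yields promising drug candidates for clinical trials.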

Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies

paper_url: http://arxiv.org/abs/2312.01509
repo_url: None
paper_authors: Vithya Yogarajan, Gillian Dobbie, Te Taka Keegan, Rostam J. Neuwirth
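for: Surveys bias in pre-trained language models and how mitigation can be adapted to the needs of different societies.
methods: Reviews existing techniques for identifying and mitigating bias, grouped into metrics, benchmark datasets, and mitigation strategies.
results: Existing bias detection and mitigation methods have limitations for under-represented societies and need to be tailored to their needs.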

Abstract
The benefits and capabilities of pre-trained language models (LLMs) in current and future innovations are vital to any society. However, introducing and using LLMs comes with biases and discrimination, resulting in concerns about equality, diversity and fairness, and must be addressed. While understanding and acknowledging bias in LLMs and developing mitigation strategies are crucial, the generalised assumptions towards societal needs can result in disadvantages towards under-represented societies and indigenous populations. Furthermore, the ongoing changes to actual and proposed amendments to regulations and laws worldwide also impact research capabilities in tackling the bias problem. This research presents a comprehensive survey synthesising the current trends and limitations in techniques used for identifying and mitigating bias in LLMs, where the overview of methods for tackling bias are grouped into metrics, benchmark datasets, and mitigation strategies. The importance and novelty of this survey are that it explores the perspective of under-represented societies. We argue that current practices tackling the bias problem cannot simply be 'plugged in' to address the needs of under-represented societies. We use examples from New Zealand to present requirements for adopting existing techniques to under-represented societies.

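Summary
The benefits and capabilities of pre-trained language models (LLMs) in current and future innovations are vital to any society. Introducing and using LLMs, however, brings bias and discrimination, raising concerns about equality, diversity, and fairness. Understanding and acknowledging bias in LLMs and developing mitigation strategies are essential, yet generalized assumptions about societal needs can disadvantage under-represented societies and indigenous populations. Ongoing changes to laws and regulations worldwide also affect the capacity of research to tackle the bias problem. The paper presents a comprehensive survey of current trends and limitations in the techniques used to identify and mitigate bias. The authors argue that existing mitigation practices cannot be applied directly to under-represented societies, and they use examples from New Zealand to show what adopting existing techniques requires.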

Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation

paper_url: http://arxiv.org/abs/2312.01504
repo_url: None
paper_authors: Yuzhe Lu, Sungmin Hong, Yash Shah, Panpan Xu
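for: Automates the generation of radiology reports from medical images to reduce radiologists' time and error rates.
methods: Proposes a simple yet effective two-stage fine-tuning protocol that aligns visual features to the text embedding space of a large language model (LLM) as soft visual prompts.
results: With OpenLLaMA-7B, the framework achieves state-of-the-art-level performance without domain-specific pretraining. The paper also provides detailed analyses of soft visual prompts and attention mechanisms, pointing to future research directions.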

Abstract
Writing radiology reports from medical images requires a high level of domain expertise. It is time-consuming even for trained radiologists and can be error-prone for inexperienced radiologists. It would be appealing to automate this task by leveraging generative AI, which has shown drastic progress in vision and language understanding. In particular, Large Language Models (LLM) have demonstrated impressive capabilities recently and continued to set new state-of-the-art performance on almost all natural language tasks. While many have proposed architectures to combine vision models with LLMs for multimodal tasks, few have explored practical fine-tuning strategies. In this work, we proposed a simple yet effective two-stage fine-tuning protocol to align visual features to LLM's text embedding space as soft visual prompts. Our framework with OpenLLaMA-7B achieved state-of-the-art level performance without domain-specific pretraining. Moreover, we provide detailed analyses of soft visual prompts and attention mechanisms, shedding light on future research directions.

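Summary
Writing radiology reports from medical images requires a high level of domain expertise. It is time-consuming even for trained radiologists and error-prone for inexperienced ones, so automating the task with generative AI is appealing. Large language models (LLMs) in particular have recently shown impressive progress, setting new state-of-the-art performance on almost all natural language tasks. Many architectures have been proposed for combining vision models with LLMs, but few studies have explored practical fine-tuning strategies. This work proposes a simple yet effective two-stage fine-tuning protocol that aligns visual features to the LLM's text embedding space. The framework with OpenLLaMA-7B achieves state-of-the-art-level performance without domain-specific pretraining. The paper also provides detailed analyses of soft visual prompts and attention mechanisms, shedding light on future research.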

ADT: Agent-based Dynamic Thresholding for Anomaly Detection

paper_url: http://arxiv.org/abs/2312.01488
repo_url: None
paper_authors: Xue Yang, Enda Howley, Micheal Schukat
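for: Proposes an agent-based method for anomaly detection that adjusts thresholds dynamically in real-world applications.
methods: Models thresholding in anomaly detection as a Markov Decision Process and proposes an agent-based dynamic thresholding (ADT) framework built on a deep Q-network.
results: Experiments on three real-world datasets demonstrate ADT's thresholding capability, data-efficient learning, stability, and robustness.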

Abstract
The complexity and scale of IT systems are increasing dramatically, posing many challenges to real-world anomaly detection. Deep learning anomaly detection has emerged, aiming at feature learning and anomaly scoring, which has gained tremendous success. However, little work has been done on the thresholding problem despite it being a critical factor for the effectiveness of anomaly detection. In this paper, we model thresholding in anomaly detection as a Markov Decision Process and propose an agent-based dynamic thresholding (ADT) framework based on a deep Q-network. The proposed method can be integrated into many systems that require dynamic thresholding. An auto-encoder is utilized in this study to obtain feature representations and produce anomaly scores for complex input data. ADT can adjust thresholds adaptively by utilizing the anomaly scores from the auto-encoder and significantly improve anomaly detection performance. The properties of ADT are studied through experiments on three real-world datasets and compared with benchmarks, hence demonstrating its thresholding capability, data-efficient learning, stability, and robustness. Our study validates the effectiveness of reinforcement learning in optimal thresholding control in anomaly detection.
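
Treating the threshold as an action chosen by a learning agent can be sketched with a toy problem. The example below is an illustration under stated assumptions, not the ADT framework: it swaps the paper's deep Q-network for tabular Q-learning, and the two regimes, the fixed anomaly scores, the candidate thresholds, and the +1/-1 reward are all hypothetical. In the paper the scores come from an auto-encoder.

```python
import random

THRESHOLDS = [0.4, 0.7]  # hypothetical action set: candidate thresholds
# Hypothetical anomaly scores per regime: (score of a normal sample, score of an anomaly).
REGIMES = {"quiet": (0.2, 0.6), "noisy": (0.5, 0.9)}


def step(rng, regime, action):
    """Apply the chosen threshold to one sample; reward +1 for a correct decision, else -1."""
    normal_score, anomaly_score = REGIMES[regime]
    is_anomaly = rng.random() < 0.5
    score = anomaly_score if is_anomaly else normal_score
    flagged = score > THRESHOLDS[action]
    reward = 1.0 if flagged == is_anomaly else -1.0
    next_regime = rng.choice(list(REGIMES))  # regime changes independently of the action
    return reward, next_regime


def train(steps=20000, alpha=0.05, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over (regime, threshold) pairs."""
    rng = random.Random(seed)
    actions = range(len(THRESHOLDS))
    q = {(s, a): 0.0 for s in REGIMES for a in actions}
    regime = "quiet"
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(THRESHOLDS))
        else:
            action = max(actions, key=lambda a: q[(regime, a)])
        reward, next_regime = step(rng, regime, action)
        best_next = max(q[(next_regime, a)] for a in actions)
        q[(regime, action)] += alpha * (reward + gamma * best_next - q[(regime, action)])
        regime = next_regime
    return q


q_table = train()
policy = {
    s: THRESHOLDS[max(range(len(THRESHOLDS)), key=lambda a: q_table[(s, a)])]
    for s in REGIMES
}
print(policy)
```

No single fixed threshold is correct in both regimes here, which is the situation dynamic thresholding is meant to handle: the low threshold flags normal samples in the noisy regime, and the high one misses anomalies in the quiet regime.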

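Summary
The complexity and scale of IT systems keep growing, which poses many challenges for real-world anomaly detection. Deep learning anomaly detection, aimed at feature learning and anomaly scoring, has achieved great success. Little work has addressed the thresholding problem, however, even though it is critical to the effectiveness of anomaly detection. The paper models thresholding in anomaly detection as a Markov Decision Process and proposes an agent-based dynamic thresholding (ADT) framework built on a deep Q-network. The method can be integrated into many systems that require dynamic thresholding. An auto-encoder produces feature representations and anomaly scores for complex input data, and ADT uses those scores to adjust thresholds adaptively, improving detection performance. Experiments show that ADT offers thresholding capability, data-efficient learning, stability, and robustness. The study validates the effectiveness of reinforcement learning for optimal thresholding control in anomaly detection.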

Context-Enhanced Relational Operators with Vector Embeddings

paper_url: http://arxiv.org/abs/2312.01476
repo_url: None
paper_authors: Viktor Sanca, Manos Chatzakis, Anastasia Ailamaki
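for: Addresses the challenge that traditional relational DBMS face when data processing pipelines combine relational data with context-rich multi-modal data.
methods: Uses representation learning models to map context-rich data into vector embeddings and combines those embeddings with relational operators for machine-automated context processing.
results: Presents hybrid relational and vector data processing with logical and physical optimizations. Using string embeddings as an example, the paper demonstrates hybrid context-enhanced processing on relational join operators and an order-of-magnitude improvement in execution time.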

Abstract
Collecting data, extracting value, and combining insights from relational and context-rich multi-modal sources in data processing pipelines presents a challenge for traditional relational DBMS. While relational operators allow declarative and optimizable query specification, they are limited to data transformations unsuitable for capturing or analyzing context. On the other hand, representation learning models can map context-rich data into embeddings, allowing machine-automated context processing but requiring imperative data transformation integration with the analytical query. To bridge this dichotomy, we present a context-enhanced relational join and introduce an embedding operator composable with relational operators. This enables hybrid relational and context-rich vector data processing, with algebraic equivalences compatible with relational algebra and corresponding logical and physical optimizations. We investigate model-operator interaction with vector data processing and study the characteristics of the E-join operator. Using an example of string embeddings, we demonstrate enabling hybrid context-enhanced processing on relational join operators with vector embeddings. The importance of holistic optimization, from logical to physical, is demonstrated in an order of magnitude execution time improvement.
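
A join whose predicate is embedding similarity rather than key equality can be sketched as follows. This is a simplified illustration and not the paper's E-join operator: the character-bigram `embed` function stands in for a learned string-embedding model, and the table contents, column names, and the 0.6 similarity threshold are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a learned model: counts of character bigrams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def e_join(left, right, left_key, right_key, threshold=0.6):
    """Nested-loop join that keeps row pairs whose key embeddings are similar enough."""
    # Embed each right-hand row once instead of once per left-hand row.
    right_embedded = [(row, embed(row[right_key])) for row in right]
    joined = []
    for l_row in left:
        l_vec = embed(l_row[left_key])
        for r_row, r_vec in right_embedded:
            if cosine(l_vec, r_vec) >= threshold:
                joined.append({**l_row, **r_row})
    return joined


products = [{"product": "wireless mouse"}, {"product": "usb keyboard"}]
reviews = [
    {"item": "Wireless Mouse!", "stars": 5},
    {"item": "keyboard usb", "stars": 3},
    {"item": "desk lamp", "stars": 4},
]
matches = e_join(products, reviews, "product", "item")
```

An equality join would return nothing for these tables, since no `product` string equals any `item` string. Hoisting the right-hand embeddings out of the inner loop is a small example of the kind of physical optimization the abstract alludes to, because model inference usually dominates the cost of such a join.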

Summary
Collecting data, extracting value, and combining insights from relational and context-rich multi-modal sources in data processing pipelines is a challenge for traditional relational DBMS. Relational operators allow declarative and optimizable query specification, but they are limited to data transformations that are unsuitable for capturing or analyzing context. Representation learning models, by contrast, can map context-rich data into embeddings, which allows machine-automated context processing but requires imperative integration of the data transformation with the analytical query. To bridge this gap, the paper presents a context-enhanced relational join and introduces an embedding operator that composes with relational operators. This enables hybrid relational and context-rich vector data processing, with algebraic equivalences compatible with relational algebra and corresponding logical and physical optimizations. The authors investigate how models and operators interact in vector data processing and study the characteristics of the E-join operator. Using string embeddings as an example, they demonstrate hybrid context-enhanced processing on relational join operators, where holistic optimization from the logical to the physical level yields an order-of-magnitude improvement in execution time.

Personality of AI

paper_url: http://arxiv.org/abs/2312.02998
repo_url: https://github.com/Committing/personalitypolice.com_public
paper_authors: Byunggu Yu, Junwhan Kim
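for: Explores how large language models (LLMs) can be aligned with human users, going beyond basic alignment to propose "personality alignment" for language models in organizational settings.
methods: Acknowledging that training methods shape undefined personality traits in AI models, the study draws parallels with human fitting processes that use personality tests. An original case study demonstrates the need for personality fine-tuning and raises questions about applying human-designed tests to AIs, engineering specialized AI personality tests, and shaping AI personalities.
results: The paper offers a starting point for discussion and development in AI personality alignment and a foundation for future work on human-machine teaming and co-existence.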

Abstract
This research paper delves into the evolving landscape of fine-tuning large language models (LLMs) to align with human users, extending beyond basic alignment to propose "personality alignment" for language models in organizational settings. Acknowledging the impact of training methods on the formation of undefined personality traits in AI models, the study draws parallels with human fitting processes using personality tests. Through an original case study, we demonstrate the necessity of personality fine-tuning for AIs and raise intriguing questions about applying human-designed tests to AIs, engineering specialized AI personality tests, and shaping AI personalities to suit organizational roles. The paper serves as a starting point for discussions and developments in the burgeoning field of AI personality alignment, offering a foundational anchor for future exploration in human-machine teaming and co-existence.

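Summary
This paper examines the evolving practice of fine-tuning large language models (LLMs) to align with human users, moving from basic alignment to "personality alignment" for language models in organizational settings. Acknowledging that training methods influence the formation of undefined personality traits in AI models, the study draws on human fitting processes that use personality tests. Through an original case study, the authors demonstrate the need for personality fine-tuning of AI models and raise questions about applying human-designed tests to AIs, engineering specialized AI personality tests, and shaping AI personalities to suit organizational roles. The paper provides a foundational anchor for future exploration of human-machine teaming and co-existence.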

BenchMARL: Benchmarking Multi-Agent Reinforcement Learning

paper_url: http://arxiv.org/abs/2312.01472
repo_url: https://github.com/facebookresearch/benchmarl
paper_authors: Matteo Bettini, Amanda Prorok, Vincent Moens
for: Addresses the reproducibility crisis in Multi-Agent Reinforcement Learning (MARL) by introducing BenchMARL, a training library for standardized benchmarking.
methods: BenchMARL uses TorchRL as its backend, which gives it high performance and maintained state-of-the-art implementations. Its design enables systematic configuration and reporting, so complex benchmarks can be created and run from simple one-line inputs.
results: BenchMARL is the first MARL training library that enables standardized benchmarking across different algorithms, models, and environments. It is open-sourced on GitHub for the broad community of MARL PyTorch users.

Abstract
The field of Multi-Agent Reinforcement Learning (MARL) is currently facing a reproducibility crisis. While solutions for standardized reporting have been proposed to address the issue, we still lack a benchmarking tool that enables standardization and reproducibility, while leveraging cutting-edge Reinforcement Learning (RL) implementations. In this paper, we introduce BenchMARL, the first MARL training library created to enable standardized benchmarking across different algorithms, models, and environments. BenchMARL uses TorchRL as its backend, granting it high performance and maintained state-of-the-art implementations while addressing the broad community of MARL PyTorch users. Its design enables systematic configuration and reporting, thus allowing users to create and run complex benchmarks from simple one-line inputs. BenchMARL is open-sourced on GitHub: https://github.com/facebookresearch/BenchMARL

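Summary
The field of Multi-Agent Reinforcement Learning (MARL) is facing a reproducibility crisis. Solutions for standardized reporting have been proposed, but the field still lacks a benchmarking tool that enables standardization and reproducibility while building on cutting-edge Reinforcement Learning (RL) implementations. The paper introduces BenchMARL, the first MARL training library created for standardized benchmarking across different algorithms, models, and environments. BenchMARL uses TorchRL as its backend, which gives it high performance and maintained state-of-the-art implementations and serves the broad community of MARL PyTorch users. Its design supports systematic configuration and reporting, so users can create and run complex benchmarks from simple one-line inputs. BenchMARL is open-sourced on GitHub: https://github.com/facebookresearch/BenchMARL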

Exploring Adversarial Robustness of LiDAR-Camera Fusion Model in Autonomous Driving

paper_url: http://arxiv.org/abs/2312.01468
repo_url: None
paper_authors: Bo Yang, Xiaoyu Ji, Xiaoyu Ji, Xiaoyu Ji, Xiaoyu Ji
for: This paper assesses the adversarial robustness of LiDAR-camera fusion models in 3D object detection, with a focus on safety concerns in autonomous driving.
methods: The paper introduces an attack technique that adds a limited number of physically constrained adversarial points above a car to deceive the fusion model, without changing the image data channel.
results: Experimental results show that the fusion model can be deceived solely by manipulating the LiDAR data channel, raising safety concerns in autonomous driving. The paper also explores the effects of various factors on the attack success rate.

Abstract
Our study assesses the adversarial robustness of LiDAR-camera fusion models in 3D object detection. We introduce an attack technique that, by simply adding a limited number of physically constrained adversarial points above a car, can make the car undetectable by the fusion model. Experimental results reveal that even without changes to the image data channel, the fusion model can be deceived solely by manipulating the LiDAR data channel. This finding raises safety concerns in the field of autonomous driving. Further, we explore how the quantity of adversarial points, the distance between the front-near car and the LiDAR-equipped car, and various angular factors affect the attack success rate. We believe our research can contribute to the understanding of multi-sensor robustness, offering insights and guidance to enhance the safety of autonomous driving.

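Summary
The study assesses the adversarial robustness of LiDAR-camera fusion models in 3D object detection. It introduces an attack that makes a car undetectable by the fusion model simply by adding a limited number of physically constrained adversarial points above the car. Experiments show that the fusion model can be deceived by manipulating the LiDAR data channel alone, with no change to the image data channel, which raises safety concerns for autonomous driving. The authors also examine how the number of adversarial points, the distance between the front-near car and the LiDAR-equipped car, and various angular factors affect the attack success rate. They expect the work to improve the understanding of multi-sensor robustness and to offer guidance for making autonomous driving safer.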

D-Bot: Database Diagnosis System using Large Language Models

paper_url: http://arxiv.org/abs/2312.01454
repo_url: https://github.com/tsinghuadatabasegroup/db-gpt
paper_authors: Xuanhe Zhou, Guoliang Li, Zhaoyan Sun, Zhiyuan Liu, Weize Chen, Jianming Wu, Jiesi Liu, Ruohang Feng, Guoyang Zeng
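for: Helps database administrators (DBAs) manage, maintain, and optimize database systems, improving their efficiency and response time.
methods: Uses large language models (LLMs) to acquire knowledge automatically from diagnosis documents and to generate well-founded diagnosis reports (root causes and solutions) within an acceptable time (e.g., under 10 minutes).
results: Verified on real benchmarks, D-Bot effectively analyzes the root causes of unseen anomalies and significantly outperforms traditional methods and vanilla models such as GPT-4.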

Abstract
Database administrators (DBAs) play an important role in managing, maintaining and optimizing database systems. However, it is hard and tedious for DBAs to manage a large number of databases and give timely response (waiting for hours is intolerable in many online cases). In addition, existing empirical methods only support limited diagnosis scenarios, which are also labor-intensive to update the diagnosis rules for database version updates. Recently large language models (LLMs) have shown great potential in various fields. Thus, we propose D-Bot, an LLM-based database diagnosis system that can automatically acquire knowledge from diagnosis documents, and generate reasonable and well-founded diagnosis report (i.e., identifying the root causes and solutions) within acceptable time (e.g., under 10 minutes compared to hours by a DBA). The techniques in D-Bot include (i) offline knowledge extraction from documents, (ii) automatic prompt generation (e.g., knowledge matching, tool retrieval), (iii) root cause analysis using tree search algorithm, and (iv) collaborative mechanism for complex anomalies with multiple root causes. We verify D-Bot on real benchmarks (including 539 anomalies of six typical applications), and the results show that D-Bot can effectively analyze the root causes of unseen anomalies and significantly outperforms traditional methods and vanilla models like GPT-4.

Summary
D-Bot combines four techniques:
1. Offline knowledge extraction from documents
2. Automatic prompt generation (e.g., knowledge matching, tool retrieval)
3. Root cause analysis using tree search algorithms
4. A collaborative mechanism for complex anomalies with multiple root causes

D-Bot is verified on real benchmarks (including 539 anomalies of six typical applications), and the results show that it can effectively analyze the root causes of unseen anomalies and significantly outperforms traditional methods and vanilla models like GPT-4.

Foveation in the Era of Deep Learning

paper_url: http://arxiv.org/abs/2312.01450
repo_url: https://github.com/georgekillick90/fovconvnext
paper_authors: George Killick, Paul Henderson, Paul Siebert, Gerardo Aragon-Camarasa
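for: Tackles the challenge of actively attending to visual scenes using a foveated sensor.
methods: Introduces an end-to-end differentiable foveated active vision architecture that uses a graph convolutional network to process foveated images, together with a simple yet effective formulation for foveated image sampling. The model learns to attend iteratively to the image regions relevant for classification.
results: Detailed experiments on a variety of image datasets compare the method with previous approaches and test how choices such as the degree of foveation and the number of fixations affect performance. The model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters under a given pixel or computation budget.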

Abstract
In this paper, we tackle the challenge of actively attending to visual scenes using a foveated sensor. We introduce an end-to-end differentiable foveated active vision architecture that leverages a graph convolutional network to process foveated images, and a simple yet effective formulation for foveated image sampling. Our model learns to iteratively attend to regions of the image relevant for classification. We conduct detailed experiments on a variety of image datasets, comparing the performance of our method with previous approaches to foveated vision while measuring how the impact of different choices, such as the degree of foveation, and the number of fixations the network performs, affect object recognition performance. We find that our model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters and a given pixel or computation budget

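Summary
The paper tackles the challenge of actively attending to visual scenes using a foveated sensor. It introduces an end-to-end differentiable foveated active vision architecture that processes foveated images with a graph convolutional network, along with a simple yet effective formulation for foveated image sampling. The model learns to attend iteratively to the regions of the image relevant for classification. Detailed experiments on several image datasets compare the method with earlier approaches and measure how choices such as the degree of foveation and the number of fixations affect object recognition performance. Under a given pixel or computation budget, the model outperforms a state-of-the-art CNN and foveated vision architectures with a comparable number of parameters.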

Learning Curricula in Open-Ended Worlds

paper_url: http://arxiv.org/abs/2312.03126
repo_url: https://github.com/facebookresearch/dcd
paper_authors: Minqi Jiang
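for: Improves the open-endedness and generality of RL agents.
methods: Automatically generates a curriculum of training environments at the frontier of the learning agent's capabilities.
results: RL agents show improved robustness and generalization, performing better in previously unseen environments.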

Abstract
Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real world is highly open-ended, featuring endlessly evolving environments and challenges, making such RL approaches unsuitable. Simply randomizing over simulated environments is insufficient, as it requires making arbitrary distributional assumptions and can be combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which aim to produce such open-ended processes. Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments at the frontier of the learning agent's capabilities. Through extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that achieve more general intelligence by continually generating and mastering additional challenges of their own design.
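
The regret criterion behind such a curriculum can be shown with a small sketch. It is an illustration only and not the thesis's algorithms: the level names and return values are made up, and the optimal return is given directly here, whereas a practical method would have to estimate it.

```python
def regret(optimal_return, agent_return):
    """How far the agent falls short of the best achievable return on one level."""
    return optimal_return - agent_return


def next_training_level(levels):
    """Pick the level with the highest regret as the next training environment."""
    return max(levels, key=lambda lv: regret(lv["optimal_return"], lv["agent_return"]))


# Hypothetical levels with made-up returns.
levels = [
    {"name": "empty_room", "optimal_return": 1.0, "agent_return": 0.95},  # already mastered
    {"name": "small_maze", "optimal_return": 1.0, "agent_return": 0.40},  # solvable, not yet mastered
    {"name": "sealed_maze", "optimal_return": 0.0, "agent_return": 0.0},  # unsolvable, so zero regret
]
chosen = next_training_level(levels)
print(chosen["name"])
```

Ranking by regret rather than by raw failure avoids two degenerate choices: levels the agent has already mastered and levels that no policy can solve both score near zero, so training effort goes to levels that are solvable but not yet solved.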

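Summary
Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. Because collecting real-world interactions can bring extra cost and safety risks, the common sim2real paradigm trains in a simulator and then deploys in the real world. RL agents easily overfit to the chosen simulated environments, however, and learning ends once the agent masters that specific set. The real world is highly open-ended, with endlessly evolving environments and challenges, which makes such approaches unsuitable. Simply randomizing over simulated environments is insufficient: it requires arbitrary distributional assumptions and can be combinatorially unlikely to sample the specific environment instances that are useful for learning. An ideal learning process should adapt the training environment automatically to maximize the agent's learning potential over an open-ended task space that matches or surpasses the complexity of the real world. The thesis develops a class of methods called Unsupervised Environment Design (UED), which automatically generate an infinite sequence, or curriculum, of training environments. Empirical studies and theoretical arguments based on minimax-regret decision theory and game theory show that UED autocurricula produce agents with significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are a promising path toward open-ended learning systems that reach more general intelligence by continually generating and mastering new challenges.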

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

paper_url: http://arxiv.org/abs/2312.01409
repo_url: None
paper_authors: Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein
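for: Helps users create high-quality computer-generated videos by combining the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models.
methods: Takes an animated, low-fidelity rendered mesh as input and injects ground-truth correspondence information into various stages of a pre-trained text-to-image generation model to produce high-quality, temporally consistent frames.
results: Demonstrated on examples with varied motion and camera paths, the approach produces high-quality, temporally consistent video while leaving users free to apply their own creativity.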

Abstract
Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models. Despite great promise, video diffusion models are difficult to control, hindering a user to apply their own creativity rather than amplifying it. To address this challenge, we present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models. For this purpose, our approach takes an animated, low-fidelity rendered mesh as input and injects the ground truth correspondence information obtained from the dynamic mesh into various stages of a pre-trained text-to-image generation model to output high-quality and temporally consistent frames. We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path.

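Summary
Traditional 3D content creation tools give users direct control over a scene's geometry, appearance, motion, and camera path, but creating computer-generated video is a tedious manual process that emerging text-to-video diffusion models could automate. These diffusion models are hard to control, however, which constrains users' creativity instead of amplifying it. To address this challenge, we propose a new approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models. Our approach takes an animated, low-fidelity rendered mesh as input and injects the ground-truth correspondence information from the dynamic mesh into various stages of a pre-trained text-to-image generation model to produce high-quality, temporally consistent frames. We demonstrate it on various examples where motion is obtained by animating rigged assets or changing the camera path.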

Towards Mitigating Perceived Unfairness in Contracts from a Non-Legal Stakeholder’s Perspective

paper_url: http://arxiv.org/abs/2312.01398
repo_url: None
paper_authors: Anmol Singhal, Preethu Rose Anish, Shirish Karande, Smita Ghaisas
for: Identifying potentially unfair clauses in commercial contracts, using Pre-trained Language Models (PLMs) to detect unfairness in contractual sentences.
methods: An empirical study of stakeholder perspectives on contractual fairness, followed by a comparison of chain-of-thought prompting and semi-supervised fine-tuning for identifying unfairness in contractual sentences.
results: BERT-based fine-tuning achieves an accuracy of 84% on a dataset of proprietary contracts and outperforms chain-of-thought prompting with Vicuna-13B by a margin of 9%.

Abstract
Commercial contracts are known to be a valuable source for deriving project-specific requirements. However, contract negotiations mainly occur among the legal counsel of the parties involved. The participation of non-legal stakeholders, including requirement analysts, engineers, and solution architects, whose primary responsibility lies in ensuring the seamless implementation of contractual terms, is often indirect and inadequate. Consequently, a significant number of sentences in contractual clauses, though legally accurate, can appear unfair from an implementation perspective to non-legal stakeholders. This perception poses a problem since requirements indicated in the clauses are obligatory and can involve punitive measures and penalties if not implemented as committed in the contract. Therefore, the identification of potentially unfair clauses in contracts becomes crucial. In this work, we conduct an empirical study to analyze the perspectives of different stakeholders regarding contractual fairness. We then investigate the ability of Pre-trained Language Models (PLMs) to identify unfairness in contractual sentences by comparing chain of thought prompting and semi-supervised fine-tuning approaches. Using BERT-based fine-tuning, we achieved an accuracy of 84% on a dataset consisting of proprietary contracts. It outperformed chain of thought prompting using Vicuna-13B by a margin of 9%.

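Summary
Commercial contracts are a valuable source for deriving project-specific requirements. However, contract negotiations are mainly conducted by the parties' legal counsel, while the participation of non-legal stakeholders, including requirement analysts, engineers, and solution architects, whose primary responsibility is to ensure that contractual terms are implemented smoothly, is often indirect and inadequate. As a result, some sentences in contractual clauses, though legally accurate, can appear unfair from an implementation perspective. This is a problem because the requirements stated in the clauses are obligatory and can carry penalties if not implemented as committed. Identifying potentially unfair sentences in contracts therefore becomes an important task. In this work, we conduct an empirical study to analyze how different stakeholders perceive contractual fairness. We then investigate the ability of Pre-trained Language Models (PLMs) to identify unfairness in contractual sentences, comparing chain-of-thought prompting with semi-supervised fine-tuning. BERT-based fine-tuning achieved 84% accuracy on a dataset of proprietary contracts, outperforming chain-of-thought prompting with Vicuna-13B by a margin of 9%.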

DiFace: Cross-Modal Face Recognition through Controlled Diffusion

paper_url: http://arxiv.org/abs/2312.01367
repo_url: None
paper_authors: Bowen Sun, Shibao Zheng
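for: Addresses the challenges of face recognition from textual descriptions.
methods: Uses a controllable diffusion process, with a theoretical connection established to probability transport, to perform face recognition from text.
results: Experiments on verification and identification show, to the best of the authors' knowledge, significant accuracy in text-to-image face recognition for the first time.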

Abstract
Diffusion probabilistic models (DPMs) have exhibited exceptional proficiency in generating visual media of outstanding quality and realism. Nonetheless, their potential in non-generative domains, such as face recognition, has yet to be thoroughly investigated. Meanwhile, despite the extensive development of multi-modal face recognition methods, their emphasis has predominantly centered on visual modalities. In this context, face recognition through textual description presents a unique and promising solution that not only transcends the limitations from application scenarios but also expands the potential for research in the field of cross-modal face recognition. It is regrettable that this avenue remains unexplored and underutilized, a consequence from the challenges mainly associated with three aspects: 1) the intrinsic imprecision of verbal descriptions; 2) the significant gaps between texts and images; and 3) the immense hurdle posed by insufficient databases. To tackle this problem, we present DiFace, a solution that effectively achieves face recognition via text through a controllable diffusion process, by establishing its theoretical connection with probability transport. Our approach not only unleashes the potential of DPMs across a broader spectrum of tasks but also achieves, to the best of our knowledge, a significant accuracy in text-to-image face recognition for the first time, as demonstrated by our experiments on verification and identification.

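Summary
Diffusion probabilistic models (DPMs) have shown an exceptional ability to generate visual media of high quality and realism. Their potential in non-generative domains such as face recognition, however, has yet to be thoroughly explored. Meanwhile, although multi-modal face recognition methods have been developed extensively, they have focused mainly on visual modalities. In this context, face recognition from textual descriptions is a unique and promising solution that both overcomes the limitations of application scenarios and broadens research on cross-modal face recognition. This avenue remains underexplored, mainly because of three challenges: 1) the inherent imprecision of verbal descriptions; 2) the large gap between texts and images; and 3) the lack of databases. To tackle this problem, we present DiFace, which achieves face recognition from text through a controllable diffusion process. Our approach not only extends DPMs to a broader range of tasks but also achieves, to the best of our knowledge, significant accuracy in text-to-image face recognition for the first time, as shown by experiments on verification and identification.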

Analyze the robustness of three NMF algorithms (Robust NMF with L1 norm, L2-1 norm NMF, L2 NMF)

paper_url: http://arxiv.org/abs/2312.01357
repo_url: None
paper_authors: Cheng Zeng, Jiaqi Tian, Yixuan Xu
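for: Studies the robustness of non-negative matrix factorization (NMF) under different types of noise.
methods: Applies three NMF algorithms (L1 NMF, L2 NMF, and L21 NMF) to the ORL and YaleB datasets corrupted with salt-and-pepper noise and block-occlusion noise, and evaluates them with RMSE, ACC, and NMI.
results: The evaluation metrics quantify each algorithm's resistance to noise and give insight into its feasibility in practical applications.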

Abstract
Non-negative matrix factorization (NMF) and its variants have been widely employed in clustering and classification tasks (Long & Jian, 2021). However, noise can seriously affect the results of our experiments. Our research is dedicated to investigating the noise robustness of non-negative matrix factorization (NMF) in the face of different types of noise. Specifically, we adopt three different NMF algorithms, namely L1 NMF, L2 NMF, and L21 NMF, and use the ORL and YaleB data sets to simulate a series of experiments with salt-and-pepper noise and block-occlusion noise separately. In the experiment, we use a variety of evaluation indicators, including root mean square error (RMSE), accuracy (ACC), and normalized mutual information (NMI), to evaluate the performance of different NMF algorithms in noisy environments. Through these indicators, we quantify the resistance of NMF algorithms to noise and gain insights into their feasibility in practical applications.

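Summary
Non-negative matrix factorization (NMF) and its variants are widely used in clustering and classification tasks (Long & Jian, 2021). However, noise can seriously affect experimental results. Our research investigates the robustness of NMF under different types of noise. Specifically, we adopt three NMF algorithms, L1 NMF, L2 NMF, and L21 NMF, and use the ORL and YaleB datasets to run a series of experiments with salt-and-pepper noise and block-occlusion noise, applied separately. We use several evaluation metrics, including root mean square error (RMSE), accuracy (ACC), and normalized mutual information (NMI), to assess each algorithm in noisy settings. These metrics let us quantify the algorithms' resistance to noise and gauge their feasibility in practical applications.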
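
As a rough illustration of the evaluation setup described in this entry, the sketch below corrupts a small image with salt-and-pepper noise and scores an estimate against the clean image with RMSE. The function names, the noise probability `p`, and the even split between salt and pepper are assumptions made for illustration; the paper's actual noise parameters and RMSE normalization are not stated in the abstract.

```python
import math
import random


def add_salt_and_pepper(image, p, rng):
    """Return a noisy copy of `image` (rows of floats in [0, 1]).

    Each pixel is replaced with probability `p` by 1.0 (salt) or 0.0 (pepper).
    The 50/50 salt-to-pepper split is an assumption, not the paper's setting.
    """
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            if rng.random() < p:
                new_row.append(1.0 if rng.random() < 0.5 else 0.0)
            else:
                new_row.append(pixel)
        noisy.append(new_row)
    return noisy


def rmse(reference, estimate):
    """Root mean square error between two equally sized images."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    return math.sqrt(squared_error / count)


clean = [[0.2, 0.4], [0.6, 0.8]]
noisy = add_salt_and_pepper(clean, p=0.5, rng=random.Random(0))
# In a robustness study the estimate would be an NMF reconstruction of the
# noisy data; here the noisy image itself stands in for it.
print(rmse(clean, noisy))
```

A lower RMSE between the clean image and the reconstruction from noisy input would indicate a more noise-robust factorization under this kind of measurement.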

Honesty Is the Best Policy: Defining and Mitigating AI Deception

paper_url: http://arxiv.org/abs/2312.01350
repo_url: None
paper_authors: Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt
for: Addresses the problem of deceptive agents in AI systems, which threaten safety, trustworthiness, and cooperation.
methods: Introduces a formal definition of deception in structural causal games, grounded in the philosophy literature and applicable to real-world machine learning systems.
results: Examples and results show that the formal definition aligns with the philosophical and commonsense meaning of deception, and experiments show that the graphical criteria for deception can be used to mitigate deception in reinforcement learning agents and language models.

Abstract
Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a formal definition of deception in structural causal games, grounded in the philosophy literature, and applicable to real-world machine learning systems. Several examples and results illustrate that our formal definition aligns with the philosophical and commonsense meaning of deception. Our main technical result is to provide graphical criteria for deception. We show, experimentally, that these results can be used to mitigate deception in reinforcement learning agents and language models.

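Summary
Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our language-model experiments, the goal of being evaluated as truthful). The literature on game theory and symbolic AI offers several definitions of deception, but there is no overarching theory of deception for learning agents in games. We introduce a formal definition of deception in structural causal games, grounded in the philosophy literature and applicable to real-world machine learning systems. Several examples and results show that the definition aligns with the philosophical and commonsense meaning of deception. Our main technical result is a set of graphical criteria for deception. We show experimentally that these results can be used to mitigate deception in reinforcement learning agents and language models.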

tsMorph: generation of semi-synthetic time series to understand algorithm performance

paper_url: http://arxiv.org/abs/2312.01344
repo_url: None
paper_authors: Moisés Santos, André de Carvalho, Carlos Soares
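for: Investigates how time series forecasting methods perform under different conditions.
methods: Uses dataset morphing to generate semi-synthetic time series by creating a sequence of datasets between two original datasets.
results: Experiments show that the Long Short-Term Memory Network forecasting algorithm performs better as the frequency of the time series increases, supporting tsMorph as a useful tool for studying forecasting methods.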

Abstract
Time series forecasting is a subject of significant scientific and industrial importance. Despite the widespread utilization of forecasting methods, there is a dearth of research aimed at comprehending the conditions under which these methods yield favorable or unfavorable performances. Empirical studies, although common, encounter challenges due to the limited availability of datasets, impeding the extraction of reliable insights. To address this, we present tsMorph, a straightforward approach for generating semi-synthetic time series through dataset morphing. tsMorph operates by creating a sequence of datasets derived from two original datasets. These newly generated datasets exhibit a progressive departure from the characteristics of one dataset and a convergence toward the attributes of the other. This method provides a valuable alternative for obtaining substantial datasets. In this paper, we demonstrate the utility of tsMorph by assessing the performance of the Long Short-Term Memory Network forecasting algorithm. The time series under examination are sourced from the NN5 Competition. The findings reveal compelling insights. Notably, the performance of the Long Short-Term Memory Network improves proportionally with the frequency of the time series. These experiments affirm that tsMorph serves as an effective tool for gaining an understanding of forecasting algorithm behaviors, offering a pathway to overcome the limitations posed by empirical studies and enabling more extensive and reliable experimentation.

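Summary
Time series forecasting is of significant scientific and industrial importance. Although forecasting methods are widely used, little research aims to understand the conditions under which they perform well or poorly. Empirical studies are constrained by the limited availability of datasets, which makes it hard to extract reliable insights. To address this, we present tsMorph, a method that generates semi-synthetic time series from two original datasets. The generated datasets progressively depart from the characteristics of one dataset and converge toward those of the other, which provides a way to obtain a substantial number of datasets. In this paper, we demonstrate the utility of tsMorph by assessing the Long Short-Term Memory Network forecasting algorithm on time series from the NN5 Competition. We find that the performance of the Long Short-Term Memory Network improves with the frequency of the time series. These experiments indicate that tsMorph is an effective tool for understanding the behavior of forecasting algorithms and enables more extensive and reliable experimentation.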
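
The idea of a sequence of series that drifts from one source toward another can be sketched as below. This is only a minimal illustration that assumes two equal-length series and a linear (convex-combination) morph; the name `morph_series` and the interpolation rule are hypothetical, and the transformation tsMorph actually applies is not specified in the abstract.

```python
def morph_series(source, target, steps):
    """Return `steps` series that move linearly from `source` to `target`.

    Assumption: both series have the same length, and intermediate series
    are convex combinations (1 - alpha) * source + alpha * target.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    morphed = []
    for k in range(steps):
        alpha = k / (steps - 1)  # 0.0 at the source, 1.0 at the target
        morphed.append(
            [(1 - alpha) * s + alpha * t for s, t in zip(source, target)]
        )
    return morphed


for series in morph_series([0.0, 0.0], [2.0, 4.0], steps=3):
    print(series)
```

Each intermediate series could then be passed to a forecasting model to see how its error changes as the data characteristics shift between the two sources.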

AI-Powered Arabic Crossword Puzzle Generation for Educational Applications

paper_url: http://arxiv.org/abs/2312.01339
repo_url: None
paper_authors: Kamyar Zeinalipour, Mohamed Zaky Saad, Marco Maggini, Marco Gori
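for: Develops an Arabic crossword puzzle generator driven by advanced AI technology, intended as an educational tool.
methods: Uses large language models, including GPT4, GPT3-Davinci, GPT3-Curie, GPT3-Babbage, GPT3-Ada, and BERT, to generate distinctive and challenging clues. Fine-tuning, few/zero-shot learning strategies, and rigorous quality-checking protocols are used to keep the generated clue-answer pairs high in quality.
results: The system generates high-quality educational crosswords intended to support learning and problem-solving skills and to enrich the learning experience.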

Abstract
This paper presents the first Arabic crossword puzzle generator driven by advanced AI technology. Leveraging cutting-edge large language models including GPT4, GPT3-Davinci, GPT3-Curie, GPT3-Babbage, GPT3-Ada, and BERT, the system generates distinctive and challenging clues. Based on a dataset comprising over 50,000 clue-answer pairs, the generator employs fine-tuning, few/zero-shot learning strategies, and rigorous quality-checking protocols to enforce the generation of high-quality clue-answer pairs. Importantly, educational crosswords contribute to enhancing memory, expanding vocabulary, and promoting problem-solving skills, thereby augmenting the learning experience through a fun and engaging approach, reshaping the landscape of traditional learning methods. The overall system can be exploited as a powerful educational tool that amalgamates AI and innovative learning techniques, heralding a transformative era for Arabic crossword puzzles and the intersection of technology and education.

Facial Emotion Recognition Under Mask Coverage Using a Data Augmentation Technique

paper_url: http://arxiv.org/abs/2312.01335
repo_url: https://github.com/areffarhadi/masked_face_emotion_recognition
paper_authors: Aref Farhadipour, Pouya Taghipour
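for: Develops an AI-based computer vision system that recognizes the emotions of people wearing different face masks.
methods: Proposes a data augmentation technique that uses four mask types per face image to improve model performance. Four convolutional neural networks (Alexnet, Squeezenet, Resnet50, and VGGFace2) are trained with transfer learning.
results: The model works more effectively in multi-mask mode than in single-mask mode. On the JAFFE dataset, VGGFace2 achieved 97.82% accuracy in person-dependent mode and 74.21% in person-independent mode. The system is further evaluated with precision, sensitivity, specificity, AUC, F1 score, and the confusion matrix, and the LIME algorithm is used to visualize the CNNs' decision-making.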

Abstract
Identifying human emotions using AI-based computer vision systems, when individuals wear face masks, presents a new challenge in the current Covid-19 pandemic. In this study, we propose a facial emotion recognition system capable of recognizing emotions from individuals wearing different face masks. A novel data augmentation technique was utilized to improve the performance of our model using four mask types for each face image. We evaluated the effectiveness of four convolutional neural networks, Alexnet, Squeezenet, Resnet50 and VGGFace2 that were trained using transfer learning. The experimental findings revealed that our model works effectively in multi-mask mode compared to single-mask mode. The VGGFace2 network achieved the highest accuracy rate, with 97.82% for the person-dependent mode and 74.21% for the person-independent mode using the JAFFE dataset. However, we evaluated our proposed model using the UIBVFED dataset. The Resnet50 has demonstrated superior performance, with accuracies of 73.68% for the person-dependent mode and 59.57% for the person-independent mode. Moreover, we employed metrics such as precision, sensitivity, specificity, AUC, F1 score, and confusion matrix to measure our system's efficiency in detail. Additionally, the LIME algorithm was used to visualize CNN's decision-making strategy.
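
Several of the metrics named in the abstract follow directly from confusion-matrix counts. The sketch below shows the standard formulas for a single class treated one-vs-rest; the function name and the counts are made up for illustration and are not results from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute precision, sensitivity, specificity, and F1 from counts.

    Assumes every denominator is nonzero, i.e. the class has at least one
    predicted positive, one actual positive, and one actual negative.
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Illustrative counts for one emotion class versus all others.
print(binary_metrics(tp=6, fp=2, fn=4, tn=8))
```

For a multi-class emotion recognizer, these values would typically be computed per class and then averaged.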


JarviX: A LLM No code Platform for Tabular Data Analysis and Optimization

paper_url: http://arxiv.org/abs/2312.02213
repo_url: None
paper_authors: Shang-Ching Liu, ShengKun Wang, Wenqi Lin, Chung-Wei Hsiung, Yi-Chen Hsieh, Yu-Ping Cheng, Sian-Hong Luo, Tsungyao Chang, Jianwei Zhang
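for: Presents JarviX, an intelligent data analytics framework for automated data analysis and visualization.
methods: Uses large language models (LLMs) for high-precision analysis and visualization of tabular data, together with an automated machine learning (AutoML) pipeline for predictive modeling, which supports optimizing machine configuration.
results: Practical use case studies support the efficacy and adaptability of JarviX, which automatically generates data insight summaries, analysis questions, and explanations of results.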

Abstract
In this study, we introduce JarviX, a sophisticated data analytics framework. JarviX is designed to employ Large Language Models (LLMs) to provide automated guidance and execute high-precision data analyses on tabular datasets. This framework emphasizes the significance of varying column types, capitalizing on state-of-the-art LLMs to generate concise data insight summaries, propose relevant analysis inquiries, visualize data effectively, and provide comprehensive explanations for results drawn from an extensive data analysis pipeline. Moreover, JarviX incorporates an automated machine learning (AutoML) pipeline for predictive modeling. This integration forms a comprehensive and automated optimization cycle, which proves particularly advantageous for optimizing machine configuration. The efficacy and adaptability of JarviX are substantiated through a series of practical use case studies.

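Summary
In this study, we introduce JarviX, a sophisticated data analytics framework. JarviX uses Large Language Models (LLMs) to generate concise data insight summaries, propose relevant analysis questions, visualize data effectively, and provide comprehensive explanations of the results produced by the data analysis pipeline. JarviX also incorporates an automated machine learning (AutoML) pipeline for predictive modeling. Together these form a comprehensive, automated optimization cycle that is particularly advantageous for optimizing machine configuration. The efficacy and adaptability of JarviX are substantiated through a series of practical use case studies.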

ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models

paper_url: http://arxiv.org/abs/2312.01305
repo_url: None
paper_authors: Jeong-gi Kwak, Erqun Dong, Yuhe Jin, Hanseok Ko, Shweta Mahajan, Kwang Moo Yi
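for: Generating novel views of an object from a single image, a challenging task that requires understanding the object's underlying 3D structure and rendering high-quality, spatially consistent new views.
methods: Uses a pre-trained video diffusion model. The key idea is to reformulate novel-view synthesis as synthesizing a video of a camera going around the object (a scanning video), which makes it possible to leverage the strong priors learned by a video diffusion model.
results: By creating a smooth camera trajectory to the target view and denoising with both a view-conditioned diffusion model and a video diffusion model, the method obtains highly consistent novel view synthesis that outperforms the state of the art.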

Abstract
Generating novel views of an object from a single image is a challenging task. It requires an understanding of the underlying 3D structure of the object from an image and rendering high-quality, spatially consistent new views. While recent methods for view synthesis based on diffusion have shown great progress, achieving consistency among various view estimates and at the same time abiding by the desired camera pose remains a critical problem yet to be solved. In this work, we demonstrate a strikingly simple method, where we utilize a pre-trained video diffusion model to solve this problem. Our key idea is that synthesizing a novel view could be reformulated as synthesizing a video of a camera going around the object of interest -- a scanning video -- which then allows us to leverage the powerful priors that a video diffusion model would have learned. Thus, to perform novel-view synthesis, we create a smooth camera trajectory to the target view that we wish to render, and denoise using both a view-conditioned diffusion model and a video diffusion model. By doing so, we obtain a highly consistent novel view synthesis, outperforming the state of the art.


Churn Prediction via Multimodal Fusion Learning:Integrating Customer Financial Literacy, Voice, and Behavioral Data

paper_url: http://arxiv.org/abs/2312.01301
repo_url: None
paper_authors: David Hason Rudd, Huan Huo, Md Rafiqul Islam, Guandong Xu
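for: Improves the accuracy of customer churn prediction for businesses facing customer retention problems.
methods: Uses a multimodal fusion learning model that integrates customer sentiment, financial literacy (FL) level, and financial behavioral data to make churn prediction more accurate and bias-free. A SMOGN COREG supervised model gauges customer FL levels, an ensemble artificial neural network with oversampling predicts churn propensity, and a speech emotion recognition model based on a pre-trained CNN-VGG16 recognizes customer emotions.
results: Fusing multimodal data improves churn prediction. Compared with late fusion and the baseline model, the hybrid fusion model achieves a test accuracy of 91.2%, a Mean Average Precision (MAP) score of 66, and a macro-averaged F1 score of 54. The analysis also shows a positive correlation between negative emotions, low FL scores, and high-risk customers.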

Abstract
In today's competitive landscape, businesses grapple with customer retention. Churn prediction models, although beneficial, often lack accuracy due to the reliance on a single data source. The intricate nature of human behavior and high dimensional customer data further complicate these efforts. To address these concerns, this paper proposes a multimodal fusion learning model for identifying customer churn risk levels in financial service providers. Our multimodal approach integrates customer sentiments, financial literacy (FL) level, and financial behavioral data, enabling more accurate and bias-free churn prediction models. The proposed FL model utilizes a SMOGN COREG supervised model to gauge customer FL levels from their financial data. The baseline churn model applies an ensemble artificial neural network and oversampling techniques to predict churn propensity in high-dimensional financial data. We also incorporate a speech emotion recognition model employing a pre-trained CNN-VGG16 to recognize customer emotions based on pitch, energy, and tone. To integrate these diverse features while retaining unique insights, we introduced late and hybrid fusion techniques that complementarily boost coordinated multimodal co-learning. Robust metrics were utilized to evaluate the proposed multimodal fusion model and hence the approach validity, including mean average precision and macro-averaged F1 score. Our novel approach demonstrates a marked improvement in churn prediction, achieving a test accuracy of 91.2%, a Mean Average Precision (MAP) score of 66, and a Macro-Averaged F1 score of 54 through the proposed hybrid fusion learning technique compared with late fusion and baseline models. Furthermore, the analysis demonstrates a positive correlation between negative emotions, low FL scores, and high-risk customers.

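Summary
In today's competitive landscape, businesses struggle with customer retention. Churn prediction models, although useful, often lack accuracy because they rely on a single data source. The complexity of human behavior and high-dimensional customer data make the task harder still. To address these issues, this paper proposes a multimodal fusion learning model for predicting customer churn risk levels at financial service providers. Our multimodal approach integrates customer sentiment, financial literacy (FL) level, and financial behavioral data, enabling more accurate and bias-free churn prediction. The FL model uses a SMOGN COREG supervised model to gauge customers' FL levels from their financial data. The baseline churn model applies an ensemble artificial neural network and oversampling techniques to predict churn propensity in high-dimensional financial data. We also incorporate a speech emotion recognition model that uses a pre-trained CNN-VGG16 to recognize customer emotions from pitch, energy, and tone. To integrate these diverse features while retaining the unique insights of each, we introduce late and hybrid fusion techniques that complement each other to improve multimodal co-learning. We evaluate the fusion model with robust metrics, including Mean Average Precision (MAP) and macro-averaged F1 score. The hybrid fusion approach achieves a test accuracy of 91.2%, a MAP score of 66, and a macro-averaged F1 score of 54, a marked improvement over the baseline and late fusion models. The analysis also shows a positive correlation between negative emotions, low FL scores, and high-risk customers.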
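
Late fusion, one of the two fusion strategies mentioned in this entry, combines the outputs of separately trained per-modality models. The sketch below shows one common form, a weighted average of churn probabilities; the function name, the modality scores, and the weights are hypothetical, and the paper's own late and hybrid fusion designs may differ.

```python
def late_fusion(probabilities, weights=None):
    """Combine per-modality churn probabilities by weighted averaging.

    `probabilities` holds one score per modality (for example sentiment,
    financial literacy, and behavioral models). Equal weights are used
    when `weights` is omitted.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("weights and probabilities must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * p for w, p in zip(weights, probabilities))
    return weighted_sum / total_weight


# Hypothetical scores from three modality-specific models for one customer.
scores = [0.2, 0.8, 0.5]
print(late_fusion(scores))
print(late_fusion(scores, weights=[1.0, 3.0, 1.0]))
```

Hybrid fusion would additionally share intermediate features across modalities before the final decision, which this sketch does not attempt.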

TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents

paper_url: http://arxiv.org/abs/2312.01279
repo_url: None
paper_authors: James Enouen, Hootan Nakhost, Sayna Ebrahimi, Sercan O Arik, Yan Liu, Tomas Pfister
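for: Explains the content generated by large language models (LLMs), to better understand their decision process and outputs.
methods: Builds on post-hoc explainability methods, in particular Shapley values, which are effective for interpreting deep learning models but face significant challenges when scaled to long input contexts and autoregressively generated output sequences. The paper proposes TextGenSHAP, an efficient post-hoc explanation method that incorporates LM-specific techniques.
results: TextGenSHAP is significantly faster than conventional Shapley value computation, reducing processing time from hours to minutes for token-level explanations and to seconds for document-level explanations. The paper also shows that real-time Shapley values are useful in two scenarios: localizing important words and sentences in long-document question answering, and improving document retrieval systems by raising the accuracy of the selected passages and the final responses.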

Abstract
Large language models (LLMs) have attracted huge interest in practical applications given their increasingly accurate responses and coherent reasoning abilities. Given their nature as black-boxes using complex reasoning processes on their inputs, it is inevitable that the demand for scalable and faithful explanations for LLMs' generated content will continue to grow. There have been major developments in the explainability of neural network models over the past decade. Among them, post-hoc explainability methods, especially Shapley values, have proven effective for interpreting deep learning models. However, there are major challenges in scaling up Shapley values for LLMs, particularly when dealing with long input contexts containing thousands of tokens and autoregressively generated output sequences. Furthermore, it is often unclear how to effectively utilize generated explanations to improve the performance of LLMs. In this paper, we introduce TextGenSHAP, an efficient post-hoc explanation method incorporating LM-specific techniques. We demonstrate that this leads to significant increases in speed compared to conventional Shapley value computations, reducing processing times from hours to minutes for token-level explanations, and to just seconds for document-level explanations. In addition, we demonstrate how real-time Shapley values can be utilized in two important scenarios, providing better understanding of long-document question answering by localizing important words and sentences; and improving existing document retrieval systems through enhancing the accuracy of selected passages and ultimately the final responses.

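Summary
In this paper, we introduce TextGenSHAP, an efficient post-hoc explanation method tailored to LLMs. We show that TextGenSHAP is significantly faster than conventional Shapley value computation, reducing processing times from hours to minutes, and to just seconds for document-level explanations. We also show how real-time Shapley values can be used in two important scenarios: improving the understanding of long-document question answering by localizing important words and sentences, and improving existing document retrieval systems by raising the accuracy of the selected passages and, ultimately, of the final responses.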
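
For readers unfamiliar with Shapley values, the sketch below computes them exactly for a tiny set of input passages by averaging marginal contributions over all orderings. It is the textbook definition, not TextGenSHAP's accelerated method; the passage names and the `toy_value` scoring function are hypothetical stand-ins for a model's output score.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by enumerating every ordering of `players`.

    `value` maps a frozenset of players to a number. The cost grows
    factorially with the number of players, which is why long inputs
    need faster approximations.
    """
    totals = {player: 0.0 for player in players}
    n_orderings = 0
    for ordering in permutations(players):
        n_orderings += 1
        coalition = frozenset()
        previous = value(coalition)
        for player in ordering:
            coalition = coalition | {player}
            current = value(coalition)
            totals[player] += current - previous  # marginal contribution
            previous = current
    return {player: total / n_orderings for player, total in totals.items()}


def toy_value(coalition):
    """Hypothetical answer-quality score given a subset of passages."""
    score = 0.0
    if "passage_a" in coalition:
        score += 1.0
    if {"passage_b", "passage_c"} <= coalition:
        score += 0.5  # these two only help when both are present
    return score


print(shapley_values(["passage_a", "passage_b", "passage_c"], toy_value))
```

The attributions sum to the score of the full input minus the score of the empty input, which is the efficiency property that makes Shapley values attractive for explanations.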

Running cognitive evaluations on large language models: The do’s and the don’ts

paper_url: http://arxiv.org/abs/2312.01276
repo_url: None
paper_authors: Anna A. Ivanova
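for: Sets out methodological considerations for studies that evaluate the cognitive capacities of large language models (LLMs) with language-based behavioral assessments.
methods: Drawing on three case studies from the literature (a commonsense knowledge benchmark, a theory of mind evaluation, and a test of syntactic agreement), the paper describes pitfalls that can arise when applying a cognitive test to an LLM, then lists 10 do's and don'ts for designing high-quality cognitive evaluations of AI systems.
results: The paper closes by discussing four areas under active discussion: prompt sensitivity, cultural and linguistic diversity, using LLMs as research assistants, and running evaluations on open vs. closed LLMs. Its overall goal is to contribute to best practices in the field of AI Psychology.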

Abstract
In this paper, I describe methodological considerations for studies that aim to evaluate the cognitive capacities of large language models (LLMs) using language-based behavioral assessments. Drawing on three case studies from the literature (a commonsense knowledge benchmark, a theory of mind evaluation, and a test of syntactic agreement), I describe common pitfalls that might arise when applying a cognitive test to an LLM. I then list 10 do's and don'ts that should help design high-quality cognitive evaluations for AI systems. I conclude by discussing four areas where the do's and don'ts are currently under active discussion -- prompt sensitivity, cultural and linguistic diversity, using LLMs as research assistants, and running evaluations on open vs. closed LLMs. Overall, the goal of the paper is to contribute to the broader discussion of best practices in the rapidly growing field of AI Psychology.

Summary
In this paper, I describe methodological considerations for studies that evaluate the cognitive capacities of large language models (LLMs) using language-based behavioral assessments. Drawing on three case studies from the literature (a commonsense knowledge benchmark, a theory of mind evaluation, and a test of syntactic agreement), I describe common pitfalls that can arise when applying a cognitive test to an LLM. I then list 10 do's and don'ts intended to help design high-quality cognitive evaluations for AI systems. I conclude by discussing areas where these do's and don'ts are under active discussion in the rapidly growing field of AI Psychology.

Low-Precision Mixed-Computation Models for Inference on Edge

paper_url: http://arxiv.org/abs/2312.02210
repo_url: None
paper_authors: Seyedarmin Azizi, Mahdi Nazemi, Mehdi Kamal, Massoud Pedram
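for: Proposes a mixed-computation neural network processing approach for edge applications that uses low-precision (low-width) Posit and low-precision fixed point (FixP) number systems.
methods: Uses 4-bit Posit (Posit4) to represent highly sensitive weights and 4-bit FixP (FixP4) for the remaining weights. A heuristic based on weight importance and quantization error assigns the number system to each weight. A gradient approximation for the Posit representation is introduced to improve the quality of weight updates during backpropagation.
results: On vision and language models, the accuracy of the mixed-computation approach is on average about 1.5% higher than that of FixP, at a cost of 0.19% energy overhead.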

Abstract
This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low-width) Posit and low-precision fixed point (FixP) number systems. This mixed-computation approach employs 4-bit Posit (Posit4), which has higher precision around zero, for representing weights with high sensitivity, while it uses 4-bit FixP (FixP4) for representing other weights. A heuristic for analyzing the importance and the quantization error of the weights is presented to assign the proper number system to different weights. Additionally, a gradient approximation for Posit representation is introduced to improve the quality of weight updates in the backpropagation process. Due to the high energy consumption of the fully Posit-based computations, neural network operations are carried out in FixP or Posit/FixP. An efficient hardware implementation of a MAC operation with a first Posit operand and FixP for a second operand and accumulator is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision and language models. The results show that, on average, the accuracy of the mixed-computation is about 1.5% higher than that of FixP with a cost of 0.19% energy overhead.
