for: The paper revisits the problem of linearly classifying XOR, whose apparent impossibility has motivated much of deep learning.
methods: The authors propose equality separation, which adapts the support vector machine (SVM) objective to distinguish data inside the margin from data outside it. A smooth approximation lets the classifier be integrated into neural network pipelines.
results: The authors argue that equality separation suits anomaly detection and introduce closing numbers, a quantitative measure of a classifier's capacity to form closed decision regions. Supervised anomaly detection experiments show that it can detect both seen and unseen anomalies.
Abstract
The inability to linearly classify XOR has motivated much of deep learning. We revisit this age-old problem and show that linear classification of XOR is indeed possible. Instead of separating data between halfspaces, we propose a slightly different paradigm, equality separation, that adapts the SVM objective to distinguish data within or outside the margin. Our classifier can then be integrated into neural network pipelines with a smooth approximation. From its properties, we intuit that equality separation is suitable for anomaly detection. To formalize this notion, we introduce closing numbers, a quantitative measure on the capacity for classifiers to form closed decision regions for anomaly detection. Springboarding from this theoretical connection between binary classification and anomaly detection, we test our hypothesis on supervised anomaly detection experiments, showing that equality separation can detect both seen and unseen anomalies.
Summary
The inability to classify XOR linearly has shaped the development of deep learning. The authors revisit this old problem and show that linear classification of XOR is possible. Rather than splitting data between halfspaces, they propose equality separation, which adapts the SVM objective to tell apart data inside and outside the margin, and which fits into neural network pipelines through a smooth approximation. To formalize its suitability for anomaly detection, they introduce closing numbers, a quantitative measure of a classifier's capacity to form closed decision regions. Supervised anomaly detection experiments then show that equality separation detects both seen and unseen anomalies.
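The idea of separating by equality rather than by halfspace can be illustrated on the four XOR points. The following is a minimal sketch, not the authors' implementation: the hyperplane `w`, `b`, the tolerance `eps`, and the Gaussian-shaped `bump` surrogate with width `sigma` are all assumptions chosen for illustration, since the abstract does not state which smooth approximation is used.

```python
import math

# The four XOR points and their labels.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def signed_score(w, b, x):
    """Affine score w.x + b, the same quantity a halfspace classifier uses."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def equality_separator(w, b, x, eps=0.25):
    """Label 1 if x lies within eps of the hyperplane, else 0 (eps is hypothetical)."""
    return 1 if abs(signed_score(w, b, x)) <= eps else 0

def bump(z, sigma=0.5):
    """Gaussian-shaped smooth surrogate (an assumption): 1 on the hyperplane, decaying away from it."""
    return math.exp(-0.5 * (z / sigma) ** 2)

# Hand-picked hyperplane x1 + x2 = 1: both positive XOR points lie exactly on it.
w, b = (1.0, 1.0), -1.0
predictions = {x: equality_separator(w, b, x) for x in XOR}
soft_scores = {x: bump(signed_score(w, b, x)) for x in XOR}
```

A single hyperplane suffices here because the positive class sits on the line while the negative points sit on either side of it, which no halfspace split can express. The differentiable `bump` score is what would allow gradient-based training inside a network.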
Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents
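results: Experiments show that the method improves federated learning accuracy by up to 40% over non-dataset-distillation methods and by 18% over existing dataset-distillation methods. It also converges faster than the baselines because the server trains on a single multi-modal distribution rather than on several sets of heterogeneous data distributions.
Abstract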
Data heterogeneity presents significant challenges for federated learning (FL). Recently, dataset distillation techniques have been introduced, and performed at the client level, to attempt to mitigate some of these challenges. In this paper, we propose a highly efficient FL dataset distillation framework on the server side, significantly reducing both the computational and communication demands on local devices while enhancing the clients' privacy. Unlike previous strategies that perform dataset distillation on local devices and upload synthetic data to the server, our technique enables the server to leverage prior knowledge from pre-trained deep generative models to synthesize essential data representations from a heterogeneous model architecture. This process allows local devices to train smaller surrogate models while enabling the training of a larger global model on the server, effectively minimizing resource utilization. We substantiate our claim with a theoretical analysis, demonstrating the asymptotic resemblance of the process to the hypothetical ideal of completely centralized training on a heterogeneous dataset. Empirical evidence from our comprehensive experiments indicates our method's superiority, delivering an accuracy enhancement of up to 40% over non-dataset-distillation techniques in highly heterogeneous FL contexts, and surpassing existing dataset-distillation methods by 18%. In addition to the high accuracy, our framework converges faster than the baselines because rather than the server trains on several sets of heterogeneous data distributions, it trains on a multi-modal distribution. Our code is available at https://github.com/FedDG23/FedDG-main.git
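Summary
Data heterogeneity poses significant challenges for federated learning (FL), and dataset distillation has recently been applied at the client level to mitigate them. This paper proposes an efficient server-side FL dataset distillation framework that significantly reduces the computation and communication load on local devices while enhancing client privacy. Instead of distilling data on local devices and uploading synthetic data, the server uses prior knowledge from pre-trained deep generative models to synthesize essential data representations from heterogeneous model architectures. Local devices can then train smaller surrogate models while the server trains a larger global model. A theoretical analysis shows that the process asymptotically resembles fully centralized training on a heterogeneous dataset. In highly heterogeneous FL settings, accuracy improves by up to 40% over non-dataset-distillation techniques and by 18% over existing dataset-distillation methods, and the framework converges faster than the baselines. The code is available at https://github.com/FedDG23/FedDG-main.git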
NovoMol: Recurrent Neural Network for Orally Bioavailable Drug Design and Validation on PDGFRα Receptor
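for: The paper aims to make clinical trials more efficient by addressing the long timelines and low success rates of drug candidates in the pharmaceutical industry.
methods: A recurrent neural network mass-generates drug molecules, which are optimized for desirable traits, ranked with the quantitative estimate of drug-likeness (QED), and used to retrain the network.
results: After five training cycles, 76% of generated molecules passed QED's oral bioavailability threshold and 96% passed the traditionally used Lipinski's Rule of Five. When the trained model generated candidates for the PDGFRα receptor, 44% had better binding affinity than the current state-of-the-art drug, Imatinib (receptor binding affinity of -9.4 kcal/mol).
Abstract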
Longer timelines and lower success rates of drug candidates limit the productivity of clinical trials in the pharmaceutical industry. Promising de novo drug design techniques help solve this by exploring a broader chemical space, efficiently generating new molecules, and providing improved therapies. However, optimizing for molecular characteristics found in approved oral drugs remains a challenge, limiting de novo usage. In this work, we propose NovoMol, a novel de novo method using recurrent neural networks to mass-generate drug molecules with high oral bioavailability, increasing clinical trial time efficiency. Molecules were optimized for desirable traits and ranked using the quantitative estimate of drug-likeness (QED). Generated molecules meeting QED's oral bioavailability threshold were used to retrain the neural network, and, after five training cycles, 76% of generated molecules passed this strict threshold and 96% passed the traditionally used Lipinski's Rule of Five. The trained model was then used to generate specific drug candidates for the cancer-related PDGFRα receptor, and 44% of generated candidates had better binding affinity than the current state-of-the-art drug, Imatinib (with a receptor binding affinity of -9.4 kcal/mol), with the best-generated candidate reaching -12.9 kcal/mol. NovoMol provides a time/cost-efficient AI-based de novo method offering promising drug candidates for clinical trials.
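Summary
Long timelines and low success rates of drug candidates limit the productivity of clinical trials in the pharmaceutical industry. De novo drug design can help by exploring a broader chemical space and generating new molecules efficiently, but optimizing for the molecular characteristics of approved oral drugs remains a challenge. NovoMol is a de novo method that uses recurrent neural networks to mass-generate drug molecules with high oral bioavailability. Molecules are optimized for desirable traits and ranked by the quantitative estimate of drug-likeness (QED). After five training cycles, 76% of generated molecules passed QED's oral bioavailability threshold and 96% passed Lipinski's Rule of Five. For the PDGFRα receptor, 44% of generated candidates had better binding affinity than the state-of-the-art drug Imatinib (-9.4 kcal/mol).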
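The Lipinski's Rule of Five filter mentioned above can be sketched as a simple check over precomputed molecular descriptors. This is a minimal illustration, not NovoMol's code: the `Descriptors` class, the `max_violations=1` convention, and the example descriptor values are assumptions, and in practice the descriptors would come from a cheminformatics toolkit rather than being typed by hand.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed properties of one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five limits a molecule exceeds."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Common convention (assumed here): at most one violation is tolerated."""
    return lipinski_violations(d) <= max_violations

def pass_rate(batch: list[Descriptors]) -> float:
    """Fraction of a generated batch that passes the filter."""
    return sum(passes_rule_of_five(d) for d in batch) / len(batch)

# Made-up descriptor values, for illustration only.
batch = [
    Descriptors(mol_weight=320.4, logp=2.1, h_donors=2, h_acceptors=5),
    Descriptors(mol_weight=610.7, logp=6.3, h_donors=1, h_acceptors=8),
    Descriptors(mol_weight=480.0, logp=5.4, h_donors=3, h_acceptors=9),
]
rate = pass_rate(batch)
```

A figure such as the 96% reported in the abstract is a pass rate of this kind over generated molecules. The exact violation tolerance the authors used is not stated in the abstract.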
Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies
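results: The survey finds that existing bias detection and mitigation methods may have limitations and shortcomings for different social groups and need to be tailored to the needs of each group.
Abstract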
The benefits and capabilities of pre-trained language models (LLMs) in current and future innovations are vital to any society. However, introducing and using LLMs comes with biases and discrimination, resulting in concerns about equality, diversity and fairness, and must be addressed. While understanding and acknowledging bias in LLMs and developing mitigation strategies are crucial, the generalised assumptions towards societal needs can result in disadvantages towards under-represented societies and indigenous populations. Furthermore, the ongoing changes to actual and proposed amendments to regulations and laws worldwide also impact research capabilities in tackling the bias problem. This research presents a comprehensive survey synthesising the current trends and limitations in techniques used for identifying and mitigating bias in LLMs, where the overview of methods for tackling bias are grouped into metrics, benchmark datasets, and mitigation strategies. The importance and novelty of this survey are that it explores the perspective of under-represented societies. We argue that current practices tackling the bias problem cannot simply be 'plugged in' to address the needs of under-represented societies. We use examples from New Zealand to present requirements for adopting existing techniques to under-represented societies.
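Summary
The benefits and capabilities of pre-trained language models (LLMs) are vital to current and future innovation in any society, but introducing and using them brings bias and discrimination, raising concerns about equality, diversity, and fairness. Understanding and acknowledging bias in LLMs and developing mitigation strategies are crucial, yet generalised assumptions about societal needs can disadvantage under-represented societies and indigenous populations. Ongoing changes to regulations and laws worldwide also affect the capacity of research to tackle bias. The survey synthesises current trends and limitations in techniques for identifying and mitigating bias, argues that current practices cannot simply be 'plugged in' to meet the needs of under-represented societies, and uses examples from New Zealand to present requirements for adopting existing techniques.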
Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation
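results: With OpenLLaMA-7B, the framework achieves state-of-the-art level performance without domain-specific pretraining. The paper also provides detailed analyses of soft visual prompts and attention mechanisms, pointing to directions for future research.
Abstract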
Writing radiology reports from medical images requires a high level of domain expertise. It is time-consuming even for trained radiologists and can be error-prone for inexperienced radiologists. It would be appealing to automate this task by leveraging generative AI, which has shown drastic progress in vision and language understanding. In particular, Large Language Models (LLM) have demonstrated impressive capabilities recently and continued to set new state-of-the-art performance on almost all natural language tasks. While many have proposed architectures to combine vision models with LLMs for multimodal tasks, few have explored practical fine-tuning strategies. In this work, we proposed a simple yet effective two-stage fine-tuning protocol to align visual features to LLM's text embedding space as soft visual prompts. Our framework with OpenLLaMA-7B achieved state-of-the-art level performance without domain-specific pretraining. Moreover, we provide detailed analyses of soft visual prompts and attention mechanisms, shedding light on future research directions.
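Summary
Writing radiology reports from medical images requires a high level of domain expertise; it is time-consuming even for trained radiologists and error-prone for inexperienced ones. Automating the task with generative AI is therefore appealing, particularly since large language models (LLMs) have recently set new state-of-the-art performance on almost all natural language tasks. Many architectures combining vision models with LLMs have been proposed, but few works explore practical fine-tuning strategies. The paper proposes a simple yet effective two-stage fine-tuning protocol that aligns visual features to the LLM's text embedding space as soft visual prompts. With OpenLLaMA-7B, the framework reaches state-of-the-art level performance without domain-specific pretraining, and detailed analyses of soft visual prompts and attention mechanisms point to future research directions.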
ADT: Agent-based Dynamic Thresholding for Anomaly Detection
results: Experiments on three real-world datasets demonstrate ADT's thresholding capability, data-efficient learning, stability, and robustness.
Abstract
The complexity and scale of IT systems are increasing dramatically, posing many challenges to real-world anomaly detection. Deep learning anomaly detection has emerged, aiming at feature learning and anomaly scoring, which has gained tremendous success. However, little work has been done on the thresholding problem despite it being a critical factor for the effectiveness of anomaly detection. In this paper, we model thresholding in anomaly detection as a Markov Decision Process and propose an agent-based dynamic thresholding (ADT) framework based on a deep Q-network. The proposed method can be integrated into many systems that require dynamic thresholding. An auto-encoder is utilized in this study to obtain feature representations and produce anomaly scores for complex input data. ADT can adjust thresholds adaptively by utilizing the anomaly scores from the auto-encoder and significantly improve anomaly detection performance. The properties of ADT are studied through experiments on three real-world datasets and compared with benchmarks, hence demonstrating its thresholding capability, data-efficient learning, stability, and robustness. Our study validates the effectiveness of reinforcement learning in optimal thresholding control in anomaly detection.
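Summary
The complexity and scale of IT systems keep growing, posing many challenges for real-world anomaly detection. Deep learning anomaly detection, which targets feature learning and anomaly scoring, has been very successful, but the thresholding problem has received little attention even though it is critical to detection effectiveness. The paper models thresholding as a Markov Decision Process and proposes an agent-based dynamic thresholding (ADT) framework built on a deep Q-network, which can be integrated into many systems that need dynamic thresholding. An auto-encoder provides feature representations and anomaly scores for complex input data, and ADT uses those scores to adjust thresholds adaptively and improve detection performance. Experiments on three real-world datasets demonstrate its thresholding capability, data-efficient learning, stability, and robustness, validating reinforcement learning for optimal thresholding control in anomaly detection.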
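To make the idea of learning a threshold from reward concrete, here is a deliberately simplified sketch. It is not the ADT framework: it swaps the paper's deep Q-network and Markov Decision Process for single-state tabular Q-learning (a bandit), and the candidate thresholds, the F1-based reward, the hyperparameters, and the toy scores and labels are all assumptions made for illustration.

```python
import random

def f1_reward(scores, labels, threshold):
    """F1 of the rule 'score >= threshold means anomaly' on a labeled window."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def learn_threshold(scores, labels, thresholds,
                    steps=300, alpha=0.5, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning over threshold choices with a single state."""
    rng = random.Random(seed)
    q = [0.0] * len(thresholds)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(thresholds))               # explore
        else:
            action = max(range(len(q)), key=q.__getitem__)        # exploit
        reward = f1_reward(scores, labels, thresholds[action])
        q[action] += alpha * (reward - q[action])                 # incremental update
    best = max(range(len(q)), key=q.__getitem__)
    return thresholds[best], q

# Toy anomaly scores (as an auto-encoder might emit) with made-up labels.
scores = [0.10, 0.20, 0.15, 0.90, 0.30, 0.85, 0.25, 0.95]
labels = [0, 0, 0, 1, 0, 1, 0, 1]
best_threshold, q_values = learn_threshold(scores, labels, [0.05, 0.5, 0.92])
```

In the full setting described in the abstract, the state would carry information about recent anomaly scores and a neural network would replace the Q table, so the chosen threshold could change over time rather than settle on one value.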
Context-Enhanced Relational Operators with Vector Embeddings
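results: The paper proposes hybrid relational and vector data processing with logical and physical optimizations. Using string embeddings as an example, it demonstrates hybrid context-enhanced processing on relational join operators and an order-of-magnitude improvement in execution time.
Abstract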
Collecting data, extracting value, and combining insights from relational and context-rich multi-modal sources in data processing pipelines presents a challenge for traditional relational DBMS. While relational operators allow declarative and optimizable query specification, they are limited to data transformations unsuitable for capturing or analyzing context. On the other hand, representation learning models can map context-rich data into embeddings, allowing machine-automated context processing but requiring imperative data transformation integration with the analytical query. To bridge this dichotomy, we present a context-enhanced relational join and introduce an embedding operator composable with relational operators. This enables hybrid relational and context-rich vector data processing, with algebraic equivalences compatible with relational algebra and corresponding logical and physical optimizations. We investigate model-operator interaction with vector data processing and study the characteristics of the E-join operator. Using an example of string embeddings, we demonstrate enabling hybrid context-enhanced processing on relational join operators with vector embeddings. The importance of holistic optimization, from logical to physical, is demonstrated in an order of magnitude execution time improvement.
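One way to picture a join that matches on embeddings instead of exact key equality is the sketch below. It is an illustration under stated assumptions, not the paper's E-join operator: the character-trigram `embed` function stands in for a learned embedding model, the cosine `threshold` is arbitrary, and the nested-loop plan is the naive physical strategy that an optimizer would aim to improve on.

```python
import math
from collections import Counter

def embed(text, n=3):
    """Toy embedding (assumption): sparse bag of character n-grams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def embedding_join(left, right, left_key, right_key, threshold=0.5):
    """Nested-loop join whose predicate is embedding similarity >= threshold."""
    right_vectors = [(row, embed(row[right_key])) for row in right]  # embed once
    joined = []
    for left_row in left:
        left_vector = embed(left_row[left_key])
        for right_row, right_vector in right_vectors:
            if cosine(left_vector, right_vector) >= threshold:
                joined.append({**left_row, **right_row})
    return joined

# Hypothetical tables with a misspelled name on one side.
orders = [{"id": 1, "name": "Jon Smith"}]
customers = [
    {"cust": "John Smith", "city": "Oslo"},
    {"cust": "Alice Wong", "city": "Lima"},
]
matches = embedding_join(orders, customers, "name", "cust")
```

Embedding the right-hand table once, outside the inner loop, is a small example of the kind of physical optimization the abstract alludes to; an exact-equality join would find no match for the misspelled name.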
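results: The paper offers a starting point for AI personality alignment and for future exploration and development. Through the concepts of human-machine teaming and co-existence, it argues that AI technology can be better understood and used to improve organizational efficiency and innovation.
Abstract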
This research paper delves into the evolving landscape of fine-tuning large language models (LLMs) to align with human users, extending beyond basic alignment to propose "personality alignment" for language models in organizational settings. Acknowledging the impact of training methods on the formation of undefined personality traits in AI models, the study draws parallels with human fitting processes using personality tests. Through an original case study, we demonstrate the necessity of personality fine-tuning for AIs and raise intriguing questions about applying human-designed tests to AIs, engineering specialized AI personality tests, and shaping AI personalities to suit organizational roles. The paper serves as a starting point for discussions and developments in the burgeoning field of AI personality alignment, offering a foundational anchor for future exploration in human-machine teaming and co-existence.
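Summary
The paper examines the evolving practice of fine-tuning large language models (LLMs) to align with human users, moving beyond basic alignment to propose "personality alignment" for language models in organizational settings. Acknowledging that training methods shape undefined personality traits in AI models, it draws parallels with human fitting processes that use personality tests. An original case study demonstrates the need for personality fine-tuning and raises questions about applying human-designed tests to AIs, engineering specialized AI personality tests, and shaping AI personalities to suit organizational roles. The paper offers a foundational anchor for future work on human-machine teaming and co-existence.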
paper_authors: Matteo Bettini, Amanda Prorok, Vincent Moens
for: This paper aims to address the reproducibility crisis in Multi-Agent Reinforcement Learning (MARL) by introducing BenchMARL, a training library for standardized benchmarking.
methods: BenchMARL uses TorchRL as its backend, which provides high performance and maintained state-of-the-art implementations, and its design enables systematic configuration and reporting of complex benchmarks from simple one-line inputs.
results: BenchMARL is the first MARL training library that enables standardized benchmarking across different algorithms, models, and environments, and it is open-sourced on GitHub for the broad community of MARL PyTorch users.
Abstract
The field of Multi-Agent Reinforcement Learning (MARL) is currently facing a reproducibility crisis. While solutions for standardized reporting have been proposed to address the issue, we still lack a benchmarking tool that enables standardization and reproducibility, while leveraging cutting-edge Reinforcement Learning (RL) implementations. In this paper, we introduce BenchMARL, the first MARL training library created to enable standardized benchmarking across different algorithms, models, and environments. BenchMARL uses TorchRL as its backend, granting it high performance and maintained state-of-the-art implementations while addressing the broad community of MARL PyTorch users. Its design enables systematic configuration and reporting, thus allowing users to create and run complex benchmarks from simple one-line inputs. BenchMARL is open-sourced on GitHub: https://github.com/facebookresearch/BenchMARL
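Summary
Multi-Agent Reinforcement Learning (MARL) is facing a reproducibility crisis. Solutions for standardized reporting have been proposed, but the field still lacks a benchmarking tool that enables standardization and reproducibility while leveraging cutting-edge Reinforcement Learning (RL) implementations. BenchMARL is the first MARL training library created to enable standardized benchmarking across different algorithms, models, and environments. It uses TorchRL as its backend, which gives it high performance and maintained state-of-the-art implementations while serving the broad community of MARL PyTorch users. Its design supports systematic configuration and reporting, so users can create and run complex benchmarks from simple one-line inputs. BenchMARL is open-sourced on GitHub: https://github.com/facebookresearch/BenchMARL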
Exploring Adversarial Robustness of LiDAR-Camera Fusion Model in Autonomous Driving
paper_authors: Bo Yang, Xiaoyu Ji, Xiaoyu Ji, Xiaoyu Ji, Xiaoyu Ji
for: This paper assesses the adversarial robustness of LiDAR-camera fusion models in 3D object detection, with a focus on safety concerns in autonomous driving.
methods: The paper introduces an attack technique that adds a limited number of physically constrained adversarial points above a car to deceive the fusion model, without changing the image data channel.
results: Experimental results show that the fusion model can be deceived solely by manipulating the LiDAR data channel, raising safety concerns in autonomous driving. The paper also explores the effects of various factors on the attack success rate.
Abstract
Our study assesses the adversarial robustness of LiDAR-camera fusion models in 3D object detection. We introduce an attack technique that, by simply adding a limited number of physically constrained adversarial points above a car, can make the car undetectable by the fusion model. Experimental results reveal that even without changes to the image data channel, the fusion model can be deceived solely by manipulating the LiDAR data channel. This finding raises safety concerns in the field of autonomous driving. Further, we explore how the quantity of adversarial points, the distance between the front-near car and the LiDAR-equipped car, and various angular factors affect the attack success rate. We believe our research can contribute to the understanding of multi-sensor robustness, offering insights and guidance to enhance the safety of autonomous driving.
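Summary
The study assesses the adversarial robustness of LiDAR-camera fusion models in 3D object detection. It introduces an attack that adds a limited number of physically constrained adversarial points above a car and makes the car undetectable by the fusion model. Experiments show that manipulating the LiDAR data channel alone deceives the fusion model, with no change to the image data channel, which raises safety concerns for autonomous driving. The study also examines how the number of adversarial points, the distance between the front-near car and the LiDAR-equipped car, and various angular factors affect the attack success rate, and it aims to inform work on multi-sensor robustness.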
D-Bot: Database Diagnosis System using Large Language Models
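results: D-Bot is verified on real benchmarks. The results show that it can effectively analyze unseen anomalies and significantly outperforms traditional methods and vanilla models such as GPT-4.
Abstract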
Database administrators (DBAs) play an important role in managing, maintaining and optimizing database systems. However, it is hard and tedious for DBAs to manage a large number of databases and give timely response (waiting for hours is intolerable in many online cases). In addition, existing empirical methods only support limited diagnosis scenarios, which are also labor-intensive to update the diagnosis rules for database version updates. Recently large language models (LLMs) have shown great potential in various fields. Thus, we propose D-Bot, an LLM-based database diagnosis system that can automatically acquire knowledge from diagnosis documents, and generate reasonable and well-founded diagnosis report (i.e., identifying the root causes and solutions) within acceptable time (e.g., under 10 minutes compared to hours by a DBA). The techniques in D-Bot include (i) offline knowledge extraction from documents, (ii) automatic prompt generation (e.g., knowledge matching, tool retrieval), (iii) root cause analysis using tree search algorithm, and (iv) collaborative mechanism for complex anomalies with multiple root causes. We verify D-Bot on real benchmarks (including 539 anomalies of six typical applications), and the results show that D-Bot can effectively analyze the root causes of unseen anomalies and significantly outperforms traditional methods and vanilla models like GPT-4.
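results: Detailed experiments on a variety of image datasets compare the method with previous approaches and test the effect of choices such as the degree of foveation and the number of fixations the network performs. The model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters for a given pixel or computation budget.
Abstract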
In this paper, we tackle the challenge of actively attending to visual scenes using a foveated sensor. We introduce an end-to-end differentiable foveated active vision architecture that leverages a graph convolutional network to process foveated images, and a simple yet effective formulation for foveated image sampling. Our model learns to iteratively attend to regions of the image relevant for classification. We conduct detailed experiments on a variety of image datasets, comparing the performance of our method with previous approaches to foveated vision while measuring how different choices, such as the degree of foveation and the number of fixations the network performs, affect object recognition performance. We find that our model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters for a given pixel or computation budget.
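Summary
The paper addresses active attention to visual scenes with a foveated sensor. It introduces an end-to-end differentiable foveated active vision architecture that processes foveated images with a graph convolutional network, together with a simple yet effective formulation for foveated image sampling. The model learns to attend iteratively to the image regions relevant for classification. Detailed experiments on a variety of image datasets compare it with previous approaches and measure how choices such as the degree of foveation and the number of fixations affect object recognition performance. The model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters for a given pixel or computation budget.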
results: RL agents trained with the proposed methods show improved robustness and generalization, performing better in previously unseen environments.
Abstract
Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real world is highly open-ended, featuring endlessly evolving environments and challenges, making such RL approaches unsuitable. Simply randomizing over simulated environments is insufficient, as it requires making arbitrary distributional assumptions and can be combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which aim to produce such open-ended processes. Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments at the frontier of the learning agent's capabilities. Through extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that achieve more general intelligence by continually generating and mastering additional challenges of their own design.
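Summary
Deep reinforcement learning (RL) provides powerful methods for training sequential decision-making agents. Because collecting real-world interactions can be costly and risky, the common sim2real paradigm trains in a simulator and then deploys in the real world. RL agents, however, easily overfit to the chosen simulated environments, and learning ends once the agent masters that specific set. The real world is open-ended, with endlessly evolving environments and challenges, and simply randomizing over simulated environments is insufficient: it requires arbitrary distributional assumptions and can be combinatorially unlikely to sample the environment instances that are useful for learning. The thesis develops Unsupervised Environment Design (UED), a class of methods that, given an environment design space, automatically generates an infinite sequence or curriculum of training environments at the frontier of the agent's capabilities. Empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory show that UED autocurricula produce RL agents with significantly improved robustness and generalization to previously unseen environment instances, a promising path toward open-ended learning systems.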
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
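results: On examples with different motions and camera paths, the method produces high-quality, temporally consistent computer-generated video while letting users apply their own creativity without restriction.
Abstract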
Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models. Despite great promise, video diffusion models are difficult to control, hindering a user to apply their own creativity rather than amplifying it. To address this challenge, we present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models. For this purpose, our approach takes an animated, low-fidelity rendered mesh as input and injects the ground truth correspondence information obtained from the dynamic mesh into various stages of a pre-trained text-to-image generation model to output high-quality and temporally consistent frames. We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path.
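Summary
Traditional 3D content creation tools give users direct control over a scene's geometry, appearance, motion, and camera path, but creating computer-generated videos is a tedious manual process that emerging text-to-video diffusion models could automate. Those models are difficult to control, which hinders users from applying their own creativity. The paper combines the controllability of dynamic 3D meshes with the expressivity and editability of diffusion models: it takes an animated, low-fidelity rendered mesh as input and injects the ground truth correspondence information from the dynamic mesh into various stages of a pre-trained text-to-image generation model to output high-quality, temporally consistent frames. The approach is demonstrated on examples where motion comes from animating rigged assets or changing the camera path.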
Towards Mitigating Perceived Unfairness in Contracts from a Non-Legal Stakeholder’s Perspective
paper_authors: Anmol Singhal, Preethu Rose Anish, Shirish Karande, Smita Ghaisas
for: The paper aims to identify potentially unfair clauses in commercial contracts using Pre-trained Language Models (PLMs).
methods: The authors conduct an empirical study of stakeholder perspectives and compare chain of thought prompting with semi-supervised fine-tuning for identifying unfairness in contractual sentences.
results: BERT-based fine-tuning achieves an accuracy of 84% on a dataset of proprietary contracts and outperforms chain of thought prompting with Vicuna-13B by a margin of 9%.
Abstract
Commercial contracts are known to be a valuable source for deriving project-specific requirements. However, contract negotiations mainly occur among the legal counsel of the parties involved. The participation of non-legal stakeholders, including requirement analysts, engineers, and solution architects, whose primary responsibility lies in ensuring the seamless implementation of contractual terms, is often indirect and inadequate. Consequently, a significant number of sentences in contractual clauses, though legally accurate, can appear unfair from an implementation perspective to non-legal stakeholders. This perception poses a problem since requirements indicated in the clauses are obligatory and can involve punitive measures and penalties if not implemented as committed in the contract. Therefore, the identification of potentially unfair clauses in contracts becomes crucial. In this work, we conduct an empirical study to analyze the perspectives of different stakeholders regarding contractual fairness. We then investigate the ability of Pre-trained Language Models (PLMs) to identify unfairness in contractual sentences by comparing chain of thought prompting and semi-supervised fine-tuning approaches. Using BERT-based fine-tuning, we achieved an accuracy of 84% on a dataset consisting of proprietary contracts. It outperformed chain of thought prompting using Vicuna-13B by a margin of 9%.
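Summary
Commercial contracts are a valuable source for deriving project-specific requirements, but negotiations take place mainly among the parties' legal counsel. Non-legal stakeholders, including requirement analysts, engineers, and solution architects, whose primary responsibility is the seamless implementation of contractual terms, often take part only indirectly and inadequately. As a result, many sentences in contractual clauses, though legally accurate, can appear unfair from an implementation perspective. Because the requirements in those clauses are obligatory and can carry penalties, identifying potentially unfair clauses is crucial. The paper reports an empirical study of how different stakeholders view contractual fairness, then compares chain of thought prompting with semi-supervised fine-tuning for identifying unfairness in contractual sentences using Pre-trained Language Models (PLMs). BERT-based fine-tuning reached 84% accuracy on a dataset of proprietary contracts, outperforming chain of thought prompting with Vicuna-13B by a margin of 9%.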
DiFace: Cross-Modal Face Recognition through Controlled Diffusion
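methods: The paper uses a controllable diffusion process, establishing its theoretical connection with probability transport, to perform face recognition from textual descriptions.
results: According to the experiments, the method achieves, to the best of the authors' knowledge, significant accuracy in text-to-image face recognition for the first time, on both verification and identification tasks.
Abstract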
Diffusion probabilistic models (DPMs) have exhibited exceptional proficiency in generating visual media of outstanding quality and realism. Nonetheless, their potential in non-generative domains, such as face recognition, has yet to be thoroughly investigated. Meanwhile, despite the extensive development of multi-modal face recognition methods, their emphasis has predominantly centered on visual modalities. In this context, face recognition through textual description presents a unique and promising solution that not only transcends the limitations of application scenarios but also expands the potential for research in the field of cross-modal face recognition. It is regrettable that this avenue remains unexplored and underutilized, a consequence of challenges mainly associated with three aspects: 1) the intrinsic imprecision of verbal descriptions; 2) the significant gaps between texts and images; and 3) the immense hurdle posed by insufficient databases. To tackle this problem, we present DiFace, a solution that effectively achieves face recognition via text through a controllable diffusion process, by establishing its theoretical connection with probability transport. Our approach not only unleashes the potential of DPMs across a broader spectrum of tasks but also achieves, to the best of our knowledge, a significant accuracy in text-to-image face recognition for the first time, as demonstrated by our experiments on verification and identification.
Analyze the robustness of three NMF algorithms (Robust NMF with L1 norm, L2-1 norm NMF, L2 NMF)
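results: Using a set of evaluation metrics, the study assesses how different NMF algorithms perform in noisy environments, quantifying their resistance to noise and their feasibility in practical applications.
Abstract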
Non-negative matrix factorization (NMF) and its variants have been widely employed in clustering and classification tasks (Long & Jian, 2021). However, noise can seriously affect experimental results. Our research is dedicated to investigating the noise robustness of NMF in the face of different types of noise. Specifically, we adopt three different NMF algorithms, namely L1 NMF, L2 NMF, and L21 NMF, and use the ORL and YaleB data sets to run a series of experiments with salt-and-pepper noise and block-occlusion noise separately. In the experiments, we use a variety of evaluation indicators, including root mean square error (RMSE), accuracy (ACC), and normalized mutual information (NMI), to evaluate the performance of different NMF algorithms in noisy environments. Through these indicators, we quantify the resistance of NMF algorithms to noise and gain insights into their feasibility in practical applications.
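The noise setup described in the abstract can be illustrated with a minimal sketch. It assumes images are stored as rows of values in [0, 1], and the function names `add_salt_and_pepper` and `rmse` are hypothetical, not taken from the paper. For simplicity the RMSE here compares the clean and noisy images directly; a robustness study would more likely compare the clean image against the NMF reconstruction.

```python
import math
import random


def add_salt_and_pepper(image, p, rng):
    """Return a copy of `image` where each pixel is replaced, with
    probability p, by 0.0 (pepper) or 1.0 (salt)."""
    noisy = []
    for row in image:
        new_row = []
        for value in row:
            if rng.random() < p:
                new_row.append(rng.choice((0.0, 1.0)))
            else:
                new_row.append(value)
        noisy.append(new_row)
    return noisy


def rmse(a, b):
    """Root mean square error between two equally sized 2-D lists."""
    squared = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)                      # fixed seed for repeatability
clean = [[0.5] * 8 for _ in range(8)]       # toy 8x8 mid-grey image
noisy = add_salt_and_pepper(clean, p=0.2, rng=rng)
error = rmse(clean, noisy)
```

Because every corrupted pixel in this toy image moves by exactly 0.5, the RMSE grows with the square root of the corrupted fraction and never exceeds 0.5.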
Honesty Is the Best Policy: Defining and Mitigating AI Deception
methods: The paper proposes a formal definition of deception in structural causal games, grounded in the philosophy literature and applicable to real-world machine learning systems.
results: Examples and experiments show that the formal definition aligns with the philosophical and commonsense meaning of deception, and that the graphical criteria for deception can be used to mitigate deception in reinforcement learning agents and language models.
Abstract
Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a formal definition of deception in structural causal games, grounded in the philosophy literature, and applicable to real-world machine learning systems. Several examples and results illustrate that our formal definition aligns with the philosophical and commonsense meaning of deception. Our main technical result is to provide graphical criteria for deception. We show, experimentally, that these results can be used to mitigate deception in reinforcement learning agents and language models.
tsMorph: generation of semi-synthetic time series to understand algorithm performance
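results: The experiments show that the Long Short-Term Memory Network forecasting algorithm performs better as the frequency of the time series increases. They demonstrate the utility of tsMorph as a tool for studying time series forecasting methods.
Abstract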
Time series forecasting is a subject of significant scientific and industrial importance. Despite the widespread utilization of forecasting methods, there is a dearth of research aimed at comprehending the conditions under which these methods yield favorable or unfavorable performances. Empirical studies, although common, encounter challenges due to the limited availability of datasets, impeding the extraction of reliable insights. To address this, we present tsMorph, a straightforward approach for generating semi-synthetic time series through dataset morphing. tsMorph operates by creating a sequence of datasets derived from two original datasets. These newly generated datasets exhibit a progressive departure from the characteristics of one dataset and a convergence toward the attributes of the other. This method provides a valuable alternative for obtaining substantial datasets. In this paper, we demonstrate the utility of tsMorph by assessing the performance of the Long Short-Term Memory Network forecasting algorithm. The time series under examination are sourced from the NN5 Competition. The findings reveal compelling insights. Notably, the performance of the Long Short-Term Memory Network improves proportionally with the frequency of the time series. These experiments affirm that tsMorph serves as an effective tool for gaining an understanding of forecasting algorithm behaviors, offering a pathway to overcome the limitations posed by empirical studies and enabling more extensive and reliable experimentation.
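The abstract describes dataset morphing only at a high level, so the following is a sketch under a stated assumption: each intermediate series is a linear (convex) combination of the two originals. The function name `morph_series` and the `steps` parameter are hypothetical, and the actual tsMorph procedure may differ.

```python
def morph_series(source, target, steps):
    """Return steps + 1 series that move linearly from `source` to `target`.

    Assumption: morphing is a pointwise convex combination of two series
    of equal length; this is an illustration, not the published method.
    """
    if len(source) != len(target):
        raise ValueError("series must have the same length")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    morphs = []
    for i in range(steps + 1):
        alpha = i / steps  # 0.0 reproduces source, 1.0 reproduces target
        morphs.append([(1 - alpha) * s + alpha * t
                       for s, t in zip(source, target)])
    return morphs


a = [0.0, 2.0, 4.0]
b = [10.0, 10.0, 10.0]
sequence = morph_series(a, b, steps=4)
```

Evaluating a forecasting model on each element of `sequence` would then show how its error changes as the data drifts from the characteristics of one dataset toward those of the other.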
AI-Powered Arabic Crossword Puzzle Generation for Educational Applications
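results: The system can produce high-quality educational crossword puzzles that promote learning and problem-solving skills, thereby improving the learning experience and learning outcomes.
Abstract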
This paper presents the first Arabic crossword puzzle generator driven by advanced AI technology. Leveraging cutting-edge large language models including GPT4, GPT3-Davinci, GPT3-Curie, GPT3-Babbage, GPT3-Ada, and BERT, the system generates distinctive and challenging clues. Based on a dataset comprising over 50,000 clue-answer pairs, the generator employs fine-tuning, few/zero-shot learning strategies, and rigorous quality-checking protocols to enforce the generation of high-quality clue-answer pairs. Importantly, educational crosswords contribute to enhancing memory, expanding vocabulary, and promoting problem-solving skills, thereby augmenting the learning experience through a fun and engaging approach, reshaping the landscape of traditional learning methods. The overall system can be exploited as a powerful educational tool that amalgamates AI and innovative learning techniques, heralding a transformative era for Arabic crossword puzzles and the intersection of technology and education.
Facial Emotion Recognition Under Mask Coverage Using a Data Augmentation Technique
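results: The model performs well in multi-mask mode, with higher accuracy than in single-mask mode. The VGGFace2 network achieved an accuracy of 97.82% in the person-dependent mode and 74.21% in the person-independent mode. The system is also evaluated with several metrics, including precision, sensitivity, specificity, AUC, F1 score, and the confusion matrix, and the LIME algorithm is used to visualize the CNN's decision-making strategy.
Abstract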
Identifying human emotions using AI-based computer vision systems, when individuals wear face masks, presents a new challenge in the current Covid-19 pandemic. In this study, we propose a facial emotion recognition system capable of recognizing emotions from individuals wearing different face masks. A novel data augmentation technique was utilized to improve the performance of our model using four mask types for each face image. We evaluated the effectiveness of four convolutional neural networks, Alexnet, Squeezenet, Resnet50 and VGGFace2 that were trained using transfer learning. The experimental findings revealed that our model works effectively in multi-mask mode compared to single-mask mode. The VGGFace2 network achieved the highest accuracy rate, with 97.82% for the person-dependent mode and 74.21% for the person-independent mode using the JAFFE dataset. However, we evaluated our proposed model using the UIBVFED dataset. The Resnet50 has demonstrated superior performance, with accuracies of 73.68% for the person-dependent mode and 59.57% for the person-independent mode. Moreover, we employed metrics such as precision, sensitivity, specificity, AUC, F1 score, and confusion matrix to measure our system's efficiency in detail. Additionally, the LIME algorithm was used to visualize CNN's decision-making strategy.
JarviX: A LLM No code Platform for Tabular Data Analysis and Optimization
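results: The experiments indicate that JarviX provides efficient and reliable data analysis and visualization, and that it can automatically generate data insight summaries, analysis questions, and explanations of results.
Abstract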
In this study, we introduce JarviX, a sophisticated data analytics framework. JarviX is designed to employ Large Language Models (LLMs) to provide automated guidance and execute high-precision data analyses on tabular datasets. This framework emphasizes the significance of varying column types, capitalizing on state-of-the-art LLMs to generate concise data insight summaries, propose relevant analysis inquiries, visualize data effectively, and provide comprehensive explanations for results drawn from an extensive data analysis pipeline. Moreover, JarviX incorporates an automated machine learning (AutoML) pipeline for predictive modeling. This integration forms a comprehensive and automated optimization cycle, which proves particularly advantageous for optimizing machine configuration. The efficacy and adaptability of JarviX are substantiated through a series of practical use case studies.
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
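results: By creating a smooth camera trajectory and denoising with both a view-conditioned diffusion model and a video diffusion model, the method obtains highly consistent novel view synthesis, outperforming the state of the art.
Abstract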
Generating novel views of an object from a single image is a challenging task. It requires an understanding of the underlying 3D structure of the object from an image and rendering high-quality, spatially consistent new views. While recent methods for view synthesis based on diffusion have shown great progress, achieving consistency among various view estimates and at the same time abiding by the desired camera pose remains a critical problem yet to be solved. In this work, we demonstrate a strikingly simple method, where we utilize a pre-trained video diffusion model to solve this problem. Our key idea is that synthesizing a novel view could be reformulated as synthesizing a video of a camera going around the object of interest -- a scanning video -- which then allows us to leverage the powerful priors that a video diffusion model would have learned. Thus, to perform novel-view synthesis, we create a smooth camera trajectory to the target view that we wish to render, and denoise using both a view-conditioned diffusion model and a video diffusion model. By doing so, we obtain a highly consistent novel view synthesis, outperforming the state of the art.
Churn Prediction via Multimodal Fusion Learning: Integrating Customer Financial Literacy, Voice, and Behavioral Data
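results: The study shows that fusing multimodal data improves the accuracy of churn prediction. Compared with late fusion and the baseline model, the hybrid fusion model achieves a test accuracy of 91.2%, a mean average precision of 66, and a macro-averaged F1 score of 54. The analysis also shows a positive correlation between negative emotions, low FL scores, and high churn risk.
Abstract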
In today's competitive landscape, businesses grapple with customer retention. Churn prediction models, although beneficial, often lack accuracy due to the reliance on a single data source. The intricate nature of human behavior and high-dimensional customer data further complicate these efforts. To address these concerns, this paper proposes a multimodal fusion learning model for identifying customer churn risk levels in financial service providers. Our multimodal approach integrates customer sentiments, financial literacy (FL) level, and financial behavioral data, enabling more accurate and bias-free churn prediction models. The proposed FL model utilizes a SMOGN COREG supervised model to gauge customer FL levels from their financial data. The baseline churn model applies an ensemble artificial neural network and oversampling techniques to predict churn propensity in high-dimensional financial data. We also incorporate a speech emotion recognition model employing a pre-trained CNN-VGG16 to recognize customer emotions based on pitch, energy, and tone. To integrate these diverse features while retaining unique insights, we introduce late and hybrid fusion techniques that complementarily boost coordinated multimodal co-learning. Robust metrics were utilized to evaluate the proposed multimodal fusion model and hence the validity of the approach, including mean average precision and macro-averaged F1 score. Our novel approach demonstrates a marked improvement in churn prediction, achieving a test accuracy of 91.2%, a Mean Average Precision (MAP) score of 66, and a Macro-Averaged F1 score of 54 through the proposed hybrid fusion learning technique compared with late fusion and baseline models. Furthermore, the analysis demonstrates a positive correlation between negative emotions, low FL scores, and high-risk customers.
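Two ideas named in the abstract, late fusion and the macro-averaged F1 score, can be sketched in a few lines. The version below assumes late fusion means a weighted average of per-model class probabilities, which is one common form and not necessarily the paper's; the function names, weights, and toy labels are hypothetical.

```python
def late_fusion(prob_lists, weights):
    """Weighted average of class-probability vectors from several models."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: fuse a behavioural model and a speech-emotion model.
fused = late_fusion([[0.2, 0.8], [0.6, 0.4]], weights=[1.0, 1.0])

y_true = ["low", "low", "high", "high"]
y_pred = ["low", "high", "high", "high"]
score = macro_f1(y_true, y_pred)
```

Macro averaging gives every risk class the same weight regardless of how many customers fall into it, which is why it is a stricter measure than plain accuracy on imbalanced churn data.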
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
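results: In the experiments, TextGenSHAP is considerably faster than conventional Shapley value computation, reducing processing times from hours to minutes for token-level explanations and to just seconds for document-level explanations. The paper also demonstrates two scenarios in which real-time Shapley values help: localizing important words and sentences in long-document question answering, and improving existing document retrieval systems by increasing the accuracy of selected passages and final responses.
Abstract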
Large language models (LLMs) have attracted huge interest in practical applications given their increasingly accurate responses and coherent reasoning abilities. Given their nature as black-boxes using complex reasoning processes on their inputs, it is inevitable that the demand for scalable and faithful explanations for LLMs' generated content will continue to grow. There have been major developments in the explainability of neural network models over the past decade. Among them, post-hoc explainability methods, especially Shapley values, have proven effective for interpreting deep learning models. However, there are major challenges in scaling up Shapley values for LLMs, particularly when dealing with long input contexts containing thousands of tokens and autoregressively generated output sequences. Furthermore, it is often unclear how to effectively utilize generated explanations to improve the performance of LLMs. In this paper, we introduce TextGenSHAP, an efficient post-hoc explanation method incorporating LM-specific techniques. We demonstrate that this leads to significant increases in speed compared to conventional Shapley value computations, reducing processing times from hours to minutes for token-level explanations, and to just seconds for document-level explanations. In addition, we demonstrate how real-time Shapley values can be utilized in two important scenarios, providing better understanding of long-document question answering by localizing important words and sentences; and improving existing document retrieval systems through enhancing the accuracy of selected passages and ultimately the final responses.
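To make the scaling problem concrete, here is a minimal sketch of exact Shapley value computation by enumerating all orderings of the players. This is the textbook definition, not the TextGenSHAP algorithm; the function name `shapley_values` and the toy value function are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution of each player
    over every ordering. Cost grows as n!, so this only works for tiny n."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(tokens):
    """Hypothetical score: 'capital' alone is worth 2.0, and the pair
    'of' + 'France' is worth 1.0 only when both are present."""
    score = 0.0
    if "capital" in tokens:
        score += 2.0
    if {"of", "France"} <= tokens:
        score += 1.0
    return score


phi = shapley_values(["capital", "of", "France"], toy_value)
```

With three tokens there are only 6 orderings, but with thousands of input tokens the factorial blow-up makes exact enumeration infeasible, which is the kind of cost that efficient approximations aim to avoid.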
Running cognitive evaluations on large language models: The do’s and the don’ts
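methods: Drawing on three case studies from the literature (a commonsense knowledge benchmark, a theory of mind evaluation, and a test of syntactic agreement), the paper describes common pitfalls that can arise when applying a cognitive test to an LLM. It then lists 10 do's and don'ts to help design high-quality cognitive evaluations for AI systems.
results: The paper concludes by discussing four areas that are currently under active discussion: prompt sensitivity, cultural and linguistic diversity, using LLMs as research assistants, and running evaluations on open vs. closed LLMs. Overall, it aims to contribute to best practices in the field of AI Psychology.
Abstract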
In this paper, I describe methodological considerations for studies that aim to evaluate the cognitive capacities of large language models (LLMs) using language-based behavioral assessments. Drawing on three case studies from the literature (a commonsense knowledge benchmark, a theory of mind evaluation, and a test of syntactic agreement), I describe common pitfalls that might arise when applying a cognitive test to an LLM. I then list 10 do's and don'ts that should help design high-quality cognitive evaluations for AI systems. I conclude by discussing four areas where the do's and don'ts are currently under active discussion -- prompt sensitivity, cultural and linguistic diversity, using LLMs as research assistants, and running evaluations on open vs. closed LLMs. Overall, the goal of the paper is to contribute to the broader discussion of best practices in the rapidly growing field of AI Psychology.
Low-Precision Mixed-Computation Models for Inference on Edge
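results: The mixed-computation approach improves accuracy at a low energy cost. On vision and language models, its accuracy is on average about 1.5% higher than that of FixP, with an energy overhead of 0.19%.
Abstract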
This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low-width) Posit and low-precision fixed point (FixP) number systems. This mixed-computation approach employs 4-bit Posit (Posit4), which has higher precision around zero, for representing weights with high sensitivity, while it uses 4-bit FixP (FixP4) for representing other weights. A heuristic for analyzing the importance and the quantization error of the weights is presented to assign the proper number system to different weights. Additionally, a gradient approximation for Posit representation is introduced to improve the quality of weight updates in the backpropagation process. Due to the high energy consumption of the fully Posit-based computations, neural network operations are carried out in FixP or Posit/FixP. An efficient hardware implementation of a MAC operation with a first Posit operand and FixP for a second operand and accumulator is presented. The efficacy of the proposed low-precision mixed-computation approach is extensively assessed on vision and language models. The results show that, on average, the accuracy of the mixed-computation is about 1.5% higher than that of FixP with a cost of 0.19% energy overhead.
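The fixed-point side of the scheme can be illustrated with a small sketch. It assumes a signed 4-bit format with 2 fractional bits; the abstract does not state how FixP4 splits its bits, so that split, the function name `quantize_fixp`, and the sample weights are all assumptions. Posit encoding is not shown.

```python
def quantize_fixp(value, total_bits=4, frac_bits=2):
    """Quantize `value` to a signed fixed-point grid and return the
    representable number closest to it (saturating at the range ends).

    Assumption: two's-complement style range with `frac_bits` fractional
    bits; Python's round() resolves exact ties to the even integer code.
    """
    scale = 1 << frac_bits                    # 4 steps per unit here
    lo = -(1 << (total_bits - 1))             # smallest integer code: -8
    hi = (1 << (total_bits - 1)) - 1          # largest integer code: 7
    code = max(lo, min(hi, round(value * scale)))
    return code / scale


weights = [0.3, -0.1, 5.0, -5.0]
quantized = [quantize_fixp(w) for w in weights]
errors = [abs(w - q) for w, q in zip(weights, quantized)]
```

A uniform grid like this spends the same resolution everywhere, so small weights near zero can suffer relatively large errors. That is consistent with the abstract's motivation for giving sensitive weights the Posit4 format, which has higher precision around zero.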
摘要