cs.AI - 2023-07-09

On the Challenges of Deploying Privacy-Preserving Synthetic Data in the Enterprise

  • paper_url: http://arxiv.org/abs/2307.04208
  • repo_url: None
  • paper_authors: Lauren Arthur, Jason Costello, Jonathan Hardy, Will O’Brien, James Rea, Gareth Rees, Georgi Ganev
  • for: This work examines the challenges Generative AI technologies face in enterprise deployment, particularly the privacy concerns raised by vast amounts of personal and highly sensitive data.
  • methods: The study systematizes 40+ challenges into five main groups: generation, infrastructure & architecture, governance, compliance & regulation, and adoption.
  • results: The study proposes a strategic and systematic approach that enterprises can employ to effectively address the challenges and establish trust in the implemented solutions.
    Abstract Generative AI technologies are gaining unprecedented popularity, causing a mix of excitement and apprehension through their remarkable capabilities. In this paper, we study the challenges associated with deploying synthetic data, a subfield of Generative AI. Our focus centers on enterprise deployment, with an emphasis on privacy concerns caused by the vast amount of personal and highly sensitive data. We identify 40+ challenges and systematize them into five main groups -- i) generation, ii) infrastructure & architecture, iii) governance, iv) compliance & regulation, and v) adoption. Additionally, we discuss a strategic and systematic approach that enterprises can employ to effectively address the challenges and achieve their goals by establishing trust in the implemented solutions.

Natural Language Instructions for Intuitive Human Interaction with Robotic Assistants in Field Construction Work

  • paper_url: http://arxiv.org/abs/2307.04195
  • repo_url: None
  • paper_authors: Somin Park, Xi Wang, Carol C. Menassa, Vineet R. Kamat, Joyce Y. Chai
  • For: This paper aims to provide a framework for human workers to interact with construction robots based on natural language instructions, enabling intuitive and familiar communication and improving teamwork and supervision in field construction.
  • Methods: The proposed method consists of three stages: Natural Language Understanding (NLU), Information Mapping (IM), and Robot Control (RC). The NLU module uses a language model to predict a tag for each word in the input natural language instruction. The IM module generates the final instructional output essential for the robot to acknowledge and perform the construction task, based on the result of the NLU module and building component information.
  • Results: A case study for drywall installation is conducted to evaluate the proposed approach, and the obtained results highlight the potential of using natural language-based interaction to replicate the communication that occurs between human workers within the context of human-robot teams.
    Abstract The introduction of robots is widely considered to have significant potential of alleviating the issues of worker shortage and stagnant productivity that afflict the construction industry. However, it is challenging to use fully automated robots in complex and unstructured construction sites. Human-Robot Collaboration (HRC) has shown promise of combining human workers' flexibility and robot assistants' physical abilities to jointly address the uncertainties inherent in construction work. When introducing HRC in construction, it is critical to recognize the importance of teamwork and supervision in field construction and establish a natural and intuitive communication system for the human workers and robotic assistants. Natural language-based interaction can enable intuitive and familiar communication with robots for human workers who are non-experts in robot programming. However, limited research has been conducted on this topic in construction. This paper proposes a framework to allow human workers to interact with construction robots based on natural language instructions. The proposed method consists of three stages: Natural Language Understanding (NLU), Information Mapping (IM), and Robot Control (RC). Natural language instructions are input to a language model to predict a tag for each word in the NLU module. The IM module uses the result of the NLU module and building component information to generate the final instructional output essential for a robot to acknowledge and perform the construction task. A case study for drywall installation is conducted to evaluate the proposed approach. The obtained results highlight the potential of using natural language-based interaction to replicate the communication that occurs between human workers within the context of human-robot teams.
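The per-word tagging step in the NLU module can be illustrated with an off-the-shelf token-classification model. The sketch below is a loose stand-in rather than the paper's tagger: it borrows a generic pre-trained NER model (`dslim/bert-base-NER`, an assumed substitute) from Hugging Face transformers, whereas the paper trains construction-domain tags.

```python
# Minimal sketch of the NLU tagging step: predict a tag for each word of a
# natural language instruction. The model below is a generic pre-trained NER
# tagger used as a stand-in; it only tags entities it knows, while the paper
# trains its own construction-domain tag set.
from transformers import pipeline

tagger = pipeline("token-classification",
                  model="dslim/bert-base-NER",
                  aggregation_strategy="simple")

instruction = "Install the drywall panel on the west wall near the door"
for entity in tagger(instruction):
    # Each result carries the predicted tag, the matched span, and a score.
    print(f"{entity['word']:<15} {entity['entity_group']:<8} {entity['score']:.2f}")
```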

SAS Video-QA: Self-Adaptive Sampling for Efficient Video Question-Answering

  • paper_url: http://arxiv.org/abs/2307.04192
  • repo_url: https://github.com/declare-lab/sas-vqa
  • paper_authors: Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria
  • for: To improve the effectiveness and efficiency of video understanding models, especially in real-time application scenarios.
  • methods: Proposes two frame sampling strategies, most domain frames (MDF) and most implied frames (MIF), to maximally preserve key frames: MDF passively reduces the risk of key-frame omission via bootstrap sampling, while MIF actively searches for key frames customized to each video-question pair with the help of auxiliary models.
  • results: Experiments with three advanced VLMs (CLIP, GIT, and All-in-one) on three public datasets show that the proposed sampling strategies boost the performance of image-text pretrained models.
Abstract Video question-answering is a fundamental task in the field of video understanding. Although current vision-language models (VLMs) equipped with Video Transformers have enabled temporal modeling and yielded superior results, they come at the cost of huge computational power and are thus too expensive to deploy in real-time application scenarios. An economical workaround samples only a small portion of frames to represent the main content of the video and tunes an image-text model on these sampled frames. Recent video understanding models usually randomly sample a set of frames or clips, regardless of the internal correlations between their visual contents or their relevance to the problem. We argue that such aimless sampling may omit the key frames from which the correct answer can be deduced, and the situation gets worse as the sampling sparsity increases, which always happens as video lengths increase. To mitigate this issue, we propose two frame sampling strategies, namely the most domain frames (MDF) and most implied frames (MIF), to maximally preserve those frames that are most likely vital to the given questions. MDF passively minimizes the risk of key frame omission in a bootstrap manner, while MIF actively searches key frames customized for each video-question pair with the assistance of auxiliary models. The experimental results on three public datasets with three advanced VLMs (CLIP, GIT and All-in-one) demonstrate that our proposed strategies can boost the performance of image-text pretrained models. The source code pertaining to the method proposed in this paper is publicly available at https://github.com/declare-lab/sas-vqa.
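The question-aware flavour of this sampling (MIF) can be loosely illustrated by using CLIP as the auxiliary scoring model: embed the question and every candidate frame, then keep the top-k frames by similarity, preserving temporal order. This is a simplified sketch of the idea, not the authors' implementation.

```python
# Loose sketch of question-aware key-frame selection (MIF-style): score each
# candidate frame against the question with CLIP and keep the top-k frames.
# This illustrates the idea only; the paper's auxiliary models and scoring differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def select_frames(frames: list[Image.Image], question: str, k: int = 8):
    inputs = processor(text=[question], images=frames,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image: (num_frames, 1) similarity of each frame to the question
    scores = out.logits_per_image.squeeze(-1)
    # Keep top-k frames, re-sorted into temporal order.
    topk = torch.topk(scores, k=min(k, len(frames))).indices.sort().values
    return [frames[i] for i in topk.tolist()]
```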

Review of feedback in Automated Essay Scoring

  • paper_url: http://arxiv.org/abs/2307.05553
  • repo_url: None
  • paper_authors: You-Jin Jong, Yong-Jin Kim, Ok-Chol Ri
  • for: This paper surveys the development of automated essay scoring systems and their use as tools for improving writing skills.
  • methods: The paper reviews existing research and the latest case studies, covering the different feedback types and essay traits used in automated essay scoring.
  • results: Feedback is found to be the key factor in making automated essay scoring systems useful, helping users improve their writing skills.
Abstract The first automated essay scoring system was developed 50 years ago. Automated essay scoring systems have since developed richer functions than the early simple scoring systems. Their purpose is not only to score essays but also to serve as learning tools that improve the writing skill of users. Feedback is the most important aspect of making an automated essay scoring system useful in real life; its importance was already emphasized in the first AES system. This paper reviews research on feedback in automated essay scoring, including different feedback types and essay traits. We also review the latest case studies of automated essay scoring systems that provide feedback.

Latent Graph Attention for Enhanced Spatial Context

  • paper_url: http://arxiv.org/abs/2307.04149
  • repo_url: None
  • paper_authors: Ayush Singh, Yash Bhambhu, Himanshu Buckchash, Deepak K. Gupta, Dilip K. Prasad
  • for: To propose a computationally inexpensive and stable model of global context that improves performance on image-to-image translation tasks.
  • methods: Introduces Latent Graph Attention (LGA), which propagates information through a network of locally connected graphs and controls the extent of contextual spread via the adjustable depth of the graph network.
  • results: Experiments on three challenging applications (transparent object segmentation, image restoration for dehazing, and optical flow estimation) show that incorporating LGA improves performance and brings small architectures closer to the performance of large ones.
Abstract Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent; however, they are computationally expensive. Moreover, the existing approaches are limited to learning only the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA), a computationally inexpensive (linear in the number of nodes) and stable, modular framework for incorporating the global context in existing architectures, especially empowering small-scale architectures to give performance closer to large architectures, thus making light-weight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating the construction of a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, thereby explicitly controlling the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves the performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing and optical flow estimation.
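The core mechanism (propagating information through locally connected graphs so that distant pixels become related via the intermediate ones) can be sketched as repeated message passing over a 4-neighbour grid; a minimal illustration, not the LGA module itself:

```python
# Minimal sketch of local graph propagation on an image feature grid: each
# node (pixel) repeatedly averages messages from its 4-neighbours, so two
# distant pixels become related through the intermediate pixels after enough
# steps. The number of steps ("depth") controls the extent of contextual spread.
import torch
import torch.nn.functional as F

def local_graph_propagate(feat: torch.Tensor, depth: int) -> torch.Tensor:
    """feat: (B, C, H, W) feature map; depth: number of propagation steps."""
    # A fixed 4-neighbour averaging kernel plays the role of the local graph.
    kernel = feat.new_tensor([[0., 1., 0.],
                              [1., 0., 1.],
                              [0., 1., 0.]]) / 4.0
    kernel = kernel.repeat(feat.shape[1], 1, 1, 1)  # (C, 1, 3, 3) depthwise
    for _ in range(depth):
        msg = F.conv2d(feat, kernel, padding=1, groups=feat.shape[1])
        feat = 0.5 * feat + 0.5 * msg  # mix self state with neighbour messages
    return feat

x = torch.randn(1, 16, 32, 32)
print(local_graph_propagate(x, depth=4).shape)  # torch.Size([1, 16, 32, 32])
```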

A Survey and Approach to Chart Classification

  • paper_url: http://arxiv.org/abs/2307.04147
  • repo_url: None
  • paper_authors: Anurag Dhote, Mohammed Javed, David S Doermann
  • for: This paper addresses automatic chart classification, the first step toward understanding the information that charts convey.
  • methods: The paper surveys existing chart classification techniques, broadly grouped into traditional ML approaches, CNNs, and transformers.
  • results: The paper presents a vision-transformer-based chart classification model and carries out a comparative performance analysis on the CHARTINFO UB-UNITECH PMC dataset, achieving state-of-the-art results in chart classification.
    Abstract Charts represent an essential source of visual information in documents and facilitate a deep understanding and interpretation of information typically conveyed numerically. In the scientific literature, there are many charts, each with its stylistic differences. Recently the document understanding community has begun to address the problem of automatic chart understanding, which begins with chart classification. In this paper, we present a survey of the current state-of-the-art techniques for chart classification and discuss the available datasets and their supported chart types. We broadly classify these contributions as traditional approaches based on ML, CNN, and Transformers. Furthermore, we carry out an extensive comparative performance analysis of CNN-based and transformer-based approaches on the recently published CHARTINFO UB-UNITECH PMC dataset for the CHART-Infographics competition at ICPR 2022. The data set includes 15 different chart categories, including 22,923 training images and 13,260 test images. We have implemented a vision-based transformer model that produces state-of-the-art results in chart classification.
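A vision-transformer classifier over the 15 chart categories can be set up in a few lines. The sketch below uses `timm` with an ImageNet-pretrained ViT backbone as a plausible starting point; the paper's exact backbone and training recipe are not reproduced here.

```python
# Hedged sketch: fine-tuning a pre-trained vision transformer for 15-way chart
# classification, as a plausible setup rather than the paper's exact recipe.
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=15)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """images: (B, 3, 224, 224) normalized chart images; labels: (B,) in [0, 15)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```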

Emotion Analysis on EEG Signal Using Machine Learning and Neural Network

  • paper_url: http://arxiv.org/abs/2307.05375
  • repo_url: None
  • paper_authors: S. M. Masrur Ahmed, Eshaan Tanzim Sabur
  • for: To improve the performance of emotion recognition from brain waves (EEG signals).
  • methods: Applies EEG signal processing techniques together with classifiers including SVM, KNN, and an RNN trained with LSTM.
  • results: Several emotional states were classified and tested on the DEAP dataset, achieving high recognition accuracy.
Abstract Emotion has a significant influence on how one thinks and interacts with others. It serves as a link between how a person feels and the actions one takes, and it can influence one's life decisions. Since the patterns of emotions and their reflections vary from person to person, their study must be based on approaches that are effective over a wide range of populations. To extract features and enhance accuracy, emotion recognition using brain waves or EEG signals requires the implementation of efficient signal processing techniques. Various approaches to human-machine interaction technologies have been developed for a long time, and in recent years researchers have had great success in automatically understanding emotion using brain signals. In our research, several emotional states were classified and tested on EEG signals collected from a well-known publicly available dataset, the DEAP Dataset, using SVM (Support Vector Machine), KNN (K-Nearest Neighbor), and an advanced neural network model, an RNN (Recurrent Neural Network) trained with LSTM (Long Short-Term Memory). The main purpose of this study is to improve emotion recognition performance using brain signals. Since emotions can change with time, the changes in emotion over time are also examined in our research.
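The RNN branch of the comparison can be sketched as a small LSTM classifier over windowed EEG channels. This is a generic illustration of the architecture class; the layer sizes are chosen arbitrarily and are not the paper's configuration.

```python
# Generic sketch of an LSTM-based emotion classifier over EEG windows.
# Input: (batch, time_steps, channels); DEAP recordings have 32 EEG channels.
# Layer sizes here are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class EEGEmotionLSTM(nn.Module):
    def __init__(self, n_channels=32, hidden=128, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):             # x: (B, T, n_channels)
        out, _ = self.lstm(x)         # (B, T, hidden)
        return self.head(out[:, -1])  # classify from the last time step

model = EEGEmotionLSTM()
logits = model(torch.randn(8, 128, 32))  # 8 windows of 128 samples each
print(logits.shape)  # torch.Size([8, 4])
```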

A Novel Explainable Artificial Intelligence Model in Image Classification problem

  • paper_url: http://arxiv.org/abs/2307.04137
  • repo_url: None
  • paper_authors: Quoc Hung Cao, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Xuan Phong Nguyen
  • for: To provide a new explanation method for image classification models, helping AI scientists and practitioners understand more deeply how models work internally.
  • methods: Combines the advantages of existing explanation algorithms such as LIME, CAM, and GradCAM while overcoming their drawbacks in execution time and interpretive clarity.
  • results: Evaluated with multiple image classification models from the ILSVRC dataset, including ResNet50, Inception-v3, and VGG16, achieving outstanding results in both accuracy and explanation quality.
Abstract In recent years, artificial intelligence has been applied ever more widely across many fields and has a profound and direct impact on human life. With this comes the need to understand how models make their predictions. Since most current high-precision models are black boxes, neither AI scientists nor end-users deeply understand what is going on inside these models. Therefore, many algorithms have been studied for the purpose of explaining AI models, especially those for image classification in computer vision, such as LIME, CAM, and GradCAM. However, these algorithms still have limitations, such as LIME's long execution time and the lack of concreteness and clarity in CAM's interpretations. Therefore, in this paper, we propose a new method called Segmentation - Class Activation Mapping (SeCAM) that combines the advantages of the algorithms above while overcoming their disadvantages. We tested this algorithm with various models, including ResNet50, Inception-v3, and VGG16, on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset. The algorithm produced outstanding results, meeting all the requirements for a specific explanation in remarkably little time.
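For reference, the plain CAM computation that SeCAM builds on weighs the final convolutional feature maps by the classifier weights of the target class. Below is a minimal sketch with a ResNet50; the segmentation step that distinguishes SeCAM is omitted.

```python
# Minimal sketch of plain CAM on ResNet50: weigh the last conv feature maps by
# the fully-connected weights of the target class. SeCAM additionally groups
# the map over image segments, which is omitted here.
import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

def class_activation_map(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """image: (1, 3, 224, 224) normalized input -> (7, 7) activation map."""
    feats = {}
    hook = model.layer4.register_forward_hook(
        lambda m, i, o: feats.update(maps=o))
    with torch.no_grad():
        model(image)
    hook.remove()
    fmaps = feats["maps"][0]                 # (2048, 7, 7) last conv features
    weights = model.fc.weight[class_idx]     # (2048,) classifier weights
    cam = torch.einsum("c,chw->hw", weights, fmaps)
    cam = torch.relu(cam)
    return cam / (cam.max() + 1e-8)          # normalize to [0, 1]
```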

Reasoning over the Behaviour of Objects in Video-Clips for Adverb-Type Recognition

  • paper_url: http://arxiv.org/abs/2307.04132
  • repo_url: None
  • paper_authors: Amrit Diggavi Seshadri, Alessandra Russo
  • for: To recognize scene adverbs from raw video clips without assuming knowledge of the clips' underlying action types.
  • methods: Proposes a new framework that reasons over object behaviours extracted from raw video clips, comprising a novel pipeline for extracting human-interpretable object-behaviour facts and both symbolic and transformer-based reasoning methods that operate over these facts to identify adverb types.
  • results: Experiments show the proposed methods perform favourably against the previous state of the art. Two new datasets of object-behaviour facts extracted from raw video clips, MSR-VTT-ASP and ActivityNet-ASP, are released to support symbolic video processing.
Abstract In this work, following the intuition that adverbs describing scene-sequences are best identified by reasoning over high-level concepts of object-behavior, we propose the design of a new framework that reasons over object-behaviours extracted from raw-video-clips to recognize the clip's corresponding adverb-types. Importantly, while previous works for general scene adverb-recognition assume knowledge of the clips' underlying action-types, our method is directly applicable in the more general problem setting where the action-type of a video-clip is unknown. Specifically, we propose a novel pipeline that extracts human-interpretable object-behaviour-facts from raw video clips and propose novel symbolic and transformer based reasoning methods that operate over these extracted facts to identify adverb-types. Experiment results demonstrate that our proposed methods perform favourably against the previous state-of-the-art. Additionally, to support efforts in symbolic video-processing, we release two new datasets of object-behaviour-facts extracted from raw video clips - the MSR-VTT-ASP and ActivityNet-ASP datasets.

Carbon-Efficient Neural Architecture Search

  • paper_url: http://arxiv.org/abs/2307.04131
  • repo_url: None
  • paper_authors: Yiyang Zhao, Tian Guo
  • for: To reduce the energy costs and carbon footprint of the neural network design process.
  • methods: Proposes carbon-efficient neural architecture search (CE-NAS), consisting of NAS evaluation algorithms with different energy requirements, a multi-objective optimizer, and a heuristic GPU allocation strategy.
  • results: Trace-driven simulations using a recent NAS benchmark dataset and two carbon traces show that CE-NAS achieves better carbon and search efficiency than the three baselines.
    Abstract This work presents a novel approach to neural architecture search (NAS) that aims to reduce energy costs and increase carbon efficiency during the model design process. The proposed framework, called carbon-efficient NAS (CE-NAS), consists of NAS evaluation algorithms with different energy requirements, a multi-objective optimizer, and a heuristic GPU allocation strategy. CE-NAS dynamically balances energy-efficient sampling and energy-consuming evaluation tasks based on current carbon emissions. Using a recent NAS benchmark dataset and two carbon traces, our trace-driven simulations demonstrate that CE-NAS achieves better carbon and search efficiency than the three baselines.

FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?

  • paper_url: http://arxiv.org/abs/2307.04114
  • repo_url: None
  • paper_authors: Zihao Jiang, Yunkai Dang, Dong Pang, Huishuai Zhang, Weiran Huang
  • for: To enhance few-shot learning so that models can generalize to novel classes from only a few samples.
  • methods: Uses pre-trained language models based on contrastive learning to extract semantic information, with a metric module that aligns visual features with textual embeddings.
  • results: Extensive experiments on multiple benchmarks demonstrate the effectiveness of the method.
Abstract Few-shot learning aims to train models that can generalize to novel classes with only a few samples. Recently, a line of works has been proposed to enhance few-shot learning with accessible semantic information from class names. However, these works focus on improving existing modules such as the visual prototypes and feature extractors of the standard few-shot learning framework, which limits the full potential use of the semantic information. In this paper, we propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning. To address the challenge of aligning visual features with the textual embeddings obtained from a text-based pre-trained language model, we carefully design the textual branch of our framework and introduce a metric module to generalize the cosine similarity. For better transferability, we let the metric module adapt to different few-shot tasks and adopt MAML to train the model via bi-level optimization. Moreover, we conduct extensive experiments on multiple benchmarks to demonstrate the effectiveness of our method.
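The alignment step can be sketched as a learnable, temperature-scaled cosine similarity between visual features and class-name text embeddings. The projection layers and dimensions below are illustrative assumptions, not the paper's module.

```python
# Minimal sketch of a learnable metric between visual features and text
# embeddings: project both into a shared space and compare with a scaled
# cosine similarity. Dimensions and the projections are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetricModule(nn.Module):
    def __init__(self, visual_dim=640, text_dim=768, shared_dim=256):
        super().__init__()
        self.proj_v = nn.Linear(visual_dim, shared_dim)
        self.proj_t = nn.Linear(text_dim, shared_dim)
        self.log_scale = nn.Parameter(torch.zeros(()))  # learnable temperature

    def forward(self, visual, text):
        """visual: (B, visual_dim); text: (N_classes, text_dim) -> (B, N_classes)."""
        v = F.normalize(self.proj_v(visual), dim=-1)
        t = F.normalize(self.proj_t(text), dim=-1)
        return self.log_scale.exp() * v @ t.t()  # scaled cosine similarities

m = MetricModule()
print(m(torch.randn(4, 640), torch.randn(5, 768)).shape)  # torch.Size([4, 5])
```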

A User Study on Explainable Online Reinforcement Learning for Adaptive Systems

  • paper_url: http://arxiv.org/abs/2307.04098
  • repo_url: None
  • paper_authors: Andreas Metzger, Jan Laufer, Felix Feit, Klaus Pohl
  • For: The paper aims to evaluate the effectiveness and usability of an explainable reinforcement learning technique (XRL-DINE) for software engineers to understand and debug adaptive systems.
  • Methods: The paper uses an empirical user study involving 54 software engineers to assess the performance of software engineers when performing different tasks using XRL-DINE, and to evaluate the perceived usefulness and ease of use of XRL-DINE.
  • Results: The study finds that XRL-DINE provides visual insights into why certain decisions were made at important time points, and that software engineers perceive XRL-DINE as useful and easy to use.
    Abstract Online reinforcement learning (RL) is increasingly used for realizing adaptive systems in the presence of design time uncertainty. Online RL facilitates learning from actual operational data and thereby leverages feedback only available at runtime. However, Online RL requires the definition of an effective and correct reward function, which quantifies the feedback to the RL algorithm and thereby guides learning. With Deep RL gaining interest, the learned knowledge is no longer explicitly represented, but is represented as a neural network. For a human, it becomes practically impossible to relate the parametrization of the neural network to concrete RL decisions. Deep RL thus essentially appears as a black box, which severely limits the debugging of adaptive systems. We previously introduced the explainable RL technique XRL-DINE, which provides visual insights into why certain decisions were made at important time points. Here, we introduce an empirical user study involving 54 software engineers from academia and industry to assess (1) the performance of software engineers when performing different tasks using XRL-DINE and (2) the perceived usefulness and ease of use of XRL-DINE.

DebateKG: Automatic Policy Debate Case Creation with Semantic Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2307.04090
  • repo_url: https://github.com/hellisotherpeople/debatekg
  • paper_authors: Allen Roush
  • for: To apply natural language processing systems to problems in competitive debate, particularly the construction of high-quality debate cases.
  • methods: Builds debate cases via constrained shortest-path traversals over argumentative semantic knowledge graphs.
  • results: In the context of Policy Debate, a form of American competitive debate, the authors substantially extend the DebateSum dataset, contribute nine semantic knowledge graphs built on it, and develop a new method for evaluating which knowledge graphs are better for producing policy debate cases.
Abstract Recent work within the Argument Mining community has shown the applicability of Natural Language Processing systems for solving problems found within competitive debate. One of the most important tasks within competitive debate is for debaters to create high quality debate cases. We show that effective debate cases can be constructed using constrained shortest path traversals on Argumentative Semantic Knowledge Graphs. We study this potential in the context of a type of American Competitive Debate, called Policy Debate, which already has a large scale dataset targeting it called DebateSum. We significantly improve upon DebateSum by introducing 53,180 new examples, as well as further useful metadata for every example, to the dataset. We leverage the txtai semantic search and knowledge graph toolchain to produce and contribute 9 semantic knowledge graphs built on this dataset. We create a unique method for evaluating which knowledge graphs are better in the context of producing policy debate cases. A demo which automatically generates debate cases, along with all other code and the Knowledge Graphs, is open-sourced and made available to the public here: https://github.com/Hellisotherpeople/DebateKG
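The case-construction step, a constrained shortest-path traversal over an argumentative knowledge graph, can be illustrated with `networkx` on a toy graph; the node names and weights below are invented for the example.

```python
# Toy illustration of building a debate case as a shortest-path traversal over
# an argumentative semantic knowledge graph. Nodes, edges, and weights are
# invented for the example; DebateKG's graphs come from DebateSum via txtai.
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("plan: carbon tax", "link: energy prices rise", 0.4),
    ("link: energy prices rise", "internal: industry relocates", 0.7),
    ("link: energy prices rise", "internal: demand falls", 0.3),
    ("internal: demand falls", "impact: emissions decline", 0.2),
    ("internal: industry relocates", "impact: emissions decline", 0.9),
])  # lower weight = stronger semantic connection

case = nx.shortest_path(G, "plan: carbon tax", "impact: emissions decline",
                        weight="weight")
print(" -> ".join(case))
# plan: carbon tax -> link: energy prices rise -> internal: demand falls
#   -> impact: emissions decline
```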

Semi Supervised Meta Learning for Spatiotemporal Learning

  • paper_url: http://arxiv.org/abs/2308.01916
  • repo_url: None
  • paper_authors: Faraz Waseem, Pratyush Muthukumar
  • for: To apply meta-learning to self-supervised masked autoencoders for spatiotemporal learning.
  • methods: Integrates meta-learning into existing state-of-the-art architectures in three steps, using the Memory Augmented Neural Network (MANN) architecture: first, fine-tuning a pre-trained MAE on a small-scale spatiotemporal dataset for video reconstruction; next, training an MAE encoder with a classification head for action classification; finally, fine-tuning a pre-trained MAE with a MANN backbone for action classification.
  • results: The results indicate that applying meta-learning to existing state-of-the-art representation learning architectures can improve spatiotemporal learning performance.
    Abstract We approached the goal of applying meta-learning to self-supervised masked autoencoders for spatiotemporal learning in three steps. Broadly, we seek to understand the impact of applying meta-learning to existing state-of-the-art representation learning architectures. Thus, we test spatiotemporal learning through: a meta-learning architecture only, a representation learning architecture only, and an architecture applying representation learning alongside a meta learning architecture. We utilize the Memory Augmented Neural Network (MANN) architecture to apply meta-learning to our framework. Specifically, we first experiment with applying a pre-trained MAE and fine-tuning on our small-scale spatiotemporal dataset for video reconstruction tasks. Next, we experiment with training an MAE encoder and applying a classification head for action classification tasks. Finally, we experiment with applying a pre-trained MAE and fine-tune with MANN backbone for action classification tasks.

Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings

  • paper_url: http://arxiv.org/abs/2307.10200
  • repo_url: None
  • paper_authors: Sujan Dutta, Parth Srivastava, Vaishnavi Solunke, Swaprava Nath, Ashiqur R. KhudaBukhsh
  • For: This paper uses a large corpus of court proceedings to investigate gender inequality in the context of divorce in India.
  • Methods: The authors use natural language processing (NLP) techniques to analyze the court proceedings and quantify societal inequalities. They also modify existing NLP resources to better suit their research goals.
  • Results: The authors find that while there may be changing norms in India with more women challenging patriarchy, there is still striking gender inequality in the context of divorce, with women often subjected to domestic violence.
    Abstract Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons to call the decision to quit which is generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. While emerging data sources (e.g., public court records) on sensitive societal issues hold promise in aiding social science research, biases present in cutting-edge natural language processing (NLP) methods may interfere with or affect such studies. We thus require a thorough analysis of potential gaps and limitations present in extant NLP resources. In this paper, on the methodological side, we demonstrate that existing NLP resources required several non-trivial modifications to quantify societal inequalities. On the substantive side, we find that while a large number of court cases perhaps suggest changing norms in India where women are increasingly challenging patriarchy, AI-powered analyses of these court proceedings indicate striking gender inequality with women often subjected to domestic violence.

A Personalized Reinforcement Learning Summarization Service for Learning Structure from Unstructured Data

  • paper_url: http://arxiv.org/abs/2307.05696
  • repo_url: None
  • paper_authors: Samira Ghodratnama, Amin Beheshti, Mehrdad Zakershahrak
  • for: To provide a personalized summarization service that helps users extract meaningful insights from large document collections.
  • methods: Uses a reinforcement learning algorithm to generate personalized summaries and organizes them into a hierarchical concept map so that users can better comprehend and navigate the documents.
  • results: The approach enhances comprehension and navigation, empowering users to extract meaningful insights from documents.
    Abstract The exponential growth of textual data has created a crucial need for tools that assist users in extracting meaningful insights. Traditional document summarization approaches often fail to meet individual user requirements and lack structure for efficient information processing. To address these limitations, we propose Summation, a hierarchical personalized concept-based summarization approach. It synthesizes documents into a concise hierarchical concept map and actively engages users by learning and adapting to their preferences. Using a Reinforcement Learning algorithm, Summation generates personalized summaries for unseen documents on specific topics. This framework enhances comprehension, enables effective navigation, and empowers users to extract meaningful insights from large document collections aligned with their unique requirements.

Multi-Head Attention Mechanism Learning for Cancer New Subtypes and Treatment Based on Cancer Multi-Omics Data

  • paper_url: http://arxiv.org/abs/2307.04075
  • repo_url: None
  • paper_authors: Liangrui Pan, Dazhen Liu, Yutao Dou, Lian Wang, Zhichao Feng, Pengfei Rong, Liwen Xu, Shaoliang Peng
  • for: The paper aims to identify and characterize cancer subtypes using unsupervised contrastive learning on multi-omics data.
  • methods: The proposed method uses a generalization framework based on attention mechanisms for unsupervised contrastive learning (AMUCL), which includes a decoupled contrastive learning model (DMACL) based on a multi-head attention mechanism to deeply extract multi-omics data features and identify new cancer subtypes.
  • results: The DMACL model achieved the most reliable cancer subtype clustering results, with a C-index of 0.002, a Silhouette score of 0.801, and a Davies-Bouldin score of 0.38 on a single-cell multi-omics dataset, and a C-index of 0.016, a Silhouette score of 0.688, and a Davies-Bouldin score of 0.46 on a cancer multi-omics dataset. The results also revealed six cancer subtypes of AML, which were validated through GO functional enrichment, subtype-specific biological functions, and GSEA.
    Abstract Due to the high heterogeneity and clinical characteristics of cancer, there are significant differences in multi-omics data and clinical features among subtypes of different cancers. Therefore, the identification and discovery of cancer subtypes are crucial for the diagnosis, treatment, and prognosis of cancer. In this study, we proposed a generalization framework based on attention mechanisms for unsupervised contrastive learning (AMUCL) to analyze cancer multi-omics data for the identification and characterization of cancer subtypes. AMUCL framework includes a unsupervised multi-head attention mechanism, which deeply extracts multi-omics data features. Importantly, a decoupled contrastive learning model (DMACL) based on a multi-head attention mechanism is proposed to learn multi-omics data features and clusters and identify new cancer subtypes. This unsupervised contrastive learning method clusters subtypes by calculating the similarity between samples in the feature space and sample space of multi-omics data. Compared to 11 other deep learning models, the DMACL model achieved a C-index of 0.002, a Silhouette score of 0.801, and a Davies Bouldin Score of 0.38 on a single-cell multi-omics dataset. On a cancer multi-omics dataset, the DMACL model obtained a C-index of 0.016, a Silhouette score of 0.688, and a Davies Bouldin Score of 0.46, and obtained the most reliable cancer subtype clustering results for each type of cancer. Finally, we used the DMACL model in the AMUCL framework to reveal six cancer subtypes of AML. By analyzing the GO functional enrichment, subtype-specific biological functions, and GSEA of AML, we further enhanced the interpretability of cancer subtype analysis based on the generalizable AMUCL framework.

Contextual Dynamic Pricing with Strategic Buyers

  • paper_url: http://arxiv.org/abs/2307.04055
  • repo_url: None
  • paper_authors: Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun
  • for: To address the setting in which buyers can manipulate their feature data to obtain lower prices, at some manipulation cost, thereby reducing the seller's revenue.
  • methods: Studies contextual dynamic pricing with strategic buyers, where the seller observes only the manipulated features rather than the buyers' true features, and observes only a binary response indicating whether a sale happened rather than the buyers' valuations.
  • results: Proposes a strategic dynamic pricing policy that incorporates buyers' strategic behavior to maximize the seller's cumulative revenue. The policy is provably better than a random pricing policy, and it handles both the strategic behavior and an unknown manipulation cost; experiments show it outperforms pricing policies that are unaware of strategic behavior.
    Abstract Personalized pricing, which involves tailoring prices based on individual characteristics, is commonly used by firms to implement a consumer-specific pricing policy. In this process, buyers can also strategically manipulate their feature data to obtain a lower price, incurring certain manipulation costs. Such strategic behavior can hinder firms from maximizing their profits. In this paper, we study the contextual dynamic pricing problem with strategic buyers. The seller does not observe the buyer's true feature, but a manipulated feature according to buyers' strategic behavior. In addition, the seller does not observe the buyers' valuation of the product, but only a binary response indicating whether a sale happens or not. Recognizing these challenges, we propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue. We first prove that existing non-strategic pricing policies that neglect the buyers' strategic behavior result in a linear $\Omega(T)$ regret with $T$ the total time horizon, indicating that these policies are not better than a random pricing policy. We then establish that our proposed policy achieves a sublinear regret upper bound of $O(\sqrt{T})$. Importantly, our policy is not a mere amalgamation of existing dynamic pricing policies and strategic behavior handling algorithms. Our policy can also accommodate the scenario when the marginal cost of manipulation is unknown in advance. To account for it, we simultaneously estimate the valuation parameter and the cost parameter in the online pricing policy, which is shown to also achieve an $O(\sqrt{T})$ regret bound. Extensive experiments support our theoretical developments and demonstrate the superior performance of our policy compared to other pricing policies that are unaware of the strategic behaviors.
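The strategic-buyer effect can be illustrated with a toy simulation: a buyer under-reports a scalar feature whenever the price saving exceeds the manipulation cost, so a seller who trusts reported features loses revenue. All quantities below are invented for the illustration; this is not the paper's pricing policy.

```python
# Toy illustration of the strategic-buyer effect: valuations are linear in a
# scalar feature (bounded below by 1.0 here), and buyers shade the reported
# feature downward whenever each unit of shading saves more than it costs.
# All numbers are invented for the illustration.
import numpy as np

rng = np.random.default_rng(0)
theta, cost = 2.0, 0.5   # true valuation slope; per-unit manipulation cost

revenue_naive = 0.0
for _ in range(10_000):
    x = rng.uniform(1.0, 2.0)                  # buyer's true feature
    # A naive seller prices at theta * reported_feature, so the buyer shades x
    # down to the support's lower bound as long as theta > cost.
    shade = (x - 1.0) if theta > cost else 0.0
    reported = x - shade
    price = theta * reported
    if theta * x - price - cost * shade >= 0:  # buyer's net utility check
        revenue_naive += price

print(f"avg revenue (naive policy): {revenue_naive / 10_000:.3f}")
# Pricing off the manipulated feature caps revenue near theta * 1.0 = 2.0,
# well below the theta * E[x] = 3.0 achievable if true features were observed.
```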

A Physics-Informed Low-Shot Learning For sEMG-Based Estimation of Muscle Force and Joint Kinematics

  • paper_url: http://arxiv.org/abs/2307.05361
  • repo_url: None
  • paper_authors: Yue Shi, Shuhao Ma, Yihui Zhao, Zhiqiang Zhang
  • for: This paper aims to improve the estimation of muscle force and joint kinematics from surface electromyography (sEMG) data using a physics-informed low-shot learning method.
  • methods: The proposed method integrates Lagrange's equation of motion and an inverse dynamic muscle model into a generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from small sample data.
  • results: The proposed method outperforms selected benchmark methods, including a physics-informed convolution neural network (PI-CNN), a vanilla GAN, and a multi-layer extreme learning machine (ML-ELM), in estimating muscle forces and joint kinematics. The estimations are also unbiased compared to physics-based inverse dynamics.
Abstract Muscle force and joint kinematics estimation from surface electromyography (sEMG) are essential for real-time biomechanical analysis of the dynamic interplay among neural muscle stimulation, muscle dynamics, and kinetics. Recent advances in deep neural networks (DNNs) have shown the potential to improve biomechanical analysis in a fully automated and reproducible manner. However, the small sample nature and physical interpretability of biomechanical analysis limit the applications of DNNs. This paper presents a novel physics-informed low-shot learning method for sEMG-based estimation of muscle force and joint kinematics. This method seamlessly integrates Lagrange's equation of motion and an inverse dynamic muscle model into the generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from the small sample data. Specifically, Lagrange's equation of motion is introduced into the generative model to restrain the structured decoding of the high-level features following the laws of physics. And a physics-informed policy gradient is designed to improve the adversarial learning efficiency by rewarding the consistent physical representation of the extrapolated estimations and the physical references. Experimental validations are conducted on two scenarios (i.e. the walking trials and wrist motion trials). Results indicate that the estimations of the muscle forces and joint kinematics are unbiased compared to the physics-based inverse dynamics, which outperforms the selected benchmark methods, including the physics-informed convolution neural network (PI-CNN), vanilla generative adversarial network (GAN), and multi-layer extreme learning machine (ML-ELM).
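The physics-informed part of the training objective can be sketched as an extra penalty on the generator: besides fooling the discriminator, generated kinematics must satisfy an equation-of-motion residual. The residual function below is a toy placeholder; the paper derives it from Lagrange's equation and an inverse dynamic muscle model.

```python
# Hedged sketch of a physics-informed generator loss: adversarial term plus a
# penalty on the residual of an equation of motion. `dynamics_residual` is a
# toy placeholder for the paper's Lagrange / inverse-dynamics model.
import torch

def dynamics_residual(q, qd, qdd, tau):
    """Placeholder residual of M(q)*qdd + C(q, qd) - tau = 0 for a toy 1-DoF
    system with unit inertia and viscous damping (assumed, not the paper's
    musculoskeletal model)."""
    return qdd + 0.1 * qd - tau

def generator_loss(disc_out, q, qd, qdd, tau, lam=1.0):
    # Non-saturating adversarial term: push D's output on fakes toward "real".
    adv = torch.nn.functional.binary_cross_entropy_with_logits(
        disc_out, torch.ones_like(disc_out))
    # Physics term: penalize violations of the equation of motion.
    phys = dynamics_residual(q, qd, qdd, tau).pow(2).mean()
    return adv + lam * phys
```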

Optimization-based Learning for Dynamic Load Planning in Trucking Service Networks

  • paper_url: http://arxiv.org/abs/2307.04050
  • repo_url: None
  • paper_authors: Ritesh Ojha, Wenbo Chen, Hanyu Zhang, Reem Khir, Alan Erera, Pascal Van Hentenryck
  • For: This paper aims to develop a decision-support tool for parcel carriers to optimize their service network design and load planning.
  • Methods: The paper formulates the Dynamic Load Planning Problem (DLPP) as a Mixed-Integer Programming (MIP) model and proposes a Goal-Directed Optimization method to eliminate symmetries and improve the quality of solutions. The paper also introduces an optimization proxy that combines a machine learning model and a feasibility restoration model to address computational challenges.
  • Results: The proposed approach is tested on industrial instances and shows significant improvements in computational efficiency and solution quality compared to a commercial solver. The approach also demonstrates the benefits of load consolidation and the potential for significant cost savings through the combination of machine learning and optimization.
    Abstract The load planning problem is a critical challenge in service network design for parcel carriers: it decides how many trailers (or loads) to assign for dispatch over time between pairs of terminals. Another key challenge is to determine a flow plan, which specifies how parcel volumes are assigned to planned loads. This paper considers the Dynamic Load Planning Problem (DLPP) that considers both flow and load planning challenges jointly to adjust loads and flows as the demand forecast changes over time before the day of operations. The paper aims at developing a decision-support tool to inform planners making these decisions at terminals across the network. The paper formulates the DLPP as a MIP and shows that it admits a large number of symmetries in a network where each commodity can be routed through primary and alternate paths. As a result, an optimization solver may return fundamentally different solutions to closely related problems, confusing planners and reducing trust in optimization. To remedy this limitation, the paper proposes a Goal-Directed Optimization that eliminates those symmetries by generating optimal solutions staying close to a reference plan. The paper also proposes an optimization proxy to address the computational challenges of the optimization models. The proxy combines a machine learning model and a feasibility restoration model and finds solutions that satisfy real-time constraints imposed by planners-in-the-loop. An extensive computational study on industrial instances shows that the optimization proxy is around 10 times faster than the commercial solver in obtaining the same quality solutions and orders of magnitude faster for generating solutions that are consistent with each other. The proposed approach also demonstrates the benefits of the DLPP for load consolidation, and the significant savings obtained from combining machine learning and optimization.
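The flavour of the load-planning MIP can be shown on a toy instance: choose how many trailers to open on each candidate lane and route commodity volume onto them. The sketch below uses `pulp` with invented numbers; the paper's DLPP model, goal-directed terms, and symmetry handling are far richer.

```python
# Toy load-planning MIP in pulp: open integer trailer counts on two candidate
# lanes and assign parcel volume to them at minimum trailer cost. Numbers and
# lane structure are invented; the DLPP model is far richer.
import pulp

CAP = 100.0                                  # trailer capacity (volume units)
volume = 130.0                               # commodity volume to move
lanes = {"primary": 1.0, "alternate": 1.3}   # cost per trailer opened

prob = pulp.LpProblem("toy_load_planning", pulp.LpMinimize)
loads = {l: pulp.LpVariable(f"loads_{l}", lowBound=0, cat="Integer") for l in lanes}
flow = {l: pulp.LpVariable(f"flow_{l}", lowBound=0) for l in lanes}

prob += pulp.lpSum(lanes[l] * loads[l] for l in lanes)   # minimize trailer cost
prob += pulp.lpSum(flow[l] for l in lanes) == volume     # move all the volume
for l in lanes:
    prob += flow[l] <= CAP * loads[l]                    # lane capacity

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for l in lanes:
    print(l, int(loads[l].value()), flow[l].value())
```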

The Value of Chess Squares

  • paper_url: http://arxiv.org/abs/2307.05330
  • repo_url: https://github.com/Dpay123/chess
  • paper_authors: Aditya Gupta, Shiva Maharaj, Nicholas Polson, Vadim Sokolov
  • for: To determine the value of chess squares and the placement of pieces on the board, and to assess positions more precisely.
  • methods: The study introduces marginal valuations for both pieces and squares, extending the conventional fixed-value approach.
  • results: The marginal valuations better capture the worth of positions and pieces, as demonstrated for the positioning of Knights and Bishops, and provide valuable insights into the valuation of pawns.
Abstract Valuing chess squares and determining the placement of pieces on the board are the main objectives of our study. With the emergence of chess AI, it has become possible to accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces (K $= \infty$, Q $= 9$, R $= 5$, B $= 3$, N $= 3$, P $= 1$). We enhance this analysis by introducing marginal valuations for both pieces and squares. We demonstrate our method by examining the positioning of Knights and Bishops, and also provide valuable insights into the valuation of pawns. Notably, Nimzowitsch was among the pioneers in advocating for the significance of Pawn structure and valuation. Finally, we conclude by suggesting potential avenues for future research.
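The conventional fixed valuation that the paper generalizes can be computed directly with `python-chess`; the marginal, square-dependent values introduced in the paper are not reproduced here.

```python
# Material count under the conventional fixed piece values the paper starts
# from (K = infinity is irrelevant to material balance and is skipped).
# The paper's square-dependent marginal values are not shown here.
import chess

VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
          chess.ROOK: 5, chess.QUEEN: 9}

def material_balance(board: chess.Board) -> int:
    """Positive means White is ahead under the fixed valuation."""
    score = 0
    for piece in board.piece_map().values():
        if piece.piece_type in VALUES:
            value = VALUES[piece.piece_type]
            score += value if piece.color == chess.WHITE else -value
    return score

board = chess.Board()            # initial position
board.remove_piece_at(chess.E2)  # drop White's e-pawn for illustration
print(material_balance(board))   # -1
```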

Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations

  • paper_url: http://arxiv.org/abs/2307.04036
  • repo_url: https://github.com/tongstevensun/deepfuse
  • paper_authors: Tong Steven Sun, Yuyang Gao, Shubham Khaladkar, Sijia Liu, Liang Zhao, Young-Ho Kim, Sungsoo Ray Hong
  • for: To provide a direct feedback loop between users and convolutional neural networks (CNNs) for diagnosing and revising a CNN's vulnerabilities.
  • methods: Uses local explanations, whose visual straightforwardness helps ML engineers understand CNN outputs, and realizes an interactive system (DeepFuse) that lets users search for unreasonable local explanations, annotate new boundaries for them, and steer the model so that it does not repeat similar mistakes.
  • results: In a two-day study (S2) with 12 experienced CNN engineers, participants using DeepFuse created more accurate and reasonable models than the current state of the art and found that its case-based guidance can practically improve their current practice.
    Abstract The local explanation provides heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has been one of the most popular explainable AI (XAI) methods for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalent perspective about the local explanation as a valuable and indispensable envision in building CNNs versus the process that exhausts them due to the heuristic nature of detecting vulnerability. Moreover, steering the CNNs based on the vulnerability learned from the diagnosis seemed highly challenging. To mitigate the gap, we designed DeepFuse, the first interactive design that realizes the direct feedback loop between a user and CNNs in diagnosing and revising CNN's vulnerability using local explanations. DeepFuse helps CNN engineers to systemically search "unreasonable" local explanations and annotate the new boundaries for those identified as unreasonable in a labor-efficient manner. Next, it steers the model based on the given annotation such that the model doesn't introduce similar mistakes. We conducted a two-day study (S2) with 12 experienced CNN engineers. Using DeepFuse, participants made a more accurate and "reasonable" model than the current state-of-the-art. Also, participants found the way DeepFuse guides case-based reasoning can practically improve their current practice. We provide implications for design that explain how future HCI-driven design can move our practice forward to make XAI-driven insights more actionable.

Learning Variational Neighbor Labels for Test-Time Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.04033
  • repo_url: None
  • paper_authors: Sameer Ambekar, Zehao Xiao, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek
  • for: This work strives for domain generalization, where models trained exclusively on source domains are deployed at unseen target domains; the strict separation of source training and target testing is kept, but the unlabeled target data itself is exploited during inference.
  • methods: Three contributions are made: first, probabilistic pseudo-labeling of target samples, formulated as variational inference, to generalize the source-trained model to the target domain at test time; second, variational neighbor labels that incorporate information from neighboring target samples to generate more robust pseudo labels; third, a meta-generalization stage during training that simulates the generalization procedure.
  • results: Experiments on six widely used datasets demonstrate the benefits, abilities, and effectiveness of the proposal.
    Abstract This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed at unseen target domains. We follow the strict separation of source training and target testing but exploit the value of the unlabeled target data itself during inference. We make three contributions. First, we propose probabilistic pseudo-labeling of target samples to generalize the source-trained model to the target domain at test time. We formulate the generalization at test time as a variational inference problem by modeling pseudo labels as distributions to consider the uncertainty during generalization and alleviate the misleading signal of inaccurate pseudo labels. Second, we learn variational neighbor labels that incorporate the information of neighboring target samples to generate more robust pseudo labels. Third, to learn the ability to incorporate more representative target information and generate more precise and robust variational neighbor labels, we introduce a meta-generalization stage during training to simulate the generalization procedure. Experiments on six widely-used datasets demonstrate the benefits, abilities, and effectiveness of our proposal.
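The neighbour-label idea, making a target sample's pseudo label more robust by pooling its neighbours' predictions, can be loosely sketched as k-NN averaging of softmax outputs in feature space. The variational treatment of labels as distributions is omitted.

```python
# Loose sketch of neighbour-smoothed pseudo labels: average the softmax
# predictions of each target sample's k nearest neighbours in feature space.
# The paper models these labels variationally; that machinery is omitted.
import torch
import torch.nn.functional as F

def neighbor_pseudo_labels(feats, logits, k=5):
    """feats: (N, D) target features; logits: (N, C) model outputs -> (N, C)."""
    probs = logits.softmax(dim=-1)
    normed = F.normalize(feats, dim=-1)
    sim = normed @ normed.t()                     # (N, N) cosine similarities
    knn = sim.topk(k + 1, dim=-1).indices[:, 1:]  # drop the self-match
    return probs[knn].mean(dim=1)                 # (N, C) smoothed labels

feats, logits = torch.randn(64, 128), torch.randn(64, 10)
print(neighbor_pseudo_labels(feats, logits).shape)  # torch.Size([64, 10])
```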

On “Indifference” and Backward Induction in Games with Perfect Information

  • paper_url: http://arxiv.org/abs/2307.04029
  • repo_url: None
  • paper_authors: Nimrod Megiddo
  • for: To handle a player's indifference between two distinct outcomes of a game, which cannot be resolved by small perturbations.
  • methods: Resolves ties among rational choices through refinements of the concept of rationality based on the utilities of the other players.
  • results: One such refinement, Tit-for-Tat, resolves indifference by taking the other players' payoffs into account.
Abstract Indifference of a player with respect to two distinct outcomes of a game cannot be handled by small perturbations, because the actual choice may have significant impact on other players and cause them to act in a way that has a significant impact on the indifferent player. It is argued that ties among rational choices can be resolved by refinements of the concept of rationality based on the utilities of other players. One such refinement is the concept of Tit-for-Tat.

Measuring the Success of Diffusion Models at Imitating Human Artists

  • paper_url: http://arxiv.org/abs/2307.04028
  • repo_url: None
  • paper_authors: Stephen Casper, Zifan Guo, Shreya Mogulothu, Zachary Marinov, Chinmay Deshpande, Rui-Jie Yew, Zheng Dai, Dylan Hadfield-Menell
  • for: To study whether modern diffusion models can successfully imitate the work of human artists.
  • methods: Uses Contrastive Language-Image Pretrained (CLIP) encoders to test, via zero-shot classification, whether a model's imitation of a specific artist can be attributed back to that artist.
  • results: When the model is prompted to imitate an artist's work, CLIP reclassifies the imitations back to the original artist with high accuracy, and the imitations show a high degree of statistical similarity to the artist's original work.
    Abstract Modern diffusion models have set the state-of-the-art in AI image generation. Their success is due, in part, to training on Internet-scale data which often includes copyrighted work. This prompts questions about the extent to which these models learn from, imitate, or copy the work of human artists. This work suggests that tying copyright liability to the capabilities of the model may be useful given the evolving ecosystem of generative models. Specifically, much of the legal analysis of copyright and generative systems focuses on the use of protected data for training. As a result, the connections between data, training, and the system are often obscured. In our approach, we consider simple image classification techniques to measure a model's ability to imitate specific artists. Specifically, we use Contrastive Language-Image Pretrained (CLIP) encoders to classify images in a zero-shot fashion. Our process first prompts a model to imitate a specific artist. Then, we test whether CLIP can be used to reclassify the artist (or the artist's work) from the imitation. If these tests match the imitation back to the original artist, this suggests the model can imitate that artist's expression. Our approach is simple and quantitative. Furthermore, it uses standard techniques and does not require additional training. We demonstrate our approach with an audit of Stable Diffusion's capacity to imitate 70 professional digital artists with copyrighted work online. When Stable Diffusion is prompted to imitate an artist from this set, we find that the artist can be identified from the imitation with an average accuracy of 81.0%. Finally, we also show that a sample of the artist's work can be matched to these imitation images with a high degree of statistical reliability. Overall, these results suggest that Stable Diffusion is broadly successful at imitating individual human artists.
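A minimal sketch of the zero-shot reclassification step, assuming the Hugging Face `transformers` CLIP interface; the checkpoint, prompt template, and artist names here are placeholders rather than the authors' exact audit setup:

```python
# Zero-shot artist classification with CLIP, in the spirit of the audit:
# embed candidate text prompts and an imitation image, then pick the
# artist whose prompt matches the image best.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

artists = ["Artist A", "Artist B", "Artist C"]  # placeholder names
prompts = [f"artwork in the style of {a}" for a in artists]

def classify_imitation(image_path: str) -> str:
    """Return the artist whose text prompt best matches the image."""
    image = Image.open(image_path)
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, num_artists)
    probs = logits.softmax(dim=-1).squeeze(0)
    return artists[int(probs.argmax())]

# If classify_imitation(path) returns the artist the diffusion model was
# prompted to imitate, the imitation is "matched back" to that artist.
```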

GP-guided MPPI for Efficient Navigation in Complex Unknown Cluttered Environments

  • paper_url: http://arxiv.org/abs/2307.04019
  • repo_url: None
  • paper_authors: Ihab S. Mohamed, Mahmoud Ali, Lantao Liu
  • For: The paper addresses robotic navigation in unknown, cluttered environments with limited sensing capabilities.
  • Methods: The paper uses local trajectory optimization, specifically Model Predictive Path Integral (MPPI) control, and integrates it with a local perception model based on a Sparse Gaussian Process (SGP) that learns the navigable space surrounding the robot and identifies suggested subgoals, from which the optimal subgoal is recommended to the MPPI planner.
  • Results: The proposed control strategy, called GP-MPPI, is validated through both simulated and real-world experiments on 2D autonomous navigation tasks in complex unknown environments, demonstrating its efficiency and robustness in guiding the robot safely to its goal while avoiding obstacles and escaping entrapment in local minima.
    Abstract Robotic navigation in unknown, cluttered environments with limited sensing capabilities poses significant challenges in robotics. Local trajectory optimization methods, such as Model Predictive Path Integral (MPPI), are a promising solution to this challenge. However, global guidance is required to ensure effective navigation, especially when encountering challenging environmental conditions or navigating beyond the planning horizon. This study presents the GP-MPPI, an online learning-based control strategy that integrates MPPI with a local perception model based on Sparse Gaussian Process (SGP). The key idea is to leverage the learning capability of SGP to construct a variance (uncertainty) surface, which enables the robot to learn about the navigable space surrounding it, identify a set of suggested subgoals, and ultimately recommend the optimal subgoal that minimizes a predefined cost function to the local MPPI planner. Afterward, MPPI computes the optimal control sequence that satisfies the robot and collision avoidance constraints. Such an approach eliminates the necessity of a global map of the environment or an offline training process. We validate the efficiency and robustness of our proposed control strategy through both simulated and real-world experiments of 2D autonomous navigation tasks in complex unknown environments, demonstrating its superiority in guiding the robot safely towards its desired goal while avoiding obstacles and escaping entrapment in local minima. The GPU implementation of GP-MPPI, including the supplementary video, is available at https://github.com/IhabMohamed/GP-MPPI.
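For intuition, below is a minimal MPPI update for a 2D single-integrator robot, showing the sampling and cost-weighted averaging at the core of such planners; the dynamics, cost function, and hyperparameters are illustrative, and the SGP-based subgoal recommendation of GP-MPPI is not modeled here.

```python
# Minimal MPPI step: sample perturbed control sequences, roll out a
# simple dynamics model, and update the nominal controls with a softmax
# weighting over trajectory costs.
import numpy as np

def mppi_step(x0, U, goal, samples=256, lam=1.0, sigma=0.5, dt=0.1):
    """One MPPI update of the nominal control sequence U (horizon x 2)."""
    horizon = U.shape[0]
    noise = np.random.normal(0.0, sigma, size=(samples, horizon, 2))
    costs = np.zeros(samples)
    for k in range(samples):
        x = np.array(x0, dtype=float)
        for t in range(horizon):
            u = U[t] + noise[k, t]
            x = x + u * dt                       # single-integrator model
            costs[k] += np.sum((x - goal) ** 2)  # distance-to-goal cost
    beta = costs.min()                           # for numerical stability
    w = np.exp(-(costs - beta) / lam)
    w /= w.sum()
    # Cost-weighted average of the sampled perturbations.
    return U + np.einsum("k,kte->te", w, noise)

U = np.zeros((20, 2))
for _ in range(50):
    U = mppi_step(x0=[0.0, 0.0], U=U, goal=np.array([1.0, 1.0]))
print(U[0])  # first control to execute; then warm-start and repeat
```

In a receding-horizon loop, only the first control of `U` is executed before the sequence is shifted and re-optimized from the new state.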

Proceedings Nineteenth conference on Theoretical Aspects of Rationality and Knowledge

Abstract The TARK conference (Theoretical Aspects of Rationality and Knowledge) is a conference that aims to bring together researchers from a wide variety of fields, including computer science, artificial intelligence, game theory, decision theory, philosophy, logic, linguistics, and cognitive science. Its goal is to further our understanding of interdisciplinary issues involving reasoning about rationality and knowledge. Previous conferences have been held biennially around the world since 1986, on the initiative of Joe Halpern (Cornell University). Topics of interest include, but are not limited to, semantic models for knowledge, belief, awareness and uncertainty, bounded rationality and resource-bounded reasoning, commonsense epistemic reasoning, epistemic logic, epistemic game theory, knowledge and action, applications of reasoning about knowledge and other mental states, belief revision, computational social choice, algorithmic game theory, and foundations of multi-agent systems. Information about TARK, including conference proceedings, is available at http://www.tark.org/ These proceedings contain the papers that have been accepted for presentation at the Nineteenth Conference on Theoretical Aspects of Rationality and Knowledge (TARK 2023), held between June 28 and June 30, 2023, at the University of Oxford, United Kingdom. The conference website can be found at https://sites.google.com/view/tark-2023