cs.AI - 2023-08-06

AI-GOMS: Large AI-Driven Global Ocean Modeling System

  • paper_url: http://arxiv.org/abs/2308.03152
  • repo_url: None
  • paper_authors: Wei Xiong, Yanfei Xiang, Hao Wu, Shuyi Zhou, Yuze Sun, Muyuan Ma, Xiaomeng Huang
  • for: This paper presents AI-GOMS, a large AI-driven global ocean modeling system, to improve the accuracy and efficiency of global ocean daily prediction.
  • methods: A backbone model with a Fourier-based Masked Autoencoder structure predicts basic ocean variables, complemented by lightweight fine-tuning models incorporating regional downscaling, wave decoding, and biochemistry coupling modules.
  • results: The system achieves the best performance in 30-day prediction of global ocean basic variables across 15 depth layers, excelling on statistical metrics; it also simulates mesoscale eddies in the Kuroshio region and ocean stratification in the tropical Pacific.
    Abstract Ocean modeling is a powerful tool for simulating the physical, chemical, and biological processes of the ocean, which is the foundation for marine science research and operational oceanography. Modern numerical ocean modeling mainly consists of governing equations and numerical algorithms. Nonlinear instability, computational expense, low reusability efficiency and high coupling costs have gradually become the main bottlenecks for the further development of numerical ocean modeling. Recently, artificial intelligence-based modeling in scientific computing has shown revolutionary potential for digital twins and scientific simulations, but the bottlenecks of numerical ocean modeling have not been further solved. Here, we present AI-GOMS, a large AI-driven global ocean modeling system, for accurate and efficient global ocean daily prediction. AI-GOMS consists of a backbone model with the Fourier-based Masked Autoencoder structure for basic ocean variable prediction and lightweight fine-tuning models incorporating regional downscaling, wave decoding, and biochemistry coupling modules. AI-GOMS has achieved the best performance in 30 days of prediction for the global ocean basic variables with 15 depth layers at 1/4° spatial resolution. Beyond the good performance in statistical metrics, AI-GOMS realizes the simulation of mesoscale eddies in the Kuroshio region at 1/12° spatial resolution and ocean stratification in the tropical Pacific Ocean. AI-GOMS provides a new backbone-downstream paradigm for Earth system modeling, which makes the system transferable, scalable and reusable.
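The abstract names two mechanisms: a Fourier-based token mixer and masked-autoencoder style pre-training. Below is a minimal PyTorch sketch of how such pieces typically fit together; the layer sizes, the FNet-style FFT mixing, and the masking ratio are illustrative assumptions, since the paper publishes no code.

```python
# Minimal sketch of a Fourier-based masked autoencoder block (PyTorch).
# Everything here is illustrative only, not the AI-GOMS implementation.
import torch
import torch.nn as nn

class FourierMixerBlock(nn.Module):
    """Mixes tokens with a 2D FFT instead of self-attention (FNet-style)."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, dim))

    def forward(self, x):                       # x: (batch, tokens, dim)
        x = x + torch.fft.fft2(self.norm1(x), dim=(-2, -1)).real
        return x + self.mlp(self.norm2(x))

def random_mask(tokens, mask_ratio=0.75):
    """Keep a random subset of tokens, as in a masked autoencoder."""
    b, n, d = tokens.shape
    keep = int(n * (1 - mask_ratio))
    idx = torch.rand(b, n).argsort(dim=1)[:, :keep]
    return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, d))

x = torch.randn(2, 196, 128)                    # 196 patch tokens, dim 128
visible = random_mask(x)                        # encoder sees ~25% of tokens
block = FourierMixerBlock(128)
print(block(visible).shape)                     # torch.Size([2, 49, 128])
```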

“We care”: Improving Code Mixed Speech Emotion Recognition in Customer-Care Conversations

  • paper_url: http://arxiv.org/abs/2308.03150
  • repo_url: None
  • paper_authors: N V S Abhishek, Pushpak Bhattacharyya
  • for: This paper aims to improve the accuracy of speech emotion recognition (SER) in natural, code-mixed customer-care conversations.
  • methods: The authors build the Natural Speech Emotion Dataset (NSED), a natural code-mixed conversational dataset, and improve SER by incorporating word-level VAD (valence, arousal, dominance) values.
  • results: On NSED, incorporating word-level VAD values improves SER accuracy by 2% over the baseline for negative emotions.
    Abstract Speech Emotion Recognition (SER) is the task of identifying the emotion expressed in a spoken utterance. Emotion recognition is essential in building robust conversational agents in domains such as law, healthcare, education, and customer support. Most of the studies published on SER use datasets created by employing professional actors in a noise-free environment. In natural settings such as a customer care conversation, the audio is often noisy with speakers regularly switching between different languages as they see fit. We have worked in collaboration with a leading unicorn in the Conversational AI sector to develop Natural Speech Emotion Dataset (NSED). NSED is a natural code-mixed speech emotion dataset where each utterance in a conversation is annotated with emotion, sentiment, valence, arousal, and dominance (VAD) values. In this paper, we show that by incorporating word-level VAD value we improve on the task of SER by 2%, for negative emotions, over the baseline value for NSED. High accuracy for negative emotion recognition is essential because customers expressing negative opinions/views need to be pacified with urgency, lest complaints and dissatisfaction snowball and get out of hand. Escalation of negative opinions speedily is crucial for business interests. Our study then can be utilized to develop conversational agents which are more polite and empathetic in such situations.
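The claimed gain comes from fusing word-level VAD values with acoustic features. A minimal sketch of one plausible fusion scheme follows; the GRU encoder, the layer sizes, and the pooled-audio input are assumptions, as NSED and the paper's exact model are not public.

```python
# Illustrative sketch: fuse per-word VAD (valence-arousal-dominance) values
# with a pooled utterance embedding for speech emotion recognition.
import torch
import torch.nn as nn

class VADFusionSER(nn.Module):
    def __init__(self, audio_dim=768, n_emotions=6):
        super().__init__()
        self.vad_encoder = nn.GRU(input_size=3, hidden_size=32, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(audio_dim + 32, 128), nn.ReLU(),
            nn.Linear(128, n_emotions))

    def forward(self, audio_emb, word_vad):
        # audio_emb: (batch, audio_dim) pooled acoustic features
        # word_vad:  (batch, n_words, 3) per-word VAD values
        _, h = self.vad_encoder(word_vad)        # h: (1, batch, 32)
        fused = torch.cat([audio_emb, h.squeeze(0)], dim=-1)
        return self.classifier(fused)

model = VADFusionSER()
logits = model(torch.randn(4, 768), torch.rand(4, 12, 3))
print(logits.shape)                              # torch.Size([4, 6])
```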

Towards socially-competent and culturally-adaptive artificial agents: Expressive order, interactional disruptions and recovery strategies

  • paper_url: http://arxiv.org/abs/2308.03146
  • repo_url: None
  • paper_authors: Chiara Bassetti, Enrico Blanzieri, Stefano Borgo, Sofia Marangon
  • for: This paper aims to equip artificial agents with social skills and knowledge of local social norms for multi-party interaction.
  • methods: The approach distinguishes the expressive and functional orders of interaction and proposes a framework for making artificial agents socially competent across varying multi-party situations.
  • results: The paper shows that classifying functional and social disruptions, and investigating how a robot's architecture can exploit such knowledge, allows artificial agents to display social competence in multi-party interaction.
    Abstract The development of artificial agents for social interaction pushes to enrich robots with social skills and knowledge about (local) social norms. One possibility is to distinguish the expressive and the functional orders during a human-robot interaction. The overarching aim of this work is to set a framework to make the artificial agent socially-competent beyond dyadic interaction (i.e., in varying multi-party social situations) and beyond individual-based user personalization, thereby enlarging the current conception of "culturally-adaptive". The core idea is to provide the artificial agent with the capability to handle different kinds of interactional disruptions, and associated recovery strategies, in microsociology. The result is obtained by classifying functional and social disruptions, and by investigating the requirements a robot's architecture should satisfy to exploit such knowledge. The paper also highlights how this level of competence is achieved by focusing on just three dimensions: (i) social capability, (ii) relational role, and (iii) proximity, leaving aside the further complexity of full-fledged human-human interactions. Without going into technical aspects, End-to-end Data-driven Architectures and Modular Architectures are discussed to evaluate the degree to which they can exploit this new set of social and cultural knowledge. Finally, a list of general requirements for such agents is proposed.

Embedding-based Retrieval with LLM for Effective Agriculture Information Extracting from Unstructured Data

  • paper_url: http://arxiv.org/abs/2308.03107
  • repo_url: None
  • paper_authors: Ruoling Peng, Kang Liu, Po Yang, Zhipeng Yuan, Shunbao Li
  • for: This paper aims to improve pest identification in agriculture by using a domain-agnostic, general pre-trained large language model (LLM) to extract structured data from agricultural documents.
  • methods: The method performs text retrieval and filtering with embedding-based retrieval, then uses LLM question answering to automatically extract entities and attributes from the documents and convert them into structured data.
  • results: Compared with existing methods, the approach achieves consistently better accuracy on the benchmark while maintaining efficiency.
    Abstract Pest identification is a crucial aspect of pest control in agriculture. However, most farmers are not capable of accurately identifying pests in the field, and there is a limited number of structured data sources available for rapid querying. In this work, we explored using a domain-agnostic general pre-trained large language model (LLM) to extract structured data from agricultural documents with minimal or no human intervention. We propose a methodology that involves text retrieval and filtering using embedding-based retrieval, followed by LLM question-answering to automatically extract entities and attributes from the documents, and transform them into structured data. In comparison to existing methods, our approach achieves consistently better accuracy in the benchmark while maintaining efficiency.
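The pipeline is a standard retrieve-then-extract design: embed and rank document chunks against the query, then let an LLM turn the top chunks into structured fields. A hedged sketch follows, with TF-IDF standing in for the paper's unspecified embedding model and a placeholder ask_llm step.

```python
# Sketch of the retrieve-then-extract pipeline: rank chunks, pass the best
# ones to an LLM for structured extraction. Illustrative only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "The fall armyworm attacks maize leaves and whorls.",
    "Aphids are small sap-sucking insects found on wheat.",
    "Crop rotation reduces soilborne disease pressure.",
]
query = "Which pest damages maize?"

vec = TfidfVectorizer().fit(chunks + [query])
sims = cosine_similarity(vec.transform([query]), vec.transform(chunks))[0]
top = [chunks[i] for i in np.argsort(sims)[::-1][:2]]   # retrieval + filtering

def ask_llm(context, question):
    """Placeholder for the LLM question-answering step; in the paper this
    returns entities/attributes as structured data (e.g., JSON)."""
    return {"pest": "fall armyworm", "crop": "maize"}    # illustrative output

print(ask_llm("\n".join(top), query))
```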

Language-based Photo Color Adjustment for Graphic Designs

  • paper_url: http://arxiv.org/abs/2308.03059
  • repo_url: None
  • paper_authors: Zhenwei Wang, Nanxuan Zhao, Gerhard Hancke, Rynson W. H. Lau
  • for: This paper presents a language-based photo recoloring method for graphic designs, helping a photo better convey the design's message and enhance its aesthetics.
  • methods: A language-based model assists both experts and novices: given an instruction, it predicts the source colors and the target regions, then recolors the target regions with the source colors.
  • results: The method recolors photos accurately, can produce multiple plausible results for multi-granularity instructions, and preserves the original image semantics.
    Abstract Adjusting the photo color to associate with some design elements is an essential way for a graphic design to effectively deliver its message and make it aesthetically pleasing. However, existing tools and previous works face a dilemma between the ease of use and level of expressiveness. To this end, we introduce an interactive language-based approach for photo recoloring, which provides an intuitive system that can assist both experts and novices on graphic design. Given a graphic design containing a photo that needs to be recolored, our model can predict the source colors and the target regions, and then recolor the target regions with the source colors based on the given language-based instruction. The multi-granularity of the instruction allows diverse user intentions. The proposed novel task faces several unique challenges, including: 1) color accuracy for recoloring with exactly the same color from the target design element as specified by the user; 2) multi-granularity instructions for parsing instructions correctly to generate a specific result or multiple plausible ones; and 3) locality for recoloring in semantically meaningful local regions to preserve original image semantics. To address these challenges, we propose a model called LangRecol with two main components: the language-based source color prediction module and the semantic-palette-based photo recoloring module. We also introduce an approach for generating a synthetic graphic design dataset with instructions to enable model training. We evaluate our model via extensive experiments and user studies. We also discuss several practical applications, showing the effectiveness and practicality of our approach. Code and data for this paper are at: https://zhenwwang.github.io/langrecol.

Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models

  • paper_url: http://arxiv.org/abs/2308.05176
  • repo_url: None
  • paper_authors: Md. Simul Hasan Talukder, Rejwan Bin Sulaiman
  • for: Predicting epileptic seizures from EEG data.
  • methods: A comparative analysis of five machine learning models for seizure prediction.
  • results: The Extra Trees model performs best, reaching 99.29% accuracy and surpassing previous state-of-the-art results.
    Abstract Epilepsy is a prevalent neurological disorder characterized by recurrent and unpredictable seizures, necessitating accurate prediction for effective management and patient care. Application of machine learning (ML) on electroencephalogram (EEG) recordings, along with its ability to provide valuable insights into brain activity during seizures, is able to make accurate and robust seizure prediction an indispensable component in relevant studies. In this research, we present a comprehensive comparative analysis of five machine learning models - Random Forest (RF), Decision Tree (DT), Extra Trees (ET), Logistic Regression (LR), and Gradient Boosting (GB) - for the prediction of epileptic seizures using EEG data. The dataset underwent meticulous preprocessing, including cleaning, normalization, outlier handling, and oversampling, ensuring data quality and facilitating accurate model training. These preprocessing techniques played a crucial role in enhancing the models' performance. The results of our analysis demonstrate the performance of each model in terms of accuracy. The LR classifier achieved an accuracy of 56.95%, while GB and DT both attained 97.17% accuracy. RF achieved a higher accuracy of 98.99%, while the ET model exhibited the best performance with an accuracy of 99.29%. Our findings reveal that the ET model outperformed not only the other models in the comparative analysis but also surpassed the state-of-the-art results from previous research. The superior performance of the ET model makes it a compelling choice for accurate and robust epileptic seizure prediction using EEG data.
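A sketch of the comparison protocol the abstract describes: normalize, then train and score classifiers including Extra Trees. Synthetic imbalanced data stands in for the (unspecified) EEG feature set, and the paper's cleaning, outlier-handling, and oversampling steps are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data stands in for the EEG features.
X, y = make_classification(n_samples=2000, n_features=30, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_tr)              # normalization step
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Train and score two of the five compared models.
for clf in (RandomForestClassifier(random_state=0),
            ExtraTreesClassifier(random_state=0)):
    acc = accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
    print(type(clf).__name__, f"{acc:.4f}")
```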

Weakly Supervised Multi-Task Representation Learning for Human Activity Analysis Using Wearables

  • paper_url: http://arxiv.org/abs/2308.03805
  • repo_url: None
  • paper_authors: Taoran Sheng, Manfred Huber
  • for: This paper proposes a weakly supervised multi-task siamese network that maps data into multiple representation spaces so multiple tasks can be addressed simultaneously.
  • methods: A multi-output siamese network is used in which each output focuses on a different aspect of the data, positioning representation vectors so that samples with the same semantic meaning in that aspect lie close together in the corresponding space.
  • results: Experiments show the model can address multiple tasks simultaneously and often outperforms single-task supervised methods; further analyses examine the architecture, the interplay between tasks, and the scalability of adding tasks to the framework.
    Abstract Sensor data streams from wearable devices and smart environments are widely studied in areas like human activity recognition (HAR), person identification, or health monitoring. However, most of the previous works in activity and sensor stream analysis have been focusing on one aspect of the data, e.g. only recognizing the type of the activity or only identifying the person who performed the activity. We instead propose an approach that uses a weakly supervised multi-output siamese network that learns to map the data into multiple representation spaces, where each representation space focuses on one aspect of the data. The representation vectors of the data samples are positioned in the space such that the data with the same semantic meaning in that aspect are closely located to each other. Therefore, as demonstrated with a set of experiments, the trained model can provide metrics for clustering data based on multiple aspects, allowing it to address multiple tasks simultaneously and even to outperform single task supervised methods in many situations. In addition, further experiments are presented that in more detail analyze the effect of the architecture and of using multiple tasks within this framework, that investigate the scalability of the model to include additional tasks, and that demonstrate the ability of the framework to combine data for which only partial relationship information with respect to the target tasks is available.
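A minimal sketch of a multi-output siamese setup: one shared encoder, one projection head per aspect (e.g., activity type, person identity), and a contrastive loss per aspect driven by weak pairwise labels. The loss form and dimensions are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiOutputSiamese(nn.Module):
    def __init__(self, in_dim=64, emb_dim=16, n_aspects=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(128, emb_dim)
                                   for _ in range(n_aspects))

    def forward(self, x):
        h = self.encoder(x)
        # One embedding per aspect, each living in its own space.
        return [F.normalize(head(h), dim=-1) for head in self.heads]

def contrastive(za, zb, same, margin=0.5):
    d = (za - zb).pow(2).sum(-1).sqrt()
    return torch.where(same, d.pow(2), (margin - d).clamp(min=0).pow(2)).mean()

net = MultiOutputSiamese()
xa, xb = torch.randn(8, 64), torch.randn(8, 64)
same_activity = torch.randint(0, 2, (8,)).bool()   # weak pairwise labels
same_person = torch.randint(0, 2, (8,)).bool()
outs_a, outs_b = net(xa), net(xb)
loss = (contrastive(outs_a[0], outs_b[0], same_activity)
        + contrastive(outs_a[1], outs_b[1], same_person))
loss.backward()
print(float(loss))
```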

Serverless Federated AUPRC Optimization for Multi-Party Collaborative Imbalanced Data Mining

  • paper_url: http://arxiv.org/abs/2308.03035
  • repo_url: https://github.com/xidongwu/d-auprc
  • paper_authors: Xidong Wu, Zhengmian Hu, Jian Pei, Heng Huang
  • for: This paper targets multi-party collaborative training for imbalanced data tasks, with the goal of maximizing the Area Under the Precision-Recall Curve (AUPRC).
  • methods: It proposes a new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm to directly optimize AUPRC, plus a momentum-based variance-reduction variant (SLATE-M) to improve the convergence rate.
  • results: Experiments show SLATE-M matches the best theoretical convergence rate of single-machine online methods while reducing communication costs in multi-party collaborative training.
    Abstract Multi-party collaborative training, such as distributed learning and federated learning, is used to address the big data challenges. However, traditional multi-party collaborative training algorithms were mainly designed for balanced data mining tasks and are intended to optimize accuracy (e.g., cross-entropy). The data distribution in many real-world applications is skewed and classifiers, which are trained to improve accuracy, perform poorly when applied to imbalanced data tasks since models could be significantly biased toward the primary class. Therefore, the Area Under Precision-Recall Curve (AUPRC) was introduced as an effective metric. Although single-machine AUPRC maximization methods have been designed, multi-party collaborative algorithm has never been studied. The change from the single-machine to the multi-party setting poses critical challenges. To address the above challenge, we study the serverless multi-party collaborative AUPRC maximization problem since serverless multi-party collaborative training can cut down the communications cost by avoiding the server node bottleneck, and reformulate it as a conditional stochastic optimization problem in a serverless multi-party collaborative learning setting and propose a new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm to directly optimize the AUPRC. After that, we use the variance reduction technique and propose ServerLess biAsed sTochastic gradiEnt with Momentum-based variance reduction (SLATE-M) algorithm to improve the convergence rate, which matches the best theoretical convergence result reached by the single-machine online method. To the best of our knowledge, this is the first work to solve the multi-party collaborative AUPRC maximization problem.
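For the AUPRC objective itself, a common differentiable surrogate averages, over positive anchors, a smooth precision proxy at each anchor's score threshold; SLATE optimizes an objective of this compositional form. The sketch below shows only that single-machine surrogate; the serverless decentralized updates and SLATE-M's momentum-based variance reduction are omitted.

```python
import torch

def surrogate(x, margin=1.0):
    # Smooth, non-negative penalty: large when s_j exceeds s_i by the margin.
    return torch.clamp(x + margin, min=0) ** 2

def auprc_surrogate_loss(scores, labels):
    pos = scores[labels == 1]
    # For each positive anchor i: sum of l(s_j - s_i) over positives
    # (numerator) and over all samples (denominator) is a precision proxy.
    num = surrogate(pos.unsqueeze(0) - pos.unsqueeze(1)).sum(dim=1)
    den = surrogate(scores.unsqueeze(0) - pos.unsqueeze(1)).sum(dim=1)
    return -(num / den).mean()                   # maximize the proxy

scores = torch.randn(32, requires_grad=True)
labels = torch.zeros(32, dtype=torch.long)
labels[:5] = 1                                   # ensure positives exist
loss = auprc_surrogate_loss(scores, labels)
loss.backward()
print(float(loss))
```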

Pre-Trained Large Language Models for Industrial Control

  • paper_url: http://arxiv.org/abs/2308.03028
  • repo_url: None
  • paper_authors: Lei Song, Chuheng Zhang, Li Zhao, Jiang Bian
  • for: This paper studies using the foundation model GPT-4 to control a building's HVAC (heating, ventilation, and air conditioning) system.
  • methods: The authors have GPT-4 control the HVAC system by providing, at each step, a short task description, several selected demonstrations, and the current observation, then executing the actions GPT-4 returns.
  • results: GPT-4 controls HVAC with performance comparable to RL methods using few samples and low technical debt, generalizes across scenarios, and the study analyzes how different parts of the text context affect performance.
    Abstract For industrial control, developing high-performance controllers with few samples and low technical debt is appealing. Foundation models, possessing rich prior knowledge obtained from pre-training with Internet-scale corpus, have the potential to be a good controller with proper prompts. In this paper, we take HVAC (Heating, Ventilation, and Air Conditioning) building control as an example to examine the ability of GPT-4 (one of the first-tier foundation models) as the controller. To control HVAC, we wrap the task as a language game by providing text including a short description for the task, several selected demonstrations, and the current observation to GPT-4 on each step and execute the actions responded by GPT-4. We conduct a series of experiments to answer the following questions: 1) How well can GPT-4 control HVAC? 2) How well can GPT-4 generalize to different scenarios for HVAC control? 3) How different parts of the text context affect the performance? In general, we found GPT-4 achieves the performance comparable to RL methods with few samples and low technical debt, indicating the potential of directly applying foundation models to industrial control tasks.
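The control loop the abstract describes is easy to picture as code: each step builds a prompt from a task description, a few demonstrations, and the current observation, then executes whatever action string the model returns. call_gpt4 below is a placeholder, and the prompt layout is an assumption; the paper's actual format is not given here.

```python
# Sketch of wrapping HVAC control as a "language game".
def call_gpt4(prompt: str) -> str:
    return "set_setpoint 22.5"                   # illustrative stub reply

TASK = "You control a building's HVAC. Keep temperature near 22C at low cost."
DEMOS = ["obs: temp=25.1 occupancy=high -> set_setpoint 21.5",
         "obs: temp=19.0 occupancy=low  -> set_setpoint 23.0"]

def control_step(observation: dict) -> str:
    obs_text = " ".join(f"{k}={v}" for k, v in observation.items())
    prompt = "\n".join([TASK, *DEMOS, f"obs: {obs_text} ->"])
    return call_gpt4(prompt)                     # action string to execute

print(control_step({"temp": 24.3, "occupancy": "high"}))
```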

Towards Scene-Text to Scene-Text Translation

  • paper_url: http://arxiv.org/abs/2308.03024
  • repo_url: None
  • paper_authors: Onkar Susladkar, Prajwal Gatti, Anand Mishra
  • for: This work studies "visually" translating scene text from a source language (e.g., English) to a target language (e.g., Chinese).
  • methods: It introduces VTNet, a novel conditional diffusion-based method, to address visual-translation challenges such as interpolating fonts to unseen characters and preserving text size and background.
  • results: Extensive experiments and comparisons show the model surpasses previous state-of-the-art results on conventional scene-text editing benchmarks.
    Abstract In this work, we study the task of "visually" translating scene text from a source language (e.g., English) to a target language (e.g., Chinese). Visual translation involves not just the recognition and translation of scene text but also the generation of the translated image that preserves visual features of the text, such as font, size, and background. There are several challenges associated with this task, such as interpolating font to unseen characters and preserving text size and the background. To address these, we introduce VTNet, a novel conditional diffusion-based method. To train the VTNet, we create a synthetic cross-lingual dataset of 600K samples of scene text images in six popular languages, including English, Hindi, Tamil, Chinese, Bengali, and German. We evaluate the performance of VTNet through extensive experiments and comparisons to related methods. Our model also surpasses the previous state-of-the-art results on the conventional scene-text editing benchmarks. Further, we present rigorous qualitative studies to understand the strengths and shortcomings of our model. Results show that our approach generalizes well to unseen words and fonts. We firmly believe our work can benefit real-world applications, such as text translation using a phone camera and translating educational materials. Code and data will be made publicly available.

SAPIEN: Affective Virtual Agents Powered by Large Language Models

  • paper_url: http://arxiv.org/abs/2308.03022
  • repo_url: None
  • paper_authors: Masum Hasan, Cengiz Ozel, Sammy Potter, Ehsan Hoque
  • for: This demo paper describes SAPIEN, a platform for high-fidelity virtual agents driven by large language models that hold open-domain conversations in 13 languages and display emotions through facial expressions and voice.
  • methods: The platform lets users customize a virtual agent's personality, background, and conversation premise, providing a rich, immersive interaction experience; after a virtual meeting, users can have the conversation analyzed and receive actionable feedback on their communication skills.
  • results: The paper gives an overview of the platform and its application domains, from entertainment to mental health, communication training, language learning, education, and healthcare, and discusses the ethical implications of such realistic virtual agents and the challenges of ensuring responsible use.
    Abstract In this demo paper, we introduce SAPIEN, a platform for high-fidelity virtual agents driven by large language models that can hold open domain conversations with users in 13 different languages, and display emotions through facial expressions and voice. The platform allows users to customize their virtual agent's personality, background, and conversation premise, thus providing a rich, immersive interaction experience. Furthermore, after the virtual meeting, the user can choose to get the conversation analyzed and receive actionable feedback on their communication skills. This paper illustrates an overview of the platform and discusses the various application domains of this technology, ranging from entertainment to mental health, communication training, language learning, education, healthcare, and beyond. Additionally, we consider the ethical implications of such realistic virtual agent representations and the potential challenges in ensuring responsible use.

Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with Differentiable Expected Calibration Error

  • paper_url: http://arxiv.org/abs/2308.03003
  • repo_url: https://github.com/jo-wang/cal-sfda
  • paper_authors: Zixin Wang, Yadan Luo, Zhi Chen, Sen Wang, Zi Huang
  • for: Source-free domain-adaptive semantic segmentation, which avoids source-domain data leakage.
  • methods: A self-training approach pseudo-labels high-confidence regions and adapts the model to target data, guided by a differentiable estimate of the expected calibration error (ECE).
  • results: The method surpasses the previous state of the art by up to 5.25% mIoU while enabling fair model selection.
    Abstract The prevalence of domain adaptive semantic segmentation has prompted concerns regarding source domain data leakage, where private information from the source domain could inadvertently be exposed in the target domain. To circumvent the requirement for source data, source-free domain adaptation has emerged as a viable solution that leverages self-training methods to pseudo-label high-confidence regions and adapt the model to the target data. However, the confidence scores obtained are often highly biased due to over-confidence and class-imbalance issues, which render both model selection and optimization problematic. In this paper, we propose a novel calibration-guided source-free domain adaptive semantic segmentation (Cal-SFDA) framework. The core idea is to estimate the expected calibration error (ECE) from the segmentation predictions, serving as a strong indicator of the model's generalization capability to the unlabeled target domain. The estimated ECE scores, in turn, assist the model training and fair selection in both source training and target adaptation stages. During model pre-training on the source domain, we ensure the differentiability of the ECE objective by leveraging the LogSumExp trick and using ECE scores to select the best source checkpoints for adaptation. To enable ECE estimation on the target domain without requiring labels, we train a value net for ECE estimation and apply statistic warm-up on its BatchNorm layers for stability. The estimated ECE scores assist in determining the reliability of prediction and enable class-balanced pseudo-labeling by positively guiding the adaptation progress and inhibiting potential error accumulation. Extensive experiments on two widely-used synthetic-to-real transfer tasks show that the proposed approach surpasses previous state-of-the-art by up to 5.25% of mIoU with fair model selection criteria.
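The quantity at the center of Cal-SFDA is the expected calibration error. Its plain binned definition is sketched below; the paper's contribution is a differentiable variant (via the LogSumExp trick) usable as a training and selection signal, which this sketch does not reproduce.

```python
# Binned ECE: average |accuracy - confidence| over confidence bins,
# weighted by bin size. Definition only, not the paper's differentiable form.
import torch

def expected_calibration_error(probs, labels, n_bins=10):
    conf, pred = probs.max(dim=-1)               # per-sample confidence
    correct = (pred == labels).float()
    ece = torch.zeros(())
    edges = torch.linspace(0, 1, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.float().mean() * (correct[in_bin].mean()
                                            - conf[in_bin].mean()).abs()
    return ece

probs = torch.softmax(torch.randn(1000, 19), dim=-1)  # e.g., 19 classes
labels = torch.randint(0, 19, (1000,))
print(float(expected_calibration_error(probs, labels)))
```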

Spanish Pre-trained BERT Model and Evaluation Data

  • paper_url: http://arxiv.org/abs/2308.02976
  • repo_url: https://github.com/dccuchile/beto
  • paper_authors: José Cañete, Gabriel Chaperon, Rodrigo Fuentes, Jou-Hui Ho, Hojin Kang, Jorge Pérez
  • for: This work provides a BERT-based Spanish language model and gathers Spanish-language evaluation tasks into a single repository, in the spirit of the GLUE benchmark, for model training and evaluation.
  • methods: The model follows BERT pre-training, using exclusively Spanish data.
  • results: After fine-tuning, the Spanish model outperforms other multilingual BERT-based models on most tasks and even sets a new state of the art on some of them.
    Abstract The Spanish language is one of the top 5 spoken languages in the world. Nevertheless, finding resources to train or evaluate Spanish language models is not an easy task. In this paper we help bridge this gap by presenting a BERT-based language model pre-trained exclusively on Spanish data. As a second contribution, we also compiled several tasks specifically for the Spanish language in a single repository much in the spirit of the GLUE benchmark. By fine-tuning our pre-trained Spanish model, we obtain better results compared to other BERT-based models pre-trained on multilingual corpora for most of the tasks, even achieving a new state-of-the-art on some of them. We have publicly released our model, the pre-training data, and the compilation of the Spanish benchmarks.
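Since the model is publicly released, using it follows the usual Hugging Face pattern. A short sketch assuming the dccuchile model id (verify it against the linked repo before relying on it):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "dccuchile/bert-base-spanish-wwm-cased"   # published BETO checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

batch = tok(["Me encantó la película.", "Fue una pérdida de tiempo."],
            padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits               # fine-tune on labels for real use
print(logits.shape)                              # torch.Size([2, 2])
```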

Understanding User Intent Modeling for Conversational Recommender Systems: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2308.08496
  • repo_url: None
  • paper_authors: Siamak Farshidi, Kiyan Rezaee, Sara Mazaheri, Amir Hossein Rahimi, Ali Dadashzadeh, Morteza Ziabakhsh, Sadegh Eskandari, Slinger Jansen
  • for: This study helps researchers select the most suitable user intent models for their conversational recommender systems, improving personalized responses.
  • methods: A systematic literature review gathers data on models typically employed in conversational recommender systems; from this data, a decision model is developed, and two case studies evaluate its effectiveness.
  • results: The analysis covers 59 distinct models and 74 commonly used features, offering insights into potential model combinations, trends in model selection, quality concerns, evaluation measures, and frequently used training and evaluation datasets.
    Abstract Context: User intent modeling is a crucial process in Natural Language Processing that aims to identify the underlying purpose behind a user's request, enabling personalized responses. With a vast array of approaches introduced in the literature (over 13,000 papers in the last decade), understanding the related concepts and commonly used models in AI-based systems is essential. Method: We conducted a systematic literature review to gather data on models typically employed in designing conversational recommender systems. From the collected data, we developed a decision model to assist researchers in selecting the most suitable models for their systems. Additionally, we performed two case studies to evaluate the effectiveness of our proposed decision model. Results: Our study analyzed 59 distinct models and identified 74 commonly used features. We provided insights into potential model combinations, trends in model selection, quality concerns, evaluation measures, and frequently used datasets for training and evaluating these models. Contribution: Our study contributes practical insights and a comprehensive understanding of user intent modeling, empowering the development of more effective and personalized conversational recommender systems. With the Conversational Recommender System, researchers can perform a more systematic and efficient assessment of fitting intent modeling frameworks.

Science and engineering for what? A large-scale analysis of students’ projects in science fairs

  • paper_url: http://arxiv.org/abs/2308.02962
  • repo_url: None
  • paper_authors: Adelmo Eloy, Thomas Palmeira Ferraz, Fellip Silva Alves, Roseli de Deus Lopes
  • for: The paper is written to analyze the themes and topics that have driven students’ inquiry and design in science fair projects over the past 20 years in Brazil.
  • methods: The paper uses topic modeling to identify the main topics being explored in the projects, and to examine variations over time, region, and school setting.
  • results: The analysis found a broad range of topics being explored, with significant variations over time, region, and school setting, and the authors argue that the results and proposed methodology can support further research and inform instruction and resource design for open inquiry experiences in different settings.
    Abstract Science and Engineering fairs offer K-12 students opportunities to engage with authentic STEM practices. Particularly, students are given the chance to experience authentic and open inquiry processes, by defining which themes, questions and approaches will guide their scientific endeavors. In this study, we analyzed data from over 5,000 projects presented at a nationwide science fair in Brazil over the past 20 years using topic modeling to identify the main topics that have driven students' inquiry and design. Our analysis identified a broad range of topics being explored, with significant variations over time, region, and school setting. We argue those results and proposed methodology can not only support further research in the context of science fairs, but also inform instruction and design of contexts-specific resources to support students in open inquiry experiences in different settings.
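A toy version of the topic-modeling step on a handful of English stand-in documents; the paper's actual corpus, language, and model choice are not reproduced here.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["solar panel efficiency for school rooftops",
        "water quality sensors in local rivers",
        "low cost solar water heater design",
        "machine learning to classify river pollution"]

vec = CountVectorizer(stop_words="english").fit(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(vec.transform(docs))

terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-4:]]   # highest-weight terms
    print(f"topic {k}:", ", ".join(top))
```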

Data Fusion for Multi-Task Learning of Building Extraction and Height Estimation

  • paper_url: http://arxiv.org/abs/2308.02960
  • repo_url: https://github.com/SaadAhmedJamal/IEEE_DFC2023
  • paper_authors: Saad Ahmed Jamal, Arioluwa Aribisala
  • for: This paper addresses the urban reconstruction problem with a multi-task learning approach to building extraction and height estimation from optical and radar satellite imagery.
  • methods: Building extraction and height estimation are implemented as individual tasks under constraints between them.
  • results: The designed experiments significantly improved the baseline results for both building extraction and height estimation.
    Abstract In accordance with the urban reconstruction problem proposed by the DFC23 Track 2 Contest, this paper attempts a multitask-learning method of building extraction and height estimation using both optical and radar satellite imagery. Contrary to the initial goal of multitask learning which could potentially give a superior solution by reusing features and forming implicit constraints between multiple tasks, this paper reports the individual implementation of the building extraction and height estimation under constraints. The baseline results for the building extraction and the height estimation significantly increased after designed experiments.

A criterion for Artificial General Intelligence: hypothetic-deductive reasoning, tested on ChatGPT

  • paper_url: http://arxiv.org/abs/2308.02950
  • repo_url: None
  • paper_authors: Louis Vervoort, Vitaliy Mizyakov, Anastasia Ugleva
  • for: This paper asks whether a particular reasoning skill, hypothetic-deductive reasoning, is what an AI must master to qualify as an artificial general intelligence (AGI).
  • methods: The authors propose simple tests for hypothetic-deductive and causal reasoning and apply them to ChatGPT.
  • results: The study finds that ChatGPT currently has a limited capacity for this type of reasoning as soon as the problems become somewhat complex, but argues that an AI capable of such reasoning across a sufficiently wide range of contexts would qualify as an AGI.
    Abstract We argue that a key reasoning skill that any advanced AI, say GPT-4, should master in order to qualify as 'thinking machine', or AGI, is hypothetic-deductive reasoning. Problem-solving or question-answering can quite generally be construed as involving two steps: hypothesizing that a certain set of hypotheses T applies to the problem or question at hand, and deducing the solution or answer from T - hence the term hypothetic-deductive reasoning. An elementary proxy of hypothetic-deductive reasoning is causal reasoning. We propose simple tests for both types of reasoning, and apply them to ChatGPT. Our study shows that, at present, the chatbot has a limited capacity for either type of reasoning, as soon as the problems considered are somewhat complex. However, we submit that if an AI would be capable of this type of reasoning in a sufficiently wide range of contexts, it would be an AGI.

dPASP: A Comprehensive Differentiable Probabilistic Answer Set Programming Environment For Neurosymbolic Learning and Reasoning

  • paper_url: http://arxiv.org/abs/2308.02944
  • repo_url: None
  • paper_authors: Renato Lui Geh, Jonas Gonçalves, Igor Cataneo Silveira, Denis Deratani Mauá, Fabio Gagliardi Cozman
  • for: This paper proposes a new framework called dPASP for differentiable neuro-symbolic reasoning, which allows for the combination of discrete probabilistic models, neural predicates, logic constraints, and interval-valued probabilistic choices.
  • methods: The paper discusses several semantics for probabilistic logic programs that can express nondeterministic, contradictory, incomplete, and/or statistical knowledge, and how gradient-based learning can be performed with neural predicates and probabilistic choices under these semantics.
  • results: The paper describes an implemented package that supports inference and learning in the language, along with several example programs, and demonstrates that the package allows for end-to-end training of rather sophisticated models and loss functions with minimal user knowledge of deep learning system’s inner workings.
    Abstract We present dPASP, a novel declarative probabilistic logic programming framework for differentiable neuro-symbolic reasoning. The framework allows for the specification of discrete probabilistic models with neural predicates, logic constraints and interval-valued probabilistic choices, thus supporting models that combine low-level perception (images, texts, etc), common-sense reasoning, and (vague) statistical knowledge. To support all such features, we discuss the several semantics for probabilistic logic programs that can express nondeterministic, contradictory, incomplete and/or statistical knowledge. We also discuss how gradient-based learning can be performed with neural predicates and probabilistic choices under selected semantics. We then describe an implemented package that supports inference and learning in the language, along with several example programs. The package requires minimal user knowledge of deep learning system's inner workings, while allowing end-to-end training of rather sophisticated models and loss functions.

Dark-Skin Individuals Are at More Risk on the Street: Unmasking Fairness Issues of Autonomous Driving Systems

  • paper_url: http://arxiv.org/abs/2308.02935
  • repo_url: None
  • paper_authors: Xinyue Li, Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Ying Zhang, Xuanzhe Liu
  • for: This paper conducts fairness testing of automated pedestrian detection in autonomous driving systems, a crucial but under-explored issue.
  • methods: Eight widely studied pedestrian detectors are evaluated across demographic groups on large-scale real-world datasets, with extensive annotations: 8,311 images carrying 16,070 gender labels, 20,115 age labels, and 3,513 skin tone labels.
  • results: The findings reveal significant fairness issues related to age and skin tone: detection accuracy is 19.67% higher for adults than for children, and there is a 7.52% accuracy disparity between light-skin and dark-skin individuals, while gender shows only a 1.1% difference. Bias against dark-skin pedestrians increases significantly under low contrast and low brightness. Code, data, and results are publicly released.
    Abstract This paper conducts fairness testing on automated pedestrian detection, a crucial but under-explored issue in autonomous driving systems. We evaluate eight widely-studied pedestrian detectors across demographic groups on large-scale real-world datasets. To enable thorough fairness testing, we provide extensive annotations for the datasets, resulting in 8,311 images with 16,070 gender labels, 20,115 age labels, and 3,513 skin tone labels. Our findings reveal significant fairness issues related to age and skin tone. The detection accuracy for adults is 19.67% higher compared to children, and there is a 7.52% accuracy disparity between light-skin and dark-skin individuals. Gender, however, shows only a 1.1% difference in detection accuracy. Additionally, we investigate common scenarios explored in the literature on autonomous driving testing, and find that the bias towards dark-skin pedestrians increases significantly under scenarios of low contrast and low brightness. We publicly release the code, data, and results to support future research on fairness in autonomous driving.
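The disparity numbers in the abstract reduce to per-group detection accuracy and the gap between groups. A tiny sketch with made-up records:

```python
from collections import defaultdict

detections = [  # (group_label, detected_correctly) -- illustrative records
    ("adult", True), ("adult", True), ("child", False),
    ("light_skin", True), ("dark_skin", False), ("child", True),
]
hits, totals = defaultdict(int), defaultdict(int)
for group, ok in detections:
    totals[group] += 1
    hits[group] += ok

acc = {g: hits[g] / totals[g] for g in totals}
print(acc)
print("adult-child gap:", acc["adult"] - acc["child"])
```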

ConvFormer: Revisiting Transformer for Sequential User Modeling

  • paper_url: http://arxiv.org/abs/2308.02925
  • repo_url: None
  • paper_authors: Hao Wang, Jianxun Lian, Mingqi Wu, Haoxuan Li, Jiajun Fan, Wanyue Xu, Chaozhuo Li, Xing Xie
  • for: This paper aims to improve sequential user modeling in personalized recommender systems by better understanding user behavior sequences.
  • methods: It revisits Transformer-like architectures, analyzes the effectiveness of the item-to-item mechanism for sequential user modeling, and distills three essential criteria from experimental analysis.
  • results: The resulting model achieves state-of-the-art results on four public datasets, confirming the usefulness of the three proposed criteria.
    Abstract Sequential user modeling, a critical task in personalized recommender systems, focuses on predicting the next item a user would prefer, requiring a deep understanding of user behavior sequences. Despite the remarkable success of Transformer-based models across various domains, their full potential in comprehending user behavior remains untapped. In this paper, we re-examine Transformer-like architectures aiming to advance state-of-the-art performance. We start by revisiting the core building blocks of Transformer-based methods, analyzing the effectiveness of the item-to-item mechanism within the context of sequential user modeling. After conducting a thorough experimental analysis, we identify three essential criteria for devising efficient sequential user models, which we hope will serve as practical guidelines to inspire and shape future designs. Following this, we introduce ConvFormer, a simple but powerful modification to the Transformer architecture that meets these criteria, yielding state-of-the-art results. Additionally, we present an acceleration technique to minimize the complexity associated with processing extremely long sequences. Experiments on four public datasets showcase ConvFormer's superiority and confirm the validity of our proposed criteria.
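The abstract does not spell out ConvFormer's internals, but its name and the item-to-item analysis suggest a Transformer-style block with a sequence convolution in place of self-attention. The sketch below shows that generic idea only; the depthwise convolution, kernel size, and dimensions are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class ConvMixerBlock(nn.Module):
    def __init__(self, dim=64, kernel_size=5):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Depthwise conv mixes each item embedding with its neighbors.
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):                        # x: (batch, seq_len, dim)
        x = x + self.conv(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.ffn(self.norm2(x))

seq = torch.randn(4, 50, 64)                     # 50 interacted items
print(ConvMixerBlock()(seq).shape)               # torch.Size([4, 50, 64])
```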

Adversarial Erasing with Pruned Elements: Towards Better Graph Lottery Ticket

  • paper_url: http://arxiv.org/abs/2308.02916
  • repo_url: https://github.com/wangyuwen0627/ace-glt
  • paper_authors: Yuwen Wang, Shunyu Liu, Kaixuan Chen, Tongtian Zhu, Ji Qiao, Mengjie Shi, Yuanyu Wan, Mingli Song
  • for: Reducing the computational cost of deep graph neural networks on large input graphs while preserving original performance.
  • methods: The paper proposes an adversarial complementary erasing (ACE) framework that mines valuable information from pruned edges and weights during pruning to build a stronger graph lottery ticket (GLT).
  • results: Experiments show the resulting ACE-GLT outperforms existing methods for searching GLTs across diverse tasks.
    Abstract Graph Lottery Ticket (GLT), a combination of core subgraph and sparse subnetwork, has been proposed to mitigate the computational cost of deep Graph Neural Networks (GNNs) on large input graphs while preserving original performance. However, the winning GLTs in existing studies are obtained by applying iterative magnitude-based pruning (IMP) without re-evaluating and re-considering the pruned information, which disregards the dynamic changes in the significance of edges/weights during graph/model structure pruning, and thus limits the appeal of the winning tickets. In this paper, we formulate a conjecture, i.e., that there exists overlooked valuable information in the pruned graph connections and model parameters which can be re-grouped into GLT to enhance the final performance. Specifically, we propose an adversarial complementary erasing (ACE) framework to explore the valuable information from the pruned components, thereby developing a more powerful GLT, referred to as the ACE-GLT. The main idea is to mine valuable information from pruned edges/weights after each round of IMP, and employ the ACE technique to refine the GLT processing. Finally, experimental results demonstrate that our ACE-GLT outperforms existing methods for searching GLT in diverse tasks. Our code will be made publicly available.
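The core intuition (re-examine what iterative magnitude pruning threw away and restore the still-valuable part) can be sketched in a few lines. Gradient magnitude stands in here for the paper's adversarial erasing criterion, which is more involved.

```python
import torch

weights = torch.randn(10, 10)
grads = torch.randn(10, 10)                      # stand-in for saliency

# IMP step: keep only the top 20% of weights by magnitude.
mask = (weights.abs() >= weights.abs().quantile(0.8)).float()
pruned = (mask == 0)

# "Mine" the pruned pool: restore the few pruned entries with largest
# saliency, re-grouping them into the ticket.
saliency = grads.abs() * pruned
restore = torch.topk(saliency.flatten(), k=5).indices
mask.view(-1)[restore] = 1.0

print("kept:", int(mask.sum()), "of", mask.numel())
```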

Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

  • paper_url: http://arxiv.org/abs/2308.04455
  • repo_url: https://github.com/deep-privacy/SA-toolkit
  • paper_authors: Pierre Champion
  • for: Mitigating the privacy risks posed by collected and stored speech data.
  • methods: The thesis evaluates each component of a voice-conversion-based anonymization system separately and proposes quantization-based transformations to improve anonymization.
  • results: It proposes new anonymization methods, together with a new attack method to invert the anonymization of existing systems.
    Abstract The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. While data collection allows for the development of efficient tools powering most speech services, it also poses serious privacy issues for users as centralized storage makes private personal speech data vulnerable to cyber threats. With the increasing use of voice-based digital assistants like Amazon's Alexa, Google's Home, and Apple's Siri, and with the increasing ease with which personal speech data can be collected, the risk of malicious use of voice-cloning and speaker/gender/pathological/etc. recognition has increased. This thesis proposes solutions for anonymizing speech and evaluating the degree of the anonymization. In this work, anonymization refers to making personal speech data unlinkable to an identity while maintaining the usefulness (utility) of the speech signal (e.g., access to linguistic content). We start by identifying several challenges that evaluation protocols need to consider to evaluate the degree of privacy protection properly. We clarify how anonymization systems must be configured for evaluation purposes and highlight that many practical deployment configurations do not permit privacy evaluation. Furthermore, we study and examine the most common voice conversion-based anonymization system and identify its weak points before suggesting new methods to overcome some limitations. We isolate all components of the anonymization system to evaluate the degree of speaker PPI associated with each of them. Then, we propose several transformation methods for each component to reduce as much as possible speaker PPI while maintaining utility. We promote anonymization algorithms based on quantization-based transformation as an alternative to the most-used and well-known noise-based approach. Finally, we endeavor a new attack method to invert anonymization.
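A minimal sketch of quantization-based transformation: cluster the continuous feature space and snap each frame vector to its centroid, discarding fine-grained speaker detail. The codebook size and the k-means choice are assumptions, not the thesis's exact design.

```python
import numpy as np
from sklearn.cluster import KMeans

features = np.random.randn(500, 16)              # stand-in for frame features
codebook = KMeans(n_clusters=32, n_init=10, random_state=0).fit(features)
quantized = codebook.cluster_centers_[codebook.predict(features)]

print(features.shape, "->", quantized.shape)     # same shape, coarser content
```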

Anomaly Detection in Global Financial Markets with Graph Neural Networks and Nonextensive Entropy

  • paper_url: http://arxiv.org/abs/2308.02914
  • repo_url: None
  • paper_authors: Kleyton da Costa
  • for: This study investigates the ability to detect anomalies in global financial markets, particularly in systems with many variables.
  • methods: Graph neural networks (GNNs) are used to detect anomalies under an uncertainty scenario measured by a nonextensive entropy.
  • results: The main findings show that the complex structure of highly correlated assets weakens during a crisis, and that the number of anomalies differs statistically across nonextensive entropy parameters before, during, and after a crisis.
    Abstract Anomaly detection is a challenging task, particularly in systems with many variables. Anomalies are outliers that statistically differ from the analyzed data and can arise from rare events, malfunctions, or system misuse. This study investigated the ability to detect anomalies in global financial markets through Graph Neural Networks (GNN) considering an uncertainty scenario measured by a nonextensive entropy. The main findings show that the complex structure of highly correlated assets decreases in a crisis, and the number of anomalies is statistically different for nonextensive entropy parameters considering before, during, and after crisis.
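The uncertainty measure referenced is the nonextensive (Tsallis) entropy, S_q = (1 - Σ_i p_i^q) / (q - 1), which recovers the Shannon entropy as q → 1. A small sketch of computing it across q values, as one would for windows before, during, and after a crisis:

```python
import numpy as np

def tsallis_entropy(p, q):
    p = np.asarray(p, dtype=float)
    p = p[p > 0] / p.sum()                       # normalize, drop zeros
    if np.isclose(q, 1.0):
        return -(p * np.log(p)).sum()            # Shannon limit as q -> 1
    return (1.0 - (p ** q).sum()) / (q - 1.0)

probs = [0.5, 0.3, 0.2]                          # illustrative distribution
for q in (0.5, 1.0, 1.5):
    print(f"q={q}: S_q={tsallis_entropy(probs, q):.4f}")
```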