cs.AI - 2023-11-12

Creating a Discipline-specific Commons for Infectious Disease Epidemiology

  • paper_url: http://arxiv.org/abs/2311.06989
  • repo_url: None
  • paper_authors: Michael M. Wagner, William Hogan, John Levander, Adam Darr, Matt Diller, Max Sibilla, Alexander T. Loiacono, Terence Sperringer, Jr., Shawn T. Brown
  • for: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software but also receive help in improving their interoperability.
  • methods: Represents the collection in OWL 2 and uses logical queries to infer potentially interoperable combinations of software and datasets, along with statistics about the collection; objects are represented in DATS 2.2 and a software metadata schema of the authors' own design.
  • results: Interoperability was limited by the lack of standardization of software input/output formats; nevertheless, logical search over a triple store of named data formats still identified many potentially interoperable combinations of software and datasets.
    Abstract Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potentially interoperable combinations of software and datasets, as well as statistics about the FAIRness of the collection. We represented the objects in DATS 2.2 and a software metadata schema of our own design. We used these representations as the basis for the Content, Search, FAIR-o-meter, and Workflow pages that constitute the MIDAS Digital Commons. Results: Interoperability was limited by lack of standardization of input and output formats of software. When formats existed, they were human-readable specifications (22/24; 92%); only 3 formats (13%) had machine-readable specifications. Nevertheless, logical search of a triple store based on named data formats was able to identify scores of potentially interoperable combinations of software and datasets. Discussion: We improved the findability and availability of a sample of software and datasets and developed metrics for assessing interoperability. The barriers to interoperability included poor documentation of software input/output formats and little attention to standardization of most types of data in this field. Conclusion: Centralizing and formalizing the representation of digital objects within a commons promotes FAIRness, enables its measurement over time and the identification of potentially interoperable combinations of data and software.
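To make the querying step concrete, here is a minimal sketch (not the authors' code) of how a logical query over a triple store could surface software/dataset pairs that share a named data format; the ontology file name and property IRIs are hypothetical placeholders.

```python
# Minimal sketch: find potentially interoperable software/dataset pairs
# by matching a software tool's declared input format against a dataset's
# data format. The file and the ex: properties are hypothetical.
from rdflib import Graph

g = Graph()
g.parse("midas_commons.owl")  # hypothetical export of the OWL 2 representation

QUERY = """
PREFIX ex: <http://example.org/midas#>
SELECT ?software ?dataset ?format WHERE {
    ?software ex:hasInputFormat ?format .
    ?dataset  ex:hasDataFormat  ?format .
}
"""

for software, dataset, fmt in g.query(QUERY):
    print(f"{software} can potentially read {dataset} (shared format: {fmt})")
```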

Assessing the Interpretability of Programmatic Policies with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.06979
  • repo_url: None
  • paper_authors: Zahra Bashir, Michael Bowling, Levi H. S. Lelis
  • for: To assess the interpretability of programmatic policies.
  • methods: Uses large language models (LLMs): one LLM explains a program in natural language, a second LLM reconstructs the program from that explanation, and the metric measures the behavioral similarity between the reconstructed and original programs.
  • results: The metric reliably assesses the interpretability of programmatic policies and can be used to compare different policies, consistently ranking less interpretable programs lower and more interpretable ones higher.
    Abstract Although the synthesis of programs encoding policies often carries the promise of interpretability, systematic evaluations to assess the interpretability of these policies were never performed, likely because of the complexity of such an evaluation. In this paper, we introduce a novel metric that uses large-language models (LLM) to assess the interpretability of programmatic policies. For our metric, an LLM is given both a program and a description of its associated programming language. The LLM then formulates a natural language explanation of the program. This explanation is subsequently fed into a second LLM, which tries to reconstruct the program from the natural language explanation. Our metric measures the behavioral similarity between the reconstructed program and the original. We validate our approach using obfuscated programs that are used to solve classic programming problems. We also assess our metric with programmatic policies synthesized for playing a real-time strategy game, comparing the interpretability scores of programmatic policies synthesized by an existing system to lightly obfuscated versions of the same programs. Our LLM-based interpretability score consistently ranks less interpretable programs lower and more interpretable ones higher. These findings suggest that our metric could serve as a reliable and inexpensive tool for evaluating the interpretability of programmatic policies.
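A minimal sketch of the two-stage scoring protocol described in the abstract; `llm_explain`, `llm_reconstruct`, and `run` are hypothetical stand-ins for the LLM calls and program execution, not the authors' API.

```python
# Minimal sketch of the LLM-based interpretability metric (assumptions noted above).
def behavioral_similarity(original, reconstructed, test_inputs):
    # Fraction of test inputs on which the two programs behave identically.
    matches = sum(run(original, x) == run(reconstructed, x) for x in test_inputs)
    return matches / len(test_inputs)

def interpretability_score(program: str, language_description: str, test_inputs) -> float:
    # Stage 1: an LLM reads the program plus a description of its language
    # and produces a natural-language explanation.
    explanation = llm_explain(program, language_description)
    # Stage 2: a second LLM tries to reconstruct the program from the
    # explanation alone.
    reconstructed = llm_reconstruct(explanation, language_description)
    # The score is the behavioral similarity between original and reconstruction.
    return behavioral_similarity(program, reconstructed, test_inputs)
```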

Physics-Informed Data Denoising for Real-Life Sensing Systems

  • paper_url: http://arxiv.org/abs/2311.06968
  • repo_url: None
  • paper_authors: Xiyuan Zhang, Xiaohan Fu, Diyan Teng, Chengyu Dong, Keerthivasan Vijayakumar, Jiayun Zhang, Ranak Roy Chowdhury, Junsheng Han, Dezhi Hong, Rashmi Kulkarni, Jingbo Shang, Rajesh Gupta
  • for: To propose a physics-informed denoising model that improves the quality of sensor data in real-world applications.
  • methods: A physics-informed denoising model that exploits the physical relationships between different sensor measurements to guide the denoising process, without requiring clean ground-truth data.
  • results: Experiments across domains such as inertial navigation, CO2 monitoring, and HVAC control show strong performance; the model denoises in real time (4 ms for a 1 s sequence) and closely matches high-precision, high-cost alternatives.
    Abstract Sensors measuring real-life physical processes are ubiquitous in today's interconnected world. These sensors inherently bear noise that often adversely affects performance and reliability of the systems they support. Classic filtering-based approaches introduce strong assumptions on the time or frequency characteristics of sensory measurements, while learning-based denoising approaches typically rely on using ground truth clean data to train a denoising model, which is often challenging or prohibitive to obtain for many real-world applications. We observe that in many scenarios, the relationships between different sensor measurements (e.g., location and acceleration) are analytically described by laws of physics (e.g., second-order differential equation). By incorporating such physics constraints, we can guide the denoising process to improve even in the absence of ground truth data. In light of this, we design a physics-informed denoising model that leverages the inherent algebraic relationships between different measurements governed by the underlying physics. By obviating the need for ground truth clean data, our method offers a practical denoising solution for real-world applications. We conducted experiments in various domains, including inertial navigation, CO2 monitoring, and HVAC control, and achieved state-of-the-art performance compared with existing denoising methods. Our method can denoise data in real time (4ms for a sequence of 1s) for low-cost noisy sensors and produces results that closely align with those from high-precision, high-cost alternatives, leading to an efficient, cost-effective approach for more accurate sensor-based systems.
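A minimal sketch of the core idea under the stated assumption that position and acceleration channels are linked by a = d^2 x / dt^2; the exact loss used in the paper may differ.

```python
# Minimal sketch: penalise denoised signals that violate a known physical
# relationship, so no clean ground-truth data is needed.
import torch

def physics_consistency_loss(pos_denoised: torch.Tensor,
                             acc_denoised: torch.Tensor,
                             dt: float) -> torch.Tensor:
    """pos_denoised, acc_denoised: tensors of shape (batch, time)."""
    # Central finite-difference estimate of the second derivative of position.
    acc_from_pos = (pos_denoised[:, 2:] - 2 * pos_denoised[:, 1:-1]
                    + pos_denoised[:, :-2]) / (dt ** 2)
    # Penalise disagreement with the (denoised) acceleration channel.
    return torch.mean((acc_from_pos - acc_denoised[:, 1:-1]) ** 2)

# Training would combine this with a reconstruction term on the noisy inputs,
# e.g. loss = recon_loss + lambda_phys * physics_consistency_loss(...).
```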

Towards probabilistic Weather Forecasting with Conditioned Spatio-Temporal Normalizing Flows

  • paper_url: http://arxiv.org/abs/2311.06958
  • repo_url: None
  • paper_authors: Christina Winkler
  • for: Modeling multimodal spatial distributions and capturing temporal correlations for stochastic spatio-temporal prediction.
  • methods: Uses conditional normalizing flows for stochastic spatio-temporal modeling.
  • results: Experiments on daily temperature and hourly geopotential map prediction from ERA5 show that the method captures spatio-temporal correlations and extrapolates well beyond the training time horizon.
    Abstract Generative normalizing flows are able to model multimodal spatial distributions, and they have been shown to model temporal correlations successfully as well. These models provide several benefits over other types of generative models due to their training stability, invertibility and efficiency in sampling and inference. This makes them a suitable candidate for stochastic spatio-temporal prediction problems, which are omnipresent in many fields of sciences, such as earth sciences, astrophysics or molecular sciences. In this paper, we present conditional normalizing flows for stochastic spatio-temporal modelling. The method is evaluated on the task of daily temperature and hourly geopotential map prediction from ERA5 datasets. Experiments show that our method is able to capture spatio-temporal correlations and extrapolates well beyond the time horizon used during training.
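A minimal sketch of a conditional affine coupling layer, the standard building block behind conditional normalizing flows; the conditioning input (e.g., an encoding of past frames) and layer sizes are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch of a conditional affine coupling layer (assumptions noted above).
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    def __init__(self, dim: int, cond_dim: int, hidden: int = 128):
        super().__init__()
        self.half = dim // 2
        # The scale/shift network sees one half of the input plus the
        # conditioning vector (e.g., an encoding of past weather frames).
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        scale, shift = self.net(torch.cat([x1, cond], dim=-1)).chunk(2, dim=-1)
        scale = torch.tanh(scale)          # keep the transform well-behaved
        y2 = x2 * torch.exp(scale) + shift
        log_det = scale.sum(dim=-1)        # log-determinant of the Jacobian
        return torch.cat([x1, y2], dim=-1), log_det
```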

FLASH-RL: Federated Learning Addressing System and Static Heterogeneity using Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.06917
  • repo_url: https://github.com/Sofianebouaziz1/FLASH-RL
  • paper_authors: Sofiane Bouaziz, Hadjer Benmeziane, Youcef Imine, Leila Hamdad, Smail Niar, Hamza Ouarnoughi
  • for: Addresses client selection in Federated Learning (FL) under both system and static heterogeneity, balancing training efficiency and model performance.
  • methods: Proposes FLASH-RL, a framework that uses Double Deep Q-Learning (DDQL) to handle system and static heterogeneity, together with a new reputation-based utility function that evaluates client contributions from their current and past performance, and an adapted DDQL algorithm to speed up learning.
  • results: Experiments show that FLASH-RL achieves a balanced trade-off between model performance and end-to-end latency compared with existing solutions: on MNIST and CIFAR-10 it reduces latency by up to 24.83% versus FedAVG and 24.67% versus FAVOR, and reduces training rounds by up to 60.44% versus FedAVG. In fall detection on the MobiAct dataset it outperforms FedAVG by up to 2.82% in model performance, reduces latency by up to 34.75%, and reaches the target performance with up to 45.32% fewer training rounds.
    Abstract Federated Learning (FL) has emerged as a promising Machine Learning paradigm, enabling multiple users to collaboratively train a shared model while preserving their local data. To minimize computing and communication costs associated with parameter transfer, it is common practice in FL to select a subset of clients in each training round. This selection must consider both system and static heterogeneity. Therefore, we propose FLASH-RL, a framework that utilizes Double Deep QLearning (DDQL) to address both system and static heterogeneity in FL. FLASH-RL introduces a new reputation-based utility function to evaluate client contributions based on their current and past performances. Additionally, an adapted DDQL algorithm is proposed to expedite the learning process. Experimental results on MNIST and CIFAR-10 datasets have shown FLASH-RL's effectiveness in achieving a balanced trade-off between model performance and end-to-end latency against existing solutions. Indeed, FLASH-RL reduces latency by up to 24.83% compared to FedAVG and 24.67% compared to FAVOR. It also reduces the training rounds by up to 60.44% compared to FedAVG and +76% compared to FAVOR. In fall detection using the MobiAct dataset, FLASH-RL outperforms FedAVG by up to 2.82% in model's performance and reduces latency by up to 34.75%. Additionally, FLASH-RL achieves the target performance faster, with up to a 45.32% reduction in training rounds compared to FedAVG.
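The abstract does not give the exact reputation formula, so the sketch below uses an exponential moving average purely as an illustration of a reputation-based utility that blends a client's current and past performance.

```python
# Minimal, illustrative sketch of a reputation-style utility for client selection
# (the decay rule is an assumption, not FLASH-RL's exact formula).
def update_reputation(reputation: dict, client_id: int,
                      current_score: float, decay: float = 0.8) -> float:
    """Blend a client's past reputation with its current-round contribution
    (e.g., local accuracy gain or loss reduction)."""
    past = reputation.get(client_id, 0.0)
    reputation[client_id] = decay * past + (1.0 - decay) * current_score
    return reputation[client_id]

# The DDQL agent would then use these utilities (plus system information such
# as latency) as part of its state/reward when choosing the next client subset.
```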

TSViT: A Time Series Vision Transformer for Fault Diagnosis

  • paper_url: http://arxiv.org/abs/2311.06916
  • repo_url: None
  • paper_authors: Shouhua Zhang, Jiehan Zhou, Xue Ma, Chenglin Wen, Susanna Pirttikangas, Chen Yu, Weishan Zhang, Chunsheng Yang
  • for: Fault diagnosis in mechanical systems, addressing the limitations of Convolutional Neural Networks (CNNs) in capturing the temporal features of vibration signals.
  • methods: The Time Series Vision Transformer (TSViT) combines a convolutional layer that segments vibration signals and captures local features with a transformer encoder that learns long-term temporal information.
  • results: Experiments on two distinct datasets show that TSViT reaches average accuracies of 100% and 99.99% on the two test sets, outperforming other methods in performance, computational complexity, and parameter count.
    Abstract Traditional fault diagnosis methods using Convolutional Neural Networks (CNNs) face limitations in capturing temporal features (i.e., the variation of vibration signals over time). To address this issue, this paper introduces a novel model, the Time Series Vision Transformer (TSViT), specifically designed for fault diagnosis. On one hand, TSViT model integrates a convolutional layer to segment vibration signals and capture local features. On the other hand, it employs a transformer encoder to learn long-term temporal information. The experimental results with other methods on two distinct datasets validate the effectiveness and generalizability of TSViT with a comparative analysis of its hyperparameters' impact on model performance, computational complexity, and overall parameter quantity. TSViT reaches average accuracies of 100% and 99.99% on two test sets, correspondingly.
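A minimal PyTorch sketch of the architecture as described in the abstract: a convolution segments the vibration signal into local-feature tokens, and a transformer encoder models long-term temporal information; layer sizes are illustrative assumptions.

```python
# Minimal sketch of the TSViT idea (not the authors' implementation).
import torch
import torch.nn as nn

class TSViTSketch(nn.Module):
    def __init__(self, patch: int = 64, d_model: int = 128, n_classes: int = 10):
        super().__init__()
        # Non-overlapping 1-D convolution: one token per signal segment.
        self.tokenize = nn.Conv1d(1, d_model, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                 # x: (batch, signal_length)
        tokens = self.tokenize(x.unsqueeze(1)).transpose(1, 2)  # (B, T, d_model)
        encoded = self.encoder(tokens)
        return self.head(encoded.mean(dim=1))   # average-pool tokens, classify
```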

Flames: Benchmarking Value Alignment of Chinese Large Language Models

  • paper_url: http://arxiv.org/abs/2311.06899
  • repo_url: None
  • paper_authors: Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, Yaru Wang, Zeyang Zhou, Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin
  • for: To evaluate whether large language models (LLMs) are aligned with human values.
  • methods: Proposes Flames, the first highly adversarial value-alignment benchmark for Chinese LLMs, consisting of 2,251 manually crafted prompts, ~18.7K model responses with fine-grained annotations, and a specified scorer.
  • results: Adversarial prompts crafted under the Flames framework were used to probe mainstream LLMs; all evaluated LLMs perform relatively poorly, particularly on the safety and fairness dimensions. Claude is the best-performing model overall, yet its harmless rate is only 63.08%, while GPT-4 scores just 39.04%. Flames is substantially harder than existing benchmarks, setting a new challenge for contemporary LLMs and highlighting the need for further alignment.
    Abstract The widespread adoption of large language models (LLMs) across various regions underscores the urgent need to evaluate their alignment with human values. Current benchmarks, however, fall short of effectively uncovering safety vulnerabilities in LLMs. Despite numerous models achieving high scores and 'topping the chart' in these evaluations, there is still a significant gap in LLMs' deeper alignment with human values and achieving genuine harmlessness. To this end, this paper proposes the first highly adversarial benchmark named Flames, consisting of 2,251 manually crafted prompts, ~18.7K model responses with fine-grained annotations, and a specified scorer. Our framework encompasses both common harmlessness principles, such as fairness, safety, legality, and data protection, and a unique morality dimension that integrates specific Chinese values such as harmony. Based on the framework, we carefully design adversarial prompts that incorporate complex scenarios and jailbreaking methods, mostly with implicit malice. By prompting mainstream LLMs with such adversarially constructed prompts, we obtain model responses, which are then rigorously annotated for evaluation. Our findings indicate that all the evaluated LLMs demonstrate relatively poor performance on Flames, particularly in the safety and fairness dimensions. Claude emerges as the best-performing model overall, but with its harmless rate being only 63.08% while GPT-4 only scores 39.04%. The complexity of Flames has far exceeded existing benchmarks, setting a new challenge for contemporary LLMs and highlighting the need for further alignment of LLMs. To efficiently evaluate new models on the benchmark, we develop a specified scorer capable of scoring LLMs across multiple dimensions, achieving an accuracy of 77.4%. The Flames Benchmark is publicly available on https://github.com/AIFlames/Flames.

Anticipating User Needs: Insights from Design Fiction on Conversational Agents for Computational Thinking

  • paper_url: http://arxiv.org/abs/2311.06887
  • repo_url: None
  • paper_authors: Jacob Penney, João Felipe Pimentel, Igor Steinmacher, Marco A. Gerosa
  • for: To inform the design of a conversational agent that helps students learn computational thinking and programming.
  • methods: Uses design fiction sessions with instructors to elicit their needs and expectations for a conversational agent supported by generative AI (genAI).
  • results: Instructors envisioned a genAI-based conversational agent that guides students stepwise through exercises and adapts its guidance to each student's educational background, skills and deficits, and learning preferences.
    Abstract Computational thinking, and by extension, computer programming, is notoriously challenging to learn. Conversational agents and generative artificial intelligence (genAI) have the potential to facilitate this learning process by offering personalized guidance, interactive learning experiences, and code generation. However, current genAI-based chatbots focus on professional developers and may not adequately consider educational needs. Involving educators in conceiving educational tools is critical for ensuring usefulness and usability. We enlisted \numParticipants{} instructors to engage in design fiction sessions in which we elicited abilities such a conversational agent supported by genAI should display. Participants envisioned a conversational agent that guides students stepwise through exercises, tuning its method of guidance with an awareness of the educational background, skills and deficits, and learning preferences. The insights obtained in this paper can guide future implementations of tutoring conversational agents oriented toward teaching computational thinking and computer programming.

Modeling User Viewing Flow using Large Language Models for Article Recommendation

  • paper_url: http://arxiv.org/abs/2311.07619
  • repo_url: None
  • paper_authors: Zhenghao Liu, Zulong Chen, Moufeng Zhang, Shaoyang Duan, Hong Wen, Liangyue Li, Nan Li, Yu Gu, Ge Yu
  • for: Proposes SINGLE, an article recommendation method that models both the user's constant preference and instant interest from clicked articles.
  • methods: A user constant viewing flow model summarizes the user's general interest, using Large Language Models (LLMs) to capture constant preferences (e.g., skills and positions) from previously clicked articles; a user instant viewing flow model builds interactions between the user's click history and candidate articles.
  • results: In experiments on the Alibaba Technology Association (ATA) website, SINGLE achieves a 2.4% improvement over baseline models in an online A/B test; further analysis shows that it builds a more tailored recommendation system by mimicking users' different article-viewing behaviors and recommending more appropriate and diverse articles.
    Abstract This paper proposes the User Viewing Flow Modeling (SINGLE) method for the article recommendation task, which models the user constant preference and instant interest from user-clicked articles. Specifically, we employ a user constant viewing flow modeling method to summarize the user's general interest to recommend articles. We utilize Large Language Models (LLMs) to capture constant user preferences from previously clicked articles, such as skills and positions. Then we design the user instant viewing flow modeling method to build interactions between user-clicked article history and candidate articles. It attentively reads the representations of user-clicked articles and aims to learn the user's different interest views to match the candidate article. Our experimental results on the Alibaba Technology Association (ATA) website show the advantage of SINGLE, which achieves 2.4% improvements over previous baseline models in the online A/B test. Our further analyses illustrate that SINGLE has the ability to build a more tailored recommendation system by mimicking different article viewing behaviors of users and recommending more appropriate and diverse articles to match user interests.
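A minimal sketch of the instant viewing flow idea: the candidate article attends over the user's clicked-article representations to form an interest view, which is then matched against the candidate; shapes are illustrative and the LLM-based constant-preference branch is omitted.

```python
# Minimal sketch of candidate-conditioned attention over clicked articles.
import torch
import torch.nn.functional as F

def score_candidate(clicked: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
    """clicked: (num_clicked, d) article embeddings; candidate: (d,)."""
    attn = F.softmax(clicked @ candidate, dim=0)   # attention weights over history
    user_view = attn @ clicked                     # interest view for this candidate
    return torch.dot(user_view, candidate)         # relevance score
```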

Understanding Practices around Computational News Discovery Tools in the Domain of Science Journalism

  • paper_url: http://arxiv.org/abs/2311.06864
  • repo_url: None
  • paper_authors: Sachita Nishal, Jasmine Sinchai, Nicholas Diakopoulos
  • for: To help science journalists find newsworthy leads more efficiently as workloads grow, resources shrink, and the scientific publishing ecosystem expands.
  • methods: Prototypes three computational information subsidies in an interactive news-discovery tool, used as a probe to understand how such tools may offer utility and shape the practices of professional science journalists.
  • results: Computational tools can help science journalists find leads more quickly, but their design must account for journalists' agency, context, and responsibilities, including contextual, personal, and collaborative notions of newsworthiness.
    Abstract Science and technology journalists today face challenges in finding newsworthy leads due to increased workloads, reduced resources, and expanding scientific publishing ecosystems. Given this context, we explore computational methods to aid these journalists' news discovery in terms of time-efficiency and agency. In particular, we prototyped three computational information subsidies into an interactive tool that we used as a probe to better understand how such a tool may offer utility or more broadly shape the practices of professional science journalists. Our findings highlight central considerations around science journalists' agency, context, and responsibilities that such tools can influence and could account for in design. Based on this, we suggest design opportunities for greater and longer-term user agency; incorporating contextual, personal and collaborative notions of newsworthiness; and leveraging flexible interfaces and generative models. Overall, our findings contribute a richer view of the sociotechnical system around computational news discovery tools, and suggest ways to improve such tools to better support the practices of science journalists.

Can Large Language Models Augment a Biomedical Ontology with missing Concepts and Relations?

  • paper_url: http://arxiv.org/abs/2311.06858
  • repo_url: https://github.com/minitour/ontology-extension-chatgpt
  • paper_authors: Antonio Zaitoun, Tomer Sagi, Szymon Wilk, Mor Peleg
  • for: To extend an existing ontology with missing concepts and relations.
  • methods: Uses a large language model (LLM) with conversational interactions to semi-automatically extend the ontology.
  • results: Applied to clinical practice guidelines (CPGs), the approach detects new medical concepts and relations that are not present in SNOMED-CT, with promising preliminary results against a manually generated gold standard.
    Abstract Ontologies play a crucial role in organizing and representing knowledge. However, even current ontologies do not encompass all relevant concepts and relationships. Here, we explore the potential of large language models (LLM) to expand an existing ontology in a semi-automated fashion. We demonstrate our approach on the biomedical ontology SNOMED-CT utilizing semantic relation types from the widely used UMLS semantic network. We propose a method that uses conversational interactions with an LLM to analyze clinical practice guidelines (CPGs) and detect the relationships among the new medical concepts that are not present in SNOMED-CT. Our initial experimentation with the conversational prompts yielded promising preliminary results given a manually generated gold standard, directing our future potential improvements.
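A minimal sketch of the semi-automated loop the abstract describes: ask an LLM to extract candidate concept-relation triples from a CPG snippet, then flag concepts missing from the ontology. `llm_chat` is a hypothetical chat call and `snomed_terms` a hypothetical set of existing SNOMED-CT concept labels.

```python
# Minimal sketch of LLM-assisted ontology extension (assumptions noted above).
def propose_extensions(cpg_text: str, snomed_terms: set, relation_types: list):
    prompt = (
        "Extract (concept, relation, concept) triples from the text below. "
        f"Use only these UMLS relation types: {', '.join(relation_types)}.\n\n"
        + cpg_text
    )
    triples = llm_chat(prompt)          # hypothetical: returns a list of 3-tuples
    # Keep triples that introduce at least one concept not yet in the ontology.
    return [(h, r, t) for h, r, t in triples
            if h not in snomed_terms or t not in snomed_terms]
```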

On learning spatial sequences with the movement of attention

  • paper_url: http://arxiv.org/abs/2311.06856
  • repo_url: None
  • paper_authors: Viacheslav M. Osaulenko
  • for: To explain how humans can recognize different movements over the skin from only prior visual experience of them, and more generally how spatial sequences are represented invariantly to scale, rotation, and translation across modalities.
  • methods: Rethinks the mathematical representation of spatial sequences, argues against the minimum description length principle, proposes representing spatial sequences at multiple levels of abstraction, and offers two hypotheses for how these abstractions are formed.
  • results: Argues that movements of attention are central to human cognition and that the redundancy added by multi-level representations is necessary for recognition and generalization; these lessons should inform new learning algorithms.
    Abstract In this paper we start with a simple question, how is it possible that humans can recognize different movements over skin with only a prior visual experience of them? Or in general, what is the representation of spatial sequences that are invariant to scale, rotation, and translation across different modalities? To answer, we rethink the mathematical representation of spatial sequences, argue against the minimum description length principle, and focus on the movements of attention. We advance the idea that spatial sequences must be represented on different levels of abstraction, this adds redundancy but is necessary for recognition and generalization. To address the open question of how these abstractions are formed we propose two hypotheses: the first invites exploring selectionism learning, instead of finding parameters in some models; the second proposes to find new data structures, not neural network architectures, to efficiently store and operate over redundant features to be further selected. Movements of attention are central to human cognition and lessons should be applied to new better learning algorithms.

Distribution Re-weighting and Voting Paradoxes

  • paper_url: http://arxiv.org/abs/2311.06840
  • repo_url: None
  • paper_authors: Bijan Mazaheri, Siddharth Jain, Matthew Cook, Jehoshua Bruck
  • for: Studies a specific type of distribution shift called domain expertise, in which training is limited to a subset of all possible labels.
  • methods: Analyzes the standard approach to distribution shift, data re-weighting, as well as standard adjustments for causal inference.
  • results: Shows that these standard strategies can produce paradoxical disagreements among differing domain expertise, and proves that these paradoxes exactly mimic paradoxes that arise among sets of voter preferences.
    Abstract We explore a specific type of distribution shift called domain expertise, in which training is limited to a subset of all possible labels. This setting is common among specialized human experts, or specific focused studies. We show how the standard approach to distribution shift, which involves re-weighting data, can result in paradoxical disagreements among differing domain expertise. We also demonstrate how standard adjustments for causal inference lead to the same paradox. We prove that the characteristics of these paradoxes exactly mimic another set of paradoxes which arise among sets of voter preferences.

Open-Set Graph Anomaly Detection via Normal Structure Regularisation

  • paper_url: http://arxiv.org/abs/2311.06835
  • repo_url: None
  • paper_authors: Qizhou Wang, Guansong Pang, Mahsa Salehi, Wray Buntine, Christopher Leckie
  • for: Open-set graph anomaly detection (GAD), which aims to detect anomalous nodes in a graph using a small number of labelled normal and anomaly nodes (the "seen" anomalies).
  • methods: Proposes normal structure regularisation (NSReg), which leverages the normal graph structure embedded in the labelled nodes to avoid over-emphasising the seen anomalies, a weakness of existing methods that hurts detection of unseen anomalies.
  • results: Extensive experiments on real-world datasets demonstrate the superiority of NSReg for open-set GAD.
    Abstract This paper considers an under-explored Graph Anomaly Detection (GAD) task, namely open-set GAD, which aims to detect anomalous nodes using a small number of labelled training normal and anomaly nodes (known as seen anomalies) that cannot illustrate all possible inference-time abnormalities. The task has attracted growing attention due to the availability of anomaly prior knowledge from the label information that can help to substantially reduce detection errors. However, current methods tend to over-emphasise fitting the seen anomalies, leading to a weak generalisation ability to detect unseen anomalies, i.e., those that are not illustrated by the labelled anomaly nodes. Further, they were introduced to handle Euclidean data, failing to effectively capture important non-Euclidean features for GAD. In this work, we propose a novel open-set GAD approach, namely normal structure regularisation (NSReg), to leverage the rich normal graph structure embedded in the labelled nodes to tackle the aforementioned two issues. In particular, NSReg trains an anomaly-discriminative supervised graph anomaly detector, with a plug-and-play regularisation term to enforce compact, semantically-rich representations of normal nodes. To this end, the regularisation is designed to differentiate various types of normal nodes, including labelled normal nodes that are connected in their local neighbourhood, and those that are not connected. By doing so, it helps incorporate strong normality into the supervised anomaly detector learning, mitigating their overfitting to the seen anomalies. Extensive empirical results on real-world datasets demonstrate the superiority of our proposed NSReg for open-set GAD.

Fairness Hacking: The Malicious Practice of Shrouding Unfairness in Algorithms

  • paper_url: http://arxiv.org/abs/2311.06826
  • repo_url: None
  • paper_authors: Kristof Meding, Thilo Hagendorff
  • for: To examine how algorithmic unfairness can be shrouded through a practice the authors call "fairness hacking".
  • methods: Introduces two categories of fairness hacking, by analogy to p-hacking: intra-metric fairness hacking (misusing a particular metric by adding or removing sensitive attributes from the analysis) and inter-metric fairness hacking (searching for a specific fair metric given the attributes).
  • results: Demonstrates both types of fairness hacking on real datasets and discusses the harm they can cause to end-users who rely on learning algorithms and to the broader community interested in fair AI practices.
    Abstract Fairness in machine learning (ML) is an ever-growing field of research due to the manifold potential for harm from algorithmic discrimination. To prevent such harm, a large body of literature develops new approaches to quantify fairness. Here, we investigate how one can divert the quantification of fairness by describing a practice we call "fairness hacking" for the purpose of shrouding unfairness in algorithms. This impacts end-users who rely on learning algorithms, as well as the broader community interested in fair AI practices. We introduce two different categories of fairness hacking in reference to the established concept of p-hacking. The first category, intra-metric fairness hacking, describes the misuse of a particular metric by adding or removing sensitive attributes from the analysis. In this context, countermeasures that have been developed to prevent or reduce p-hacking can be applied to similarly prevent or reduce fairness hacking. The second category of fairness hacking is inter-metric fairness hacking. Inter-metric fairness hacking is the search for a specific fair metric with given attributes. We argue that countermeasures to prevent or reduce inter-metric fairness hacking are still in their infancy. Finally, we demonstrate both types of fairness hacking using real datasets. Our paper intends to serve as a guidance for discussions within the fair ML community to prevent or reduce the misuse of fairness metrics, and thus reduce overall harm from ML applications.
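A small illustration (with made-up data) of intra-metric fairness hacking: the same predictions can look fair or unfair depending on which sensitive attributes are included in the analysis, so reporting only the favourable grouping shrouds the unfairness.

```python
# Toy demonstration: demographic parity gap under different attribute choices.
import pandas as pd

df = pd.DataFrame({
    "pred":   [1, 1, 0, 0, 1, 0, 1, 0],
    "gender": ["m", "m", "m", "m", "f", "f", "f", "f"],
    "age":    ["young", "old", "young", "old", "young", "old", "young", "old"],
})

def demographic_parity_gap(data, attribute):
    rates = data.groupby(attribute)["pred"].mean()   # positive rate per group
    return rates.max() - rates.min()

print(demographic_parity_gap(df, "gender"))            # 0.0  -> looks "fair"
print(demographic_parity_gap(df, "age"))               # 0.5  -> large gap
print(demographic_parity_gap(df, ["gender", "age"]))   # 1.0  -> intersectional gap
```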

Training A Multi-stage Deep Classifier with Feedback Signals

  • paper_url: http://arxiv.org/abs/2311.06823
  • repo_url: None
  • paper_authors: Chao Xu, Yu Yang, Rongzhao Wang, Guan Wang, Bojia Lin
  • for: A training framework for multi-stage classifiers (MSCs), focusing on the common two-stage binary classification setting.
  • methods: Proposes Feedback Training, which trains the stage classifiers in the reverse of their actual working order and uses the later-stage classifier to guide the training of the initial-stage classifier via a sample-weighting method.
  • results: Experiments show the efficacy of the proposed framework and its clear superiority in few-shot training scenarios, making it well suited to real-world applications.
    Abstract Multi-Stage Classifier (MSC) - several classifiers working sequentially in an arranged order and classification decision is partially made at each step - is widely used in industrial applications for various resource limitation reasons. The classifiers of a multi-stage process are usually Neural Network (NN) models trained independently or in their inference order without considering the signals from the latter stages. Aimed at two-stage binary classification process, the most common type of MSC, we propose a novel training framework, named Feedback Training. The classifiers are trained in an order reverse to their actual working order, and the classifier at the later stage is used to guide the training of initial-stage classifier via a sample weighting method. We experimentally show the efficacy of our proposed approach, and its great superiority under the scenario of few-shot training.
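A minimal sketch of the feedback idea: after training the later-stage classifier, its per-sample confidence weights the loss of the initial-stage classifier. The exact weighting rule is an assumption, not necessarily the paper's.

```python
# Minimal sketch of confidence-weighted loss for the first-stage classifier.
import torch
import torch.nn.functional as F

def weighted_stage1_loss(stage1_logits: torch.Tensor,
                         labels: torch.Tensor,
                         stage2_confidence: torch.Tensor) -> torch.Tensor:
    """stage2_confidence: per-sample confidence of the (already trained)
    later-stage classifier, used as the feedback signal."""
    per_sample = F.cross_entropy(stage1_logits, labels, reduction="none")
    weights = stage2_confidence / stage2_confidence.sum()   # normalise weights
    return (weights * per_sample).sum()
```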

Dual-Branch Reconstruction Network for Industrial Anomaly Detection with RGB-D Data

  • paper_url: http://arxiv.org/abs/2311.06797
  • repo_url: None
  • paper_authors: Chenyang Bi, Yueyang Li, Haichi Luo
  • for: Unsupervised industrial anomaly detection, with a focus on multi-modal detection from 3D point clouds and RGB images.
  • methods: Proposes a lightweight dual-branch reconstruction network (DBRN) that takes RGB-D input (depth maps instead of point clouds, removing the need for cross-modal alignment) and learns the decision boundary between normal and abnormal examples; an importance scoring module helps fuse the features of the two modalities into a comprehensive discriminative result.
  • results: DBRN achieves 92.8% AUROC with high inference efficiency on the MVTec 3D-AD dataset, without large pre-trained models or memory banks.
    Abstract Unsupervised anomaly detection methods are at the forefront of industrial anomaly detection efforts and have made notable progress. Previous work primarily used 2D information as input, but multi-modal industrial anomaly detection based on 3D point clouds and RGB images is just beginning to emerge. The regular approach involves utilizing large pre-trained models for feature representation and storing them in memory banks. However, the above methods require a longer inference time and higher memory usage, which cannot meet the real-time requirements of the industry. To overcome these issues, we propose a lightweight dual-branch reconstruction network(DBRN) based on RGB-D input, learning the decision boundary between normal and abnormal examples. The requirement for alignment between the two modalities is eliminated by using depth maps instead of point cloud input. Furthermore, we introduce an importance scoring module in the discriminative network to assist in fusing features from these two modalities, thereby obtaining a comprehensive discriminative result. DBRN achieves 92.8% AUROC with high inference efficiency on the MVTec 3D-AD dataset without large pre-trained models and memory banks.

Alleviating Behavior Data Imbalance for Multi-Behavior Graph Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2311.06777
  • repo_url: None
  • paper_authors: Yijie Zhang, Yuanchen Bei, Shiqi Yang, Hao Chen, Zhiqing Li, Lijia Chen, Feiran Huang
  • for: To improve multi-behavior recommendation by alleviating the imbalance between behavior types in the data.
  • methods: Uses a multi-task learning framework for multi-behavior graph collaborative filtering, and improves representation learning for sparse behaviors by leveraging representations learned from behaviors with abundant data.
  • results: Experiments on two widely used multi-behavior datasets demonstrate the effectiveness of IMGCF.
    Abstract Graph collaborative filtering, which learns user and item representations through message propagation over the user-item interaction graph, has been shown to effectively enhance recommendation performance. However, most current graph collaborative filtering models mainly construct the interaction graph on a single behavior domain (e.g. click), even though users exhibit various types of behaviors on real-world platforms, including actions like click, cart, and purchase. Furthermore, due to variations in user engagement, there exists an imbalance in the scale of different types of behaviors. For instance, users may click and view multiple items but only make selective purchases from a small subset of them. How to alleviate the behavior imbalance problem and utilize information from the multiple behavior graphs concurrently to improve the target behavior conversion (e.g. purchase) remains underexplored. To this end, we propose IMGCF, a simple but effective model to alleviate behavior data imbalance for multi-behavior graph collaborative filtering. Specifically, IMGCF utilizes a multi-task learning framework for collaborative filtering on multi-behavior graphs. Then, to mitigate the data imbalance issue, IMGCF improves representation learning on the sparse behavior by leveraging representations learned from the behavior domain with abundant data volumes. Experiments on two widely-used multi-behavior datasets demonstrate the effectiveness of IMGCF.

ChatAnything: Facetime Chat with LLM-Enhanced Personas

  • paper_url: http://arxiv.org/abs/2311.06772
  • repo_url: https://github.com/zhoudaquan/ChatAnything
  • paper_authors: Yilin Zhao, Xinbin Yuan, Shanghua Gao, Zhijie Lin, Qibin Hou, Jiashi Feng, Daquan Zhou
  • for: This technical report targets generating anthropomorphized personas for LLM-based characters in an online manner, including visual appearance, personality, and tones, using only text descriptions.
  • methods: The authors propose two novel concepts, the mixture of voices (MoV) and the mixture of diffusers (MoD), for diverse voice and appearance generation. They also utilize the in-context learning capability of LLMs for personality generation and incorporate pixel-level guidance to infuse human face landmarks during the image generation phase.
  • results: The proposed framework, ChatAnything, can animate anything with anthropomorphic personas using just a few text inputs. The authors also report a significant increase in the face landmark detection rate, from 57.0% to 92.5%, allowing for automatic face animation based on generated speech content.
    Abstract In this technical report, we target generating anthropomorphized personas for LLM-based characters in an online manner, including visual appearance, personality and tones, with only text descriptions. To achieve this, we first leverage the in-context learning capability of LLMs for personality generation by carefully designing a set of system prompts. We then propose two novel concepts: the mixture of voices (MoV) and the mixture of diffusers (MoD) for diverse voice and appearance generation. For MoV, we utilize the text-to-speech (TTS) algorithms with a variety of pre-defined tones and select the most matching one based on the user-provided text description automatically. For MoD, we combine the recent popular text-to-image generation techniques and talking head algorithms to streamline the process of generating talking objects. We termed the whole framework as ChatAnything. With it, users could be able to animate anything with any personas that are anthropomorphic using just a few text inputs. However, we have observed that the anthropomorphic objects produced by current generative models are often undetectable by pre-trained face landmark detectors, leading to failure of the face motion generation, even if these faces possess human-like appearances because those images are nearly seen during the training (e.g., OOD samples). To address this issue, we incorporate pixel-level guidance to infuse human face landmarks during the image generation phase. To benchmark these metrics, we have built an evaluation dataset. Based on it, we verify that the detection rate of the face landmark is significantly increased from 57.0% to 92.5% thus allowing automatic face animation based on generated speech content. The code and more results can be found at https://chatanything.github.io/.

Learning Globally Optimized Language Structure via Adversarial Training

  • paper_url: http://arxiv.org/abs/2311.06771
  • repo_url: None
  • paper_authors: Xuwang Yin
  • for: To improve text generation by learning globally optimized language structure.
  • methods: Trains an energy-based model (EBM) with an adversarial strategy: an iterative adversarial attack perturbs text from the autoregressive model to generate negative samples, enabling the EBM to suppress spurious modes outside the support of the data distribution.
  • results: Experiments on an arithmetic sequence generation task show that the approach substantially improves the quality of generated sequences compared to prior methods. Key contributions: (1) an adversarial attack strategy tailored to text for generating negative samples, circumventing MCMC limitations; (2) an adversarial training algorithm for EBMs that leverages these attacks; (3) empirical validation on a sequence generation task.
    Abstract Recent work has explored integrating autoregressive language models with energy-based models (EBMs) to enhance text generation capabilities. However, learning effective EBMs for text is challenged by the discrete nature of language. This work proposes an adversarial training strategy to address limitations in prior efforts. Specifically, an iterative adversarial attack algorithm is presented to generate negative samples for training the EBM by perturbing text from the autoregressive model. This aims to enable the EBM to suppress spurious modes outside the support of the data distribution. Experiments on an arithmetic sequence generation task demonstrate that the proposed adversarial training approach can substantially enhance the quality of generated sequences compared to prior methods. The results highlight the promise of adversarial techniques to improve discrete EBM training. Key contributions include: (1) an adversarial attack strategy tailored to text to generate negative samples, circumventing MCMC limitations; (2) an adversarial training algorithm for EBMs leveraging these attacks; (3) empirical validation of performance improvements on a sequence generation task.
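A minimal sketch of the adversarial negative-sampling loop the abstract describes, under the assumption that negatives are found by greedy token substitutions that the current EBM still scores favourably; `energy` (lower is better for the EBM) and `vocab` are hypothetical.

```python
# Minimal sketch: perturb an autoregressive sample into a spurious mode that the
# current EBM assigns low energy to; such sequences serve as EBM negatives.
import random

def adversarial_negative(seq, energy, vocab, n_steps: int = 10):
    seq = list(seq)
    for _ in range(n_steps):
        pos = random.randrange(len(seq))
        best_tok, best_e = seq[pos], energy(seq)
        # Try a handful of substitutions and keep the one the EBM likes most
        # (lowest energy), even though it no longer comes from the data.
        for tok in random.sample(vocab, k=min(20, len(vocab))):
            seq[pos] = tok
            e = energy(seq)
            if e < best_e:
                best_tok, best_e = tok, e
        seq[pos] = best_tok
    return seq
```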

Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling

  • paper_url: http://arxiv.org/abs/2311.09243
  • repo_url: None
  • paper_authors: Yujin Cho, Mingeon Kim, Seojin Kim, Oyun Kwon, Ryan Donghan Kwon, Yoonha Lee, Dohyun Lim
  • for: To evaluate the efficacy of large language models (LLMs) in interactive language therapy for high-functioning autistic adolescents.
  • methods: A panel of clinical psychologists and psychiatrists evaluated the LLM using a specially developed scorecard, relying on simulated scenarios rather than direct testing with patients for privacy and ethical reasons.
  • results: The LLM showed strengths in empathetic engagement and adaptability, but fell short of the depth of personalization and emotional understanding characteristic of human therapists.
    Abstract This study investigates the efficacy of Large Language Models (LLMs) in interactive language therapy for high-functioning autistic adolescents. With the rapid advancement of artificial intelligence, particularly in natural language processing, LLMs present a novel opportunity to augment traditional psychological counseling methods. This research primarily focuses on evaluating the LLM's ability to engage in empathetic, adaptable, and contextually appropriate interactions within a therapeutic setting. A comprehensive evaluation was conducted by a panel of clinical psychologists and psychiatrists using a specially developed scorecard. The assessment covered various aspects of the LLM's performance, including empathy, communication skills, adaptability, engagement, and the ability to establish a therapeutic alliance. The study avoided direct testing with patients, prioritizing privacy and ethical considerations, and instead relied on simulated scenarios to gauge the LLM's effectiveness. The results indicate that LLMs hold significant promise as supportive tools in therapy, demonstrating strengths in empathetic engagement and adaptability in conversation. However, challenges in achieving the depth of personalization and emotional understanding characteristic of human therapists were noted. The study also highlights the importance of ethical considerations in the application of AI in therapeutic contexts. This research provides valuable insights into the potential and limitations of using LLMs in psychological counseling for autistic adolescents. It lays the groundwork for future explorations into AI's role in mental health care, emphasizing the need for ongoing development to enhance the capabilities of these models in therapeutic settings.

Large Language Models’ Understanding of Math: Source Criticism and Extrapolation

  • paper_url: http://arxiv.org/abs/2311.07618
  • repo_url: None
  • paper_authors: Roozbeh Yousefzadeh, Xuenan Cao
  • for: Asks whether GPT-4 has acquired an understanding of mathematics.
  • methods: Crafts mathematical questions whose formal proofs are not readily available on the web, so correct answers are unlikely to come from replicating proofs seen during training.
  • results: GPT-4 is unable to solve these problems despite their simplicity, casting doubt on whether it has acquired an understanding of even basic mathematical concepts.
    Abstract It has been suggested that large language models such as GPT-4 have acquired some form of understanding beyond the correlations among the words in text including some understanding of mathematics as well. Here, we perform a critical inquiry into this claim by evaluating the mathematical understanding of the GPT-4 model. Considering that GPT-4's training set is a secret, it is not straightforward to evaluate whether the model's correct answers are based on a mathematical understanding or based on replication of proofs that the model has seen before. We specifically craft mathematical questions which their formal proofs are not readily available on the web, proofs that are more likely not seen by the GPT-4. We see that GPT-4 is unable to solve those problems despite their simplicity. It is hard to find scientific evidence suggesting that GPT-4 has acquired an understanding of even basic mathematical concepts. A straightforward way to find failure modes of GPT-4 in theorem proving is to craft questions where their formal proofs are not available on the web. Our finding suggests that GPT-4's ability is to reproduce, rephrase, and polish the mathematical proofs that it has seen before, and not in grasping mathematical concepts. We also see that GPT-4's ability to prove mathematical theorems is continuously expanding over time despite the claim that it is a fixed model. We suggest that the task of proving mathematical theorems in formal language is comparable to the methods used in search engines such as Google while predicting the next word in a sentence may be a misguided approach, a recipe that often leads to excessive extrapolation and eventual failures. Prompting the GPT-4 over and over may benefit the GPT-4 and the OpenAI, but we question whether it is valuable for machine learning or for theorem proving.

ReIDTracker Sea: the technical report of BoaTrack and SeaDronesSee-MOT challenge at MaCVi of WACV24

  • paper_url: http://arxiv.org/abs/2311.07616
  • repo_url: None
  • paper_authors: Kaer Huang, Weitu Chong
  • for: solves the problem of multi-object tracking in maritime unmanned aerial vehicles (UAVs) and unmanned surface vehicles (USVs) usage scenarios, with a completely unsupervised approach.
  • methods: uses instance representation learning by self-supervision on ImageNet, and cooperates with high-quality detectors to complete the multi-target tracking task simply and efficiently.
  • results: achieved top 3 performance on both UAV-based Multi-Object Tracking with Reidentification and USV-based Multi-Object Tracking benchmarks, and won the championship in many multiple Multi-Object Tracking competitions, such as BDD100K MOT, MOTS, and Waymo 2D MOT.
    Abstract Multi-Object Tracking is one of the most important technologies in maritime computer vision. Our solution tries to explore Multi-Object Tracking in maritime Unmanned Aerial vehicles (UAVs) and Unmanned Surface Vehicles (USVs) usage scenarios. Most of the current Multi-Object Tracking algorithms require complex association strategies and association information (2D location and motion, 3D motion, 3D depth, 2D appearance) to achieve better performance, which makes the entire tracking system extremely complex and heavy. At the same time, most of the current Multi-Object Tracking algorithms still require video annotation data which is costly to obtain for training. Our solution tries to explore Multi-Object Tracking in a completely unsupervised way. The scheme accomplishes instance representation learning by using self-supervision on ImageNet. Then, by cooperating with high-quality detectors, the multi-target tracking task can be completed simply and efficiently. The scheme achieved top 3 performance on both UAV-based Multi-Object Tracking with Reidentification and USV-based Multi-Object Tracking benchmarks and the solution won the championship in many multiple Multi-Object Tracking competitions. such as BDD100K MOT,MOTS, Waymo 2D MOT

Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data

  • paper_url: http://arxiv.org/abs/2311.06753
  • repo_url: None
  • paper_authors: Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer
  • for: To extend an LLM with end-to-end, general-purpose speech processing and reasoning abilities while retaining its wide range of capabilities, without using carefully curated paired data.
  • methods: The model accepts audio prompts in place of text and sustains a conversation; it can interchange text and audio modalities and use prior conversational context to produce better results.
  • results: Experiments show that the end-to-end approach matches or outperforms a cascaded system (speech recognizer + LLM) in modeling the response to a prompt.
    Abstract In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of LLM capabilities, without using any carefully curated paired data. The proposed model can utilize audio prompts as a replacement for text and sustain a conversation. Such a model also has extended cross-modal capabilities such as being able to perform speech question answering, speech translation, and audio summarization amongst many other closed and open-domain tasks. This is unlike prior approaches in speech, in which LLMs are extended to handle audio for a limited number of pre-designated tasks. Experiments show that our end-to-end approach is on par with or outperforms a cascaded system (speech recognizer + LLM) in terms of modeling the response to a prompt. Furthermore, unlike a cascade, our approach shows the ability to interchange text and audio modalities and utilize the prior context in a conversation to provide better results.
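As a hedged illustration of the general recipe behind such end-to-end speech-capable LLMs, the sketch below projects frame-level audio features into an LLM's embedding space so they can be prepended to text token embeddings; the dimensions, stride, and module name are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AudioPrefixAdapter(nn.Module):
    """Simplified sketch (not the paper's exact model): audio features are
    temporally downsampled and linearly mapped so they can be prepended to
    text token embeddings and fed to a frozen decoder-only LLM."""
    def __init__(self, audio_dim=1024, llm_dim=4096, stride=4):
        super().__init__()
        self.stride = stride                       # temporal downsampling factor
        self.proj = nn.Linear(audio_dim * stride, llm_dim)

    def forward(self, audio_feats):                # (B, T, audio_dim)
        b, t, d = audio_feats.shape
        t = (t // self.stride) * self.stride       # drop trailing frames
        x = audio_feats[:, :t].reshape(b, t // self.stride, d * self.stride)
        return self.proj(x)                        # (B, T', llm_dim)

# usage sketch: prepend projected audio to the text embeddings before the LLM
# inputs_embeds = torch.cat([adapter(audio_feats), text_embeds], dim=1)
```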

Federated Learning for Generalization, Robustness, Fairness: A Survey and Benchmark

  • paper_url: http://arxiv.org/abs/2311.06750
  • repo_url: https://github.com/wenkehuang/marsfl
  • paper_authors: Wenke Huang, Mang Ye, Zekun Shi, Guancheng Wan, He Li, Bo Du, Qiang Yang
  • for: Provides a systematic overview of important and recent developments in federated learning research.
  • methods: Reviews three basic lines of research (generalization, robustness, and fairness), introducing their background concepts, task settings, and main challenges.
  • results: Benchmarks the reviewed methods on several well-known datasets and provides a public website (https://github.com/WenkeHuang/MarsFL) to continuously track developments in this fast-advancing field.
    Abstract Federated learning has emerged as a promising paradigm for privacy-preserving collaboration among different parties. Recently, with the popularity of federated learning, an influx of approaches have delivered towards different realistic challenges. In this survey, we provide a systematic overview of the important and recent developments of research on federated learning. Firstly, we introduce the study history and terminology definition of this area. Then, we comprehensively review three basic lines of research: generalization, robustness, and fairness, by introducing their respective background concepts, task settings, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out several open issues in this field and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast advancing field: https://github.com/WenkeHuang/MarsFL.
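For readers new to the area, most of the surveyed methods build on a FedAvg-style server aggregation step; a minimal sketch of that weighted averaging follows (illustrative only, not tied to the MarsFL benchmark code).

```python
from collections import OrderedDict
import torch

def fedavg(client_states, client_sizes):
    """Illustrative FedAvg: weighted average of client model state_dicts.

    client_states: list of state_dicts with identical keys and shapes.
    client_sizes:  list of local dataset sizes used as aggregation weights.
    """
    total = float(sum(client_sizes))
    avg = OrderedDict()
    for key in client_states[0]:
        avg[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return avg
```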

Aggregate, Decompose, and Fine-Tune: A Simple Yet Effective Factor-Tuning Method for Vision Transformer

  • paper_url: http://arxiv.org/abs/2311.06749
  • repo_url: None
  • paper_authors: Dongping Chen
  • for: To improve fine-tuning of Vision Transformer (ViT) models by addressing inner- and cross-layer redundancy.
  • methods: Proposes EFfective Factor-Tuning (EFFT), a simple yet effective fine-tuning method.
  • results: On the VTAB-1K benchmark, EFFT surpasses all baselines, reaching a categorical average of 75.9% top-1 accuracy while tuning only 0.28% of the parameters used in full fine-tuning.
    Abstract Recent advancements have illuminated the efficacy of some tensorization-decomposition Parameter-Efficient Fine-Tuning methods like LoRA and FacT in the context of Vision Transformers (ViT). However, these methods grapple with the challenges of inadequately addressing inner- and cross-layer redundancy. To tackle this issue, we introduce EFfective Factor-Tuning (EFFT), a simple yet effective fine-tuning method. Within the VTAB-1K dataset, our EFFT surpasses all baselines, attaining state-of-the-art performance with a categorical average of 75.9% in top-1 accuracy with only 0.28% of the parameters for full fine-tuning. Considering the simplicity and efficacy of EFFT, it holds the potential to serve as a foundational benchmark. The code and model are now available at https://github.com/Dongping-Chen/EFFT-EFfective-Factor-Tuning.
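The official EFFT code is linked above; as a hedged sketch of the tensorization-decomposition family it belongs to (LoRA/FacT-style), the snippet below adds a trainable low-rank update to a frozen linear layer. The exact EFFT factorization differs and is described in the paper and repository, so treat this only as an illustration of factor-tuning in general.

```python
import torch
import torch.nn as nn

class FactorizedAdapter(nn.Module):
    """Illustrative LoRA/FacT-style sketch: trainable low-rank update W + B @ A
    added to a frozen pretrained linear layer (not the exact EFFT factorization)."""
    def __init__(self, base: nn.Linear, rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```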

Two Stream Scene Understanding on Graph Embedding

  • paper_url: http://arxiv.org/abs/2311.06746
  • repo_url: None
  • paper_authors: Wenkai Yang, Wenyuan Sun, Runxaing Huang
  • for: To improve scene understanding in computer vision.
  • methods: Uses a two-stream network architecture, a graph feature stream and an image feature stream, and fuses the two to improve performance on image classification and scene graph generation tasks.
  • results: Experiments on the ADE20K dataset show improved image classification accuracy compared to conventional methods.
    Abstract The paper presents a novel two-stream network architecture for enhancing scene understanding in computer vision. This architecture utilizes a graph feature stream and an image feature stream, aiming to merge the strengths of both modalities for improved performance in image classification and scene graph generation tasks. The graph feature stream network comprises a segmentation structure, scene graph generation, and a graph representation module. The segmentation structure employs the UPSNet architecture with a backbone that can be a residual network, ViT, or Swin Transformer. The scene graph generation component focuses on extracting object labels and neighborhood relationships from the semantic map to create a scene graph. Graph Convolutional Networks (GCN), GraphSAGE, and Graph Attention Networks (GAT) are employed for graph representation, with an emphasis on capturing node features and their interconnections. The image feature stream network, on the other hand, focuses on image classification through the use of Vision Transformer and Swin Transformer models. The two streams are fused using various data fusion methods. This fusion is designed to leverage the complementary strengths of graph-based and image-based features. Experiments conducted on the ADE20K dataset demonstrate the effectiveness of the proposed two-stream network in improving image classification accuracy compared to conventional methods. This research provides a significant contribution to the field of computer vision, particularly in the areas of scene understanding and image classification, by effectively combining graph-based and image-based approaches.
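A minimal sketch of the late-fusion idea described above: an image-stream embedding is concatenated with a pooled graph-stream embedding before classification. The segmentation and scene-graph-generation components are omitted, and all dimensions and the mean-pooling choice are assumptions.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Illustrative fusion head: the graph embedding would come from a
    GCN/GraphSAGE/GAT over the scene graph, the image embedding from a
    ViT or Swin backbone; both are placeholders here."""
    def __init__(self, img_dim=768, graph_dim=256, num_classes=150):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + graph_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, img_feat, node_feats):
        # img_feat: (batch, img_dim); node_feats: (batch, n_nodes, graph_dim)
        graph_feat = node_feats.mean(dim=1)        # simple mean-pool over graph nodes
        return self.fuse(torch.cat([img_feat, graph_feat], dim=-1))
```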

Detecting and Correcting Hate Speech in Multimodal Memes with Large Visual Language Model

  • paper_url: http://arxiv.org/abs/2311.06737
  • repo_url: None
  • paper_authors: Minh-Hao Van, Xintao Wu
  • for: To study how large visual language models can be used on social media platforms to detect and correct hateful memes.
  • methods: Uses the pretrained LLaVA model with zero-shot prompting for hateful meme detection and correction tasks.
  • results: Empirical experiments show the pretrained LLaVA model is effective on both tasks, while also revealing its weaknesses and limitations.
    Abstract Recently, large language models (LLMs) have taken the spotlight in natural language processing. Further, integrating LLMs with vision enables the users to explore more emergent abilities in multimodality. Visual language models (VLMs), such as LLaVA, Flamingo, or GPT-4, have demonstrated impressive performance on various visio-linguistic tasks. Consequently, there are enormous applications of large models that could be potentially used on social media platforms. Despite that, there is a lack of related work on detecting or correcting hateful memes with VLMs. In this work, we study the ability of VLMs on hateful meme detection and hateful meme correction tasks with zero-shot prompting. From our empirical experiments, we show the effectiveness of the pretrained LLaVA model and discuss its strengths and weaknesses in these tasks.
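A hedged sketch of the zero-shot prompting setup: `vlm_generate` is a hypothetical stand-in for whatever VLM inference call is available (for example a LLaVA checkpoint), and the prompt wording is illustrative rather than the prompts used in the paper.

```python
# Illustrative prompts only; not the wording used by the authors.
DETECT_PROMPT = (
    "You are a content moderator. The meme contains the overlaid text: \"{text}\".\n"
    "Considering both the image and the text, answer with exactly one word, "
    "'hateful' or 'benign', and nothing else."
)

CORRECT_PROMPT = (
    "The meme with text \"{text}\" was judged hateful. Rewrite the text so the "
    "meme keeps its humor but removes the hateful content. Reply with the new text only."
)

def moderate_meme(vlm_generate, image, meme_text):
    """Zero-shot detection, then correction only when the meme is flagged hateful.

    vlm_generate(image, prompt) -> str is a hypothetical wrapper around a VLM.
    """
    verdict = vlm_generate(image, DETECT_PROMPT.format(text=meme_text)).strip().lower()
    if verdict.startswith("hateful"):
        fixed = vlm_generate(image, CORRECT_PROMPT.format(text=meme_text)).strip()
        return {"hateful": True, "suggested_text": fixed}
    return {"hateful": False, "suggested_text": meme_text}
```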

DeepQC: A Deep Learning System for Automatic Quality Control of In-situ Soil Moisture Sensor Time Series Data

  • paper_url: http://arxiv.org/abs/2311.06735
  • repo_url: None
  • paper_authors: Lahari Bandaru, Bharat C Irigireddy, Brian Davis
  • for: To develop a deep learning model that detects anomalies in in-situ soil moisture sensor data, improving data quality and helping farmers manage weather-related risks under a changing climate.
  • methods: Uses a Bi-directional Long Short-Term Memory (LSTM) model, called DeepQC, to detect anomalies in soil moisture data; manually flagged PSA observations were used for training, validation, and testing, and the model was compared with the Flagit approach.
  • results: DeepQC flagged anomalies accurately regardless of the number of anomalies and in significantly less time, whereas Flagit showed clear limitations in identifying anomalies.
    Abstract Amidst changing climate, real-time soil moisture monitoring is vital for the development of in-season decision support tools to help farmers manage weather related risks. Precision Sustainable Agriculture (PSA) recently established a real-time soil moisture monitoring network across the central, Midwest, and eastern U.S., but field-scale sensor observations often come with data gaps and anomalies. To maintain the data quality needed for development of decision tools, a quality control system is necessary. The International Soil Moisture Network (ISMN) introduced the Flagit module for anomaly detection in soil moisture observations. However, under certain conditions, Flagit's quality control approaches may underperform in identifying anomalies. Recently deep learning methods have been successfully applied to detect anomalies in time series data in various disciplines. However, their use in agriculture has not yet been investigated. This study focuses on developing a Bi-directional Long Short-Term Memory (LSTM) model, referred to as DeepQC, to identify anomalies in soil moisture data. Manually flagged PSA observations were used for training, validation, and testing the model, following an 80:10:10 split. The study then compared the DeepQC and Flagit based estimates to assess their relative performance. Flagit correctly flagged 95.5% of the correct observations and 50.3% of the anomaly observations, indicating its limitations in identifying anomalies. On the other hand, DeepQC correctly flagged 99.7% of the correct observations and 95.6% of the anomalies in significantly less time, demonstrating its superiority over the Flagit approach. Importantly, DeepQC's performance remained consistent regardless of the number of anomalies. Given the promising results obtained with DeepQC, future studies will focus on implementing this model on national and global soil moisture networks.
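A minimal PyTorch sketch of a bidirectional LSTM quality-control classifier of the kind DeepQC describes; the window length, hidden size, and per-timestep binary output are assumptions rather than the published configuration.

```python
import torch
import torch.nn as nn

class BiLSTMQC(nn.Module):
    """Illustrative bidirectional LSTM that flags each timestep of a
    soil-moisture window as normal or anomalous (not the published DeepQC)."""
    def __init__(self, n_features=1, hidden=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)       # per-timestep anomaly logit

    def forward(self, x):                          # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)          # (batch, time) logits

model = BiLSTMQC()
window = torch.randn(8, 96, 1)                     # e.g. 96 hourly sensor readings
anomaly_prob = torch.sigmoid(model(window))        # train with BCEWithLogitsLoss
```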

An advantage based policy transfer algorithm for reinforcement learning with metrics of transferability

  • paper_url: http://arxiv.org/abs/2311.06731
  • repo_url: None
  • paper_authors: Md Ferdous Alam, Parinaz Naghizadeh, David Hoelzle
  • for: To propose an advantage-based, off-policy policy transfer algorithm (APT-RL) for fixed-domain environments.
  • methods: Uses the notion of advantage as a regularizer to weigh the knowledge transferred from the source environment against new knowledge learned in the target, and proposes a new transfer performance metric for evaluating transfer RL algorithms.
  • results: Numerical experiments on three continuous control benchmark tasks show that APT-RL outperforms existing transfer RL algorithms on most tasks and is 10% to 75% more sample efficient than learning from scratch.
    Abstract Reinforcement learning (RL) can enable sequential decision-making in complex and high-dimensional environments if the acquisition of a new state-action pair is efficient, i.e., when interaction with the environment is inexpensive. However, there are a myriad of real-world applications in which a high number of interactions are infeasible. In these environments, transfer RL algorithms, which can be used for the transfer of knowledge from one or multiple source environments to a target environment, have been shown to increase learning speed and improve initial and asymptotic performance. However, most existing transfer RL algorithms are on-policy and sample inefficient, and often require heuristic choices in algorithm design. This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain environments. Its novelty is in using the popular notion of ``advantage'' as a regularizer, to weigh the knowledge that should be transferred from the source, relative to new knowledge learned in the target, removing the need for heuristic choices. Further, we propose a new transfer performance metric to evaluate the performance of our algorithm and unify existing transfer RL frameworks. Finally, we present a scalable, theoretically-backed task similarity measurement algorithm to illustrate the alignments between our proposed transferability metric and similarities between source and target environments. Numerical experiments on three continuous control benchmark tasks demonstrate that APT-RL outperforms existing transfer RL algorithms on most tasks, and is $10\%$ to $75\%$ more sample efficient than learning from scratch.
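As a hedged illustration of using advantage as a regularizer during transfer (not the APT-RL algorithm itself), the sketch below adds an advantage-weighted KL term that pulls the target policy toward the source policy only where the source looks advantageous; the clamping and the coefficient beta are assumptions.

```python
import torch
import torch.nn.functional as F

def transfer_regularized_loss(target_logits, source_logits, advantages, rl_loss, beta=0.1):
    """Illustrative sketch: advantage-weighted KL term added to the target RL loss.

    target_logits / source_logits: (batch, n_actions) action logits in the target env.
    advantages: (batch,) advantage estimates; positive values upweight imitation.
    rl_loss: the ordinary actor / policy-gradient loss computed in the target env.
    """
    kl = F.kl_div(
        F.log_softmax(target_logits, dim=-1),
        F.softmax(source_logits, dim=-1),
        reduction="none",
    ).sum(dim=-1)                                  # per-sample KL(source || target)
    weights = torch.clamp(advantages, min=0.0)     # imitate only where the source looks good
    return rl_loss + beta * (weights * kl).mean()
```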

Enabling Human-Centered AI: A Methodological Perspective

  • paper_url: http://arxiv.org/abs/2311.06703
  • repo_url: None
  • paper_authors: Wei Xu, Zaifeng Gao
  • for: Proposes a comprehensive HCAI framework covering design goals, design principles, implementation approaches, interdisciplinary teams, HCAI methods, and HCAI processes, to support putting HCAI into practice.
  • methods: Presents a "three-layer" approach to facilitate implementation of the framework.
  • results: Argues that the framework can overcome the weaknesses of current HCAI frameworks and the challenges currently faced in practice.
    Abstract Human-centered AI (HCAI) is a design philosophy that advocates prioritizing humans in designing, developing, and deploying intelligent systems, aiming to maximize the benefits of AI to humans and avoid potential adverse impacts. While HCAI continues to influence, the lack of guidance on methodology in practice makes its adoption challenging. This paper proposes a comprehensive HCAI framework based on our previous work with integrated components, including design goals, design principles, implementation approaches, interdisciplinary teams, HCAI methods, and HCAI processes. This paper also presents a "three-layer" approach to facilitate the implementation of the framework. We believe this systematic and executable framework can overcome the weaknesses in current HCAI frameworks and the challenges currently faced in practice, putting it into action to enable HCAI further.

An Investigation of Hepatitis B Virus Genome using Markov Models

  • paper_url: http://arxiv.org/abs/2311.06699
  • repo_url: None
  • paper_authors: Khadijeh Jahanian, Elnaz Shalbafian, Morteza Saberi, Roohallah Alizadehsani, Iman Dehzangi
  • for: To investigate the mutational footprint of APOBEC3 enzymes in the HBV genome.
  • methods: Applies a multivariable data analytics technique to full-genome HBV sequences from a diverse range of naturally infected patients, examining motif preferences and potential sequence hierarchies of APOBEC3-induced mutation.
  • results: Unlike in HIV genomes, the analyses indicate that either APOBEC3 enzymes are not active against HBV, or the G-to-A mutations they induce are not sequence context-dependent in the HBV genome.
    Abstract The human genome encodes a family of editing enzymes known as APOBEC3 (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3). Several family members, such as APOBEC3G, APOBEC3F, and APOBEC3H haplotype II, exhibit activity against viruses such as HIV. These enzymes induce C-to-U mutations in the negative strand of viral genomes, resulting in multiple G-to-A changes, commonly referred to as 'hypermutation.' Mutations catalyzed by these enzymes are sequence context-dependent in the HIV genome; for instance, APOBEC3G preferentially mutates G within GG, TGG, and TGGG contexts, while other members mutate G within GA, TGA, and TGAA contexts. However, the same sequence context has not been explored in relation to these enzymes and HBV. In this study, our objective is to identify the mutational footprint of APOBEC3 enzymes in the HBV genome. To achieve this, we employ a multivariable data analytics technique to investigate motif preferences and potential sequence hierarchies of mutation by APOBEC3 enzymes using full genome HBV sequences from a diverse range of naturally infected patients. This approach allows us to distinguish between normal and hypermutated sequences based on the representation of mono- to tetra-nucleotide motifs. Additionally, we aim to identify motifs associated with hypermutation induced by different APOBEC3 enzymes in HBV genomes. Our analyses reveal that either APOBEC3 enzymes are not active against HBV, or the induction of G-to-A mutations by these enzymes is not sequence context-dependent in the HBV genome.
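A small self-contained sketch of the motif bookkeeping such an analysis relies on: counting the reference-sequence contexts of G-to-A differences between an aligned reference and a putatively hypermutated sequence. The flank size and toy sequences are illustrative.

```python
from collections import Counter

def g_to_a_contexts(reference, sample, flank=1):
    """Illustrative helper: count local sequence contexts of G->A differences.

    reference, sample: aligned, equal-length nucleotide strings.
    flank: bases of context kept on each side of the mutated G.
    Returns a Counter of reference contexts, e.g. {'TGG': 5, 'AGA': 2, ...}.
    """
    assert len(reference) == len(sample), "sequences must be aligned"
    contexts = Counter()
    for i, (r, s) in enumerate(zip(reference, sample)):
        if r == "G" and s == "A":
            ctx = reference[max(0, i - flank): i + flank + 1]
            contexts[ctx] += 1
    return contexts

# usage sketch with toy sequences
print(g_to_a_contexts("ATGGCTGGA", "ATAGCTAGA"))   # Counter({'TGG': 2})
```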

Conversational Data Exploration: A Game-Changer for Designing Data Science Pipelines

  • paper_url: http://arxiv.org/abs/2311.06695
  • repo_url: None
  • paper_authors: Genoveva Vargas-Solar, Tania Cerquitelli, Javier A. Espinosa-Oviedo, François Cheval, Anthelme Buchaille, Luca Polgar
  • for: To propose a conversational approach, implemented in the system Chatin, for driving an intuitive data exploration experience.
  • methods: A new generation of data science solution that uses conversation to let non-technical users from various disciplines explore data and extract knowledge from it.
  • results: The conversational approach gives non-technical users an intuitive data exploration experience and helps them better understand their data.
    Abstract This paper proposes a conversational approach implemented by the system Chatin for driving an intuitive data exploration experience. Our work aims to unlock the full potential of data analytics and artificial intelligence with a new generation of data science solutions. Chatin is a cutting-edge tool that democratises access to AI-driven solutions, empowering non-technical users from various disciplines to explore data and extract knowledge from it.

Comparative Multi-View Language Grounding

  • paper_url: http://arxiv.org/abs/2311.06694
  • repo_url: None
  • paper_authors: Chancharik Mitra, Abrar Anwar, Rodolfo Corona, Dan Klein, Trevor Darrell, Jesse Thomason
  • for: To resolve object referents when given a comparative language description.
  • methods: Uses transformers to pragmatically reason over multiple image views of candidate objects jointly with the referring language expression.
  • results: Comparative reasoning contributes to state-of-the-art performance on the SNARE object reference task.
    Abstract In this work, we consider the task of resolving object referents when given a comparative language description. We present a Multi-view Approach to Grounding in Context (MAGiC) that leverages transformers to pragmatically reason over both objects given multiple image views and a language description. In contrast to past efforts that attempt to connect vision and language for this task without fully considering the resulting referential context, MAGiC makes use of the comparative information by jointly reasoning over multiple views of both object referent candidates and the referring language expression. We present an analysis demonstrating that comparative reasoning contributes to SOTA performance on the SNARE object reference task.
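A hedged sketch of the comparative-scoring idea (not the MAGiC architecture): embeddings from multiple views of each candidate object are pooled and scored jointly against the language embedding, so the two referent candidates are compared directly rather than scored in isolation. Dimensions and mean pooling are assumptions.

```python
import torch
import torch.nn as nn

class ComparativeScorer(nn.Module):
    """Illustrative sketch: jointly score two candidate objects, each seen from
    several views, against a referring expression embedding."""
    def __init__(self, view_dim=512, lang_dim=512, hidden=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(2 * view_dim + lang_dim, hidden),  # both candidates + language
            nn.ReLU(),
            nn.Linear(hidden, 2),                        # one logit per candidate
        )

    def forward(self, views_a, views_b, lang_emb):
        # views_*: (batch, n_views, view_dim); lang_emb: (batch, lang_dim)
        a = views_a.mean(dim=1)                          # pool over views
        b = views_b.mean(dim=1)
        return self.score(torch.cat([a, b, lang_emb], dim=-1))  # (batch, 2) logits
```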