cs.AI - 2023-11-05

Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder

  • paper_url: http://arxiv.org/abs/2311.02794
  • repo_url: https://github.com/insitro/sams-vae
  • paper_authors: Michael Bereket, Theofanis Karaletsos
  • For: This paper proposes SAMS-VAE, a new method for modeling the effects of diverse interventions on cells in order to characterize unknown biological mechanisms of action in drug discovery.
  • Methods: SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global variables of latent intervention effects; it sparsifies these global latent variables for individual perturbations to identify disentangled, perturbation-specific latent subspaces that are flexibly composable.
  • Results: SAMS-VAE outperforms comparable models in terms of generalization across in-distribution and out-of-distribution tasks, including a combinatorial reasoning task under resource paucity, and yields interpretable latent structures that correlate strongly with known biological mechanisms.
    Abstract Generative models of observations under interventions have been a vibrant topic of interest across machine learning and the sciences in recent years. For example, in drug discovery, there is a need to model the effects of diverse interventions on cells in order to characterize unknown biological mechanisms of action. We propose the Sparse Additive Mechanism Shift Variational Autoencoder, SAMS-VAE, to combine compositionality, disentanglement, and interpretability for perturbation models. SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global variables of latent intervention effects. Crucially, SAMS-VAE sparsifies these global latent variables for individual perturbations to identify disentangled, perturbation-specific latent subspaces that are flexibly composable. We evaluate SAMS-VAE both quantitatively and qualitatively on a range of tasks using two popular single cell sequencing datasets. In order to measure perturbation-specific model-properties, we also introduce a framework for evaluation of perturbation models based on average treatment effects with links to posterior predictive checks. SAMS-VAE outperforms comparable models in terms of generalization across in-distribution and out-of-distribution tasks, including a combinatorial reasoning task under resource paucity, and yields interpretable latent structures which correlate strongly to known biological mechanisms. Our results suggest SAMS-VAE is an interesting addition to the modeling toolkit for machine learning-driven scientific discovery.
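A minimal numpy sketch of the additive latent composition described in the abstract: a sample's latent state is its basal (local) latent variable plus the sum of sparse, masked global shifts for its applied perturbations. The variable names, sparsity level, and binary perturbation indicator below are our assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_latent, n_perturbations = 16, 4

# Global, perturbation-specific parameters shared across all samples.
E = rng.normal(size=(n_perturbations, n_latent))            # latent intervention effects
M = rng.uniform(size=(n_perturbations, n_latent)) < 0.2     # sparse binary masks (assumed sparsity)

def latent_state(z_basal, d):
    """Compose one sample's latent state: basal variation plus the
    sum of the sparse global shifts of its applied perturbations.

    z_basal : (n_latent,) local latent variable (sample-specific variation)
    d       : (n_perturbations,) 0/1 indicator of applied perturbations
    """
    shift = (d[:, None] * (M * E)).sum(axis=0)
    return z_basal + shift

z = latent_state(rng.normal(size=n_latent), np.array([1, 0, 1, 0]))
```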

CausalCite: A Causal Formulation of Paper Citations

  • paper_url: http://arxiv.org/abs/2311.02790
  • repo_url: https://github.com/causalnlp/causal-cite
  • paper_authors: Ishan Kumar, Zhijing Jin, Ehsan Mokhtarian, Siyuan Guo, Yuen Chen, Negar Kiyavash, Mrinmaya Sachan, Bernhard Schoelkopf
  • for: The goal of this paper is to propose a causal inference method for evaluating the impact of scientific papers.
  • methods: The method builds on high-dimensional text embeddings: each paper is encoded with LLMs, similar samples are retrieved by cosine similarity, and a counterfactual sample is synthesized as the similarity-weighted average of those similar papers.
  • results: The resulting metric assesses paper impact accurately, showing high correlation with expert judgments and strong stability. The authors also provide suggestions for how future researchers can best use the metric. Code and data are available at https://github.com/causalNLP/causal-cite.
    Abstract Evaluating the significance of a paper is pivotal yet challenging for the scientific community. While the citation count is the most commonly used proxy for this purpose, they are widely criticized for failing to accurately reflect a paper's true impact. In this work, we propose a causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. Specifically, we encode each paper using the text embeddings by large language models (LLMs), extract similar samples by cosine similarity, and synthesize a counterfactual sample by the weighted average of similar papers according to their similarity values. We apply the resulting metric, called CausalCite, as a causal formulation of paper citations. We show its effectiveness on various criteria, such as high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and its stability across various sub-fields of AI. We also provide a set of findings that can serve as suggested ways for future researchers to use our metric for a better understanding of a paper's quality. Our code and data are at https://github.com/causalNLP/causal-cite.
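A minimal sketch of the matching step as described in the abstract: retrieve similar papers by cosine similarity over LLM embeddings and synthesize a counterfactual as their similarity-weighted average. The function name and the choice of k are ours, not the paper's.

```python
import numpy as np

def textmatch_counterfactual(target_emb, pool_embs, k=5):
    """target_emb: (d,) embedding of the treated paper;
    pool_embs: (n, d) embeddings of candidate control papers."""
    a = target_emb / np.linalg.norm(target_emb)
    B = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = B @ a                              # cosine similarities
    top = np.argsort(sims)[-k:]               # k most similar papers
    w = sims[top] / sims[top].sum()           # similarity weights
    return w @ pool_embs[top]                 # weighted-average counterfactual

cf = textmatch_counterfactual(np.random.rand(8), np.random.rand(100, 8))
```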

Make a Donut: Language-Guided Hierarchical EMD-Space Planning for Zero-shot Deformable Object Manipulation

  • paper_url: http://arxiv.org/abs/2311.02787
  • repo_url: None
  • paper_authors: Yang You, Bokui Shen, Congyue Deng, Haoran Geng, He Wang, Leonidas Guibas
  • for: This paper addresses robotic manipulation of deformable objects, one of the most captivating yet formidable problems in robotics.
  • methods: A large language model (LLM) produces a demonstration-free hierarchical plan for complex long-horizon tasks; each stage specifies a tool and a subgoal, and a closed-loop model predictive control policy is optimized with a DiffPhysics-P2P loss in earth mover distance (EMD) space.
  • results: Experiments show the approach excels at dough manipulation tasks over both short and long horizons, and generalizes robustly to complex tasks never demonstrated.
    Abstract Deformable object manipulation stands as one of the most captivating yet formidable challenges in robotics. While previous techniques have predominantly relied on learning latent dynamics through demonstrations, typically represented as either particles or images, there exists a pertinent limitation: acquiring suitable demonstrations, especially for long-horizon tasks, can be elusive. Moreover, basing learning entirely on demonstrations can hamper the model's ability to generalize beyond the demonstrated tasks. In this work, we introduce a demonstration-free hierarchical planning approach capable of tackling intricate long-horizon tasks without necessitating any training. We employ large language models (LLMs) to articulate a high-level, stage-by-stage plan corresponding to a specified task. For every individual stage, the LLM provides both the tool's name and the Python code to craft intermediate subgoal point clouds. With the tool and subgoal for a particular stage at our disposal, we present a granular closed-loop model predictive control strategy. This leverages Differentiable Physics with Point-to-Point correspondence (DiffPhysics-P2P) loss in the earth mover distance (EMD) space, applied iteratively. Experimental findings affirm that our technique surpasses multiple benchmarks in dough manipulation, spanning both short and long horizons. Remarkably, our model demonstrates robust generalization capabilities to novel and previously unencountered complex tasks without any preliminary demonstrations. We further substantiate our approach with experimental trials on real-world robotic platforms.
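The closed-loop control optimizes an EMD-space objective with point-to-point correspondences. As a rough, non-differentiable illustration of that objective (the paper's DiffPhysics-P2P version is differentiable), one can optimally match predicted and goal point clouds and average the matched distances:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_p2p(pred, goal):
    """EMD-style point-to-point loss between two (n, 3) point clouds:
    optimally assign each predicted point to one goal point, then
    average the matched distances."""
    cost = np.linalg.norm(pred[:, None, :] - goal[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

loss = emd_p2p(np.random.rand(64, 3), np.random.rand(64, 3))
```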

Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes the Lead

  • paper_url: http://arxiv.org/abs/2311.02782
  • repo_url: https://github.com/caoyunkang/GPT4V-for-Generic-Anomaly-Detection
  • paper_authors: Yunkang Cao, Xiaohao Xu, Chen Sun, Xiaonan Huang, Weiming Shen
  • for: This study aims to apply GPT-4V(ision), a visual-linguistic model, to anomaly detection tasks in a generic manner.
  • methods: GPT-4V is applied to multi-modality, multi-domain anomaly detection tasks spanning image, video, point cloud, and time series data, across application areas such as industrial, medical, logical, video, and 3D anomaly detection and localization.
  • results: In zero/one-shot anomaly detection, GPT-4V proves highly effective at detecting and explaining global and fine-grained semantic patterns, enabling accurate differentiation between normal and abnormal instances.
    Abstract Anomaly detection is a crucial task across different domains and data types. However, existing anomaly detection models are often designed for specific domains and modalities. This study explores the use of GPT-4V(ision), a powerful visual-linguistic model, to address anomaly detection tasks in a generic manner. We investigate the application of GPT-4V in multi-modality, multi-domain anomaly detection tasks, including image, video, point cloud, and time series data, across multiple application areas, such as industrial, medical, logical, video, 3D anomaly detection, and localization tasks. To enhance GPT-4V's performance, we incorporate different kinds of additional cues such as class information, human expertise, and reference images as prompts.Based on our experiments, GPT-4V proves to be highly effective in detecting and explaining global and fine-grained semantic patterns in zero/one-shot anomaly detection. This enables accurate differentiation between normal and abnormal instances. Although we conducted extensive evaluations in this study, there is still room for future evaluation to further exploit GPT-4V's generic anomaly detection capacity from different aspects. These include exploring quantitative metrics, expanding evaluation benchmarks, incorporating multi-round interactions, and incorporating human feedback loops. Nevertheless, GPT-4V exhibits promising performance in generic anomaly detection and understanding, thus opening up a new avenue for anomaly detection.
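The "additional cues" above include class information and reference images supplied as prompts. A hypothetical sketch of assembling such a one-shot prompt with the public OpenAI Python SDK; the model name, prompt wording, and image URLs are our assumptions, not the paper's exact protocol.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
ref_url = "https://example.com/normal_reference.png"   # hypothetical images
query_url = "https://example.com/query_sample.png"

resp = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Class: metal nut. The first image is a "
             "normal reference. Is the second image anomalous? Explain and "
             "localize any defect."},
            {"type": "image_url", "image_url": {"url": ref_url}},
            {"type": "image_url", "image_url": {"url": query_url}},
        ],
    }],
)
print(resp.choices[0].message.content)
```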

ChaTA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs

  • paper_url: http://arxiv.org/abs/2311.02775
  • repo_url: None
  • paper_authors: Yann Hicke, Anmol Agarwal, Qianou Ma, Paul Denny
  • for: This paper aims to address the challenges of scalable and intelligent question-answering (QA) for teaching support.
  • methods: It uses open-source large language models (LLMs) to preserve data privacy, building on the LLaMA-2 family with augmentations including retrieval augmented generation (RAG), supervised fine-tuning (SFT), and an alternative to reinforcement learning with human feedback (RLHF).
  • results: On a Piazza dataset, human evaluations and automatic LLM evaluations show that the modeling techniques collectively enhance answer quality by about 33%, with RAG being an impactful addition. This work paves the way for developing intelligent QA assistants customizable for courses.
    Abstract To address the challenges of scalable and intelligent question-answering (QA), we introduce an innovative solution that leverages open-source Large Language Models (LLMs) to ensure data privacy. We use models from the LLaMA-2 family and augmentations including retrieval augmented generation (RAG), supervised fine-tuning (SFT), and an alternative to reinforcement learning with human feedback (RLHF). We perform our experiments on a Piazza dataset from an introductory CS course with 10k QA pairs and 1.5k pairs of preferences data and conduct both human evaluations and automatic LLM evaluations on a small subset. We find preliminary evidence that modeling techniques collectively enhance the quality of answers by 33%, and RAG is an impactful addition. This work paves the way for the development of ChaTA, an intelligent QA assistant customizable for courses with an online QA platform.

Communication Efficient and Privacy-Preserving Federated Learning Based on Evolution Strategies

  • paper_url: http://arxiv.org/abs/2311.03405
  • repo_url: https://github.com/Eric-Lan0/FedES
  • paper_authors: Guangchen Lan
  • for: This work proposes a federated learning algorithm based on evolution strategies (FedES) to train distributed deep neural networks (DNNs) with low communication overhead and data privacy.
  • methods: Instead of transmitting model parameters, FedES uses evolution strategies, a zeroth-order training method, and only communicates loss values, giving it very low communication overhead. It also protects data privacy, since a third party cannot estimate gradients without knowing the pre-shared seed.
  • results: Experimental results show that FedES achieves low communication overhead and data privacy while matching the convergence performance of backpropagation-based methods.
    Abstract Federated learning (FL) is an emerging paradigm for training deep neural networks (DNNs) in distributed manners. Current FL approaches all suffer from high communication overhead and information leakage. In this work, we present a federated learning algorithm based on evolution strategies (FedES), a zeroth-order training method. Instead of transmitting model parameters, FedES only communicates loss values, and thus has very low communication overhead. Moreover, a third party is unable to estimate gradients without knowing the pre-shared seed, which protects data privacy. Experimental results demonstrate FedES can achieve the above benefits while keeping convergence performance the same as that with back propagation methods.
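A minimal sketch of one FedES-style communication round under our reading of the abstract: perturbation noise is regenerated locally from a pre-shared seed, so only scalar loss values ever cross the network. The population size, step sizes, and loss normalization below are our assumptions.

```python
import numpy as np

def fedes_round(theta, client_losses, seed, sigma=0.1, lr=0.01, pop=8):
    """theta: flat parameter vector; client_losses: list of functions
    mapping a parameter vector to a local scalar loss."""
    rng = np.random.default_rng(seed)            # pre-shared seed
    noise = rng.normal(size=(pop, theta.size))   # regenerated by every party
    losses = np.array([
        np.mean([f(theta + sigma * eps) for f in client_losses])
        for eps in noise
    ])                                           # only these scalars are communicated
    adv = (losses - losses.mean()) / (losses.std() + 1e-8)
    grad = (adv[:, None] * noise).mean(axis=0) / sigma   # ES gradient estimate
    return theta - lr * grad

theta = fedes_round(np.zeros(10), [lambda t: float(t @ t)], seed=42)
```

Because a third party sees only the loss scalars, it cannot reconstruct the gradient estimate without the seed that generated the noise vectors.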

Rule Learning as Machine Translation using the Atomic Knowledge Bank

  • paper_url: http://arxiv.org/abs/2311.02765
  • repo_url: https://github.com/krisaesoey/atomictranslation
  • paper_authors: Kristoffer Æsøy, Ana Ozaki
  • for: This work investigates whether machine learning models can perform logical reasoning in a trustable and controlled manner.
  • methods: Transformers are used to translate rules expressed in natural language into logical rules, which are then passed to reasoners for logical inference.
  • results: The study finds that translating natural-language rules with transformers can produce reliable, controlled logical rules that are usable for logical reasoning.
    Abstract Machine learning models, and in particular language models, are being applied to various tasks that require reasoning. While such models are good at capturing patterns their ability to reason in a trustable and controlled manner is frequently questioned. On the other hand, logic-based rule systems allow for controlled inspection and already established verification methods. However it is well-known that creating such systems manually is time-consuming and prone to errors. We explore the capability of transformers to translate sentences expressing rules in natural language into logical rules. We see reasoners as the most reliable tools for performing logical reasoning and focus on translating language into the format expected by such tools. We perform experiments using the DKET dataset from the literature and create a dataset for language to logic translation based on the Atomic knowledge bank.

Causal Question Answering with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.02760
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Lukas Blübaum, Stefan Heindorf
  • for: The goal of this work is to answer causal questions by searching for causal relations and their provenance data.
  • methods: It applies reinforcement learning, specifically an actor-critic algorithm, to search the CauseNet graph for causal relations and explanatory paths.
  • results: Experiments show that the reinforcement learning agent successfully answers causal questions and quickly finds paths that explain the answers.
    Abstract Causal questions inquire about causal relationships between different events or phenomena. Specifically, they often aim to determine whether there is a relationship between two phenomena, or to identify all causes/effects of a phenomenon. Causal questions are important for a variety of use cases, including virtual assistants and search engines. However, many current approaches to causal question answering cannot provide explanations or evidence for their answers. Hence, in this paper, we aim to answer causal questions with CauseNet, a large-scale dataset of causal relations and their provenance data. Inspired by recent, successful applications of reinforcement learning to knowledge graph tasks, such as link prediction and fact-checking, we explore the application of reinforcement learning on CauseNet for causal question answering. We introduce an Actor-Critic based agent which learns to search through the graph to answer causal questions. We bootstrap the agent with a supervised learning procedure to deal with large action spaces and sparse rewards. Our evaluation shows that the agent successfully prunes the search space to answer binary causal questions by visiting less than 30 nodes per question compared to over 3,000 nodes by a naive breadth-first search. Our ablation study indicates that our supervised learning strategy provides a strong foundation upon which our reinforcement learning agent improves. The paths returned by our agent explain the mechanisms by which a cause produces an effect. Moreover, for each edge on a path, CauseNet stores its original source on the web allowing for easy verification of paths.
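A toy sketch of how a learned policy prunes the search, consistent with the node-visit numbers reported above (under 30 nodes per question versus over 3,000 for breadth-first search). The `score` callable stands in for the trained actor and is our assumption; the real agent is trained with actor-critic updates.

```python
def pruned_path_search(graph, start, target, score, beam=3, max_hops=4):
    """graph: dict node -> neighbor list. Expand only the `beam`
    highest-scoring actions per step instead of all neighbors,
    mimicking the agent's pruning versus naive breadth-first search."""
    frontier = [[start]]
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            ranked = sorted(graph.get(path[-1], []),
                            key=lambda n: score(path, n), reverse=True)
            for node in ranked[:beam]:
                if node == target:
                    return path + [node]   # explanatory cause-effect path
                next_frontier.append(path + [node])
        frontier = next_frontier
    return None

g = {"smoking": ["cancer", "stress"], "stress": ["insomnia"]}
print(pruned_path_search(g, "smoking", "insomnia", score=lambda p, n: 1.0))
```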

Learning Independently from Causality in Multi-Agent Environments

  • paper_url: http://arxiv.org/abs/2311.02741
  • repo_url: None
  • paper_authors: Rafael Pina, Varuna De Silva, Corentin Artaud
  • for: This paper investigates the lazy agent pathology in Multi-Agent Reinforcement Learning (MARL) from a causality-based perspective.
  • methods: It studies a fully decentralised MARL setup and uses causality to link individual observations to the team reward.
  • results: Experiments show that exploiting the causal relation between individual observations and the team reward improves independent agents, yielding better team performance and more intelligent behaviours in individual agents.
    Abstract Multi-Agent Reinforcement Learning (MARL) comprises an area of growing interest in the field of machine learning. Despite notable advances, there are still problems that require investigation. The lazy agent pathology is a famous problem in MARL that denotes the event when some of the agents in a MARL team do not contribute to the common goal, letting the teammates do all the work. In this work, we aim to investigate this problem from a causality-based perspective. We intend to create the bridge between the fields of MARL and causality and argue about the usefulness of this link. We study a fully decentralised MARL setup where agents need to learn cooperation strategies and show that there is a causal relation between individual observations and the team reward. The experiments carried show how this relation can be used to improve independent agents in MARL, resulting not only on better performances as a team but also on the rise of more intelligent behaviours on individual agents.

AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection

  • paper_url: http://arxiv.org/abs/2311.02733
  • repo_url: None
  • paper_authors: Sahibzada Adil Shahzad, Ammarah Hashmi, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang
  • for: To prevent the spread of forged multimedia content, especially fake news and false propaganda.
  • methods: A multi-modal self-supervised learning (SSL) feature extractor captures inconsistency between the video and audio modalities, enabling multi-modal forgery detection.
  • results: The model outperforms all existing models, achieving new state-of-the-art performance on the FakeAVCeleb and DeepfakeTIMIT datasets.
    Abstract Multimodal manipulations (also known as audio-visual deepfakes) make it difficult for unimodal deepfake detectors to detect forgeries in multimedia content. To avoid the spread of false propaganda and fake news, timely detection is crucial. The damage to either modality (i.e., visual or audio) can only be discovered through multi-modal models that can exploit both pieces of information simultaneously. Previous methods mainly adopt uni-modal video forensics and use supervised pre-training for forgery detection. This study proposes a new method based on a multi-modal self-supervised-learning (SSL) feature extractor to exploit inconsistency between audio and visual modalities for multi-modal video forgery detection. We use the transformer-based SSL pre-trained Audio-Visual HuBERT (AV-HuBERT) model as a visual and acoustic feature extractor and a multi-scale temporal convolutional neural network to capture the temporal correlation between the audio and visual modalities. Since AV-HuBERT only extracts visual features from the lip region, we also adopt another transformer-based video model to exploit facial features and capture spatial and temporal artifacts caused during the deepfake generation process. Experimental results show that our model outperforms all existing models and achieves new state-of-the-art performance on the FakeAVCeleb and DeepfakeTIMIT datasets.

Extraction of Atypical Aspects from Customer Reviews: Datasets and Experiments with Language Models

  • paper_url: http://arxiv.org/abs/2311.02702
  • repo_url: https://github.com/smitanannaware/xtrata
  • paper_authors: Smita Nannaware, Erfan Al-Hossami, Razvan Bunescu
  • for: This work aims to detect atypical aspects in customer reviews, which can be leveraged to make recommendations that increase user satisfaction.
  • methods: Benchmark datasets of reviews are manually annotated and used to evaluate the performance of several language models.
  • results: The study finds that fine-tuning Flan-T5 and zero-shot and few-shot prompting of GPT-3.5 can accurately detect atypical aspects in customer reviews.
    Abstract A restaurant dinner may become a memorable experience due to an unexpected aspect enjoyed by the customer, such as an origami-making station in the waiting area. If aspects that are atypical for a restaurant experience were known in advance, they could be leveraged to make recommendations that have the potential to engender serendipitous experiences, further increasing user satisfaction. Although relatively rare, whenever encountered, atypical aspects often end up being mentioned in reviews due to their memorable quality. Correspondingly, in this paper we introduce the task of detecting atypical aspects in customer reviews. To facilitate the development of extraction models, we manually annotate benchmark datasets of reviews in three domains - restaurants, hotels, and hair salons, which we use to evaluate a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5 to zero-shot and few-shot prompting of GPT-3.5.

Architecture Matters: Uncovering Implicit Mechanisms in Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2311.02687
  • repo_url: None
  • paper_authors: Xiaojun Guo, Yifei Wang, Zeming Wei, Yisen Wang
  • for: A systematic study of graph contrastive learning (GCL) methods and the common phenomena that set them apart from visual contrastive learning (VCL).
  • methods: The work explains these properties by uncovering how the implicit inductive bias of GNNs works in contrastive learning.
  • results: GCL methods are found to differ markedly from traditional VCL methods: positive samples are not a must, negative samples are unnecessary for graph classification (or for node classification with specific normalization modules), and data augmentations have much less influence on GCL.
    Abstract With the prosperity of contrastive learning for visual representation learning (VCL), it is also adapted to the graph domain and yields promising performance. However, through a systematic study of various graph contrastive learning (GCL) methods, we observe that some common phenomena among existing GCL methods that are quite different from the original VCL methods, including 1) positive samples are not a must for GCL; 2) negative samples are not necessary for graph classification, neither for node classification when adopting specific normalization modules; 3) data augmentations have much less influence on GCL, as simple domain-agnostic augmentations (e.g., Gaussian noise) can also attain fairly good performance. By uncovering how the implicit inductive bias of GNNs works in contrastive learning, we theoretically provide insights into the above intriguing properties of GCL. Rather than directly porting existing VCL methods to GCL, we advocate for more attention toward the unique architecture of graph learning and consider its implicit influence when designing GCL methods. Code is available at https: //github.com/PKU-ML/ArchitectureMattersGCL.

Compute at Scale – A Broad Investigation into the Data Center Industry

  • paper_url: http://arxiv.org/abs/2311.02651
  • repo_url: None
  • paper_authors: Konstantin Pilz, Lennart Heim
  • for: This report characterizes the current state of the data center industry and its importance for AI development.
  • methods: The report characterizes data centers in terms of their large-scale compute clusters, cooling and power needs, connectivity requirements, and business models.
  • results: The global data center market is valued at approximately $250B and is expected to double over the next seven years. There are likely about 500 large (above 10 MW) data centers globally, with the US, Europe, and China the most important markets.
    Abstract This report characterizes the data center industry and its importance for AI development. Data centers are industrial facilities that efficiently provide compute at scale and thus constitute the engine rooms of today's digital economy. As large-scale AI training and inference become increasingly computationally expensive, they are dominantly executed from this designated infrastructure. Key features of data centers include large-scale compute clusters that require extensive cooling and consume large amounts of power, the need for fast connectivity both within the data center and to the internet, and an emphasis on security and reliability. The global industry is valued at approximately $250B and is expected to double over the next seven years. There are likely about 500 large (above 10 MW) data centers globally, with the US, Europe, and China constituting the most important markets. The report further covers important actors, business models, main inputs, and typical locations of data centers.

New Approach for an Affective Computing-Driven Quality of Experience (QoE) Prediction

  • paper_url: http://arxiv.org/abs/2311.02647
  • repo_url: None
  • paper_authors: Joshua Bègue, Mohamed Aymen Labiod, Abdelhamid Mellouk
  • for: This paper proposes an affective computing-driven Quality of Experience (QoE) prediction model for multimedia QoE assessment scenarios.
  • methods: Differential entropy and power spectral density features are computed from multi-channel electroencephalogram (EEG) signals over three-second observation windows, and several deep learning models are trained on these features to investigate whether QoE can be predicted from five different factors.
  • results: On a publicly available dataset, an LSTM-based model obtains the best results, with F1-scores between 68% and 78%. Analysis shows the Delta frequency band is the least necessary, two electrodes have higher importance, and two other electrodes have very low impact on the model's performance.
    Abstract In human interactions, emotion recognition is crucial. For this reason, the topic of computer-vision approaches for automatic emotion recognition is currently being extensively researched. Processing multi-channel electroencephalogram (EEG) information is one of the most researched methods for automatic emotion recognition. This paper presents a new model for an affective computing-driven Quality of Experience (QoE) prediction. In order to validate the proposed model, a publicly available dataset is used. The dataset contains EEG, ECG, and respiratory data and is focused on a multimedia QoE assessment context. The EEG data are retained on which the differential entropy and the power spectral density are calculated with an observation window of three seconds. These two features were extracted to train several deep-learning models to investigate the possibility of predicting QoE with five different factors. The performance of these models is compared, and the best model is optimized to improve the results. The best results were obtained with an LSTM-based model, presenting an F1-score from 68% to 78%. An analysis of the model and its features shows that the Delta frequency band is the least necessary, that two electrodes have a higher importance, and that two other electrodes have a very low impact on the model's performances.
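A sketch of the per-window feature extraction described above: differential entropy and power spectral density over three-second windows. The sampling rate and the Gaussian assumption used for differential entropy are ours, not details from the paper.

```python
import numpy as np
from scipy.signal import welch

def eeg_window_features(x, fs=128, win_s=3):
    """x: (n_channels, n_samples) EEG. Returns one feature row per
    non-overlapping 3 s window: per-channel differential entropy
    followed by per-channel mean power spectral density."""
    win = win_s * fs
    rows = []
    for start in range(0, x.shape[1] - win + 1, win):
        seg = x[:, start:start + win]
        de = 0.5 * np.log(2 * np.pi * np.e * seg.var(axis=1))  # Gaussian DE
        _, psd = welch(seg, fs=fs, nperseg=win)
        rows.append(np.concatenate([de, psd.mean(axis=1)]))
    return np.asarray(rows)

feats = eeg_window_features(np.random.randn(32, 128 * 30))  # 30 s of 32-ch EEG
```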

PotholeGuard: A Pothole Detection Approach by Point Cloud Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2311.02641
  • repo_url: None
  • paper_authors: Sahil Nawale, Dhruv Khut, Daksh Dave, Gauransh Sawhney, Pushkar Aggrawal, Dr. Kailas Devadakar
  • for: This work aims to provide a robust and accurate 3D pothole segmentation method for road safety and maintenance.
  • methods: The method performs point cloud semantic segmentation, with a feedback mechanism that mitigates point cloud sparsity and a local relationship learning module that improves local feature representation.
  • results: Extensive experiments on three public datasets show that PotholeGuard outperforms existing methods.
    Abstract Pothole detection is crucial for road safety and maintenance, traditionally relying on 2D image segmentation. However, existing 3D Semantic Pothole Segmentation research often overlooks point cloud sparsity, leading to suboptimal local feature capture and segmentation accuracy. Our research presents an innovative point cloud-based pothole segmentation architecture. Our model efficiently identifies hidden features and uses a feedback mechanism to enhance local characteristics, improving feature presentation. We introduce a local relationship learning module to understand local shape relationships, enhancing structural insights. Additionally, we propose a lightweight adaptive structure for refining local point features using the K nearest neighbor algorithm, addressing point cloud density differences and domain selection. Shared MLP Pooling is integrated to learn deep aggregation features, facilitating semantic data exploration and segmentation guidance. Extensive experiments on three public datasets confirm PotholeGuard's superior performance over state-of-the-art methods. Our approach offers a promising solution for robust and accurate 3D pothole segmentation, with applications in road maintenance and safety.
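A minimal sketch of the K-nearest-neighbor refinement idea (aggregating each point's features over its local neighborhood); the mean aggregation is our simplification of the paper's lightweight adaptive structure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_refine(points, feats, k=16):
    """points: (n, 3) coordinates; feats: (n, d) per-point features.
    Replace each point's features with the mean over its k nearest
    neighbors, smoothing local structure despite density differences."""
    nn = NearestNeighbors(n_neighbors=k).fit(points)
    _, idx = nn.kneighbors(points)        # (n, k) neighbor indices
    return feats[idx].mean(axis=1)

refined = knn_refine(np.random.rand(1000, 3), np.random.rand(1000, 8))
```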

Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation

  • paper_url: http://arxiv.org/abs/2311.02640
  • repo_url: https://github.com/dsaatusu/chatgpt-promises-and-pitfalls
  • paper_authors: Muhammad Fawad Akbar Khan, Max Ramsdell, Erik Falor, Hamid Karimi
  • for: To evaluate the code generation capabilities of the ChatGPT large language model against code written by human programmers.
  • methods: A novel code-generation dataset was curated, and code produced by ChatGPT and by human programmers was comparatively assessed using established code quality metrics.
  • results: ChatGPT shows strong capability on data analysis tasks (93.1% accuracy) but limitations on visual-graphical challenges. Its code scores well on comprehensibility and security and tends toward modular design and better error handling. Machine learning models can also distinguish ChatGPT code from human code with up to 88% accuracy.
    Abstract This paper presents a comprehensive evaluation of the code generation capabilities of ChatGPT, a prominent large language model, compared to human programmers. A novel dataset of 131 code-generation prompts across 5 categories was curated to enable robust analysis. Code solutions were generated by both ChatGPT and humans for all prompts, resulting in 262 code samples. A meticulous manual assessment methodology prioritized evaluating correctness, comprehensibility, and security using 14 established code quality metrics. The key findings reveal ChatGPT's strengths in crafting concise, efficient code with advanced constructs, showcasing strengths in data analysis tasks (93.1% accuracy) but limitations in visual-graphical challenges. Comparative analysis with human code highlights ChatGPT's inclination towards modular design and superior error handling. Additionally, machine learning models effectively distinguished ChatGPT from human code with up to 88% accuracy, suggesting detectable coding style disparities. By providing profound insights into ChatGPT's code generation capabilities and limitations through quantitative metrics and qualitative analysis, this study makes valuable contributions toward advancing AI-based programming assistants. The curated dataset and methodology offer a robust foundation for future research in this nascent domain. All data and codes are available on https://github.com/DSAatUSU/ChatGPT-promises-and-pitfalls.
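The paper reports that machine learning models can separate the two coding styles with up to 88% accuracy but the abstract does not specify the classifier, so the character n-gram baseline below is purely our assumption of one plausible setup, with toy stand-in data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins; the real study used 262 curated code samples.
code_samples = [
    "def add(a, b):\n    return a + b",
    "result=[x*x for x in range(10)]",
]
labels = [1, 0]  # 1 = ChatGPT-written, 0 = human-written (hypothetical)

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # style n-grams
    LogisticRegression(max_iter=1000),
)
clf.fit(code_samples, labels)
print(clf.predict(["total = sum(i for i in range(5))"]))
```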

A Critical Perceptual Pre-trained Model for Complex Trajectory Recovery

  • paper_url: http://arxiv.org/abs/2311.02631
  • repo_url: None
  • paper_authors: Dedong Li, Ziyue Li, Zhishuai Li, Lei Bai, Qingyuan Gong, Lijun Sun, Wolfgang Ketter, Rui Zhao
  • For: To improve the accuracy of complex trajectory recovery, especially for trajectories that cross remote road segments or make several turns.
  • Methods: Road segment representation vectors are learned with a sequential language model in a pre-trained manner, and a Multi-view Graph and Complexity Aware Transformer (MGCAT) model adaptively aggregates multi-view graph features during trajectory pre-training while paying higher attention to the critical nodes of a complex trajectory.
  • Results: Extensive experiments on large-scale datasets show the method learns better representations for trajectory recovery, with a 5.22% higher F1-score overall and an 8.16% higher F1-score on complex trajectories in particular.
    Abstract The trajectory on the road traffic is commonly collected at a low sampling rate, and trajectory recovery aims to recover a complete and continuous trajectory from the sparse and discrete inputs. Recently, sequential language models have been innovatively adopted for trajectory recovery in a pre-trained manner: it learns road segment representation vectors, which will be used in the downstream tasks. However, existing methods are incapable of handling complex trajectories: when the trajectory crosses remote road segments or makes several turns, which we call critical nodes, the quality of learned representations deteriorates, and the recovered trajectories skip the critical nodes. This work is dedicated to offering a more robust trajectory recovery for complex trajectories. Firstly, we define the trajectory complexity based on the detour score and entropy score and construct the complexity-aware semantic graphs correspondingly. Then, we propose a Multi-view Graph and Complexity Aware Transformer (MGCAT) model to encode these semantics in trajectory pre-training from two aspects: 1) adaptively aggregate the multi-view graph features considering trajectory pattern, and 2) higher attention to critical nodes in a complex trajectory. Such that, our MGCAT is perceptual when handling the critical scenario of complex trajectories. Extensive experiments are conducted on large-scale datasets. The results prove that our method learns better representations for trajectory recovery, with 5.22% higher F1-score overall and 8.16% higher F1-score for complex trajectories particularly. The code is available at https://github.com/bonaldli/ComplexTraj.
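One plausible reading of the detour and entropy scores used above to define trajectory complexity; the abstract does not give the exact formulas, so the two functions below are illustrative assumptions.

```python
import numpy as np

def detour_score(traj):
    """traj: (n, 2) points. Traveled length over straight-line distance;
    larger values indicate more detouring."""
    traveled = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
    direct = np.linalg.norm(traj[-1] - traj[0])
    return traveled / max(direct, 1e-9)

def heading_entropy(traj, bins=8):
    """Entropy of the heading-direction histogram; more turning
    spreads headings across bins and raises the score."""
    d = np.diff(traj, axis=0)
    hist, _ = np.histogram(np.arctan2(d[:, 1], d[:, 0]),
                           bins=bins, range=(-np.pi, np.pi))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

zigzag = np.array([[0, 0], [1, 1], [2, 0], [3, 1], [4, 0]], dtype=float)
print(detour_score(zigzag), heading_entropy(zigzag))
```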

The New Frontier of Cybersecurity: Emerging Threats and Innovations

  • paper_url: http://arxiv.org/abs/2311.02630
  • repo_url: None
  • paper_authors: Daksh Dave, Gauransh Sawhney, Pushkar Aggarwal, Nitish Silswal, Dhruv Khut
  • for: This study comprehensively examines the diverse threats in cybersecurity across four primary categories: malware attacks, social engineering attacks, network vulnerabilities, and data breaches.
  • methods: A qualitative research approach is used to examine the impacts of these threats on individuals, organizations, and society.
  • results: The study identifies a range of emerging cybersecurity threats, including advanced persistent threats, ransomware attacks, Internet of Things (IoT) vulnerabilities, and social engineering exploits. These pose substantial risks to both organizations and individuals, calling for a multi-layered defense that includes robust security measures, comprehensive employee training, and regular security audits.
    Abstract In today's digitally interconnected world, cybersecurity threats have reached unprecedented levels, presenting a pressing concern for individuals, organizations, and governments. This study employs a qualitative research approach to comprehensively examine the diverse threats of cybersecurity and their impacts across various sectors. Four primary categories of threats are identified and analyzed, encompassing malware attacks, social engineering attacks, network vulnerabilities, and data breaches. The research delves into the consequences of these threats on individuals, organizations, and society at large. The findings reveal a range of key emerging threats in cybersecurity, including advanced persistent threats, ransomware attacks, Internet of Things (IoT) vulnerabilities, and social engineering exploits. Consequently, it is evident that emerging cybersecurity threats pose substantial risks to both organizations and individuals. The sophistication and diversity of these emerging threats necessitate a multi-layered approach to cybersecurity. This approach should include robust security measures, comprehensive employee training, and regular security audits. The implications of these emerging threats are extensive, with potential consequences such as financial loss, reputational damage, and compromised personal information. This study emphasizes the importance of implementing effective measures to mitigate these threats. It highlights the significance of using strong passwords, encryption methods, and regularly updating software to bolster cyber defenses.

AIOps-Driven Enhancement of Log Anomaly Detection in Unsupervised Scenarios

  • paper_url: http://arxiv.org/abs/2311.02621
  • repo_url: None
  • paper_authors: Daksh Dave, Gauransh Sawhney, Dhruv Khut, Sahil Nawale, Pushkar Aggrawal, Prasenjit Bhavathankar
  • for: This work aims to improve log anomaly detection in AIOps platforms, filling gaps in existing research.
  • methods: It proposes a novel hybrid framework with an unsupervised strategy that integrates Principal Component Analysis (PCA) and Artificial Neural Networks (ANNs) with a custom loss function, operating on raw, unprocessed log data.
  • results: Experimental results show that the proposed method significantly reduces pseudo-positives and can analyze logs in their raw, unprocessed form.
    Abstract Artificial intelligence operations (AIOps) play a pivotal role in identifying, mitigating, and analyzing anomalous system behaviors and alerts. However, the research landscape in this field remains limited, leaving significant gaps unexplored. This study introduces a novel hybrid framework through an innovative algorithm that incorporates an unsupervised strategy. This strategy integrates Principal Component Analysis (PCA) and Artificial Neural Networks (ANNs) and uses a custom loss function to substantially enhance the effectiveness of log anomaly detection. The proposed approach encompasses the utilization of both simulated and real-world datasets, including logs from SockShop and Hadoop Distributed File System (HDFS). The experimental results are highly promising, demonstrating significant reductions in pseudo-positives. Moreover, this strategy offers notable advantages, such as the ability to process logs in their raw, unprocessed form, and the potential for further enhancements. The successful implementation of this approach showcases a remarkable reduction in anomalous logs, thus unequivocally establishing the efficacy of the proposed methodology. Ultimately, this study makes a substantial contribution to the advancement of log anomaly detection within AIOps platforms, addressing the critical need for effective and efficient log analysis in modern and complex systems.
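A sketch of the PCA stage only: reconstruction error over vectorized log features as an unsupervised anomaly score. The ANN stage and the custom loss from the paper are omitted, and the component count and threshold are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_anomaly_scores(X, n_components=10):
    """X: (n_logs, n_features) vectors derived from raw log lines.
    Logs that PCA reconstructs poorly receive high anomaly scores."""
    pca = PCA(n_components=n_components).fit(X)
    recon = pca.inverse_transform(pca.transform(X))
    return np.linalg.norm(X - recon, axis=1)

scores = pca_anomaly_scores(np.random.rand(500, 64))
flagged = scores > scores.mean() + 3 * scores.std()   # simple threshold
```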

Get the Ball Rolling: Alerting Autonomous Robots When to Help to Close the Healthcare Loop

  • paper_url: http://arxiv.org/abs/2311.02602
  • repo_url: None
  • paper_authors: Jiaxin Shen, Yanyao Liu, Ziming Wang, Ziyuan Jiao, Yufeng Chen, Wenjuan Han
  • for: To advance research on healthcare robots that operate without human intervention or commands, introducing the Autonomous Helping Challenge along with a crowd-sourced large-scale dataset.
  • methods: The proposed healthcare robot can autonomously determine when assistance is needed, generate useful sub-tasks to aid planning, carry out plans through a physical robot, and receive environmental feedback to generate new tasks and continue the process.
  • results: The work tackles the challenges of autonomous task generation in open-ended scenarios, the gap between the current scene and static commonsense, and the gap between language instructions and the real world, and proposes Helpy, a potential approach to close the healthcare loop in the learning-free setting.
    Abstract To facilitate the advancement of research in healthcare robots without human intervention or commands, we introduce the Autonomous Helping Challenge, along with a crowd-sourcing large-scale dataset. The goal is to create healthcare robots that possess the ability to determine when assistance is necessary, generate useful sub-tasks to aid in planning, carry out these plans through a physical robot, and receive feedback from the environment in order to generate new tasks and continue the process. Besides the general challenge in open-ended scenarios, Autonomous Helping focuses on three specific challenges: autonomous task generation, the gap between the current scene and static commonsense, and the gap between language instruction and the real world. Additionally, we propose Helpy, a potential approach to close the healthcare loop in the learning-free setting.

Automated Camera Calibration via Homography Estimation with GNNs

  • paper_url: http://arxiv.org/abs/2311.02598
  • repo_url: None
  • paper_authors: Giacomo D’Amicantonio, Egor Bondarev, Peter H. N. De With
  • for: This paper proposes a new method for accurate, automated calibration of cameras in traffic monitoring systems.
  • methods: The method generates a set of synthetic intersection viewpoint images from a bird's-eye-view image, models them as a graph of virtual cameras, and uses graph neural networks to learn the relationships within this graph in order to estimate a homography matrix.
  • results: The method performs strongly in experiments, setting a new state-of-the-art calibration accuracy on both synthetic datasets and real-world cameras.
    Abstract Over the past few decades, a significant rise of camera-based applications for traffic monitoring has occurred. Governments and local administrations are increasingly relying on the data collected from these cameras to enhance road safety and optimize traffic conditions. However, for effective data utilization, it is imperative to ensure accurate and automated calibration of the involved cameras. This paper proposes a novel approach to address this challenge by leveraging the topological structure of intersections. We propose a framework involving the generation of a set of synthetic intersection viewpoint images from a bird's-eye-view image, framed as a graph of virtual cameras to model these images. Using the capabilities of Graph Neural Networks, we effectively learn the relationships within this graph, thereby facilitating the estimation of a homography matrix. This estimation leverages the neighbourhood representation for any real-world camera and is enhanced by exploiting multiple images instead of a single match. In turn, the homography matrix allows the retrieval of extrinsic calibration parameters. As a result, the proposed framework demonstrates superior performance on both synthetic datasets and real-world cameras, setting a new state-of-the-art benchmark.
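Once the homography matrix H is estimated, image points can be mapped between views and extrinsic calibration parameters recovered. The small helper below only shows the standard point mapping through a homography, for readers unfamiliar with the operation; it is not the paper's pipeline.

```python
import numpy as np

def apply_homography(H, pts):
    """H: (3, 3) homography; pts: (n, 2) pixel coordinates.
    Projects points through H and de-homogenizes."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

H = np.array([[1.0, 0.1, 5.0], [0.0, 1.2, -3.0], [0.0, 0.001, 1.0]])
print(apply_homography(H, np.array([[100.0, 50.0]])))
```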

FloodBrain: Flood Disaster Reporting by Web-based Retrieval Augmented Generation with an LLM

  • paper_url: http://arxiv.org/abs/2311.02597
  • repo_url: None
  • paper_authors: Grace Colverd, Paul Darm, Leonard Silverberg, Noah Kasmanoff
  • for: Fast disaster impact reporting is needed to plan humanitarian assistance.
  • methods: Large language models (LLMs) are used for text generation and question answering, with information extracted and curated from web search results to generate flood disaster impact reports.
  • results: Reports generated with different LLM backbones are compared against human-written reports, showing a notable correlation between GPT-4 scores and human evaluator scores. An ablation study further shows the relevance of individual pipeline components. The tool aims to reduce coordination time for humanitarian efforts after flood disasters.
    Abstract Fast disaster impact reporting is crucial in planning humanitarian assistance. Large Language Models (LLMs) are well known for their ability to write coherent text and fulfill a variety of tasks relevant to impact reporting, such as question answering or text summarization. However, LLMs are constrained by the knowledge within their training data and are prone to generating inaccurate, or "hallucinated", information. To address this, we introduce a sophisticated pipeline embodied in our tool FloodBrain (floodbrain.com), specialized in generating flood disaster impact reports by extracting and curating information from the web. Our pipeline assimilates information from web search results to produce detailed and accurate reports on flood events. We test different LLMs as backbones in our tool and compare their generated reports to human-written reports on different metrics. Similar to other studies, we find a notable correlation between the scores assigned by GPT-4 and the scores given by human evaluators when comparing our generated reports to human-authored ones. Additionally, we conduct an ablation study to test our single pipeline components and their relevancy for the final reports. With our tool, we aim to advance the use of LLMs for disaster impact reporting and reduce the time for coordination of humanitarian efforts in the wake of flood disasters.
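A minimal retrieval-augmented generation loop in the spirit of the pipeline above. The `embed` and `generate` callables stand in for any embedding model and LLM backend, and the prompt wording is our assumption rather than FloodBrain's actual prompt.

```python
import numpy as np

def flood_report(question, snippets, embed, generate, k=4):
    """snippets: text passages extracted from web search results;
    embed: text -> vector; generate: prompt -> text."""
    q = embed(question)
    D = np.stack([embed(s) for s in snippets])
    sims = D @ q / (np.linalg.norm(D, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(snippets[i] for i in np.argsort(sims)[-k:])
    prompt = ("Using only the sources below, write a flood disaster "
              f"impact report.\n\nSources:\n{context}\n\nQuestion: {question}")
    return generate(prompt)
```

Grounding the generation in retrieved sources is what counters the "hallucinated" information the abstract warns about.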

scBeacon: single-cell biomarker extraction via identifying paired cell clusters across biological conditions with contrastive siamese networks

  • paper_url: http://arxiv.org/abs/2311.02594
  • repo_url: None
  • paper_authors: Chenyu Liu, Kweon Yong Jin, Jun Ding
  • for: This paper aims to improve biomarker analysis at the single-cell level, especially across biological conditions such as diseased versus healthy states.
  • methods: It proposes scBeacon, an unsupervised framework built on a deep contrastive siamese network that identifies matched cell clusters across conditions, enabling a refined differential gene analysis.
  • results: Evaluations on a diverse array of datasets show that scBeacon outperforms existing single-cell differential gene analysis tools in precision and adaptability.
    Abstract Despite the breakthroughs in biomarker discovery facilitated by differential gene analysis, challenges remain, particularly at the single-cell level. Traditional methodologies heavily rely on user-supplied cell annotations, focusing on individually expressed data, often neglecting the critical interactions between biological conditions, such as healthy versus diseased states. In response, here we introduce scBeacon, an innovative framework built upon a deep contrastive siamese network. scBeacon pioneers an unsupervised approach, adeptly identifying matched cell populations across varied conditions, enabling a refined differential gene analysis. By utilizing a VQ-VAE framework, a contrastive siamese network, and a greedy iterative strategy, scBeacon effectively pinpoints differential genes that hold potential as key biomarkers. Comprehensive evaluations on a diverse array of datasets validate scBeacon's superiority over existing single-cell differential gene analysis tools. Its precision and adaptability underscore its significant role in enhancing diagnostic accuracy in biomarker discovery. With the emphasis on the importance of biomarkers in diagnosis, scBeacon is positioned to be a pivotal asset in the evolution of personalized medicine and targeted treatments.

Differentially Private Pre-Trained Model Fusion using Decentralized Federated Graph Matching

  • paper_url: http://arxiv.org/abs/2311.03396
  • repo_url: None
  • paper_authors: Qian Chen, Yiqiang Chen, Xinlong Jiang, Teng Zhang, Weiwei Dai, Wuliang Huang, Zhen Yan, Bo Ye
  • for: This work aims to provide a privacy-preserving model fusion method for delivering high-quality model services in model-as-a-service scenarios.
  • methods: It uses a graph-based architecture and adopts a hybrid local differential privacy mechanism together with decentralized federated graph matching to keep the model fusion process private.
  • results: Experiments show that PrivFusion maintains model performance while preserving privacy, with good results on real-world healthcare applications.
    Abstract Model fusion is becoming a crucial component in the context of model-as-a-service scenarios, enabling the delivery of high-quality model services to local users. However, this approach introduces privacy risks and imposes certain limitations on its applications. Ensuring secure model exchange and knowledge fusion among users becomes a significant challenge in this setting. To tackle this issue, we propose PrivFusion, a novel architecture that preserves privacy while facilitating model fusion under the constraints of local differential privacy. PrivFusion leverages a graph-based structure, enabling the fusion of models from multiple parties without necessitating retraining. By employing randomized mechanisms, PrivFusion ensures privacy guarantees throughout the fusion process. To enhance model privacy, our approach incorporates a hybrid local differentially private mechanism and decentralized federated graph matching, effectively protecting both activation values and weights. Additionally, we introduce a perturbation filter adapter to alleviate the impact of randomized noise, thereby preserving the utility of the fused model. Through extensive experiments conducted on diverse image datasets and real-world healthcare applications, we provide empirical evidence showcasing the effectiveness of PrivFusion in maintaining model performance while preserving privacy. Our contributions offer valuable insights and practical solutions for secure and collaborative data analysis within the domain of privacy-preserving model fusion.

Newvision: application for helping blind people using deep learning

  • paper_url: http://arxiv.org/abs/2311.03395
  • repo_url: None
  • paper_authors: Kumar Srinivas Bobba, Kartheeban K, Vamsi Krishna Sai Boddu, Vijaya Mani Surendra Bolla, Dinesh Bugga
  • for: To help visually impaired people move about independently in daily life and improve their quality of life.
  • methods: Combines computer vision, distance estimation with ultrasonic sensors, voice recognition, and a voice assistant to provide users with real-time information about their environment.
  • results: The system can help visually impaired users navigate their surroundings, identify objects and people, read text, and avoid obstacles.
    Abstract As able-bodied people, we often take our vision for granted. For people who are visually impaired, however, their disability can have a significant impact on their daily lives. We are developing proprietary headgear that will help visually impaired people navigate their surroundings, identify objects and people, read text, and avoid obstacles. The headgear will use a combination of computer vision, distance estimation with ultrasonic sensors, voice recognition, and voice assistants to provide users with real-time information about their environment. Users will be able to interact with the headgear through voice commands, such as "What is that?" to identify an object or "Navigate to the front door" to find their way around. The headgear will then provide the user with a verbal description of the object or spoken navigation instructions. We believe that this headgear has the potential to make a significant difference in the lives of visually impaired people, allowing them to live more independently and participate more fully in society.
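The distance-estimation component rests on a simple time-of-flight calculation; here is a minimal sketch (the sensor interface itself is hardware-specific and omitted):

```python
# Time-of-flight distance estimation, as used with ultrasonic sensors: the
# pulse travels to the obstacle and back, so distance is half the round-trip
# time multiplied by the speed of sound.
SPEED_OF_SOUND_M_S = 343.0  # dry air at roughly 20 degrees C

def distance_from_echo(round_trip_s: float) -> float:
    """Distance to the obstacle in meters, given the echo round-trip time in seconds."""
    return round_trip_s * SPEED_OF_SOUND_M_S / 2.0

# e.g. a 5.83 ms round trip corresponds to roughly one meter
assert abs(distance_from_echo(0.00583) - 1.0) < 0.01
```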

KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy

  • paper_url: http://arxiv.org/abs/2311.02565
  • repo_url: None
  • paper_authors: Qianxiong Xu, Cheng Long, Ziyue Li, Sijie Ruan, Rui Zhao, Zhishuai Li
  • For: This paper proposes a new method called KITS (Kriging with Increment Training Strategy) to address the issue of graph gap in inductive spatio-temporal kriging methods based on graph neural networks.
  • Methods: The KITS method adds virtual nodes to the training graph to mitigate the graph gap issue, and pairs each virtual node with its most similar observed node to fuse their features together. The method also constructs reliable pseudo labels for virtual nodes to enhance the supervision signal.
  • Results: The KITS method consistently outperforms existing kriging methods by large margins, with an improvement in MAE score of up to 18.33%.
    Abstract Sensors are commonly deployed to perceive the environment. However, due to the high cost, sensors are usually sparsely deployed. Kriging is the tailored task to infer the unobserved nodes (without sensors) using the observed source nodes (with sensors). The essence of kriging task is transferability. Recently, several inductive spatio-temporal kriging methods have been proposed based on graph neural networks, being trained based on a graph built on top of observed nodes via pretext tasks such as masking nodes out and reconstructing them. However, the graph in training is inevitably much sparser than the graph in inference that includes all the observed and unobserved nodes. The learned pattern cannot be well generalized for inference, denoted as graph gap. To address this issue, we first present a novel Increment training strategy: instead of masking nodes (and reconstructing them), we add virtual nodes into the training graph so as to mitigate the graph gap issue naturally. Nevertheless, the empty-shell virtual nodes without labels could have bad-learned features and lack supervision signals. To solve these issues, we pair each virtual node with its most similar observed node and fuse their features together; to enhance the supervision signal, we construct reliable pseudo labels for virtual nodes. As a result, the learned pattern of virtual nodes could be safely transferred to real unobserved nodes for reliable kriging. We name our new Kriging model with Increment Training Strategy as KITS. Extensive experiments demonstrate that KITS consistently outperforms existing kriging methods by large margins, e.g., the improvement over MAE score could be as high as 18.33%.
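A minimal sketch of the increment idea follows. Names and details are illustrative: in KITS the similarity used for pairing comes from learned representations on the training graph, whereas here randomly initialized features stand in for the virtual nodes' missing inputs.

```python
# Hypothetical sketch of the Increment Training Strategy: insert virtual nodes
# into the training graph, pair each with its most similar observed node, and
# fuse their features so virtual nodes are not "empty shells". The paired
# node's label can then serve as a pseudo label for the virtual node.
import numpy as np

def add_virtual_nodes(obs_feats: np.ndarray, n_virtual: int,
                      rng: np.random.Generator = np.random.default_rng(0)):
    """obs_feats: (n_obs, d) observed-node features.
    Returns (fused virtual features, index of each virtual node's paired observed node)."""
    n_obs, d = obs_feats.shape
    virt = rng.normal(size=(n_virtual, d))          # stand-in initial features
    # Pair each virtual node with the closest observed node (Euclidean here;
    # KITS derives similarity from learned representations instead).
    dists = np.linalg.norm(virt[:, None, :] - obs_feats[None, :, :], axis=-1)
    pair_idx = dists.argmin(axis=1)
    fused = 0.5 * (virt + obs_feats[pair_idx])      # simple feature fusion
    return fused, pair_idx
```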

Time Series Synthesis Using the Matrix Profile for Anonymization

  • paper_url: http://arxiv.org/abs/2311.02563
  • repo_url: None
  • paper_authors: Audrey Der, Chin-Chia Michael Yeh, Yan Zheng, Junpeng Wang, Huiyuan Chen, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh
  • for: Providing a method to synthesize time series data that can be released in place of the original, sidestepping restrictions imposed by privacy regulations or commercial confidentiality while preserving the data's similarity structure.
  • methods: Proposes the Time Series Synthesis Using the Matrix Profile (TSSUMP) method, which synthesizes time series that preserve similarity-join information so they remain useful for downstream analysis tasks.
  • results: A case study shows that TSSUMP reduces the correlation between synthesized and original time series while preserving their similarity structure, so data mining tools achieve near-identical performance on the synthesized series.
    Abstract Publishing and sharing data is crucial for the data mining community, allowing collaboration and driving open innovation. However, many researchers cannot release their data due to privacy regulations or fear of leaking confidential business information. To alleviate such issues, we propose the Time Series Synthesis Using the Matrix Profile (TSSUMP) method, where synthesized time series can be released in lieu of the original data. The TSSUMP method synthesizes time series by preserving similarity join information (i.e., Matrix Profile) while reducing the correlation between the synthesized and the original time series. As a result, neither the values for the individual time steps nor the local patterns (or shapes) from the original data can be recovered, yet the resulting data can be used for downstream tasks that data analysts are interested in. We concentrate on similarity joins because they are one of the most widely applied time series data mining routines across different data mining tasks. We test our method on a case study of ECG and gender masking prediction. In this case study, the gender information is not only removed from the synthesized time series, but the synthesized time series also preserves enough information from the original time series. As a result, unmodified data mining tools can obtain near-identical performance on the synthesized time series as on the original time series.
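The two properties the method targets are directly checkable. Below is a minimal evaluation sketch (not the synthesis algorithm itself), using the open-source stumpy library to compute matrix profiles: the synthesized series should show low pointwise correlation with the original while keeping a similar matrix profile.

```python
# Evaluation sketch for a TSSUMP-style output: low correlation with the
# original series (privacy) but a similar matrix profile (preserved
# similarity-join structure). Uses stumpy for the matrix profile.
import numpy as np
import stumpy

def check_synthesis(original: np.ndarray, synthesized: np.ndarray, m: int):
    """Return (pointwise Pearson correlation, mean |matrix profile difference|)
    for subsequence length m; we want the first near 0 and the second small."""
    corr = np.corrcoef(original, synthesized)[0, 1]
    mp_orig = stumpy.stump(original, m)[:, 0].astype(float)
    mp_syn = stumpy.stump(synthesized, m)[:, 0].astype(float)
    return corr, float(np.mean(np.abs(mp_orig - mp_syn)))
```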

Ego-Network Transformer for Subsequence Classification in Time Series Data

  • paper_url: http://arxiv.org/abs/2311.02561
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Huiyuan Chen, Yujie Fan, Xin Dai, Yan Zheng, Vivian Lai, Junpeng Wang, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh
  • for: Addressing the problem of classifying foreground subsequences that are intertwined with background subsequences in real-world time series data.
  • methods: Proposes a new time series subsequence classification method that represents each subsequence as an ego-network carrying crucial nearest-neighbor information.
  • results: Experiments on 128 univariate and 30 multivariate time series datasets show the method outperforms the baseline on 104 of the 158 datasets.
    Abstract Time series classification is a widely studied problem in the field of time series data mining. Previous research has predominantly focused on scenarios where relevant or foreground subsequences have already been extracted, with each subsequence corresponding to a single label. However, real-world time series data often contain foreground subsequences that are intertwined with background subsequences. Successfully classifying these relevant subsequences requires not only distinguishing between different classes but also accurately identifying the foreground subsequences amidst the background. To address this challenge, we propose a novel subsequence classification method that represents each subsequence as an ego-network, providing crucial nearest neighbor information to the model. The ego-networks of all subsequences collectively form a time series subsequence graph, and we introduce an algorithm to efficiently construct this graph. Furthermore, we have demonstrated the significance of enforcing temporal consistency in the prediction of adjacent subsequences for the subsequence classification problem. To evaluate the effectiveness of our approach, we conducted experiments using 128 univariate and 30 multivariate time series datasets. The experimental results demonstrate the superior performance of our method compared to alternative approaches. Specifically, our method outperforms the baseline on 104 out of 158 datasets.
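A minimal sketch of the ego-network construction, under the assumption that neighbors are found by z-normalized Euclidean distance (a standard choice for subsequences; the paper's exact distance and graph details may differ):

```python
# Build one ego-network per subsequence: each subsequence is an ego node
# connected to its k nearest neighbors; the union of all ego-networks forms
# the time series subsequence graph fed to the model.
import numpy as np

def znorm(x: np.ndarray) -> np.ndarray:
    """Z-normalize each subsequence (row) so comparisons are shape-based."""
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + 1e-8)

def ego_network_neighbors(subseqs: np.ndarray, k: int) -> np.ndarray:
    """subseqs: (n, m) array of n subsequences of length m.
    Returns (n, k): row i lists the k nearest-neighbor indices of ego node i."""
    z = znorm(subseqs)
    dists = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)     # exclude self-matches
    return np.argsort(dists, axis=1)[:, :k]
```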

Sketching Multidimensional Time Series for Fast Discord Mining

  • paper_url: http://arxiv.org/abs/2311.03393
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Yan Zheng, Menghai Pan, Huiyuan Chen, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang, Jeff M. Phillips, Eamonn Keogh
  • for: To reduce the cost of matrix profile computation with respect to the dimensionality of multidimensional time series, enabling reliable discord-based anomaly detection.
  • methods: Captures time series discords with the matrix profile and proposes a fast sketch-based discord mining algorithm for multidimensional series.
  • results: Experiments on several real-world applications show the proposed algorithm improves throughput by at least an order of magnitude with only minimal impact on the quality of the approximated solution; it also handles dynamic addition or deletion of dimensions, allowing data analysts to run "what-if" analyses in real time.
    Abstract Time series discords are a useful primitive for time series anomaly detection, and the matrix profile is capable of capturing discords effectively. There exist many research efforts to improve the scalability of discord discovery with respect to the length of time series. However, there is surprisingly little work focused on reducing the time complexity of matrix profile computation associated with the dimensionality of a multidimensional time series. In this work, we propose a sketch for discord mining among multi-dimensional time series. After an initial pre-processing of the sketch as fast as reading the data, the discord mining has a runtime independent of the dimensionality of the original data. On several real world examples from water treatment and transportation, the proposed algorithm improves the throughput by at least an order of magnitude (50X) and only has minimal impact on the quality of the approximated solution. Additionally, the proposed method can handle the dynamic addition or deletion of dimensions with inconsequential overhead. This allows a data analyst to consider "what-if" scenarios in real time while exploring the data.
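As a rough illustration of the core idea, the sketch below collapses a d-dimensional series into a low-dimensional surrogate with a Gaussian random projection (an assumed construction; the paper's sketch may differ), after which discord mining runs on the surrogate and its cost no longer grows with d.

```python
# Assumed sketch construction: project the (T, d) multidimensional series onto
# a few random directions in one linear pass, so downstream matrix-profile /
# discord computation is independent of the original dimensionality d.
import numpy as np

def sketch_series(ts: np.ndarray, n_sketch: int = 1,
                  rng: np.random.Generator = np.random.default_rng(0)) -> np.ndarray:
    """ts: (T, d) series -> (T, n_sketch) sketched series for discord mining."""
    d = ts.shape[1]
    proj = rng.normal(size=(d, n_sketch)) / np.sqrt(d)  # random projection
    return ts @ proj

# Adding or deleting a dimension only adds or removes one row of `proj`,
# which is what makes real-time "what-if" analyses over dimensions cheap.
```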

Nonlinear Multi-objective Reinforcement Learning with Provable Guarantees

  • paper_url: http://arxiv.org/abs/2311.02544
  • repo_url: None
  • paper_authors: Nianli Peng, Brandon Fain
  • for: Solving single- or multi-objective Markov Decision Processes (MDPs) in which the goal is to maximize the expected value of a nonlinear function of accumulated rewards.
  • methods: Solves these problems with provable guarantees by extending the classic E3 algorithm, introducing a reward-aware value iteration procedure and an algorithm that simultaneously learns a model of the environment.
  • results: The algorithm obtains an approximately optimal policy in time polynomial in the MDP size, the desired approximation, and the smoothness of the nonlinear function.
    Abstract We describe RA-E3 (Reward-Aware Explicit Explore or Exploit), an algorithm with provable guarantees for solving a single or multi-objective Markov Decision Process (MDP) where we want to maximize the expected value of a nonlinear function over accumulated rewards. This allows us to model fairness-aware welfare optimization for multi-objective reinforcement learning as well as risk-aware reinforcement learning with nonlinear Von Neumann-Morgenstern utility functions in the single objective setting. RA-E3 extends the classic E3 algorithm that solves MDPs with scalar rewards and linear preferences. We first state a distinct reward-aware version of value iteration that calculates a non-stationary policy that is approximately optimal for a given model of the environment. This sub-procedure is based on an extended form of Bellman optimality for nonlinear optimization that explicitly considers time and current accumulated reward. We then describe how to use this optimization procedure in a larger algorithm that must simultaneously learn a model of the environment. The algorithm learns an approximately optimal policy in time that depends polynomially on the MDP size, desired approximation, and smoothness of the nonlinear function, and exponentially on the number of objectives.
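To make the reward-aware sub-procedure concrete, here is a minimal tabular sketch under simplifying assumptions (known model, finite horizon, small non-negative integer rewards): the utility f is applied only to the final accumulated reward, which is why the optimal policy is non-stationary in both time and accumulated reward. RA-E3 itself additionally learns the model while exploring, which this sketch omits.

```python
# Reward-aware value iteration over an augmented state (time, state,
# accumulated reward): V[t, s, w] is the best achievable E[f(final reward)]
# when w reward has been accumulated by time t. Simplified tabular sketch.
import numpy as np

def reward_aware_vi(P: np.ndarray, R: np.ndarray, f, horizon: int, w_max: int):
    """P: (S, A, S) transition probabilities; R: (S, A) integer rewards >= 0;
    f: vectorized nonlinear utility over total reward; w_max caps the total."""
    S, A, _ = P.shape
    V = np.zeros((horizon + 1, S, w_max + 1))
    V[horizon] = f(np.arange(w_max + 1))[None, :]   # utility only at the end
    for t in range(horizon - 1, -1, -1):            # backward induction
        for s in range(S):
            for w in range(w_max + 1):
                w_next = np.minimum(w + R[s].astype(int), w_max)  # per action
                q = [P[s, a] @ V[t + 1, :, w_next[a]] for a in range(A)]
                V[t, s, w] = max(q)
    return V

# e.g. with a concave f such as np.sqrt, the resulting policy is risk-averse
# relative to simply maximizing expected reward.
```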

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

  • paper_url: http://arxiv.org/abs/2311.02538
  • repo_url: None
  • paper_authors: Iqra Qasim, Alexander Horsch, Dilip K. Prasad
  • for: Surveying methods for describing the diverse events and interactions in videos to improve natural-language video description.
  • methods: Reviews dense video captioning (DVC) techniques, covering the three sub-tasks of video feature extraction (VFE), temporal event localization (TEL), and dense caption generation (DCG).
  • results: Summarizes the reported results of studies that perform DVC and its sub-tasks, along with the datasets used and emerging challenges in the field.
    Abstract Untrimmed videos have interrelated events, dependencies, context, overlapping events, object-object interactions, domain specificity, and other semantics that are worth highlighting while describing a video in natural language. Owing to such a vast diversity, a single sentence can only correctly describe a portion of the video. Dense Video Captioning (DVC) aims at detecting and describing different events in a given video. The term DVC originated in the 2017 ActivityNet challenge, after which considerable effort has been made to address the challenge. Dense Video Captioning is divided into three sub-tasks: (1) Video Feature Extraction (VFE), (2) Temporal Event Localization (TEL), and (3) Dense Caption Generation (DCG). This review aims to discuss all the studies that claim to perform DVC along with its sub-tasks and summarize their results. We also discuss all the datasets that have been used for DVC. Lastly, we highlight some emerging challenges and future trends in the field.