cs.AI - 2023-11-01

Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code

  • paper_url: http://arxiv.org/abs/2311.00889
  • repo_url: None
  • paper_authors: Mohammed Latif Siddiq, Joanna C. S. Santos
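  • for: This study evaluates the security of code generated by Large Language Models (LLMs), to ensure that the generated code is not only functionally correct but also free of vulnerabilities.
  • methods: The study uses a framework named SALLM, which comprises a novel dataset of security-centric Python prompts, an environment for testing the generated code, and novel metrics for evaluating how securely LLMs generate code.
  • results: The study finds that existing evaluation metrics focus mainly on functional correctness and ignore security considerations, so code generated by LLMs may contain vulnerabilities. The SALLM framework systematically evaluates the secure code generation abilities of LLMs, helping developers use LLMs more effectively in software development.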
    Abstract With the growing popularity of Large Language Models (e.g. GitHub Copilot, ChatGPT, etc.) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although LLMs can help developers to be more productive, prior empirical studies have shown that LLMs can generate insecure code. There are two contributing factors to the insecure code generation. First, existing datasets used to evaluate Large Language Models (LLMs) do not adequately represent genuine software engineering tasks sensitive to security. Instead, they are often based on competitive programming challenges or classroom-type coding tasks. In real-world applications, the code produced is integrated into larger codebases, introducing potential security risks. There's a clear absence of benchmarks that focus on evaluating the security of the generated code. Second, existing evaluation metrics primarily focus on the functional correctness of the generated code while ignoring security considerations. Metrics such as pass@k gauge the probability of obtaining the correct code in the top k suggestions. Other popular metrics like BLEU, CodeBLEU, ROUGE, and METEOR similarly emphasize functional accuracy, neglecting security implications. In light of these research gaps, in this paper, we described SALLM, a framework to benchmark LLMs' abilities to generate secure code systematically. This framework has three major components: a novel dataset of security-centric Python prompts, an evaluation environment to test the generated code, and novel metrics to evaluate the models' performance from the perspective of secure code generation.
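    Summary With the growing popularity of large language models (such as GitHub Copilot and ChatGPT) in software engineers' daily practice, it is necessary to ensure that the code these tools generate is not only functionally correct but also free of vulnerabilities. Although LLMs can make developers more productive, prior studies show that they can generate insecure code. Two factors contribute to this. First, existing datasets for evaluating LLMs do not adequately represent real software engineering tasks; they are based on competitive programming challenges or classroom-style coding tasks. In real applications the generated code is integrated into larger codebases, which introduces potential security risks, and benchmarks for evaluating the security of generated code are missing. Second, existing metrics concentrate on the functional correctness of generated code and ignore security. For example, pass@k measures the probability of obtaining correct code among the top k suggestions, and other popular metrics such as BLEU, CodeBLEU, ROUGE, and METEOR likewise neglect security implications. To fill these research gaps, the paper presents SALLM, a framework for systematically evaluating the secure code generation abilities of LLMs. It has three main components: a novel dataset of security-centric Python prompts, an evaluation environment for testing generated code, and novel security-oriented metrics.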
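
The abstract describes pass@k as the probability of obtaining correct code among the top k suggestions. A minimal sketch of the commonly used unbiased estimator, 1 - C(n - c, k) / C(n, k), is shown below. The function name and argument names are illustrative assumptions, and this is not SALLM's own security metric, which the digest does not specify.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which are correct.

    Uses the standard estimator 1 - C(n - c, k) / C(n, k): the probability
    that a random subset of k samples contains at least one correct sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every subset of size k has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A security-aware variant could, as one possible design, count a sample as a success only when it passes both the functional tests and a security check; that choice is an assumption here rather than something stated in the abstract.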

SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization

  • paper_url: http://arxiv.org/abs/2311.00880
  • repo_url: None
  • paper_authors: Jaafar Mhamed, Shangding Gu
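  • for: This paper aims to improve the safety of reinforcement learning in real-world applications.
  • methods: The paper works with Constrained Markov Decision Processes (CMDPs), where earlier algorithms use Lagrangian relaxation to convert the constrained problem into an unconstrained dual problem, and it introduces a safety critic that nullifies rewards obtained by violating safety constraints.
  • results: The study proposes a new safe reinforcement learning algorithm, Safety Critic Policy Optimization (SCPO), that automatically balances satisfying safety constraints against maximizing rewards. Its effectiveness is validated empirically against strong baselines.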
    Abstract Incorporating safety is an essential prerequisite for broadening the practical applications of reinforcement learning in real-world scenarios. To tackle this challenge, Constrained Markov Decision Processes (CMDPs) are leveraged, which introduce a distinct cost function representing safety violations. In CMDPs' settings, Lagrangian relaxation technique has been employed in previous algorithms to convert constrained optimization problems into unconstrained dual problems. However, these algorithms may inaccurately predict unsafe behavior, resulting in instability while learning the Lagrange multiplier. This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization (SCPO). In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints. Furthermore, our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards. The effectiveness of the SCPO algorithm is empirically validated by benchmarking it against strong baselines.
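    Summary Guaranteeing safety is an essential prerequisite for broadening the application of reinforcement learning in real-world scenarios. To this end, Constrained Markov Decision Processes (CMDPs) are used, which introduce a separate cost function representing safety violations. In the CMDP setting, previous algorithms use Lagrangian relaxation to convert constrained optimization problems into unconstrained dual problems. However, these algorithms may predict unsafe behavior inaccurately, which makes learning the Lagrange multiplier unstable. This study proposes a new safe reinforcement learning algorithm, Safety Critic Policy Optimization (SCPO). It defines a safety critic, a mechanism that nullifies rewards obtained by violating safety constraints. The theoretical analysis indicates that the proposed algorithm automatically balances the trade-off between adhering to safety constraints and maximizing rewards. The SCPO algorithm is validated experimentally by benchmarking it against strong baselines.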
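
The two ideas named in the abstract, a Lagrange multiplier learned for the CMDP constraint and rewards nullified on safety violations, can be illustrated with a small sketch. Both functions are hypothetical simplifications written for this digest: the first zeroes rewards at steps with positive cost, and the second is a generic projected dual-ascent update of the kind used by Lagrangian baselines. Neither is claimed to be SCPO's actual update rule.

```python
def nullify_unsafe_rewards(rewards, costs):
    """Zero out the reward at every step whose safety cost is positive.

    Assumption: a step counts as a violation when its cost is > 0.
    """
    return [0.0 if cost > 0 else reward for reward, cost in zip(rewards, costs)]


def dual_ascent_step(lam, avg_cost, cost_limit, lr=0.1):
    """One projected gradient-ascent step on the Lagrange multiplier.

    The multiplier grows when the average cost exceeds the limit and
    shrinks otherwise; it is clipped at zero to stay non-negative.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))
```

The instability mentioned in the abstract concerns the second function: if the cost estimate `avg_cost` is inaccurate, the multiplier can oscillate between over- and under-penalizing unsafe behavior.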

Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00865
  • repo_url: https://github.com/mgerstgrasser/super
  • paper_authors: Matthias Gerstgrasser, Tom Danino, Sarah Keren
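  • for: This paper proposes a new multi-agent RL method, Selective Multi-Agent Prioritized Experience Relay, in which agents share a limited number of the transitions they observe during training. The intuition is that even a small number of relevant experiences from other agents can help each agent learn. Unlike many other multi-agent RL algorithms, the method allows largely decentralized training and needs only a limited communication channel between agents.
  • methods: Each agent collects experiences from the environment, selects only a small number of highly relevant transitions to share with the other agents rather than all of them, and then learns from its own and the shared experiences.
  • results: The authors show that the method outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms. Sharing only a small number of highly relevant experiences also outperforms sharing all experiences, and the gain is robust across a range of hyperparameters and DQN variants. A reference implementation is available at https://github.com/mgerstgrasser/super.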
    Abstract We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of transitions they observe during training. The intuition behind this is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms baseline no-sharing decentralized training and state-of-the art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. A reference implementation of our algorithm is available at https://github.com/mgerstgrasser/super.
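    Summary We propose a new multi-agent RL method, Selective Multi-Agent Prioritized Experience Relay, in which agents share a limited number of transitions during training. The idea is that even a small number of relevant experiences from other agents can help each agent learn. Unlike many other multi-agent RL algorithms, the method allows largely decentralized training and requires only limited communication between agents. We show that it outperforms baseline decentralized training without sharing as well as state-of-the-art multi-agent RL algorithms. Moreover, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance gain is robust across a range of hyperparameters and DQN variants. A reference implementation of the algorithm is available at https://github.com/mgerstgrasser/super.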
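
Selective sharing can be sketched as a top-k selection over an agent's transitions followed by a relay to the other agents' replay buffers. The selection criterion here (a per-transition priority such as an absolute TD error) and all names are assumptions made for illustration; the reference implementation linked above is the authoritative source for the actual rule.

```python
import heapq


def select_transitions_to_share(transitions, priorities, budget):
    """Return the `budget` transitions with the highest priority.

    `priorities` is assumed to hold one relevance score per transition,
    for example an absolute TD error.
    """
    if len(transitions) != len(priorities):
        raise ValueError("transitions and priorities must have equal length")
    top = heapq.nlargest(
        budget, range(len(transitions)), key=lambda i: priorities[i]
    )
    return [transitions[i] for i in top]


def relay(buffers, sender, selected):
    """Append the selected transitions to every buffer except the sender's."""
    for agent, buffer in buffers.items():
        if agent != sender:
            buffer.extend(selected)
```

Only the selected transitions cross the communication channel, which is what keeps training largely decentralized in this sketch.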

Training Dynamics of Contextual N-Grams in Language Models

  • paper_url: http://arxiv.org/abs/2311.00863
  • repo_url: https://github.com/luciaquirke/contextual-ngrams
  • paper_authors: Lucia Quirke, Lovis Heindrich, Wes Gurnee, Neel Nanda
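  • for: This paper studies contextual neurons in language models, building on prior evidence that they exist, and shows how the circuit around one such neuron forms during training.
  • methods: The paper tracks, over the course of training, a contextual n-gram circuit in which late-layer neurons recognize and continue n-grams common in German text but activate only when the German neuron is active. The constituent n-gram circuits and the German detection circuit first form independently and only later fit together into what the authors call a second-order circuit.
  • results: The paper reports anomalous observations, including a simultaneous phase transition across many tasks that coincides with the learning-rate warm-up, and evidence that many context neurons form early in training but are later unlearned. Contrary to the hypotheses of prior work, the contextual n-gram circuit forms gradually rather than in a sudden phase transition.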
    Abstract Prior work has shown the existence of contextual neurons in language models, including a neuron that activates on German text. We show that this neuron exists within a broader contextual n-gram circuit: we find late layer neurons which recognize and continue n-grams common in German text, but which only activate if the German neuron is active. We investigate the formation of this circuit throughout training and find that it is an example of what we call a second-order circuit. In particular, both the constituent n-gram circuits and the German detection circuit which culminates in the German neuron form with independent functions early in training - the German detection circuit partially through modeling German unigram statistics, and the n-grams by boosting appropriate completions. Only after both circuits have already formed do they fit together into a second-order circuit. Contrary to the hypotheses presented in prior work, we find that the contextual n-gram circuit forms gradually rather than in a sudden phase transition. We further present a range of anomalous observations such as a simultaneous phase transition in many tasks coinciding with the learning rate warm-up, and evidence that many context neurons form simultaneously early in training but are later unlearned.

Zero Coordinate Shift: Whetted Automatic Differentiation for Physics-informed Operator Learning

  • paper_url: http://arxiv.org/abs/2311.00860
  • repo_url: https://github.com/stfc-sciml/zerocoordinateshift
  • paper_authors: Kuangdai Leng, Mallikarjun Shankar, Jeyan Thiyagalingam
  • for: physics-informed machine learning, particularly for computing high-order derivatives of network output w.r.t. coordinates.
  • methods: introduces a novel and lightweight algorithm called Zero Coordinate Shift (ZCS), which simplifies the wanted derivatives from “many-roots-many-leaves” to “one-root-many-leaves” by introducing only one scalar-valued leaf variable for each spatial or temporal dimension.
  • results: persistently brings down GPU memory consumption and wall time for training by an order of magnitude, with the savings increasing with problem scale (i.e., number of functions, number of points, and order of PDE).
    Abstract Automatic differentiation (AD) is a critical step in physics-informed machine learning, required for computing the high-order derivatives of network output w.r.t. coordinates. In this paper, we present a novel and lightweight algorithm to conduct such AD for physics-informed operator learning, as we call the trick of Zero Coordinate Shift (ZCS). Instead of making all sampled coordinates leaf variables, ZCS introduces only one scalar-valued leaf variable for each spatial or temporal dimension, leading to a game-changing performance leap by simplifying the wanted derivatives from "many-roots-many-leaves" to "one-root-many-leaves". ZCS is easy to implement with current deep learning libraries; our own implementation is by extending the DeepXDE package. We carry out a comprehensive benchmark analysis and several case studies, training physics-informed DeepONets to solve partial differential equations (PDEs) without data. The results show that ZCS has persistently brought down GPU memory consumption and wall time for training by an order of magnitude, with the savings increasing with problem scale (i.e., number of functions, number of points and order of PDE). As a low-level optimisation, ZCS entails no restrictions on data, physics (PDEs) or network architecture and does not compromise training results from any aspect.
    Summary Automatic differentiation (AD) is a critical step in physics-informed machine learning, needed to compute high-order derivatives of the network output w.r.t. coordinates. This paper presents a novel and lightweight algorithm for performing AD in physics-informed operator learning, a trick the authors call Zero Coordinate Shift (ZCS). Instead of making all sampled coordinates leaf variables, ZCS introduces only one scalar-valued leaf variable for each spatial or temporal dimension, which yields a large performance leap by simplifying the wanted derivatives from "many-roots-many-leaves" to "one-root-many-leaves". ZCS is easy to implement with current deep learning libraries; the authors' implementation extends the DeepXDE package. A comprehensive benchmark analysis and several case studies train physics-informed DeepONets to solve partial differential equations (PDEs) without data. The results show that ZCS reduces GPU memory consumption and training wall time by an order of magnitude, with savings that grow with problem scale (number of functions, number of points, and order of the PDE). As a low-level optimization, ZCS imposes no restrictions on data, physics (PDEs), or network architecture and does not compromise training results in any respect.
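
The idea of replacing many coordinate leaves with one scalar leaf per dimension can be written as a short identity. The notation below is an assumption made for this digest: u_{ij} is the network output for input function i at coordinate x_j, z is the single scalar shift (evaluated at zero), and a_{ij} are auxiliary weights used to collapse all outputs into one scalar root \omega. It is a sketch of the reasoning, not a quotation of the paper's equations.

```latex
% Shifting every coordinate by the same scalar z leaves the derivative unchanged:
\frac{\partial u_{ij}}{\partial x_j}
  = \left.\frac{\partial}{\partial z}\, u_i(x_j + z)\right|_{z=0}

% Collapsing all outputs into one scalar root with auxiliary weights a_{ij}:
\omega = \sum_{i,j} a_{ij}\, u_i(x_j + z),
\qquad
\frac{\partial u_{ij}}{\partial z}
  = \frac{\partial}{\partial a_{ij}}
    \left( \frac{\partial \omega}{\partial z} \right)
```

Because \omega is linear in a_{ij}, differentiating \partial\omega/\partial z with respect to a_{ij} recovers each \partial u_{ij}/\partial z, so reverse-mode AD only ever differentiates a single root.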

Optimal Cost Constrained Adversarial Attacks For Multiple Agent Systems

  • paper_url: http://arxiv.org/abs/2311.00859
  • repo_url: None
  • paper_authors: Ziqing Lu, Guanlin Liu, Lifeng Cai, Weiyu Xu
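  • for: This paper focuses on finding optimal attack strategies for distributed attackers in multi-agent systems.
  • methods: The paper proposes an optimal method that combines within-step static constrained attack-resource allocation optimization with between-step dynamic programming.
  • results: The numerical results show that the proposed attacks can significantly reduce the rewards received by the attacked agents.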
    Abstract Finding optimal adversarial attack strategies is an important topic in reinforcement learning and the Markov decision process. Previous studies usually assume one all-knowing coordinator (attacker) for whom attacking different recipient (victim) agents incurs uniform costs. However, in reality, instead of using one limitless central attacker, the attacks often need to be performed by distributed attack agents. We formulate the problem of performing optimal adversarial agent-to-agent attacks using distributed attack agents, in which we impose distinct cost constraints on each different attacker-victim pair. We propose an optimal method integrating within-step static constrained attack-resource allocation optimization and between-step dynamic programming to achieve the optimal adversarial attack in a multi-agent system. Our numerical results show that the proposed attacks can significantly reduce the rewards received by the attacked agents.
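    Summary Finding optimal adversarial attack strategies is an important topic in reinforcement learning and Markov decision processes. Previous studies usually assume one all-knowing coordinator (attacker) for whom attacking different recipient (victim) agents incurs uniform costs. In reality, however, attacks often need to be carried out by distributed attack agents rather than by one limitless central attacker. We formulate the problem of optimal adversarial attacks using distributed attack agents, imposing distinct cost constraints on each attacker-victim pair. We propose a method that combines within-step static constrained attack-resource allocation optimization with between-step dynamic programming to achieve the optimal adversarial attack in a multi-agent system. Our numerical results show that the proposed attacks can substantially reduce the rewards received by the attacked agents.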
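
The combination of a within-step allocation problem and a between-step dynamic program can be illustrated on a toy model. Everything here is a simplifying assumption: a single shared integer budget stands in for the paper's per attacker-victim cost constraints, each step offers a fixed list of (cost, damage) attacks, and the within-step allocation is a 0/1 knapsack. It shows the nesting of the two optimizations, not the paper's formulation.

```python
from functools import lru_cache


def best_total_damage(step_options, budget):
    """Maximize total damage over all steps under one shared integer budget.

    step_options[t] is a list of (cost, damage) pairs available at step t;
    each attack can be used at most once in its step.
    """

    def within_step(options, spend):
        # Static allocation inside one step: 0/1 knapsack with capacity `spend`.
        best = [0.0] * (spend + 1)
        for cost, damage in options:
            for b in range(spend, cost - 1, -1):
                best[b] = max(best[b], best[b - cost] + damage)
        return best[spend]

    @lru_cache(maxsize=None)
    def value(t, remaining):
        # Dynamic program across steps over the remaining budget.
        if t == len(step_options):
            return 0.0
        return max(
            within_step(step_options[t], spend) + value(t + 1, remaining - spend)
            for spend in range(remaining + 1)
        )

    return value(0, budget)
```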

A Multi-Agent Reinforcement Learning Framework for Evaluating the U.S. Ending the HIV Epidemic Plan

  • paper_url: http://arxiv.org/abs/2311.00855
  • repo_url: None
  • paper_authors: Dinesh Sharma, Ankit Shah, Chaitra Gopalappa
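  • for: This paper aims to inform public health policy by using a multi-agent reinforcement learning (MARL) model to analyze and optimize diagnosis, treatment, and prevention interventions for human immunodeficiency virus (HIV), with the goal of reducing new HIV infections.
  • methods: The paper uses a MARL model that accounts for epidemiological interactions between jurisdictions while supporting jurisdiction-specific analysis and optimization of HIV treatment and prevention interventions.
  • results: Experimental analyses show that MARL produces optimal policies that differ significantly from those of single-agent reinforcement learning (RL), which highlights the influence of jurisdictional variations and interactions. The work also provides a framework for analyzing and optimizing HIV treatment and prevention interventions.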
    Abstract Human immunodeficiency virus (HIV) is a major public health concern in the United States, with about 1.2 million people living with HIV and 35,000 newly infected each year. There are considerable geographical disparities in HIV burden and care access across the U.S. The 2019 Ending the HIV Epidemic (EHE) initiative aims to reduce new infections by 90% by 2030, by improving coverage of diagnoses, treatment, and prevention interventions and prioritizing jurisdictions with high HIV prevalence. Identifying optimal scale-up of intervention combinations will help inform resource allocation. Existing HIV decision analytic models either evaluate specific cities or the overall national population, thus overlooking jurisdictional interactions or differences. In this paper, we propose a multi-agent reinforcement learning (MARL) model, that enables jurisdiction-specific decision analyses but in an environment with cross-jurisdictional epidemiological interactions. In experimental analyses, conducted on jurisdictions within California and Florida, optimal policies from MARL were significantly different than those generated from single-agent RL, highlighting the influence of jurisdictional variations and interactions. By using comprehensive modeling of HIV and formulations of state space, action space, and reward functions, this work helps demonstrate the strengths and applicability of MARL for informing public health policies, and provides a framework for expanding to the national-level to inform the EHE.
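    Summary Human immunodeficiency virus (HIV) is a major public health concern in the United States, with about 1.2 million people living with HIV and 35,000 newly infected each year. HIV burden and access to care vary considerably across the country. The 2019 Ending the HIV Epidemic (EHE) initiative aims to reduce new infections by 90% by 2030 by improving coverage of diagnosis, treatment, and prevention interventions and by prioritizing jurisdictions with high HIV prevalence. Identifying the optimal scale-up of intervention combinations will help inform resource allocation. Existing HIV decision-analytic models evaluate either specific cities or the overall national population and therefore overlook jurisdictional interactions or differences. This paper proposes a multi-agent reinforcement learning (MARL) model that enables jurisdiction-specific decision analyses in an environment with cross-jurisdictional epidemiological interactions. In experimental analyses on jurisdictions in California and Florida, the optimal policies from MARL differed significantly from those generated by single-agent RL, highlighting the influence of jurisdictional variations and interactions. Through comprehensive modeling of HIV and formulations of the state space, action space, and reward functions, the work demonstrates the strengths and applicability of MARL for informing public health policy and provides a framework for expanding to the national level to inform the EHE.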

healthAIChain: Improving security and safety using Blockchain Technology applications in AI-based healthcare systems

  • paper_url: http://arxiv.org/abs/2311.00842
  • repo_url: None
  • paper_authors: Naresh Kshetri, James Hutson, Revathy G
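  • for: This paper explores the use of blockchain technology to improve the security and safety of healthcare systems, and the potential applications of blockchain in medical and healthcare-related fields.
  • methods: The paper examines how blockchain can address security and safety issues in healthcare systems, and reviews the applications of blockchain and AI in healthcare together with open questions.
  • results: The study indicates that blockchain can improve the security and safety of healthcare systems as well as their performance efficiency. It also proposes an AI-based healthcare blockchain model (healthAIChain) to improve the security of patient data.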
    Abstract Blockchain as a digital ledger for keeping records of digital transactions and other information, it is secure and decentralized technology. The globally growing number of digital population every day possesses a significant threat to online data including the medical and patients data. After bitcoin, blockchain technology has emerged into a general-purpose technology with applications in medical industries and healthcare. Blockchain can promote highly configurable openness while retaining the highest security standards for critical data of medical patients. Referred to as distributed record keeping for healthcare systems which makes digital assets unalterable and transparent via a cryptographic hash and decentralized network. The study delves into the security and safety improvement associated with implementing blockchain in AI-based healthcare systems. Blockchain-enabled AI tackles the existing issues related to security, performance efficiencies, and safety in healthcare systems. We have also examined the Artificial Intelligence in healthcare and medical industry, potential areas, open questions concerning the blockchain in healthcare systems. Finally, the article proposed an AI-based healthcare blockchain model (healthAIChain) to improve patients data and security.
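    Summary Blockchain is a digital ledger for keeping records of digital transactions and other information; it is a secure and decentralized technology. The daily growth of the global digital population poses a significant threat to online data, including medical and patient data. Since Bitcoin, blockchain has found applications in the medical industry and healthcare. Blockchain can provide highly configurable openness while retaining the highest security standards for the critical data of medical patients. Referred to as distributed record keeping for healthcare systems, it makes digital assets unalterable and transparent through cryptographic hashes and a decentralized network. The article studies the security and safety improvements associated with applying blockchain in AI-based healthcare systems. It also examines artificial intelligence in the healthcare and medical industry, potential areas of application, and open questions. Finally, it proposes an AI-based healthcare blockchain model (healthAIChain) to improve patient data security.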

Constant-time Motion Planning with Anytime Refinement for Manipulation

  • paper_url: http://arxiv.org/abs/2311.00837
  • repo_url: None
  • paper_authors: Itamar Mishani, Hayden Feddock, Maxim Likhachev
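  • for: This paper aims to improve the autonomy and reliability of robotic manipulation systems.
  • methods: The paper combines constant-time motion planning (CTMP) algorithms with an anytime refinement approach.
  • results: The method rapidly generates an initial solution and keeps refining it within the allocated time budget, balancing guaranteed plan generation within a user-defined time bound against optimization over time.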
    Abstract Robotic manipulators are essential for future autonomous systems, yet limited trust in their autonomy has confined them to rigid, task-specific systems. The intricate configuration space of manipulators, coupled with the challenges of obstacle avoidance and constraint satisfaction, often makes motion planning the bottleneck for achieving reliable and adaptable autonomy. Recently, a class of constant-time motion planners (CTMP) was introduced. These planners employ a preprocessing phase to compute data structures that enable online planning provably guarantee the ability to generate motion plans, potentially sub-optimal, within a user defined time bound. This framework has been demonstrated to be effective in a number of time-critical tasks. However, robotic systems often have more time allotted for planning than the online portion of CTMP requires, time that can be used to improve the solution. To this end, we propose an anytime refinement approach that works in combination with CTMP algorithms. Our proposed framework, as it operates as a constant time algorithm, rapidly generates an initial solution within a user-defined time threshold. Furthermore, functioning as an anytime algorithm, it iteratively refines the solution's quality within the allocated time budget. This enables our approach to strike a balance between guaranteed fast plan generation and the pursuit of optimization over time. We support our approach by elucidating its analytical properties, showing the convergence of the anytime component towards optimal solutions. Additionally, we provide empirical validation through simulation and real-world demonstrations on a 6 degree-of-freedom robot manipulator, applied to an assembly domain.
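    Summary Robotic manipulators are a key component of future autonomous systems, but limited trust in their autonomy has confined them to rigid, task-specific systems. The intricate configuration space of manipulators, together with the challenges of obstacle avoidance and constraint satisfaction, often makes motion planning the bottleneck for reliable and adaptable autonomy. Recently, a class of constant-time motion planners (CTMP) was introduced. These planners use a preprocessing phase to compute data structures that guarantee, during online planning, that a motion plan (possibly sub-optimal) can be generated within a user-defined time bound. The framework has proven effective in many time-critical tasks. However, robotic systems often have more time available for planning than the online portion of CTMP requires. The authors therefore propose an anytime refinement approach that works together with CTMP algorithms: it rapidly generates an initial solution within the user-defined time bound and then, acting as an anytime algorithm, iteratively improves solution quality within the allocated time budget. This balances guaranteed fast plan generation against optimization over time. The approach is supported by an analysis showing that the anytime component converges toward optimal solutions, and by validation in simulation and real-world demonstrations.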
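
The control flow described in the abstract, a fast initial plan followed by refinement until the time budget runs out, can be sketched generically. The three callables (`lookup_initial`, `refine`, `cost`) are hypothetical placeholders: in a CTMP setting the first would be the constant-time query against preprocessed data structures, but this sketch makes no attempt to reproduce that planner.

```python
import time


def plan_with_anytime_refinement(lookup_initial, refine, cost, budget_s,
                                 clock=time.monotonic):
    """Return the best plan found within `budget_s` seconds.

    lookup_initial() -> plan         fast initial solution
    refine(plan) -> plan or None     improved candidate, or None when exhausted
    cost(plan) -> float              lower is better
    """
    deadline = clock() + budget_s
    best = lookup_initial()  # always available, even with a zero budget
    while clock() < deadline:
        candidate = refine(best)
        if candidate is None:
            break
        if cost(candidate) < cost(best):
            best = candidate
    return best
```

The initial plan is returned unchanged when the budget is already spent, which mirrors the guarantee that a (possibly sub-optimal) plan always exists; the loop only ever replaces it with a cheaper one.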

Beyond Still Images: Robust Multi-Stream Spatiotemporal Networks

  • paper_url: http://arxiv.org/abs/2311.00800
  • repo_url: None
  • paper_authors: AmirHosein Fadaei, Mohammad-Reza A. Dehaqani
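  • for: A defining characteristic of natural vision is its robustness to input variations, which yields an invariant representation of the surroundings. In deep neural networks, by contrast, changes in the spatial and temporal aspects of the input can significantly alter the representations of video content.
  • methods: The authors use a simple multi-stream model to explore its robustness to spatial and temporal changes. Videos and a temporal stream are included during training, with the aim of preserving accuracy and mAP in image and video understanding tasks.
  • results: Including videos and the temporal stream during training mitigates the decline in accuracy and mAP in image and video understanding tasks by 1.36% and 3.14%, respectively.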
    Abstract A defining characteristic of natural vision is its ability to withstand a variety of input alterations, resulting in the creation of an invariant representation of the surroundings. While convolutional neural networks exhibit resilience to certain forms of spatial input variation, modifications in the spatial and temporal aspects can significantly affect the representations of video content in deep neural networks. Inspired by the resilience of natural vision to input variations, we employ a simple multi-stream model to explore its potential to address spatiotemporal changes by including temporal features. Our primary goal is to introduce a video-trained model and evaluate its robustness to diverse image and video inputs, with a particular focus on exploring the role of temporal features in invariant recognition. Results show that including videos and the temporal stream during training mitigates the decline in accuracy and mAP in image and video understanding tasks by 1.36% and 3.14%, respectively.

Tipping Points of Evolving Epidemiological Networks: Machine Learning-Assisted, Data-Driven Effective Modeling

  • paper_url: http://arxiv.org/abs/2311.00797
  • repo_url: None
  • paper_authors: Nikolaos Evangelou, Tianqi Cui, Juan M. Bello-Rivas, Alexei Makeev, Ioannis G. Kevrekidis
  • for: The paper studies the tipping point collective dynamics of an adaptive susceptible-infected-susceptible (SIS) epidemiological network using a data-driven, machine learning-assisted approach.
  • methods: The paper identifies a parameter-dependent effective stochastic differential equation (eSDE) in terms of physically meaningful coarse mean-field variables through a deep-learning ResNet architecture inspired by numerical stochastic integrators. The paper also constructs an approximate effective bifurcation diagram based on the identified drift term of the eSDE and compares it with the mean-field SIS model bifurcation diagram.
  • results: The paper observes a subcritical Hopf bifurcation in the evolving network's effective SIS dynamics, which causes the tipping point behavior and leads to large amplitude collective oscillations that spontaneously arise from the neighborhood of a (noisy) stationary state. The paper studies the statistics of these rare events using repeated brute force simulations and established mathematical/computational tools, and demonstrates that the collective SDE can also be identified and the rare events computations performed in terms of data-driven coarse observables obtained through manifold learning techniques, such as Diffusion Maps.
    Abstract We study the tipping point collective dynamics of an adaptive susceptible-infected-susceptible (SIS) epidemiological network in a data-driven, machine learning-assisted manner. We identify a parameter-dependent effective stochastic differential equation (eSDE) in terms of physically meaningful coarse mean-field variables through a deep-learning ResNet architecture inspired by numerical stochastic integrators. We construct an approximate effective bifurcation diagram based on the identified drift term of the eSDE and contrast it with the mean-field SIS model bifurcation diagram. We observe a subcritical Hopf bifurcation in the evolving network's effective SIS dynamics, that causes the tipping point behavior; this takes the form of large amplitude collective oscillations that spontaneously -- yet rarely -- arise from the neighborhood of a (noisy) stationary state. We study the statistics of these rare events both through repeated brute force simulations and by using established mathematical/computational tools exploiting the right-hand-side of the identified SDE. We demonstrate that such a collective SDE can also be identified (and the rare events computations also performed) in terms of data-driven coarse observables, obtained here via manifold learning techniques, in particular Diffusion Maps. The workflow of our study is straightforwardly applicable to other complex dynamics problems exhibiting tipping point dynamics.
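    Summary We study the tipping-point collective dynamics of an adaptive susceptible-infected-susceptible (SIS) epidemiological network in a data-driven, machine learning-assisted manner. Using a deep-learning ResNet architecture inspired by numerical stochastic integrators, we identify a parameter-dependent effective stochastic differential equation (eSDE) in terms of physically meaningful coarse mean-field variables, and we construct an approximate effective bifurcation diagram from its drift term. Comparing this diagram with the bifurcation diagram of the mean-field SIS model, we find a subcritical Hopf bifurcation in the effective SIS dynamics of the evolving network. This bifurcation causes the tipping-point behavior: large-amplitude collective oscillations that arise spontaneously, yet rarely, from the neighborhood of a (noisy) stationary state. We study the statistics of these rare events through repeated brute-force simulations and with established mathematical/computational tools. We also show that such a collective SDE can be identified, and the rare-event computations performed, in terms of data-driven coarse observables obtained through manifold learning techniques, in particular Diffusion Maps. The workflow applies straightforwardly to other complex dynamics problems that exhibit tipping-point dynamics.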
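
The abstract mentions a network architecture inspired by numerical stochastic integrators. The simplest such integrator is the Euler-Maruyama scheme, sketched below for a scalar SDE dx = f(x) dt + g(x) dW. In the paper's setting the drift and diffusion would be learned networks; here `drift` and `diffusion` are hypothetical plain callables, and the scalar state is a simplification.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, dt, n_steps, rng=None):
    """Simulate dx = drift(x) dt + diffusion(x) dW with the Euler-Maruyama scheme.

    Returns the trajectory as a list of n_steps + 1 states, starting at x0.
    """
    rng = rng or random.Random(0)  # fixed seed by default for repeatability
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over a step of length dt has variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path
```

Repeating such simulations many times is the brute-force route to rare-event statistics that the abstract contrasts with tools exploiting the identified right-hand side directly.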

SAGE: Smart home Agent with Grounded Execution

  • paper_url: http://arxiv.org/abs/2311.00772
  • repo_url: None
  • paper_authors: Dmitriy Rivkin, Francois Hogan, Amal Feriani, Abhisek Konar, Adam Sigal, Steve Liu, Greg Dudek
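  • for: This paper aims to increase the flexibility of smart home assistants so that they better meet user needs.
  • methods: The framework replaces manually defined inference logic with an LLM-powered autonomous agent that orchestrates a collection of tools to reason about user preferences, device states, and external factors, reads device API documentation to interact with devices, and writes code to monitor devices continuously.
  • results: On a benchmark of 43 smart home tasks, SAGE succeeds on 23, significantly outperforming existing LLM-enabled baselines (5/43).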
    Abstract This article introduces SAGE (Smart home Agent with Grounded Execution), a framework designed to maximize the flexibility of smart home assistants by replacing manually-defined inference logic with an LLM-powered autonomous agent system. SAGE integrates information about user preferences, device states, and external factors (such as weather and TV schedules) through the orchestration of a collection of tools. SAGE's capabilities include learning user preferences from natural-language utterances, interacting with devices by reading their API documentation, writing code to continuously monitor devices, and understanding natural device references. To evaluate SAGE, we develop a benchmark of 43 highly challenging smart home tasks, where SAGE successfully achieves 23 tasks, significantly outperforming existing LLM-enabled baselines (5/43).

Hand Gesture Classification on Praxis Dataset: Trading Accuracy for Expense

  • paper_url: http://arxiv.org/abs/2311.00767
  • repo_url: None
  • paper_authors: Rahat Islam, Kenneth Lai, Svetlana Yanushkevich
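  • for: This study works toward an automated, accurate, and inexpensive approach to diagnosing cortical pathologies for multiple healthcare applications.
  • methods: The study uses 'skeletal' data recorded with an RGB-Depth sensor, represented by body joint coordinates from the Praxis dataset, and applies windowing techniques together with deep learning architectures such as RNNs and LSTMs to recognize hand gestures.
  • results: Using only body joint data, the classifiers reach an overall accuracy of 70.8%. Analyzing the joint movements over time with an LSTM yields gesture recognition rates of 74.3% and 67.3% for static and dynamic gestures, respectively.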
    Abstract In this paper, we investigate hand gesture classifiers that rely upon the abstracted 'skeletal' data recorded using the RGB-Depth sensor. We focus on 'skeletal' data represented by the body joint coordinates, from the Praxis dataset. The PRAXIS dataset contains recordings of patients with cortical pathologies such as Alzheimer's disease, performing a Praxis test under the direction of a clinician. In this paper, we propose hand gesture classifiers that are more effective with the PRAXIS dataset than previously proposed models. Body joint data offers a compressed form of data that can be analyzed specifically for hand gesture recognition. Using a combination of windowing techniques with deep learning architecture such as a Recurrent Neural Network (RNN), we achieved an overall accuracy of 70.8% using only body joint data. In addition, we investigated a long-short-term-memory (LSTM) to extract and analyze the movement of the joints through time to recognize the hand gestures being performed and achieved a gesture recognition rate of 74.3% and 67.3% for static and dynamic gestures, respectively. The proposed approach contributed to the task of developing an automated, accurate, and inexpensive approach to diagnosing cortical pathologies for multiple healthcare applications.
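    Summary This paper investigates hand gesture classifiers based on abstracted 'skeletal' data recorded with an RGB-Depth sensor, focusing on the body joint coordinates in the Praxis dataset. The dataset contains recordings of patients with cortical pathologies such as Alzheimer's disease performing a Praxis test under the direction of a clinician. The paper proposes hand gesture classifiers that are more effective on the Praxis dataset than previously proposed models. Body joint data offers a compressed form of data that can be analyzed specifically for hand gesture recognition. Combining windowing techniques with a deep learning architecture such as a Recurrent Neural Network (RNN) achieves an overall accuracy of 70.8% using only body joint data. In addition, a long short-term memory (LSTM) network is used to extract and analyze the movement of the joints over time, reaching gesture recognition rates of 74.3% and 67.3% for static and dynamic gestures, respectively. The approach contributes to developing an automated, accurate, and inexpensive way to diagnose cortical pathologies for multiple healthcare applications.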
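
The windowing step mentioned in the abstract can be sketched as cutting a recording into fixed-length, possibly overlapping windows of frames, each of which would then be fed to a recurrent model. The window length and stride are free parameters here; the values the authors used are not given in this digest, and the function name is an assumption.

```python
def sliding_windows(frames, window, stride):
    """Split a sequence of frames into fixed-length windows.

    `frames` is any sequence (for example, one list of joint coordinates per
    frame). Trailing frames that do not fill a whole window are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [
        frames[start:start + window]
        for start in range(0, len(frames) - window + 1, stride)
    ]
```

A stride smaller than the window length produces overlapping windows, which increases the number of training samples drawn from each recording.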

Learning to Design and Use Tools for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2311.00754
  • repo_url: None
  • paper_authors: Ziang Liu, Stephen Tian, Michelle Guo, C. Karen Liu, Jiajun Wu
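  • for: This study aims to enable a manipulation agent to rapidly design specialized tools for different task goals and to deploy them in real environments.
  • methods: The study uses deep learning to jointly optimize morphology and control: a reinforcement learning framework jointly learns a designer policy, conditioned on task information, and a design-conditioned controller policy.
  • results: In simulated manipulation tasks, the framework is more sample efficient than prior methods in multi-goal or multi-variant settings, and the learned policies are deployed on a real robot. The framework also allows trade-offs between the complexity of the design and control policies under practical constraints.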
    Abstract When limited by their own morphologies, humans and some species of animals have the remarkable ability to use objects from the environment toward accomplishing otherwise impossible tasks. Robots might similarly unlock a range of additional capabilities through tool use. Recent techniques for jointly optimizing morphology and control via deep learning are effective at designing locomotion agents. But while outputting a single morphology makes sense for locomotion, manipulation involves a variety of strategies depending on the task goals at hand. A manipulation agent must be capable of rapidly prototyping specialized tools for different goals. Therefore, we propose learning a designer policy, rather than a single design. A designer policy is conditioned on task information and outputs a tool design that helps solve the task. A design-conditioned controller policy can then perform manipulation using these tools. In this work, we take a step towards this goal by introducing a reinforcement learning framework for jointly learning these policies. Through simulated manipulation tasks, we show that this framework is more sample efficient than prior methods in multi-goal or multi-variant settings, can perform zero-shot interpolation or fine-tuning to tackle previously unseen goals, and allows tradeoffs between the complexity of design and control policies under practical constraints. Finally, we deploy our learned policies onto a real robot. Please see our supplementary video and website at https://robotic-tool-design.github.io/ for visualizations.
    Summary In this work, we introduce a reinforcement learning framework for jointly learning a designer policy and a design-conditioned controller policy. Through simulated manipulation tasks, we show that our framework is more sample efficient than prior methods in multi-goal or multi-variant settings, can perform zero-shot interpolation or fine-tuning to tackle previously unseen goals, and allows tradeoffs between the complexity of design and control policies under practical constraints. Finally, we deploy our learned policies onto a real robot and provide visualizations on our supplementary website at https://robotic-tool-design.github.io/.

Are These the Same Apple? Comparing Images Based on Object Intrinsics

  • paper_url: http://arxiv.org/abs/2311.00750
  • repo_url: https://github.com/s-tian/cute
  • paper_authors: Klemen Kotar, Stephen Tian, Hong-Xing Yu, Daniel L. K. Yamins, Jiajun Wu
  • for: measure image similarity purely based on intrinsic object properties that define object identity, especially for general object categories.
  • methods: combine deep features learned from contrastive self-supervised learning with foreground filtering.
  • results: a strong baseline that best measures intrinsic object-centric image similarity among current methods, and can aid in downstream applications such as acting as an analog for human subjects and improving generalizable re-identification.
    Abstract The human visual system can effortlessly recognize an object under different extrinsic factors such as lighting, object poses, and background, yet current computer vision systems often struggle with these variations. An important step to understanding and improving artificial vision systems is to measure image similarity purely based on intrinsic object properties that define object identity. This problem has been studied in the computer vision literature as re-identification, though mostly restricted to specific object categories such as people and cars. We propose to extend it to general object categories, exploring an image similarity metric based on object intrinsics. To benchmark such measurements, we collect the Common paired objects Under differenT Extrinsics (CUTE) dataset of $18,000$ images of $180$ objects under different extrinsic factors such as lighting, poses, and imaging conditions. While existing methods such as LPIPS and CLIP scores do not measure object intrinsics well, we find that combining deep features learned from contrastive self-supervised learning with foreground filtering is a simple yet effective approach to approximating the similarity. We conduct an extensive survey of pre-trained features and foreground extraction methods to arrive at a strong baseline that best measures intrinsic object-centric image similarity among current methods. Finally, we demonstrate that our approach can aid in downstream applications such as acting as an analog for human subjects and improving generalizable re-identification. Please see our project website at https://s-tian.github.io/projects/cute/ for visualizations of the data and demos of our metric.
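    Summary The human visual system can easily recognize an object under different extrinsic factors such as lighting, object pose, and background, yet current computer vision systems often struggle with these variations. To better understand and improve artificial vision systems, we need to measure image similarity based on intrinsic object properties. This problem has been studied in the computer vision literature as re-identification, but mostly for specific object categories such as people and cars. We propose extending it to general object categories and study an image similarity metric based on object intrinsics. To benchmark such measurements, we collect the Common paired objects Under differenT Extrinsics (CUTE) dataset, which consists of 18,000 images of 180 objects captured under different lighting, poses, and imaging conditions. Existing methods such as LPIPS and CLIP scores do not measure object intrinsics well, but we find that combining deep features learned through contrastive self-supervised learning with foreground filtering is a simple yet effective way to approximate this similarity. We survey a wide range of pre-trained features and foreground extraction methods to arrive at a strong baseline that best measures intrinsic object-centric image similarity among current methods. Finally, we show that our approach can aid downstream applications such as acting as an analog for human subjects and improving generalizable re-identification. See our project website at https://s-tian.github.io/projects/cute/ for visualizations of the data and demos of our metric.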

Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving

  • paper_url: http://arxiv.org/abs/2311.00694
  • repo_url: https://github.com/lz1oceani/llm-as-hierarchical-policy
  • paper_authors: Zhan Ling, Yunhao Fang, Xuanlin Li, Tongzhou Mu, Mingu Lee, Reza Pourreza, Roland Memisevic, Hao Su
  • for: This paper aims to improve the ability of large language models (LLMs) to solve challenging reasoning problems by framing LLMs as hierarchical policies via in-context learning.
  • methods: The proposed approach uses a visionary leader to propose multiple diverse high-level problem-solving tactics as hints, accompanied by a follower that executes detailed problem-solving processes following each of the high-level instructions. The follower samples multiple reasoning chains to tackle the problem, generating a solution group for each leader proposal.
  • results: The approach improves the final answer accuracy on challenging problems in the MATH dataset and produces meaningful and inspiring hints that enhance problem-solving strategy exploration.
    Abstract Large Language Models (LLMs) have achieved tremendous progress, yet they still often struggle with challenging reasoning problems. Current approaches address this challenge by sampling or searching detailed and low-level reasoning chains. However, these methods are still limited in their exploration capabilities, making it challenging for correct solutions to stand out in the huge solution space. In this work, we unleash LLMs' creative potential for exploring multiple diverse problem solving strategies by framing an LLM as a hierarchical policy via in-context learning. This policy comprises of a visionary leader that proposes multiple diverse high-level problem-solving tactics as hints, accompanied by a follower that executes detailed problem-solving processes following each of the high-level instruction. The follower uses each of the leader's directives as a guide and samples multiple reasoning chains to tackle the problem, generating a solution group for each leader proposal. Additionally, we propose an effective and efficient tournament-based approach to select among these explored solution groups to reach the final answer. Our approach produces meaningful and inspiring hints, enhances problem-solving strategy exploration, and improves the final answer accuracy on challenging problems in the MATH dataset. Code will be released at https://github.com/lz1oceani/LLM-As-Hierarchical-Policy.

On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval

  • paper_url: http://arxiv.org/abs/2311.00693
  • repo_url: None
  • paper_authors: Jiayi Chen, Hanjun Dai, Bo Dai, Aidong Zhang, Wei Wei
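  • for: This paper studies the visually-rich document entity retrieval (VDER) task in a setting where new document types keep emerging, focusing on the case where target entity types are personalized by each task and entity occurrences vary across documents.
  • methods: The paper uses a task-aware meta-learning framework that combines a hierarchical decoder (HC) with a contrastive learning strategy (ContrastProtoNet) to achieve effective task personalization.
  • results: Experimental results show that these approaches significantly improve the robustness of popular meta-learning baselines.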
    Abstract Visually-rich document entity retrieval (VDER), which extracts key information (e.g. date, address) from document images like invoices and receipts, has become an important topic in industrial NLP applications. The emergence of new document types at a constant pace, each with its unique entity types, presents a unique challenge: many documents contain unseen entity types that occur only a couple of times. Addressing this challenge requires models to have the ability of learning entities in a few-shot manner. However, prior works for Few-shot VDER mainly address the problem at the document level with a predefined global entity space, which doesn't account for the entity-level few-shot scenario: target entity types are locally personalized by each task and entity occurrences vary significantly among documents. To address this unexplored scenario, this paper studies a novel entity-level few-shot VDER task. The challenges lie in the uniqueness of the label space for each task and the increased complexity of out-of-distribution (OOD) contents. To tackle this novel task, we present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization that distinguishes between in-task and out-of-task distribution. Specifically, we adopt a hierarchical decoder (HC) and employ contrastive learning (ContrastProtoNet) to achieve this goal. Furthermore, we introduce a new dataset, FewVEX, to boost future research in the field of entity-level few-shot VDER. Experimental results demonstrate our approaches significantly improve the robustness of popular meta-learning baselines.
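    Summary Visually-rich document entity retrieval (VDER) has become an important topic in industrial NLP applications. New document types emerge over time, each with its own entity types, which poses a challenge: many documents contain entity types that have not been seen before. Addressing this challenge requires models that can learn entities in a few-shot manner. However, prior work on few-shot VDER mainly addresses the problem at the document level with a predefined global entity space and does not consider the entity-level few-shot scenario, in which target entity types are locally personalized by each task and entity occurrences vary significantly among documents. To address this unexplored scenario, this paper studies a novel entity-level few-shot VDER task. The challenges lie in the uniqueness of the label space for each task and the increased complexity of out-of-distribution (OOD) contents. To tackle this task, we present a task-aware meta-learning based framework that focuses on effective task personalization, distinguishing between in-task and out-of-task distributions. Specifically, we adopt a hierarchical decoder (HC) and contrastive learning (ContrastProtoNet). We also introduce a new dataset, FewVEX, to support future research on entity-level few-shot VDER. Experimental results show that our approaches significantly improve the robustness of popular meta-learning baselines.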

Improving Interpersonal Communication by Simulating Audiences with Language Models

  • paper_url: http://arxiv.org/abs/2311.00687
  • repo_url: https://github.com/theryanl/egs
  • paper_authors: Ryan Liu, Howard Yen, Raja Marjieh, Thomas L. Griffiths, Ranjay Krishna
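  • for: Helping people communicate more effectively in order to achieve their goals.
  • methods: The Explore-Generate-Simulate (EGS) framework uses large language model (LLM) simulations: it explores a diverse set of advice, generates communication candidates conditioned on subsets of that advice, and simulates audience reactions to select the best candidate and advice.
  • results: Across eight scenarios, human raters preferred the candidates chosen by EGS over popular generation mechanisms including Chain-of-Thought, and audience simulations reached reasonably high agreement with human raters in 5 of the 8 scenarios. The framework also applied well to real-world scenarios described by users on web forums.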
    Abstract How do we communicate with others to achieve our goals? We use our prior experience or advice from others, or construct a candidate utterance by predicting how it will be received. However, our experiences are limited and biased, and reasoning about potential outcomes can be difficult and cognitively challenging. In this paper, we explore how we can leverage Large Language Model (LLM) simulations to help us communicate better. We propose the Explore-Generate-Simulate (EGS) framework, which takes as input any scenario where an individual is communicating to an audience with a goal they want to achieve. EGS (1) explores the solution space by producing a diverse set of advice relevant to the scenario, (2) generates communication candidates conditioned on subsets of the advice, and (3) simulates the reactions from various audiences to determine both the best candidate and advice to use. We evaluate the framework on eight scenarios spanning the ten fundamental processes of interpersonal communication. For each scenario, we collect a dataset of human evaluations across candidates and baselines, and showcase that our framework's chosen candidate is preferred over popular generation mechanisms including Chain-of-Thought. We also find that audience simulations achieve reasonably high agreement with human raters across 5 of the 8 scenarios. Finally, we demonstrate the generality of our framework by applying it to real-world scenarios described by users on web forums. Through evaluations and demonstrations, we show that EGS enhances the effectiveness and outcomes of goal-oriented communication across a variety of situations, thus opening up new possibilities for the application of large language models in revolutionizing communication and decision-making processes.
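    Summary How do we communicate with others to achieve our goals? We can draw on prior experience or advice from others, or construct a candidate utterance and predict how it will be received. However, our experiences are limited and biased, and reasoning about potential outcomes is cognitively challenging. In this paper, we explore how large language model (LLM) simulations can help us communicate better. We propose the Explore-Generate-Simulate (EGS) framework, which takes as input any scenario in which an individual communicates to an audience with a goal they want to achieve. EGS (1) explores the solution space by producing a diverse set of advice relevant to the scenario, (2) generates communication candidates conditioned on subsets of the advice, and (3) simulates the reactions of various audiences to determine the best candidate and advice. We evaluate the framework on eight scenarios, collecting for each a dataset of human evaluations across candidates and baselines, and show that the candidate chosen by our framework is preferred over popular generation mechanisms including Chain-of-Thought. We also find that audience simulations achieve reasonably high agreement with human raters in five of the scenarios. Finally, we demonstrate the generality of the framework by applying it to real-world scenarios described by users on web forums. Through evaluations and demonstrations, we show that EGS improves the effectiveness and outcomes of goal-oriented communication, opening up new possibilities for large language models in communication and decision-making.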

Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00651
  • repo_url: None
  • paper_authors: Richard Bornemann, Gautier Hamon, Eleni Nisioti, Clément Moulin-Frier
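  • for: This work studies how collective exploration strategies emerge when multiple agents meta-learn independently on an open-ended distribution of tasks.
  • methods: Multiple agents are trained with decentralized meta-reinforcement learning on an open-ended, procedurally generated task distribution.
  • results: The decentralized agents generalize strongly when confronted with novel objects at test time. Although they are never forced to cooperate during training, they learn collective exploration strategies that solve novel tasks, and these strategies extend to an open-ended task setting with task trees of twice the depth seen during training.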
    Abstract Recent works have proven that intricate cooperative behaviors can emerge in agents trained using meta reinforcement learning on open ended task distributions using self-play. While the results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world: through decentralized training and over an open-ended distribution of tasks. In this work we therefore investigate the emergence of collective exploration strategies, where several agents meta-learn independent recurrent policies on an open ended distribution of tasks. To this end we introduce a novel environment with an open ended procedurally generated task space which dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees. We show that decentralized agents trained in our environment exhibit strong generalization abilities when confronted with novel objects at test time. Additionally, despite never being forced to cooperate during training the agents learn collective exploration strategies which allow them to solve novel tasks never encountered during training. We further find that the agents learned collective exploration strategies extend to an open ended task setting, allowing them to solve task trees of twice the depth compared to the ones seen during training. Our open source code as well as videos of the agents can be found on our companion website.
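    Summary Recent work has shown that complex cooperative behaviors can emerge in agents trained with meta reinforcement learning on open-ended task distributions. While the results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how collective exploration strategies emerge in the natural world: through decentralized training over an open-ended distribution of tasks. In this work we therefore investigate the emergence of collective exploration strategies, where several agents meta-learn independent recurrent policies on an open-ended distribution of tasks. To this end, we introduce a novel environment with an open-ended, procedurally generated task space that dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees. We show that decentralized agents trained in this environment exhibit strong generalization abilities when confronted with novel objects at test time. In addition, agents that were never forced to cooperate during training still learn collective exploration strategies, and these strategies extend to an open-ended task setting, allowing the agents to solve novel tasks not seen during training. Our open-source code and videos of the agents can be found on our companion website.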

FAIRLABEL: Correcting Bias in Labels

  • paper_url: http://arxiv.org/abs/2311.00638
  • repo_url: None
  • paper_authors: Srinivasan H Sengamedu, Hien Pham
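  • for: This paper aims to detect and correct bias in the labels used to train machine learning models.
  • methods: The paper proposes the FAIRLABEL algorithm, which detects and corrects biases in labels in order to reduce Disparate Impact (DI) across groups while maintaining high prediction accuracy.
  • results: On synthetic datasets, FAIRLABEL's label correction is correct 86.7% of the time, compared with 71.9% for a baseline model. On benchmark datasets such as UCI Adult, German Credit Risk, and Compas, the Disparate Impact Ratio increases by as much as 54.2%.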
    Abstract There are several algorithms for measuring fairness of ML models. A fundamental assumption in these approaches is that the ground truth is fair or unbiased. In real-world datasets, however, the ground truth often contains data that is a result of historical and societal biases and discrimination. Models trained on these datasets will inherit and propagate the biases to the model outputs. We propose FAIRLABEL, an algorithm which detects and corrects biases in labels. The goal of FAIRLABEL is to reduce the Disparate Impact (DI) across groups while maintaining high accuracy in predictions. We propose metrics to measure the quality of bias correction and validate FAIRLABEL on synthetic datasets and show that the label correction is correct 86.7% of the time vs. 71.9% for a baseline model. We also apply FAIRLABEL on benchmark datasets such as UCI Adult, German Credit Risk, and Compas datasets and show that the Disparate Impact Ratio increases by as much as 54.2%.
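    Summary There are several algorithms for measuring the fairness of machine learning models. A fundamental assumption of these approaches is that the ground truth is fair or unbiased. In real-world datasets, however, the ground truth often contains data that results from historical and societal biases and discrimination, and models trained on these datasets inherit and propagate the biases. We propose FAIRLABEL, an algorithm that detects and corrects biases in labels. The goal of FAIRLABEL is to reduce the Disparate Impact (DI) across groups while maintaining high prediction accuracy. We propose metrics to measure the quality of bias correction and validate FAIRLABEL on synthetic datasets, showing that the label correction is correct 86.7% of the time versus 71.9% for a baseline model. We also apply FAIRLABEL to benchmark datasets such as UCI Adult, German Credit Risk, and Compas, and show that the Disparate Impact Ratio increases by as much as 54.2%.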
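    The Disparate Impact Ratio referred to in this entry is commonly defined as the positive-outcome rate of the unprivileged group divided by that of the privileged group, so a value closer to 1 indicates less disparity. The sketch below computes it on a toy example. The function name, group names, and data are hypothetical and are not taken from the FAIRLABEL paper, which may define the metric differently in detail.

```python
def disparate_impact_ratio(labels, groups, unprivileged, privileged, positive=1):
    """Ratio of positive-label rates: unprivileged group over privileged group."""

    def positive_rate(group):
        # Labels of all samples that belong to the given group.
        members = [y for y, g in zip(labels, groups) if g == group]
        if not members:
            raise ValueError(f"no samples for group {group!r}")
        return sum(1 for y in members if y == positive) / len(members)

    privileged_rate = positive_rate(privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no positive labels")
    return positive_rate(unprivileged) / privileged_rate


# Hypothetical toy data: group "a" receives 1 positive label out of 4,
# group "b" receives 3 out of 4.
labels = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
ratio = disparate_impact_ratio(labels, groups, unprivileged="a", privileged="b")
print(f"Disparate Impact Ratio: {ratio:.3f}")
```

    Under this definition, correcting biased labels for the unprivileged group raises the ratio toward 1, which is consistent with the entry reporting an increase in the ratio as an improvement.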

A Bi-level Framework for Traffic Accident Duration Prediction: Leveraging Weather and Road Condition Data within a Practical Optimum Pipeline

  • paper_url: http://arxiv.org/abs/2311.00634
  • repo_url: None
  • paper_authors: Rafat Tabassum Sukonna, Soham Irtiza Swapnil
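  • for: Predicting traffic accident duration is challenging because of the stochastic nature of events. This study checks whether duration can be predicted from static features in a traffic accident database (accident duration, road conditions, and weather data) without accident contextual information such as accident severity and textual description.
  • methods: Multiple machine learning models first predict whether an accident's impact on traffic will be short-term or long-term, and a bimodal approach then estimates the precise duration. A random forest classifier separates short-term from long-term effects with 83% accuracy, while a LightGBM regression model outperforms the other regression models, with MAE values of 26.15 and 13.3 and RMSE values of 32.91 and 28.91 for short-term and long-term duration prediction, respectively.
  • results: Using only static features, the separate and combined (end-to-end pipeline) approaches give results comparable with previous work. SHAP value analysis identifies weather conditions, wind chill, and wind speed as the most influential factors in determining accident duration.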
    Abstract Due to the stochastic nature of events, predicting the duration of a traffic incident presents a formidable challenge. Accurate duration estimation can result in substantial advantages for commuters in selecting optimal routes and for traffic management personnel in addressing non-recurring congestion issues. In this study, we gathered accident duration, road conditions, and meteorological data from a database of traffic accidents to check the feasibility of a traffic accident duration pipeline without accident contextual information data like accident severity and textual description. Multiple machine learning models were employed to predict whether an accident's impact on road traffic would be of a short-term or long-term nature, and then utilizing a bimodal approach the precise duration of the incident's effect was determined. Our binary classification random forest model distinguished between short-term and long-term effects with an 83% accuracy rate, while the LightGBM regression model outperformed other machine learning regression models with Mean Average Error (MAE) values of 26.15 and 13.3 and RMSE values of 32.91 and 28.91 for short and long-term accident duration prediction, respectively. Using the optimal classification and regression model identified in the preceding section, we then construct an end-to-end pipeline to incorporate the entire process. The results of both separate and combined approaches were comparable with previous works, which shows the applicability of only using static features for predicting traffic accident duration. The SHAP value analysis identified weather conditions, wind chill and wind speed as the most influential factors in determining the duration of an accident.
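    Summary Because of the stochastic nature of events, predicting the duration of a traffic accident is a challenging task. Accurately estimating it can bring significant benefits to commuters in choosing the best routes and to traffic management personnel in addressing non-recurring congestion. In this study, we collected accident duration, road condition, and weather data from a database of traffic accidents to check the feasibility of a traffic accident duration pipeline that does not use accident contextual information such as accident severity and textual description. We used multiple machine learning models to predict whether an accident's impact on road traffic would be short-term or long-term, and then used a bimodal approach to determine the precise duration. Our binary random forest classifier distinguished short-term from long-term effects with 83% accuracy. The LightGBM regression model outperformed the other machine learning regression models, with MAE values of 26.15 and 13.3 and RMSE values of 32.91 and 28.91 for short-term and long-term accident duration prediction, respectively. Using the best classification and regression models, we built an end-to-end pipeline covering the entire process. The results are comparable with previous work, which shows that static features alone can be used to predict traffic accident duration. SHAP value analysis identified weather conditions, wind chill, and wind speed as the most influential factors in determining accident duration.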
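    The two-stage idea in this entry (classify the accident as short-term or long-term, then apply a regressor for that regime) can be sketched as follows, together with the MAE and RMSE metrics it reports. This is a minimal illustration and not the authors' pipeline: the stand-in classifier and regressors are hypothetical hand-written rules rather than the random forest and LightGBM models, and the feature name `wind_speed` and all numbers are invented for the example.

```python
import math


def mae(y_true, y_pred):
    """Mean of absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def predict_duration(features, classify, regress_short, regress_long):
    """Stage 1 picks a regime; stage 2 applies that regime's regressor."""
    if classify(features) == "short":
        return regress_short(features)
    return regress_long(features)


# Hypothetical stand-ins for trained models (assumptions, not from the paper).
def classify(x):
    return "short" if x["wind_speed"] < 20 else "long"


def regress_short(x):
    return 30.0 + 0.5 * x["wind_speed"]


def regress_long(x):
    return 90.0 + 1.0 * x["wind_speed"]


samples = [{"wind_speed": 10}, {"wind_speed": 30}]
observed = [40.0, 110.0]  # invented durations for illustration
predictions = [
    predict_duration(s, classify, regress_short, regress_long) for s in samples
]
print(predictions, mae(observed, predictions), rmse(observed, predictions))
```

    Keeping the regressors separate lets each one specialize on its own duration range, which is one reason a pipeline like this reports separate error values for the short-term and long-term cases.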

Loss Modeling for Multi-Annotator Datasets

  • paper_url: http://arxiv.org/abs/2311.00619
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Uthman Jinadu, Jesse Annan, Shanshan Wen, Yi Ding
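  • for: Improving fairness by accounting for the opinions of all annotators of a dataset, even when individual annotators provide large numbers of ratings.
  • methods: Multitask learning combined with loss-based label correction is used to learn a more accurate representation of diverse annotator opinions.
  • results: The formulation cleanly separates agreeing and disagreeing annotations and improves prediction performance in single- or multi-annotator settings.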
    Abstract Accounting for the opinions of all annotators of a dataset is critical for fairness. However, when annotating large datasets, individual annotators will frequently provide thousands of ratings which can lead to fatigue. Additionally, these annotation processes can occur over multiple days which can lead to an inaccurate representation of an annotator's opinion over time. To combat this, we propose to learn a more accurate representation of diverse opinions by utilizing multitask learning in conjunction with loss-based label correction. We show that using our novel formulation, we can cleanly separate agreeing and disagreeing annotations. Furthermore, we demonstrate that this modification can improve prediction performance in a single or multi-annotator setting. Lastly, we show that this method remains robust to additional label noise that is applied to subjective data.
    Summary Accounting for all annotators' opinions is crucial for fairness, but when annotating large datasets, individual annotators may provide thousands of ratings, leading to fatigue. Furthermore, the annotation process may take place over multiple days, which can result in an inaccurate representation of an annotator's opinion over time. To address this issue, we propose using multitask learning in conjunction with loss-based label correction to learn a more accurate representation of diverse opinions. We show that our novel formulation can cleanly separate agreeing and disagreeing annotations and improve prediction performance in a single or multi-annotator setting. Additionally, we demonstrate that our method remains robust to additional label noise applied to subjective data.

Rethinking Variational Inference for Probabilistic Programs with Stochastic Support

  • paper_url: http://arxiv.org/abs/2311.00594
  • repo_url: https://github.com/treigerm/sdvi_neurips
  • paper_authors: Tim Reichelt, Luke Ong, Tom Rainforth
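  • for: This paper presents a new variational inference method for probabilistic programs with stochastic support.
  • methods: The method decomposes the program into sub-programs with static support and then automatically builds a separate sub-guide for each sub-program.
  • results: The decomposition makes it easier to construct suitable variational families, which in turn substantially improves inference performance.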
    Abstract We introduce Support Decomposition Variational Inference (SDVI), a new variational inference (VI) approach for probabilistic programs with stochastic support. Existing approaches to this problem rely on designing a single global variational guide on a variable-by-variable basis, while maintaining the stochastic control flow of the original program. SDVI instead breaks the program down into sub-programs with static support, before automatically building separate sub-guides for each. This decomposition significantly aids in the construction of suitable variational families, enabling, in turn, substantial improvements in inference performance.
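    Summary We introduce Support Decomposition Variational Inference (SDVI), a new variational inference (VI) approach for probabilistic programs with stochastic support. Existing approaches to this problem design a single global variational guide on a variable-by-variable basis while maintaining the stochastic control flow of the original program. SDVI instead breaks the program down into sub-programs with static support and automatically builds a separate sub-guide for each. This decomposition helps in constructing suitable variational families, which in turn improves inference performance.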
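    To make "stochastic support" concrete, the sketch below defines a toy probabilistic program in plain Python whose set of random variables depends on a sampled branch, then groups sampled traces by the variables they contain. Each group corresponds to one straight-line path, that is, a sub-program with static support. This only illustrates the problem setting under assumed names (`program`, `support_of`); it does not implement SDVI or any guide construction.

```python
import random
from collections import Counter


def program(rng):
    """Toy program: which variables exist depends on the sampled branch k."""
    trace = {"k": rng.random() < 0.5}
    if trace["k"]:
        # Path 1 introduces a single continuous variable.
        trace["x"] = rng.gauss(0.0, 1.0)
    else:
        # Path 2 introduces two different variables.
        trace["y"] = rng.gauss(0.0, 1.0)
        trace["z"] = rng.gauss(trace["y"], 1.0)
    return trace


def support_of(trace):
    """Identify a control-flow path by the sorted names of its variables."""
    return tuple(sorted(trace))


rng = random.Random(0)
counts = Counter(support_of(program(rng)) for _ in range(200))
supports = sorted(counts)
for names in supports:
    print(names, counts[names])
```

    A single guide for this program has to cover both variable sets at once, whereas after the split each path has a fixed set of variables and can be given its own guide.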

Coop: Memory is not a Commodity

  • paper_url: http://arxiv.org/abs/2311.00591
  • repo_url: None
  • paper_authors: Jianhao Zhang, Shihan Ma, Peihong Liu, Jinhui Yuan
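  • for: Making the training of deep neural networks under limited memory budgets more efficient within deep learning frameworks.
  • methods: The paper proposes a tensor rematerialization strategy that evicts tensors within a sliding window so that evictions are contiguous, together with cheap tensor partitioning and recomputable in-place to further reduce rematerialization cost.
  • results: Experiments on eight representative DNNs show that Coop achieves up to $2\times$ memory saving and greatly reduces compute overhead, search latency, and memory fragmentation compared to state-of-the-art baselines.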
    Abstract Tensor rematerialization allows the training of deep neural networks (DNNs) under limited memory budgets by checkpointing the models and recomputing the evicted tensors as needed. However, the existing tensor rematerialization techniques overlook the memory system in deep learning frameworks and implicitly assume that free memory blocks at different addresses are identical. Under this flawed assumption, discontiguous tensors are evicted, among which some are not used to allocate the new tensor. This leads to severe memory fragmentation and increases the cost of potential rematerializations. To address this issue, we propose to evict tensors within a sliding window to ensure all evictions are contiguous and are immediately used. Furthermore, we proposed cheap tensor partitioning and recomputable in-place to further reduce the rematerialization cost by optimizing the tensor allocation. We named our method Coop as it is a co-optimization of tensor allocation and tensor rematerialization. We evaluated Coop on eight representative DNNs. The experimental results demonstrate that Coop achieves up to $2\times$ memory saving and hugely reduces compute overhead, search latency, and memory fragmentation compared to the state-of-the-art baselines.
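    Summary Tensor rematerialization allows deep neural networks (DNNs) to be trained under limited memory budgets by checkpointing the model and recomputing evicted tensors as needed. However, existing tensor rematerialization techniques overlook the memory system in deep learning frameworks and implicitly assume that free memory blocks at different addresses are identical. Under this flawed assumption, discontiguous tensors are evicted, some of which are not used to allocate the new tensor. This leads to severe memory fragmentation and increases the cost of potential rematerializations. To address this issue, we propose evicting tensors within a sliding window to ensure that all evictions are contiguous and immediately used. We also propose cheap tensor partitioning and recomputable in-place to further reduce the rematerialization cost. We name our method Coop because it is a co-optimization of tensor allocation and tensor rematerialization. We evaluated Coop on eight representative DNNs. The experimental results show that Coop achieves up to $2\times$ memory saving and greatly reduces compute overhead, search latency, and memory fragmentation compared to state-of-the-art baselines.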
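    The sliding-window idea can be illustrated with a small two-pointer search: given memory blocks in address order, find the contiguous run that is large enough for a new tensor and cheapest to recompute. This is a simplified sketch under stated assumptions and is not Coop's actual algorithm. The block representation as `(size, recompute_cost)` pairs, the convention that free blocks have cost 0, and the function name are all hypothetical.

```python
def cheapest_contiguous_window(blocks, need):
    """Return (cost, first_index, last_index) of the cheapest contiguous run
    of blocks whose total size is at least `need`, or None if none exists.

    `blocks` is a list of (size, recompute_cost) pairs in address order;
    free blocks are modeled with recompute_cost 0. Costs must be >= 0.
    """
    best = None
    left = 0
    size = cost = 0
    for right, (block_size, block_cost) in enumerate(blocks):
        size += block_size
        cost += block_cost
        # Drop blocks from the left while the window still fits the request.
        # With non-negative costs this never makes the window more expensive.
        while left < right and size - blocks[left][0] >= need:
            size -= blocks[left][0]
            cost -= blocks[left][1]
            left += 1
        if size >= need and (best is None or cost < best[0]):
            best = (cost, left, right)
    return best


# Hypothetical layout: (size, recompute_cost); cost 0 marks a free block.
blocks = [(4, 5), (2, 0), (3, 1), (4, 8), (2, 0)]
window = cheapest_contiguous_window(blocks, need=5)
print(window)
```

    Because every block in the chosen window is adjacent, evicting them yields one contiguous region that the new tensor can use immediately, which is the property the entry contrasts with evicting discontiguous tensors.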

Boosting Summarization with Normalizing Flows and Aggressive Training

  • paper_url: http://arxiv.org/abs/2311.00588
  • repo_url: https://github.com/yuyangstat/flowsum
  • paper_authors: Yu Yang, Xiaotong Shen
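  • for: This paper proposes a normalizing flows-based variational encoder-decoder framework for Transformer-based summarization, addressing two main challenges in variational summarization: insufficient semantic information in latent representations and posterior collapse during training.
  • methods: The approach uses normalizing flows for flexible latent posterior modeling and proposes a controlled alternate aggressive training (CAAT) strategy with an improved gate mechanism.
  • results: Experiments show that FlowSUM significantly improves the quality of generated summaries and enables knowledge distillation with minimal impact on inference time. The paper also studies posterior collapse in normalizing flows and analyzes how summary quality is affected by the training strategy, gate initialization, and the type and number of normalizing flows.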
    Abstract This paper presents FlowSUM, a normalizing flows-based variational encoder-decoder framework for Transformer-based summarization. Our approach tackles two primary challenges in variational summarization: insufficient semantic information in latent representations and posterior collapse during training. To address these challenges, we employ normalizing flows to enable flexible latent posterior modeling, and we propose a controlled alternate aggressive training (CAAT) strategy with an improved gate mechanism. Experimental results show that FlowSUM significantly enhances the quality of generated summaries and unleashes the potential for knowledge distillation with minimal impact on inference time. Furthermore, we investigate the issue of posterior collapse in normalizing flows and analyze how the summary quality is affected by the training strategy, gate initialization, and the type and number of normalizing flows used, offering valuable insights for future research.

Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

  • paper_url: http://arxiv.org/abs/2311.00582
  • repo_url: None
  • paper_authors: Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie
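  • for: This paper studies the game modification problem, in which a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target policy profile becomes the unique Markov perfect Nash equilibrium with a value within a target range, at minimal modification cost.
  • methods: The paper characterizes the set of policy profiles that can be installed as the unique equilibrium of some game, and proposes an algorithm that solves a convex optimization problem with linear constraints and then applies random perturbation.
  • results: The paper establishes sufficient and necessary conditions for successful installation, and the proposed efficient algorithm yields a modification plan with near-optimal cost.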
    Abstract We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of some game, and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm, which solves a convex optimization problem with linear constraints and then performs random perturbation, to obtain a modification plan with a near-optimal cost.
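    Summary We study the game modification problem, in which a benevolent game designer or a malevolent adversary modifies the reward function of a game so that a target policy profile becomes the unique Markov perfect equilibrium of the game and has a value within a target range, while minimizing the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of a game and give sufficient and necessary conditions for successful installation. We also propose an efficient algorithm that first solves a convex optimization problem and then applies random perturbation to obtain a modification plan with near-optimal cost.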

Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

  • paper_url: http://arxiv.org/abs/2311.00738
  • repo_url: None
  • paper_authors: Yuwei Bao, Keunwoo Peter Yu, Yichi Zhang, Shane Storks, Itamar Bar-Yossef, Alexander De La Iglesia, Megan Su, Xiao Lin Zheng, Joyce Chai
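  • for: This paper works toward AI systems that can offer situated, personalized task guidance to assist humans in various tasks.
  • methods: The paper introduces a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG), based on natural interaction between a human user and a human instructor, and proposes two tasks: User and Environment Understanding, and Instructor Decision Making. The authors use several foundation models to study how quickly they can be adapted to perceptually enabled task guidance.
  • results: Quantitative, qualitative, and human evaluation results show that these models can reach fair performance in some cases, but fast and reliable adaptation remains a significant challenge. The benchmark and baselines provide a stepping stone for future work on situated task guidance.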
    Abstract Despite tremendous advances in AI, it remains a significant challenge to develop interactive task guidance systems that can offer situated, personalized guidance and assist humans in various tasks. These systems need to have a sophisticated understanding of the user as well as the environment, and make timely accurate decisions on when and what to say. To address this issue, we created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG) based on natural interaction between a human user and a human instructor. We further proposed two tasks: User and Environment Understanding, and Instructor Decision Making. We leveraged several foundation models to study to what extent these models can be quickly adapted to perceptually enabled task guidance. Our quantitative, qualitative, and human evaluation results show that these models can demonstrate fair performances in some cases with no task-specific training, but a fast and reliable adaptation remains a significant challenge. Our benchmark and baselines will provide a stepping stone for future work on situated task guidance.
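    Summary Despite great advances in AI, it remains a significant challenge to develop interactive task guidance systems that can assist humans in various tasks. These systems need a sophisticated understanding of the user and the environment, and must make accurate decisions about when and what to say. To address this, we created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG), based on natural interaction between a human user and a human instructor. We also propose two tasks: User and Environment Understanding, and Instructor Decision Making. We use several foundation models to study whether they can be quickly adapted to perceptually enabled task guidance. Our quantitative, qualitative, and human evaluation results show that these models can achieve fair performance in some cases, but fast and reliable adaptation remains a significant challenge. Our benchmark and baselines will contribute to future work on situated task guidance.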

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

  • paper_url: http://arxiv.org/abs/2311.00571
  • repo_url: None
  • paper_authors: Wei-Ge Chen, Irina Spiridonova, Jianwei Yang, Jianfeng Gao, Chunyuan Li
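  • for: This paper describes LLaVA-Interactive, a research prototype for multimodal human-AI interaction that holds multi-turn dialogues with human users, taking multimodal user inputs and generating multimodal responses.
  • methods: The system combines the multimodal skills of three pre-built AI models without additional model training: visual chat from LLaVA, image segmentation from SEEM, and image generation and editing from GLIGEN.
  • results: The paper describes the development of LLaVA-Interactive and presents a diverse set of application scenarios to demonstrate its promise and to encourage future research on multimodal interactive systems.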
    Abstract LLaVA-Interactive is a research prototype for multimodal human-AI interaction. The system can have multi-turn dialogues with human users by taking multimodal user inputs and generating multimodal responses. Importantly, LLaVA-Interactive goes beyond language prompt, where visual prompt is enabled to align human intents in the interaction. The development of LLaVA-Interactive is extremely cost-efficient as the system combines three multimodal skills of pre-built AI models without additional model training: visual chat of LLaVA, image segmentation from SEEM, as well as image generation and editing from GLIGEN. A diverse set of application scenarios is presented to demonstrate the promises of LLaVA-Interactive and to inspire future research in multimodal interactive systems.
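    Summary LLaVA-Interactive is a research prototype for multimodal human-AI interaction. The system can hold multi-turn dialogues with human users, taking multimodal user inputs and generating multimodal responses. Notably, LLaVA-Interactive goes beyond language prompts by enabling visual prompts to align human intents in the interaction. Its development is very cost-efficient because it combines three pre-built AI models without additional model training: visual chat from LLaVA, image segmentation from SEEM, and image generation and editing from GLIGEN. A diverse set of application scenarios is presented to demonstrate the promise of LLaVA-Interactive and to inspire future research on multimodal interactive systems.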

Detecting Visual Cues in the Intensive Care Unit and Association with Patient Clinical Status

  • paper_url: http://arxiv.org/abs/2311.00565
  • repo_url: None
  • paper_authors: Subhash Nerella, Ziyuan Guan, Andrea Davidson, Yuanfang Ren, Tezcan Baslanti, Brooke Armfield, Patrick Tighe, Azra Bihorac, Parisa Rashidi
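  • for: This study aims to develop AI-based assessment tools that help healthcare providers in the ICU carry out more objective and granular patient assessment.
  • methods: The study introduces a new "masked loss computation" technique to address the data imbalance problem, and trains a SWIN Transformer model on the AU-ICU dataset together with three external datasets.
  • results: The SWIN Transformer model detects 18 facial action units (AUs) with a mean F1-score of 0.57 and a mean accuracy of 0.89 on the test set. AU inference on 634,054 frames is then used to evaluate the association between facial AUs and acuity status, acute brain dysfunction, and pain.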
    Abstract Intensive Care Units (ICU) provide close supervision and continuous care to patients with life-threatening conditions. However, continuous patient assessment in the ICU is still limited due to time constraints and the workload on healthcare providers. Existing patient assessments in the ICU such as pain or mobility assessment are mostly sporadic and administered manually, thus introducing the potential for human errors. Developing Artificial intelligence (AI) tools that can augment human assessments in the ICU can be beneficial for providing more objective and granular monitoring capabilities. For example, capturing the variations in a patient's facial cues related to pain or agitation can help in adjusting pain-related medications or detecting agitation-inducing conditions such as delirium. Additionally, subtle changes in visual cues during or prior to adverse clinical events could potentially aid in continuous patient monitoring when combined with high-resolution physiological signals and Electronic Health Record (EHR) data. In this paper, we examined the association between visual cues and patient condition including acuity status, acute brain dysfunction, and pain. We leveraged our AU-ICU dataset with 107,064 frames collected in the ICU annotated with facial action units (AUs) labels by trained annotators. We developed a new "masked loss computation" technique that addresses the data imbalance problem by maximizing data resource utilization. We trained the model using our AU-ICU dataset in conjunction with three external datasets to detect 18 AUs. The SWIN Transformer model achieved 0.57 mean F1-score and 0.89 mean accuracy on the test set. Additionally, we performed AU inference on 634,054 frames to evaluate the association between facial AUs and clinically important patient conditions such as acuity status, acute brain dysfunction, and pain.
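    Summary Intensive Care Units (ICUs) provide close supervision and continuous care to patients with life-threatening conditions, but continuous patient assessment in the ICU is still limited by time constraints and the workload on healthcare providers. Existing assessments, such as pain or mobility assessment, are mostly sporadic and administered manually, which introduces the potential for human error. AI tools can help providers monitor patients more objectively and at finer granularity. For example, capturing variations in a patient's facial cues can help in adjusting pain-related medication or in detecting agitation-inducing conditions such as delirium. In addition, subtle changes in visual cues during or before adverse clinical events, combined with high-resolution physiological signals and Electronic Health Record (EHR) data, could aid continuous patient monitoring. In this paper we examine the association between visual cues and patient condition, including acuity status, acute brain dysfunction, and pain. We use our AU-ICU dataset of 107,064 frames collected in the ICU and annotated with facial action unit (AU) labels by trained annotators. We developed a "masked loss computation" technique that addresses the data imbalance problem by maximizing data resource utilization. We trained the model on AU-ICU together with three external datasets to detect 18 AUs. The SWIN Transformer model achieved a mean F1-score of 0.57 and a mean accuracy of 0.89 on the test set. We also ran AU inference on 634,054 frames to evaluate the association between facial AUs and clinically important patient conditions such as acuity status, acute brain dysfunction, and pain.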
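    The abstract does not spell out how "masked loss computation" works. One plausible reading, sketched below purely as an assumption, is that when several datasets are combined and each annotates only a subset of the 18 AUs, the loss is averaged over the AUs that are actually labelled for a given frame. The function name `masked_bce` and the per-frame list layout are hypothetical, not taken from the paper.

```python
import math

def masked_bce(probs, labels, mask):
    """Mean binary cross-entropy over labelled action units only.

    probs  : predicted probabilities, one per action unit
    labels : 0/1 targets (the value is ignored where mask is 0)
    mask   : 1 if the AU is annotated for this frame, else 0
    """
    eps = 1e-7  # keeps log() away from 0
    total, count = 0.0, 0
    for p, y, m in zip(probs, labels, mask):
        if not m:
            continue  # an unlabelled AU contributes nothing to the loss
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0

# Frame from a dataset that annotates only the first two AUs.
loss = masked_bce([0.9, 0.2, 0.5], [1, 0, 1], [1, 1, 0])
```

    With this form, a frame from a dataset that lacks some AU labels still contributes a gradient for the AUs it does have, which is one way to read "maximizing data resource utilization".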

Tackling the Abstraction and Reasoning Corpus (ARC) with Object-centric Models and the MDL Principle

  • paper_url: http://arxiv.org/abs/2311.00545
  • repo_url: https://github.com/sebferre/arc-mdl
  • paper_authors: Sébastien Ferré
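  • for: Tackling the Abstraction and Reasoning Corpus (ARC), a benchmark introduced to foster AI research towards human-level intelligence.
  • methods: Object-centric models that are in line with the natural programs produced by humans, with the Minimum Description Length (MDL) principle used to search the large model space efficiently.
  • results: A diverse range of tasks are solved, the learned models are similar to natural programs, and the generality of the approach is shown by applying it to a different domain.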
    Abstract The Abstraction and Reasoning Corpus (ARC) is a challenging benchmark, introduced to foster AI research towards human-level intelligence. It is a collection of unique tasks about generating colored grids, specified by a few examples only. In contrast to the transformation-based programs of existing work, we introduce object-centric models that are in line with the natural programs produced by humans. Our models can not only perform predictions, but also provide joint descriptions for input/output pairs. The Minimum Description Length (MDL) principle is used to efficiently search the large model space. A diverse range of tasks are solved, and the learned models are similar to the natural programs. We demonstrate the generality of our approach by applying it to a different domain.
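    Summary The Abstraction and Reasoning Corpus (ARC) is a challenging benchmark intended to foster AI research towards human-level intelligence. It consists of unique tasks that require generating colored grids, each specified by only a few examples. Unlike the transformation-based programs of existing work, we introduce object-centric models that are in line with the natural programs produced by humans. Our models can both make predictions and provide joint descriptions of input/output pairs. The Minimum Description Length (MDL) principle is used to search the large model space efficiently. A diverse range of tasks are solved, and the learned models are similar to natural programs. We demonstrate the generality of the approach by applying it to a different domain.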
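    To illustrate how the MDL principle can rank candidate descriptions of a grid, here is a toy two-part code, L(model) + L(data | model), measured in bits. The encoding (rectangles on a background, plus a list of leftover cells) and the function names are illustrative assumptions and are not the paper's actual model language.

```python
import math

N_COLORS = 10  # ARC grids use ten colours

def raw_length(height, width):
    """Bits needed to list the colour of every cell independently."""
    return height * width * math.log2(N_COLORS)

def object_length(height, width, n_rectangles, n_residual_cells):
    """Bits for a toy object-centric description of the same grid."""
    cell_index_bits = math.log2(height * width)
    color_bits = math.log2(N_COLORS)
    # Model part: one background colour, then for each rectangle
    # two corner positions and one colour.
    model_bits = color_bits + n_rectangles * (2 * cell_index_bits + color_bits)
    # Data part: each cell the rectangles fail to explain is listed
    # explicitly as a position and a colour.
    data_bits = n_residual_cells * (cell_index_bits + color_bits)
    return model_bits + data_bits

# A 10x10 grid explained by two rectangles with three leftover cells
# is far cheaper to describe than the raw cell listing.
compact = object_length(10, 10, n_rectangles=2, n_residual_cells=3)
baseline = raw_length(10, 10)
```

    Under MDL the description with the smaller total is preferred, so a model that explains the grid with a few objects wins over one that leaves most cells as residue.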

The Development of LLMs for Embodied Navigation

  • paper_url: http://arxiv.org/abs/2311.00530
  • repo_url: https://github.com/rongtao-xu/awesome-llm-en
  • paper_authors: Jinzhou Lin, Han Gao, Rongtao Xu, Changwei Wang, Li Guo, Shibiao Xu
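  • for: This survey examines the interplay between Large Language Models (LLMs) and embodied intelligence, with a focus on navigation tasks.
  • methods: It reviews state-of-the-art models and research methodologies, and provides a comprehensive list of the surveyed studies.
  • results: It assesses the advantages and disadvantages of existing embodied navigation models and datasets, clarifies the role of LLMs in embodied intelligence, and forecasts future directions in the field.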
    Abstract In recent years, the rapid advancement of Large Language Models (LLMs) such as the Generative Pre-trained Transformer (GPT) has attracted increasing attention due to their potential in a variety of practical applications. The application of LLMs with Embodied Intelligence has emerged as a significant area of focus. Among the myriad applications of LLMs, navigation tasks are particularly noteworthy because they demand a deep understanding of the environment and quick, accurate decision-making. LLMs can augment embodied intelligence systems with sophisticated environmental perception and decision-making support, leveraging their robust language and image-processing capabilities. This article offers an exhaustive summary of the symbiosis between LLMs and embodied intelligence with a focus on navigation. It reviews state-of-the-art models, research methodologies, and assesses the advantages and disadvantages of existing embodied navigation models and datasets. Finally, the article elucidates the role of LLMs in embodied intelligence, based on current research, and forecasts future directions in the field. A comprehensive list of studies in this survey is available at https://github.com/Rongtao-Xu/Awesome-LLM-EN
    Summary In recent years, the rapid advancement of Large Language Models (LLMs) such as the Generative Pre-trained Transformer (GPT) has attracted wide attention because of their potential in a variety of practical applications. Combining LLMs with embodied intelligence has emerged as an important research direction. Among the applications of LLMs, navigation tasks are particularly noteworthy because they demand a deep understanding of the environment and quick, accurate decision-making. LLMs can augment embodied intelligence systems with sophisticated environmental perception and decision-making support, drawing on their language and image-processing capabilities. This article summarizes the interplay between LLMs and embodied intelligence with a focus on navigation. It reviews state-of-the-art models and research methodologies, and assesses the advantages and disadvantages of existing embodied navigation models and datasets. It also clarifies the role of LLMs in embodied intelligence based on current research and forecasts future directions in the field. A list of the surveyed studies is available at https://github.com/Rongtao-Xu/Awesome-LLM-EN

Learning impartial policies for sequential counterfactual explanations using Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00523
  • repo_url: None
  • paper_authors: E. Panagiotou, E. Ntoutsi
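  • for: This paper aims to improve how sequential counterfactual (SCF) examples are discovered in explainable Artificial Intelligence (XAI).
  • methods: It builds on Reinforcement Learning (RL) methods that learn policies for discovering SCFs, which scale better than optimizing for each new instance individually.
  • results: The paper identifies shortcomings in existing methods that can lead to policies with undesired properties, such as a bias towards specific actions. It proposes using the classifier's output probabilities to create a more informative reward that mitigates this effect.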
    Abstract In the field of explainable Artificial Intelligence (XAI), sequential counterfactual (SCF) examples are often used to alter the decision of a trained classifier by implementing a sequence of modifications to the input instance. Although certain test-time algorithms aim to optimize for each new instance individually, recently Reinforcement Learning (RL) methods have been proposed that seek to learn policies for discovering SCFs, thereby enhancing scalability. As is typical in RL, the formulation of the RL problem, including the specification of state space, actions, and rewards, can often be ambiguous. In this work, we identify shortcomings in existing methods that can result in policies with undesired properties, such as a bias towards specific actions. We propose to use the output probabilities of the classifier to create a more informative reward, to mitigate this effect.
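    Summary In the field of explainable Artificial Intelligence (XAI), sequential counterfactual (SCF) examples are often used to alter the decision of a trained classifier by applying a sequence of modifications to the input instance. Although some test-time algorithms optimize for each new instance individually, Reinforcement Learning (RL) methods have recently been proposed that learn policies for discovering SCFs, which improves scalability. As is typical in RL, the formulation of the problem, including the specification of the state space, actions, and rewards, can often be ambiguous. In this work, we identify shortcomings in existing methods that can result in policies with undesired properties, such as a bias towards specific actions. We propose using the output probabilities of the classifier to create a more informative reward, to mitigate this effect.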
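    The contrast between a sparse reward and a reward built from the classifier's output probabilities can be sketched as follows. The exact reward used in the paper is not given in the abstract; the difference-of-probabilities form, the `step_cost` term, and both function names are assumptions made for illustration.

```python
def sparse_reward(prob_target_after, threshold=0.5):
    """Reward only when the classifier's decision has actually flipped."""
    return 1.0 if prob_target_after >= threshold else 0.0

def probability_reward(prob_target_before, prob_target_after, step_cost=0.01):
    """Reward the change in the classifier's probability for the target
    class, minus a small hypothetical cost per modification."""
    return (prob_target_after - prob_target_before) - step_cost

# An action that moves the target-class probability from 0.2 to 0.4
# earns nothing under the sparse reward but a positive shaped reward.
r_sparse = sparse_reward(0.4)
r_shaped = probability_reward(0.2, 0.4)
```

    A reward of this kind gives feedback on every step rather than only at the decision boundary, which is one way a reward can be "more informative".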

Efficient LLM Inference on CPUs

  • paper_url: http://arxiv.org/abs/2311.00502
  • repo_url: https://github.com/intel/intel-extension-for-transformers
  • paper_authors: Haihao Shen, Hanwen Chang, Bo Dong, Yu Luo, Hengyu Meng
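  • for: This paper proposes an effective approach for deploying large language models (LLMs) more efficiently.
  • methods: An automatic INT4 weight-only quantization flow, together with a dedicated LLM runtime with highly optimized kernels, accelerates LLM inference on CPUs.
  • results: The approach is demonstrated on popular LLMs including Llama2, Llama, and GPT-NeoX, with high inference efficiency on CPUs. The code is available at https://github.com/intel/intel-extension-for-transformers.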
    Abstract Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging due to the astronomical amount of model parameters, which requires a demand for large memory capacity and high memory bandwidth. In this paper, we propose an effective approach that can make the deployment of LLMs more efficiently. We support an automatic INT4 weight-only quantization flow and design a special LLM runtime with highly-optimized kernels to accelerate the LLM inference on CPUs. We demonstrate the general applicability of our approach on popular LLMs including Llama2, Llama, GPT-NeoX, and showcase the extreme inference efficiency on CPUs. The code is publicly available at: https://github.com/intel/intel-extension-for-transformers.
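    Summary Large language models (LLMs) perform remarkably well across a wide range of tasks, but deploying them is challenging because the very large number of model parameters demands large memory capacity and high memory bandwidth. In this paper, we propose an effective approach that makes LLM deployment more efficient. We support an automatic INT4 weight-only quantization flow and design a dedicated LLM runtime with highly optimized kernels to accelerate LLM inference on CPUs. We demonstrate the general applicability of the approach on popular LLMs including Llama2, Llama, and GPT-NeoX, and show very high inference efficiency on CPUs. The code is available at https://github.com/intel/intel-extension-for-transformers.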
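    As a rough illustration of what weight-only INT4 quantization involves, the sketch below applies symmetric, group-wise quantization to a flat list of weights. It is a generic textbook scheme, not the quantization flow of the linked repository; the group size, the symmetric [-8, 7] range, and the function names are assumptions.

```python
def quantize_int4(weights, group_size=4):
    """Quantize weights to signed 4-bit integers, one scale per group."""
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid a zero scale.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales

def dequantize_int4(q, scales, group_size=4):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]

weights = [0.7, -0.35, 0.1, 0.0, 1.4, -1.4, 0.2, 0.7]
codes, scales = quantize_int4(weights)
restored = dequantize_int4(codes, scales)
```

    Only the weights are stored in 4 bits here; activations stay in higher precision, which is what "weight-only" refers to. Smaller groups cost more scales but track local weight ranges more closely.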

Intriguing Properties of Data Attribution on Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.00500
  • repo_url: https://github.com/sail-sg/d-trak
  • paper_authors: Xiaosen Zheng, Tianyu Pang, Chao Du, Jing Jiang, Min Lin
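  • for: Data attribution traces model outputs back to training data, so that contributors of high-quality or copyrighted training samples can be fairly compensated or credited. This paper studies data attribution for diffusion models.
  • methods: The paper runs extensive experiments and ablation studies on attributing diffusion models, focusing on DDPMs trained on CIFAR-10 and CelebA and on a Stable Diffusion model LoRA-finetuned on ArtBench.
  • results: Counter-intuitively, theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. The work presents a significantly more efficient approach for attributing diffusion models, and the findings suggest that, at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK.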
    Abstract Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK.
    Summary Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module for assigning valuations to high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. We report the counter-intuitive observation that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that, at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK.

Bayes-enhanced Multi-view Attention Networks for Robust POI Recommendation

  • paper_url: http://arxiv.org/abs/2311.00491
  • repo_url: None
  • paper_authors: Jiangnan Xia, Yu Yang, Senzhang Wang, Hongzhi Yin, Jiannong Cao, Philip S. Yu
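  • for: This study aims to make POI recommendation for Location-Based Social Network services more robust. Available check-in data can be unreliable due to both subjective and objective causes, such as positioning error and user privacy concerns, which degrades recommendation performance.
  • methods: The proposed Bayes-enhanced Multi-view Attention Network (BayMAN) builds a personal POI transition graph, a semantic-based POI graph, and a distance-based POI graph to model the dependencies among POIs. Because the personal POI transition graph is sparse and sensitive to noise, a Bayes-enhanced spatial dependency learning module augments it to increase data diversity. A multi-view attention-based user preference learning module then refines user preferences from the POI representations of the three views.
  • results: Extensive experiments show that BayMAN significantly outperforms state-of-the-art methods in POI recommendation when the available check-ins are incomplete and noisy.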
    Abstract POI recommendation is practically important to facilitate various Location-Based Social Network services, and has attracted rising research attention recently. Existing works generally assume the available POI check-ins reported by users are the ground-truth depiction of user behaviors. However, in real application scenarios, the check-in data can be rather unreliable due to both subjective and objective causes including positioning error and user privacy concerns, leading to significant negative impacts on the performance of the POI recommendation. To this end, we investigate a novel problem of robust POI recommendation by considering the uncertainty factors of the user check-ins, and proposes a Bayes-enhanced Multi-view Attention Network. Specifically, we construct personal POI transition graph, the semantic-based POI graph and distance-based POI graph to comprehensively model the dependencies among the POIs. As the personal POI transition graph is usually sparse and sensitive to noise, we design a Bayes-enhanced spatial dependency learning module for data augmentation from the local view. A Bayesian posterior guided graph augmentation approach is adopted to generate a new graph with collaborative signals to increase the data diversity. Then both the original and the augmented graphs are used for POI representation learning to counteract the data uncertainty issue. Next, the POI representations of the three view graphs are input into the proposed multi-view attention-based user preference learning module. By incorporating the semantic and distance correlations of POIs, the user preference can be effectively refined and finally robust recommendation results are achieved. The results of extensive experiments show that BayMAN significantly outperforms the state-of-the-art methods in POI recommendation when the available check-ins are incomplete and noisy.

Dual Conditioned Diffusion Models for Out-Of-Distribution Detection: Application to Fetal Ultrasound Videos

  • paper_url: http://arxiv.org/abs/2311.00469
  • repo_url: None
  • paper_authors: Divyanshu Mishra, He Zhao, Pramit Saha, Aris T. Papageorghiou, J. Alison Noble
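  • for: This study aims to improve the reliability of machine learning models by detecting samples that do not belong to the training distribution, applied to fetal ultrasound videos.
  • methods: The study introduces dual-conditioned diffusion models (DCDM), which condition the model on in-distribution class information and on latent features of the input image for reconstruction-based OOD detection.
  • results: The proposed model outperforms reference methods with a 12% improvement in accuracy, 22% higher precision, and an 8% better F1 score.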
    Abstract Out-of-distribution (OOD) detection is essential to improve the reliability of machine learning models by detecting samples that do not belong to the training distribution. Detecting OOD samples effectively in certain tasks can pose a challenge because of the substantial heterogeneity within the in-distribution (ID), and the high structural similarity between ID and OOD classes. For instance, when detecting heart views in fetal ultrasound videos there is a high structural similarity between the heart and other anatomies such as the abdomen, and large in-distribution variance as a heart has 5 distinct views and structural variations within each view. To detect OOD samples in this context, the resulting model should generalise to the intra-anatomy variations while rejecting similar OOD samples. In this paper, we introduce dual-conditioned diffusion models (DCDM) where we condition the model on in-distribution class information and latent features of the input image for reconstruction-based OOD detection. This constrains the generative manifold of the model to generate images structurally and semantically similar to those within the in-distribution. The proposed model outperforms reference methods with a 12% improvement in accuracy, 22% higher precision, and an 8% better F1 score.
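    Summary Out-of-distribution (OOD) detection is essential for improving the reliability of machine learning models by detecting samples that do not belong to the training distribution. In some tasks, detecting OOD samples is challenging because of substantial heterogeneity within the in-distribution (ID) data and high structural similarity between ID and OOD classes. For example, when detecting heart views in fetal ultrasound videos, the heart is structurally similar to other anatomies such as the abdomen, and the heart itself has 5 distinct views with structural variation within each view. To detect OOD samples in this setting, the model should generalize to the intra-anatomy variations while rejecting similar OOD samples. In this paper, we introduce dual-conditioned diffusion models (DCDM), in which the model is conditioned on ID class information and on latent features of the input image for reconstruction-based OOD detection. This constrains the generative manifold of the model so that it generates images structurally and semantically similar to those within the in-distribution. Compared with reference methods, our model improves accuracy by 12%, precision by 22%, and F1 score by 8%.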
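    The decision rule behind reconstruction-based OOD detection can be sketched independently of the generative model: the input is reconstructed by a model constrained to the in-distribution, and a large discrepancy flags the input as OOD. The mean squared error score, the fixed threshold, and the function names below are illustrative assumptions; the abstract does not state which score the paper uses.

```python
def reconstruction_error(image, reconstruction):
    """Mean squared difference between flattened pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(image, reconstruction)) / len(image)

def is_ood(image, reconstruction, threshold):
    """Flag the input as out-of-distribution when the model, which can only
    produce in-distribution-like images, fails to reproduce it closely."""
    return reconstruction_error(image, reconstruction) > threshold

# A well-reconstructed input scores near zero; a poorly reconstructed
# one exceeds the (hypothetical) threshold.
in_dist = is_ood([0.2, 0.8, 0.5], [0.2, 0.7, 0.5], threshold=0.05)
out_dist = is_ood([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], threshold=0.05)
```

    In practice the threshold would be chosen on held-out in-distribution data rather than fixed by hand.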

Leveraging Hyperbolic Embeddings for Coarse-to-Fine Robot Design

  • paper_url: http://arxiv.org/abs/2311.00462
  • repo_url: None
  • paper_authors: Heng Dong, Junyu Zhang, Chongjie Zhang
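  • for: Designing multi-cellular robots that can be efficiently controlled to perform diverse tasks.
  • methods: A novel coarse-to-fine method first seeks optimal coarse-grained robots and then progressively refines them. To handle the difficulty of deciding when to refine, the Hyperbolic Embeddings for Robot Design (HERD) framework places robots of various granularity in a shared hyperbolic space and optimizes with a refined Cross-Entropy Method.
  • results: Extensive empirical studies on challenging tasks from EvoGym show the approach's superior efficiency and generalization capability.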
    Abstract Multi-cellular robot design aims to create robots comprised of numerous cells that can be efficiently controlled to perform diverse tasks. Previous research has demonstrated the ability to generate robots for various tasks, but these approaches often optimize robots directly in the vast design space, resulting in robots with complicated morphologies that are hard to control. In response, this paper presents a novel coarse-to-fine method for designing multi-cellular robots. Initially, this strategy seeks optimal coarse-grained robots and progressively refines them. To mitigate the challenge of determining the precise refinement juncture during the coarse-to-fine transition, we introduce the Hyperbolic Embeddings for Robot Design (HERD) framework. HERD unifies robots of various granularity within a shared hyperbolic space and leverages a refined Cross-Entropy Method for optimization. This framework enables our method to autonomously identify areas of exploration in hyperbolic space and concentrate on regions demonstrating promise. Finally, the extensive empirical studies on various challenging tasks sourced from EvoGym show our approach's superior efficiency and generalization capability.
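    Summary Multi-cellular robot design aims to create robots composed of numerous cells that can be efficiently controlled to perform diverse tasks. Previous research has shown that robots can be generated for various tasks, but these approaches often optimize robots directly in the vast design space, resulting in complicated morphologies that are hard to control. In response, this paper presents a novel coarse-to-fine design method. The strategy first seeks optimal coarse-grained robots and then progressively refines them. To address the challenge of determining the precise point at which to refine during the coarse-to-fine transition, we introduce the Hyperbolic Embeddings for Robot Design (HERD) framework. HERD unifies robots of various granularity within a shared hyperbolic space and uses a refined Cross-Entropy Method for optimization. This framework lets our method autonomously identify areas of exploration in hyperbolic space and concentrate on promising regions. Finally, extensive empirical studies on various challenging tasks show that our approach has superior efficiency and generalization capability.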
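    For readers unfamiliar with hyperbolic embeddings, the sketch below computes the geodesic distance in the Poincaré ball, a standard model of hyperbolic space. Whether HERD uses this particular model is an assumption here; the abstract only says "shared hyperbolic space". The function name is hypothetical.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit ball:
    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))."""
    def sq_norm(x):
        return sum(c * c for c in x)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

origin = [0.0, 0.0]
near = poincare_distance(origin, [0.5, 0.0])  # equals ln(3)
far = poincare_distance(origin, [0.9, 0.0])   # grows quickly near the boundary
```

    Distances blow up towards the boundary of the ball, so there is room for many fine-grained items far from the origin while a few coarse items sit near it. That property is commonly cited as the reason hyperbolic space suits hierarchical data such as a coarse-to-fine family of designs.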

On the Opportunities of Green Computing: A Survey

  • paper_url: http://arxiv.org/abs/2311.00447
  • repo_url: None
  • paper_authors: You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Jin Zhao, Xiangyu Zhao, Ying WEI, Hong Qian, Qi Liu, Xiang Wang, Wai Kin, Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng
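  • for: This survey examines the technologies used in Green Computing and their role in AI.
  • methods: It gives a systematic overview and divides Green Computing into four key components: Measures of Greenness, Energy-Efficient AI, Energy-Efficient Computing Systems, and AI Use Cases for Sustainability.
  • results: The survey concludes that Green Computing has the potential to address the conflict between resource constraints and AI development, and encourages more researchers to work in this direction to make AI more environmentally friendly.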
    Abstract Artificial Intelligence (AI) has achieved significant advancements in technology and research over several decades of development, and is widely used in many areas including computer vision, natural language processing, time-series analysis, speech synthesis, etc. During the age of deep learning, especially with the rise of Large Language Models, a large majority of researchers' attention is paid to pursuing new state-of-the-art (SOTA) results, resulting in ever-increasing model size and computational complexity. The need for high computing power brings higher carbon emissions and undermines research fairness by preventing small or medium-sized research institutions and companies with limited funding from participating in research. To tackle the challenges of computing resources and the environmental impact of AI, Green Computing has become a hot research topic. In this survey, we give a systematic overview of the technologies used in Green Computing. We propose the framework of Green Computing and divide it into four key components: (1) Measures of Greenness, (2) Energy-Efficient AI, (3) Energy-Efficient Computing Systems and (4) AI Use Cases for Sustainability. For each component, we discuss the research progress made and the commonly used techniques to optimize AI efficiency. We conclude that this new research direction has the potential to address the conflicts between resource constraints and AI development. We encourage more researchers to pay attention to this direction and make AI more environmentally friendly.
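    Summary Artificial Intelligence (AI) has made significant progress in technology and research and is widely used in areas such as computer vision, natural language processing, time-series analysis, and speech synthesis. In the age of deep learning, and especially with the rise of Large Language Models, most researchers' attention has gone to pursuing new state-of-the-art (SOTA) results, which has steadily increased model size and computational complexity. The resulting need for high computing power raises carbon emissions and undermines research fairness, because small or medium-sized research institutions and companies with limited funding cannot take part. To tackle the challenges of computing resources and the environmental impact of AI, Green Computing has become a hot research topic. In this survey, we give a systematic overview of Green Computing and divide it into four key components: (1) Measures of Greenness, (2) Energy-Efficient AI, (3) Energy-Efficient Computing Systems, and (4) AI Use Cases for Sustainability. For each component, we discuss the research progress made and the techniques commonly used to optimize AI efficiency. We conclude that this new research direction has the potential to address the conflict between resource constraints and AI development. We encourage more researchers to pay attention to this direction and make AI more environmentally friendly.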

A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models

  • paper_url: http://arxiv.org/abs/2311.00445
  • repo_url: None
  • paper_authors: Tiwalayo Eisape, MH Tessler, Ishita Dasgupta, Fei Sha, Sjoerd van Steenkiste, Tal Linzen
  • for: investigate whether language models replicate human reasoning biases in logical inference
  • methods: using syllogisms to test the logical reasoning abilities of language models, comparing the performance of larger and smaller models and humans
  • results: larger models are more logical than smaller ones and humans, but all models make systematic errors and mimic human reasoning biases such as ordering effects and logical fallacies
    Abstract A central component of rational behavior is logical inference: the process of determining which conclusions follow from a set of premises. Psychologists have documented several ways in which humans' inferences deviate from the rules of logic. Do language models, which are trained on text generated by humans, replicate these biases, or are they able to overcome them? Focusing on the case of syllogisms -- inferences from two simple premises, which have been studied extensively in psychology -- we show that larger models are more logical than smaller ones, and also more logical than humans. At the same time, even the largest models make systematic errors, some of which mirror human reasoning biases such as ordering effects and logical fallacies. Overall, we find that language models mimic the human biases included in their training data, but are able to overcome them in some cases.
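    Summary A central component of rational behavior is logical inference: the process of determining which conclusions follow from a set of premises. Psychologists have documented ways in which human inferences deviate from the rules of logic. Do language models replicate these biases? Focusing on syllogisms, that is, inferences from two simple premises, we find that larger models are more logical than smaller ones, and also more logical than humans. However, even the largest models make systematic errors, some of which mirror human reasoning biases such as ordering effects and logical fallacies. Overall, language models pick up the human biases present in their training data, but are able to overcome them in some cases.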

Improving Robustness for Vision Transformer with a Simple Dynamic Scanning Augmentation

  • paper_url: http://arxiv.org/abs/2311.00441
  • repo_url: None
  • paper_authors: Shashank Kotyan, Danilo Vasconcellos Vargas
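  • for: Improving the accuracy and robustness of Vision Transformer (ViT) in computer vision tasks, particularly against adversarial attacks.
  • methods: The paper proposes an augmentation technique called "Dynamic Scanning Augmentation", which uses dynamic input sequences to adaptively focus on different patches, thereby maintaining performance and robustness.
  • results: Detailed tests across different types of adversarial attacks and on natural images show that this adaptability raises ViT's robustness from 17% to 92%. Of the four variants introduced, most also improve accuracy on natural images, with one variant showing comparable results.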
    Abstract Vision Transformer (ViT) has demonstrated promising performance in computer vision tasks, comparable to state-of-the-art neural networks. Yet, this new type of deep neural network architecture is vulnerable to adversarial attacks limiting its capabilities in terms of robustness. This article presents a novel contribution aimed at further improving the accuracy and robustness of ViT, particularly in the face of adversarial attacks. We propose an augmentation technique called `Dynamic Scanning Augmentation' that leverages dynamic input sequences to adaptively focus on different patches, thereby maintaining performance and robustness. Our detailed investigations reveal that this adaptability to the input sequence induces significant changes in the attention mechanism of ViT, even for the same image. We introduce four variations of Dynamic Scanning Augmentation, outperforming ViT in terms of both robustness to adversarial attacks and accuracy against natural images, with one variant showing comparable results. By integrating our augmentation technique, we observe a substantial increase in ViT's robustness, improving it from $17\%$ to $92\%$ measured across different types of adversarial attacks. These findings, together with other comprehensive tests, indicate that Dynamic Scanning Augmentation enhances accuracy and robustness by promoting a more adaptive type of attention. In conclusion, this work contributes to the ongoing research on Vision Transformers by introducing Dynamic Scanning Augmentation as a technique for improving the accuracy and robustness of ViT. The observed results highlight the potential of this approach in advancing computer vision tasks and merit further exploration in future studies.
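    Summary Vision Transformer (ViT), a new type of deep neural network architecture, performs well in computer vision tasks, but it is vulnerable to adversarial attacks, which limits its robustness. This article presents a contribution aimed at improving the accuracy and robustness of ViT, particularly against adversarial attacks. The proposed Dynamic Scanning Augmentation uses dynamic input sequences to adaptively focus on different patches, maintaining performance and robustness. Our detailed investigations show that this adaptability to the input sequence induces significant changes in ViT's attention mechanism, even for the same image. We introduce four variants of Dynamic Scanning Augmentation, which outperform ViT in both robustness to adversarial attacks and accuracy on natural images, with one variant showing comparable results. With the augmentation integrated, ViT's robustness improves from 17% to 92%, measured across different types of adversarial attacks. These results, together with other comprehensive tests, indicate that Dynamic Scanning Augmentation improves the accuracy and robustness of ViT, and the approach merits further exploration in future studies.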

Enhanced Generalization through Prioritization and Diversity in Self-Imitation Reinforcement Learning over Procedural Environments with Sparse Rewards

  • paper_url: http://arxiv.org/abs/2311.00426
  • repo_url: None
  • paper_authors: Alain Andres, Daochen Zha, Javier Del Ser
  • for: The paper is written to address the challenge of exploration in Reinforcement Learning (RL) with sparse rewards, specifically in procedurally-generated (PCG) environments.
  • methods: The paper proposes tailored self-Imitation Learning (self-IL) sampling strategies that prioritize transitions based on different criteria and address diversity loss through modifications to counteract the impact of generalization requirements and bias introduced by prioritization techniques.
  • results: The paper achieves a new state-of-the-art performance in the MiniGrid-MultiRoom-N12-S10 environment through experimental analysis conducted over three PCG sparse reward environments, including MiniGrid and ProcGen.
    Abstract Exploration poses a fundamental challenge in Reinforcement Learning (RL) with sparse rewards, limiting an agent's ability to learn optimal decision-making due to a lack of informative feedback signals. Self-Imitation Learning (self-IL) has emerged as a promising approach for exploration, leveraging a replay buffer to store and reproduce successful behaviors. However, traditional self-IL methods, which rely on high-return transitions and assume singleton environments, face challenges in generalization, especially in procedurally-generated (PCG) environments. Therefore, new self-IL methods have been proposed to rank which experiences to persist, but they replay transitions uniformly regardless of their significance, and do not address the diversity of the stored demonstrations. In this work, we propose tailored self-IL sampling strategies by prioritizing transitions in different ways and extending prioritization techniques to PCG environments. We also address diversity loss through modifications to counteract the impact of generalization requirements and bias introduced by prioritization techniques. Our experimental analysis, conducted over three PCG sparse reward environments, including MiniGrid and ProcGen, highlights the benefits of our proposed modifications, achieving a new state-of-the-art performance in the MiniGrid-MultiRoom-N12-S10 environment.
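    The difference between uniform replay and prioritized replay of stored transitions can be sketched in a few lines. The priority criterion itself (for example episode return or novelty) is left abstract; the exponent `alpha`, the function name, and the use of `random.choices` are illustrative assumptions rather than the paper's sampling strategy.

```python
import random

def sample_prioritized(priorities, batch_size, alpha=1.0, rng=random):
    """Draw buffer indices with probability proportional to priority**alpha.

    alpha = 0 recovers uniform replay; larger alpha concentrates sampling
    on high-priority transitions.
    """
    weights = [p ** alpha for p in priorities]
    return rng.choices(range(len(priorities)), weights=weights, k=batch_size)

rng = random.Random(0)
# Only the third transition has non-zero priority, so it is always drawn.
batch = sample_prioritized([0.0, 0.0, 1.0], batch_size=5, rng=rng)
```

    Strong prioritization tends to replay the same few demonstrations repeatedly, which is the kind of diversity loss and bias the abstract says the proposed modifications are meant to counteract.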

Neural Implicit Field Editing Considering Object-environment Interaction

  • paper_url: http://arxiv.org/abs/2311.00425
  • repo_url: None
  • paper_authors: Zhihong Zeng, Zongji Wang, Yuanben Zhang, Weinan Cai, Zehao Cao, Lili Zhang, Yan Guo, Yanhong Zhang, Junyi Liu
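  • for: This paper proposes a 3D scene editing method based on neural implicit fields that addresses a gap in existing methods, which do not adequately account for the interaction between objects and the scene environment.
  • methods: The method is a two-stream neural rendering system that considers object and scene environment interaction. To obtain illumination conditions from the "mixture soup", the system separates the interaction between objects and the scene environment with an intrinsic decomposition method. It also introduces a depth-map-guided scene inpainting method and a shadow rendering method based on a point matching strategy.
  • results: The method produces reasonable appearance changes in scene editing tasks and achieves competitive rendering quality in novel-view synthesis tasks.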
    Abstract 3D scene editing methods based on neural implicit fields have gained wide attention and have achieved excellent results in 3D editing tasks. However, existing methods often blend the interaction between objects and the scene environment, so changes in scene appearance, such as shadows, fail to appear in the rendered view. In this paper, we propose an Object and Scene environment Interaction aware (OSI-aware) system, a novel two-stream neural rendering system that considers object and scene environment interaction. To obtain illumination conditions from the mixture soup, the system separates the interaction between objects and the scene environment using an intrinsic decomposition method. To study the corresponding changes to scene appearance in object-level editing tasks, we introduce a depth-map-guided scene inpainting method and a shadow rendering method based on a point matching strategy. Extensive experiments demonstrate that our pipeline produces reasonable appearance changes in scene editing tasks. It also achieves competitive rendering quality in novel-view synthesis tasks.
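    Summary 3D scene editing methods based on neural implicit fields have gained wide attention and perform well in 3D editing tasks. However, existing methods often blend the interaction between objects and the scene environment, so changes in scene appearance, such as shadows, are not displayed correctly in the rendered view. In this paper, we propose an Object and Scene environment Interaction aware (OSI-aware) system, a new two-stream neural rendering system that considers the interaction between objects and the scene environment. To obtain illumination conditions from the mixture, we separate the interaction between objects and the scene environment using an intrinsic decomposition method. To study the corresponding changes in scene appearance under object-level editing, we introduce a depth-map-guided scene inpainting method and a shadow rendering method that uses a point matching strategy. Extensive experiments show that our pipeline produces reasonable appearance changes in scene editing tasks and reaches competitive rendering quality in novel-view synthesis tasks.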

Couples can be tractable: New algorithms and hardness results for the Hospitals / Residents problem with Couples

  • paper_url: http://arxiv.org/abs/2311.00405
  • repo_url: None
  • paper_authors: Gergely Csáji, David Manlove, Iain McBride, James Trimble
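  • for: The paper studies the Hospitals / Residents problem with Couples (HRC), where a solution is a stable matching or a report that none exists.
  • methods: A new polynomial-time algorithm finds a near-feasible stable matching (adjusting hospital capacities by at most 1) in HRC instances where the couples' preferences are sub-responsive (if one member switches to a better hospital, the couple also improves) and sub-complete (every pair of hospitals individually acceptable to both members is jointly acceptable to the couple). It works by reduction to the Stable Fixtures problem.
  • results: HRC is solvable in polynomial time for sub-responsive, sub-complete instances that are a Dual Market or in which all couples belong to one of several possible types. The algorithm also implies polynomial-time solvability of a stable b-matching problem on multigraphs with loops. On the hardness side, HRC with sub-responsive and sub-complete couples is NP-hard even under other strong restrictions, and minimising the number of blocking pairs is not approximable within $m^{1-\varepsilon}$ unless P=NP.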
    Abstract In this paper we study the Hospitals / Residents problem with Couples (HRC), where a solution is a stable matching or a report that none exists. We present a novel polynomial-time algorithm that can find a near-feasible stable matching (adjusting the hospitals' capacities by at most 1) in an HRC instance where the couples' preferences are sub-responsive (i.e., if one member switches to a better hospital, then the couple also improves) and sub-complete (i.e., each pair of hospitals that are individually acceptable to both members are jointly acceptable for the couple) by reducing it to an instance of the Stable Fixtures problem. We also present a polynomial-time algorithm for HRC in a sub-responsive, sub-complete instance that is a Dual Market, or where all couples are one of several possible types. We show that our algorithm also implies the polynomial-time solvability of a stable b-matching problem, where the underlying graph is a multigraph with loops. We complement our algorithms with several hardness results. We show that HRC with sub-responsive and sub-complete couples is NP-hard, even with other strong restrictions. We also show that HRC with a Dual Market is NP-hard under several simultaneous restrictions. Finally, we show that the problem of finding a matching with the minimum number of blocking pairs in HRC is not approximable within $m^{1-\varepsilon}$, for any $\varepsilon>0$, where $m$ is the total length of the hospitals' preference lists, unless P=NP, even if each couple applies to only one pair of hospitals. Our polynomial-time solvability results greatly expand the class of known tractable instances of HRC and provide additional evidence as to why long-standing entry-level labour markets that allow couples such as the National Resident Matching Program remain successful to this day.
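The hardness result above is stated in terms of blocking pairs. As a rough illustration of that notion only, the sketch below finds blocking pairs in the classical Hospitals / Residents setting without couples. This is a deliberate simplification and not the paper's algorithm; the function name and data layout are assumptions made for the example.

```python
def blocking_pairs(res_prefs, hosp_prefs, capacity, matching):
    """Return blocking (resident, hospital) pairs in classical HR, no couples.

    res_prefs / hosp_prefs: dicts of preference lists, best first.
    capacity: dict hospital -> number of posts.
    matching: dict resident -> hospital (or None if unmatched).
    """
    assigned = {h: [r for r, mh in matching.items() if mh == h]
                for h in hosp_prefs}
    pairs = []
    for r, prefs in res_prefs.items():
        current = matching.get(r)
        for h in prefs:
            if h == current:
                break  # every later hospital is worse than r's current one
            if r not in hosp_prefs[h]:
                continue  # h finds r unacceptable
            rank = hosp_prefs[h].index
            undersubscribed = len(assigned[h]) < capacity[h]
            prefers_r = any(rank(r) < rank(a) for a in assigned[h])
            if undersubscribed or prefers_r:
                pairs.append((r, h))
    return pairs


res_prefs = {"r1": ["h1", "h2"], "r2": ["h1", "h2"]}
hosp_prefs = {"h1": ["r1", "r2"], "h2": ["r1", "r2"]}
capacity = {"h1": 1, "h2": 1}
print(blocking_pairs(res_prefs, hosp_prefs, capacity, {"r1": "h2", "r2": "h1"}))
```

A matching is stable exactly when this list is empty. With couples, a blocking pair can involve a couple and a pair of hospitals, which is what makes the general problem hard.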

A Spatial-Temporal Transformer based Framework For Human Pose Assessment And Correction in Education Scenarios

  • paper_url: http://arxiv.org/abs/2311.00401
  • repo_url: None
  • paper_authors: Wenyang Hu, Kai Liu, Libin Liu, Huiliang Shang
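  • for: The paper proposes a Spatial-Temporal Transformer based Framework (STTF) for assessing and correcting students' poses in education scenarios such as physical exercises and science experiments.
  • methods: The framework comprises skeletal tracking, pose estimation, posture assessment, and posture correction modules that give students professional, quick-to-fix feedback.
  • results: The model effectively measures and comments on the quality of students' actions. STTF uses transformer models to capture spatial and temporal dependencies in human poses, enabling accurate assessment and effective correction.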
    Abstract Human pose assessment and correction play a crucial role in applications across various fields, including computer vision, robotics, sports analysis, healthcare, and entertainment. In this paper, we propose a Spatial-Temporal Transformer based Framework (STTF) for human pose assessment and correction in education scenarios such as physical exercises and science experiments. The framework comprises skeletal tracking, pose estimation, posture assessment, and posture correction modules that educate students with professional, quick-to-fix feedback. We also create a pose correction method to provide corrective feedback in the form of visual aids. We test the framework with our own dataset. It comprises (a) new recordings of five exercises, (b) existing recordings found on the internet of the same exercises, and (c) corrective feedback on the recordings by professional athletes and teachers. Results show that our model can effectively measure and comment on the quality of students' actions. The STTF leverages the power of transformer models to capture spatial and temporal dependencies in human poses, enabling accurate assessment and effective correction of students' movements.

Augmenting deep neural networks with symbolic knowledge: Towards trustworthy and interpretable AI for education

  • paper_url: http://arxiv.org/abs/2311.00393
  • repo_url: None
  • paper_authors: Danial Hooshyar, Roger Azevedo, Yeongwook Yang
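  • for: The study examines the limitations of artificial neural networks (ANNs) in educational applications and proposes a neural-symbolic AI approach to increase their educational potential.
  • methods: It adapts a neural-symbolic AI framework and develops an approach called NSAI, which injects educational knowledge into deep neural networks and extracts it from them, to model learners' computational thinking.
  • results: NSAI generalises better than deep neural networks trained only on the training data, and better than networks trained on data augmented by SMOTE and autoencoder methods. It prioritises robust representations that capture causal relationships, which helps avoid spurious correlations and control biases in the training data, and it allows rules to be extracted from the learned network for interpretation.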
    Abstract Artificial neural networks (ANNs) have shown to be amongst the most important artificial intelligence (AI) techniques in educational applications, providing adaptive educational services. However, their educational potential is limited in practice due to three major challenges: i) difficulty in incorporating symbolic educational knowledge (e.g., causal relationships, and practitioners' knowledge) in their development, ii) learning and reflecting biases, and iii) lack of interpretability. Given the high-risk nature of education, the integration of educational knowledge into ANNs becomes crucial for developing AI applications that adhere to essential educational restrictions, and provide interpretability over the predictions. This research argues that the neural-symbolic family of AI has the potential to address the named challenges. To this end, it adapts a neural-symbolic AI framework and accordingly develops an approach called NSAI, that injects and extracts educational knowledge into and from deep neural networks, for modelling learners computational thinking. Our findings reveal that the NSAI approach has better generalizability compared to deep neural networks trained merely on training data, as well as training data augmented by SMOTE and autoencoder methods. More importantly, unlike the other models, the NSAI approach prioritises robust representations that capture causal relationships between input features and output labels, ensuring safety in learning to avoid spurious correlations and control biases in training data. Furthermore, the NSAI approach enables the extraction of rules from the learned network, facilitating interpretation and reasoning about the path to predictions, as well as refining the initial educational knowledge. These findings imply that neural-symbolic AI can overcome the limitations of ANNs in education, enabling trustworthy and interpretable applications.
    Summary The educational potential of ANNs is limited by three challenges:
  1. Difficulty in incorporating symbolic educational knowledge (e.g., causal relationships, practitioners' knowledge) in their development.
  2. Learning and reflecting biases.
  3. Lack of interpretability.
    To address these challenges, this research advocates the use of neural-symbolic AI, which has the potential to integrate educational knowledge into ANNs and provide interpretability over the predictions. The proposed approach, called NSAI, injects and extracts educational knowledge into and from deep neural networks, enabling the modelling of learners' computational thinking.
    The results show that the NSAI approach has better generalizability compared to deep neural networks trained merely on training data, as well as training data augmented by SMOTE and autoencoder methods. Additionally, the NSAI approach prioritizes robust representations that capture causal relationships between input features and output labels, ensuring safety in learning by avoiding spurious correlations and controlling biases in training data.
    Moreover, the NSAI approach enables the extraction of rules from the learned network, facilitating interpretation and reasoning about the path to predictions, as well as refining the initial educational knowledge. These findings suggest that neural-symbolic AI can overcome the limitations of ANNs in education, enabling trustworthy and interpretable applications.

Will Code Remain a Relevant User Interface for End-User Programming with Generative AI Models?

  • paper_url: http://arxiv.org/abs/2311.00382
  • repo_url: None
  • paper_authors: Advait Sarkar
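  • for: The essay asks to what extent "traditional" programming languages remain relevant for non-expert end-user programmers in a world with generative AI.
  • methods: It is an argumentative essay rather than an empirical study. It outlines reasons why traditional programming languages may remain useful and speculates whether each reason is fundamental and enduring or may disappear as generative AI improves.
  • results: It posits the "generative shift hypothesis": generative AI will create qualitative and quantitative expansions in the traditional scope of end-user programming. It also sets out implications for end-user programming research, including revisiting core concepts such as Ko's learning barriers and Blackwell's attention investment model.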
    Abstract The research field of end-user programming has largely been concerned with helping non-experts learn to code sufficiently well in order to achieve their tasks. Generative AI stands to obviate this entirely by allowing users to generate code from naturalistic language prompts. In this essay, we explore the extent to which "traditional" programming languages remain relevant for non-expert end-user programmers in a world with generative AI. We posit the "generative shift hypothesis": that generative AI will create qualitative and quantitative expansions in the traditional scope of end-user programming. We outline some reasons that traditional programming languages may still be relevant and useful for end-user programmers. We speculate whether each of these reasons might be fundamental and enduring, or whether they may disappear with further improvements and innovations in generative AI. Finally, we articulate a set of implications for end-user programming research, including the possibility of needing to revisit many well-established core concepts, such as Ko's learning barriers and Blackwell's attention investment model.

Architecture of Data Anomaly Detection-Enhanced Decentralized Expert System for Early-Stage Alzheimer’s Disease Prediction

  • paper_url: http://arxiv.org/abs/2311.00373
  • repo_url: None
  • paper_authors: Stefan Kambiz Behfar, Qumars Behfar, Marzie Hosseinpour
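  • for: The study targets early and accurate detection of Alzheimer's Disease to improve patient outcomes.
  • methods: It introduces a decentralized expert system that combines blockchain technology with AI and integrates robust anomaly detection for patient-submitted MRI data. Image correctness and quality checks run off-chain, and the blockchain records the results.
  • results: According to the authors, the system provides more precise early-stage Alzheimer's Disease predictions while protecting data integrity and patient privacy. The abstract reports no quantitative results.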
    Abstract Alzheimer's Disease is a global health challenge that requires early and accurate detection to improve patient outcomes. Magnetic Resonance Imaging (MRI) holds significant diagnostic potential, but its effective analysis remains a formidable task. This study introduces a groundbreaking decentralized expert system that cleverly combines blockchain technology with Artificial Intelligence (AI) to integrate robust anomaly detection for patient-submitted data. Traditional diagnostic methods often lead to delayed and imprecise predictions, especially in the early stages of the disease. Centralized data repositories struggle to manage the immense volumes of MRI data, and persistent privacy concerns hinder collaborative efforts. Our innovative solution harnesses decentralization to protect data integrity and patient privacy, facilitated by blockchain technology. It not only emphasizes AI-driven MRI analysis but also incorporates a sophisticated data anomaly detection architecture. These mechanisms scrutinize patient-contributed data for various issues, including data quality problems and atypical findings within MRI images. Conducting an exhaustive check of MRI image correctness and quality directly on the blockchain is impractical due to computational complexity and cost constraints. Typically, such checks are performed off-chain, and the blockchain securely records the results. This comprehensive approach empowers our decentralized app to provide more precise early-stage Alzheimer's Disease predictions. By merging the strengths of blockchain, AI, and anomaly detection, our system represents a pioneering step towards revolutionizing disease diagnostics.
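The abstract says that image checks are performed off-chain and that the blockchain only records the results. A minimal sketch of that split follows. The quality check itself, the record layout, and the use of a SHA-256 digest to tie the result to the image are all assumptions made for illustration; the paper does not specify them here.

```python
import hashlib


def offchain_quality_check(pixels, min_value=0, max_value=255):
    """Hypothetical off-chain check: image is non-empty and intensities are in range."""
    passed = len(pixels) > 0 and all(min_value <= p <= max_value for p in pixels)
    return {"passed": passed, "num_pixels": len(pixels)}


def record_for_chain(image_bytes, check_result):
    """Build the compact record to store on-chain: a digest plus the check result.

    The image itself stays off-chain; only its SHA-256 digest is recorded,
    so the result can later be matched to the exact image that was checked.
    """
    return {
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "check": check_result,
    }


pixels = [0, 128, 255]
record = record_for_chain(bytes(pixels), offchain_quality_check(pixels))
print(record["check"]["passed"], len(record["image_sha256"]))
```

Keeping only a fixed-size digest on-chain is one common way to avoid the computational and cost constraints the abstract mentions.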

Prompt-based Logical Semantics Enhancement for Implicit Discourse Relation Recognition

  • paper_url: http://arxiv.org/abs/2311.00367
  • repo_url: https://github.com/lalalamdbf/plse_idrr
  • paper_authors: Chenxu Wang, Ping Jian, Mu Huang
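  • for: The paper addresses Implicit Discourse Relation Recognition (IDRR) and proposes a Prompt-based Logical Semantics Enhancement (PLSE) method to improve its performance and robustness.
  • methods: PLSE injects knowledge relevant to discourse relations into pre-trained language models through prompt-based connective prediction. Because this prediction exhibits local dependencies, the authors add a self-supervised learning objective based on mutual information maximization to derive enhanced representations of logical semantics.
  • results: Experiments on the PDTB 2.0 and CoNLL16 datasets show that PLSE achieves outstanding and consistent performance against the current state-of-the-art models.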
    Abstract Implicit Discourse Relation Recognition (IDRR), which infers discourse relations without the help of explicit connectives, is still a crucial and challenging task for discourse parsing. Recent works tend to exploit the hierarchical structure information from the annotated senses, which demonstrate enhanced discourse relation representations can be obtained by integrating sense hierarchy. Nevertheless, the performance and robustness for IDRR are significantly constrained by the availability of annotated data. Fortunately, there is a wealth of unannotated utterances with explicit connectives, that can be utilized to acquire enriched discourse relation features. In light of such motivation, we propose a Prompt-based Logical Semantics Enhancement (PLSE) method for IDRR. Essentially, our method seamlessly injects knowledge relevant to discourse relation into pre-trained language models through prompt-based connective prediction. Furthermore, considering the prompt-based connective prediction exhibits local dependencies due to the deficiency of masked language model (MLM) in capturing global semantics, we design a novel self-supervised learning objective based on mutual information maximization to derive enhanced representations of logical semantics for IDRR. Experimental results on PDTB 2.0 and CoNLL16 datasets demonstrate that our method achieves outstanding and consistent performance against the current state-of-the-art models.

Rethinking Samples Selection for Contrastive Learning: Mining of Potential Samples

  • paper_url: http://arxiv.org/abs/2311.00358
  • repo_url: None
  • paper_authors: Hengkui Dong, Xianzhong Long, Yun Li
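  • for: The study rethinks how samples are mined in contrastive learning in order to improve self-supervised representation learning.
  • methods: The approach has two parts. For positive samples, it considers both augmented views obtained by data augmentation and views mined from the data, and combines them using soft and hard weighting strategies. For negative samples, it analyses them from the gradient perspective and mines negatives that are neither too hard nor too easy as potential negative samples.
  • results: Experiments on CIFAR10, CIFAR100, and TinyImagenet show clear advantages over some traditional self-supervised methods, with top-1 accuracy of 88.57%, 61.10%, and 36.69% respectively.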
    Abstract Contrastive learning predicts whether two images belong to the same category by training a model to make their feature representations as close or as far away as possible. In this paper, we rethink how to mine samples in contrastive learning, unlike other methods, our approach is more comprehensive, taking into account both positive and negative samples, and mining potential samples from two aspects: First, for positive samples, we consider both the augmented sample views obtained by data augmentation and the mined sample views through data mining. Then, we weight and combine them using both soft and hard weighting strategies. Second, considering the existence of uninformative negative samples and false negative samples in the negative samples, we analyze the negative samples from the gradient perspective and finally mine negative samples that are neither too hard nor too easy as potential negative samples, i.e., those negative samples that are close to positive samples. The experiments show the obvious advantages of our method compared with some traditional self-supervised methods. Our method achieves 88.57%, 61.10%, and 36.69% top-1 accuracy on CIFAR10, CIFAR100, and TinyImagenet, respectively.
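The idea of keeping negatives that are "neither too hard nor too easy" can be sketched as a similarity band filter placed in front of an InfoNCE-style loss. The thresholds `low` and `high`, the use of cosine similarity as the hardness measure, and the function names are assumptions for illustration; the paper derives its selection from a gradient analysis that is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_negatives(anchor, negatives, low=0.2, high=0.8):
    """Return indices of negatives whose similarity to the anchor is in [low, high].

    Very similar negatives (possible false negatives) and very dissimilar
    ones (uninformative) are both dropped. The band limits are hypothetical.
    """
    return [i for i, n in enumerate(negatives)
            if low <= cosine(anchor, n) <= high]


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor, one positive and a list of negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.0]
candidates = [[1.0, 0.05], [1.0, 1.0], [-1.0, 0.0]]
kept = select_negatives(anchor, candidates)
print(kept, info_nce(anchor, [1.0, 0.1], [candidates[i] for i in kept]))
```

In this toy example only the middle candidate survives: the first is almost identical to the anchor and the third points in the opposite direction.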

QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00356
  • repo_url: None
  • paper_authors: Rizhong Wang, Huiping Li, Di Cui, Demin Xu
  • for: The paper proposes a universal value function factorization method for multi-agent reinforcement learning (MARL) that satisfies the individual-global-max (IGM) principle without imposing additional limitations on the IGM function class.
  • methods: The paper develops mathematically equivalent conditions of the IGM principle based on the advantage function, and establishes a more expressive mixing network architecture that can fulfill the equivalent factorization. A novel loss function treats the equivalent conditions as a regularization term during policy evaluation in the MARL algorithm.
  • results: The proposed method, called QFree, is verified in a nonmonotonic matrix game scenario and achieves state-of-the-art performance in a general-purpose complex MARL benchmark environment, the Starcraft Multi-Agent Challenge (SMAC).
    Abstract Centralized training is widely utilized in the field of multi-agent reinforcement learning (MARL) to assure the stability of training process. Once a joint policy is obtained, it is critical to design a value function factorization method to extract optimal decentralized policies for the agents, which needs to satisfy the individual-global-max (IGM) principle. While imposing additional limitations on the IGM function class can help to meet the requirement, it comes at the cost of restricting its application to more complex multi-agent environments. In this paper, we propose QFree, a universal value function factorization method for MARL. We start by developing mathematical equivalent conditions of the IGM principle based on the advantage function, which ensures that the principle holds without any compromise, removing the conservatism of conventional methods. We then establish a more expressive mixing network architecture that can fulfill the equivalent factorization. In particular, the novel loss function is developed by considering the equivalent conditions as regularization term during policy evaluation in the MARL algorithm. Finally, the effectiveness of the proposed method is verified in a nonmonotonic matrix game scenario. Moreover, we show that QFree achieves the state-of-the-art performance in a general-purpose complex MARL benchmark environment, Starcraft Multi-Agent Challenge (SMAC).
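The IGM principle requires that the greedy joint action under the joint value function coincides with the tuple of each agent's greedy individual action. The sketch below checks that condition on a small two-agent matrix game. The payoff matrix and the individual value tables are illustrative numbers chosen for the example, not values from the paper, and this check is not the QFree method itself.

```python
from itertools import product


def satisfies_igm(joint_q, individual_qs):
    """Check the IGM condition for tabular values.

    joint_q: dict mapping joint-action tuples to joint values.
    individual_qs: one list of per-action values for each agent.
    Returns True if the agents' individually greedy actions form a
    joint action that attains the maximum joint value.
    """
    greedy = tuple(max(range(len(q)), key=q.__getitem__) for q in individual_qs)
    return joint_q[greedy] == max(joint_q.values())


# A nonmonotonic payoff matrix: the best joint action (0, 0) is
# surrounded by heavily penalised miscoordination.
payoff = [[8, -12, -12], [-12, 0, 0], [-12, 0, 0]]
joint_q = {(a1, a2): payoff[a1][a2] for a1, a2 in product(range(3), repeat=2)}

consistent = [[1.0, 0.5, 0.5], [1.0, 0.5, 0.5]]        # hypothetical utilities
inconsistent = [[-6.0, -4.0, -5.0], [-6.0, -4.0, -5.0]]  # hypothetical utilities
print(satisfies_igm(joint_q, consistent), satisfies_igm(joint_q, inconsistent))
```

The second set of utilities steers both agents to action 1, whose joint value is 0 rather than 8, so decentralized greedy execution would miss the optimum.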

tmn at #SMM4H 2023: Comparing Text Preprocessing Techniques for Detecting Tweets Self-reporting a COVID-19 Diagnosis

  • paper_url: http://arxiv.org/abs/2311.00732
  • repo_url: None
  • paper_authors: Anna Glazkova
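  • for: The paper describes a system developed for Task 1 at SMM4H 2023, which automatically distinguishes tweets that self-report a COVID-19 diagnosis from those that do not.
  • methods: It compares different text preprocessing techniques for tweets using four fine-tuned transformer-based models.
  • results: The ensemble of fine-tuned language models obtained an F1-score of 84.5%, which is 4.1% higher than the average value.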
    Abstract The paper describes a system developed for Task 1 at SMM4H 2023. The goal of the task is to automatically distinguish tweets that self-report a COVID-19 diagnosis (for example, a positive test, clinical diagnosis, or hospitalization) from those that do not. We investigate the use of different techniques for preprocessing tweets using four transformer-based models. The ensemble of fine-tuned language models obtained an F1-score of 84.5%, which is 4.1% higher than the average value.
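To make the reported metric concrete, the sketch below combines the binary predictions of several models and scores the result with F1 on the positive class. Majority voting is an assumption: the abstract does not say how the ensemble combines its members. The labels are toy data.

```python
def majority_vote(predictions):
    """Combine per-model 0/1 label lists; a tie counts as the positive class."""
    n_models = len(predictions)
    return [1 if 2 * sum(votes) >= n_models else 0
            for votes in zip(*predictions)]


def f1_score(y_true, y_pred):
    """F1 of the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


model_preds = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
]
ensemble = majority_vote(model_preds)
print(ensemble, f1_score([1, 0, 1, 0], ensemble))
```

With four models, ties are possible, so the tie rule matters; resolving ties toward the positive class trades precision for recall.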

A Definition of Open-Ended Learning Problems for Goal-Conditioned Agents

  • paper_url: http://arxiv.org/abs/2311.00344
  • repo_url: None
  • paper_authors: Olivier Sigaud, Gianluca Baldassarre, Cedric Colas, Stephane Doncieux, Richard Duro, Nicolas Perrin-Gilbert, Vieri Giuliano Santucci
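  • for: The paper clarifies what open-ended learning means and how it differs from related concepts such as continual learning, lifelong learning, and autotelic learning, by isolating a key elementary property of open-ended processes.
  • methods: It is a conceptual analysis. It traces the genealogy of the concept, then defines the elementary property as always producing novel elements from time to time over an infinite horizon, and builds the notion of open-ended learning problems on it, focusing on open-ended goal-conditioned reinforcement learning.
  • results: The paper offers a definition of open-ended learning problems based on this property and highlights the work that remains to bridge the gap between the elementary definition and the more involved notions that developmental AI researchers may have in mind.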
    Abstract A lot of recent machine learning research papers have "Open-ended learning" in their title. But very few of them attempt to define what they mean when using the term. Even worse, when looking more closely there seems to be no consensus on what distinguishes open-ended learning from related concepts such as continual learning, lifelong learning or autotelic learning. In this paper, we contribute to fixing this situation. After illustrating the genealogy of the concept and more recent perspectives about what it truly means, we outline that open-ended learning is generally conceived as a composite notion encompassing a set of diverse properties. In contrast with these previous approaches, we propose to isolate a key elementary property of open-ended processes, which is to always produce novel elements from time to time over an infinite horizon. From there, we build the notion of open-ended learning problems and focus in particular on the subset of open-ended goal-conditioned reinforcement learning problems, as this framework facilitates the definition of learning a growing repertoire of skills. Finally, we highlight the work that remains to be performed to fill the gap between our elementary definition and the more involved notions of open-ended learning that developmental AI researchers may have in mind.

MetisFL: An Embarrassingly Parallelized Controller for Scalable & Efficient Federated Learning Workflows

  • paper_url: http://arxiv.org/abs/2311.00334
  • repo_url: None
  • paper_authors: Dimitris Stripelis, Chrysovalantis Anastasiou, Patrick Toral, Armaghan Asghar, Jose Luis Ambite
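  • for: The study aims to improve the scalability of the federation controller in Federated Learning (FL) systems.
  • methods: It presents MetisFL, a new FL system in which the federation controller is the first-class citizen. MetisFL re-engineers all operations conducted by the controller to accelerate the training of large-scale FL workflows.
  • results: A quantitative comparison against other state-of-the-art FL systems shows that MetisFL yields a 10-fold wall-clock time execution boost across a wide range of challenging FL workflows with increasing model sizes and federation sites.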
    Abstract A Federated Learning (FL) system typically consists of two core processing entities: the federation controller and the learners. The controller is responsible for managing the execution of FL workflows across learners and the learners for training and evaluating federated models over their private datasets. While executing an FL workflow, the FL system has no control over the computational resources or data of the participating learners. Still, it is responsible for other operations, such as model aggregation, task dispatching, and scheduling. These computationally heavy operations generally need to be handled by the federation controller. Even though many FL systems have been recently proposed to facilitate the development of FL workflows, most of these systems overlook the scalability of the controller. To meet this need, we designed and developed a novel FL system called MetisFL, where the federation controller is the first-class citizen. MetisFL re-engineers all the operations conducted by the federation controller to accelerate the training of large-scale FL workflows. By quantitatively comparing MetisFL against other state-of-the-art FL systems, we empirically demonstrate that MetisFL leads to a 10-fold wall-clock time execution boost across a wide range of challenging FL workflows with increasing model sizes and federation sites.
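Model aggregation is one of the controller-side operations the abstract names. As a rough sketch of what such an operation can look like, the function below computes an example-weighted average of learner parameters, in the style of federated averaging. The aggregation rule and the flat-list parameter layout are assumptions; the abstract does not state which rule MetisFL applies or how it parallelizes it.

```python
def federated_average(learner_weights, num_examples):
    """Average learner parameter vectors, weighted by local dataset size.

    learner_weights: list of equal-length parameter lists, one per learner.
    num_examples: number of training examples held by each learner.
    """
    total = sum(num_examples)
    n_params = len(learner_weights[0])
    return [
        sum(w[i] * n for w, n in zip(learner_weights, num_examples)) / total
        for i in range(n_params)
    ]


print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Each output parameter depends only on the same index across learners, so the work splits cleanly across parameters, which is the kind of structure a controller can exploit to parallelize aggregation.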

Robust Graph Clustering via Meta Weighting for Noisy Graphs

  • paper_url: http://arxiv.org/abs/2311.00322
  • repo_url: https://github.com/hyeonsoojo/metagc
  • paper_authors: Hyeonsoo Jo, Fanchen Bu, Kijung Shin
  • for: robustly clustering graphs with noise edges
  • methods: using a decomposable clustering loss function and meta-weighting to adaptively adjust node pair weights
  • results: outperforms state-of-the-art GNN-based competitors on five real-world graphs under varying levels of noise
    Abstract How can we find meaningful clusters in a graph robustly against noise edges? Graph clustering (i.e., dividing nodes into groups of similar ones) is a fundamental problem in graph analysis with applications in various fields. Recent studies have demonstrated that graph neural network (GNN) based approaches yield promising results for graph clustering. However, we observe that their performance degenerates significantly on graphs with noise edges, which are prevalent in practice. In this work, we propose MetaGC for robust GNN-based graph clustering. MetaGC employs a decomposable clustering loss function, which can be rephrased as a sum of losses over node pairs. We add a learnable weight to each node pair, and MetaGC adaptively adjusts the weights of node pairs using meta-weighting so that the weights of meaningful node pairs increase and the weights of less-meaningful ones (e.g., noise edges) decrease. We show empirically that MetaGC learns weights as intended and consequently outperforms the state-of-the-art GNN-based competitors, even when they are equipped with separate denoising schemes, on five real-world graphs under varying levels of noise. Our code and datasets are available at https://github.com/HyeonsooJo/MetaGC.
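The abstract describes a clustering loss that decomposes into a sum over node pairs, with one learnable weight per pair. The sketch below shows only that weighted sum. The per-pair loss used here (negative log-probability that two nodes share a cluster under soft assignments) is a placeholder chosen for the example, and the meta-weighting step that learns the weights is not implemented.

```python
import math


def weighted_pair_loss(assignments, pairs, weights):
    """Sum over node pairs of weight * per-pair loss.

    assignments: soft cluster assignment (probability list) per node.
    pairs: list of (i, j) node index pairs, e.g. the graph's edges.
    weights: one non-negative weight per pair.
    """
    total = 0.0
    for (i, j), w in zip(pairs, weights):
        same_cluster = sum(a * b for a, b in zip(assignments[i], assignments[j]))
        # Placeholder per-pair loss; clamp to avoid log(0).
        total += w * -math.log(max(same_cluster, 1e-12))
    return total


assignments = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
edges = [(0, 1), (0, 2)]  # suppose (0, 2) is a noise edge
print(weighted_pair_loss(assignments, edges, [1.0, 1.0]))
print(weighted_pair_loss(assignments, edges, [1.0, 0.1]))
```

Lowering the weight of the suspected noise edge reduces its pull on the clustering objective, which is the effect the learned weights are meant to achieve.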

Unsupervised Lexical Simplification with Context Augmentation

  • paper_url: http://arxiv.org/abs/2311.00310
  • repo_url: https://github.com/twadada/lexsub_decontextualised
  • paper_authors: Takashi Wada, Timothy Baldwin, Jey Han Lau
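  • for: The paper proposes a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models.
  • methods: Given a target word and its context, the method generates substitutes based on the target context and on additional contexts sampled from monolingual data.
  • results: On the TSAR-2022 shared task in English, Portuguese, and Spanish, the model substantially outperforms other unsupervised systems in all languages, and sets a new state of the art when ensembled with GPT-3.5. It also achieves a state-of-the-art result on the SWORDS lexical substitution dataset.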
    Abstract We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models. Given a target word and its context, our method generates substitutes based on the target context and also additional contexts sampled from monolingual data. We conduct experiments in English, Portuguese, and Spanish on the TSAR-2022 shared task, and show that our model substantially outperforms other unsupervised systems across all languages. We also establish a new state-of-the-art by ensembling our model with GPT-3.5. Lastly, we evaluate our model on the SWORDS lexical substitution data set, achieving a state-of-the-art result.

From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities

  • paper_url: http://arxiv.org/abs/2311.00308
  • repo_url: None
  • paper_authors: Md Farhan Ishmam, Md Sakib Hossain Shovon, M. F. Mridha, Nilanjan Dey
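  • for: The paper surveys Visual Question Answering (VQA), a multimodal task combining Computer Vision (CV) and Natural Language Processing (NLP) that aims to generate answers to questions about any visual input.
  • methods: It introduces a detailed taxonomy to categorize the facets of VQA and traces how the scope of VQA expanded from natural image datasets to synthetic images, video, 3D environments, and other visual inputs. It also covers how large pre-trained networks shifted the field from feature extraction and fusion schemes to vision language pre-training (VLP) techniques.
  • results: The survey summarizes recent trends, challenges, and scope for improvement in VQA, examines VLP challenges through the lens of VQA, generalizes VQA to multimodal question answering, and presents a set of open problems for future investigation.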
    Abstract The multimodal task of Visual Question Answering (VQA) encompassing elements of Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers to questions on any visual input. Over time, the scope of VQA has expanded from datasets focusing on an extensive collection of natural images to datasets featuring synthetic images, video, 3D environments, and various other visual inputs. The emergence of large pre-trained networks has shifted the early VQA approaches relying on feature extraction and fusion schemes to vision language pre-training (VLP) techniques. However, there is a lack of comprehensive surveys that encompass both traditional VQA architectures and contemporary VLP-based methods. Furthermore, the VLP challenges in the lens of VQA haven't been thoroughly explored, leaving room for potential open problems to emerge. Our work presents a survey in the domain of VQA that delves into the intricacies of VQA datasets and methods over the field's history, introduces a detailed taxonomy to categorize the facets of VQA, and highlights the recent trends, challenges, and scopes for improvement. We further generalize VQA to multimodal question answering, explore tasks related to VQA, and present a set of open problems for future investigation. The work aims to navigate both beginners and experts by shedding light on the potential avenues of research and expanding the boundaries of the field.
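    Summary Visual Question Answering (VQA) is a multimodal task spanning computer vision (CV) and natural language processing (NLP) that aims to answer questions about any visual input. Over time its scope has grown from large natural-image datasets to synthetic images, video, 3D environments, and other visual inputs. With the arrival of large pre-trained networks, early approaches based on feature extraction and fusion have given way to vision language pre-training (VLP). No comprehensive survey yet covers both traditional VQA architectures and contemporary VLP-based methods, and the challenges of VLP viewed through VQA remain largely unexplored. This survey reviews VQA datasets and methods across the field's history, introduces a detailed taxonomy, and describes current trends, challenges, and room for improvement. It also generalizes VQA to multimodal question answering, explores related tasks, and lists open problems, with the aim of guiding both beginners and experts and expanding the boundaries of the field.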

Inference of CO2 flow patterns – a feasibility study

  • paper_url: http://arxiv.org/abs/2311.00290
  • repo_url: None
  • paper_authors: Abhinav Prakash Gahlot, Huseyin Tuna Erdinc, Rafael Orozco, Ziyi Yin, Felix J. Herrmann
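  • for: This study works toward accurate detection of underground CO2 leakage in carbon capture and sequestration (CCS), particularly leakage through pre-existing or induced faults in the storage reservoir's seals.
  • methods: The study uses conditional normalizing flows to infer CO2 flow patterns, with and without leakage, from well and seismic data, and analyzes their performance on a series of numerical experiments.
  • results: Conditional normalizing flows produce high-fidelity estimates of CO2 plumes with or without leakage. The inferred uncertainty is judged reasonable because it correlates well with the observed errors; it stems from noise in the seismic data and from imprecise knowledge of the reservoir's fluid flow properties.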
    Abstract As the global deployment of carbon capture and sequestration (CCS) technology intensifies in the fight against climate change, it becomes increasingly imperative to establish robust monitoring and detection mechanisms for potential underground CO2 leakage, particularly through pre-existing or induced faults in the storage reservoir's seals. While techniques such as history matching and time-lapse seismic monitoring of CO2 storage have been used successfully in tracking the evolution of CO2 plumes in the subsurface, these methods lack principled approaches to characterize uncertainties related to the CO2 plumes' behavior. Inclusion of systematic assessment of uncertainties is essential for risk mitigation for the following reasons: (i) CO2 plume-induced changes are small and seismic data is noisy; (ii) changes between regular and irregular (e.g., caused by leakage) flow patterns are small; and (iii) the reservoir properties that control the flow are strongly heterogeneous and typically only available as distributions. To arrive at a formulation capable of inferring flow patterns for regular and irregular flow from well and seismic data, the performance of conditional normalizing flow will be analyzed on a series of carefully designed numerical experiments. While the inferences presented are preliminary in the context of an early CO2 leakage detection system, the results do indicate that inferences with conditional normalizing flows can produce high-fidelity estimates for CO2 plumes with or without leakage. We are also confident that the inferred uncertainty is reasonable because it correlates well with the observed errors. This uncertainty stems from noise in the seismic data and from the lack of precise knowledge of the reservoir's fluid flow properties.
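    Summary As carbon capture and sequestration (CCS) is deployed worldwide in the fight against climate change, robust monitoring and detection of potential underground CO2 leakage becomes increasingly important, particularly leakage through pre-existing or induced faults in the storage reservoir's seals. History matching and time-lapse seismic monitoring have successfully tracked the evolution of CO2 plumes in the subsurface, but they lack principled ways to characterize the associated uncertainties. Systematic uncertainty assessment is essential for risk mitigation because (i) plume-induced changes are small and seismic data is noisy; (ii) differences between regular and irregular (for example, leakage-induced) flow patterns are small; and (iii) the reservoir properties that control the flow are strongly heterogeneous and typically available only as distributions. To arrive at a formulation that infers regular and irregular flow patterns from well and seismic data, the study analyzes the performance of conditional normalizing flows on a series of carefully designed numerical experiments. Although the inferences are preliminary in the context of an early leakage detection system, the results indicate that conditional normalizing flows can produce high-fidelity estimates of CO2 plumes with or without leakage. The inferred uncertainty is considered reasonable because it correlates well with the observed errors; it stems from noise in the seismic data and from imprecise knowledge of the reservoir's fluid flow properties.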

Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks

  • paper_url: http://arxiv.org/abs/2311.00288
  • repo_url: https://github.com/pluslabnlp/active-it
  • paper_authors: Po-Nien Kung, Fan Yin, Di Wu, Kai-Wei Chang, Nanyun Peng
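  • for: The paper proposes active instruction tuning, a way to choose which new tasks to train on so that instruction-tuned large language models (LLMs) perform and generalize better.
  • methods: The framework, active instruction tuning based on prompt uncertainty, identifies informative tasks and then tunes the model on the selected tasks. The informativeness of a new task is measured by how much the current model's outputs disagree across perturbed prompts.
  • results: On the NIV2 and Self-Instruct datasets, the method consistently outperforms other baseline task-selection strategies, achieving better out-of-distribution generalization with fewer training tasks. A task map additionally categorizes and diagnoses tasks by prompt uncertainty and prediction probability.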
    Abstract Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions. However, how to select new tasks to improve the performance and generalizability of IT models remains an open question. Training on all existing tasks is impractical due to prohibiting computation requirements, and randomly selecting tasks can lead to suboptimal performance. In this work, we propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks. We represent the informativeness of new tasks with the disagreement of the current model outputs over perturbed prompts. Our experiments on NIV2 and Self-Instruct datasets demonstrate that our method consistently outperforms other baseline strategies for task selection, achieving better out-of-distribution generalization with fewer training tasks. Additionally, we introduce a task map that categorizes and diagnoses tasks based on prompt uncertainty and prediction probability. We discover that training on ambiguous (prompt-uncertain) tasks improves generalization while training on difficult (prompt-certain and low-probability) tasks offers no benefit, underscoring the importance of task selection for instruction tuning.
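The abstract describes task informativeness as the disagreement of the current model's outputs over perturbed prompts. A minimal sketch of that idea follows, assuming a word-dropping perturbation and an exact-match disagreement count; both choices, and the names `perturb_prompt`, `prompt_uncertainty`, `select_tasks`, and `toy_model`, are hypothetical illustrations rather than the paper's actual procedure.

```python
import random
from typing import Callable, Dict, List


def perturb_prompt(prompt: str, drop_rate: float, rng: random.Random) -> str:
    """Hypothetical perturbation: drop each word independently with probability drop_rate."""
    kept = [word for word in prompt.split() if rng.random() >= drop_rate]
    return " ".join(kept) if kept else prompt  # never return an empty prompt


def prompt_uncertainty(model: Callable[[str], str], prompt: str,
                       n_perturb: int = 20, drop_rate: float = 0.2,
                       seed: int = 0) -> float:
    """Fraction of perturbed prompts whose output differs from the unperturbed output."""
    rng = random.Random(seed)
    reference = model(prompt)
    disagreements = sum(
        model(perturb_prompt(prompt, drop_rate, rng)) != reference
        for _ in range(n_perturb)
    )
    return disagreements / n_perturb


def select_tasks(model: Callable[[str], str], task_prompts: Dict[str, str],
                 budget: int) -> List[str]:
    """Rank tasks by prompt uncertainty (highest first) and keep the top `budget`."""
    ranked = sorted(
        task_prompts,
        key=lambda name: prompt_uncertainty(model, task_prompts[name]),
        reverse=True,
    )
    return ranked[:budget]


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: its answer flips when the prompt gets shorter than 4 words."""
    return "long" if len(prompt.split()) > 3 else "short"


tasks = {
    "task_a": "translate this",                     # output is stable under perturbation
    "task_b": "summarize the following paragraph",  # output changes when a word is dropped
}
selected = select_tasks(toy_model, tasks, budget=1)
print(selected)
```

In this toy setup `task_a` has zero uncertainty because dropping words never changes the model's answer, so the prompt-sensitive `task_b` is selected. A real implementation would replace `toy_model` with calls to the instruction-tuned model and would likely use a softer disagreement measure than exact string match.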

Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.00287
  • repo_url: https://github.com/ritaranx/clingen
  • paper_authors: Ran Xu, Hejie Cui, Yue Yu, Xuan Kan, Wenqi Shi, Yuchen Zhuang, Wei Jin, Joyce Ho, Carl Yang
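  • for: The paper targets clinical natural language processing, which needs methods that can handle complex medical terminology and clinical contexts.
  • methods: It uses large language models (LLMs) to generate synthetic clinical text and proposes ClinGen, a resource-efficient approach that infuses knowledge into the generation process through clinical knowledge extraction and context-informed LLM prompting.
  • results: Experiments across 7 clinical NLP tasks and 16 datasets show that ClinGen consistently improves performance, aligns generated data with the distribution of real datasets, and significantly increases the diversity of generated training instances.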
    Abstract Clinical natural language processing requires methods that can address domain-specific challenges, such as complex medical terminology and clinical contexts. Recently, large language models (LLMs) have shown promise in this domain. Yet, their direct deployment can lead to privacy issues and is constrained by resources. To address this challenge, we delve into synthetic clinical text generation using LLMs for clinical NLP tasks. We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the process. Our model involves clinical knowledge extraction and context-informed LLM prompting. Both clinical topics and writing styles are drawn from external domain-specific knowledge graphs and LLMs to guide data generation. Our extensive empirical study across 7 clinical NLP tasks and 16 datasets reveals that ClinGen consistently enhances performance across various tasks, effectively aligning the distribution of real datasets and significantly enriching the diversity of generated training instances. We will publish our code and all the generated data in https://github.com/ritaranx/ClinGen.
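    Summary Clinical natural language processing must cope with domain-specific challenges such as medical terminology and clinical context. Large language models (LLMs) have recently shown promise in this area, but deploying them directly can raise privacy issues and is constrained by resources. To address this, the authors explore synthetic clinical text generation with LLMs for clinical NLP tasks. They propose ClinGen, a resource-efficient approach that infuses knowledge into the generation process. The method combines clinical knowledge extraction with context-informed LLM prompting: clinical topics and writing styles are drawn from external domain-specific knowledge graphs and from LLMs to guide data generation. An empirical study across 7 clinical NLP tasks and 16 datasets shows that ClinGen consistently improves performance, aligns generated data with the distribution of real datasets, and increases the diversity of generated training instances. The code and all generated data are to be published at https://github.com/ritaranx/ClinGen.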

JADE: A Linguistics-based Safety Evaluation Platform for LLM

  • paper_url: http://arxiv.org/abs/2311.00286
  • repo_url: https://github.com/whitzard-ai/jade-db
  • paper_authors: Mi Zhang, Xudong Pan, Min Yang
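  • for: The paper presents JADE, a targeted linguistic fuzzing platform that simultaneously and consistently breaks a wide range of widely used Chinese and English large language models (LLMs).
  • methods: Based on Noam Chomsky's theory of transformational-generative grammar, JADE applies generative and transformational rules to progressively increase the syntactic complexity of a seed question until the LLM's safety guardrail is broken.
  • results: JADE produces three safety benchmarks whose unsafe questions trigger harmful generation in multiple LLMs at once, with an average unsafe generation ratio of 70%, while the questions remain natural and fluent and preserve the core unsafe semantics.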
    Abstract In this paper, we present JADE, a targeted linguistic fuzzing platform which strengthens the linguistic complexity of seed questions to simultaneously and consistently break a wide range of widely-used LLMs categorized in three groups: eight open-sourced Chinese, six commercial Chinese and four commercial English LLMs. JADE generates three safety benchmarks for the three groups of LLMs, which contain unsafe questions that are highly threatening: the questions simultaneously trigger harmful generation of multiple LLMs, with an average unsafe generation ratio of 70% (please see the table below), while still being natural questions, fluent and preserving the core unsafe semantics. We release the benchmark demos generated for commercial English LLMs and open-sourced English LLMs in the following link: https://github.com/whitzard-ai/jade-db. For readers who are interested in evaluating on more questions generated by JADE, please contact us. JADE is based on Noam Chomsky's seminal theory of transformational-generative grammar. Given a seed question with unsafe intention, JADE invokes a sequence of generative and transformational rules to increment the complexity of the syntactic structure of the original question, until the safety guardrail is broken. Our key insight is: Due to the complexity of human language, most of the current best LLMs can hardly recognize the invariant evil from the infinite number of different syntactic structures which form an unbound example space that can never be fully covered. Technically, the generative/transformative rules are constructed by native speakers of the languages, and, once developed, can be used to automatically grow and transform the parse tree of a given question, until the guardrail is broken. For more evaluation results and demo, please check our website: https://whitzard-ai.github.io/jade.html.
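    Summary The paper introduces JADE, a targeted linguistic fuzzing platform that increases the linguistic complexity of seed questions to simultaneously and consistently break a wide range of widely used Chinese and English LLMs. JADE generates three safety benchmarks for the three groups of LLMs. Their unsafe questions trigger harmful generation in multiple LLMs at once, with an average unsafe generation ratio of 70%, yet remain natural and fluent and preserve the core unsafe semantics. Benchmark demos for commercial English LLMs and open-sourced English LLMs are released at https://github.com/whitzard-ai/jade-db; readers interested in evaluating on more JADE-generated questions are asked to contact the authors. JADE is based on Noam Chomsky's theory of transformational-generative grammar. Given a seed question with unsafe intent, it applies a sequence of generative and transformational rules that step by step increase the complexity of the question's syntactic structure until the safety guardrail is broken. The key insight is that, because of the complexity of human language, most of the current best LLMs can hardly recognize the same harmful intent across an unbounded space of syntactic structures that can never be fully covered. Technically, the generative and transformational rules are constructed by native speakers and, once developed, can automatically grow and transform the parse tree of a given question until the guardrail is broken. More evaluation results and demos are available at https://whitzard-ai.github.io/jade.html.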

Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection

  • paper_url: http://arxiv.org/abs/2311.00278
  • repo_url: None
  • paper_authors: Min Jae Jung, Seung Dae Han, Joohee Kim
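  • for: The study aims to improve few-shot object detection, that is, detecting novel objects from only a few labeled examples.
  • methods: It proposes RISF, which extends Faster R-CNN with a Calibration Module using CLIP (CM-CLIP) and a Background Negative Re-scale Loss (BNRL), drawing on Contrastive Language-Image Pre-training (CLIP) and a hard negative classification loss.
  • results: Extensive experiments on MS-COCO and PASCAL VOC show that RISF substantially outperforms state-of-the-art approaches.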
    Abstract Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modified loss function can improve performance. In this paper, we explore leveraging the power of Contrastive Language-Image Pre-training (CLIP) and hard negative classification loss in low data setting. Specifically, we propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF) which extends Faster R-CNN by introducing Calibration Module using CLIP (CM-CLIP) and Background Negative Re-scale Loss (BNRL). The former adapts CLIP, which performs zero-shot classification, to re-score the classification scores of a detector using image-class similarities, the latter is modified classification loss considering the punishment for fake backgrounds as well as confusing categories on a generalized few-shot object detection dataset. Extensive experiments on MS-COCO and PASCAL VOC show that the proposed RISF substantially outperforms the state-of-the-art approaches. The code will be available.
    Summary Few-shot object detection, which focuses on detecting novel objects from few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modifying the loss function can improve performance. This paper explores using Contrastive Language-Image Pre-training (CLIP) and a hard negative classification loss in the low-data setting. It proposes Re-scoring using Image-language Similarity for Few-shot object detection (RISF), which extends Faster R-CNN with a Calibration Module using CLIP (CM-CLIP) and a Background Negative Re-scale Loss (BNRL). CM-CLIP adapts CLIP, which performs zero-shot classification, to re-score a detector's classification scores using image-class similarities. BNRL is a modified classification loss that accounts for the penalty on fake backgrounds as well as confusing categories in a generalized few-shot object detection dataset. Extensive experiments on MS-COCO and PASCAL VOC show that RISF substantially outperforms state-of-the-art approaches. The code will be made available.

ChatCoder: Chat-based Refine Requirement Improves LLMs’ Code Generation

  • paper_url: http://arxiv.org/abs/2311.00272
  • repo_url: None
  • paper_authors: Zejun Wang, Jia Li, Ge Li, Zhi Jin
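  • for: Improving how well large language models understand human requirements, and thereby their code generation performance.
  • methods: Through a chat scheme, the large language model guides the human user to refine the expression of requirements so that it becomes more precise, unambiguous, and complete.
  • results: Experiments show that ChatCoder improves existing large language models' code generation performance by a large margin, and has an advantage over refine-based methods and LLMs fine-tuned via human response.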
    Abstract Large language models have shown good performances in generating code to meet human requirements. However, human requirements expressed in natural languages can be vague, incomplete, and ambiguous, leading large language models to misunderstand human requirements and make mistakes. Worse, it is difficult for a human user to refine the requirement. To help human users refine their requirements and improve large language models' code generation performances, we propose ChatCoder: a method to refine the requirements via chatting with large language models. We design a chat scheme in which the large language models will guide the human users to refine their expression of requirements to be more precise, unambiguous, and complete than before. Experiments show that ChatCoder has improved existing large language models' performance by a large margin. Besides, ChatCoder has the advantage over refine-based methods and LLMs fine-tuned via human response.
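    Summary Large language models generate code well, but human requirements expressed in natural language are often vague, incomplete, and ambiguous, which leads the models to misunderstand the requirements and make mistakes. Worse, it is difficult for a human user to refine the requirement. To help users refine their requirements and improve code generation, the authors propose ChatCoder, a method that refines requirements by chatting with the large language model so that they become more precise, unambiguous, and complete. Experiments show that ChatCoder improves existing large language models' performance by a large margin, and that it has an advantage over refine-based methods and LLMs fine-tuned via human response.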

Rethinking Decision Transformer via Hierarchical Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00267
  • repo_url: None
  • paper_authors: Yi Ma, Chenjun Xiao, Hebin Liang, Jianye Hao
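  • for: The paper re-examines the Decision Transformer (DT) algorithm for reinforcement learning (RL) through the lens of hierarchical RL.
  • methods: It introduces a general sequence modeling framework for sequential decision making. At decision time, a high-level policy first proposes an ideal prompt for the current state, and a low-level policy then generates an action conditioned on that prompt. DT emerges as a special case under certain choices of the two policies, and the paper discusses how those choices can fail.
  • results: Empirical results show that the proposed algorithms significantly surpass DT on several control and navigation benchmarks.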
    Abstract Decision Transformer (DT) is an innovative algorithm leveraging recent advances of the transformer architecture in reinforcement learning (RL). However, a notable limitation of DT is its reliance on recalling trajectories from datasets, losing the capability to seamlessly stitch sub-optimal trajectories together. In this work we introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL. At the time of making decisions, a high-level policy first proposes an ideal prompt for the current state, a low-level policy subsequently generates an action conditioned on the given prompt. We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices. Inspired by these observations, we study how to jointly optimize the high-level and low-level policies to enable the stitching ability, which further leads to the development of new offline RL algorithms. Our empirical results clearly show that the proposed algorithms significantly surpass DT on several control and navigation benchmarks. We hope our contributions can inspire the integration of transformer architectures within the field of RL.
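The two-level decision loop described in the abstract can be sketched as follows. This is an illustrative toy under stated assumptions: the chain environment, the policies, and every name here (`rollout`, `make_return_to_go_policy`, `toy_low_level`, `chain_step`) are hypothetical. The return-to-go prompt mirrors how Decision Transformer conditions on a target return minus the reward collected so far; which exact policy choices recover DT within the paper's framework is not spelled out in the abstract.

```python
from typing import Callable, Tuple

State = int
Action = int
Prompt = float


def rollout(high_level: Callable[[State, float], Prompt],
            low_level: Callable[[State, Prompt], Action],
            env_step: Callable[[State, Action], Tuple[State, float]],
            state: State, horizon: int) -> Tuple[State, float]:
    """At each step the high-level policy proposes a prompt, then the
    low-level policy picks an action conditioned on state and prompt."""
    collected = 0.0
    for _ in range(horizon):
        prompt = high_level(state, collected)
        action = low_level(state, prompt)
        state, reward = env_step(state, action)
        collected += reward
    return state, collected


def make_return_to_go_policy(target_return: float) -> Callable[[State, float], Prompt]:
    """DT-style high-level policy: the prompt is the remaining return-to-go."""
    def high_level(state: State, collected: float) -> Prompt:
        return target_return - collected
    return high_level


def toy_low_level(state: State, prompt: Prompt) -> Action:
    """Hypothetical low-level rule: move right while return is still owed."""
    return 1 if prompt > 0 else 0


def chain_step(state: State, action: Action) -> Tuple[State, float]:
    """Toy chain environment: each move to the right earns reward 1."""
    return state + action, float(action)


final_state, total_reward = rollout(
    make_return_to_go_policy(3.0), toy_low_level, chain_step, state=0, horizon=5
)
print(final_state, total_reward)  # 3 3.0
```

Here the high-level policy is a fixed bookkeeping rule rather than something learned, which is one way to read the abstract's remark that DT corresponds to particular choices of the two policies. Swapping `make_return_to_go_policy` for a learned prompt proposer is the kind of change the framework is meant to accommodate.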

Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents

  • paper_url: http://arxiv.org/abs/2311.00262
  • repo_url: None
  • paper_authors: Yang Deng, Wenxuan Zhang, Wai Lam, See-Kiong Ng, Tat-Seng Chua
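  • for: The paper aims to improve the dialogue policy planning of large language models (LLMs) so that they behave more proactively in dialogue.
  • methods: It proposes a new dialogue policy planning paradigm, named PPDPP, that uses a tunable language model plug-in as a plug-and-play dialogue policy planner. The plug-in is trained with supervised fine-tuning on human-annotated data and with reinforcement learning from goal-oriented AI feedback on interaction data collected through LLM-based self-play.
  • results: PPDPP consistently and substantially outperforms existing approaches on three proactive dialogue applications: negotiation, emotional support, and tutoring dialogues.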
    Abstract Proactive dialogues serve as a practical yet challenging dialogue problem in the era of large language models (LLMs), where the dialogue policy planning is the key to improving the proactivity of LLMs. Most existing studies enable the dialogue policy planning of LLMs using various prompting schemes or iteratively enhance this capability in handling the given case with verbal AI feedback. However, these approaches are either bounded by the policy planning capability of the frozen LLMs or hard to be transferred to new cases. In this work, we introduce a new dialogue policy planning paradigm to strategize LLMs for proactive dialogue problems with a tunable language model plug-in as a plug-and-play dialogue policy planner, named PPDPP. Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data as well as reinforcement learning from goal-oriented AI feedback with dynamic interaction data collected by the LLM-based self-play simulation. In this manner, the LLM-powered dialogue agent can not only be generalized to different cases after the training, but also be applicable to different applications by just substituting the learned plug-in. In addition, we propose to evaluate the policy planning capability of dialogue systems under the interactive setting. Experimental results demonstrate that PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications, including negotiation, emotional support, and tutoring dialogues.
    Summary Proactive dialogue is a practical yet challenging problem in the era of large language models (LLMs), and dialogue policy planning is the key to making LLMs more proactive. Existing studies typically rely on various prompting schemes or iteratively improve the capability on a given case using verbal AI feedback, but these approaches are either bounded by the policy planning capability of the frozen LLM or hard to transfer to new cases. This work introduces a new dialogue policy planning paradigm, PPDPP, which uses a tunable language model plug-in as a plug-and-play policy planner. A training framework combines supervised fine-tuning on available human-annotated data with reinforcement learning from goal-oriented AI feedback, using dynamic interaction data collected through LLM-based self-play simulation. The resulting dialogue agent generalizes to different cases after training and can be applied to different applications simply by substituting the learned plug-in. The authors also propose evaluating policy planning capability in an interactive setting. Experiments show that PPDPP consistently and substantially outperforms existing approaches on three proactive dialogue applications: negotiation, emotional support, and tutoring dialogues.

Implicit biases in multitask and continual learning from a backward error analysis perspective

  • paper_url: http://arxiv.org/abs/2311.00235
  • repo_url: None
  • paper_authors: Benoit Dherin
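  • for: The paper uses backward error analysis to compute the implicit training biases of neural networks in multitask and continual learning settings.
  • methods: For networks trained with stochastic gradient descent, it derives modified losses that are implicitly minimized during training. They have three terms: the original loss, which accounts for convergence; an implicit flatness regularization term proportional to the learning rate; and a conflict term.
  • results: In the multitask setting, the conflict term is a well-known quantity that measures the gradient alignment between tasks. In continual learning, the conflict term is new to deep learning optimization, although it is a basic tool in differential geometry: the Lie bracket between the task gradients.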
    Abstract Using backward error analysis, we compute implicit training biases in multitask and continual learning settings for neural networks trained with stochastic gradient descent. In particular, we derive modified losses that are implicitly minimized during training. They have three terms: the original loss, accounting for convergence, an implicit flatness regularization term proportional to the learning rate, and a last term, the conflict term, which can theoretically be detrimental to both convergence and implicit regularization. In multitask, the conflict term is a well-known quantity, measuring the gradient alignment between the tasks, while in continual learning the conflict term is a new quantity in deep learning optimization, although a basic tool in differential geometry: The Lie bracket between the task gradients.
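    Summary Using backward error analysis, the paper computes the implicit training biases of neural networks trained with stochastic gradient descent in multitask and continual learning settings. In particular, it derives modified losses that are implicitly minimized during training. They have three terms: the original loss, which accounts for convergence; an implicit flatness regularization term proportional to the learning rate; and a final conflict term, which can in theory be detrimental to both convergence and implicit regularization. In multitask learning the conflict term is a well-known quantity that measures the gradient alignment between tasks. In continual learning it is a quantity new to deep learning optimization, although it is a basic tool in differential geometry: the Lie bracket between the task gradients.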
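To make the Lie bracket between task gradients concrete, here is a small numerical sketch. It assumes two hypothetical quadratic task losses L_i(θ) = ½ θᵀA_iθ and the convention [g1, g2] = (Dg2)g1 − (Dg1)g2. The abstract fixes neither the sign convention nor how the bracket enters the modified loss, so both are assumptions here. The second half of the script checks a standard fact for these toy losses: training on task 1 then task 2 differs from the reverse order by lr² times the bracket.

```python
from typing import Callable, List

Vector = List[float]
Field = Callable[[Vector], Vector]


def jvp(field: Field, theta: Vector, v: Vector, eps: float = 1e-5) -> Vector:
    """Central-difference estimate of (Jacobian of `field` at theta) applied to v."""
    plus = field([t + eps * x for t, x in zip(theta, v)])
    minus = field([t - eps * x for t, x in zip(theta, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def lie_bracket(g1: Field, g2: Field, theta: Vector) -> Vector:
    """[g1, g2](theta) = Dg2(theta) g1(theta) - Dg1(theta) g2(theta) (assumed convention)."""
    a = jvp(g2, theta, g1(theta))
    b = jvp(g1, theta, g2(theta))
    return [x - y for x, y in zip(a, b)]


def matvec(matrix: List[Vector], x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def sgd_step(grad: Field, theta: Vector, lr: float) -> Vector:
    """One plain gradient descent step on a single task."""
    return [t - lr * g for t, g in zip(theta, grad(theta))]


# Hypothetical task Hessians; they do not commute, so the bracket is nonzero.
A1 = [[2.0, 0.0], [0.0, 1.0]]
A2 = [[1.0, 1.0], [1.0, 1.0]]


def grad1(theta: Vector) -> Vector:
    return matvec(A1, theta)


def grad2(theta: Vector) -> Vector:
    return matvec(A2, theta)


theta0 = [1.0, 2.0]
bracket = lie_bracket(grad1, grad2, theta0)  # approximately [-2.0, 1.0]

lr = 0.1
task1_then_task2 = sgd_step(grad2, sgd_step(grad1, theta0, lr), lr)
task2_then_task1 = sgd_step(grad1, sgd_step(grad2, theta0, lr), lr)
order_gap = [a - b for a, b in zip(task1_then_task2, task2_then_task1)]

print(bracket)
print(order_gap)  # approximately lr**2 * bracket for these quadratic losses
```

When the two Hessians commute (for example, two diagonal matrices), the bracket vanishes and the order of the tasks makes no difference at this order in the learning rate, which gives some intuition for why a bracket-shaped term would show up in a sequential setting.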

StableFDG: Style and Attention Based Learning for Federated Domain Generalization

  • paper_url: http://arxiv.org/abs/2311.00227
  • repo_url: None
  • paper_authors: Jungwuk Park, Dong-Jun Han, Jinho Kim, Shiqiang Wang, Christopher G. Brinton, Jaekyun Moon
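  • for: The paper addresses domain generalization (DG) in federated learning (FL), where the data distributions at training and testing time differ and each client holds few samples and domains.
  • methods: StableFDG makes two key contributions. The first is style-based learning, which lets each client explore novel styles beyond the source domains in its local dataset, improving domain diversity through the proposed style sharing, shifting, and exploration strategies. The second is an attention-based feature highlighter, which captures similarities between the features of data samples in the same class and emphasizes important or common characteristics, so that domain-invariant characteristics are learned better in data-poor FL scenarios.
  • results: Experiments show that StableFDG outperforms existing baselines on various DG benchmark datasets, demonstrating its efficacy.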
    Abstract Traditional federated learning (FL) algorithms operate under the assumption that the data distributions at training (source domains) and testing (target domain) are the same. The fact that domain shifts often occur in practice necessitates equipping FL methods with a domain generalization (DG) capability. However, existing DG algorithms face fundamental challenges in FL setups due to the lack of samples/domains in each client's local dataset. In this paper, we propose StableFDG, a style and attention based learning strategy for accomplishing federated domain generalization, introducing two key contributions. The first is style-based learning, which enables each client to explore novel styles beyond the original source domains in its local dataset, improving domain diversity based on the proposed style sharing, shifting, and exploration strategies. Our second contribution is an attention-based feature highlighter, which captures the similarities between the features of data samples in the same class, and emphasizes the important/common characteristics to better learn the domain-invariant characteristics of each class in data-poor FL scenarios. Experimental results show that StableFDG outperforms existing baselines on various DG benchmark datasets, demonstrating its efficacy.
    Summary Traditional federated learning (FL) algorithms assume that the training (source domain) and testing (target domain) data distributions are the same. Because domain shifts occur frequently in practice, FL methods need a domain generalization (DG) capability. Existing DG algorithms, however, face fundamental challenges in FL setups because each client's local dataset lacks samples and domains. The paper proposes StableFDG, a style and attention based learning strategy for federated domain generalization, with two main contributions. The first is style-based learning, which lets each client explore new styles in its local dataset and improves domain diversity through the proposed style sharing, shifting, and exploration strategies. The second is an attention-based feature highlighter, which captures similarities between data samples of the same class and emphasizes important or common features so that the domain-invariant characteristics of each class are learned better. Experiments show that StableFDG performs strongly on several DG benchmark datasets, demonstrating its efficacy.

Domain decomposition-based coupling of physics-informed neural networks via the Schwarz alternating method

  • paper_url: http://arxiv.org/abs/2311.00224
  • repo_url: None
  • paper_authors: Will Snyder, Irina Tezaur, Christopher Wentland
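  • for: Physics-informed neural networks (PINNs) as data-driven tools for solving nonlinear partial differential equations (PDEs), and how to make their training easier when the solution has steep gradients.
  • methods: After decomposing the physical domain, the Schwarz alternating method is used to couple PINNs with each other and with conventional numerical models (full order models, or FOMs). Dirichlet boundary conditions in each subdomain PINN are imposed weakly through the loss and/or strongly through a solution transformation.
  • results: On the one-dimensional steady-state advection-diffusion equation in the advection-dominated (high Peclet) regime, the convergence of the Schwarz method is strongly linked to how boundary conditions are implemented in the coupled PINNs, and strong enforcement does not always converge faster. It is not clear that PINN-PINN coupling accelerates PINN convergence, but PINN-FOM coupling substantially improves PINN training for Peclet numbers as high as 1e6.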
    Abstract Physics-informed neural networks (PINNs) are appealing data-driven tools for solving and inferring solutions to nonlinear partial differential equations (PDEs). Unlike traditional neural networks (NNs), which train only on solution data, a PINN incorporates a PDE's residual into its loss function and trains to minimize the said residual at a set of collocation points in the solution domain. This paper explores the use of the Schwarz alternating method as a means to couple PINNs with each other and with conventional numerical models (i.e., full order models, or FOMs, obtained via the finite element, finite difference or finite volume methods) following a decomposition of the physical domain. It is well-known that training a PINN can be difficult when the PDE solution has steep gradients. We investigate herein the use of domain decomposition and the Schwarz alternating method as a means to accelerate the PINN training phase. Within this context, we explore different approaches for imposing Dirichlet boundary conditions within each subdomain PINN: weakly through the loss and/or strongly through a solution transformation. As a numerical example, we consider the one-dimensional steady state advection-diffusion equation in the advection-dominated (high Peclet) regime. Our results suggest that the convergence of the Schwarz method is strongly linked to the choice of boundary condition implementation within the PINNs being coupled. Surprisingly, strong enforcement of the Schwarz boundary conditions does not always lead to a faster convergence of the method. While it is not clear from our preliminary study that the PINN-PINN coupling via the Schwarz alternating method accelerates PINN convergence in the advection-dominated regime, it reveals that PINN training can be improved substantially for Peclet numbers as high as 1e6 by performing a PINN-FOM coupling.
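    Summary Physics-informed neural networks (PINNs) are appealing data-driven tools for solving and inferring solutions to nonlinear partial differential equations (PDEs). Unlike traditional neural networks, which train only on solution data, a PINN includes the PDE residual in its loss function and trains to minimize that residual at a set of collocation points. The paper studies the Schwarz alternating method as a way to couple PINNs with each other and with conventional numerical models (FOMs) after a decomposition of the physical domain. Training a PINN can be difficult when the solution has steep gradients, so the authors investigate domain decomposition and the Schwarz alternating method as a means to accelerate the PINN training phase. In this context they compare different ways of imposing Dirichlet boundary conditions in each subdomain PINN: weakly through the loss and/or strongly through a solution transformation. The numerical example is the one-dimensional steady-state advection-diffusion equation in the advection-dominated (high Peclet) regime. The results indicate that the convergence of the Schwarz method is strongly linked to the boundary condition implementation in the coupled PINNs, and that strong enforcement of the Schwarz boundary conditions does not always lead to faster convergence. The preliminary study does not establish that PINN-PINN coupling accelerates PINN convergence in this regime, but it shows that PINN training can be improved substantially for Peclet numbers as high as 1e6 through PINN-FOM coupling.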
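The overlapping Schwarz alternating iteration itself can be sketched without any neural network. The example below is a simplification under stated assumptions: both subdomain solvers are central finite-difference solves (so it is closer to an FOM-FOM coupling than to the paper's PINN couplings), the model problem is −ν u'' + u' = 1 on (0, 1) with u(0) = u(1) = 0, and ν = 0.1 is chosen so that central differences stay stable. The paper's exact equation, high-Peclet settings, and subdomain layout are not given in the abstract, and all function names and parameter values here are hypothetical.

```python
from typing import List, Tuple


def solve_tridiagonal(lower: List[float], diag: List[float],
                      upper: List[float], rhs: List[float]) -> List[float]:
    """Thomas algorithm for a tridiagonal linear system."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_subdomain(n_cells: int, h: float, nu: float,
                    left_bc: float, right_bc: float,
                    source: float = 1.0) -> List[float]:
    """Central-difference solve of -nu*u'' + u' = source with Dirichlet data at both ends."""
    lo = -nu / h ** 2 - 1.0 / (2 * h)
    dg = 2 * nu / h ** 2
    up = -nu / h ** 2 + 1.0 / (2 * h)
    m = n_cells - 1  # number of interior nodes
    rhs = [source] * m
    rhs[0] -= lo * left_bc
    rhs[-1] -= up * right_bc
    interior = solve_tridiagonal([lo] * m, [dg] * m, [up] * m, rhs)
    return [left_bc] + interior + [right_bc]


def schwarz_alternating(n: int = 100, nu: float = 0.1, ia: int = 40, ib: int = 60,
                        tol: float = 1e-12, max_iter: int = 200) -> Tuple[List[float], int]:
    """Alternate solves on [0, x_ib] and [x_ia, 1], exchanging Dirichlet data in the overlap."""
    h = 1.0 / n
    right_guess = 0.0  # initial guess for u at the right end of subdomain 1
    u1: List[float] = []
    u2: List[float] = []
    iteration = 0
    for iteration in range(1, max_iter + 1):
        u1 = solve_subdomain(ib, h, nu, 0.0, right_guess)   # nodes 0..ib
        u2 = solve_subdomain(n - ia, h, nu, u1[ia], 0.0)    # nodes ia..n
        new_guess = u2[ib - ia]                             # u2 evaluated at node ib
        change = abs(new_guess - right_guess)
        right_guess = new_guess
        if change < tol:
            break
    return u1[:ia] + u2, iteration


solution, iterations = schwarz_alternating()
reference = solve_subdomain(100, 1.0 / 100, 0.1, 0.0, 0.0)  # single-domain solve
max_gap = max(abs(a - b) for a, b in zip(solution, reference))
print(iterations, max_gap)
```

The coupled solution is compared against a single-domain solve on the same grid, which is what the iteration should converge to. In the setting the abstract describes, one or both calls to `solve_subdomain` would be replaced by training a PINN on that subdomain with the exchanged interface value as a Dirichlet condition, imposed either through the loss or through a solution transformation.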

Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias

  • paper_url: http://arxiv.org/abs/2311.00217
  • repo_url: None
  • paper_authors: S. Lee, T. Q. Peng, M. H. Goldberg, S. A. Rosenthal, J. E. Kotcher, E. W. Maibach, A. Leiserowitz
  • for: This study assesses the algorithmic fidelity and bias of large language models (LLMs) in simulating survey responses, specifically in relation to climate change perspectives.
  • methods: The study uses two nationally representative climate change surveys and conditions LLMs on demographics and/or psychological covariates to simulate survey responses. GPT-4 is used as one of the LLMs and is found to perform better when conditioned on both demographics and covariates.
  • results: The study finds that LLMs can effectively capture presidential voting behaviors, but encounter challenges in accurately representing global warming perspectives when relevant covariates are not included. The study also identifies disparities in LLM estimations of the views of certain groups, with LLMs tending to underestimate worry about global warming among Black Americans.
    Abstract Large language models (LLMs) have demonstrated their potential in social science research by emulating human perceptions and behaviors, a concept referred to as algorithmic fidelity. This study assesses the algorithmic fidelity and bias of LLMs by utilizing two nationally representative climate change surveys. The LLMs were conditioned on demographics and/or psychological covariates to simulate survey responses. The findings indicate that LLMs can effectively capture presidential voting behaviors but encounter challenges in accurately representing global warming perspectives when relevant covariates are not included. GPT-4 exhibits improved performance when conditioned on both demographics and covariates. However, disparities emerge in LLM estimations of the views of certain groups, with LLMs tending to underestimate worry about global warming among Black Americans. While highlighting the potential of LLMs to aid social science research, these results underscore the importance of meticulous conditioning, model selection, survey question format, and bias assessment when employing LLMs for survey simulation. Further investigation into prompt engineering and algorithm auditing is essential to harness the power of LLMs while addressing their inherent limitations.

Consistent Video-to-Video Transfer Using Synthetic Dataset

  • paper_url: http://arxiv.org/abs/2311.00213
  • repo_url: None
  • paper_authors: Jiaxin Cheng, Tianjun Xiao, Tong He
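  • for: The paper proposes a novel and efficient approach to text-based video-to-video editing that removes the need for resource-intensive per-video-per-model finetuning.
  • methods: The core of the approach is a synthetic paired video dataset tailored to video-to-video transfer tasks. Inspired by Instruct Pix2Pix's image transfer via editing instructions, the authors adapt this paradigm to the video domain. They also introduce Long Video Sampling Correction during sampling, which keeps long videos consistent across batches.
  • results: The method surpasses existing methods such as Tune-A-Video, marking substantial progress in text-based video-to-video editing and suggesting avenues for further exploration and deployment.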
    Abstract We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning. At the core of our approach is a synthetic paired video dataset tailored for video-to-video transfer tasks. Inspired by Instruct Pix2Pix's image transfer via editing instruction, we adapt this paradigm to the video domain. Extending the Prompt-to-Prompt to videos, we efficiently generate paired samples, each with an input video and its edited counterpart. Alongside this, we introduce the Long Video Sampling Correction during sampling, ensuring consistent long videos across batches. Our method surpasses current methods like Tune-A-Video, heralding substantial progress in text-based video-to-video editing and suggesting exciting avenues for further exploration and deployment.
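    Summary We introduce a novel and efficient text-based video-to-video editing method that removes the need for resource-intensive per-video-per-model finetuning. At its core is a synthetic paired video dataset built specifically for video-to-video transfer tasks. Drawing on Instruct Pix2Pix's image transfer via editing instructions, we adapt this paradigm to the video domain. By extending Prompt-to-Prompt to videos, we efficiently generate paired samples, each consisting of an input video and its edited counterpart. We also introduce Long Video Sampling Correction, which keeps long videos consistent across batches. Our method surpasses existing methods such as Tune-A-Video, representing significant progress in text-based video-to-video editing and opening avenues for further exploration and deployment.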

Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication Systems

  • paper_url: http://arxiv.org/abs/2311.00207
  • repo_url: None
  • paper_authors: Jung-Woo Chang, Ke Sun, Nasimeh Heydaribeni, Seira Hidano, Xinyu Zhang, Farinaz Koushanfar
  • for: This paper proposes Magmaw, a black-box attack methodology that can generate universal adversarial perturbations for any multimodal signal transmitted over a wireless channel, targeting ML-based wireless systems.
  • methods: Magmaw uses a combination of optimization techniques and machine learning algorithms to generate perturbations that are resilient to existing defense methods such as adversarial training and perturbation signal subtraction.
  • results: The paper demonstrates the effectiveness of Magmaw on a real-time wireless attack platform built with a software-defined radio system, showing significant performance degradation even in the presence of defense mechanisms. Magmaw is also found to be effective against encrypted communication channels and conventional communications.
    Abstract Machine Learning (ML) has been instrumental in enabling joint transceiver optimization by merging all physical layer blocks of the end-to-end wireless communication systems. Although there have been a number of adversarial attacks on ML-based wireless systems, the existing methods do not provide a comprehensive view including multi-modality of the source data, common physical layer components, and wireless domain constraints. This paper proposes Magmaw, the first black-box attack methodology capable of generating universal adversarial perturbations for any multimodal signal transmitted over a wireless channel. We further introduce new objectives for adversarial attacks on ML-based downstream applications. The resilience of the attack to the existing widely used defense methods of adversarial training and perturbation signal subtraction is experimentally verified. For proof-of-concept evaluation, we build a real-time wireless attack platform using a software-defined radio system. Experimental results demonstrate that Magmaw causes significant performance degradation even in the presence of the defense mechanisms. Surprisingly, Magmaw is also effective against encrypted communication channels and conventional communications.

ChatGPT-Powered Hierarchical Comparisons for Image Classification

  • paper_url: http://arxiv.org/abs/2311.00206
  • repo_url: https://github.com/zhiyuan-r/chatgpt-powered-hierarchical-comparisons-for-image-classification
  • paper_authors: Zhiyuan Ren, Yiyang Su, Xiaoming Liu
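  • for: The paper proposes a new image classification framework for the zero-shot open-vocabulary image classification task.
  • methods: It builds on the pretrained vision-language model CLIP and uses large language models (LLMs) such as ChatGPT to supply class-specific knowledge. The LLM recursively groups classes into hierarchies, and images are classified by comparing image-text embeddings at each hierarchy level.
  • results: The hierarchical-comparison approach to image classification is intuitive, effective, and explainable.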
    Abstract The zero-shot open-vocabulary challenge in image classification is tackled by pretrained vision-language models like CLIP, which benefit from incorporating class-specific knowledge from large language models (LLMs) like ChatGPT. However, biases in CLIP lead to similar descriptions for distinct but related classes, prompting our novel image classification framework via hierarchical comparisons: using LLMs to recursively group classes into hierarchies and classifying images by comparing image-text embeddings at each hierarchy level, resulting in an intuitive, effective, and explainable approach.
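    Summary Pretrained vision-language models such as CLIP tackle the zero-shot open-vocabulary challenge and benefit from class-specific knowledge supplied by large language models (LLMs) such as ChatGPT. However, biases in CLIP lead to similar descriptions for distinct but related classes. We therefore propose a new image classification framework: LLMs recursively group classes into hierarchies, and images are classified by comparing image and text embeddings at each hierarchy level, yielding an intuitive, effective, and explainable method.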
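    The idea of classifying by comparing embeddings at each hierarchy level can be sketched as a greedy descent through a class tree. This is a minimal illustration, not the authors' implementation: the tree `HIERARCHY` is hand-written rather than produced by an LLM, the three-dimensional vectors are toy stand-ins for CLIP text and image embeddings, and the scoring rule (plain cosine similarity, best child wins) is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hierarchy: name -> (toy text embedding, children).
# Leaf classes have an empty children dict.
HIERARCHY = {
    "animal": ([1.0, 0.0, 0.0], {
        "cat": ([0.9, 0.4, 0.0], {}),
        "dog": ([0.9, 0.0, 0.4], {}),
    }),
    "vehicle": ([0.0, 1.0, 0.0], {
        "car": ([0.0, 0.9, 0.4], {}),
        "bus": ([0.4, 0.9, 0.0], {}),
    }),
}


def classify(image_emb, nodes):
    """Descend the tree, picking the most similar child at each level."""
    path = []
    while nodes:
        best = max(nodes, key=lambda name: cosine(image_emb, nodes[name][0]))
        path.append(best)
        nodes = nodes[best][1]
    return path


# Toy image embedding; the returned path doubles as an explanation.
path = classify([0.8, 0.1, 0.5], HIERARCHY)
```

    Returning the whole path rather than only the leaf is what makes the decision explainable: each step records which group the image was judged closest to.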

Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering

  • paper_url: http://arxiv.org/abs/2311.00204
  • repo_url: None
  • paper_authors: Zhen Guo, Yining Hua
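  • for: This study aims to turn large language models into medical-domain experts, enabling applications without prohibitive training costs.
  • methods: The study uses continuous training and instruction fine-tuning to adapt Llama 2 base models to the Chinese medical domain. The models are first continuously trained on 1B tokens from Chinese medical references to learn relevant vocabulary and knowledge, then fine-tuned on 54K examples from the Chinese National Medical Licensing Examination.
  • results: Experiments confirm the effectiveness of the approach, producing a model comparable to GPT-3.5-turbo with far less computational resources. The resulting domain-specific model can serve Chinese medical applications, and the recipe offers a template for domain-specific training in other fields that need expert knowledge, such as law, science, and engineering.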
    Abstract Large language models exhibit promising general capabilities but often lack specialized knowledge for domain-specific tasks. Developing domain experts from a base model enables a range of applications without prohibitive training costs. This work demonstrates a method using continuous training and instruction fine-tuning to rapidly adapt Llama 2 base models to the Chinese medical domain. We first conduct continuous training on 1B tokens from Chinese medical references to teach relevant vocabulary and knowledge. The models are then fine-tuned on 54K examples sourced from the Chinese National Medical Licensing Examination. Experiments on Chinese medical data confirm the effectiveness of this approach, producing a model comparable to GPT-3.5-turbo while using way less computational resource. The resulting domain-specific model could be useful for various Chinese medical applications. More broadly, this provides a template for domain-specific training of large language models in areas where pre-trained models lack the required expertise, such as law, science, and engineering.
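    Summary Large language models show promising general capabilities but often lack the specialized knowledge needed for domain-specific tasks. Developing domain experts from a base model opens up a range of applications without prohibitive training costs. This work demonstrates a method that uses continuous training and instruction fine-tuning to rapidly adapt Llama 2 base models to the Chinese medical domain. We first run continuous training on 1B tokens from Chinese medical references to teach relevant vocabulary and knowledge, then fine-tune on 54K Chinese medical examination examples. Experiments confirm the effectiveness of the approach, producing a model comparable to GPT-3.5-turbo with far less computational resources. The resulting domain-specific model could serve a variety of Chinese medical applications. More broadly, this provides a template for domain-specific training of large language models in areas such as law, science, and engineering.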

ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection

  • paper_url: http://arxiv.org/abs/2311.00729
  • repo_url: https://github.com/UARK-AICV/ZEETAD
  • paper_authors: Thinh Phan, Khoa Vo, Duy Le, Gianfranco Doretto, Donald Adjeroh, Ngan Le
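  • for: This work aims to improve zero-shot temporal action detection (TAD), particularly when large amounts of annotated data are not available.
  • methods: The method has two modules: a Transformer-based dual-localization module and a CLIP-based zero-shot proposal classification module. The dual-localization module detects action events in the video while selectively collecting crucial semantic embeddings for later recognition. The CLIP-based module generates semantic embeddings from text and frame inputs.
  • results: Extensive experiments on the THUMOS14 and ActivityNet-1.3 datasets show that the method performs strongly in zero-shot TAD and effectively transfers knowledge from ViL models to unseen action categories.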
    Abstract Temporal action detection (TAD) involves the localization and classification of action instances within untrimmed videos. While standard TAD follows fully supervised learning with closed-set setting on large training data, recent zero-shot TAD methods showcase the promising of open-set setting by leveraging large-scale contrastive visual-language (ViL) pretrained models. However, existing zero-shot TAD methods have limitations on how to properly construct the strong relationships between two interdependent tasks of localization and classification and adapt ViL model to video understanding. In this work, we present ZEETAD, featuring two modules: dual-localization and zero-shot proposal classification. The former is a Transformer-based module that detects action events while selectively collecting crucial semantic embeddings for later recognition. The latter one, CLIP-based module, generates semantic embeddings from text and frame inputs for each temporal unit. Additionally, we enhance discriminative capability on unseen classes by minimally updating the frozen CLIP encoder with lightweight adapters. Extensive experiments on THUMOS14 and ActivityNet-1.3 datasets demonstrate our approach's superior performance in zero-shot TAD and effective knowledge transfer from ViL models to unseen action categories.
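    Summary Temporal action detection (TAD) involves localizing and classifying action instances in untrimmed videos. Standard TAD relies on fully supervised learning with large training data, whereas recent zero-shot TAD methods leverage large-scale contrastive visual-language (ViL) pretrained models. Existing zero-shot TAD methods, however, are limited in how they build the relationship between the two interdependent tasks of localization and classification and in how they adapt ViL models to video understanding. In this work, we present ZEETAD, which comprises two modules: dual-localization and zero-shot proposal classification. The former is a Transformer-based module that detects action events while selectively collecting crucial semantic embeddings for later recognition. The latter is a CLIP-based module that generates semantic embeddings from text and frame inputs. In addition, we strengthen discriminative capability on unseen classes by minimally updating the frozen CLIP encoder with lightweight adapters. Extensive experiments on the THUMOS14 and ActivityNet-1.3 datasets demonstrate superior performance in zero-shot TAD and effective knowledge transfer from ViL models to unseen action categories.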

Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities

  • paper_url: http://arxiv.org/abs/2311.00203
  • repo_url: None
  • paper_authors: Senjuti Dutta, Sid Mittal, Sherol Chen, Deepak Ramachandran, Ravi Rajakumar, Ian Kivlichan, Sunny Mak, Alena Butryna, Praveen Paritosh
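  • for: This study aims to improve the reliability of automated content moderation systems by modeling the viewpoints of diverse communities, reducing the reliance on human moderation.
  • methods: The study uses a new dataset together with existing public datasets, and evaluates a Large Language Model (LLM).
  • results: Subjectivity is evident across all annotator groups, which exposes the shortcomings of majority-rule voting. The authors conclude that subjective annotations should serve as ground-truth labels when training models to identify toxic comments in diverse communities.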
    Abstract The prevalence and impact of toxic discussions online have made content moderation crucial. Automated systems can play a vital role in identifying toxicity, and reducing the reliance on human moderation. Nevertheless, identifying toxic comments for diverse communities continues to present challenges that are addressed in this paper. The two-part goal of this study is to (1) identify intuitive variances from annotator disagreement using quantitative analysis and (2) model the subjectivity of these viewpoints. To achieve our goal, we published a new dataset (https://github.com/XXX) with expert annotators' annotations and used two other public datasets to identify the subjectivity of toxicity. Then leveraging the Large Language Model (LLM), we evaluate the model's ability to mimic diverse viewpoints on toxicity by varying size of the training data and utilizing same set of annotators as the test set used during model training and a separate set of annotators as the test set. We conclude that subjectivity is evident across all annotator groups, demonstrating the shortcomings of majority-rule voting. Moving forward, subjective annotations should serve as ground truth labels for training models for domains like toxicity in diverse communities.
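    Summary Toxic discussions online are widespread and influential, which has made content moderation essential. Automated systems can play an important role in identifying toxicity and reducing the reliance on human moderation. Identifying toxic comments across diverse communities remains challenging, however, and this paper addresses those challenges. The study has two goals: (1) to identify variance arising from annotator disagreement through quantitative analysis, and (2) to model the subjectivity of these viewpoints. To this end, we published a new dataset with expert annotations and used two other public datasets to identify the subjectivity of toxicity. We then use a Large Language Model (LLM) to evaluate whether a model can mimic diverse viewpoints on toxicity, varying the training data size and testing both on the same annotators used in training and on a separate set of annotators. We find that subjectivity is present in every annotator group, which exposes the shortcomings of majority-rule voting. Going forward, subjective annotations should serve as ground-truth labels when training models for domains such as toxicity in diverse communities.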
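    The shortcoming of majority-rule voting mentioned above can be seen in a small example. The sketch below uses entirely made-up annotations (the comment IDs, group names, and labels are hypothetical, with 1 meaning toxic and 0 meaning not toxic) and compares a pooled majority label with per-group majority labels.

```python
from collections import Counter

# Hypothetical annotations: comment id -> annotator group -> labels.
ANNOTATIONS = {
    "c1": {"group_a": [1, 1, 1], "group_b": [0, 0, 1]},
    "c2": {"group_a": [0, 0, 0], "group_b": [0, 0, 0]},
}


def majority(labels):
    """Most frequent label (the toy data avoids ties)."""
    return Counter(labels).most_common(1)[0][0]


def pooled_label(groups):
    """Majority vote over all annotators, ignoring group membership."""
    pooled = [label for labels in groups.values() for label in labels]
    return majority(pooled)


def group_labels(groups):
    """Majority vote inside each annotator group."""
    return {name: majority(labels) for name, labels in groups.items()}


pooled = {cid: pooled_label(g) for cid, g in ANNOTATIONS.items()}
per_group = {cid: group_labels(g) for cid, g in ANNOTATIONS.items()}
# Comments where at least one group disagrees with the pooled label.
contested = [cid for cid in ANNOTATIONS
             if any(v != pooled[cid] for v in per_group[cid].values())]
```

    For comment `c1` the pooled vote says toxic while `group_b` on its own says not toxic, so a single majority label would erase that group's view; keeping per-group (or per-annotator) labels preserves it.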

Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00201
  • repo_url: None
  • paper_authors: Tong Yang, Shicong Cen, Yuting Wei, Yuxin Chen, Yuejie Chi
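  • for: The paper studies collaborative decision making among multiple distributed agents that do not share their local data trajectories, in a multi-task setting.
  • methods: The paper uses federated reinforcement learning (RL), in which distributed agents make decisions collaboratively without sharing local data trajectories. It develops federated vanilla and entropy-regularized natural policy gradient (NPG) methods under softmax parameterization.
  • results: The federated vanilla and entropy-regularized NPG methods learn a globally optimal policy in a decentralized setting, with non-asymptotic global convergence guarantees that illuminate the impact of network size and connectivity.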
    Abstract Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories. In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment. Focusing on infinite-horizon tabular Markov decision processes, the goal is to learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner, where each agent only communicates with its neighbors over some prescribed graph topology. We develop federated vanilla and entropy-regularized natural policy gradient (NPG) methods under softmax parameterization, where gradient tracking is applied to the global Q-function to mitigate the impact of imperfect information sharing. We establish non-asymptotic global convergence guarantees under exact policy evaluation, which are nearly independent of the size of the state-action space and illuminate the impacts of network size and connectivity. To the best of our knowledge, this is the first time that global convergence is established for federated multi-task RL using policy optimization. Moreover, the convergence behavior of the proposed algorithms is robust against inexactness of policy evaluation.
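    Summary Federated reinforcement learning (RL) lets multiple distributed agents make decisions collaboratively without sharing local data trajectories. We consider a multi-task setting in which each agent has its own private reward function, corresponding to a different task, while all agents share the same transition kernel of the environment. For infinite-horizon tabular Markov decision processes, the goal is to learn, in a decentralized manner, a globally optimal policy that maximizes the sum of the discounted total rewards of all agents, where each agent communicates only with its neighbors over a prescribed graph topology. We develop federated vanilla and entropy-regularized natural policy gradient (NPG) methods under softmax parameterization, applying gradient tracking to the global Q-function to mitigate the impact of imperfect information sharing. We establish non-asymptotic global convergence guarantees under exact policy evaluation; these are nearly independent of the size of the state-action space and illuminate the impact of network size and connectivity. To the best of our knowledge, this is the first global convergence result for federated multi-task RL using policy optimization. Moreover, the convergence behavior of the proposed algorithms is robust to inexact policy evaluation.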
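    To make the update concrete, the sketch below runs a heavily simplified version of the idea on a single-state problem (a bandit with discount 0). It is not the paper's algorithm: the reward table `REWARDS`, the mixing matrix `W`, and the step size are illustrative assumptions, Q-values are exact and constant, and each agent simply averages its estimate of the global Q-function with its neighbors before taking the standard softmax NPG step, pi_new(a) proportional to pi(a) * exp(eta * Q(a) / (1 - gamma)). With constant Q-values a tracking-style update would have no correction term, so it reduces to this repeated neighbor averaging.

```python
import math


def npg_step(policy, q_values, eta, gamma):
    """Softmax NPG update: pi_new(a) proportional to pi(a) * exp(eta * Q(a) / (1 - gamma))."""
    weights = [p * math.exp(eta * q / (1.0 - gamma))
               for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def mix(estimates, W):
    """One communication round: agent i averages neighbors' estimates with row i of W."""
    n, k = len(estimates), len(estimates[0])
    return [[sum(W[i][j] * estimates[j][a] for j in range(n)) for a in range(k)]
            for i in range(n)]


# Hypothetical private rewards: one row per agent, one column per action.
# Agents 0 and 1 each prefer a different action; action 2 maximizes the sum.
REWARDS = [[1.0, 0.0, 0.8],
           [0.0, 1.0, 0.8],
           [0.0, 0.0, 0.8]]

# Symmetric, doubly stochastic mixing matrix for the line graph 0 - 1 - 2.
W = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]


def run(num_iters=50, eta=1.0, gamma=0.0):
    estimates = [row[:] for row in REWARDS]          # local guesses of the average Q
    policies = [[1.0 / 3] * 3 for _ in REWARDS]      # uniform initial policies
    for _ in range(num_iters):
        estimates = mix(estimates, W)                # talk to neighbors only
        policies = [npg_step(p, t, eta, gamma)
                    for p, t in zip(policies, estimates)]
    return policies


policies = run()
```

    In this toy run every agent ends up concentrating on action 2, the action that maximizes the summed reward, even though agents 0 and 1 would each pick a different action from their private rewards alone.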