cs.AI - 2023-09-25

Integrating Higher-Order Dynamics and Roadway-Compliance into Constrained ILQR-based Trajectory Planning for Autonomous Vehicles

  • paper_url: http://arxiv.org/abs/2309.14566
  • repo_url: None
  • paper_authors: Hanxiang Li, Jiaqiao Zhang, Sheng Zhu, Dongjian Tang, Donghao Xu
  • for: This work proposes a CILQR-optimization-based on-road trajectory planning method for autonomous vehicles, aiming to improve safety and comfort.
  • methods: The study builds on the CILQR optimization algorithm and adds higher-order terms to the costs and constraints to ensure that planned trajectories are controllable; it also accounts for roadway compliance so that the vehicle adheres to lane boundaries and directions.
  • results: Simulations and real-world driving scenarios validate the method, showing improved safety and comfort.
    Abstract This paper addresses the advancements in on-road trajectory planning for Autonomous Passenger Vehicles (APV). Trajectory planning aims to produce a globally optimal route for APVs, considering various factors such as vehicle dynamics, constraints, and detected obstacles. Traditional techniques involve a combination of sampling methods followed by optimization algorithms, where the former ensures global awareness and the latter refines for local optima. Notably, the Constrained Iterative Linear Quadratic Regulator (CILQR) optimization algorithm has recently emerged, adapted for APV systems, emphasizing improved safety and comfort. However, existing implementations utilizing the vehicle bicycle kinematic model may not guarantee controllable trajectories. We augment this model by incorporating higher-order terms, including the first and second-order derivatives of curvature and longitudinal jerk. This inclusion facilitates a richer representation in our cost and constraint design. We also address roadway compliance, emphasizing adherence to lane boundaries and directions, which past work often overlooked. Lastly, we adopt a relaxed logarithmic barrier function to address the CILQR's dependency on feasible initial trajectories. The proposed methodology is then validated through simulation and real-world experiment driving scenes in real time.
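
The abstract mentions a relaxed logarithmic barrier to remove CILQR's dependency on feasible initial trajectories. Below is a minimal sketch of one common relaxed log-barrier form (quadratically extended below a relaxation threshold delta); the exact relaxation and parameters used in the paper may differ.

```python
import numpy as np

def relaxed_log_barrier(z, delta=0.1):
    """Relaxed logarithmic barrier: -log(z) for z > delta, with a quadratic
    extension below delta so the cost stays finite and differentiable even
    at infeasible points (z <= 0)."""
    z = np.asarray(z, dtype=float)
    quad = 0.5 * (((z - 2.0 * delta) / delta) ** 2 - 1.0) - np.log(delta)
    return np.where(z > delta, -np.log(np.maximum(z, 1e-12)), quad)

# Example: penalizing a constraint g(x) <= 0 via z = -g(x)
print(relaxed_log_barrier(np.array([1.0, 0.05, -0.2])))
```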

Generative Escher Meshes

  • paper_url: http://arxiv.org/abs/2309.14564
  • repo_url: None
  • paper_authors: Noam Aigerman, Thibault Groueix
  • for: This paper proposes a fully-automatic, text-guided generative method for producing periodic, repeating, tile-able 2D art, such as the one seen on floors, mosaics, ceramics, and the work of M.C. Escher.
  • methods: The method uses an unconstrained, differentiable parameterization of the space of all possible tileable shapes for a given symmetry group, and modifies the laplacian used in a 2D mesh-mapping technique - Orbifold Tutte Embedding - to achieve all possible tiling configurations for a chosen planar symmetry group. The method also leverages a trained image diffusion model to define a loss on the resulting image, thereby updating the mesh's parameters based on its appearance matching the text prompt.
  • results: The paper shows that the method is able to produce plausible, appealing results, with non-trivial tiles, for a variety of different periodic tiling patterns.
    Abstract This paper proposes a fully-automatic, text-guided generative method for producing periodic, repeating, tile-able 2D art, such as the one seen on floors, mosaics, ceramics, and the work of M.C. Escher. In contrast to the standard concept of a seamless texture, i.e., square images that are seamless when tiled, our method generates non-square tilings which comprise solely of repeating copies of the same object. It achieves this by optimizing both geometry and color of a 2D mesh, in order to generate a non-square tile in the shape and appearance of the desired object, with close to no additional background details. We enable geometric optimization of tilings by our key technical contribution: an unconstrained, differentiable parameterization of the space of all possible tileable shapes for a given symmetry group. Namely, we prove that modifying the laplacian used in a 2D mesh-mapping technique - Orbifold Tutte Embedding - can achieve all possible tiling configurations for a chosen planar symmetry group. We thus consider both the mesh's tile-shape and its texture as optimizable parameters, rendering the textured mesh via a differentiable renderer. We leverage a trained image diffusion model to define a loss on the resulting image, thereby updating the mesh's parameters based on its appearance matching the text prompt. We show our method is able to produce plausible, appealing results, with non-trivial tiles, for a variety of different periodic tiling patterns.
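
The key contribution modifies the Laplacian used in an Orbifold Tutte Embedding. As background, here is a minimal sketch of a classic Tutte embedding with a uniform Laplacian and a circular boundary; the paper's orbifold variant, which replaces this Laplacian to realize tileable shapes, is not reproduced here.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def tutte_embedding(n_verts, faces, boundary):
    """Classic Tutte embedding sketch: boundary vertices pinned to a circle,
    each interior vertex placed at the average of its neighbors by solving a
    uniform-Laplacian linear system."""
    rows, cols = [], []
    for a, b, c in faces:                      # undirected edges from triangles
        for u, v in ((a, b), (b, c), (c, a)):
            rows += [u, v]
            cols += [v, u]
    A = sp.coo_matrix((np.ones(len(rows)), (rows, cols)),
                      shape=(n_verts, n_verts)).tocsr()
    A.data[:] = 1.0                            # binary adjacency
    L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

    interior = np.setdiff1d(np.arange(n_verts), boundary)
    theta = np.linspace(0, 2 * np.pi, len(boundary), endpoint=False)
    pos = np.zeros((n_verts, 2))
    pos[boundary] = np.column_stack([np.cos(theta), np.sin(theta)])
    L_II = L[interior][:, interior].tocsc()
    rhs = -L[interior][:, boundary] @ pos[boundary]
    for d in range(2):                         # solve L_II x_I = -L_IB x_B
        pos[interior, d] = spla.spsolve(L_II, rhs[:, d])
    return pos

# toy mesh: a square split into four triangles around a center vertex (index 4)
faces = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
print(tutte_embedding(5, faces, boundary=[0, 1, 2, 3]))
```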

Training-free Linear Image Inversion via Flows

  • paper_url: http://arxiv.org/abs/2310.04432
  • repo_url: None
  • paper_authors: Ashwini Pokle, Matthew J. Muckley, Ricky T. Q. Chen, Brian Karrer
  • for: Linear image inversion without training
  • methods: Uses pretrained generative flow models, leveraging the simplicity and efficiency of Flow Matching, together with theoretically justified weighting schemes that greatly reduce manual hyperparameter tuning.
  • results: On high-dimensional datasets (ImageNet-64/128 and AFHQ-256), the flow-based method solves noisy linear image inversion problems effectively without problem-specific tuning.
    Abstract Training-free linear inversion involves the use of a pretrained generative model and -- through appropriate modifications to the generation process -- solving inverse problems without any finetuning of the generative model. While recent prior methods have explored the use of diffusion models, they still require the manual tuning of many hyperparameters for different inverse problems. In this work, we propose a training-free method for image inversion using pretrained flow models, leveraging the simplicity and efficiency of Flow Matching models, using theoretically-justified weighting schemes and thereby significantly reducing the amount of manual tuning. In particular, we draw inspiration from two main sources: adopting prior gradient correction methods to the flow regime, and a solver scheme based on conditional Optimal Transport paths. As pretrained diffusion models are widely accessible, we also show how to practically adapt diffusion models for our method. Empirically, our approach requires no problem-specific tuning across an extensive suite of noisy linear image inversion problems on high-dimensional datasets, ImageNet-64/128 and AFHQ-256, and we observe that our flow-based method for image inversion significantly improves upon closely-related diffusion-based linear inversion methods.
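
The abstract describes adapting prior gradient-correction methods to the flow regime for noisy linear inverse problems. Below is one illustrative way to add a measurement-consistency gradient to an Euler step of a flow-matching ODE with conditional-OT paths; `velocity_model`, `A`, `y`, and the single hand-set `guidance_scale` are placeholders and do not reproduce the paper's theoretically justified weighting scheme.

```python
import torch

def guided_flow_step(x, t, dt, velocity_model, A, y, guidance_scale=1.0):
    """One Euler step of dx/dt = v(x, t) with a simple data-consistency
    correction for a linear inverse problem y ~ x_1 @ A.T (images flattened
    to vectors). Sketch only; not the paper's exact update rule."""
    x = x.detach().requires_grad_(True)
    v = velocity_model(x, t)                # learned velocity field
    x1_hat = x + (1.0 - t) * v              # one-step estimate of the clean sample
    loss = ((y - x1_hat @ A.T) ** 2).sum()  # measurement residual
    grad = torch.autograd.grad(loss, x)[0]
    return (x + dt * v - guidance_scale * grad).detach()
```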

Disinformation Detection: An Evolving Challenge in the Age of LLMs

  • paper_url: http://arxiv.org/abs/2309.15847
  • repo_url: None
  • paper_authors: Bohan Jiang, Zhen Tan, Ayushi Nirmal, Huan Liu
  • for: This work examines the threat posed by disinformation generated with large language models (LLMs) and asks whether LLMs themselves can be leveraged to build robust defenses.
  • methods: It evaluates existing disinformation detection techniques and explores building detectors that use LLMs themselves to identify LLM-generated disinformation.
  • results: Existing detection techniques show limited ability to detect LLM-generated disinformation, whereas detectors that leverage LLMs themselves perform considerably better.
    Abstract The advent of generative Large Language Models (LLMs) such as ChatGPT has catalyzed transformative advancements across multiple domains. However, alongside these advancements, they have also introduced potential threats. One critical concern is the misuse of LLMs by disinformation spreaders, leveraging these models to generate highly persuasive yet misleading content that challenges the disinformation detection system. This work aims to address this issue by answering three research questions: (1) To what extent can the current disinformation detection technique reliably detect LLM-generated disinformation? (2) If traditional techniques prove less effective, can LLMs themself be exploited to serve as a robust defense against advanced disinformation? and, (3) Should both these strategies falter, what novel approaches can be proposed to counter this burgeoning threat effectively? A holistic exploration for the formation and detection of disinformation is conducted to foster this line of research.

Art or Artifice? Large Language Models and the False Promise of Creativity

  • paper_url: http://arxiv.org/abs/2309.14556
  • repo_url: None
  • paper_authors: Tuhin Chakrabarty, Philippe Laban, Divyansh Agarwal, Smaranda Muresan, Chien-Sheng Wu
  • for: Evaluating the creative writing ability of large language models (LLMs).
  • methods: Uses the Consensual Assessment Technique and the proposed Torrance Test of Creative Writing (TTCW) to evaluate creativity.
  • results: LLM-generated stories fail TTCW tests at a much higher rate than stories by professional authors, and LLM-based assessments show no positive correlation with expert assessments.
    Abstract Researchers have argued that large language models (LLMs) exhibit high-quality writing capabilities from blogs to stories. However, evaluating objectively the creativity of a piece of writing is challenging. Inspired by the Torrance Test of Creative Thinking (TTCT), which measures creativity as a process, we use the Consensual Assessment Technique [3] and propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product. TTCW consists of 14 binary tests organized into the original dimensions of Fluency, Flexibility, Originality, and Elaboration. We recruit 10 creative writers and implement a human assessment of 48 stories written either by professional authors or LLMs using TTCW. Our analysis shows that LLM-generated stories pass 3-10X less TTCW tests than stories written by professionals. In addition, we explore the use of LLMs as assessors to automate the TTCW evaluation, revealing that none of the LLMs positively correlate with the expert assessments.

Tactile Estimation of Extrinsic Contact Patch for Stable Placement

  • paper_url: http://arxiv.org/abs/2309.14552
  • repo_url: None
  • paper_authors: Kei Ota, Devesh K. Jha, Krishna Murthy Jatavallabhula, Asako Kanezaki, Joshua B. Tenenbaum
  • for: This work studies how robots can acquire fine-grained manipulation skills, specifically for stacking complex-shaped objects.
  • methods: Feedback skills help the robot learn to stack complex-shaped objects: the robot reasons about placement stability from very gentle contact interactions between the object and its environment.
  • results: Placement stability can be inferred from gentle contact interactions between the object and its environment, and the stability upon releasing the grasp can also be estimated; experiments demonstrate accurate stacking across various pairs of objects.
    Abstract Precise perception of contact interactions is essential for the fine-grained manipulation skills for robots. In this paper, we present the design of feedback skills for robots that must learn to stack complex-shaped objects on top of each other. To design such a system, a robot should be able to reason about the stability of placement from very gentle contact interactions. Our results demonstrate that it is possible to infer the stability of object placement based on tactile readings during contact formation between the object and its environment. In particular, we estimate the contact patch between a grasped object and its environment using force and tactile observations to estimate the stability of the object during a contact formation. The contact patch could be used to estimate the stability of the object upon the release of the grasp. The proposed method is demonstrated on various pairs of objects that are used in a very popular board game.

Algorithmic Collusion or Competition: the Role of Platforms’ Recommender Systems

  • paper_url: http://arxiv.org/abs/2309.14548
  • repo_url: None
  • paper_authors: Xingchen Xu, Stephanie Lee, Yong Tan
  • for: This paper examines how recommendation algorithms used by e-commerce platforms can impact the competitive dynamics of AI-based pricing algorithms.
  • methods: The paper uses a repeated game framework to model the interactions between sellers and the platform's recommender system, and conducts experiments to observe price dynamics and determine the final equilibrium.
  • results: The paper finds that a profit-based recommender system can intensify algorithmic collusion among sellers, while a demand-based recommender system can foster price competition and result in lower prices. The results are robust in various market scenarios.
    Abstract Recent academic research has extensively examined algorithmic collusion resulting from the utilization of artificial intelligence (AI)-based dynamic pricing algorithms. Nevertheless, e-commerce platforms employ recommendation algorithms to allocate exposure to various products, and this important aspect has been largely overlooked in previous studies on algorithmic collusion. Our study bridges this important gap in the literature and examines how recommendation algorithms can determine the competitive or collusive dynamics of AI-based pricing algorithms. Specifically, two commonly deployed recommendation algorithms are examined: (i) a recommender system that aims to maximize the sellers' total profit (profit-based recommender system) and (ii) a recommender system that aims to maximize the demand for products sold on the platform (demand-based recommender system). We construct a repeated game framework that incorporates both pricing algorithms adopted by sellers and the platform's recommender system. Subsequently, we conduct experiments to observe price dynamics and ascertain the final equilibrium. Experimental results reveal that a profit-based recommender system intensifies algorithmic collusion among sellers due to its congruence with sellers' profit-maximizing objectives. Conversely, a demand-based recommender system fosters price competition among sellers and results in a lower price, owing to its misalignment with sellers' goals. Extended analyses suggest the robustness of our findings in various market scenarios. Overall, we highlight the importance of platforms' recommender systems in delineating the competitive structure of the digital marketplace, providing important insights for market participants and corresponding policymakers.

Effect of roundabout design on the behavior of road users: A case study of roundabouts with application of Unsupervised Machine Learning

  • paper_url: http://arxiv.org/abs/2309.14540
  • repo_url: None
  • paper_authors: Tasnim M. Dwekat, Ayda A. Almsre, Huthaifa I. Ashqar
  • for: This study aims to evaluate roundabout performance and to examine how road users behave when interacting with roundabouts.
  • methods: Driver behavior (bus, car, and truck drivers) is observed and classified into conservative, normal, and aggressive categories, and a method is developed for predicting road-user behavior at roundabout intersections.
  • results: Roundabouts significantly reduce speed at the intersection, with entry speed and the resulting effect depending on the class of road user; car speeds through the roundabout were better regulated than those of buses and trucks. Two inherent features contribute to safety: because the car is small and all parts of the roundabout are visible, drivers entering from every direction must slow down, gaining reaction time and reducing accident risk; and with fewer conflicting flows, drivers only need to look to their left (in right-hand traffic), making the crossing easier.
    Abstract This research aims to evaluate the performance of the rotors and study the behavior of the human driver in interacting with the rotors. In recent years, rotors have been increasingly used between countries due to their safety, capacity, and environmental advantages, and because they provide safe and fluid flows of vehicles for transit and integration. It turns out that roundabouts can significantly reduce speed at twisting intersections, entry speed and the resulting effect on speed depends on the rating of road users. In our research, (bus, car, truck) drivers were given special attention and their behavior was categorized into (conservative, normal, aggressive). Anticipating and recognizing driver behavior is an important challenge. Therefore, the aim of this research is to study the effect of roundabouts on these classifiers and to develop a method for predicting the behavior of road users at roundabout intersections. Safety is primarily due to two inherent features of the rotor. First, by comparing the data collected and processed in order to classify and evaluate drivers' behavior, and comparing the speeds of the drivers (bus, car and truck), the speed of motorists at crossing the roundabout was more fit than that of buses and trucks. We looked because the car is smaller and all parts of the rotor are visible to it. So drivers coming from all directions have to slow down, giving them more time to react and mitigating the consequences in the event of an accident. Second, with fewer conflicting flows (and points of conflict), drivers only need to look to their left (in right-hand traffic) for other vehicles, making their job of crossing the roundabout easier as there is less need to split attention between different directions.
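
The title points to unsupervised machine learning for grouping driver behavior into conservative/normal/aggressive classes. A toy sketch of that idea using k-means on speed features; the features, sample values, and cluster-to-label mapping are assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-driver features from roundabout trajectories:
# [approach speed, entry speed, speed through the roundabout] in km/h.
features = np.array([
    [42.0, 25.0, 22.0],
    [55.0, 38.0, 35.0],
    [30.0, 18.0, 15.0],
    [60.0, 45.0, 40.0],
    [35.0, 20.0, 18.0],
    [47.0, 30.0, 27.0],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
# Rank clusters by mean entry speed and name them accordingly.
rank = np.argsort(np.argsort(kmeans.cluster_centers_[:, 1]))
names = np.array(["conservative", "normal", "aggressive"])
print(names[rank[kmeans.labels_]])
```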

Watch Your Language: Large Language Models and Content Moderation

  • paper_url: http://arxiv.org/abs/2309.14517
  • repo_url: None
  • paper_authors: Deepak Kumar, Yousef AbuHashem, Zakir Durumeric
  • for: This paper studies how well large language models (LLMs) perform on content moderation tasks.
  • methods: It evaluates modern commercial LLMs (GPT-3, GPT-3.5, GPT-4) on two common content moderation tasks: rule-based community moderation and toxic content detection.
  • results: LLMs can moderate effectively against the rules of many communities, with a median accuracy of 64% and a median precision of 83%. For toxic content detection, LLMs clearly outperform existing commercial toxicity classifiers, but recent increases in model size add only marginal benefit, suggesting a potential performance plateau for LLMs on this task.
    Abstract Large language models (LLMs) have exploded in popularity due to their ability to perform a wide array of natural language tasks. Text-based content moderation is one LLM use case that has received recent enthusiasm, however, there is little research investigating how LLMs perform in content moderation settings. In this work, we evaluate a suite of modern, commercial LLMs (GPT-3, GPT-3.5, GPT-4) on two common content moderation tasks: rule-based community moderation and toxic content detection. For rule-based community moderation, we construct 95 LLM moderation-engines prompted with rules from 95 Reddit subcommunities and find that LLMs can be effective at rule-based moderation for many communities, achieving a median accuracy of 64% and a median precision of 83%. For toxicity detection, we find that LLMs significantly outperform existing commercially available toxicity classifiers. However, we also find that recent increases in model size add only marginal benefit to toxicity detection, suggesting a potential performance plateau for LLMs on toxicity detection tasks. We conclude by outlining avenues for future work in studying LLMs and content moderation.
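
The rule-based moderation setup prompts an LLM with a community's rules and a comment to moderate. A hypothetical prompt-construction sketch; the rule list, wording, and YES/NO parsing are assumptions, not the paper's exact prompts.

```python
# Illustrative prompt construction for rule-based community moderation.
RULES = [
    "1. Be respectful to other users.",
    "2. No self-promotion or advertising.",
]

def build_moderation_prompt(comment: str) -> str:
    rules_text = "\n".join(RULES)
    return (
        "You are a moderator for an online community with the following rules:\n"
        f"{rules_text}\n\n"
        f"Comment: {comment}\n\n"
        "Does this comment violate any of the rules? Answer YES or NO, "
        "then name the rule violated (if any)."
    )

def parse_decision(llm_output: str) -> bool:
    # Treat any answer starting with YES as a removal decision.
    return llm_output.strip().upper().startswith("YES")

print(build_moderation_prompt("Buy my new crypto course at example.com!"))
```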

Interaction-Aware Decision-Making for Autonomous Vehicles in Forced Merging Scenario Leveraging Social Psychology Factors

  • paper_url: http://arxiv.org/abs/2309.14497
  • repo_url: None
  • paper_authors: Xiao Li, Kaiwen Liu, H. Eric Tseng, Anouck Girard, Ilya Kolmanovsky
  • for: This work aims to help autonomous vehicles accomplish their driving tasks in complex traffic scenarios, particularly highway forced-merging scenarios.
  • methods: It uses a behavioral model that accounts for interacting drivers' social behaviors and personal objectives, and builds on it a receding-horizon control-based decision-making strategy that estimates other drivers' intentions online and predicts nearby vehicles' behaviors under uncertain intentions.
  • results: The effectiveness of the decision-making strategy is demonstrated by comparison with a game-theoretic controller and against real-world traffic data.
    Abstract Understanding the intention of vehicles in the surrounding traffic is crucial for an autonomous vehicle to successfully accomplish its driving tasks in complex traffic scenarios such as highway forced merging. In this paper, we consider a behavioral model that incorporates both social behaviors and personal objectives of the interacting drivers. Leveraging this model, we develop a receding-horizon control-based decision-making strategy, that estimates online the other drivers' intentions using Bayesian filtering and incorporates predictions of nearby vehicles' behaviors under uncertain intentions. The effectiveness of the proposed decision-making strategy is demonstrated and evaluated based on simulation studies in comparison with a game theoretic controller and a real-world traffic dataset.
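
The strategy estimates other drivers' intentions online with Bayesian filtering. A minimal sketch of a discrete Bayes filter over a two-intention set; the intention set, likelihood model, and observations are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

INTENTIONS = ["yield", "not_yield"]

def likelihood(observed_decel, intention):
    """P(observation | intention): yielding drivers are assumed to decelerate."""
    mean = 1.5 if intention == "yield" else 0.0   # m/s^2, assumed
    return np.exp(-0.5 * (observed_decel - mean) ** 2)

belief = np.array([0.5, 0.5])                     # uniform prior over intentions
for observed_decel in [0.2, 0.8, 1.4, 1.6]:       # stream of observations
    belief = belief * np.array([likelihood(observed_decel, i) for i in INTENTIONS])
    belief /= belief.sum()                        # normalize the posterior
    print(dict(zip(INTENTIONS, np.round(belief, 3))))
```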

Era Splitting – Invariant Learning for Decision Trees

  • paper_url: http://arxiv.org/abs/2309.14496
  • repo_url: https://github.com/jefferythewind/era-splitting-notebook-examples
  • paper_authors: Timothy DeLise
  • for: The paper addresses out-of-distribution (OOD) generalization in decision tree models, specifically random forests and gradient-boosted decision trees.
  • methods: It proposes two new splitting criteria that incorporate era-wise information into the splitting process, allowing the models to find split points that are optimal across all disjoint eras in the data.
  • results: Unique experiments showcase the benefits of the new splitting criteria, which improve out-of-sample metrics; the criteria are incorporated into a state-of-the-art gradient boosted decision tree model in the Scikit-Learn code base, which is made freely available.
    Abstract Real life machine learning problems exhibit distributional shifts in the data from one time to another or from on place to another. This behavior is beyond the scope of the traditional empirical risk minimization paradigm, which assumes i.i.d. distribution of data over time and across locations. The emerging field of out-of-distribution (OOD) generalization addresses this reality with new theory and algorithms which incorporate environmental, or era-wise information into the algorithms. So far, most research has been focused on linear models and/or neural networks. In this research we develop two new splitting criteria for decision trees, which allow us to apply ideas from OOD generalization research to decision tree models, including random forest and gradient-boosting decision trees. The new splitting criteria use era-wise information associated with each data point to allow tree-based models to find split points that are optimal across all disjoint eras in the data, instead of optimal over the entire data set pooled together, which is the default setting. We describe the new splitting criteria in detail and develop unique experiments to showcase the benefits of these new criteria, which improve metrics in our experiments out-of-sample. The new criteria are incorporated into the a state-of-the-art gradient boosted decision tree model in the Scikit-Learn code base, which is made freely available.
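
The abstract describes split criteria that use era-wise information so splits are good across all disjoint eras rather than over the pooled data. A toy sketch of one such criterion, aggregating per-era variance-reduction gains by their minimum; the paper's actual criteria may aggregate differently.

```python
import numpy as np

def era_wise_split_gain(y, era, left_mask):
    """Era-aware split score sketch for a regression tree: compute the
    variance-reduction gain separately inside each era and keep the
    worst-case (minimum) gain, so a chosen split must help in every era."""
    def var_reduction(y_node, mask):
        if mask.sum() == 0 or (~mask).sum() == 0:
            return 0.0
        n = len(y_node)
        return (y_node.var()
                - (mask.sum() / n) * y_node[mask].var()
                - ((~mask).sum() / n) * y_node[~mask].var())

    gains = [var_reduction(y[era == e], left_mask[era == e]) for e in np.unique(era)]
    return min(gains)

# toy example: two eras, a split that only helps in era 0
y = np.array([0.0, 1.0, 0.0, 1.0, 0.5, 0.5, 0.5, 0.5])
era = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left = np.array([True, False, True, False, True, True, False, False])
print(era_wise_split_gain(y, era, left))
```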

A Novel Deep Learning Technique for Morphology Preserved Fetal ECG Extraction from Mother ECG using 1D-CycleGAN

  • paper_url: http://arxiv.org/abs/2310.03759
  • repo_url: None
  • paper_authors: Promit Basak, A. H. M Nazmus Sakib, Muhammad E. H. Chowdhury, Nasser Al-Emadi, Huseyin Cagatay Yalcin, Shona Pedersen, Sakib Mahmud, Serkan Kiranyaz, Somaya Al-Maadeed
  • for: To monitor the electrical signal of the fetal heart, enabling early diagnosis of fetal heart disease and follow-up care.
  • methods: A 1D CycleGAN reconstructs the fetal ECG (fECG) from the maternal ECG, with extensive preprocessing and an appropriate framework to preserve the signal morphology.
  • results: The reconstructed fECG supports accurate diagnosis of fetal heart conditions and monitoring of fetal cardiac function, enabling early diagnosis and follow-up care with higher accuracy and reliability than closely related existing techniques.
    Abstract Monitoring the electrical pulse of fetal heart through a non-invasive fetal electrocardiogram (fECG) can easily detect abnormalities in the developing heart to significantly reduce the infant mortality rate and post-natal complications. Due to the overlapping of maternal and fetal R-peaks, the low amplitude of the fECG, systematic and ambient noises, typical signal extraction methods, such as adaptive filters, independent component analysis, empirical mode decomposition, etc., are unable to produce satisfactory fECG. While some techniques can produce accurate QRS waves, they often ignore other important aspects of the ECG. Our approach, which is based on 1D CycleGAN, can reconstruct the fECG signal from the mECG signal while maintaining the morphology due to extensive preprocessing and appropriate framework. The performance of our solution was evaluated by combining two available datasets from Physionet, "Abdominal and Direct Fetal ECG Database" and "Fetal electrocardiograms, direct and abdominal with reference heartbeat annotations", where it achieved an average PCC and Spectral-Correlation score of 88.4% and 89.4%, respectively. It detects the fQRS of the signal with accuracy, precision, recall and F1 score of 92.6%, 97.6%, 94.8% and 96.4%, respectively. It can also accurately produce the estimation of fetal heart rate and R-R interval with an error of 0.25% and 0.27%, respectively. The main contribution of our work is that, unlike similar studies, it can retain the morphology of the ECG signal with high fidelity. The accuracy of our solution for fetal heart rate and R-R interval length is comparable to existing state-of-the-art techniques. This makes it a highly effective tool for early diagnosis of fetal heart diseases and regular health checkups of the fetus.
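
The method rests on a 1D CycleGAN, whose core idea is cycle consistency between the mECG-to-fECG and fECG-to-mECG mappings. A minimal PyTorch sketch of that cycle-consistency term; the architectures, losses, and training details are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

def conv_block():
    # Tiny stand-in for a 1D generator over single-channel ECG windows.
    return nn.Sequential(
        nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
        nn.Conv1d(16, 1, kernel_size=9, padding=4),
    )

G_m2f, G_f2m = conv_block(), conv_block()   # the two generators
l1 = nn.L1Loss()

mecg = torch.randn(8, 1, 512)               # batch of 1-channel mECG windows
fecg_fake = G_m2f(mecg)                     # mECG -> fECG
mecg_rec = G_f2m(fecg_fake)                 # fECG -> mECG (reconstruction)
cycle_loss = l1(mecg_rec, mecg)             # cycle-consistency term
cycle_loss.backward()                       # adversarial terms omitted here
print(float(cycle_loss))
```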

When Automated Assessment Meets Automated Content Generation: Examining Text Quality in the Era of GPTs

  • paper_url: http://arxiv.org/abs/2309.14488
  • repo_url: https://github.com/nd-hal/automated-ml-scoring-versus-generation
  • paper_authors: Marialena Bevilacqua, Kezia Oketch, Ruiyang Qin, Will Stamey, Xinyuan Zhang, Yi Gan, Kai Yang, Ahmed Abbasi
  • for: This paper studies the use of machine learning (ML) models to score textual data in the era of text-generating large language models (GPTs).
  • methods: It evaluates several ML scoring models, including transformer pretrained language models, CNN/RNN models, and feature-based methods, on essays generated by humans and by GPTs.
  • results: Transformer pretrained language models (PLMs) score human-written essay quality more accurately, while traditional deep learning and feature-based ML models tend to score human text higher; the transformer PLMs also generalize better to GPT-generated text.
    Abstract The use of machine learning (ML) models to assess and score textual data has become increasingly pervasive in an array of contexts including natural language processing, information retrieval, search and recommendation, and credibility assessment of online content. A significant disruption at the intersection of ML and text are text-generating large-language models such as generative pre-trained transformers (GPTs). We empirically assess the differences in how ML-based scoring models trained on human content assess the quality of content generated by humans versus GPTs. To do so, we propose an analysis framework that encompasses essay scoring ML-models, human and ML-generated essays, and a statistical model that parsimoniously considers the impact of type of respondent, prompt genre, and the ML model used for assessment model. A rich testbed is utilized that encompasses 18,460 human-generated and GPT-based essays. Results of our benchmark analysis reveal that transformer pretrained language models (PLMs) more accurately score human essay quality as compared to CNN/RNN and feature-based ML methods. Interestingly, we find that the transformer PLMs tend to score GPT-generated text 10-15\% higher on average, relative to human-authored documents. Conversely, traditional deep learning and feature-based ML models score human text considerably higher. Further analysis reveals that although the transformer PLMs are exclusively fine-tuned on human text, they more prominently attend to certain tokens appearing only in GPT-generated text, possibly due to familiarity/overlap in pre-training. Our framework and results have implications for text classification settings where automated scoring of text is likely to be disrupted by generative AI.

Incorporating Ensemble and Transfer Learning For An End-To-End Auto-Colorized Image Detection Model

  • paper_url: http://arxiv.org/abs/2309.14478
  • repo_url: None
  • paper_authors: Ahmed Samir Ragab, Shereen Aly Taie, Howida Youssry Abdelnaby
  • for: This paper proposes a new method for detecting image colorization, i.e., distinguishing natural-color images from computer-colorized ones.
  • methods: The method combines the advantages of transfer learning and ensemble learning, using pre-trained VGG16 and ResNet50 backbones together with MobileNet v2 or EfficientNet feature vectors.
  • results: The model shows strong classification performance and generalization, with accuracy between 94.55% and 99.13% and very low Half Total Error Rate values, outperforming existing state-of-the-art models.
    Abstract Image colorization is the process of colorizing grayscale images or recoloring an already-color image. This image manipulation can be used for grayscale satellite, medical and historical images making them more expressive. With the help of the increasing computation power of deep learning techniques, the colorization algorithms results are becoming more realistic in such a way that human eyes cannot differentiate between natural and colorized images. However, this poses a potential security concern, as forged or illegally manipulated images can be used illegally. There is a growing need for effective detection methods to distinguish between natural color and computer-colorized images. This paper presents a novel approach that combines the advantages of transfer and ensemble learning approaches to help reduce training time and resource requirements while proposing a model to classify natural color and computer-colorized images. The proposed model uses pre-trained branches VGG16 and Resnet50, along with Mobile Net v2 or Efficientnet feature vectors. The proposed model showed promising results, with accuracy ranging from 94.55% to 99.13% and very low Half Total Error Rate values. The proposed model outperformed existing state-of-the-art models regarding classification performance and generalization capabilities.
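
The model freezes pre-trained backbones (transfer learning) and combines their feature vectors (ensembling) before a classifier head. A hedged Keras sketch of that idea; the input size, layer sizes, preprocessing, and classifier head are assumptions, not the paper's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16, ResNet50, MobileNetV2

inp = layers.Input(shape=(224, 224, 3))
backbones = [
    VGG16(include_top=False, weights="imagenet", pooling="avg"),
    ResNet50(include_top=False, weights="imagenet", pooling="avg"),
    MobileNetV2(include_top=False, weights="imagenet", pooling="avg"),
]
feats = []
for b in backbones:
    b.trainable = False                 # transfer learning: freeze the backbones
    feats.append(b(inp))
x = layers.Concatenate()(feats)         # ensemble of pooled feature vectors
x = layers.Dense(256, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)  # natural vs. computer-colorized
model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```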

Adapting Double Q-Learning for Continuous Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.14471
  • repo_url: None
  • paper_authors: Arsenii Kuznetsov
  • for: This paper targets the overestimation bias problem in off-policy reinforcement learning and proposes a new bias-correction method.
  • methods: The method uses a policy in the form of a mixture with two components, each maximized and assessed by a separate network, which removes the basis for overestimation bias.
  • results: The method achieves near-SOTA results on a set of MuJoCo environments, demonstrating its feasibility and effectiveness.
    Abstract Majority of off-policy reinforcement learning algorithms use overestimation bias control techniques. Most of these techniques rooted in heuristics, primarily addressing the consequences of overestimation rather than its fundamental origins. In this work we present a novel approach to the bias correction, similar in spirit to Double Q-Learning. We propose using a policy in form of a mixture with two components. Each policy component is maximized and assessed by separate networks, which removes any basis for the overestimation bias. Our approach shows promising near-SOTA results on a small set of MuJoCo environments.
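
The proposed correction is described as similar in spirit to Double Q-Learning. For reference, a sketch of the classic tabular Double Q-Learning update, which decouples action selection from evaluation across two value tables; the paper's continuous-control, mixture-policy variant is not reproduced here, and the environment details below are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, alpha, gamma = 5, 3, 0.1, 0.99
QA = np.zeros((n_states, n_actions))
QB = np.zeros((n_states, n_actions))

def double_q_update(s, a, r, s_next):
    # With probability 1/2, update A using B for evaluation, and vice versa:
    # one table picks the greedy action, the other estimates its value.
    if rng.random() < 0.5:
        a_star = QA[s_next].argmax()
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        a_star = QB[s_next].argmax()
        QB[s, a] += alpha * (r + gamma * QA[s_next, a_star] - QB[s, a])

double_q_update(s=0, a=1, r=1.0, s_next=2)
print(QA[0], QB[0])
```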

DefGoalNet: Contextual Goal Learning from Demonstrations For Deformable Object Manipulation

  • paper_url: http://arxiv.org/abs/2309.14463
  • repo_url: None
  • paper_authors: Bao Thach, Tanner Watts, Shing-Hei Ho, Tucker Hermans, Alan Kuntz
  • for: To solve the shape servoing problem: controlling a deformable object toward a desired goal shape.
  • methods: The authors develop DefGoalNet, a neural network that learns deformable object goal shapes directly from a small number of human demonstrations.
  • results: Tested on various tasks in simulation and on a physical robot, including a surgical retraction task, the method achieves success rates of up to nearly 90%, showing it can effectively solve the shape servoing problem and bring deformable object manipulation closer to practical, real-world applications.
    Abstract Shape servoing, a robotic task dedicated to controlling objects to desired goal shapes, is a promising approach to deformable object manipulation. An issue arises, however, with the reliance on the specification of a goal shape. This goal has been obtained either by a laborious domain knowledge engineering process or by manually manipulating the object into the desired shape and capturing the goal shape at that specific moment, both of which are impractical in various robotic applications. In this paper, we solve this problem by developing a novel neural network DefGoalNet, which learns deformable object goal shapes directly from a small number of human demonstrations. We demonstrate our method's effectiveness on various robotic tasks, both in simulation and on a physical robot. Notably, in the surgical retraction task, even when trained with as few as 10 demonstrations, our method achieves a median success percentage of nearly 90%. These results mark a substantial advancement in enabling shape servoing methods to bring deformable object manipulation closer to practical, real-world applications.

Online Active Learning For Sound Event Detection

  • paper_url: http://arxiv.org/abs/2309.14460
  • repo_url: None
  • paper_authors: Mark Lindsey, Ankit Shah, Francis Kubala, Richard M. Stern
  • for: This paper aims to improve the efficiency of supervised learning for Sound Event Detection (SED).
  • methods: It uses Online Active Learning (OAL) to reduce the time and effort required for supervision, and introduces new loss functions that address problems in existing OAL methods, such as fluctuating class distributions and data drift.
  • results: Experiments show that OAL can cut the time and effort needed to train SED classifiers by a factor of 5 on the SONYC dataset, and that the new methods resolve issues present in existing OAL approaches.
    Abstract Data collection and annotation is a laborious, time-consuming prerequisite for supervised machine learning tasks. Online Active Learning (OAL) is a paradigm that addresses this issue by simultaneously minimizing the amount of annotation required to train a classifier and adapting to changes in the data over the duration of the data collection process. Prior work has indicated that fluctuating class distributions and data drift are still common problems for OAL. This work presents new loss functions that address these challenges when OAL is applied to Sound Event Detection (SED). Experimental results from the SONYC dataset and two Voice-Type Discrimination (VTD) corpora indicate that OAL can reduce the time and effort required to train SED classifiers by a factor of 5 for SONYC, and that the new methods presented here successfully resolve issues present in existing OAL methods.
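
Online active learning decides, for each incoming clip, whether to request a label. A toy sketch of a simple uncertainty-threshold query rule; the classifier probabilities and threshold are illustrative assumptions, not the paper's selection criterion or loss functions.

```python
import numpy as np

def should_query(probabilities, threshold=0.7):
    """Request a human annotation when the model's top-class confidence
    on the incoming audio clip falls below the threshold."""
    return probabilities.max() < threshold

# Simulated stream of per-clip class probabilities from an SED classifier.
stream = [np.array([0.9, 0.1]), np.array([0.55, 0.45]), np.array([0.2, 0.8])]
for i, probs in enumerate(stream):
    if should_query(probs):
        print(f"clip {i}: request human annotation")
    else:
        print(f"clip {i}: accept model prediction")
```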

Self-Recovery Prompting: Promptable General Purpose Service Robot System with Foundation Models and Self-Recovery

  • paper_url: http://arxiv.org/abs/2309.14425
  • repo_url: None
  • paper_authors: Mimo Shirasaka, Tatsuya Matsushima, Soshi Tsunashima, Yuya Ikeda, Aoi Horo, So Ikoma, Chikaha Tsuji, Hikaru Wada, Tsunekazu Omija, Dai Komukai, Yutaka Matsuo, Yusuke Iwasawa
  • for: This work aims to build a general-purpose service robot (GPSR) that can execute diverse tasks, which requires a system that is highly generalizable and adaptable to tasks and environments.
  • methods: The authors first develop a top-level GPSR system based on multiple foundation models, made generalizable and adaptive by prompting each model.
  • results: They identify three types of failure in more realistic GPSR settings: insufficient information, incorrect plan generation, and plan execution failure. They then propose a self-recovery prompting pipeline that explores the necessary information and modifies its prompts to recover from failure; experiments confirm that the system with the self-recovery mechanism can accomplish tasks and resolve various failure cases.
    Abstract A general-purpose service robot (GPSR), which can execute diverse tasks in various environments, requires a system with high generalizability and adaptability to tasks and environments. In this paper, we first developed a top-level GPSR system for worldwide competition (RoboCup@Home 2023) based on multiple foundation models. This system is both generalizable to variations and adaptive by prompting each model. Then, by analyzing the performance of the developed system, we found three types of failure in more realistic GPSR application settings: insufficient information, incorrect plan generation, and plan execution failure. We then propose the self-recovery prompting pipeline, which explores the necessary information and modifies its prompts to recover from failure. We experimentally confirm that the system with the self-recovery mechanism can accomplish tasks by resolving various failure cases. Supplementary videos are available at https://sites.google.com/view/srgpsr .

Extreme Parkour with Legged Robots

  • paper_url: http://arxiv.org/abs/2309.14341
  • repo_url: None
  • paper_authors: Xuxin Cheng, Kexin Shi, Ananye Agarwal, Deepak Pathak
  • for: This work aims to enable a small, low-cost robot to traverse difficult, hard-to-control environments.
  • methods: The robot uses a single front-facing depth camera and a learned neural network policy that produces precise control behavior directly from camera images.
  • results: The robot can jump over high obstacles, leap across gaps, do a handstand, run across tilted ramps, and generalize to novel obstacle courses.
    Abstract Humans can perform parkour by traversing obstacles in a highly dynamic fashion requiring precise eye-muscle coordination and movement. Getting robots to do the same task requires overcoming similar challenges. Classically, this is done by independently engineering perception, actuation, and control systems to very low tolerances. This restricts them to tightly controlled settings such as a predetermined obstacle course in labs. In contrast, humans are able to learn parkour through practice without significantly changing their underlying biology. In this paper, we take a similar approach to developing robot parkour on a small low-cost robot with imprecise actuation and a single front-facing depth camera for perception which is low-frequency, jittery, and prone to artifacts. We show how a single neural net policy operating directly from a camera image, trained in simulation with large-scale RL, can overcome imprecise sensing and actuation to output highly precise control behavior end-to-end. We show our robot can perform a high jump on obstacles 2x its height, long jump across gaps 2x its length, do a handstand and run across tilted ramps, and generalize to novel obstacle courses with different physical properties. Parkour videos at https://extreme-parkour.github.io/

Joint Audio and Speech Understanding

  • paper_url: http://arxiv.org/abs/2309.14405
  • repo_url: https://github.com/YuanGongND/ltu
  • paper_authors: Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass
  • for: This paper aims to build a machine learning model for audio recognition and understanding that can better interpret both speech and non-speech sounds in audio signals.
  • methods: The model integrates two modules, Whisper for perception and LLaMA for reasoning, to recognize and understand speech and non-speech sounds.
  • results: The model can simultaneously recognize and jointly understand speech and non-speech sounds, covering tasks such as speech and non-speech sound recognition, speech feature extraction, speech recognition, and speech understanding.
    Abstract Humans are surrounded by audio signals that include both speech and non-speech sounds. The recognition and understanding of speech and non-speech audio events, along with a profound comprehension of the relationship between them, constitute fundamental cognitive capabilities. For the first time, we build a machine learning model, called LTU-AS, that has a conceptually similar universal audio perception and advanced reasoning ability. Specifically, by integrating Whisper as a perception module and LLaMA as a reasoning module, LTU-AS can simultaneously recognize and jointly understand spoken text, speech paralinguistics, and non-speech audio events - almost everything perceivable from audio signals.

UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation

  • paper_url: http://arxiv.org/abs/2309.14335
  • repo_url: https://github.com/unitedhuman/unitedhuman
  • paper_authors: Jianglin Fu, Shikai Li, Yuming Jiang, Kwan-Yee Lin, Wayne Wu, Ziwei Liu
  • for: To improve the quality of human image generation.
  • methods: Jointly learns a high-resolution human generative model from multi-source datasets containing images of various resolutions.
  • results: Compared with models learned from a single holistic dataset, jointly learning from multi-source data achieves superior quality in human image generation.
    Abstract Human generation has achieved significant progress. Nonetheless, existing methods still struggle to synthesize specific regions such as faces and hands. We argue that the main reason is rooted in the training data. A holistic human dataset inevitably has insufficient and low-resolution information on local parts. Therefore, we propose to use multi-source datasets with various resolution images to jointly learn a high-resolution human generative model. However, multi-source data inherently a) contains different parts that do not spatially align into a coherent human, and b) comes with different scales. To tackle these challenges, we propose an end-to-end framework, UnitedHuman, that empowers continuous GAN with the ability to effectively utilize multi-source data for high-resolution human generation. Specifically, 1) we design a Multi-Source Spatial Transformer that spatially aligns multi-source images to full-body space with a human parametric model. 2) Next, a continuous GAN is proposed with global-structural guidance and CutMix consistency. Patches from different datasets are then sampled and transformed to supervise the training of this scale-invariant generative model. Extensive experiments demonstrate that our model jointly learned from multi-source data achieves superior quality than those learned from a holistic dataset.

LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference

  • paper_url: http://arxiv.org/abs/2309.14331
  • repo_url: https://github.com/harveyp123/lingcn-neurips23
  • paper_authors: Hongwu Peng, Ran Ran, Yukui Luo, Jiahui Zhao, Shaoyi Huang, Kiran Thorat, Tong Geng, Chenghong Wang, Xiaolin Xu, Wujie Wen, Caiwen Ding
  • for: This paper aims to improve the security and scalability of Graph Convolution Network (GCN) inference.
  • methods: It uses Homomorphic Encryption (HE) to protect client data and proposes the LinGCN framework for encrypted GCN inference, built around node-wise non-linear location selection and a compact node-wise polynomial replacement policy to improve performance and scalability.
  • results: On the NTU-XVIEW skeleton joint dataset, LinGCN delivers better latency, accuracy, and scalability for homomorphically encrypted inference than solutions such as CryptoGCN, achieving a 14.2x latency speedup while maintaining 75% accuracy and reducing multiplication depth.
    Abstract The growth of Graph Convolution Network (GCN) model sizes has revolutionized numerous applications, surpassing human performance in areas such as personal healthcare and financial systems. The deployment of GCNs in the cloud raises privacy concerns due to potential adversarial attacks on client data. To address security concerns, Privacy-Preserving Machine Learning (PPML) using Homomorphic Encryption (HE) secures sensitive client data. However, it introduces substantial computational overhead in practical applications. To tackle those challenges, we present LinGCN, a framework designed to reduce multiplication depth and optimize the performance of HE based GCN inference. LinGCN is structured around three key elements: (1) A differentiable structural linearization algorithm, complemented by a parameterized discrete indicator function, co-trained with model weights to meet the optimization goal. This strategy promotes fine-grained node-level non-linear location selection, resulting in a model with minimized multiplication depth. (2) A compact node-wise polynomial replacement policy with a second-order trainable activation function, steered towards superior convergence by a two-level distillation approach from an all-ReLU based teacher model. (3) an enhanced HE solution that enables finer-grained operator fusion for node-wise activation functions, further reducing multiplication level consumption in HE-based inference. Our experiments on the NTU-XVIEW skeleton joint dataset reveal that LinGCN excels in latency, accuracy, and scalability for homomorphically encrypted inference, outperforming solutions such as CryptoGCN. Remarkably, LinGCN achieves a 14.2x latency speedup relative to CryptoGCN, while preserving an inference accuracy of 75% and notably reducing multiplication depth.
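
LinGCN replaces node-wise non-linearities with a second-order trainable polynomial activation so that inference stays within the additions and multiplications that HE supports. A minimal PyTorch sketch of such an activation; the initialization and parameter sharing are assumptions, not LinGCN's exact replacement policy.

```python
import torch
import torch.nn as nn

class QuadraticActivation(nn.Module):
    """Trainable second-order polynomial activation f(x) = a*x^2 + b*x + c,
    an HE-friendly replacement for ReLU (HE supports additions and
    multiplications but not comparisons)."""
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(0.1))
        self.b = nn.Parameter(torch.tensor(1.0))
        self.c = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        return self.a * x * x + self.b * x + self.c

act = QuadraticActivation()
print(act(torch.linspace(-2, 2, 5)))
```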

Innovative Digital Storytelling with AIGC: Exploration and Discussion of Recent Advances

  • paper_url: http://arxiv.org/abs/2309.14329
  • repo_url: None
  • paper_authors: Rongzhang Gu, Hui Li, Changyue Su, Wayne Wu
  • for: This study aims to raise awareness of the current state, limitations, and challenges of combining AI-generated content (AIGC) with digital storytelling.
  • methods: The authors apply existing AIGC techniques and digital storytelling tools in a sample project, and study the overall effect and challenges of the fusion through experimentation and expert interviews.
  • results: AIGC can quickly generate images, voiceovers, and music, but cannot yet replace humans for complex character animation, facial expressions, and sound effects; the fusion also faces challenges and limitations, such as the loss of human creative flexibility and aesthetic sensibility.
    Abstract Digital storytelling, as an art form, has struggled with cost-quality balance. The emergence of AI-generated Content (AIGC) is considered as a potential solution for efficient digital storytelling production. However, the specific form, effects, and impacts of this fusion remain unclear, leaving the boundaries of AIGC combined with storytelling undefined. This work explores the current integration state of AIGC and digital storytelling, investigates the artistic value of their fusion in a sample project, and addresses common issues through interviews. Through our study, we conclude that AIGC, while proficient in image creation, voiceover production, and music composition, falls short of replacing humans due to the irreplaceable elements of human creativity and aesthetic sensibilities at present, especially in complex character animations, facial expressions, and sound effects. The research objective is to increase public awareness of the current state, limitations, and challenges arising from combining AIGC and digital storytelling.

Physics of Language Models: Part 3.2, Knowledge Manipulation

  • paper_url: http://arxiv.org/abs/2309.14402
  • repo_url: None
  • paper_authors: Zeyuan Allen-Zhu, Yuanzhi Li
  • for: This paper explores the ability of language models to manipulate stored knowledge during inference, focusing on four manipulation types: retrieval, classification, comparison, and inverse search.
  • methods: The authors use pre-trained language models such as GPT2/3/4 and employ Chain of Thoughts (CoTs) during both training and inference to improve performance on simple classification and comparison tasks.
  • results: Language models struggle with simple classification and comparison tasks unless CoTs are employed, and they perform poorly at inverse knowledge search even with adequate instruct fine-tuning. The primary contribution is a synthetic dataset for a controlled experiment that confirms these inherent weaknesses.
    Abstract Language models can store vast amounts of factual knowledge, but their ability to use this knowledge for logical reasoning remains questionable. This paper explores a language model's ability to manipulate its stored knowledge during inference. We focus on four manipulation types: retrieval (e.g., "What is person A's attribute X"), classification (e.g., "Is A's attribute X even or odd?"), comparison (e.g., "Is A greater than B in attribute X?") and inverse search (e.g., "Which person's attribute X equals T?") We observe that pre-trained language models like GPT2/3/4 excel in knowledge retrieval but struggle with simple classification or comparison tasks unless Chain of Thoughts (CoTs) are employed during both training and inference. They also perform poorly in inverse knowledge search, irrespective of the prompts. Our primary contribution is a synthetic dataset for a controlled experiment that confirms these inherent weaknesses: a language model cannot efficiently manipulate knowledge from pre-training data, even when such knowledge is perfectly stored and fully extractable in the models, and despite adequate instruct fine-tuning.
    摘要 语言模型可以存储庞大的 фактические知识,但它们在逻辑推理方面的能力仍然存在问题。这篇论文探讨了语言模型在推理过程中如何 manipulate 存储的知识。我们关注了四种推理方法:提取(例如,"人A的特征X是什么?)、分类(例如,"A的特征X是偶数或奇数?)、比较(例如,"A是B在特征X方面大吗?)以及反向搜索(例如,"谁的特征X等于T?)。我们发现,预训练的语言模型如GPT2/3/4在知识提取方面表现出色,但在简单的分类或比较任务中,除非使用链接思维(CoTs),否则表现不佳。它们还在反向知识搜索方面表现不佳,不管提示是什么。我们的主要贡献是一个控制性的 synthetic 数据集,用于确认这些内在的弱点:语言模型不能效率地从预训练数据中提取知识,即使这些知识完全可以在模型中提取并且受到了充分的训练精化。
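The four manipulation types described above lend themselves to a small synthetic probe. Below is a minimal, hypothetical sketch (not the paper's dataset-construction code) that generates retrieval, classification, comparison, and inverse-search questions from a toy table of person attributes; the names and attribute values are invented for illustration.

```python
import random

# Toy knowledge base: person -> attribute value (here, a birth year).
people = {"Alice Zhang": 1952, "Bob Keller": 1967, "Carol Ruiz": 1980, "Dan Okafor": 1993}

def retrieval(p):
    return f"What is {p}'s birth year?", str(people[p])

def classification(p):
    return f"Is {p}'s birth year even or odd?", "even" if people[p] % 2 == 0 else "odd"

def comparison(a, b):
    return f"Was {a} born before {b}?", "Yes" if people[a] < people[b] else "No"

def inverse_search(year):
    match = [p for p, y in people.items() if y == year]
    return f"Which person was born in {year}?", match[0] if match else "Nobody"

random.seed(0)
a, b = random.sample(list(people), 2)
for q, ans in [retrieval(a), classification(a), comparison(a, b),
               inverse_search(people[b])]:
    print(f"Q: {q}\n   gold: {ans}")
```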

Physics of Language Models: Part 3.1, Knowledge Storage and Extraction

  • paper_url: http://arxiv.org/abs/2309.14316
  • repo_url: None
  • paper_authors: Zeyuan Allen Zhu, Yuanzhi Li
  • for: This study investigates whether large language models genuinely extract information from knowledge sources, or merely answer questions because they were exposed to exact or similar questions during training.
  • methods: An in-depth study of this question is conducted using a controlled set of semi-synthetic biography data.
  • results: The model's knowledge extraction ability correlates with different diversity measures of the training data. (Nearly) linear probing reveals a strong correlation between this relationship and whether the model (nearly) linearly encodes the knowledge attributes in the hidden embeddings of the entity names, or across the embeddings of other tokens in the training text.
    Abstract Large language models can store extensive world knowledge, often extractable through question-answering (e.g., "What is Abraham Lincoln's birthday?"). However, it's unclear whether the model answers questions based on exposure to exact/similar questions during training, or if it genuinely extracts knowledge from the source (e.g., Wikipedia biographies). In this paper, we conduct an in-depth study of this problem using a controlled set of semi-synthetic biography data. We uncover a relationship between the model's knowledge extraction ability and different diversity measures of the training data. We conduct (nearly) linear probing, revealing a strong correlation between this relationship and whether the model (nearly) linearly encodes the knowledge attributes at the hidden embedding of the entity names, or across the embeddings of other tokens in the training text.
    摘要 大型语言模型可以储存广泛的世界知识,通常通过问答(例如,“亚伯拉罕林肯的生日是什么?)来抽出知识。然而,是否 modelo 回答问题基于训练时期所曝露的具体/相似问题,或是它实际提取知识从源(例如,Wikipedia 传记),这是一个未知的问题。在这篇论文中,我们透过一个控制的半人工生物agraph 数据集进行了深入的研究。我们发现了知识提取能力和不同多样性度量的训练数据之间的关系。我们进行了(近乎)直线探索,发现这个关系和模型(近乎)直线将知识属性嵌入到实体名称的隐藏嵌入中,或者在训练文本中的其他 tokens 的嵌入中。
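The "(nearly) linear probing" mentioned above can be illustrated with an ordinary linear classifier fit on frozen hidden states. The sketch below uses synthetic stand-ins for the embeddings and attribute labels (the arrays `H` and `y` are invented placeholders, not the paper's data or code).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 256
# Stand-ins for frozen hidden embeddings and an attribute label (e.g. birth month).
W_true = rng.normal(size=(d, 12))
H = rng.normal(size=(n, d))              # hidden states at the entity-name tokens
y = (H @ W_true).argmax(axis=1)          # synthetic, (nearly) linearly encoded attribute

H_tr, H_te, y_tr, y_te = train_test_split(H, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=2000).fit(H_tr, y_tr)
# High held-out accuracy suggests the attribute is (nearly) linearly decodable
# from the embeddings; chance level here is 1/12.
print(f"probe accuracy: {probe.score(H_te, y_te):.3f}")
```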

Multiple Different Explanations for Image Classifiers

  • paper_url: http://arxiv.org/abs/2309.14309
  • repo_url: None
  • paper_authors: Hana Chockler, David A. Kelly, Daniel Kroening
  • for: An algorithm and tool that compute multiple explanations for the output of a black-box image classifier, to help understand its behavior.
  • methods: A principled approach grounded in causal theory is used to compute the multiple explanations.
  • results: On the ImageNet-mini benchmark, REX finds multiple explanations on 7 times more images than previous work, a significant improvement.
    Abstract Existing explanation tools for image classifiers usually give only one single explanation for an image. For many images, however, both humans and image classifiers accept more than one explanation for the image label. Thus, restricting the number of explanations to just one severely limits the insight into the behavior of the classifier. In this paper, we describe an algorithm and a tool, REX, for computing multiple explanations of the output of a black-box image classifier for a given image. Our algorithm uses a principled approach based on causal theory. We analyse its theoretical complexity and provide experimental results showing that REX finds multiple explanations on 7 times more images than the previous work on the ImageNet-mini benchmark.
    摘要 现有的图像分类器解释工具通常只给出一个图像的解释。然而,许多图像都可以由人类和图像分类器接受多个解释。因此,只给出一个解释将限制我们对分类器的行为的理解。在这篇论文中,我们描述了一种算法和工具,即REX,用于计算一个黑板图像分类器的输出对某图像的多个解释。我们的算法基于 causal theory,我们分析了其理论复杂性,并提供了实验结果,显示REX在ImageNet-mini benchmark上可以对7个图像计算多个解释。

Overview of Class Activation Maps for Visualization Explainability

  • paper_url: http://arxiv.org/abs/2309.14304
  • repo_url: None
  • paper_authors: Anh Pham Thi Minh
  • for: This work surveys the evolution of Class Activation Map (CAM) methods over recent years and how their fidelity and interpretability are assessed.
  • methods: The survey reviews the metrics used for evaluating CAMs and the auxiliary techniques introduced to improve the saliency of these methods.
  • results: It identifies shortcomings and limitations of existing CAM methods and proposes avenues for future research to improve their interpretability and fidelity.
    Abstract Recent research in deep learning methodology has led to a variety of complex modelling techniques in computer vision (CV) that reach or even outperform human performance. Although these black-box deep learning models have obtained astounding results, they are limited in their interpretability and transparency which are critical to take learning machines to the next step to include them in sensitive decision-support systems involving human supervision. Hence, the development of explainable techniques for computer vision (XCV) has recently attracted increasing attention. In the realm of XCV, Class Activation Maps (CAMs) have become widely recognized and utilized for enhancing interpretability and insights into the decision-making process of deep learning models. This work presents a comprehensive overview of the evolution of Class Activation Map methods over time. It also explores the metrics used for evaluating CAMs and introduces auxiliary techniques to improve the saliency of these methods. The overview concludes by proposing potential avenues for future research in this evolving field.
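The classic CAM construction weights the final convolutional feature maps by the classifier weights of the target class: CAM_c(x, y) = sum_k w_k^c * f_k(x, y). Below is a self-contained sketch with a toy, randomly initialized network (a generic illustration, not an implementation from the survey).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCAMNet(nn.Module):
    """Conv features -> global average pooling -> linear classifier (CAM-compatible)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        f = self.features(x)                    # (B, 64, H, W)
        logits = self.fc(f.mean(dim=(2, 3)))    # GAP then linear
        return logits, f

def class_activation_map(model, x, target_class):
    _, fmaps = model(x)
    w = model.fc.weight[target_class]                 # (64,)
    cam = torch.einsum("k,bkhw->bhw", w, fmaps)       # weighted sum of feature maps
    cam = F.relu(cam)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)  # normalize to [0, 1]
    return cam

model = TinyCAMNet().eval()
x = torch.randn(1, 3, 64, 64)
cam = class_activation_map(model, x, target_class=3)
print(cam.shape)   # same spatial size as the final feature maps
```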

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

  • paper_url: http://arxiv.org/abs/2309.15817
  • repo_url: https://github.com/ryoungj/toolemu
  • paper_authors: Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto
  • for: The paper addresses the challenge of identifying risks of Language Model (LM) agents and tools, such as leaking private data or causing financial losses, by introducing the ToolEmu framework and an automatic safety evaluator.
  • methods: ToolEmu uses an LM to emulate tool execution, so LM agents can be tested against a diverse range of tools and scenarios without manual instantiation; an LM-based evaluator examines agent failures and quantifies the associated risks.
  • results: In a human evaluation, 68.8% of the failures identified with ToolEmu would be valid real-world agent failures. A quantitative risk analysis of current LM agents identifies numerous failures with potentially severe outcomes, underscoring the need for safer LM agents before real-world deployment.
    Abstract Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses. Identifying these risks is labor-intensive, necessitating implementing the tools, manually setting up the environment for each test scenario, and finding risky cases. As tools and agents become more complex, the high cost of testing these agents will make it increasingly difficult to find high-stakes, long-tailed risks. To address these challenges, we introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios, without manual instantiation. Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks. We test both the tool emulator and evaluator through human evaluation and find that 68.8% of failures identified with ToolEmu would be valid real-world agent failures. Using our curated initial benchmark consisting of 36 high-stakes tools and 144 test cases, we provide a quantitative risk analysis of current LM agents and identify numerous failures with potentially severe outcomes. Notably, even the safest LM agent exhibits such failures 23.9% of the time according to our evaluator, underscoring the need to develop safer LM agents for real-world deployment.
    摘要 近期语言模型(LM)代理和工具的应用,如ChatGPT插件,提供了一个富有可能性的功能集,但也扩大了潜在风险的范围 - 如泄露私人数据或导致财务损失。识别这些风险是劳动密集的,需要实施工具,手动设置测试enario的环境,并找到危险的场景。随着工具和代理的复杂度的增加,测试这些代理的成本将在不断增加,使得找到高度投资、长尾风险变得越来越困难。为解决这些挑战,我们引入 ToolEmu:一个基于LM的工具抽象框架,可以在多种工具和enario下测试LM代理,无需手动实例化。同时,我们开发了基于LM的自动安全评估工具,可以对代理失败进行评估,并评估相关风险。我们通过人工评估测试了ToolEmu和评估工具,发现68.8%的失败是真实世界中的代理失败。使用我们精心准备的初始准 benchmark,包括36个高度投资工具和144个测试场景,我们提供了一个量化风险分析,并发现当前LM代理中存在许多失败,其中23.9%的时间,最安全的LM代理也会出现这些失败。这表明需要为实际部署开发更安全的LM代理。

NAS-NeRF: Generative Neural Architecture Search for Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2309.14293
  • repo_url: None
  • paper_authors: Saeejith Nair, Yuhao Chen, Mohammad Javad Shafiee, Alexander Wong
  • for: The paper proposes a neural architecture search strategy for NeRFs that achieves high synthesis quality across different scenes while controlling computational complexity.
  • methods: A generative neural architecture search produces compact, scene-specialized NeRF architectures, with constraints on target synthesis quality metrics and compute budgets guiding the search.
  • results: Experiments show that NAS-NeRF generates architectures that are substantially smaller, faster, and cheaper to run than baseline NeRFs while maintaining high synthesis quality across scenes.
    Abstract Neural radiance fields (NeRFs) enable high-quality novel view synthesis, but their high computational complexity limits deployability. While existing neural-based solutions strive for efficiency, they use one-size-fits-all architectures regardless of scene complexity. The same architecture may be unnecessarily large for simple scenes but insufficient for complex ones. Thus, there is a need to dynamically optimize the neural network component of NeRFs to achieve a balance between computational complexity and specific targets for synthesis quality. We introduce NAS-NeRF, a generative neural architecture search strategy that generates compact, scene-specialized NeRF architectures by balancing architecture complexity and target synthesis quality metrics. Our method incorporates constraints on target metrics and budgets to guide the search towards architectures tailored for each scene. Experiments on the Blender synthetic dataset show the proposed NAS-NeRF can generate architectures up to 5.74$\times$ smaller, with 4.19$\times$ fewer FLOPs, and 1.93$\times$ faster on a GPU than baseline NeRFs, without suffering a drop in SSIM. Furthermore, we illustrate that NAS-NeRF can also achieve architectures up to 23$\times$ smaller, with 22$\times$ fewer FLOPs, and 4.7$\times$ faster than baseline NeRFs with only a 5.3% average SSIM drop. Our source code is also made publicly available at https://saeejithnair.github.io/NAS-NeRF.
    摘要 神经辐射场(NeRF)可以实现高质量的新视图合成,但是它们的计算复杂性限制了它们的部署。现有的神经网络解决方案尽量减少计算复杂性,但是它们使用一个适用于所有场景的架构,无论场景的复杂性如何。这种架构可能对简单场景来说过大,对复杂场景来说则不够。因此,需要动态优化 NeRF 的神经网络组件,以达到计算复杂性和特定合成质量目标之间的平衡。我们介绍了 NAS-NeRF,一种生成式神经架构搜索策略,可以生成紧凑且针对场景特化的 NeRF 架构,并且可以根据目标合成质量指标和预算来引导搜索。我们的方法在 Blender synthetic 数据集上进行了实验,可以生成与基线 NeRF 相比小 5.74 倍、FLOPs 少 4.19 倍、在 GPU 上快 1.93 倍的 NeRF 架构,而不会造成 SSIM 下降。此外,我们还证明了 NAS-NeRF 可以生成与基线 NeRF 相比小 23 倍、FLOPs 少 22 倍、快 4.7 倍的架构,平均 SSIM 仅下降 5.3%。我们的源代码也公开发布在 https://saeejithnair.github.io/NAS-NeRF。

Perception-and-Energy-aware Motion Planning for UAV using Learning-based Model under Heteroscedastic Uncertainty

  • paper_url: http://arxiv.org/abs/2309.14272
  • repo_url: https://gitlab.com/rei08/perception-energy-planner
  • paper_authors: Reiya Takemura, Genya Ishigami
  • for: This study aims to enable unmanned aerial vehicles (UAVs) to fly energy-efficiently and reliably in environments where Global Navigation Satellite Systems (GNSS) are denied.
  • methods: A perception-and-energy-aware motion planner is proposed that optimizes a cost function combining the UAV's total energy consumption and the perception quality of the LiDAR sensor mounted on the UAV. Before online navigation, a high-fidelity simulator collects flight data to learn the UAV's energy consumption and the heteroscedastic uncertainty of LiDAR measurements, both as functions of horizontal velocity, so that these quantities can be estimated during online planning.
  • results: Simulation experiments in a photorealistic environment confirm that the planner can trade off energy efficiency against perception quality under heteroscedastic uncertainty. The open-source code is available at https://gitlab.com/ReI08/perception-energy-planner.
    Abstract Global navigation satellite systems (GNSS) denied environments/conditions require unmanned aerial vehicles (UAVs) to energy-efficiently and reliably fly. To this end, this study presents perception-and-energy-aware motion planning for UAVs in GNSS-denied environments. The proposed planner solves the trajectory planning problem by optimizing a cost function consisting of two indices: the total energy consumption of a UAV and the perception quality of light detection and ranging (LiDAR) sensor mounted on the UAV. Before online navigation, a high-fidelity simulator acquires a flight dataset to learn energy consumption for the UAV and heteroscedastic uncertainty associated with LiDAR measurements, both as functions of the horizontal velocity of the UAV. The learned models enable the online planner to estimate energy consumption and perception quality, reducing UAV battery usage and localization errors. Simulation experiments in a photorealistic environment confirm that the proposed planner can address the trade-off between energy efficiency and perception quality under heteroscedastic uncertainty. The open-source code is released at https://gitlab.com/ReI08/perception-energy-planner.
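A minimal sketch of the kind of perception-and-energy trade-off described above, assuming hypothetical learned models for energy use and heteroscedastic LiDAR noise as functions of horizontal velocity. The functional forms and weights below are invented for illustration and are not taken from the paper or its released code.

```python
import numpy as np

# Hypothetical learned models (in the paper, analogous models are fit on real flight data).
def energy_per_meter(v):           # J/m, assumed to fall then rise with speed
    return 80.0 / max(v, 0.1) + 6.0 * v

def lidar_sigma(v):                # m, localization noise assumed to grow with speed
    return 0.05 + 0.04 * v ** 1.5

def cost(v, path_len=50.0, w_energy=1.0, w_perception=400.0):
    """Weighted sum of total energy and accumulated perception uncertainty."""
    return (w_energy * energy_per_meter(v) * path_len
            + w_perception * lidar_sigma(v) * path_len)

candidates = np.linspace(0.5, 8.0, 76)
best_v = min(candidates, key=cost)
print(f"selected cruise speed: {best_v:.2f} m/s, cost: {cost(best_v):.1f}")
```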

Unsupervised correspondence with combined geometric learning and imaging for radiotherapy applications

  • paper_url: http://arxiv.org/abs/2309.14269
  • repo_url: https://github.com/rrr-uom-projects/unsup-rt-corr-net
  • paper_authors: Edward G. A. Henderson, Marcel van Herk, Andrew F. Green, Eliana M. Vasquez Osorio
  • For: The paper aims to develop a model for accurately identifying corresponding points between organ segmentations of different patients for radiotherapy applications.
  • Methods: The model combines 3D shape information and imaging information to estimate correspondences and perform interpolation. It was trained on head and neck organ segmentations from planning CT scans, and imaging information was incorporated in two ways: extracting features directly from image patches, and including the mean square error between patches as part of the loss function.
  • Results: Correspondence and interpolation performance were evaluated using geodesic error, chamfer distance, and conformal distortion metrics. The best-performing configuration incorporated imaging information as part of the loss function, producing more anatomically plausible correspondences and outperforming both a baseline non-rigid registration approach and the variant with direct inclusion of image features.
    Abstract The aim of this study was to develop a model to accurately identify corresponding points between organ segmentations of different patients for radiotherapy applications. A model for simultaneous correspondence and interpolation estimation in 3D shapes was trained with head and neck organ segmentations from planning CT scans. We then extended the original model to incorporate imaging information using two approaches: 1) extracting features directly from image patches, and 2) including the mean square error between patches as part of the loss function. The correspondence and interpolation performance were evaluated using the geodesic error, chamfer distance and conformal distortion metrics, as well as distances between anatomical landmarks. Each of the models produced significantly better correspondences than the baseline non-rigid registration approach. The original model performed similarly to the model with direct inclusion of image features. The best performing model configuration incorporated imaging information as part of the loss function which produced more anatomically plausible correspondences. We will use the best performing model to identify corresponding anatomical points on organs to improve spatial normalisation, an important step in outcome modelling, or as an initialisation for anatomically informed registrations. All our code is publicly available at https://github.com/rrr-uom-projects/Unsup-RT-Corr-Net
    摘要 “本研究的目标是开发一种能够准确标注不同病人器官分割的模型,以便在放射治疗应用中进行空间Normalization。我们使用了头和 neck器官分割的规划CT扫描图进行模型训练。然后,我们将原始模型扩展以包括影像信息,使用两种方法:1)直接从图像块中提取特征,2)在损失函数中包括图像块的平均方差。对于每个模型,我们评估了各种维度的比较,包括地odesic error、Chamfer distance和conformal distortion metric,以及器官标志点之间的距离。每个模型都生成了较好的对应关系,比基eline非rigid registration方法更好。原始模型和直接包括图像特征的模型的性能相似。最佳配置是将影像信息作为损失函数的一部分来,生成更符合解剖学的对应关系。我们将使用最佳配置来标注器官之间的对应点,以提高结果模型中的空间Normalization,或者作为初始化 для解剖学指导的registrations。我们的代码都公开在https://github.com/rrr-uom-projects/Unsup-RT-Corr-Net上”

Data-Driven Approach for Identifying State of Hemodialysis Fistulas: Entropy-Complexity and Formal Concept Analysis

  • paper_url: http://arxiv.org/abs/2309.14399
  • repo_url: None
  • paper_authors: Vasilii A. Gromov, E. I. Zvorykina, Yurii N. Beschastnov, Majid Sohrabi
  • for: The study develops mathematical methods for classifying hemodialysis fistulas as normally or pathologically functioning.
  • methods: The approach rests on the hypothesis that laminar blood flow indicates normal function while turbulent flow indicates pathology. Two methods are explored: mapping the time series onto the entropy-complexity plane and comparing it with established clusters, and constructing a concepts-objects graph using formal concept analysis.
  • results: Both methods are highly effective at determining the state of a fistula.
    Abstract The paper explores mathematical methods that differentiate regular and chaotic time series, specifically for identifying pathological fistulas. It proposes a noise-resistant method for classifying responding rows of normally and pathologically functioning fistulas. This approach is grounded in the hypothesis that laminar blood flow signifies normal function, while turbulent flow indicates pathology. The study explores two distinct methods for distinguishing chaotic from regular time series. The first method involves mapping the time series onto the entropy-complexity plane and subsequently comparing it to established clusters. The second method, introduced by the authors, constructs a concepts-objects graph using formal concept analysis. Both of these methods exhibit high efficiency in determining the state of the fistula.
    摘要 文章研究了用数学方法分辨常规和异常时间序列,特别是用于识别病理性尿道。它提出了一种对应行的响应类型进行分类的听力抗噪方法,这种方法基于假设:在正常情况下,液体血流表示正常功能,而在异常情况下,液体血流表示疾病。文章研究了两种方法来分辨混乱和常规时间序列:首先,将时间序列映射到复杂度-自 entropy 平面,然后与已知的集群进行比较;其次,通过正式概念分析构建一个概念物件图。两种方法都能高效地确定尿道的状态。
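The entropy-complexity plane is typically built from Bandt-Pompe ordinal patterns: the series is mapped to a distribution over orderings of consecutive samples, from which a normalized permutation entropy and a disequilibrium-based complexity are computed. The sketch below is a generic illustration of that construction (the normalization of the complexity term is simplified relative to the literature) and is not the authors' code.

```python
import numpy as np
from itertools import permutations
from math import log

def ordinal_distribution(x, d=4):
    """Probability of each ordinal pattern of length d in the series x."""
    patterns = {p: 0 for p in permutations(range(d))}
    for i in range(len(x) - d + 1):
        patterns[tuple(np.argsort(x[i:i + d]).tolist())] += 1
    counts = np.array(list(patterns.values()), dtype=float)
    return counts / counts.sum()

def entropy_complexity(x, d=4):
    p = ordinal_distribution(x, d)
    n = len(p)
    u = np.full(n, 1.0 / n)                               # uniform reference
    H = -np.sum(p[p > 0] * np.log(p[p > 0])) / log(n)     # normalized permutation entropy
    m = 0.5 * (p + u)                                      # Jensen-Shannon disequilibrium
    js = (-np.sum(m * np.log(m))
          + 0.5 * np.sum(p[p > 0] * np.log(p[p > 0]))
          + 0.5 * np.sum(u * np.log(u)))
    return H, js * H   # complexity ~ disequilibrium x entropy (up to a normalizing constant)

rng = np.random.default_rng(1)
t = np.arange(4000)
regular = np.sin(0.07 * t) + 0.05 * rng.normal(size=t.size)   # laminar-like signal
chaotic = rng.normal(size=t.size)                             # turbulent-like stand-in
for name, series in [("regular", regular), ("chaotic", chaotic)]:
    H, C = entropy_complexity(series)
    print(f"{name:8s} H={H:.3f} C={C:.4f}")
```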

OmniEvent: A Comprehensive, Fair, and Easy-to-Use Toolkit for Event Understanding

  • paper_url: http://arxiv.org/abs/2309.14258
  • repo_url: https://github.com/thu-keg/omnievent
  • paper_authors: Hao Peng, Xiaozhi Wang, Feng Yao, Zimu Wang, Chuzhao Zhu, Kaisheng Zeng, Lei Hou, Juanzi Li
  • for: The work presents OmniEvent, a comprehensive event understanding toolkit covering event detection, event argument extraction, and event relation extraction from text.
  • methods: OmniEvent supports mainstream modeling paradigms and carefully handles the inconspicuous evaluation pitfalls reported by Peng et al. (2023) to ensure fair comparisons.
  • results: OmniEvent provides off-the-shelf models that can be deployed directly as web services, and its modular framework lets users implement and evaluate new event understanding models as needed.
    Abstract Event understanding aims at understanding the content and relationship of events within texts, which covers multiple complicated information extraction tasks: event detection, event argument extraction, and event relation extraction. To facilitate related research and application, we present an event understanding toolkit OmniEvent, which features three desiderata: (1) Comprehensive. OmniEvent supports mainstream modeling paradigms of all the event understanding tasks and the processing of 15 widely-used English and Chinese datasets. (2) Fair. OmniEvent carefully handles the inconspicuous evaluation pitfalls reported in Peng et al. (2023), which ensures fair comparisons between different models. (3) Easy-to-use. OmniEvent is designed to be easily used by users with varying needs. We provide off-the-shelf models that can be directly deployed as web services. The modular framework also enables users to easily implement and evaluate new event understanding models with OmniEvent. The toolkit (https://github.com/THU-KEG/OmniEvent) is publicly released along with the demonstration website and video (https://omnievent.xlore.cn/).
    摘要 Event理解目标是理解文本中的事件内容和关系,包括多种复杂信息提取任务:事件检测、事件参数提取和事件关系提取。为推动相关研究和应用,我们提供了一套事件理解工具包 OmniEvent,具有以下三个目标:1. 全面。OmniEvent支持主流模型化思路所有的事件理解任务,并处理15种常用的英文和中文数据集。2. 公平。OmniEvent通过彻底处理报告在Peng et al. (2023)中报道的隐藏评估坑,以确保比较不同模型的公平。3. 易用。OmniEvent设计便于用户们根据需要使用。我们提供了直接可以部署为网服务的准备好的模型,框架也允许用户轻松实现和评估新的事件理解模型。工具kit(https://github.com/THU-KEG/OmniEvent)公开发布,并提供了示例网站和视频(https://omnievent.xlore.cn/)。

Prediction Model For Wordle Game Results With High Robustness

  • paper_url: http://arxiv.org/abs/2309.14250
  • repo_url: https://github.com/zeniSoida/pl1
  • paper_authors: Jiaqi Weng, Chunlin Feng
  • for: This study uses data analysis and machine learning to examine the dynamics of the Wordle game.
  • methods: An ARIMAX model (with weekdays/weekends as the exogenous variable) is fitted to the number of submitted results, a backpropagation neural network predicts word difficulty, and K-means clustering (optimized at five clusters) categorizes word difficulty numerically.
  • results: The study predicts that about 12,884 results will be submitted on March 1st, 2023, and that the word "eerie" averages 4.8 attempts, placing it in the hardest difficulty cluster. It also examines the percentage of loyal players and their propensity to take on the daily challenge. The models passed rigorous sensitivity analyses and validation, confirming their robustness; overall, the study provides a framework for predicting Wordle gameplay from a date or a given five-letter word, and the results have been submitted to the New York Times Puzzle Editor.
    Abstract In this study, we delve into the dynamics of Wordle using data analysis and machine learning. Our analysis initially focused on the correlation between the date and the number of submitted results. Due to initial popularity bias, we modeled stable data using an ARIMAX model with coefficient values of 9, 0, 2, and weekdays/weekends as the exogenous variable. We found no significant relationship between word attributes and hard mode results. To predict word difficulty, we employed a Backpropagation Neural Network, overcoming overfitting via feature engineering. We also used K-means clustering, optimized at five clusters, to categorize word difficulty numerically. Our findings indicate that on March 1st, 2023, around 12,884 results will be submitted and the word "eerie" averages 4.8 attempts, falling into the hardest difficulty cluster. We further examined the percentage of loyal players and their propensity to undertake daily challenges. Our models underwent rigorous sensitivity analyses, including ADF, ACF, PACF tests, and cross-validation, confirming their robustness. Overall, our study provides a predictive framework for Wordle gameplay based on date or a given five-letter word. Results have been summarized and submitted to the Puzzle Editor of the New York Times.
    摘要 在这个研究中,我们使用数据分析和机器学习来研究Wordle的动态。我们首先查看了日期和提交结果之间的相关性。由于初始的受欢迎偏见,我们使用ARIMAX模型,其含有9, 0, 2的系数值和星期天/星期六作为外生变量。我们发现没有显著的词属性和困难模式之间的关系。 为预测词难度,我们使用反射神经网络,并通过特征工程来避免过拟合。我们还使用K-means聚类算法,并优化为五个分类。我们发现,在2023年3月1日,约有12,884个结果将被提交,并且词“幽默”的平均尝试次数为4.8,属于最难的分类。 我们进一步研究了忠诚玩家的百分比和他们的日常挑战的倾向。我们的模型经过了严格的敏感分析,包括ADF、ACF、PACF测试和批处理,以确认其可靠性。总的来说,我们的研究提供了基于日期或给定的五个字的Wordle游戏玩法预测框架。结果已经总结并提交给纽约时报游戏编辑。
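A rough sketch of the two modelling ingredients mentioned above: an ARIMAX(9, 0, 2) fit with a weekday/weekend exogenous dummy, and K-means with five clusters over per-word difficulty features. The data below are synthetic placeholders, not the Wordle submission counts or word features used in the paper.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# --- ARIMAX(9, 0, 2) on daily submission counts with a weekend dummy ---
n_days = 300
weekend = (np.arange(n_days) % 7 >= 5).astype(float)
counts = 20000 + 3000 * weekend + rng.normal(0, 500, n_days)   # placeholder series
model = SARIMAX(counts, exog=weekend, order=(9, 0, 2)).fit(disp=False)
future_weekend = (np.arange(n_days, n_days + 7) % 7 >= 5).astype(float)
print(model.forecast(steps=7, exog=future_weekend))

# --- K-means (k=5) over per-word difficulty features ---
word_features = rng.normal(size=(500, 3))          # placeholder word descriptors
avg_attempts = 3.5 + word_features[:, 0]           # placeholder difficulty signal
X = np.column_stack([avg_attempts, word_features])
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))
```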

Rethinking Internet Communication Through LLMs: How Close Are We?

  • paper_url: http://arxiv.org/abs/2309.14247
  • repo_url: None
  • paper_authors: Sifat Ut Taki, Spyridon Mastorakis
  • for: The paper rethinks how users communicate over the Internet, so that the cognition of the user at the other end of a communication channel can be better captured.
  • methods: It proposes an architecture in which users communicate by querying Large Language Models (LLMs) that represent the users at the other end, and outlines how such a communication architecture could be realized.
  • results: A reality check assesses how close current technology is to realizing this architecture, and several research challenges and interesting directions for future work are discussed.
    Abstract In this paper, we rethink the way that communication among users over the Internet, one of the fundamental outcomes of the Internet evolution, takes place. Instead of users communicating directly over the Internet, we explore an architecture that enables users to communicate with (query) Large Language Models (LLMs) that capture the cognition of users on the other end of the communication channel. We present an architecture to achieve such LLM-based communication and we perform a reality check to assess how close we are today to realizing such a communication architecture from a technical point of view. Finally, we discuss several research challenges and identify interesting directions for future research.
    摘要 在这篇论文中,我们重新思考互联网上用户之间的通信方式,这是互联网进化的一个基本结果。而不是直接通过互联网进行用户之间的通信,我们研究了一种使用大自然语言模型(LLM)来捕捉用户对另一端通信频道的认知。我们提出了实现这种 LLM-based 通信架构的方案,并进行了技术实现的现实性检查。最后,我们讨论了一些研究挑战和未来研究的有趣方向。

Enhancing data efficiency in reinforcement learning: a novel imagination mechanism based on mesh information propagation

  • paper_url: http://arxiv.org/abs/2309.14243
  • repo_url: https://github.com/ouazusakou/imagination_mechanism
  • paper_authors: Zihang Wang, Maowei Jiang
  • for: To improve the data efficiency of deep reinforcement learning (RL) algorithms, especially for high-dimensional state spaces and large-scale problems.
  • methods: A human-like Imagination Mechanism (IM) is introduced that lets information generated by a single sample be broadcast to different states across episodes, improving the efficiency with which RL algorithms learn from limited samples.
  • results: IM consistently boosts four mainstream SOTA RL algorithms (SAC, PPO, DDPG, and DQN) by a considerable margin, ultimately improving performance across a variety of tasks.
    Abstract Reinforcement learning(RL) algorithms face the challenge of limited data efficiency, particularly when dealing with high-dimensional state spaces and large-scale problems. Most of RL methods often rely solely on state transition information within the same episode when updating the agent's Critic, which can lead to low data efficiency and sub-optimal training time consumption. Inspired by human-like analogical reasoning abilities, we introduce a novel mesh information propagation mechanism, termed the 'Imagination Mechanism (IM)', designed to significantly enhance the data efficiency of RL algorithms. Specifically, IM enables information generated by a single sample to be effectively broadcasted to different states across episodes, instead of simply transmitting in the same episode. This capability enhances the model's comprehension of state interdependencies and facilitates more efficient learning of limited sample information. To promote versatility, we extend the IM to function as a plug-and-play module that can be seamlessly and fluidly integrated into other widely adopted RL algorithms. Our experiments demonstrate that IM consistently boosts four mainstream SOTA RL algorithms, such as SAC, PPO, DDPG, and DQN, by a considerable margin, ultimately leading to superior performance than before across various tasks. For access to our code and data, please visit https://github.com/OuAzusaKou/imagination_mechanism
    摘要 利用人类类似的想象能力,我们引入了一种新的网格信息传递机制,称为“想象机制”(IM),以提高深度学习束缚学习(RL)算法的数据效率。Specifically,IM使得单个样本生成的信息可以有效地在不同话数据集中传递,而不是仅在同一话数据集中传递。这种能力提高模型对状态之间的相互关系的理解,并且使得学习有限样本信息更加高效。为了推广可用性,我们将IM作为一个可插入式和可靠地Integrate into other widely adopted RL algorithms。我们的实验表明,IM可以持续地提高四种主流SOTA RL算法,如SAC、PPO、DDPG和DQN,并 ultimately leading to superior performance across various tasks. For access to our code and data, please visit https://github.com/OuAzusaKou/imagination_mechanism。

Seeing and hearing what has not been said; A multimodal client behavior classifier in Motivational Interviewing with interpretable fusion

  • paper_url: http://arxiv.org/abs/2309.14398
  • repo_url: None
  • paper_authors: Lucie Galland, Catherine Pelachaud, Florian Pecune
  • for: To assess the quality of Motivational Interviewing conversations, which is linked to therapy outcomes.
  • methods: Multimodal features (text, prosody, facial expressivity, and body expressivity) are used to build a classifier that accurately distinguishes the three MISC client-utterance classes.
  • results: The publicly available AnnoMI dataset is annotated to collect the multimodal information, and the most important modalities in the decision-making process are identified, giving insight into how modalities interact during an MI conversation.
    Abstract Motivational Interviewing (MI) is an approach to therapy that emphasizes collaboration and encourages behavioral change. To evaluate the quality of an MI conversation, client utterances can be classified using the MISC code as either change talk, sustain talk, or follow/neutral talk. The proportion of change talk in a MI conversation is positively correlated with therapy outcomes, making accurate classification of client utterances essential. In this paper, we present a classifier that accurately distinguishes between the three MISC classes (change talk, sustain talk, and follow/neutral talk) leveraging multimodal features such as text, prosody, facial expressivity, and body expressivity. To train our model, we perform annotations on the publicly available AnnoMI dataset to collect multimodal information, including text, audio, facial expressivity, and body expressivity. Furthermore, we identify the most important modalities in the decision-making process, providing valuable insights into the interplay of different modalities during a MI conversation.
    摘要 《动机导向会议》(MI)是一种帮助客户改变行为的医疗方法。为评估MI会议质量,客户的语言可以被分类为变化语言、维持语言或跟随/中立语言。变化语言的比例和治疗效果正相关,因此正确地分类客户语言非常重要。在这篇论文中,我们提出了一种精准地分类客户语言的分类器,利用多Modal特征,如文本、 просодия、 facial expressivity 和 body expressivity。为了训练我们的模型,我们对公共可用的 AnnoMI 数据集进行了标注,以收集多Modal信息,包括文本、音频、 facial expressivity 和 body expressivity。此外,我们还确定了决策过程中最重要的Modalities,提供了有价值的发现,描述了不同Modalities在MI会议中的协作。

MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation

  • paper_url: http://arxiv.org/abs/2309.14236
  • repo_url: None
  • paper_authors: Patrick Lancaster, Nicklas Hansen, Aravind Rajeswaran, Vikash Kumar
  • for: To develop a system that learns contact-rich manipulation directly in uninstrumented real-world environments, improving the reliability and safety of modern robotic systems.
  • methods: The system builds on recent algorithmic advances in model-based reinforcement learning (MBRL), demonstration bootstrapping, and effective exploration, and learns directly from visual pixels.
  • results: The system acquires contact-rich dexterous manipulation skills in the real world, demonstrated empirically on four challenging visuo-motor manipulation problems; to the authors' knowledge it is the first demonstration-augmented visual MBRL system trained directly in the real world.
    Abstract Robotic systems that aspire to operate in uninstrumented real-world environments must perceive the world directly via onboard sensing. Vision-based learning systems aim to eliminate the need for environment instrumentation by building an implicit understanding of the world based on raw pixels, but navigating the contact-rich high-dimensional search space from solely sparse visual reward signals significantly exacerbates the challenge of exploration. The applicability of such systems is thus typically restricted to simulated or heavily engineered environments since agent exploration in the real-world without the guidance of explicit state estimation and dense rewards can lead to unsafe behavior and safety faults that are catastrophic. In this study, we isolate the root causes behind these limitations to develop a system, called MoDem-V2, capable of learning contact-rich manipulation directly in the uninstrumented real world. Building on the latest algorithmic advancements in model-based reinforcement learning (MBRL), demo-bootstrapping, and effective exploration, MoDem-V2 can acquire contact-rich dexterous manipulation skills directly in the real world. We identify key ingredients for leveraging demonstrations in model learning while respecting real-world safety considerations -- exploration centering, agency handover, and actor-critic ensembles. We empirically demonstrate the contribution of these ingredients in four complex visuo-motor manipulation problems in both simulation and the real world. To the best of our knowledge, our work presents the first successful system for demonstration-augmented visual MBRL trained directly in the real world. Visit https://sites.google.com/view/modem-v2 for videos and more details.

Stackelberg Driver Model for Continual Policy Improvement in Scenario-Based Closed-Loop Autonomous Driving

  • paper_url: http://arxiv.org/abs/2309.14235
  • repo_url: https://github.com/BlueCat-de/SDM
  • paper_authors: Haoyi Niu, Qimao Chen, Yingyue Li, Jianming Hu
  • for: To improve the performance and reliability of autonomous vehicles (AVs) by optimizing against the rare yet critical corner cases in the driving distribution.
  • methods: Adversarial generation is used to synthesize safety-critical driving scenarios, and the Stackelberg Driver Model (SDM) captures the hierarchical nature of vehicle interaction dynamics so that the AV can be improved iteratively.
  • results: Experiments show that the algorithm performs especially well in higher-dimensional scenarios, outperforming baseline methods, yielding substantial AV improvement while continually generating progressively challenging scenarios.
    Abstract The deployment of autonomous vehicles (AVs) has faced hurdles due to the dominance of rare but critical corner cases within the long-tail distribution of driving scenarios, which negatively affects their overall performance. To address this challenge, adversarial generation methods have emerged as a class of efficient approaches to synthesize safety-critical scenarios for AV testing. However, these generated scenarios are often underutilized for AV training, resulting in the potential for continual AV policy improvement remaining untapped, along with a deficiency in the closed-loop design needed to achieve it. Therefore, we tailor the Stackelberg Driver Model (SDM) to accurately characterize the hierarchical nature of vehicle interaction dynamics, facilitating iterative improvement by engaging background vehicles (BVs) and AV in a sequential game-like interaction paradigm. With AV acting as the leader and BVs as followers, this leader-follower modeling ensures that AV would consistently refine its policy, always taking into account the additional information that BVs play the best response to challenge AV. Extensive experiments have shown that our algorithm exhibits superior performance compared to several baselines especially in higher dimensional scenarios, leading to substantial advancements in AV capabilities while continually generating progressively challenging scenarios. Code is available at https://github.com/BlueCat-de/SDM.
    摘要 自带驱动自动车 (AV) 的部署面临了由罕见而重要的角度案例所带来的阻碍,这些角度案例会影响 AV 的总性表现。为解决这个挑战, adversarial 生成方法在 AV 测试中出现了,这些方法可以生成安全关键的驾驶enario。然而,这些生成的场景通常不被用于 AV 训练,导致 AV 政策的持续改进 remained untapped,以及closed-loop 设计的缺失。因此,我们将 Stackelberg 驾驶器模型 (SDM) 改进,以便准确地描述车辆交互动力学的层次结构,从而实现了 iterative 改进,让 AV 在 background 车辆 (BV) 的支持下,通过 sequential 交互模式来精细调整其策略,并且总是考虑 BV 的最佳回应,以挑战 AV。我们的算法在高维度enario中表现出色,比如基eline 特别出色,从而实现了 AV 的重要进步,同时 continually 生成进一步挑战 AV 的场景。代码可以在 上找到。

Combined sizing and layout optimization of truss structures via update Monte Carlo tree search (UMCTS) algorithm

  • paper_url: http://arxiv.org/abs/2309.14231
  • repo_url: None
  • paper_authors: Fu-Yao Ko, Katsuyuki Suzuki, Kazuo Yonekura
  • for: The main goal is to find the optimal design of truss structures while considering sizing and layout variables simultaneously.
  • methods: A reinforcement learning method called the update Monte Carlo tree search (UMCTS) is applied to the combined sizing and layout optimization of truss structures.
  • results: The study shows that UMCTS reduces computation time and stably achieves better solutions than traditional methods.
    Abstract The main concern of this study is to find the optimal design of truss structures considering sizing and layout variables simultaneously. As compared to purely sizing optimization problems, this problem is more challenging since the two types of variables involved are fundamentally different in nature. In this paper, a reinforcement learning method combining the update process and Monte Carlo tree search called the update Monte Carlo tree search (UMCTS) for sizing optimization problems is applied to solve combined sizing and layout optimization for truss structures. This study proposes a novel update process for nodal coordinates with two features. (1) The allowed range of each coordinate varies in each round. (2) Accelerators for the number of entries in the allowed range and iteration numbers are introduced to reduce the computation time. Furthermore, nodal coordinates and member areas are determined at the same time with only one search tree in each round. The validation and efficiency of the UMCTS are tested on benchmark problems of planar and spatial trusses with discrete sizing variables and continuous layout variables. It is shown that the CPU time of the UMCTS is two times faster than the branch and bound method. The numerical results demonstrate that the proposed method stably achieves a better solution than other traditional methods.
    摘要 本研究的主要担忧是查找螺栓结构的最优设计,同时考虑大小和布局变量。与纯粹的大小优化问题相比,这个问题更加具有挑战性,因为这两种变量的本质不同。在这篇论文中,我们应用了一种combined reinforcement learning method,called update Monte Carlo tree search (UMCTS),解决螺栓结构的大小和布局优化问题。我们提出了一种新的更新过程,其中每个坐标的允许范围在每一轮都不同,同时还引入了加速器来减少计算时间。此外,每一轮都只需要一个搜索树来确定节点坐标和部件面积。我们对标准问题进行验证和效率测试,结果显示,UMCTS的计算时间比branch and bound方法快两倍。数值结果表明,我们提出的方法可稳定地实现更好的解决方案,比传统方法更好。

Implicit Sensing in Traffic Optimization: Advanced Deep Reinforcement Learning Techniques

  • paper_url: http://arxiv.org/abs/2309.14395
  • repo_url: None
  • paper_authors: Emanuel Figetakis, Yahuza Bello, Ahmed Refaey, Lei Lei, Medhat Moussa
  • for: To handle sudden roadblocks on highways by letting autonomous vehicles (AVs) use their onboard sensing of vehicle dynamics to make intelligent lane-change decisions and avoid congestion-induced delays.
  • methods: A deep reinforcement learning (DRL) approach based on a Markov Decision Process (MDP) model is used to train an RL agent for realistic driving situations; an MEC-assisted architecture trains the agents on MEC servers, and the SUMO simulator together with OpenAI Gym is used to evaluate the proposed model.
  • results: The DQN agent trained with the {\epsilon}-greedy policy significantly outperforms the DQN agent trained with the Boltzmann policy.
    Abstract A sudden roadblock on highways due to many reasons such as road maintenance, accidents, and car repair is a common situation we encounter almost daily. Autonomous Vehicles (AVs) equipped with sensors that can acquire vehicle dynamics such as speed, acceleration, and location can make intelligent decisions to change lanes before reaching a roadblock. A number of literature studies have examined car-following models and lane-changing models. However, only a few studies proposed an integrated car-following and lane-changing model, which has the potential to model practical driving maneuvers. Hence, in this paper, we present an integrated car-following and lane-changing decision-control system based on Deep Reinforcement Learning (DRL) to address this issue. Specifically, we consider a scenario where sudden construction work will be carried out along a highway. We model the scenario as a Markov Decision Process (MDP) and employ the well-known DQN algorithm to train the RL agent to make the appropriate decision accordingly (i.e., either stay in the same lane or change lanes). To overcome the delay and computational requirement of DRL algorithms, we adopt an MEC-assisted architecture where the RL agents are trained on MEC servers. We utilize the highly reputable SUMO simulator and OPENAI GYM to evaluate the performance of the proposed model under two policies; {\epsilon}-greedy policy and Boltzmann policy. The results unequivocally demonstrate that the DQN agent trained using the {\epsilon}-greedy policy significantly outperforms the one trained with the Boltzmann policy.
    摘要 高速公路上突然出现堵塞,常见的情况之一,可能是道路维护、事故或车辆维修等多种原因。自动驾驶车(AV)配备感知器可以获取车辆动态状态,如速度、加速度和位置,可以做出智能决策,以避免堵塞。许多文献研究了车辆随驾模型和车道变更模型,但只有一些研究提出了集成车辆随驾和车道变更模型,这种模型具有实际驾驶行为的潜在优势。因此,在这篇论文中,我们提出了基于深度强化学习(DRL)的集成车辆随驾和车道变更决策控制系统,以解决这个问题。具体来说,我们考虑了高速公路上突然进行建设工程的情况。我们将这种情况模型为马克夫满度决策过程(MDP),并使用了知名的DQN算法来训练RL代理人进行适当决策(即留在同一个车道或变更车道)。为了解决DRL算法的延迟和计算资源的问题,我们采用了MEC助け的架构,其中RL代理人在MEC服务器上进行训练。我们使用了非常可靠的SUMO仿真器和OPENAI GYM来评估提出的模型的性能,并对两种策略进行评估:{\epsilon}-抽象策略和博尔ツ曼策略。结果明确表明,使用{\epsilon}-抽象策略训练的DQN代理人明显超越使用博尔ツ曼策略训练的DQN代理人。
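The two exploration policies compared above differ only in how an action is drawn from the Q-values. A minimal, generic sketch (not the paper's training code) is shown below.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature=1.0):
    """Sample an action with probability proportional to exp(Q / T)."""
    z = np.array(q_values, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q_values), p=probs))

q = [0.2, 1.5, 0.9]   # e.g. Q(s, stay), Q(s, change-left), Q(s, change-right)
print("epsilon-greedy:", epsilon_greedy(q), " boltzmann:", boltzmann(q))
```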

Multiple Noises in Diffusion Model for Semi-Supervised Multi-Domain Translation

  • paper_url: http://arxiv.org/abs/2309.14394
  • repo_url: None
  • paper_authors: Tsiry Mayet, Simon Bernard, Clement Chatelain, Romain Herault
  • for: The paper proposes a method for multi-domain translation in a semi-supervised context.
  • methods: A conditional diffusion framework, Multi-Domain Diffusion (MDD), is introduced that does not require fixed input and output domains, allowing translation between any partition of a set of domains (e.g., $(D_1, D_2)\rightarrow{}D_3$, $D_2\rightarrow{}(D_1, D_3)$, $D_3\rightarrow{}D_1$ for three domains) without training a separate model for each configuration.
  • results: Results are presented on a multi-domain synthetic image translation dataset with challenging semantic domain inversion.
    Abstract Domain-to-domain translation involves generating a target domain sample given a condition in the source domain. Most existing methods focus on fixed input and output domains, i.e. they only work for specific configurations (i.e. for two domains, either $D_1\rightarrow{}D_2$ or $D_2\rightarrow{}D_1$). This paper proposes Multi-Domain Diffusion (MDD), a conditional diffusion framework for multi-domain translation in a semi-supervised context. Unlike previous methods, MDD does not require defining input and output domains, allowing translation between any partition of domains within a set (such as $(D_1, D_2)\rightarrow{}D_3$, $D_2\rightarrow{}(D_1, D_3)$, $D_3\rightarrow{}D_1$, etc. for 3 domains), without the need to train separate models for each domain configuration. The key idea behind MDD is to leverage the noise formulation of diffusion models by incorporating one noise level per domain, which allows missing domains to be modeled with noise in a natural way. This transforms the training task from a simple reconstruction task to a domain translation task, where the model relies on less noisy domains to reconstruct more noisy domains. We present results on a multi-domain (with more than two domains) synthetic image translation dataset with challenging semantic domain inversion.
    摘要 域到域翻译(Domain-to-domain translation)是生成目标域样本,给定源域的条件。现有的方法都是针对固定的输入和输出域,即只能处理特定的配置(例如 $D_1\to D_2$ 或 $D_2\to D_1$)。这篇论文提出了多域扩散(Multi-Domain Diffusion,MDD),一种基于半supervised的域扩散框架。与先前的方法不同,MDD不需要定义输入和输出域,可以在一个集合(例如 $(D_1, D_2)\to D_3$,$D_2\to (D_1, D_3)$,$D_3\to D_1$ 等)中进行翻译,无需为每个域配置单独训练模型。MDD的关键思想是利用扩散模型的噪声表示,每个域都有一个噪声水平,这使得缺失的域可以自然地被噪声表示。这将训练任务从一个简单的重建任务变为域翻译任务,其中模型通过更加净化的域来重建更加噪声的域。我们在多域(包括更多于两个域)的Synthetic image翻译dataset上进行了实验,并取得了具有挑战性的semantic domain inversion的结果。

Accelerating Machine Learning Algorithms with Adaptive Sampling

  • paper_url: http://arxiv.org/abs/2309.14221
  • repo_url: None
  • paper_authors: Mo Tiwari
  • for: To improve the efficiency of machine learning algorithms on large-scale data.
  • methods: Computationally intensive subroutines are replaced with randomized, adaptive-sampling-based counterparts to improve computational efficiency.
  • results: Computational efficiency improves substantially with almost no degradation in quality.
    Abstract The era of huge data necessitates highly efficient machine learning algorithms. Many common machine learning algorithms, however, rely on computationally intensive subroutines that are prohibitively expensive on large datasets. Oftentimes, existing techniques subsample the data or use other methods to improve computational efficiency, at the expense of incurring some approximation error. This thesis demonstrates that it is often sufficient, instead, to substitute computationally intensive subroutines with a special kind of randomized counterparts that results in almost no degradation in quality.
    摘要 era of big data 需要非常高效的机器学习算法。然而,许多常见的机器学习算法却依赖于计算昂贵的子routine,对大量数据来说是不可接受的。有时候,现有的技术会采用采样或其他方法来提高计算效率,但这会导致一定的近似错误。这个论文示出,可以在代之前 substitute computationally intensive subroutines with a special kind of randomized counterparts,而不会导致质量下降。
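One way to picture "substituting computationally intensive subroutines with randomized counterparts" is best-candidate selection by adaptive sampling: instead of evaluating every candidate on all n points, sample points and eliminate candidates whose confidence intervals are dominated. The sketch below is a generic successive-elimination illustration under assumed bounded losses, not the specific algorithms from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_argmin(loss_matrix, batch=50, delta=0.01):
    """Index of the column (candidate) with the smallest mean loss,
    sampling rows adaptively instead of scanning the full matrix."""
    n, k = loss_matrix.shape
    alive = list(range(k))
    sums = np.zeros(k)
    counts = np.zeros(k)
    while len(alive) > 1 and counts[alive[0]] < n:
        rows = rng.integers(0, n, size=batch)
        for j in alive:
            sums[j] += loss_matrix[rows, j].sum()
            counts[j] += batch
        means = sums[alive] / counts[alive]
        # Hoeffding-style radius (losses assumed bounded in [0, 1]).
        rad = np.sqrt(np.log(2 * k / delta) / (2 * counts[alive]))
        best_ucb = (means + rad).min()
        alive = [j for j, m, r in zip(alive, means, rad) if m - r <= best_ucb]
    return alive[0]

losses = rng.random((100_000, 20)) * 0.5
losses[:, 7] -= 0.1                            # candidate 7 is best on average
print("selected candidate:", adaptive_argmin(np.clip(losses, 0.0, 1.0)))
```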

MemDA: Forecasting Urban Time Series with Memory-based Drift Adaptation

  • paper_url: http://arxiv.org/abs/2309.14216
  • repo_url: https://github.com/deepkashiwa20/Urban_Concept_Drift
  • paper_authors: Zekun Cai, Renhe Jiang, Xinyu Yang, Zhaonan Wang, Diansheng Guo, Hiroki Kobayashi, Xuan Song, Ryosuke Shibasaki
  • for: To address concept drift in urban time series forecasting and support sustainable smart-city development.
  • methods: A new urban time series prediction model is proposed that encodes the drift by exploiting the periodicity in the data and makes on-the-fly adjustments to the model via a meta-dynamic network.
  • results: Experiments on real-world data show the design significantly outperforms existing methods and reduces prediction models' sensitivity to distribution changes, improving their reusability and generalization.
    Abstract Urban time series data forecasting featuring significant contributions to sustainable development is widely studied as an essential task of the smart city. However, with the dramatic and rapid changes in the world environment, the assumption that data obey Independent Identically Distribution is undermined by the subsequent changes in data distribution, known as concept drift, leading to weak replicability and transferability of the model over unseen data. To address the issue, previous approaches typically retrain the model, forcing it to fit the most recent observed data. However, retraining is problematic in that it leads to model lag, consumption of resources, and model re-invalidation, causing the drift problem to be not well solved in realistic scenarios. In this study, we propose a new urban time series prediction model for the concept drift problem, which encodes the drift by considering the periodicity in the data and makes on-the-fly adjustments to the model based on the drift using a meta-dynamic network. Experiments on real-world datasets show that our design significantly outperforms state-of-the-art methods and can be well generalized to existing prediction backbones by reducing their sensitivity to distribution changes.
    摘要 城市时序数据预测 featuring 重要贡献于可持续发展是智能城市广泛研究的必要任务。然而,随着世界环境的剧变和快速变化,假设数据遵循独立同分布(ID)的假设被后续数据分布变化所推翻,导致模型的弱复现和传输性,从而使得随变问题在实际场景中并不得到好的解决。在本研究中,我们提出了一种新的城市时序预测模型,该模型通过考虑数据中的周期性来编码随变,并在随变过程中进行实时调整,使用元动态网络。实验表明,我们的设计在实际数据集上显著超越了现有方法,并且可以将现有预测基础结构降低其对分布变化的敏感性。

Continual Driving Policy Optimization with Closed-Loop Individualized Curricula

  • paper_url: http://arxiv.org/abs/2309.14209
  • repo_url: https://github.com/YizhouXu-THU/CLIC
  • paper_authors: Haoyi Niu, Yizhou Xu, Xingjian Jiang, Jianming Hu
  • for: This paper aims to improve the safety of autonomous vehicles (AVs) by developing a continual driving policy optimization framework called Closed-Loop Individualized Curricula (CLIC).
  • methods: CLIC frames AV evaluation as a collision prediction task that estimates the chance of AV failures in pre-collected scenarios, then tailors individualized curricula for downstream training based on these failure probabilities.
  • results: Experimental results show that CLIC surpasses other curriculum-based training strategies in managing risky scenarios while maintaining proficiency in handling simpler cases.
    Abstract The safety of autonomous vehicles (AV) has been a long-standing top concern, stemming from the absence of rare and safety-critical scenarios in the long-tail naturalistic driving distribution. To tackle this challenge, a surge of research in scenario-based autonomous driving has emerged, with a focus on generating high-risk driving scenarios and applying them to conduct safety-critical testing of AV models. However, limited work has been explored on the reuse of these extensive scenarios to iteratively improve AV models. Moreover, it remains intractable and challenging to filter through gigantic scenario libraries collected from other AV models with distinct behaviors, attempting to extract transferable information for current AV improvement. Therefore, we develop a continual driving policy optimization framework featuring Closed-Loop Individualized Curricula (CLIC), which we factorize into a set of standardized sub-modules for flexible implementation choices: AV Evaluation, Scenario Selection, and AV Training. CLIC frames AV Evaluation as a collision prediction task, where it estimates the chance of AV failures in these scenarios at each iteration. Subsequently, by re-sampling from historical scenarios based on these failure probabilities, CLIC tailors individualized curricula for downstream training, aligning them with the evaluated capability of AV. Accordingly, CLIC not only maximizes the utilization of the vast pre-collected scenario library for closed-loop driving policy optimization but also facilitates AV improvement by individualizing its training with more challenging cases out of those poorly organized scenarios. Experimental results clearly indicate that CLIC surpasses other curriculum-based training strategies, showing substantial improvement in managing risky scenarios, while still maintaining proficiency in handling simpler cases.
    摘要 自动驾驶车辆(AV)的安全性问题一直是长期的主要担忧,这是因为自然驾驶驾驶分布中罕见的危险和安全关键场景的缺失。为解决这个挑战,自动驾驶场景研究有了很大的干预,关注生成高风险驾驶场景,并应用其进行安全检测自动驾驶模型。然而,有限的研究是关于重复这些广泛的场景来进一步改进AV模型。此外,从其他AV模型的巨大场景库中挑选有用信息是困难和挑战的。因此,我们开发了一个基于closed-loop个性化课程(CLIC)的驱动策略优化框架,它可以分解为以下几个标准化子模块:AV评估、场景选择和AV培训。在CLIC中,AV评估被设置为预测AV失败的概率任务,每轮评估AV在这些场景中的失败概率,然后根据这些概率重新采样历史场景,为下游培训生成个性化课程,使AV的培训更加个性化,与评估其能力相匹配。因此,CLIC不仅可以最大化已收集的历史场景库的利用,同时也可以通过个性化培训,提高AV在危险场景中的管理能力,而不会妨碍其在简单场景中的运作。实验结果表明,CLIC超越了其他课程基本培训策略,在管理危险场景方面显示了明显的改进,而且仍能保持简单场景中的运作效率。
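The curriculum step described above can be pictured as sampling training scenarios from the pre-collected library in proportion to the predicted probability that the current AV policy fails in them. Below is a schematic sketch with a placeholder failure predictor; the function names and dummy data are illustrative, not from the CLIC codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-collected scenario library: each row is a scenario feature vector.
library = rng.normal(size=(5000, 16))
w_clf = rng.normal(size=16)            # placeholder weights of the failure predictor

def predict_failure_prob(scenarios):
    """Placeholder for CLIC's collision-prediction model of the current AV policy."""
    return 1.0 / (1.0 + np.exp(-(scenarios @ w_clf) * 0.5))

def individualized_curriculum(scenarios, batch_size=256, temperature=1.0):
    """Sample a training batch weighted by the estimated failure probability."""
    p_fail = predict_failure_prob(scenarios)
    weights = p_fail ** (1.0 / temperature)
    weights /= weights.sum()
    idx = rng.choice(len(scenarios), size=batch_size, replace=False, p=weights)
    return scenarios[idx], p_fail[idx]

batch, probs = individualized_curriculum(library)
print(batch.shape, f"mean failure prob in batch: {probs.mean():.2f}")
```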

Framework based on complex networks to model and mine patient pathways

  • paper_url: http://arxiv.org/abs/2309.14208
  • repo_url: https://github.com/caroline-rosa/framework_patient_pathways
  • paper_authors: Caroline de Oliveira Costa Souza Rosa, Márcia Ito, Alex Borges Vieira, Klaus Wehmuth, Antônio Tadeu Azevedo Gomes
  • for: 这个研究旨在自动发现患者群体的医疗系统历史记录,以提高医疗质量和效率。
  • methods: 该研究提出了一个框架,包括多方面图模型、基于时间的不同程度衡量方法和基于传统中心度指标的挖掘方法。
  • results: 研究在孕综和糖尿病两个例子中证明了该框架的有用性,可以找到相似路径集合、简洁表示路径和按照多个视角显示最重要的 Pattern。
    Abstract The automatic discovery of a model to represent the history of encounters of a group of patients with the healthcare system -- the so-called "pathway of patients" -- is a new field of research that supports clinical and organisational decisions to improve the quality and efficiency of the treatment provided. The pathways of patients with chronic conditions tend to vary significantly from one person to another, have repetitive tasks, and demand the analysis of multiple perspectives (interventions, diagnoses, medical specialities, among others) influencing the results. Therefore, modelling and mining those pathways is still a challenging task. In this work, we propose a framework comprising: (i) a pathway model based on a multi-aspect graph, (ii) a novel dissimilarity measurement to compare pathways taking the elapsed time into account, and (iii) a mining method based on traditional centrality measures to discover the most relevant steps of the pathways. We evaluated the framework using the study cases of pregnancy and diabetes, which revealed its usefulness in finding clusters of similar pathways, representing them in an easy-to-interpret way, and highlighting the most significant patterns according to multiple perspectives.
    摘要 自动发现患者群体对医疗系统的互动历史模型 -- 称之为"患者路径" -- 是一个新的研究领域,用于支持临床和组织决策,以提高治疗质量和效率。患者的路径通常在不同人群中有很大差异,具有重复的任务和多个视角(如 intervenciones、诊断、医学专业等)的影响。因此,模型和挖掘这些路径仍然是一项挑战。在这项工作中,我们提出了以下框架:(i)基于多方面图的路径模型,(ii)基于时间因素的不同度量来比较路径,以及(iii)基于传统中心度量来挖掘路径中最重要的步骤。我们使用了孕期和糖尿病两个案例进行评估,发现该框架可以快速找到相似路径集,将其易于理解地表示出来,并高亮多个视角中的重要特征。
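A small illustration of mining the "most relevant steps" with traditional centrality measures on a pathway graph, using networkx. The events and transition counts are invented, and the multi-aspect structure of the paper's model is collapsed here to a single weighted digraph.

```python
import networkx as nx

# Toy pathway graph: nodes are care events, weighted edges are observed transitions.
G = nx.DiGraph()
transitions = [
    ("first visit", "lab test", 120), ("lab test", "diagnosis", 110),
    ("diagnosis", "medication", 90), ("diagnosis", "specialist referral", 40),
    ("specialist referral", "medication", 30), ("medication", "follow-up", 95),
    ("follow-up", "lab test", 35),
]
G.add_weighted_edges_from(transitions)

centralities = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "pagerank": nx.pagerank(G, weight="weight"),
}
for name, scores in centralities.items():
    top = max(scores, key=scores.get)
    print(f"{name:12s} -> most relevant step: {top} ({scores[top]:.3f})")
```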

LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.14393
  • repo_url: https://github.com/sotarokaneda/mlcarbon
  • paper_authors: Ahmad Faiz, Sotaro Kaneda, Ruhan Wang, Rita Osi, Parteek Sharma, Fan Chen, Lei Jiang
  • for: The work aims to accurately estimate the carbon footprint of large language model (LLM) training, covering both operational and embodied emissions, and to predict the footprint of new neural network designs before training.
  • methods: An end-to-end carbon footprint projection model, \textit{LLMCarbon}, is proposed that handles both dense and mixture-of-experts (MoE) LLMs; compared with the earlier tool mlco2, it accounts for critical architectural parameters and embodied carbon.
  • results: \textit{LLMCarbon} provides substantially more accurate carbon footprint estimates across different LLMs, and can model operational as well as embodied emissions for new designs.
    Abstract The carbon footprint associated with large language models (LLMs) is a significant concern, encompassing emissions from their training, inference, experimentation, and storage processes, including operational and embodied carbon emissions. An essential aspect is accurately estimating the carbon impact of emerging LLMs even before their training, which heavily relies on GPU usage. Existing studies have reported the carbon footprint of LLM training, but only one tool, mlco2, can predict the carbon footprint of new neural networks prior to physical training. However, mlco2 has several serious limitations. It cannot extend its estimation to dense or mixture-of-experts (MoE) LLMs, disregards critical architectural parameters, focuses solely on GPUs, and cannot model embodied carbon footprints. Addressing these gaps, we introduce \textit{LLMCarbon}, an end-to-end carbon footprint projection model designed for both dense and MoE LLMs. Compared to mlco2, LLMCarbon significantly enhances the accuracy of carbon footprint estimations for various LLMs.
    摘要 Large language models (LLMs) 的碳足迹是一个重要的问题,包括训练、推理、实验和存储过程中的碳排放,即运行碳排放和嵌入碳排放。一个重要的方面是在新的 LLM 训练之前就准确地估算其碳影响,这主要取决于 GPU 使用情况。现有的研究已经报告了 LLM 训练的碳排放,但只有一个工具,mlco2,可以在物理训练之前预测新的神经网络的碳排放。然而,mlco2 有多个严重的限制:它无法扩展到 dense 或 mixture-of-experts (MoE) LLMs,忽略了关键的架构参数,只考虑 GPU,并且不能模拟嵌入碳排放。为了解决这些缺陷,我们介绍了 \textit{LLMCarbon},一个针对 dense 和 MoE LLMs 的端到端碳排放预测模型。与 mlco2 相比,LLMCarbon 可以对不同类型的 LLMs 提供更高精度的碳排放估算。
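A back-of-the-envelope operational-plus-embodied estimate of the kind such a model produces. The formulas and constants below (GPU power, utilization, PUE, grid intensity, embodied figures) are generic assumptions for illustration, not LLMCarbon's actual parameterization.

```python
def training_carbon_kg(gpu_count, train_days, gpu_power_w=400, utilization=0.6,
                       pue=1.2, grid_kgco2_per_kwh=0.4,
                       gpu_embodied_kgco2=150, gpu_lifetime_days=365 * 4):
    """Rough operational + embodied CO2e for one training run (illustrative only)."""
    hours = train_days * 24
    energy_kwh = gpu_count * gpu_power_w * utilization * hours / 1000 * pue
    operational = energy_kwh * grid_kgco2_per_kwh
    # Amortize each GPU's manufacturing footprint over its assumed service life.
    embodied = gpu_count * gpu_embodied_kgco2 * (train_days / gpu_lifetime_days)
    return operational, embodied

op, emb = training_carbon_kg(gpu_count=1024, train_days=30)
print(f"operational: {op/1000:.1f} tCO2e, embodied share: {emb/1000:.1f} tCO2e")
```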

Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition

  • paper_url: http://arxiv.org/abs/2309.14183
  • repo_url: https://github.com/Species-Dataset/species-dataset.github.io
  • paper_authors: Wei He, Kai Han, Ying Nie, Chengcheng Wang, Yunhe Wang
  • for: The study provides a large-scale semi-supervised dataset to support the development of deep-learning foundation models for invasive species recognition.
  • methods: It introduces the expert-annotated Species196-L and the unlabeled Species196-U datasets, and defines four experimental settings: supervised learning, semi-supervised learning, self-supervised pretraining, and zero-shot inference with large multi-modal models.
  • results: An empirical study of representative methods on Species196 benchmarks their performance on fine-grained invasive species recognition under these four settings.
    Abstract The development of foundation vision models has pushed the general visual recognition to a high level, but cannot well address the fine-grained recognition in specialized domain such as invasive species classification. Identifying and managing invasive species has strong social and ecological value. Currently, most invasive species datasets are limited in scale and cover a narrow range of species, which restricts the development of deep-learning based invasion biometrics systems. To fill the gap of this area, we introduced Species196, a large-scale semi-supervised dataset of 196-category invasive species. It collects over 19K images with expert-level accurate annotations Species196-L, and 1.2M unlabeled images of invasive species Species196-U. The dataset provides four experimental settings for benchmarking the existing models and algorithms, namely, supervised learning, semi-supervised learning, self-supervised pretraining and zero-shot inference ability of large multi-modal models. To facilitate future research on these four learning paradigms, we conduct an empirical study of the representative methods on the introduced dataset. The dataset is publicly available at https://species-dataset.github.io/.
    摘要 基础视觉模型的发展已经将通用视觉识别能力提高到较高水平,但仍无法很好地解决入侵物种分类等特殊领域的细粒度识别。识别和管理入侵物种有着重要的社会和生态价值。目前,大多数入侵物种数据集规模有限、覆盖的物种范围狭窄,这限制了基于深度学习的入侵物种识别系统的发展。为了填补这一领域的空白,我们引入了Species196数据集,这是一个大规模的半监督数据集,收集了196类入侵物种的1.9万多张专家级准确标注图像 Species196-L,以及120万张未标注的入侵物种图像 Species196-U。该数据集提供了四种实验设置,用于测试现有模型和算法的性能,即:监督学习、半监督学习、自监督预训练和大型多模态模型的零样本推理能力。为了促进未来关于这四种学习范式的研究,我们在 Species196 数据集上对代表性方法进行了实验研究。该数据集公开可用于 https://species-dataset.github.io/。

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision

  • paper_url: http://arxiv.org/abs/2309.14181
  • repo_url: https://github.com/Q-Future/Q-Bench
  • paper_authors: Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, Weisi Lin
  • for: The paper benchmarks the abilities of Multi-modality Large Language Models (MLLMs) on low-level visual perception, low-level visual description, and overall visual quality assessment.
  • methods: It constructs the LLVisionQA dataset of low-level perception questions, proposes the LLDescribe dataset of expert-labelled golden low-level descriptions together with a GPT-involved comparison pipeline, and designs a softmax-based strategy that lets MLLMs predict quantifiable quality scores.
  • results: The evaluation shows that MLLMs possess preliminary low-level visual abilities, but these abilities remain unstable and relatively imprecise and need targeted improvement.
    Abstract The rapid evolution of Multi-modality Large Language Models (MLLMs) has catalyzed a shift in computer vision from specialized models to general-purpose foundation models. Nevertheless, there is still an inadequacy in assessing the abilities of MLLMs on low-level visual perception and understanding. To address this gap, we present Q-Bench, a holistic benchmark crafted to systematically evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment. a) To evaluate the low-level perception ability, we construct the LLVisionQA dataset, consisting of 2,990 diverse-sourced images, each equipped with a human-asked question focusing on its low-level attributes. We then measure the correctness of MLLMs on answering these questions. b) To examine the description ability of MLLMs on low-level information, we propose the LLDescribe dataset consisting of long expert-labelled golden low-level text descriptions on 499 images, and a GPT-involved comparison pipeline between outputs of MLLMs and the golden descriptions. c) Besides these two tasks, we further measure their visual quality assessment ability to align with human opinion scores. Specifically, we design a softmax-based strategy that enables MLLMs to predict quantifiable quality scores, and evaluate them on various existing image quality assessment (IQA) datasets. Our evaluation across the three abilities confirms that MLLMs possess preliminary low-level visual skills. However, these skills are still unstable and relatively imprecise, indicating the need for specific enhancements on MLLMs towards these abilities. We hope that our benchmark can encourage the research community to delve deeper to discover and enhance these untapped potentials of MLLMs. Project Page: https://vqassessment.github.io/Q-Bench.
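One way to realize the softmax-based scoring strategy described above is to read the logits the MLLM assigns to opposing rating words and convert them into a scalar score. The sketch below is a minimal illustration under that assumption; the prompt, token choice ("good" vs. "poor"), and function names are illustrative, not the paper's exact implementation.

```python
# Hedged sketch: turning an MLLM's logits over rating words into a quantifiable quality score.
# Assumes the model exposes the logits of the tokens "good" and "poor" when asked
# "Rate the quality of this image." -- both the prompt and the token pair are illustrative.
import torch

def softmax_quality_score(logit_good: float, logit_poor: float) -> float:
    """Map two rating-token logits to a score in [0, 1]."""
    probs = torch.softmax(torch.tensor([logit_good, logit_poor]), dim=0)
    return probs[0].item()  # probability mass on "good" acts as the quality score

# Example: a model slightly favouring "good" yields a score above 0.5.
print(softmax_quality_score(logit_good=2.1, logit_poor=1.3))  # ~0.69
```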

Data Upcycling Knowledge Distillation for Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2309.14162
  • repo_url: None
  • paper_authors: Yun Zhang, Wei Li, Simiao Li, Jie Hu, Hanting Chen, Hailing Wang, Zhijun Tu, Wenjia Wang, Bingyi Jing, Yunhe Wang
  • for: Proposes the Data Upcycling Knowledge Distillation (DUKD) method, built around efficient data utilization, to improve student models for single image super-resolution (SISR).
  • methods: Uses two efficient image zooming operations and invertible data augmentations, and introduces label-consistency regularization to strengthen the distillation.
  • results: Across multiple benchmarks, DUKD clearly outperforms baseline methods, e.g. up to a 0.5 dB PSNR gain, and an RCAN student with greatly reduced parameters performs on par with the RCAN teacher.
    Abstract Knowledge distillation (KD) emerges as a challenging yet promising technique for compressing deep learning models, characterized by the transmission of extensive learning representations from proficient and computationally intensive teacher models to compact student models. However, only a handful of studies have endeavored to compress models for single image super-resolution (SISR) through KD, with their effects on student model enhancement remaining marginal. In this paper, we put forth an approach from the perspective of efficient data utilization, namely, Data Upcycling Knowledge Distillation (DUKD), which facilitates the student model through the teacher's prior knowledge via upcycled in-domain data derived from its inputs. This upcycling process is realized through two efficient image zooming operations and invertible data augmentations, which introduce label-consistency regularization to the field of KD for SISR and substantially boost the student model's generalization. Due to its versatility, DUKD can be applied across a broad spectrum of teacher-student architectures. Comprehensive experiments across diverse benchmarks demonstrate that the proposed DUKD method significantly outperforms previous art, exemplified by an increase of up to 0.5 dB in PSNR over baseline methods, and by an RCAN student model with 67% fewer parameters performing on par with the RCAN teacher model.
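The combination of an invertible augmentation with a consistency term can be sketched as below. This is only an illustration of the idea under assumed loss choices (L1 terms, a horizontal flip, an arbitrary weight); the paper's exact DUKD losses and zooming operations are not reproduced here.

```python
# Hedged sketch of a distillation step with an invertible augmentation (horizontal flip)
# providing label-consistency regularization, in the spirit of data-upcycling KD for SR.
# Loss weights and the choice of flip are illustrative assumptions.
import torch
import torch.nn.functional as F

def dukd_style_step(student, teacher, lr_batch, hr_batch, lam_consist=0.5):
    sr_student = student(lr_batch)
    with torch.no_grad():
        sr_teacher = teacher(lr_batch)
    # Standard reconstruction + distillation terms.
    loss_rec = F.l1_loss(sr_student, hr_batch)
    loss_kd = F.l1_loss(sr_student, sr_teacher)
    # Invertible augmentation: flip the input, super-resolve, flip back;
    # the result should stay consistent with the teacher's output on the original input.
    sr_flipped = student(torch.flip(lr_batch, dims=[-1]))
    loss_consist = F.l1_loss(torch.flip(sr_flipped, dims=[-1]), sr_teacher)
    return loss_rec + loss_kd + lam_consist * loss_consist
```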

SPIRT: A Fault-Tolerant and Reliable Peer-to-Peer Serverless ML Training Architecture

  • paper_url: http://arxiv.org/abs/2309.14148
  • repo_url: None
  • paper_authors: Amine Barrak, Mayssa Jaziri, Ranim Trabelsi, Fehmi Jaafar, Fabio Petrillo
  • for: Explores serverless distributed machine learning, particularly within peer-to-peer (P2P) distributed training environments.
  • methods: Proposes SPIRT, a P2P distributed ML training architecture built on RedisAI, to achieve fault-tolerant, reliable, and secure training.
  • results: SPIRT reduces the time required for model updates and gradient averaging by 82%, tolerates peer failures and integrates new peers, and secures communication between peers for distributed ML tasks.
    Abstract The advent of serverless computing has ushered in notable advancements in distributed machine learning, particularly within parameter server-based architectures. Yet, the integration of serverless features within peer-to-peer (P2P) distributed networks remains largely uncharted. In this paper, we introduce SPIRT, a fault-tolerant, reliable, and secure serverless P2P ML training architecture designed to bridge this existing gap. Capitalizing on the inherent robustness and reliability innate to P2P systems, SPIRT employs RedisAI for in-database operations, leading to an 82% reduction in the time required for model updates and gradient averaging across a variety of models and batch sizes. This architecture showcases resilience against peer failures and adeptly manages the integration of new peers, thereby highlighting its fault-tolerant characteristics and scalability. Furthermore, SPIRT ensures secure communication between peers, enhancing the reliability of distributed machine learning tasks. Even in the face of Byzantine attacks, the system's robust aggregation algorithms maintain high levels of accuracy. These findings illuminate the promising potential of serverless architectures in P2P distributed machine learning, offering a significant stride towards the development of more efficient, scalable, and resilient applications.
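A Byzantine-robust aggregation rule of the kind mentioned in the abstract can be illustrated with a coordinate-wise median over peer gradients. This is a generic robust-aggregation sketch, not SPIRT's specific algorithm or its RedisAI integration.

```python
# Hedged sketch: a Byzantine-robust aggregation rule (coordinate-wise median) of the kind
# a P2P trainer can use when averaging peer gradients; SPIRT's exact aggregation algorithm
# is not reproduced here, this is only illustrative.
import torch

def robust_aggregate(peer_grads: list[torch.Tensor]) -> torch.Tensor:
    """Aggregate one parameter's gradients from all peers with a coordinate-wise median."""
    stacked = torch.stack(peer_grads, dim=0)      # shape: (num_peers, *param_shape)
    return stacked.median(dim=0).values           # robust to a minority of bad peers

grads = [torch.randn(4, 4) for _ in range(5)]
grads[0] += 100.0                                 # one Byzantine peer sends a poisoned update
print(robust_aggregate(grads).abs().max() < 10)   # the median ignores the outlier
```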

Exploring the Impact of Serverless Computing on Peer To Peer Training Machine Learning

  • paper_url: http://arxiv.org/abs/2309.14139
  • repo_url: https://github.com/aminebarrak/peertopeerserverless
  • paper_authors: Amine Barrak, Ranim Trabelsi, Fehmi Jaafar, Fabio Petrillo
  • for: Proposes a new architecture combining serverless computing with P2P networks for distributed training, to improve scalability and fault tolerance.
  • methods: Uses distributed gradient computation and presents an efficient method for parallel gradient computation on serverless infrastructure under resource constraints.
  • results: Compared with conventional P2P distributed training, gradient computation time improves by up to 97.34%; however, the serverless architecture can cost up to 5.4 times more than instance-based architectures.
    Abstract The increasing demand for computational power in big data and machine learning has driven the development of distributed training methodologies. Among these, peer-to-peer (P2P) networks provide advantages such as enhanced scalability and fault tolerance. However, they also encounter challenges related to resource consumption, costs, and communication overhead as the number of participating peers grows. In this paper, we introduce a novel architecture that combines serverless computing with P2P networks for distributed training and present a method for efficient parallel gradient computation under resource constraints. Our findings show a significant enhancement in gradient computation time, with up to a 97.34% improvement compared to conventional P2P distributed training methods. As for costs, our examination confirmed that the serverless architecture could incur higher expenses, reaching up to 5.4 times more than instance-based architectures. It is essential to consider that these higher costs are associated with marked improvements in computation time, particularly under resource-constrained scenarios. Despite the cost-time trade-off, the serverless approach still holds promise due to its pay-as-you-go model. Utilizing dynamic resource allocation, it enables faster training times and optimized resource utilization, making it a promising candidate for a wide range of machine learning applications.

Small Objects Matters in Weakly-supervised Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2309.14117
  • repo_url: None
  • paper_authors: Cheolhyun Mun, Sanghuk Lee, Youngjung Uh, Junsuk Choe, Hyeran Byun
  • for: Provides an evaluation metric that comprehensively assesses semantic segmentation methods across different object sizes, together with a size-balanced evaluation set complementing PASCAL VOC.
  • methods: Proposes the new evaluation metric, a size-balanced cross-entropy loss, and a suitable training strategy to address the poor performance of existing methods on small objects.
  • results: Evaluating ten baselines on three datasets reveals that existing WSSS methods struggle with small objects, while the proposed size-balanced cross-entropy loss and training strategy generally improve them.
    Abstract Weakly-supervised semantic segmentation (WSSS) performs pixel-wise classification given only image-level labels for training. Despite the difficulty of this task, the research community has achieved promising results over the last five years. Still, current WSSS literature misses the detailed sense of how well the methods perform on different sizes of objects. Thus we propose a novel evaluation metric to provide a comprehensive assessment across different object sizes and collect a size-balanced evaluation set to complement PASCAL VOC. With these two gadgets, we reveal that the existing WSSS methods struggle in capturing small objects. Furthermore, we propose a size-balanced cross-entropy loss coupled with a proper training strategy. It generally improves existing WSSS methods as validated upon ten baselines on three different datasets.
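The idea of a size-balanced loss can be sketched as a per-pixel cross-entropy re-weighted by the pixel area of each object instance, so small objects contribute as much as large ones. The paper's exact weighting scheme may differ; the normalization below is an illustrative assumption.

```python
# Hedged sketch of a size-balanced cross-entropy: per-pixel CE re-weighted so each object
# instance contributes equally regardless of its pixel area. Illustrative, not the paper's
# exact formulation.
import torch
import torch.nn.functional as F

def size_balanced_ce(logits, target, instance_ids):
    """logits: (B, C, H, W); target: (B, H, W) class ids; instance_ids: (B, H, W) object ids."""
    per_pixel = F.cross_entropy(logits, target, reduction="none")   # (B, H, W)
    weights = torch.ones_like(per_pixel)
    for inst in instance_ids.unique():
        mask = instance_ids == inst
        weights[mask] = 1.0 / mask.sum().clamp(min=1)               # small objects get larger weights
    return (per_pixel * weights).sum() / weights.sum()
```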

Semi-Abstract Value-Based Argumentation Framework

  • paper_url: http://arxiv.org/abs/2309.14112
  • repo_url: None
  • paper_authors: Jovan Jeromela
  • for: Studies extensions and applications of abstract argumentation frameworks.
  • methods: Builds on two extensions, the value-based argumentation framework and the semi-abstract argumentation framework, which add structure to otherwise structureless arguments.
  • results: Introduces a new semi-abstract value-based argumentation framework that maps propositional formulae associated with individual arguments to a set of ordered values and uses newly introduced attack principles to make implicit attacks explicit; it also formulates a complex moral dilemma in both formalisms to showcase their expressivity.
    Abstract In his seminal paper, Phan Minh Dung (1995) proposed abstract argumentation framework, which models argumentation using directed graphs where structureless arguments are the nodes and attacks among the arguments are the edges. In the following years, many extensions of this framework were introduced. These extensions typically add a certain form of structure to the arguments. This thesis showcases two such extensions -- value-based argumentation framework by Trevor Bench-Capon (2002) and semi-abstract argumentation framework by Esther Anna Corsi and Christian Fermüller (2017). The former introduces a mapping function that links individual arguments to a set of ordered values, enabling a distinction between objectively and subjectively acceptable arguments. The latter links claims of individual arguments to propositional formulae and then applies newly-introduced attack principles in order to make implicit attacks explicit and to enable a definition of a consequence relation that relies on neither the truth values nor the interpretations in the usual sense. The contribution of this thesis is two-fold. Firstly, the new semi-abstract value-based argumentation framework is introduced. This framework maps propositional formulae associated with individual arguments to a set of ordered values. Secondly, a complex moral dilemma is formulated using the original and the value-based argumentation frameworks showcasing the expressivity of these formalisms.
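For readers unfamiliar with Dung-style frameworks, the directed-graph view and two basic acceptability notions (conflict-free, admissible) can be sketched in a few lines. The framework below is a toy example, not one from the thesis.

```python
# Hedged sketch of Dung-style abstract argumentation: arguments as nodes, attacks as directed
# edges, plus the conflict-free and admissible notions underlying the extensions discussed above.
from itertools import combinations

attacks = {("a", "b"), ("b", "c"), ("c", "b")}   # illustrative framework: a->b, b<->c
arguments = {"a", "b", "c"}

def conflict_free(s):
    return not any((x, y) in attacks for x in s for y in s)

def defends(s, arg):
    attackers = {x for (x, y) in attacks if y == arg}
    return all(any((d, atk) in attacks for d in s) for atk in attackers)

def admissible(s):
    return conflict_free(s) and all(defends(s, a) for a in s)

admissible_sets = [set(c) for r in range(len(arguments) + 1)
                   for c in combinations(sorted(arguments), r) if admissible(set(c))]
print(admissible_sets)   # [set(), {'a'}, {'c'}, {'a', 'c'}] for this framework
```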

Comprehensive Overview of Named Entity Recognition: Models, Domain-Specific Applications and Challenges

  • paper_url: http://arxiv.org/abs/2309.14084
  • repo_url: None
  • paper_authors: Kalyani Pakhale
  • for: Surveys the development and applications of Named Entity Recognition (NER), from traditional rule-based strategies to contemporary AI techniques.
  • methods: Covers foundational concepts, traditional techniques, and modern approaches including BERT, LSTM, and CNN integrations, with emphasis on domain-specific NER models for areas such as finance, legal, and healthcare.
  • results: Highlights the role of NER in sectors such as finance and biomedicine, discusses emerging paradigms like reinforcement learning, E-NER, and OCR integration, and outlines open challenges and future research directions.
    Abstract In the domain of Natural Language Processing (NLP), Named Entity Recognition (NER) stands out as a pivotal mechanism for extracting structured insights from unstructured text. This manuscript offers an exhaustive exploration into the evolving landscape of NER methodologies, blending foundational principles with contemporary AI advancements. Beginning with the rudimentary concepts of NER, the study spans a spectrum of techniques from traditional rule-based strategies to the contemporary marvels of transformer architectures, particularly highlighting integrations such as BERT with LSTM and CNN. The narrative accentuates domain-specific NER models, tailored for intricate areas like finance, legal, and healthcare, emphasizing their specialized adaptability. Additionally, the research delves into cutting-edge paradigms including reinforcement learning, innovative constructs like E-NER, and the interplay of Optical Character Recognition (OCR) in augmenting NER capabilities. Grounding its insights in practical realms, the paper sheds light on the indispensable role of NER in sectors like finance and biomedicine, addressing the unique challenges they present. The conclusion outlines open challenges and avenues, marking this work as a comprehensive guide for those delving into NER research and applications.

ODE-based Recurrent Model-free Reinforcement Learning for POMDPs

  • paper_url: http://arxiv.org/abs/2309.14078
  • repo_url: None
  • paper_authors: Xuanle Zhao, Duzhen Zhang, Liyuan Han, Tielin Zhang, Bo Xu
  • for: Addresses inferring unseen information in partially observable (PO) environments to improve agents' decision-making.
  • methods: Uses a recurrent policy with a compact context (context-based reinforcement learning) combined with a novel ODE-based recurrent model to extract unobservable, dynamics-related information from historical transitions.
  • results: Combining ODEs with a model-free RL framework solves PO continuous-control and meta-RL tasks, and experiments show robustness to irregularly sampled observations.
    Abstract Neural ordinary differential equations (ODEs) are widely recognized as the standard for modeling physical mechanisms, which help to perform approximate inference in unknown physical or biological environments. In partially observable (PO) environments, inferring unseen information from raw observations is a puzzle for the agents. By using a recurrent policy with a compact context, context-based reinforcement learning provides a flexible way to extract unobservable information from historical transitions. To help the agent extract more dynamics-related information, we present a novel ODE-based recurrent model combined with a model-free reinforcement learning (RL) framework to solve partially observable Markov decision processes (POMDPs). We experimentally demonstrate the efficacy of our methods across various PO continuous control and meta-RL tasks. Furthermore, our experiments illustrate that our method is robust against irregular observations, owing to the ability of ODEs to model irregularly-sampled time series.
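The general construct of an ODE-style recurrent cell, where the hidden state evolves under a learned ODE between (possibly irregular) observations and is then updated with each new observation, can be sketched as follows. This illustrates the generic ODE-RNN idea, not the paper's exact architecture; the Euler integrator and layer sizes are assumptions.

```python
# Hedged sketch of an ODE-style recurrent cell: between observations the hidden state evolves
# under a learned ODE (integrated here with a few Euler steps), then is folded together with
# the new observation via a GRU cell.
import torch
import torch.nn as nn

class ODERecurrentCell(nn.Module):
    def __init__(self, obs_dim, hidden_dim, euler_steps=4):
        super().__init__()
        self.dynamics = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
                                      nn.Linear(hidden_dim, hidden_dim))
        self.update = nn.GRUCell(obs_dim, hidden_dim)
        self.euler_steps = euler_steps

    def forward(self, obs, h, dt):
        # Evolve h over the (possibly irregular) gap dt, then incorporate the observation.
        step = dt / self.euler_steps
        for _ in range(self.euler_steps):
            h = h + step * self.dynamics(h)
        return self.update(obs, h)

cell = ODERecurrentCell(obs_dim=3, hidden_dim=16)
h = torch.zeros(1, 16)
h = cell(torch.randn(1, 3), h, dt=0.7)   # irregular observation gaps are handled naturally
```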

Maximum Likelihood Estimation of Latent Variable Structural Equation Models: A Neural Network Approach

  • paper_url: http://arxiv.org/abs/2309.14073
  • repo_url: None
  • paper_authors: Mehrzad Saremi
  • for: Proposes a graphical structure for structural equation models that is stable under marginalization given linearity and Gaussianity assumptions.
  • methods: Implements a GPU-based algorithm that computes the maximum likelihood estimation of these models.
  • results: Shows that computing the maximum likelihood estimation of this model is equivalent to training a neural network.
    Abstract We propose a graphical structure for structural equation models that is stable under marginalization under linearity and Gaussianity assumptions. We show that computing the maximum likelihood estimation of this model is equivalent to training a neural network. We implement a GPU-based algorithm that computes the maximum likelihood estimation of these models.
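The "MLE as network training" viewpoint can be illustrated on a toy linear-Gaussian equation, where minimizing the negative log-likelihood with a standard neural-network optimizer recovers the structural parameters. This is only a one-equation toy under the linearity and Gaussianity assumptions; the paper's actual latent-variable model is far richer.

```python
# Hedged sketch: fitting a toy linear-Gaussian structural equation (y = b*x + noise) by
# minimizing the negative log-likelihood with a neural-network optimizer.
import torch

x = torch.randn(500)
y = 2.0 * x + 0.5 * torch.randn(500)

b = torch.zeros((), requires_grad=True)
log_sigma = torch.zeros((), requires_grad=True)
opt = torch.optim.Adam([b, log_sigma], lr=0.05)

for _ in range(300):
    sigma = log_sigma.exp()
    nll = (0.5 * ((y - b * x) / sigma) ** 2 + torch.log(sigma)).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()

print(float(b), float(sigma))   # approaches b ~ 2.0, sigma ~ 0.5
```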

Adapt then Unlearn: Exploiting Parameter Space Semantics for Unlearning in Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2309.14054
  • repo_url: None
  • paper_authors: Piyush Tiwary, Atri Guha, Subhodip Panda, Prathosh A. P
  • for: Prevents pre-trained deep generative models from producing outputs containing undesirable, offensive, or potentially harmful content.
  • methods: First adapts the pre-trained GAN using user-provided negative samples, then trains it with a repulsion regularizer to unlearn the specific undesired features.
  • results: Experiments validate that the method effectively unlearns undesired features while maintaining the quality of generated samples.
    Abstract The increased attention to regulating the outputs of deep generative models, driven by growing concerns about privacy and regulatory compliance, has highlighted the need for effective control over these models. This necessity arises from instances where generative models produce outputs containing undesirable, offensive, or potentially harmful content. To tackle this challenge, the concept of machine unlearning has emerged, aiming to forget specific learned information or to erase the influence of undesired data subsets from a trained model. The objective of this work is to prevent the generation of outputs containing undesired features from a pre-trained GAN where the underlying training data set is inaccessible. Our approach is inspired by a crucial observation: the parameter space of GANs exhibits meaningful directions that can be leveraged to suppress specific undesired features. However, such directions usually result in the degradation of the quality of generated samples. Our proposed method, known as 'Adapt-then-Unlearn,' excels at unlearning such undesirable features while also maintaining the quality of generated samples. This method unfolds in two stages: in the initial stage, we adapt the pre-trained GAN using negative samples provided by the user, while in the subsequent stage, we focus on unlearning the undesired feature. During the latter phase, we train the pre-trained GAN using positive samples, incorporating a repulsion regularizer. This regularizer encourages the model's parameters to be away from the parameters associated with the adapted model from the first stage while also maintaining the quality of generated samples. To the best of our knowledge, our approach stands as the first method addressing unlearning in GANs. We validate the effectiveness of our method through comprehensive experiments.
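The repulsion regularizer described above can be illustrated with a penalty that grows when the generator's parameters stay close to those of the negatively-adapted model from stage one. The inverse-squared-distance form below is only an illustrative choice; the paper's exact regularizer is not reproduced here.

```python
# Hedged sketch of a "repulsion" regularizer: during the unlearning stage, the generator's
# parameters are pushed away from the parameters of the adapted (undesired) model from stage one.
import torch

def repulsion_penalty(model, adapted_params, eps=1e-6):
    dist_sq = sum(((p - a.detach()) ** 2).sum()
                  for p, a in zip(model.parameters(), adapted_params))
    return 1.0 / (dist_sq + eps)   # large when close to the adapted (undesired) model

# Illustrative usage inside the stage-two objective:
# total_loss = generator_loss + lambda_rep * repulsion_penalty(generator, adapted_params)
```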

Revisiting LARS for Large Batch Training Generalization of Neural Networks

  • paper_url: http://arxiv.org/abs/2309.14053
  • repo_url: None
  • paper_authors: Khoi Do, Duong Nguyen, Hoa Nguyen, Long Tran-Thanh, Quoc-Viet Pham
  • for: Studies convergence stability in Large Batch Learning (LBL), in particular the tendency of training to get trapped in sharp minimizers.
  • methods: Empirically analyzes the widely used LARS and LAMB optimizers, with and without a warm-up strategy, and proposes Time Varying LARS (TVLARS).
  • results: Experiments show that TVLARS trains robustly without warm-up and remains competitive with LARS and LAMB when warm-up is used.
    Abstract LARS and LAMB have emerged as prominent techniques in Large Batch Learning (LBL), ensuring the stability of AI training. One of the primary challenges in LBL is convergence stability, where the AI agent usually gets trapped in sharp minimizers. Addressing this challenge, a relatively recent technique known as warm-up has been employed. However, warm-up lacks a strong theoretical foundation, leaving the door open for further exploration of more efficacious algorithms. In light of this situation, we conduct empirical experiments to analyze the behaviors of the two most popular optimizers in the LARS family: LARS and LAMB, with and without a warm-up strategy. Our analyses provide a better understanding of LARS, LAMB, and the necessity of a warm-up technique in LBL. Building upon these insights, we propose a novel algorithm called Time Varying LARS (TVLARS), which facilitates robust training in the initial phase without the need for warm-up. Experimental evaluation demonstrates that TVLARS achieves competitive results with LARS and LAMB when warm-up is utilized while surpassing their performance without the warm-up technique.
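For reference, the core LARS idea is a layer-wise "trust ratio" that scales the global learning rate by the ratio of the weight norm to the gradient norm. The sketch below shows plain LARS only; TVLARS, the paper's contribution, modifies this scheme over time and is not reproduced here.

```python
# Hedged sketch of a LARS-style update: lr_local = eta * ||w|| / (||grad + wd*w|| + eps),
# applied per parameter tensor (layer-wise). Hyperparameter values are illustrative.
import torch

def lars_step(params, global_lr=1.0, eta=1e-3, weight_decay=1e-4):
    with torch.no_grad():
        for w in params:
            if w.grad is None:
                continue
            g = w.grad + weight_decay * w
            trust = eta * w.norm() / (g.norm() + 1e-12)   # layer-wise trust ratio
            w -= global_lr * trust * g
```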

An automatic selection of optimal recurrent neural network architecture for processes dynamics modelling purposes

  • paper_url: http://arxiv.org/abs/2309.14037
  • repo_url: None
  • paper_authors: Krzysztof Laddach, Rafał Łangowski, Tomasz A. Rutkowski, Bartosz Puchalski
  • for: Addresses finding the structure of artificial neural networks used for behavioural (black-box) modelling of selected dynamic processes.
  • methods: Proposes four original neural architecture search algorithms based on well-known optimisation techniques such as evolutionary algorithms and gradient descent, including specialised evolutionary operators.
  • results: In an extended validation study on data generated from a model of the fast processes in a pressurised water nuclear reactor, the optimised recurrent networks achieve a trade-off between network size and accuracy.
    Abstract This paper addresses the problem of developing algorithms that find the structure of an artificial neural network used for behavioural (black-box) modelling of selected dynamic processes. The research includes four original proposals of algorithms dedicated to neural network architecture search. The algorithms are based on well-known optimisation techniques such as evolutionary algorithms and gradient descent methods. The presented research uses an artificial neural network of recurrent type, whose architecture is selected in an optimised way by the above-mentioned algorithms. Optimality is understood as achieving a trade-off between the size of the neural network and its accuracy in capturing the response of the mathematical model under which it has been learnt. During the optimisation, original specialised evolutionary operators have been proposed. The research involved an extended validation study based on data generated from a mathematical model of the fast processes occurring in a pressurised water nuclear reactor.

DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization

  • paper_url: http://arxiv.org/abs/2309.14032
  • repo_url: https://github.com/henry-yeh/DeepACO
  • paper_authors: Haoran Ye, Jiarui Wang, Zhiguang Cao, Helan Liang, Yong Li
  • for: Proposes DeepACO, a deep-learning-enhanced ACO framework that automates heuristic design for ACO algorithms.
  • methods: Uses deep reinforcement learning to strengthen the heuristic measures of ACO, requiring only a single neural model and a single set of hyperparameters across problems.
  • results: Across eight combinatorial optimization problems, DeepACO consistently outperforms its ACO counterparts and performs better than or on par with problem-specific methods on canonical routing problems.
    Abstract Ant Colony Optimization (ACO) is a meta-heuristic algorithm that has been successfully applied to various Combinatorial Optimization Problems (COPs). Traditionally, customizing ACO for a specific problem requires the expert design of knowledge-driven heuristics. In this paper, we propose DeepACO, a generic framework that leverages deep reinforcement learning to automate heuristic designs. DeepACO serves to strengthen the heuristic measures of existing ACO algorithms and dispense with laborious manual design in future ACO applications. As a neural-enhanced meta-heuristic, DeepACO consistently outperforms its ACO counterparts on eight COPs using a single neural model and a single set of hyperparameters. As a Neural Combinatorial Optimization method, DeepACO performs better than or on par with problem-specific methods on canonical routing problems. Our code is publicly available at https://github.com/henry-yeh/DeepACO.
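The heuristic that DeepACO learns plugs into the classic ACO transition rule, where an ant moves to node j with probability proportional to pheromone^alpha times heuristic^beta. The sketch below shows that rule with a hand-crafted heuristic (1/distance) standing in for the neural network's output; distances and hyperparameters are illustrative.

```python
# Hedged sketch of the ACO ingredient DeepACO learns: the transition probability over the next
# node, p(j) ~ tau[i,j]^alpha * eta[i,j]^beta. In classic ACO eta is hand-designed (e.g. 1/dist);
# in DeepACO it would come from a neural network -- here a fixed matrix stands in for it.
import numpy as np

def next_node_probs(i, visited, tau, eta, alpha=1.0, beta=2.0):
    scores = (tau[i] ** alpha) * (eta[i] ** beta)
    scores[list(visited)] = 0.0                  # never revisit a node
    return scores / scores.sum()

rng = np.random.default_rng(0)
dist = rng.uniform(1, 10, size=(5, 5))
tau = np.ones((5, 5))                            # initial pheromone
eta = 1.0 / dist                                 # hand-crafted heuristic (DeepACO replaces this)
print(next_node_probs(0, visited={0}, tau=tau, eta=eta))
```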

Diffeomorphic Transformations for Time Series Analysis: An Efficient Approach to Nonlinear Warping

  • paper_url: http://arxiv.org/abs/2309.14029
  • repo_url: None
  • paper_authors: Iñigo Martinez
  • for: Targets similarity, classification, and clustering methods specifically designed for time series data.
  • methods: Proposes novel elastic alignment methods based on parametric, diffeomorphic warping transformations that are differentiable, invertible, robust to noise and outliers, and computationally efficient, with a closed-form gradient.
  • results: These warps integrate well with deep learning architectures; building on them, the thesis contributes an enhanced temporal transformer network for alignment and averaging, a model that jointly aligns and classifies signals, a warping-invariant incremental clustering algorithm, and a normalizing flow model with more flexible affine transformations.
    Abstract The proliferation and ubiquity of temporal data across many disciplines has sparked interest for similarity, classification and clustering methods specifically designed to handle time series data. A core issue when dealing with time series is determining their pairwise similarity, i.e., the degree to which a given time series resembles another. Traditional distance measures such as the Euclidean are not well-suited due to the time-dependent nature of the data. Elastic metrics such as dynamic time warping (DTW) offer a promising approach, but are limited by their computational complexity, non-differentiability and sensitivity to noise and outliers. This thesis proposes novel elastic alignment methods that use parametric \& diffeomorphic warping transformations as a means of overcoming the shortcomings of DTW-based metrics. The proposed method is differentiable \& invertible, well-suited for deep learning architectures, robust to noise and outliers, computationally efficient, and is expressive and flexible enough to capture complex patterns. Furthermore, a closed-form solution was developed for the gradient of these diffeomorphic transformations, which allows an efficient search in the parameter space, leading to better solutions at convergence. Leveraging the benefits of these closed-form diffeomorphic transformations, this thesis proposes a suite of advancements that include: (a) an enhanced temporal transformer network for time series alignment and averaging, (b) a deep-learning based time series classification model to simultaneously align and classify signals with high accuracy, (c) an incremental time series clustering algorithm that is warping-invariant, scalable and can operate under limited computational and time resources, and finally, (d) a normalizing flow model that enhances the flexibility of affine transformations in coupling and autoregressive layers.
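For context, the baseline this thesis improves on is classic dynamic time warping (DTW), shown below. Its quadratic cost and non-differentiability are precisely the shortcomings the closed-form diffeomorphic warps address; the diffeomorphic method itself is not reproduced here.

```python
# Hedged sketch of classic DTW: O(n*m) dynamic programming, non-differentiable, sensitive to
# noise and outliers -- the baseline elastic metric the proposed warps are compared against.
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

t = np.linspace(0, 2 * np.pi, 50)
print(dtw(np.sin(t), np.sin(t + 0.5)))   # small warped distance despite the phase shift
```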

LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression

  • paper_url: http://arxiv.org/abs/2309.14021
  • repo_url: None
  • paper_authors: Ayush Kaushal, Tejas Vaidhya, Irina Rish
  • for: Explores compressing Large Language Models (LLMs) for monolingual code generation with Low Rank Decomposition (LoRD) to speed up inference.
  • methods: Decomposes large linear layers into products of two smaller matrices, reducing parameter counts while keeping the compressed layers fully differentiable and all parameters trainable.
  • results: LoRD compresses StarCoder 16B to 13.2B parameters with no drop (and to 12.3B with minimal drop) in HumanEval Pass@1, in under 10 minutes on a single A100, and speeds up inference by up to 22.35%; it remains compatible with near-lossless quantization such as SpQR and with QLoRA fine-tuning.
    Abstract Low Rank Decomposition of a matrix - splitting a large matrix into a product of two smaller matrices - offers a means for compression that reduces the parameters of a model without sparsification, and hence delivers more speedup on modern hardware. Moreover, unlike quantization, the compressed linear layers remain fully differentiable and all the parameters trainable, while being able to leverage the existing highly efficient kernels over floating point matrices. We study the potential to compress Large Language Models (LLMs) for monolingual Code generation via Low Rank Decomposition (LoRD) and observe that ranks for the linear layers in these models can be reduced by up to 39.58% with less than 1% increase in perplexity. We then use Low Rank Decomposition (LoRD) to compress StarCoder 16B to 13.2B parameters with no drop and to 12.3B with minimal drop in HumanEval Pass@1 score, in less than 10 minutes on a single A100. The compressed models speed up inference by up to 22.35% with just a single line of change in code over huggingface's implementation with pytorch backend. Low Rank Decomposition (LoRD) models remain compatible with state of the art near-lossless quantization methods such as SpQR, which allows leveraging further compression gains of quantization. Lastly, QLoRA over Low Rank Decomposition (LoRD) model further reduces memory requirements by as much as 21.2% over vanilla QLoRA while offering similar gains from parameter efficient fine tuning. Our work shows Low Rank Decomposition (LoRD) as a promising new paradigm for LLM compression.
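The basic operation behind such one-shot compression is a truncated SVD of a linear layer's weight matrix into two smaller linear layers. The sketch below illustrates the mechanics on a single layer; the rank choice and how LoRD selects ranks per layer are not reproduced (rank=64 is an illustrative value).

```python
# Hedged sketch of low-rank decomposition of one linear layer: W (out x in) is factored via
# truncated SVD into B @ A and replaced by two smaller nn.Linear layers.
import torch
import torch.nn as nn

def low_rank_factorize(layer: nn.Linear, rank: int) -> nn.Sequential:
    W = layer.weight.data                               # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = torch.diag(S[:rank].sqrt()) @ Vh[:rank]         # (rank, in)
    B = U[:, :rank] @ torch.diag(S[:rank].sqrt())       # (out, rank)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data, second.weight.data = A, B
    if layer.bias is not None:
        second.bias.data = layer.bias.data
    return nn.Sequential(first, second)

dense = nn.Linear(1024, 1024)
compact = low_rank_factorize(dense, rank=64)            # ~8x fewer parameters in this layer
x = torch.randn(2, 1024)
print((dense(x) - compact(x)).abs().mean())             # approximation error from truncation
```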

Morphological Computing as Logic Underlying Cognition in Human, Animal, and Intelligent Machine

  • paper_url: http://arxiv.org/abs/2309.13979
  • repo_url: None
  • paper_authors: Gordana Dodig-Crnkovic
  • for: Examines the interconnections between logic, epistemology, and the sciences within the Naturalist tradition.
  • methods: Presents a scheme connecting logic, mathematics, physics, chemistry, biology, and cognition, emphasizing scale-invariant, self-organizing dynamics across organizational tiers of nature.
  • results: Argues that an inherent logic of agency exists in natural processes at various levels and applies to humans, animals, and artifactual agents; cognitive logic stems from the evolution of physical, chemical, and biological logic, and morphological/physical/natural computation frameworks, together with the Extended Evolutionary Synthesis, help explain the genesis of human-centered logic.
    Abstract This work examines the interconnections between logic, epistemology, and sciences within the Naturalist tradition. It presents a scheme that connects logic, mathematics, physics, chemistry, biology, and cognition, emphasizing scale-invariant, self-organizing dynamics across organizational tiers of nature. The inherent logic of agency exists in natural processes at various levels, under information exchanges. It applies to humans, animals, and artifactual agents. The common human-centric, natural language-based logic is an example of complex logic evolved by living organisms that already appears in the simplest form at the level of basal cognition of unicellular organisms. Thus, cognitive logic stems from the evolution of physical, chemical, and biological logic. In a computing nature framework with a self-organizing agency, innovative computational frameworks grounded in morphological/physical/natural computation can be used to explain the genesis of human-centered logic through the steps of naturalized logical processes at lower levels of organization. The Extended Evolutionary Synthesis of living agents is essential for understanding the emergence of human-level logic and the relationship between logic and information processing/computational epistemology. We conclude that more research is needed to elucidate the details of the mechanisms linking natural phenomena with the logic of agency in nature.

Detecting Sexual Content at the Sentence Level in First Millennium Latin Texts

  • paper_url: http://arxiv.org/abs/2309.14974
  • repo_url: https://github.com/lascivaroma/seligator
  • paper_authors: Thibault Clérice
  • for: Evaluates deep learning methods for sentence-level semantic classification to accelerate corpus building in the humanities and linguistics, a traditionally time-consuming task.
  • methods: Introduces a novel corpus of around 2,500 sentences spanning 300 BCE to 900 CE covering sexual semantics (medical, erotica, etc.), and evaluates various sentence classification approaches and input embedding layers against simple token-based searches.
  • results: The approach is effective, reaching a precision of 70.60% and a TPR of 86.33% with HAN; with a smaller dataset (420 instead of 2013 instances) performance degrades but precision and TPR remain usable even without MLM (69% and 51%).
    Abstract In this study, we propose to evaluate the use of deep learning methods for semantic classification at the sentence level to accelerate the process of corpus building in the field of humanities and linguistics, a traditional and time-consuming task. We introduce a novel corpus comprising around 2500 sentences spanning from 300 BCE to 900 CE including sexual semantics (medical, erotica, etc.). We evaluate various sentence classification approaches and different input embedding layers, and show that all consistently outperform simple token-based searches. We explore the integration of idiolectal and sociolectal metadata embeddings (centuries, author, type of writing), but find that it leads to overfitting. Our results demonstrate the effectiveness of this approach, achieving high precision and true positive rates (TPR) of respectively 70.60% and 86.33% using HAN. We evaluate the impact of the dataset size on the model performances (420 instead of 2013), and show that, while our models perform worse, they still offer a high enough precision and TPR, even without MLM, respectively 69% and 51%. Given the result, we provide an analysis of the attention mechanism as a supporting added value for humanists in order to produce more data.

Audio classification with Dilated Convolution with Learnable Spacings

  • paper_url: http://arxiv.org/abs/2309.13972
  • repo_url: https://github.com/k-h-ismail/dcls-audio
  • paper_authors: Ismail Khalfaoui-Hassani, Timothée Masquelier, Thomas Pellegrini
  • for: Studies audio tagging, using dilated convolution with learnable spacings to improve classification accuracy.
  • methods: Drop-in replaces the depthwise separable convolution (DSC) layers of ConvNeXt, ConvFormer, and FastViT with DCLS layers on the AudioSet classification benchmark.
  • results: DCLS significantly improves mean average precision without increasing the number of parameters and with only a small throughput cost.
    Abstract Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation. Its interest has recently been demonstrated in computer vision (ImageNet classification and downstream tasks). Here we show that DCLS is also useful for audio tagging using the AudioSet classification benchmark. We took two state-of-the-art convolutional architectures using depthwise separable convolutions (DSC), ConvNeXt and ConvFormer, and a hybrid one using attention in addition, FastViT, and drop-in replaced all the DSC layers by DCLS ones. This significantly improved the mean average precision (mAP) with the three architectures without increasing the number of parameters and with only a low cost on the throughput. The method code is based on PyTorch and is available at https://github.com/K-H-Ismail/DCLS-Audio

An AI Chatbot for Explaining Deep Reinforcement Learning Decisions of Service-oriented Systems

  • paper_url: http://arxiv.org/abs/2309.14391
  • repo_url: https://gitlab.com/xrl2/chat4xai
  • paper_authors: Andreas Metzger, Jone Bartel, Jan Laufer
  • for: Helps service developers, service providers, and service users understand the decision-making of Deep Reinforcement Learning (Deep RL) when it is applied in service-oriented systems.
  • methods: Uses modern AI chatbot technology and dedicated prompt engineering to deliver natural-language explanations, eliminating the need to elicit and define potential questions and answers up-front as in classical software-based dialogue systems.
  • results: Chat4XAI is prototyped with OpenAI's ChatGPT API, and the fidelity and stability of its explanations are evaluated on an adaptive service exemplar.
    Abstract Deep Reinforcement Learning (Deep RL) is increasingly used to cope with the open-world assumption in service-oriented systems. Deep RL was successfully applied to problems such as dynamic service composition, job scheduling, and offloading, as well as service adaptation. While Deep RL offers many benefits, understanding the decision-making of Deep RL is challenging because its learned decision-making policy essentially appears as a black box. Yet, understanding the decision-making of Deep RL is key to help service developers perform debugging, support service providers to comply with relevant legal frameworks, and facilitate service users to build trust. We introduce Chat4XAI to facilitate the understanding of the decision-making of Deep RL by providing natural-language explanations. Compared with visual explanations, the reported benefits of natural-language explanations include better understandability for non-technical users, increased user acceptance and trust, as well as more efficient explanations. Chat4XAI leverages modern AI chatbot technology and dedicated prompt engineering. Compared to earlier work on natural-language explanations using classical software-based dialogue systems, using an AI chatbot eliminates the need for eliciting and defining potential questions and answers up-front. We prototypically realize Chat4XAI using OpenAI's ChatGPT API and evaluate the fidelity and stability of its explanations using an adaptive service exemplar.
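The general pattern of asking a chatbot API to explain an RL decision can be sketched as below. The prompt, the model name, and the decision payload are illustrative assumptions; Chat4XAI's own prompt engineering lives in the linked repository and is not reproduced here.

```python
# Hedged sketch: asking an AI chatbot API to explain an RL decision in natural language.
# Uses the OpenAI Python client; prompt wording and model choice are illustrative.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

decision = {"state": "workload spike on service A", "action": "add one replica",
            "q_values": {"add one replica": 0.82, "do nothing": 0.41}}

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Explain reinforcement-learning decisions of a "
                                      "service-oriented system to a non-expert, briefly."},
        {"role": "user", "content": f"Why was this action chosen? {decision}"},
    ],
)
print(response.choices[0].message.content)
```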

May I Ask a Follow-up Question? Understanding the Benefits of Conversations in Neural Network Explainability

  • paper_url: http://arxiv.org/abs/2309.13965
  • repo_url: None
  • paper_authors: Tong Zhang, X. Jessie Yang, Boyang Li
  • for: Improves users' comprehension of and trust in the decision-making of AI models.
  • methods: Follows static explanations with free-form conversations with a human expert.
  • results: Conversations significantly improve comprehension, acceptance, trust, and human-AI collaboration.
    Abstract Research in explainable AI (XAI) aims to provide insights into the decision-making process of opaque AI models. To date, most XAI methods offer one-off and static explanations, which cannot cater to the diverse backgrounds and understanding levels of users. With this paper, we investigate if free-form conversations can enhance users' comprehension of static explanations, improve acceptance and trust in the explanation methods, and facilitate human-AI collaboration. Participants are presented with static explanations, followed by a conversation with a human expert regarding the explanations. We measure the effect of the conversation on participants' ability to choose, from three machine learning models, the most accurate one based on explanations and their self-reported comprehension, acceptance, and trust. Empirical results show that conversations significantly improve comprehension, acceptance, trust, and collaboration. Our findings highlight the importance of customized model explanations in the format of free-form conversations and provide insights for the future design of conversational explanations.

Early Churn Prediction from Large Scale User-Product Interaction Time Series

  • paper_url: http://arxiv.org/abs/2309.14390
  • repo_url: None
  • paper_authors: Shamik Bhattacharjee, Utkarsh Thukral, Nilesh Patil
  • For: The paper aims to predict user churn in business-to-customer scenarios, with a focus on fantasy sports, and to provide insights for businesses to formulate effective retention plans.
  • Methods: The paper uses historical data and combines user activity with deep neural networks for multivariate time series classification, demonstrating remarkable results for churn prediction in complex contexts.
  • Results: The paper achieves high accuracy in predicting customer churn likelihood, providing valuable insights for businesses to understand attrition trends and develop effective retention strategies.
    Abstract User churn, characterized by customers ending their relationship with a business, has profound economic consequences across various Business-to-Customer scenarios. For numerous system-to-user actions, such as promotional discounts and retention campaigns, predicting potential churners stands as a primary objective. In volatile sectors like fantasy sports, unpredictable factors such as international sports events can influence even regular spending habits. Consequently, while transaction history and user-product interaction are valuable in predicting churn, they demand deep domain knowledge and intricate feature engineering. Additionally, feature development for churn prediction systems can be resource-intensive, particularly in production settings serving 200m+ users, where inference pipelines largely focus on feature engineering. This paper conducts an exhaustive study on predicting user churn using historical data. We aim to create a model forecasting customer churn likelihood, facilitating businesses in comprehending attrition trends and formulating effective retention plans. Our approach treats churn prediction as multivariate time series classification, demonstrating that combining user activity and deep neural networks yields remarkable results for churn prediction in complex business-to-customer contexts.
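Framing churn prediction as multivariate time-series classification can be illustrated with a small recurrent classifier over per-day activity features. The feature count, window length, and architecture below are illustrative assumptions, not the production model described in the paper.

```python
# Hedged sketch: churn prediction as multivariate time-series classification with a small GRU
# over daily user-activity features, ending in a churn probability per user.
import torch
import torch.nn as nn

class ChurnClassifier(nn.Module):
    def __init__(self, n_features=6, hidden=32):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, days, n_features)
        _, h = self.encoder(x)
        return torch.sigmoid(self.head(h[-1]))  # churn probability per user

model = ChurnClassifier()
activity = torch.randn(8, 30, 6)                # 8 users, 30 days, 6 activity features
print(model(activity).shape)                    # torch.Size([8, 1])
```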

VidChapters-7M: Video Chapters at Scale

  • paper_url: http://arxiv.org/abs/2309.13952
  • repo_url: https://github.com/antoyang/VidChapters
  • paper_authors: Antoine Yang, Arsha Nagrani, Ivan Laptev, Josef Sivic, Cordelia Schmid
  • for: Provides a large-scale video chaptering dataset to enable research on video chapter tasks.
  • methods: Scrapes user-annotated chapters from online videos in a scalable manner, without additional manual annotation, yielding 817K chaptered videos with 7M chapters in total.
  • results: Defines and benchmarks three tasks (video chapter generation, chapter grounding, and dense video captioning); pretraining on VidChapters-7M transfers well to dense video captioning, largely improving the state of the art on the YouCook2 and ViTT benchmarks.
    Abstract Segmenting long videos into chapters enables users to quickly navigate to the information of their interest. This important topic has been understudied due to the lack of publicly released datasets. To address this issue, we present VidChapters-7M, a dataset of 817K user-chaptered videos including 7M chapters in total. VidChapters-7M is automatically created from videos online in a scalable manner by scraping user-annotated chapters and hence without any additional manual annotation. We introduce the following three tasks based on this data. First, the video chapter generation task consists of temporally segmenting the video and generating a chapter title for each segment. To further dissect the problem, we also define two variants of this task: video chapter generation given ground-truth boundaries, which requires generating a chapter title given an annotated video segment, and video chapter grounding, which requires temporally localizing a chapter given its annotated title. We benchmark both simple baselines and state-of-the-art video-language models for these three tasks. We also show that pretraining on VidChapters-7M transfers well to dense video captioning tasks in both zero-shot and finetuning settings, largely improving the state of the art on the YouCook2 and ViTT benchmarks. Finally, our experiments reveal that downstream performance scales well with the size of the pretraining dataset. Our dataset, code, and models are publicly available at https://antoyang.github.io/vidchapters.html.

The Time Traveler’s Guide to Semantic Web Research: Analyzing Fictitious Research Themes in the ESWC “Next 20 Years” Track

  • paper_url: http://arxiv.org/abs/2309.13939
  • repo_url: None
  • paper_authors: Irene Celino, Heiko Paulheim
  • for: The paper is written to explore the future research directions and themes of the Semantic Web community in the late 2040s and early 2050s.
  • methods: The paper uses fictitious research papers as a way to gather ideas from the community on potential future research themes and topics, and analyzes the research methods applied by the authors in these submissions.
  • results: The paper provides a survey of the “science fiction” papers submitted to the “Next 20 years” track of ESWC 2023, including the emerging research themes and topics, and investigates the most fictitious parts of the submissions.
    Abstract What will Semantic Web research focus on in 20 years from now? We asked this question to the community and collected their visions in the "Next 20 years" track of ESWC 2023. We challenged the participants to submit "future" research papers, as if they were submitting to the 2043 edition of the conference. The submissions - entirely fictitious - were expected to be full scientific papers, with research questions, state of the art references, experimental results and future work, with the goal to get an idea of the research agenda for the late 2040s and early 2050s. We received ten submissions, eight of which were accepted for presentation at the conference, that mixed serious ideas of potential future research themes and discussion topics with some fun and irony. In this paper, we intend to provide a survey of those "science fiction" papers, considering the emerging research themes and topics, analysing the research methods applied by the authors in these very special submissions, and investigating also the most fictitious parts (e.g., neologisms, fabricated references). Our goal is twofold: on the one hand, we investigate what this special track tells us about the Semantic Web community and, on the other hand, we aim at getting some insights on future research practices and directions.

SPOTS: Stable Placement of Objects with Reasoning in Semi-Autonomous Teleoperation Systems

  • paper_url: http://arxiv.org/abs/2309.13937
  • repo_url: https://github.com/joonhyung-lee/spots
  • paper_authors: Joonhyung Lee, Sangbeom Park, Jeongeun Park, Kyungjae Lee, Sungjoon Choi
  • for: Focuses on the "place" half of pick-and-place, i.e. placing objects in suitable locations within a semi-autonomous teleoperation framework.
  • methods: Combines simulation-driven physical stability verification (real-to-sim) with the semantic reasoning of large language models to output a probability distribution over placement candidates based on contextual reasonableness and physical stability.
  • results: Extensive evaluation in two simulation environments and one real-world environment shows greatly increased physical plausibility and contextual soundness of placements while accounting for user preferences.
    Abstract Pick-and-place is one of the fundamental tasks in robotics research. However, the attention has been mostly focused on the "pick" task, leaving the "place" task relatively unexplored. In this paper, we address the problem of placing objects in the context of a teleoperation framework. Particularly, we focus on two aspects of the place task: stability robustness and contextual reasonableness of object placements. Our proposed method combines simulation-driven physical stability verification via real-to-sim and the semantic reasoning capability of large language models. In other words, given place context information (e.g., user preferences, object to place, and current scene information), our proposed method outputs a probability distribution over the possible placement candidates, considering the robustness and reasonableness of the place task. Our proposed method is extensively evaluated in two simulation and one real world environments and we show that our method can greatly increase the physical plausibility of the placement as well as contextual soundness while considering user preferences.
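Combining a physical-stability score with a semantic-reasonableness score into a distribution over placement candidates can be sketched as a weighted sum followed by a softmax. The weights, temperature, and scores below are illustrative; the paper's actual scoring pipeline (simulation plus LLM reasoning) is not reproduced here.

```python
# Hedged sketch: fusing a physical-stability score (e.g. from simulation) and a semantic
# score (e.g. from a language model) into a probability distribution over placements.
import numpy as np

def placement_distribution(stability, semantics, w_stab=1.0, w_sem=1.0, temperature=0.5):
    logits = (w_stab * np.asarray(stability) + w_sem * np.asarray(semantics)) / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

candidates = ["on the shelf", "on the table", "inside the sink"]
probs = placement_distribution(stability=[0.9, 0.8, 0.3], semantics=[0.7, 0.9, 0.1])
print(dict(zip(candidates, probs.round(3))))
```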

Fairness and Bias in Algorithmic Hiring

  • paper_url: http://arxiv.org/abs/2309.13933
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Alessandro Fabris, Nina Baranowska, Matthew J. Dennis, Philipp Hacker, Jorge Saldivar, Frederik Zuiderveen Borgesius, Asia J. Biega
  • for: Surveys the use of algorithmic hiring technology throughout the recruitment pipeline and the fairness issues it raises.
  • methods: Takes a multidisciplinary approach, covering systems, biases, measures, mitigation strategies, datasets, and legal aspects of algorithmic hiring and fairness.
  • results: Highlights current opportunities and limitations, notes that whether algorithmic hiring can be less biased and more beneficial than low-tech alternatives remains open, and gives recommendations for future work to ensure shared benefits for all stakeholders.
    Abstract Employers are adopting algorithmic hiring technology throughout the recruitment pipeline. Algorithmic fairness is especially applicable in this domain due to its high stakes and structural inequalities. Unfortunately, most work in this space provides partial treatment, often constrained by two competing narratives, optimistically focused on replacing biased recruiter decisions or pessimistically pointing to the automation of discrimination. Whether, and more importantly what types of, algorithmic hiring can be less biased and more beneficial to society than low-tech alternatives currently remains unanswered, to the detriment of trustworthiness. This multidisciplinary survey caters to practitioners and researchers with a balanced and integrated coverage of systems, biases, measures, mitigation strategies, datasets, and legal aspects of algorithmic hiring and fairness. Our work supports a contextualized understanding and governance of this technology by highlighting current opportunities and limitations, providing recommendations for future work to ensure shared benefits for all stakeholders.

UCF-Crime Annotation: A Benchmark for Surveillance Video-and-Language Understanding

  • paper_url: http://arxiv.org/abs/2309.13925
  • repo_url: https://github.com/xuange923/uca-dataset
  • paper_authors: Tongtong Yuan, Xuange Zhang, Kun Liu, Bo Liu, Jian Jin, Zhenzhen Jiao
  • for: Provides a new multimodal surveillance video dataset to support multimodal surveillance video analysis.
  • methods: Manually annotates the real-world surveillance dataset UCF-Crime with fine-grained event content and timing; the resulting dataset, UCA (UCF-Crime Annotation), serves as a new benchmark for multimodal surveillance video analysis.
  • results: Benchmarking mainstream multimodal models on the new dataset shows they perform poorly in surveillance video scenarios, highlighting the necessity of constructing this dataset.
    Abstract Surveillance videos are an essential component of daily life with various critical applications, particularly in public security. However, current surveillance video tasks mainly focus on classifying and localizing anomalous events. Existing methods are limited to detecting and classifying the predefined events with unsatisfactory generalization ability and semantic understanding, although they have obtained considerable performance. To address this issue, we propose constructing the first multimodal surveillance video dataset by manually annotating the real-world surveillance dataset UCF-Crime with fine-grained event content and timing. Our newly annotated dataset, UCA (UCF-Crime Annotation), provides a novel benchmark for multimodal surveillance video analysis. It not only describes events in detailed descriptions but also provides precise temporal grounding of the events in 0.1-second intervals. UCA contains 20,822 sentences, with an average length of 23 words, and its annotated videos are as long as 102 hours. Furthermore, we benchmark the state-of-the-art models of multiple multimodal tasks on this newly created dataset, including temporal sentence grounding in videos, video captioning, and dense video captioning. Through our experiments, we found that mainstream models used in previously publicly available datasets perform poorly on multimodal surveillance video scenarios, which highlights the necessity of constructing this dataset. The link to our dataset and code is provided at: https://github.com/Xuange923/UCA-dataset.

A comparison of controller architectures and learning mechanisms for arbitrary robot morphologies

  • paper_url: http://arxiv.org/abs/2309.13908
  • repo_url: None
  • paper_authors: Jie Luo, Jakub Tomczak, Karine Miras, Agoston E. Eiben
  • for: 这篇论文要回答的主要问题是:如果学习机器人的形态事先未知,应该使用哪种控制器与学习方法的组合?作者的兴趣源于形态可演化的模块化机器人,但该问题对寻求广泛适用方案的系统设计者同样重要。
  • methods: 作者比较了三种控制器与学习方法的组合:一种以动物运动建模为基础的控制器(中央模式生成器,CPG)搭配进化算法学习者;一种完全不同的方法,即使用强化学习(RL)搭配神经网络控制器架构;以及一种介于两者之间的组合,其中控制器是神经网络,学习者是进化算法。
  • results: 作者在一组模块化机器人测试集上比较了三种组合的有效性、效率和稳健性。结果显示,常见的 CPG 方案和 RL 方案都被介于两者之间的组合超越,后者更加稳健且高效。
    Abstract The main question this paper addresses is: What combination of a robot controller and a learning method should be used, if the morphology of the learning robot is not known in advance? Our interest is rooted in the context of morphologically evolving modular robots, but the question is also relevant in general, for system designers interested in widely applicable solutions. We perform an experimental comparison of three controller-and-learner combinations: one approach where controllers are based on modelling animal locomotion (Central Pattern Generators, CPG) and the learner is an evolutionary algorithm, a completely different method using Reinforcement Learning (RL) with a neural network controller architecture, and a combination `in-between' where controllers are neural networks and the learner is an evolutionary algorithm. We apply these three combinations to a test suite of modular robots and compare their efficacy, efficiency, and robustness. Surprisingly, the usual CPG-based and RL-based options are outperformed by the in-between combination that is more robust and efficient than the other two setups.
    摘要 本文探讨的主要问题是:在机器人形态事先未知的情况下,应采用哪种控制器和学习方法的组合?我们的兴趣源于形态可演化的模块化机器人,但这个问题对寻求通用解决方案的系统设计者同样重要。我们通过实验比较三种控制器与学习方法的组合:一种基于动物运动建模的控制器(中央模式生成器,CPG)搭配进化算法学习者,一种完全不同的方法使用强化学习(RL)和神经网络控制器架构,以及一种介于两者之间的组合,其中控制器是神经网络,学习者是进化算法。我们将这三种组合应用到一组模块化机器人上,比较其效果、效率和稳健性。出人意料的是,常见的 CPG 和 RL 方案都被这种“中间”组合超越,后者更加稳健和高效。
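
The sketch below illustrates the flavor of the first combination: a CPG-style controller made of per-joint sinusoidal oscillators whose parameters are tuned by a simple (1+1) evolution strategy. The oscillator form, parameter ranges, and placeholder fitness are illustrative assumptions, not the paper's exact controller or learner.

```python
# Minimal sketch of the first combination: a CPG-style controller (one sinusoidal
# oscillator per joint) tuned by a simple (1+1) evolution strategy. The oscillator
# form, parameter ranges, and placeholder fitness are illustrative assumptions,
# not the paper's exact controller or learner.
import math
import random

def cpg_output(t, params):
    """Joint commands at time t; params is a list of (amplitude, frequency, phase)."""
    return [a * math.sin(2 * math.pi * f * t + p) for a, f, p in params]

def fitness(params):
    # Placeholder: in the paper this would be distance travelled by the simulated
    # modular robot; here we just score closeness to a fixed target pattern.
    return -sum(abs(out - 0.3) for out in cpg_output(0.5, params))

def one_plus_one_es(n_joints=3, generations=200, sigma=0.1):
    parent = [(random.uniform(0, 1), random.uniform(0.5, 2.0), random.uniform(0, math.pi))
              for _ in range(n_joints)]
    for _ in range(generations):
        child = [tuple(x + random.gauss(0, sigma) for x in joint) for joint in parent]
        if fitness(child) >= fitness(parent):  # keep the mutant if it is no worse
            parent = child
    return parent

print(one_plus_one_es())
```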

Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering

  • paper_url: http://arxiv.org/abs/2309.14389
  • repo_url: None
  • paper_authors: Nidhi Hegde, Sujoy Paul, Gagan Madan, Gaurav Aggarwal
  • for: 这个论文的目的是研究文档问答模型中的两个关键组件:视觉编码器和大型自然语言模型(LLM),以及这两个组件之间的相对贡献。
  • methods: 这篇论文研究了一种“仅用LLM”的方法,即将文档图像中的文本信息序列化后直接输入给经过指令微调的 LLM,从而绕过显式的视觉编码器。
  • results: 论文的结果表明,这种仅依赖 LLM 的方法可以在多种数据集上达到或接近最先进水平的表现。
    Abstract Recent document question answering models consist of two key components: the vision encoder, which captures layout and visual elements in images, and a Large Language Model (LLM) that helps contextualize questions to the image and supplements them with external world knowledge to generate accurate answers. However, the relative contributions of the vision encoder and the language model in these tasks remain unclear. This is especially interesting given the effectiveness of instruction-tuned LLMs, which exhibit remarkable adaptability to new tasks. To this end, we explore the following aspects in this work: (1) The efficacy of an LLM-only approach on document question answering tasks (2) strategies for serializing textual information within document images and feeding it directly to an instruction-tuned LLM, thus bypassing the need for an explicit vision encoder (3) thorough quantitative analysis on the feasibility of such an approach. Our comprehensive analysis encompasses six diverse benchmark datasets, utilizing LLMs of varying scales. Our findings reveal that a strategy exclusively reliant on the LLM yields results that are on par with or closely approach state-of-the-art performance across a range of datasets. We posit that this evaluation framework will serve as a guiding resource for selecting appropriate datasets for future research endeavors that emphasize the fundamental importance of layout and image content information.
    摘要 现代文档问答模型通常包括两个关键组件:视觉编码器,用于捕捉图像中的布局和视觉元素;以及一个大语言模型(LLM),用于将问题与图像内容相关联,并补充外部世界知识以生成准确的答案。然而,视觉编码器和语言模型在这些任务中的相对贡献尚不清楚。考虑到经过指令微调的 LLM 对新任务表现出的出色适应能力,这一问题尤其值得研究。为此,我们在这项工作中研究了以下几个方面:(1)仅用 LLM 的方法在文档问答任务上的效果;(2)将文档图像中的文本信息序列化并直接输入指令微调的 LLM、从而绕过显式视觉编码器的策略;(3)对这种方法可行性的全面定量分析。我们的分析涵盖了六个多样化的基准数据集,并使用了不同规模的 LLM。我们的发现表明,仅依靠 LLM 的策略可以在多个数据集上达到或接近最先进的表现。我们认为这一评估框架将为未来强调布局与图像内容信息重要性的研究提供数据集选择方面的指导。
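
A minimal sketch of the serialization idea follows: OCR words with bounding boxes are ordered into lines of plain text and wrapped in a question prompt for an instruction-tuned LLM. The grouping heuristic and prompt template are assumptions for illustration; the paper evaluates serialization strategies that may differ from this one.

```python
# Minimal sketch of the serialization idea: OCR words with bounding boxes are
# ordered into plain-text lines and wrapped in a prompt for an instruction-tuned
# LLM, bypassing a vision encoder. The grouping heuristic and prompt template are
# illustrative assumptions; the paper's serialization strategies may differ.

def serialize_ocr(words, line_height=15):
    """words: list of (text, x0, y0, x1, y1) from any OCR engine.
    Groups words into lines by their vertical position, then reads left to right."""
    words = sorted(words, key=lambda w: (round(w[2] / line_height), w[1]))
    lines, current, current_row = [], [], None
    for text, x0, y0, x1, y1 in words:
        row = round(y0 / line_height)
        if current_row is not None and row != current_row:
            lines.append(" ".join(current))
            current = []
        current.append(text)
        current_row = row
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)

def build_prompt(ocr_words, question):
    return ("Below is the text extracted from a document image.\n\n"
            f"{serialize_ocr(ocr_words)}\n\n"
            f"Question: {question}\nAnswer concisely based only on the document.")

example = [("Invoice", 10, 10, 80, 25), ("Total:", 10, 40, 60, 55), ("$120.00", 70, 40, 140, 55)]
print(build_prompt(example, "What is the total amount?"))
```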

Exploring Robot Morphology Spaces through Breadth-First Search and Random Query

  • paper_url: http://arxiv.org/abs/2309.14387
  • repo_url: None
  • paper_authors: Jie Luo
  • for: 这项研究旨在考察查询机制在模块化机器人脑体协同演化中的作用,并比较两种不同查询机制(广度优先搜索 BFS 与随机查询 Random Query)在演化机器人形态时的有效性。
  • methods: 该研究以 CPPN 表示机器人形态、以张量表示机器人控制器,对两种查询机制(BFS 和随机查询)在两种演化框架(拉马克式与达尔文式系统)下进行了对比分析。
  • results: 研究发现,BFS 在生成高性能机器人形体方面比随机查询更有效也更高效;BFS 下的形体多样性起初更高,但在拉马克式系统中下降更快并收敛到更优设计,而在达尔文式系统中,BFS 带来了更高的最终多样性。
    Abstract Evolutionary robotics offers a powerful framework for designing and evolving robot morphologies, particularly in the context of modular robots. However, the role of query mechanisms during the genotype-to-phenotype mapping process has been largely overlooked. This research addresses this gap by conducting a comparative analysis of query mechanisms in the brain-body co-evolution of modular robots. Using two different query mechanisms, Breadth-First Search (BFS) and Random Query, within the context of evolving robot morphologies using CPPNs and robot controllers using tensors, and testing them in two evolutionary frameworks, Lamarckian and Darwinian systems, this study investigates their influence on evolutionary outcomes and performance. The findings demonstrate the impact of the two query mechanisms on the evolution and performance of modular robot bodies, including morphological intelligence, diversity, and morphological traits. This study suggests that BFS is both more effective and efficient in producing highly performing robots. It also reveals that initially, robot diversity was higher with BFS compared to Random Query, but in the Lamarckian system, it declines faster, converging to superior designs, while in the Darwinian system, BFS led to higher end-process diversity.
    摘要 演化机器人学为设计和演化机器人形态提供了一个强大的框架,在模块化机器人领域尤其如此。然而,基因型到表现型映射过程中查询机制的作用一直被忽视。本研究通过对模块化机器人脑体协同演化中查询机制的对比分析来填补这一空白。研究以 CPPN 演化机器人形态、以张量表示机器人控制器,使用广度优先搜索(BFS)和随机查询两种查询机制,并在拉马克式和达尔文式两种演化框架下进行测试,考察它们对演化结果和性能的影响。结果表明,这两种查询机制对模块化机器人形体的演化与性能(包括形态智能、多样性和形态特征)均有影响。研究表明 BFS 在生成高性能机器人方面更有效也更高效;同时发现,BFS 下的形体多样性起初高于随机查询,但在拉马克式系统中下降更快并收敛到更优设计,而在达尔文式系统中,BFS 带来了更高的最终多样性。
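
The sketch below contrasts the two query orders when growing a modular body from a genotype on a 2D grid: the same stand-in query function is consulted in breadth-first versus random frontier order. The grid layout and the hash-based query are illustrative assumptions; the actual work queries a CPPN over module slots.

```python
# Minimal sketch contrasting the two query orders when growing a modular body
# from a genotype on a 2D grid. The grid layout and the hash-based query are
# illustrative assumptions; the actual work queries a CPPN over module slots.
import random
from collections import deque

def query(genotype, pos):
    """Stand-in for a CPPN query: decides (deterministically within one run)
    whether a module is placed at this grid position for the given genotype."""
    return (hash((genotype, pos)) % 100) < 50

def grow_bfs(genotype, max_modules=10):
    body, frontier = {(0, 0)}, deque([(0, 0)])
    while frontier and len(body) < max_modules:
        x, y = frontier.popleft()                      # expand the oldest slot first
        for nxt in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if nxt not in body and query(genotype, nxt):
                body.add(nxt)
                frontier.append(nxt)
    return body

def grow_random(genotype, max_modules=10):
    body, frontier = {(0, 0)}, [(0, 0)]
    while frontier and len(body) < max_modules:
        x, y = frontier.pop(random.randrange(len(frontier)))   # expand a random slot
        for nxt in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if nxt not in body and query(genotype, nxt):
                body.add(nxt)
                frontier.append(nxt)
    return body

print(grow_bfs("genotype-A"), grow_random("genotype-A"))
```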

Scene Informer: Anchor-based Occlusion Inference and Trajectory Prediction in Partially Observable Environments

  • paper_url: http://arxiv.org/abs/2309.13893
  • repo_url: https://github.com/sisl/sceneinformer
  • paper_authors: Bernard Lange, Jiachen Li, Mykel J. Kochenderfer
  • for: 本研究旨在提高自动驾驶汽车在部分可观测环境中的导航能力,包括预测可观测智能体的未来运动以及推断被遮挡的智能体。
  • methods: 我们引入了 Scene Informer,一种统一的方法,可同时预测可观测智能体的轨迹并推断被遮挡区域。Scene Informer 使用 Transformer 聚合多种输入模态,并对可能与自动驾驶车(AV)规划路径相交的遮挡区域进行选择性查询。
  • results: 在 Waymo Open Motion Dataset 的部分可观测设定下,我们的方法在占据预测和轨迹预测两方面都超过了现有方法。
    Abstract Navigating complex and dynamic environments requires autonomous vehicles (AVs) to reason about both visible and occluded regions. This involves predicting the future motion of observed agents, inferring occluded ones, and modeling their interactions based on vectorized scene representations of the partially observable environment. However, prior work on occlusion inference and trajectory prediction have developed in isolation, with the former based on simplified rasterized methods and the latter assuming full environment observability. We introduce the Scene Informer, a unified approach for predicting both observed agent trajectories and inferring occlusions in a partially observable setting. It uses a transformer to aggregate various input modalities and facilitate selective queries on occlusions that might intersect with the AV's planned path. The framework estimates occupancy probabilities and likely trajectories for occlusions, as well as forecast motion for observed agents. We explore common observability assumptions in both domains and their performance impact. Our approach outperforms existing methods in both occupancy prediction and trajectory prediction in partially observable setting on the Waymo Open Motion Dataset.
    摘要 在复杂且动态的环境中导航,要求自动驾驶车(AV)能够对可见和被遮挡的区域同时进行推理。这包括预测可观测智能体的未来运动、推断被遮挡的智能体,并基于部分可观测环境的向量化场景表示对它们的交互进行建模。然而,以往的遮挡推断与轨迹预测工作是相互独立发展的:前者基于简化的栅格化方法,后者假设环境完全可观测。我们提出了 Scene Informer,一种在部分可观测设定下同时预测可观测智能体轨迹并推断遮挡区域的统一方法。它使用 Transformer 聚合多种输入模态,并对可能与 AV 规划路径相交的遮挡区域进行选择性查询。该框架可估计遮挡区域的占据概率及其可能的轨迹,并预测可观测智能体的运动。我们还探讨了两个领域中常见的可观测性假设及其对性能的影响。在 Waymo Open Motion Dataset 的部分可观测设定下,我们的方法在占据预测和轨迹预测两方面都优于现有方法。

TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning

  • paper_url: http://arxiv.org/abs/2309.13885
  • repo_url: None
  • paper_authors: Jing Zhu, Xiang Song, Vassilis N. Ioannidis, Danai Koutra, Christos Faloutsos
  • for: 提高下游图学习任务中节点特征的质量,以提升图神经网络(GNN)的表现。
  • methods: 提出了TOUCHUP-G方法,这是一种通用的、支持多模态的、有原则的方法,可以提高任何下游图任务中节点特征的质量。
  • results: TOUCHUP-G方法在四个涵盖不同任务与模态的真实世界数据集上取得了最先进的结果。
    Abstract How can we enhance the node features acquired from Pretrained Models (PMs) to better suit downstream graph learning tasks? Graph Neural Networks (GNNs) have become the state-of-the-art approach for many high-impact, real-world graph applications. For feature-rich graphs, a prevalent practice involves utilizing a PM directly to generate features, without incorporating any domain adaptation techniques. Nevertheless, this practice is suboptimal because the node features extracted from PM are graph-agnostic and prevent GNNs from fully utilizing the potential correlations between the graph structure and node features, leading to a decline in GNNs performance. In this work, we seek to improve the node features obtained from a PM for downstream graph tasks and introduce TOUCHUP-G, which has several advantages. It is (a) General: applicable to any downstream graph task, including link prediction which is often employed in recommender systems; (b) Multi-modal: able to improve raw features of any modality (e.g. images, texts, audio); (c) Principled: it is closely related to a novel metric, feature homophily, which we propose to quantify the potential correlations between the graph structure and node features and we show that TOUCHUP-G can effectively shrink the discrepancy between the graph structure and node features; (d) Effective: achieving state-of-the-art results on four real-world datasets spanning different tasks and modalities.
    摘要 如何增强从预训练模型(PM)获取的节点特征,使其更适合下游图学习任务?图神经网络(GNN)已成为许多高影响力、真实世界图应用中的最先进方法。对于特征丰富的图,一种常见做法是直接使用PM生成特征,而不采用任何领域适应技术。然而,这种做法并不理想:PM产生的节点特征与图结构无关,使GNN无法充分利用图结构与节点特征之间的潜在相关性,从而导致性能下降。在这项工作中,我们致力于改进从PM获取的节点特征以服务下游图任务,并提出了TOUCHUP-G,它具有以下优势:(a)通用:适用于任何下游图任务,包括推荐系统中常用的链接预测;(b)多模态:能够提升任何模态(如图像、文本、音频)的原始特征;(c)有原则:它与我们提出的新度量“特征同质性”密切相关,该度量用于量化图结构与节点特征之间的潜在相关性,并且我们证明TOUCHUP-G能够有效缩小图结构与节点特征之间的差异;(d)有效:在涵盖不同任务与模态的四个真实世界数据集上取得了最先进的结果。
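
To illustrate the idea behind a feature-homophily score, the sketch below averages the cosine similarity of node features across edges: high values indicate that connected nodes carry similar features. This particular formula is an assumption for illustration and may differ from the metric defined in the paper.

```python
# Minimal sketch of a feature-homophily score: the average cosine similarity of
# node features across edges, high when connected nodes carry similar features.
# This exact formula is an illustrative assumption and may differ from the
# metric defined in the paper.
import numpy as np

def feature_homophily(features, edges):
    """features: (num_nodes, dim) array; edges: list of (u, v) node-index pairs."""
    x = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sims = [float(x[u] @ x[v]) for u, v in edges]
    return sum(sims) / len(sims)

feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
edges = [(0, 1), (1, 2)]
print(feature_homophily(feats, edges))  # the first edge is similar, the second is not
```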

PRiSM: Enhancing Low-Resource Document-Level Relation Extraction with Relation-Aware Score Calibration

  • paper_url: http://arxiv.org/abs/2309.13869
  • repo_url: https://github.com/brightjade/prism
  • paper_authors: Minseok Choi, Hyesu Lim, Jaegul Choo
  • for: DocRE是为了提取文档中所有实体对的关系而设计的。
  • methods: 我们提出了一种名为PRiSM的方法,它学习根据关系语义信息来校准logits。
  • results: 我们在三个DocRE数据集上进行评估,结果表明,将现有模型与PRiSM结合最高可将F1分数提高26.38;在仅用约3%的数据训练时,校准误差最多可降低36倍。
    Abstract Document-level relation extraction (DocRE) aims to extract relations of all entity pairs in a document. A key challenge in DocRE is the cost of annotating such data which requires intensive human effort. Thus, we investigate the case of DocRE in a low-resource setting, and we find that existing models trained on low data overestimate the NA ("no relation") label, causing limited performance. In this work, we approach the problem from a calibration perspective and propose PRiSM, which learns to adapt logits based on relation semantic information. We evaluate our method on three DocRE datasets and demonstrate that integrating existing models with PRiSM improves performance by as much as 26.38 F1 score, while the calibration error drops as much as 36 times when trained with about 3% of data. The code is publicly available at https://github.com/brightjade/PRiSM.
    摘要 文档级关系提取(DocRE)的目标是提取文档中所有实体对之间的关系。DocRE的一个主要挑战是数据标注成本高,需要大量人工劳动。因此,我们研究低资源设定下的DocRE问题,发现在少量数据上训练的现有模型会高估NA(“无关系”)标签,导致性能受限。针对这一问题,我们从校准的角度出发,提出了PRiSM,它学习基于关系语义信息来调整logits。我们在三个DocRE数据集上进行评估,证明将现有模型与PRiSM结合使用可将F1分数最多提高26.38;在仅用约3%的数据训练时,校准误差最多下降36倍。代码公开于 https://github.com/brightjade/PRiSM。
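
The sketch below shows a standard way to quantify the kind of calibration error the paper reports: an expected calibration error (ECE) computed by binning prediction confidences and comparing them to empirical accuracy. The uniform binning scheme is a common choice and an assumption here, not necessarily the paper's exact metric.

```python
# Minimal sketch of an expected-calibration-error (ECE) style measurement, the
# kind of calibration gap the paper reports. The uniform binning scheme is a
# standard choice and an assumption here, not necessarily the paper's exact metric.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: (n,) confidence of each predicted relation;
    correct: (n,) 1 if the prediction matched the gold label else 0."""
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 0]))
```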

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

  • paper_url: http://arxiv.org/abs/2309.13860
  • repo_url: https://github.com/yanghaha0908/fasthubert
  • paper_authors: Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen
  • for: 这篇论文旨在提高自监督学习(SSL)方法在语音处理任务中的训练效率,并检验其在下游任务中的表现。
  • methods: 本论文以HuBERT模型为基础,先分析其预训练各模块的计算开销,再引入一系列效率优化,构成Fast-HuBERT训练框架。
  • results: 相比原始实现,Fast-HuBERT可在1.1天内(使用8块V100 GPU)完成训练且性能无损,实现5.2倍的加速。此外,作者还在Fast-HuBERT中探索了两种已被充分研究的技术,取得了与先前工作一致的改进。
    Abstract Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks. Various speech-based SSL models have been developed and present promising performance on a range of downstream tasks including speech recognition. However, existing speech-based SSL models face a common dilemma in terms of computational cost, which might hinder their potential application and in-depth academic research. To address this issue, we first analyze the computational cost of different modules during HuBERT pre-training and then introduce a stack of efficiency optimizations, which is named Fast-HuBERT in this paper. The proposed Fast-HuBERT can be trained in 1.1 days with 8 V100 GPUs on the Librispeech 960h benchmark, without performance degradation, resulting in a 5.2x speedup, compared to the original implementation. Moreover, we explore two well-studied techniques in the Fast-HuBERT and demonstrate consistent improvements as reported in previous work.
    摘要 近年来,自监督学习(Self-Supervised Learning,SSL)方法在语音处理任务中取得了显著进步。许多基于语音的 SSL 模型被开发出来,并在语音识别等各种下游任务中表现出色。然而,现有的语音 SSL 模型普遍面临计算成本高的问题,这可能会限制它们的实际应用和深入的学术研究。为解决这一问题,我们首先分析了 HuBERT 预训练过程中各模块的计算成本,然后引入了一系列效率优化,在本文中称为 Fast-HuBERT。Fast-HuBERT 可以使用 8 块 V100 GPU 在 1.1 天内完成 Librispeech 960h 基准上的预训练,且性能无损,相比原始实现实现了 5.2 倍的加速。此外,我们还在 Fast-HuBERT 中探索了两种已被充分研究的技术,并取得了与先前工作一致的改进。

Can neural networks count digit frequency?

  • paper_url: http://arxiv.org/abs/2310.04431
  • repo_url: https://github.com/PadmakshKhandelwal/Can-neural-networks-count
  • paper_authors: Padmaksh Khandelwal
  • for: 本研究旨在比较不同的经典机器学习模型和神经网络在识别给定数中每个数字出现频率这一问题上的性能。这一问题有多种应用场景,例如获取视觉场景中目标对象出现的频率。
  • methods: 我们在这个问题上采用了一种混合的分类和回归任务,并且特意制作了自己的数据集来观察系统性的差异。我们使用不同的度量来评估每种方法的性能,并且在多个数据集上进行了评估。
  • results: 我们发现,决策树和Random Forest具有内在的偏见,导致它们无法泛化好。同时,神经网络在分类和回归两个任务上都明显超过了古典机器学习模型,尤其是在6位和10位数据集上。数据集和代码在github上公开。
    Abstract In this research, we aim to compare the performance of different classical machine learning models and neural networks in identifying the frequency of occurrence of each digit in a given number. It has various applications in machine learning and computer vision, e.g. for obtaining the frequency of a target object in a visual scene. We considered this problem as a hybrid of classification and regression tasks. We carefully create our own datasets to observe systematic differences between different methods. We evaluate each of the methods using different metrics across multiple datasets.The metrics of performance used were the root mean squared error and mean absolute error for regression evaluation, and accuracy for classification performance evaluation. We observe that decision trees and random forests overfit to the dataset, due to their inherent bias, and are not able to generalize well. We also observe that the neural networks significantly outperform the classical machine learning models in terms of both the regression and classification metrics for both the 6-digit and 10-digit number datasets. Dataset and code are available on github.
    摘要 在这项研究中,我们旨在比较不同的经典机器学习模型和神经网络在识别给定数中每个数字出现频率上的性能。这一问题在机器学习和计算机视觉等领域有多种应用,例如获取视觉场景中目标对象出现的频率。我们将该问题视为分类与回归任务的混合。我们精心构建了自己的数据集,以观察不同方法之间的系统性差异,并在多个数据集上用不同指标对每种方法进行评估:回归评估使用均方根误差和平均绝对误差,分类评估使用准确率。我们发现决策树和随机森林由于其固有偏置而对数据集过拟合,泛化能力不佳。我们还发现,无论在6位还是10位数字数据集上,神经网络在回归和分类指标上都显著优于经典机器学习模型。数据集和代码可在github上获取。
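
The task itself is easy to reproduce: given an n-digit number, the target is a 10-dimensional vector of digit counts. The sketch below generates such a dataset; the sample size and encoding are illustrative assumptions rather than the authors' exact setup.

```python
# Minimal sketch of generating the digit-frequency task studied in the paper:
# given an n-digit number, the target is the count of each digit 0-9. Sample
# size and encoding are illustrative assumptions rather than the authors' setup.
import random
import numpy as np

def make_dataset(num_samples=1000, n_digits=6, seed=0):
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(num_samples):
        digits = [rng.randint(0, 9) for _ in range(n_digits)]
        X.append(digits)                                # input: the digits themselves
        y.append([digits.count(d) for d in range(10)])  # target: frequency of 0..9
    return np.array(X, dtype=np.float32), np.array(y, dtype=np.float32)

X, y = make_dataset()
print(X[0], y[0])  # one 6-digit sample and its 10-dimensional count vector
```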

Sampling - Variational Auto Encoder - Ensemble: In the Quest of Explainable Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2309.14385
  • repo_url: None
  • paper_authors: Sarit Maitra, Vivek Mishra, Pratima Verma, Manav Chopra, Priyanka Nath
  • for: 这篇论文的目的是提出一种新的可解释人工智能(XAI)框架,用于解释人工智能模型的输出。
  • methods: 这篇论文使用了一种新的混合架构,称为Sampling-Variational Auto Encoder-Ensemble Anomaly Detection(SVEAD),它将变分自编码器(VAE)与集成堆叠和SHapley Additive exPlanations(SHAP)相结合,用于解决不平衡分类问题。
  • results: 研究发现,将VAE、集成堆叠和SHAP结合使用不仅可以提高模型性能,还能提供一个易于解释的框架。此外,研究还将SHAP与置换重要性和个体条件期望相结合,构建了强大的模型可解释性。这些发现对现实应用中的XAI具有重要意义,有助于增强人们对人工智能应用的信任。
    Abstract Explainable Artificial Intelligence (XAI) models have recently attracted a great deal of interest from a variety of application sectors. Despite significant developments in this area, there are still no standardized methods or approaches for understanding AI model outputs. A systematic and cohesive framework is also increasingly necessary to incorporate new techniques like discriminative and generative models to close the gap. This paper contributes to the discourse on XAI by presenting an empirical evaluation based on a novel framework: Sampling - Variational Auto Encoder (VAE) - Ensemble Anomaly Detection (SVEAD). It is a hybrid architecture where VAE combined with ensemble stacking and SHapley Additive exPlanations are used for imbalanced classification. The finding reveals that combining ensemble stacking, VAE, and SHAP can. not only lead to better model performance but also provide an easily explainable framework. This work has used SHAP combined with Permutation Importance and Individual Conditional Expectations to create a powerful interpretability of the model. The finding has an important implication in the real world, where the need for XAI is paramount to boost confidence in AI applications.
    摘要 可解释人工智能(XAI)模型近来引起了各应用领域的广泛关注。尽管该领域已取得显著进展,但仍缺乏理解AI模型输出的标准化方法。同时,也越来越需要一个系统而连贯的框架来整合判别模型和生成模型等新技术以弥补这一差距。本文提出了一个新框架:采样-变分自编码器-集成异常检测(SVEAD),并基于该框架给出实证评估,为XAI的讨论做出贡献。这是一种混合架构,其中变分自编码器与集成堆叠和SHapley Additive exPlanations(SHAP)相结合,用于不平衡分类。研究发现,将集成堆叠、变分自编码器和SHAP结合使用,不仅可以提升模型性能,还能提供一个易于解释的框架。本工作将SHAP与置换重要性和个体条件期望相结合,构建了强大的模型可解释性。这一发现对现实世界具有重要意义:在对XAI需求迫切的场景中,它有助于增强人们对AI应用的信心。
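
The sketch below assembles the ensemble-stacking and permutation-importance pieces of such a pipeline on synthetic imbalanced data, leaving out the VAE-based component. The model choices, class weights, and synthetic data are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of the ensemble-stacking and permutation-importance pieces of
# such a pipeline on synthetic imbalanced data; the VAE-based component is left
# out. Model choices and hyperparameters are illustrative assumptions, not the
# paper's configuration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)

# Permutation importance: one of the interpretability tools combined with SHAP in the paper.
imp = permutation_importance(stack, X_te, y_te, n_repeats=5, random_state=0)
print(sorted(enumerate(imp.importances_mean), key=lambda t: -t[1])[:3])  # top-3 features
```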

Prior Bilinear Based Models for Knowledge Graph Completion

  • paper_url: http://arxiv.org/abs/2309.13834
  • repo_url: None
  • paper_authors: Jiayi Li, Ruilin Luo, Jiaqi Sun, Jing Xiao, Yujiu Yang
  • for: 本文主要针对知识图(KG)补全任务,探讨双线性模型忽略先验性质的问题。
  • methods: 作者提出了一种名为Unit Ball Bilinear Model(UniBi)的解决方案,该模型不仅具有理论上的优势,还通过最小的约束来减少无效学习,从而提供更好的可解释性和性能。
  • results: 实验表明,UniBi能够刻画先验性质,并验证了其可解释性和性能。
    Abstract Bilinear based models are powerful and widely used approaches for Knowledge Graphs Completion (KGC). Although bilinear based models have achieved significant advances, these studies mainly concentrate on posterior properties (based on evidence, e.g. symmetry pattern) while neglecting the prior properties. In this paper, we find a prior property named "the law of identity" that cannot be captured by bilinear based models, which hinders them from comprehensively modeling the characteristics of KGs. To address this issue, we introduce a solution called Unit Ball Bilinear Model (UniBi). This model not only achieves theoretical superiority but also offers enhanced interpretability and performance by minimizing ineffective learning through minimal constraints. Experiments demonstrate that UniBi models the prior property and verify its interpretability and performance.
    摘要 基于双线性的模型是知识图补全(KGC)中强大且被广泛使用的方法。虽然双线性模型已取得显著进展,但这些研究主要关注后验性质(基于证据,例如对称模式)而忽略了先验性质。在本文中,我们发现了一种名为“同一律”的先验性质无法被双线性模型捕捉,这限制了它们对知识图特性的全面建模。为解决这一问题,我们提出了Unit Ball Bilinear Model(UniBi)。该模型不仅具有理论上的优越性,还通过最小的约束减少无效学习,从而提供更好的可解释性和性能。实验表明,UniBi能够建模这一先验性质,并验证了其可解释性和性能。
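
As a rough illustration of a bilinear knowledge-graph scorer with entity embeddings constrained to the unit ball, consider the sketch below. The projection step and parameterization are assumptions suggested by the model's name; the exact UniBi constraints are not reproduced here.

```python
# Rough illustration of a bilinear knowledge-graph scorer with entity embeddings
# kept inside the unit ball, as the model's name suggests. The projection step
# and parameterization are assumptions; the exact UniBi constraints are not
# reproduced here.
import numpy as np

def project_to_unit_ball(v):
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

def bilinear_score(head, relation_matrix, tail):
    """Standard bilinear form used in KGC: score(h, R, t) = h^T R t."""
    return float(project_to_unit_ball(head) @ relation_matrix @ project_to_unit_ball(tail))

rng = np.random.default_rng(0)
h, t = rng.normal(size=8), rng.normal(size=8)
R = rng.normal(size=(8, 8))
print(bilinear_score(h, R, t))
```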

Dual Feature Augmentation Network for Generalized Zero-shot Learning

  • paper_url: http://arxiv.org/abs/2309.13833
  • repo_url: https://github.com/sion1/dfan
  • paper_authors: Lei Xiang, Yuan Zhou, Haoran Duan, Yang Long
  • for: 这篇论文主要针对零例学习(Zero-shot learning)问题,旨在无需训练样本就能够推断未经训练的类别。
  • methods: 该论文提出了一种新的双特征增强网络(DFAN),包含两个特征增强模块,一个用于视觉特征,另一个用于语义特征。视觉特征增强模块显式地学习属性特征,并利用余弦距离将其分离,从而增强属性表示。语义特征增强模块提出了一个偏置学习器,从数据集的角度捕捉实际属性值与预测属性值之间的偏移。此外,还引入了两个预测器,以调和局部特征与全局特征之间的冲突。
  • results: 实验结果表明,我们的方法在三个 benchmark 上表现出了明显的进步,与现有方法相比,具有更高的准确率和更好的一致性。
    Abstract Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes. Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image. However, these methods often ignore the complex entanglement among different attributes' visual features in the embedding space. Additionally, these methods employ a direct attribute prediction scheme for classification, which does not account for the diversity of attributes in images of the same category. To address these issues, we propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules, one for visual features and the other for semantic features. The visual feature augmentation module explicitly learns attribute features and employs cosine distance to separate them, thus enhancing attribute representation. In the semantic feature augmentation module, we propose a bias learner to capture the offset that bridges the gap between actual and predicted attribute values from a dataset's perspective. Furthermore, we introduce two predictors to reconcile the conflicts between local and global features. Experimental results on three benchmarks demonstrate the marked advancement of our method compared to state-of-the-art approaches. Our code is available at https://github.com/Sion1/DFAN.
    摘要 零样本学习(ZSL)旨在通过迁移已见类别的知识来推断没有训练样本的新类别。现有基于嵌入的ZSL方法通常利用注意力机制在图像上定位属性,但往往忽略了嵌入空间中不同属性视觉特征之间复杂的纠缠,并且采用直接预测属性的分类方式,没有考虑同一类别图像中属性的多样性。为了解决这些问题,我们提出了一个新的双特征增强网络(DFAN),其包括两个特征增强模块:一个用于视觉特征,另一个用于语义特征。视觉特征增强模块显式地学习属性特征,并使用余弦距离将其分离,从而增强属性表示。在语义特征增强模块中,我们提出了一个偏置学习器,从数据集的角度捕捉实际属性值与预测属性值之间的偏移。此外,我们引入了两个预测器,以调和局部特征与全局特征之间的冲突。实验结果显示,我们的方法在三个基准上相比现有方法有明显的进步。我们的代码可以在 GitHub 上获取:https://github.com/Sion1/DFAN。
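
The sketch below illustrates one way to push attribute feature vectors apart with a cosine-based penalty, in the spirit of the visual feature augmentation module. The exact DFAN loss is not reproduced; this formulation is an illustrative assumption.

```python
# Illustrative sketch of pushing attribute feature vectors apart with a
# cosine-based penalty, in the spirit of the visual feature augmentation module.
# The exact DFAN loss is not reproduced; this formulation is an assumption.
import numpy as np

def cosine_separation_loss(attr_feats):
    """attr_feats: (num_attributes, dim). Penalizes pairwise cosine similarity so
    that different attributes occupy distinct directions."""
    x = attr_feats / (np.linalg.norm(attr_feats, axis=1, keepdims=True) + 1e-12)
    sim = x @ x.T
    off_diag = sim - np.eye(len(x))   # zero out each attribute's self-similarity
    return float(np.mean(np.maximum(off_diag, 0.0)))

feats = np.random.default_rng(0).normal(size=(5, 16))
print(cosine_separation_loss(feats))
```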

Evaluating Cognitive Maps and Planning in Large Language Models with CogEval

  • paper_url: http://arxiv.org/abs/2309.15129
  • repo_url: None
  • paper_authors: Ida Momennejad, Hosein Hasanbeig, Felipe Vieira, Hiteshi Sharma, Robert Osazuwa Ness, Nebojsa Jojic, Hamid Palangi, Jonathan Larson
  • for: 这项研究的目的是系统性地评估大语言模型(LLM)的认知能力。
  • methods: 该研究使用了一种基于认知科学的评估协议,称为CogEval,来评估 LLM 的认知能力。
  • results: 研究发现,LLM 在一些结构较简单的规划任务上表现出一定的能力,但在更复杂的规划任务上存在明显的失败模式,包括幻觉出无效轨迹以及陷入循环。这些发现不支持 LLM 具有开箱即用的规划能力。
    Abstract Recently an influx of studies claim emergent cognitive abilities in large language models (LLMs). Yet, most rely on anecdotes, overlook contamination of training sets, or lack systematic Evaluation involving multiple tasks, control conditions, multiple iterations, and statistical robustness tests. Here we make two major contributions. First, we propose CogEval, a cognitive science-inspired protocol for the systematic evaluation of cognitive capacities in Large Language Models. The CogEval protocol can be followed for the evaluation of various abilities. Second, here we follow CogEval to systematically evaluate cognitive maps and planning ability across eight LLMs (OpenAI GPT-4, GPT-3.5-turbo-175B, davinci-003-175B, Google Bard, Cohere-xlarge-52.4B, Anthropic Claude-1-52B, LLaMA-13B, and Alpaca-7B). We base our task prompts on human experiments, which offer both established construct validity for evaluating planning, and are absent from LLM training sets. We find that, while LLMs show apparent competence in a few planning tasks with simpler structures, systematic evaluation reveals striking failure modes in planning tasks, including hallucinations of invalid trajectories and getting trapped in loops. These findings do not support the idea of emergent out-of-the-box planning ability in LLMs. This could be because LLMs do not understand the latent relational structures underlying planning problems, known as cognitive maps, and fail at unrolling goal-directed trajectories based on the underlying structure. Implications for application and future directions are discussed.
    摘要 近期有大量研究宣称大语言模型(LLM)具有涌现的认知能力。然而,其中大多数依赖轶事证据,忽视了训练集的污染,或者缺乏涉及多任务、控制条件、多次迭代和统计稳健性检验的系统评估。我们在此做出两项主要贡献。首先,我们提出了CogEval,一种受认知科学启发的协议,用于系统评估大语言模型的认知能力;该协议可用于多种能力的评估。其次,我们遵循CogEval系统地评估了八个LLM(OpenAI GPT-4、GPT-3.5-turbo-175B、davinci-003-175B、Google Bard、Cohere-xlarge-52.4B、Anthropic Claude-1-52B、LLaMA-13B和Alpaca-7B)的认知地图与规划能力。我们的任务提示基于人类实验,这些实验既为评估规划提供了成熟的构念效度,又不存在于LLM的训练集中。我们发现,虽然LLM在一些结构较简单的规划任务上表现出一定的能力,但系统评估揭示了其在规划任务中的显著失败模式,包括幻觉出无效轨迹以及陷入循环。这些发现不支持LLM具有开箱即用的规划能力。其原因可能在于LLM并不理解规划问题背后的潜在关系结构(即认知地图),因而无法基于这种结构展开目标导向的轨迹。文中还讨论了对应用的启示与未来方向。

Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition

  • paper_url: http://arxiv.org/abs/2310.03033
  • repo_url: https://github.com/christopherbrix/vnncomp2023_benchmarks
  • paper_authors: Andreea Postovan, Mădălina Eraşcu
  • for: 本研究面向自动驾驶系统中的交通标志识别,应对对抗样本和遮挡等现实挑战。
  • methods: 本研究使用二值神经网络(BNN)构建高精度的交通标志识别模型,强调紧凑的模型规模以适应有限的计算和能源资源,并提出了一组用于评估其局部鲁棒性的基准验证问题。
  • results: 在VNN-COMP'23中,7个验证工具中有4个能够处理随机选取的多数基准问题,但部分工具给出了错误结果或缺失反例;作者正进一步探究延长求解时限的效果及错误结果的成因。
    Abstract Traffic signs play a critical role in road safety and traffic management for autonomous driving systems. Accurate traffic sign classification is essential but challenging due to real-world complexities like adversarial examples and occlusions. To address these issues, binary neural networks offer promise in constructing classifiers suitable for resource-constrained devices. In our previous work, we proposed high-accuracy BNN models for traffic sign recognition, focusing on compact size for limited computation and energy resources. To evaluate their local robustness, this paper introduces a set of benchmark problems featuring layers that challenge state-of-the-art verification tools. These layers include binarized convolutions, max pooling, batch normalization, and fully connected layers. The difficulty of the verification problem is given by the high number of network parameters (905k - 1.7M), of the input dimension (2.7k-12k), and of the number of regions (43), as well as by the fact that the neural networks are not sparse. The proposed BNN models and local robustness properties can be checked at https://github.com/ChristopherBrix/vnncomp2023_benchmarks/tree/main/benchmarks/traffic_signs_recognition. The results of the 4th International Verification of Neural Networks Competition (VNN-COMP'23) revealed that 4 out of 7 solvers can handle many of our benchmarks randomly selected (minimum is 6, maximum is 36, out of 45). Surprisingly, tools also output wrong results or missing counterexamples (ranging from 1 to 4). Currently, our focus lies in exploring the possibility of achieving a greater count of solved instances by extending the allotted time (previously set at 8 minutes). Furthermore, we are intrigued by the reasons behind the erroneous outcomes provided by the tools for certain benchmarks.
    摘要 交通标志对道路安全和交通管理具有关键作用,因此准确的交通标志分类非常重要,但由于对抗样本和遮挡等现实复杂因素,这一任务颇具挑战性。为了解决这些问题,二值神经网络(BNN)有望为计算和能源资源受限的设备构建合适的分类器。在我们之前的工作中,我们提出了高精度的BNN交通标志识别模型,注重紧凑的模型规模以适应有限的计算和能源资源。为评估其局部鲁棒性,本文引入了一组基准问题,其中的网络层对最先进的验证工具构成挑战,包括二值化卷积、最大池化、批归一化和全连接层。验证问题的难度来自网络参数数量大(905k至1.7M)、输入维度高(2.7k至12k)、区域数量多(43个),以及网络并不稀疏这一事实。所提出的BNN模型及局部鲁棒性性质可在 https://github.com/ChristopherBrix/vnncomp2023_benchmarks/tree/main/benchmarks/traffic_signs_recognition 查看。第四届神经网络验证国际竞赛(VNN-COMP'23)的结果表明,7个求解器中有4个能处理随机选取的许多基准问题(45个中最少6个、最多36个)。令人意外的是,这些工具也会输出错误结果或缺失反例(1到4个不等)。目前,我们的重点是探索通过延长求解时限(此前设为8分钟)来提高已解实例数量的可能性;同时,我们也对工具在某些基准上给出错误结果的原因很感兴趣。
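
For readers unfamiliar with the layers being verified, the sketch below shows the sign-based binarization used in BNN-style fully connected layers. Layer sizes and the use of a plain sign function are illustrative assumptions, not the benchmarked architectures.

```python
# Minimal sketch of the sign-based binarization used in BNN-style fully connected
# layers of the kind benchmarked here. Layer sizes and the plain sign function are
# illustrative assumptions, not the benchmarked architectures.
import numpy as np

def binarize(x):
    """Map real values to {-1, +1}."""
    return np.where(x >= 0, 1.0, -1.0)

def binarized_dense(x, real_weights, bias):
    """Forward pass of one binarized fully connected layer."""
    return binarize(x) @ binarize(real_weights) + bias

rng = np.random.default_rng(0)
x = rng.normal(size=8)
W = rng.normal(size=(8, 4))
b = np.zeros(4)
print(binarized_dense(x, W, b))
```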

Privacy-preserving Linear Computations in Spiking Neural P Systems

  • paper_url: http://arxiv.org/abs/2309.13803
  • repo_url: https://github.com/hieu9955/ggggg
  • paper_authors: Mihail-Iulian Plesa, Marian Gheorghe, Florentin Ipate
  • for: 这篇论文基于受生物神经元启发的脉冲神经P系统(SN P系统)提出一种隐私保护协议;此类系统已被应用于形式验证、人工智能和密码学等领域。
  • methods: 作者提出了一种基于SN P系统的隐私保护协议,允许客户端使用远程服务器来计算线性函数,而无需把函数参数和结果泄露给服务器。
  • results: 作者给出了可实现任意自然数上线性函数的SN P系统,并在“诚实但好奇”(honest-but-curious)安全模型下讨论了协议的安全性。
    Abstract Spiking Neural P systems are a class of membrane computing models inspired directly by biological neurons. Besides the theoretical progress made in this new computational model, there are also numerous applications of P systems in fields like formal verification, artificial intelligence, or cryptography. Motivated by all the use cases of SN P systems, in this paper, we present a new privacy-preserving protocol that enables a client to compute a linear function using an SN P system hosted on a remote server. Our protocol allows the client to use the server to evaluate functions of the form t_1k + t_2 without revealing t_1, t_2 or k and without the server knowing the result. We also present an SN P system to implement any linear function over natural numbers and some security considerations of our protocol in the honest-but-curious security model.
    摘要 脉冲神经P系统(Spiking Neural P Systems)是一类直接受生物神经元启发的膜计算模型。除了这一新计算模型上的理论进展之外,P系统在形式验证、人工智能和密码学等领域也有许多应用。受SN P系统各种用例的启发,本文提出了一种新的隐私保护协议,使客户端能够借助远程服务器上的SN P系统计算线性函数:客户端可以让服务器求值形如 t_1k + t_2 的函数,而不泄露 t_1、t_2 或 k,服务器也无法得知计算结果。我们还给出了可实现任意自然数上线性函数的SN P系统,并在诚实但好奇(honest-but-curious)安全模型下讨论了协议的若干安全性考虑。

Can LLM-Generated Misinformation Be Detected?

  • paper_url: http://arxiv.org/abs/2309.13788
  • repo_url: https://github.com/llm-misinformation/llm-misinformation
  • paper_authors: Canyu Chen, Kai Shu
  • for: investigates whether LLM-generated misinformation can cause more harm than human-written misinformation
  • methods: builds a taxonomy of LLM-generated misinformation and categorizes potential real-world methods for generating misinformation with LLMs, and employs extensive empirical investigation to study the detection difficulty of LLM-generated misinformation
  • results: discovers that LLM-generated misinformation can be harder to detect for humans and detectors compared to human-written misinformation with the same semantics, suggesting it can have more deceptive styles and potentially cause more harm
    Abstract The advent of Large Language Models (LLMs) has made a transformative impact. However, the potential that LLMs such as ChatGPT can be exploited to generate misinformation has posed a serious concern to online safety and public trust. A fundamental research question is: will LLM-generated misinformation cause more harm than human-written misinformation? We propose to tackle this question from the perspective of detection difficulty. We first build a taxonomy of LLM-generated misinformation. Then we categorize and validate the potential real-world methods for generating misinformation with LLMs. Then, through extensive empirical investigation, we discover that LLM-generated misinformation can be harder to detect for humans and detectors compared to human-written misinformation with the same semantics, which suggests it can have more deceptive styles and potentially cause more harm. We also discuss the implications of our discovery on combating misinformation in the age of LLMs and the countermeasures.
    摘要 大语言模型(LLM)的出现带来了变革性的影响。然而,ChatGPT 等 LLM 可能被滥用来生成虚假信息,这对在线安全和公众信任构成了严重威胁。一个基本的研究问题是:LLM 生成的虚假信息是否会比人类撰写的虚假信息造成更大的危害?我们从检测难度的角度来回答这个问题。我们首先构建了 LLM 生成虚假信息的分类体系,然后对可能在现实中被用来借助 LLM 生成虚假信息的方法进行归类和验证。经过广泛的实证研究,我们发现,在语义相同的情况下,LLM 生成的虚假信息比人类撰写的虚假信息更难被人类和检测器识别,这表明它可能具有更强的欺骗性风格,从而造成更大的危害。我们还讨论了这一发现对 LLM 时代打击虚假信息的启示及相应的对策。

On the Computational Benefit of Multimodal Learning

  • paper_url: http://arxiv.org/abs/2309.13782
  • repo_url: None
  • paper_authors: Zhou Lu
  • for: 本研究旨在探讨多模态学习相比单模态学习是否具有计算上的优势。
  • methods: 我们基于对“两个半空间相交”问题的一种新颖改造来构造学习任务,用以分析多模态学习的计算优势。
  • results: 我们发现,在一定条件下,多模态学习在计算上可以指数级地优于单模态学习;具体而言,存在对单模态学习是NP难、却可被多模态算法在多项式时间内解决的学习任务。
    Abstract Human perception inherently operates in a multimodal manner. Similarly, as machines interpret the empirical world, their learning processes ought to be multimodal. The recent, remarkable successes in empirical multimodal learning underscore the significance of understanding this paradigm. Yet, a solid theoretical foundation for multimodal learning has eluded the field for some time. While a recent study by Lu (2023) has shown the superior sample complexity of multimodal learning compared to its unimodal counterpart, another basic question remains: does multimodal learning also offer computational advantages over unimodal learning? This work initiates a study on the computational benefit of multimodal learning. We demonstrate that, under certain conditions, multimodal learning can outpace unimodal learning exponentially in terms of computation. Specifically, we present a learning task that is NP-hard for unimodal learning but is solvable in polynomial time by a multimodal algorithm. Our construction is based on a novel modification to the intersection of two half-spaces problem.
    摘要 人类感知本质上是多模态的。同样,机器在理解经验世界时,其学习过程也应当是多模态的。近期经验多模态学习取得的显著成功凸显了理解这一范式的重要性,然而多模态学习长期缺乏坚实的理论基础。Lu(2023)的研究已表明多模态学习的样本复杂度优于单模态学习,但另一个基本问题仍然悬而未决:多模态学习是否也提供计算上的优势?本研究开启了对多模态学习计算收益的探讨。我们证明,在一定条件下,多模态学习在计算上可以指数级地超越单模态学习。具体而言,我们给出了一个对单模态学习是NP难、却可由多模态算法在多项式时间内求解的学习任务。该构造基于对两个半空间相交问题的一种新颖改造。
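
For intuition about the base concept class behind the hardness construction, the sketch below labels points by membership in the intersection of two half-spaces. The specific hyperplanes are illustrative; the paper's construction is a novel modification of this problem rather than the plain version shown here.

```python
# Sketch of the base concept class behind the hardness construction: a point is
# labeled positive only if it lies on the positive side of both hyperplanes.
# The specific hyperplanes are illustrative; the paper uses a novel modification
# of this problem rather than the plain version shown here.
import numpy as np

def in_intersection(x, w1, b1, w2, b2):
    """1 if x satisfies both half-space constraints w1.x >= b1 and w2.x >= b2."""
    return int(np.dot(w1, x) >= b1 and np.dot(w2, x) >= b2)

w1, b1 = np.array([1.0, 0.0]), 0.0
w2, b2 = np.array([0.0, 1.0]), 0.0
points = [np.array([0.5, 0.5]), np.array([0.5, -0.5]), np.array([-1.0, 2.0])]
print([in_intersection(p, w1, b1, w2, b2) for p in points])  # [1, 0, 0]
```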

Explainable Machine Learning for ICU Readmission Prediction

  • paper_url: http://arxiv.org/abs/2309.13781
  • repo_url: None
  • paper_authors: Alex G. C. de Sá, Daniel Gould, Anna Fedyukova, Mitchell Nicholas, Lucy Dockrell, Calvin Fletcher, David Pilcher, Daniel Capurro, David B. Ascher, Khaled El-Khawas, Douglas E. V. Pires
  • For: The paper aims to develop a standardized and explainable machine learning pipeline to predict patient readmission in the intensive care unit (ICU) using a multicentric database.
  • Methods: The paper uses a machine learning approach with a Random Forest classification model to predict patient readmission, and validates the model on both monocentric and multicentric settings. The authors also provide explanations for the constructed models to derive insightful conclusions.
  • Results: The paper achieves predictive performance with an area under the receiver operating characteristic curve (AUC) up to 0.7, and demonstrates good calibration and consistency on validation sets. The authors also identify a set of variables related to vital signs, blood tests, demographics, and ICU-associated variables that are associated with patient readmission.
    Abstract The intensive care unit (ICU) comprises a complex hospital environment, where decisions made by clinicians have a high level of risk for the patients' lives. A comprehensive care pathway must then be followed to reduce complications. Uncertain, competing and unplanned aspects within this environment increase the difficulty in uniformly implementing the care pathway. Readmission contributes to this pathway's difficulty, occurring when patients are admitted again to the ICU in a short timeframe, resulting in high mortality rates and high resource utilisation. Several works have tried to predict readmission through patients' medical information. Although they have some level of success while predicting readmission, those works do not properly assess, characterise and understand readmission prediction. This work proposes a standardised and explainable machine learning pipeline to model patient readmission on a multicentric database (i.e., the eICU cohort with 166,355 patients, 200,859 admissions and 6,021 readmissions) while validating it on monocentric (i.e., the MIMIC IV cohort with 382,278 patients, 523,740 admissions and 5,984 readmissions) and multicentric settings. Our machine learning pipeline achieved predictive performance in terms of the area under the receiver operating characteristic curve (AUC) up to 0.7 with a Random Forest classification model, yielding an overall good calibration and consistency on validation sets. From explanations provided by the constructed models, we could also derive a set of insightful conclusions, primarily on variables related to vital signs and blood tests (e.g., albumin, blood urea nitrogen and hemoglobin levels), demographics (e.g., age, and admission height and weight), and ICU-associated variables (e.g., unit type). These insights provide an invaluable source of information during clinicians' decision-making while discharging ICU patients.
    摘要 重症监护室(ICU)是一个复杂的医院环境,临床医生的决策对患者生命具有很高的风险。因此必须遵循完整的护理路径以减少并发症。该环境中不确定、相互竞争且计划外的因素增加了统一执行护理路径的难度。再入院进一步加大了这一难度:患者在短时间内再次入住ICU,会导致高死亡率和高资源消耗。许多研究试图利用患者的医疗信息预测再入院,虽然取得了一定效果,但并未充分地评估、刻画和理解再入院预测。本文提出了一个标准化且可解释的机器学习管道,在多中心数据库(eICU 队列,包含166,355名患者、200,859次入院和6,021次再入院)上建模患者再入院,并在单中心(MIMIC IV 队列,包含382,278名患者、523,740次入院和5,984次再入院)和多中心设置下进行验证。该管道使用随机森林分类模型,ROC曲线下面积(AUC)最高达0.7,并在验证集上表现出良好的校准性和一致性。从所构建模型给出的解释中,我们还得出了一系列有价值的结论,主要涉及生命体征与血液检验(如白蛋白、血尿素氮和血红蛋白水平)、人口学特征(如年龄、入院身高和体重)以及ICU相关变量(如病区类型)。这些结论可在临床医生决定ICU患者出院时提供宝贵的参考信息。
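
The sketch below mirrors the shape of the described pipeline on synthetic data: an imbalanced binary classification task, a Random Forest, ROC AUC, and a calibration check. Feature semantics, class imbalance, and hyperparameters are illustrative assumptions, not the eICU/MIMIC cohorts or the paper's configuration.

```python
# Minimal sketch mirroring the shape of the described pipeline on synthetic data:
# an imbalanced readmission-style label, a Random Forest, ROC AUC, and a
# calibration check. Feature semantics, class imbalance, and hyperparameters are
# illustrative assumptions, not the eICU/MIMIC cohorts or the paper's configuration.
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-ins for vital signs, blood tests, demographics and ICU-associated variables.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, proba))

# Calibration: compare predicted probabilities against observed readmission rates per bin.
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
print(list(zip(mean_pred.round(2), frac_pos.round(2))))
```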