cs.AI - 2023-07-28

Evaluating the structure of cognitive tasks with transfer learning

  • paper_url: http://arxiv.org/abs/2308.02408
  • repo_url: None
  • paper_authors: Bruno Aristimunha, Raphael Y. de Camargo, Walter H. Lopez Pinaya, Sylvain Chevallier, Alexandre Gramfort, Cedric Rommel
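  • for: This study investigates how well deep learning representations transfer between different EEG decoding tasks.
  • methods: The authors use state-of-the-art decoding models on two recently released EEG datasets, ERP CORE and M$^3$CV, covering over 140 subjects and 11 distinct cognitive tasks. They pre-train deep neural networks on one task and measure how well the learned representations decode subsequent tasks.
  • results: Even with linear-probing transfer, decoding performance improves significantly, by up to 28% over the purely supervised approach. Some decoding paradigms elicit specific, narrow brain activity, while others benefit from pre-training on a broad range of representations. The findings suggest a practical way to mitigate data scarcity in EEG decoding. The resulting transfer maps also shed light on the hierarchical relations between cognitive tasks from a neuroscientific standpoint.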
    Abstract Electroencephalography (EEG) decoding is a challenging task due to the limited availability of labelled data. While transfer learning is a promising technique to address this challenge, it assumes that transferable data domains and task are known, which is not the case in this setting. This study investigates the transferability of deep learning representations between different EEG decoding tasks. We conduct extensive experiments using state-of-the-art decoding models on two recently released EEG datasets, ERP CORE and M$^3$CV, containing over 140 subjects and 11 distinct cognitive tasks. We measure the transferability of learned representations by pre-training deep neural networks on one task and assessing their ability to decode subsequent tasks. Our experiments demonstrate that, even with linear probing transfer, significant improvements in decoding performance can be obtained, with gains of up to 28% compared with the pure supervised approach. Additionally, we discover evidence that certain decoding paradigms elicit specific and narrow brain activities, while others benefit from pre-training on a broad range of representations. By revealing which tasks transfer well and demonstrating the benefits of transfer learning for EEG decoding, our findings have practical implications for mitigating data scarcity in this setting. The transfer maps generated also provide insights into the hierarchical relations between cognitive tasks, hence enhancing our understanding of how these tasks are connected from a neuroscientific standpoint.
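Linear probing means freezing a pre-trained encoder and training only a linear classifier on its outputs. The pure-Python sketch below illustrates the idea under stated assumptions, and it is not the authors' training setup:
- `pretrained_encoder` is a hypothetical stand-in for a network pre-trained on a source EEG task.
- Labels are binary.
- The head is a logistic regression fitted by full-batch gradient descent.

```python
import math
from typing import Callable, List, Sequence, Tuple

Encoder = Callable[[Sequence[float]], List[float]]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_probe(encoder: Encoder, data: List[Tuple[Sequence[float], int]],
                 lr: float = 0.5, epochs: int = 200) -> Callable[[Sequence[float]], int]:
    """Fit a logistic-regression head on frozen encoder features (labels 0/1)."""
    feats = [(encoder(x), y) for x, y in data]  # the encoder is only called, never updated
    dim, n = len(feats[0][0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in feats:
            err = _sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y  # dLoss/dz
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def predict(x: Sequence[float]) -> int:
        f = encoder(x)
        return int(sum(wi * fi for wi, fi in zip(w, f)) + b > 0)

    return predict


# Hypothetical frozen encoder standing in for a network pre-trained on another task.
def pretrained_encoder(x: Sequence[float]) -> List[float]:
    return [x[0] - x[1], x[0] + x[1]]


train = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
probe = linear_probe(pretrained_encoder, train)
```

Only `w` and `b` change during training. That is what makes linear probing a cheap test of how much task-relevant information the frozen representation already carries.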

We are all Individuals: The Role of Robot Personality and Human Traits in Trustworthy Interaction

  • paper_url: http://arxiv.org/abs/2307.15568
  • repo_url: None
  • paper_authors: Mei Yii Lim, José David Aguas Lopes, David A. Robb, Bruce W. Wilson, Meriam Moujahid, Emanuele De Pellegrin, Helen Hastie
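  • for: This paper studies the personality of robots that take on roles in society, and how individual human traits affect people's preference for and trust in those robots.
  • methods: A quantitative and qualitative study. The robot's personality (extroversion vs. introversion) is portrayed through vocal cues and linguistic features, and participants give preference and trust ratings for the different robot personalities.
  • results: For a Robo-Barista, participants preferred and trusted the extrovert robot more than the introvert one, regardless of their own personality. Individual attitudes and predispositions towards robots also affect trust. These should therefore be considered alongside robot personality, role, and interaction context when designing human-robot interaction studies.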
    Abstract As robots take on roles in our society, it is important that their appearance, behaviour and personality are appropriate for the job they are given and are perceived favourably by the people with whom they interact. Here, we provide an extensive quantitative and qualitative study exploring robot personality but, importantly, with respect to individual human traits. Firstly, we show that we can accurately portray personality in a social robot, in terms of extroversion-introversion using vocal cues and linguistic features. Secondly, through garnering preferences and trust ratings for these different robot personalities, we establish that, for a Robo-Barista, an extrovert robot is preferred and trusted more than an introvert robot, regardless of the subject's own personality. Thirdly, we find that individual attitudes and predispositions towards robots do impact trust in the Robo-Baristas, and are therefore important considerations in addition to robot personality, roles and interaction context when designing any human-robot interaction study.

Few-shot Image Classification based on Gradual Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15524
  • repo_url: None
  • paper_authors: Na Chen, Xianming Kuang, Feiyu Liu, Kehao Wang, Qun Chen
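  • for: Improving the accuracy of few-shot image classification, i.e. classifying unlabeled images from only a few labeled samples.
  • methods: A non-i.i.d. gradual machine learning (GML) approach. Starting from a few labeled samples, it labels target images in increasing order of hardness through iterative factor inference in a factor graph. Unary factors are built from class-center distances in an embedding space produced by a deep backbone, and binary factors are built from k-nearest neighborhoods.
  • results: In comparative experiments on benchmark datasets, the approach improves on state-of-the-art (SOTA) accuracy by 1-5%. It is also more robust: its performance keeps improving as the query set grows, whereas the performance of deep models stays essentially flat or even degrades.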
    Abstract Few-shot image classification aims to accurately classify unlabeled images using only a few labeled samples. The state-of-the-art solutions are built by deep learning, which focuses on designing increasingly complex deep backbones. Unfortunately, the task remains very challenging due to the difficulty of transferring the knowledge learned in training classes to new ones. In this paper, we propose a novel approach based on the non-i.i.d paradigm of gradual machine learning (GML). It begins with only a few labeled observations, and then gradually labels target images in the increasing order of hardness by iterative factor inference in a factor graph. Specifically, our proposed solution extracts indicative feature representations by deep backbones, and then constructs both unary and binary factors based on the extracted features to facilitate gradual learning. The unary factors are constructed based on class center distance in an embedding space, while the binary factors are constructed based on k-nearest neighborhood. We have empirically validated the performance of the proposed approach on benchmark datasets by a comparative study. Our extensive experiments demonstrate that the proposed approach can improve the SOTA performance by 1-5% in terms of accuracy. More notably, it is more robust than the existing deep models in that its performance can consistently improve as the size of query set increases while the performance of deep models remains essentially flat or even becomes worse.
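The gradual-labeling loop described above can be sketched as follows. This is a simplified, hypothetical stand-in, not the paper's factor-graph inference:
- Unary evidence is the negative Euclidean distance to each class center.
- Binary evidence is a vote from the k nearest already-labeled samples.
- Hardness is measured by the margin between the two best class scores.
- Inputs are assumed to be embeddings already produced by a deep backbone.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def gradual_label(labeled: Dict[int, List[Vec]], unlabeled: List[Vec],
                  k: int = 3, alpha: float = 1.0) -> List[int]:
    """Label unlabeled embeddings one at a time, easiest (largest margin) first."""
    pool: List[Tuple[Vec, int]] = [(x, c) for c, xs in labeled.items() for x in xs]
    remaining = list(range(len(unlabeled)))
    labels = [-1] * len(unlabeled)
    while remaining:
        # Class centers are recomputed because newly labeled samples join the pool.
        centers = {}
        for c in labeled:
            members = [x for x, lab in pool if lab == c]
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
        best = None  # (margin, index, predicted class)
        for i in remaining:
            x = unlabeled[i]
            # Unary evidence: a closer class center gives a higher score.
            scores = {c: -math.dist(x, ctr) for c, ctr in centers.items()}
            # Binary evidence: each of the k nearest labeled samples votes for its class.
            for _, lab in sorted(pool, key=lambda p: math.dist(x, p[0]))[:k]:
                scores[lab] += alpha / k
            top = sorted(scores.values(), reverse=True)
            margin = top[0] - top[1] if len(top) > 1 else top[0]
            if best is None or margin > best[0]:
                best = (margin, i, max(scores, key=scores.get))
        _, i, c = best
        labels[i] = c
        pool.append((unlabeled[i], c))  # the newly labeled sample informs later steps
        remaining.remove(i)
    return labels
```

Labeling the easiest sample first lets each confident decision add evidence for the harder samples that follow. This mirrors the non-i.i.d. ordering the abstract describes.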

Revisiting Fully Convolutional Geometric Features for Object 6D Pose Estimation

  • paper_url: http://arxiv.org/abs/2307.15514
  • repo_url: None
  • paper_authors: Jaime Corsetti, Davide Boscaini, Fabio Poiesi
  • for: 6D object pose estimation
  • methods: Fully Convolutional Geometric Features (FCGF) with sparse convolutions and hardest contrastive loss, and key modifications to the loss and input data representations, as well as careful tuning of training strategies and data augmentations
  • results: state-of-the-art performance on popular benchmarks, outperforming recent competitors
    Abstract Recent works on 6D object pose estimation focus on learning keypoint correspondences between images and object models, and then determine the object pose through RANSAC-based algorithms or by directly regressing the pose with end-to-end optimisations. We argue that learning point-level discriminative features is overlooked in the literature. To this end, we revisit Fully Convolutional Geometric Features (FCGF) and tailor it for object 6D pose estimation to achieve state-of-the-art performance. FCGF employs sparse convolutions and learns point-level features using a fully-convolutional network by optimising a hardest contrastive loss. We can outperform recent competitors on popular benchmarks by adopting key modifications to the loss and to the input data representations, by carefully tuning the training strategies, and by employing data augmentations suitable for the underlying problem. We carry out a thorough ablation to study the contribution of each modification.
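To make the hardest contrastive loss concrete, here is a simplified pure-Python sketch in the spirit of FCGF:
- Matched points are pulled within a positive margin.
- Each point is pushed beyond a negative margin from its hardest (closest) non-matching feature.

The margin values and the exact weighting are assumptions for illustration, not the configuration used in the paper. The sketch also assumes each feature set has at least two points.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def hardest_contrastive_loss(feats_a: List[Vec], feats_b: List[Vec],
                             pairs: List[Tuple[int, int]],
                             pos_margin: float = 0.1, neg_margin: float = 1.4) -> float:
    """Mean loss over corresponding pairs (i in feats_a, j in feats_b)."""
    total = 0.0
    for i, j in pairs:
        fa, fb = feats_a[i], feats_b[j]
        # Positive term: matched features should lie within pos_margin of each other.
        pos = max(0.0, math.dist(fa, fb) - pos_margin) ** 2
        # Hardest negatives: the closest non-matching feature on the other side.
        hardest_a = min(math.dist(fa, f) for k, f in enumerate(feats_b) if k != j)
        hardest_b = min(math.dist(fb, f) for k, f in enumerate(feats_a) if k != i)
        neg = 0.5 * (max(0.0, neg_margin - hardest_a) ** 2
                     + max(0.0, neg_margin - hardest_b) ** 2)
        total += pos + neg
    return total / len(pairs)
```

Mining only the single hardest negative per point keeps the loss focused on the confusions that matter most for correspondence matching. Real implementations perform this mining over GPU batches rather than Python loops.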

Exploring Format Consistency for Instruction Tuning

  • paper_url: http://arxiv.org/abs/2307.15504
  • repo_url: None
  • paper_authors: Shihao Liang, Kunlun Zhu, Runchu Tian, Yujia Qin, Huadong Wang, Xin Cong, Zhiyuan Liu, Xiaojiang Liu, Maosong Sun
  • for: Improving how well large language models follow human instructions through instruction tuning. Instruction styles and formats vary across instruction tuning datasets (format inconsistency), which may hurt generalization.
  • methods: The Unified Instruction Tuning (UIT) framework calls OpenAI APIs to transfer formats automatically among instruction tuning datasets. A perplexity-based denoising method reduces the noise of this automatic format transfer, and a smaller offline model is trained to perform the transfer more cheaply.
  • results: UIT improves generalization to unseen instructions, which highlights the importance of format consistency. The smaller offline model achieves format transfer capability comparable to the OpenAI APIs, reducing cost in practice.
    Abstract Instruction tuning has emerged as a promising approach to enhancing large language models in following human instructions. It is shown that increasing the diversity and number of instructions in the training data can consistently enhance generalization performance, which facilitates a recent endeavor to collect various instructions and integrate existing instruction tuning datasets into larger collections. However, different users have their unique ways of expressing instructions, and there often exist variations across different datasets in the instruction styles and formats, i.e., format inconsistency. In this work, we study how format inconsistency may impact the performance of instruction tuning. We propose a framework called "Unified Instruction Tuning" (UIT), which calls OpenAI APIs for automatic format transfer among different instruction tuning datasets. We show that UIT successfully improves the generalization performance on unseen instructions, which highlights the importance of format consistency for instruction tuning. To make the UIT framework more practical, we further propose a novel perplexity-based denoising method to reduce the noise of automatic format transfer. We also train a smaller offline model that achieves format transfer capability comparable to OpenAI APIs to reduce costs in practice.
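One plausible reading of perplexity-based denoising is sketched below:
- Among several automatically format-transferred candidates for the same instruction, keep the one a scoring language model finds least perplexing.
- Discard it if even that candidate exceeds a threshold.

The per-token log-probabilities are assumed to come from some language model, which is not shown. Both the selection rule and the threshold `max_ppl` are illustrative assumptions, not the paper's exact procedure.

```python
import math
from typing import List, Optional, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood over tokens (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_transfer(candidates: List[Tuple[str, Sequence[float]]],
                  max_ppl: float = 50.0) -> Optional[str]:
    """Return the lowest-perplexity candidate text, or None if all are too noisy."""
    text, logprobs = min(candidates, key=lambda c: perplexity(c[1]))
    return text if perplexity(logprobs) <= max_ppl else None
```

Low perplexity is used here as a proxy for a fluent, well-formed transfer. Under this assumption, a garbled or truncated rewrite tends to score poorly.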

Curiosity-Driven Reinforcement Learning based Low-Level Flight Control

  • paper_url: http://arxiv.org/abs/2307.15724
  • repo_url: https://github.com/a-ramezani/cdrl-l2fc_u_hcm
  • paper_authors: Amir Ramezani Dooraki, Alexandros Iosifidis
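  • for: This paper proposes a curiosity-driven autonomous learning algorithm for low-level quadcopter control. The algorithm generates motor speeds from odometry data so that the quadcopter passes through obstacles while steering its yaw toward a desired location.
  • methods: A new curiosity approach based on prediction error, used as an intrinsic reward combined with reinforcement learning. It is compared against on-policy, off-policy, and on-policy plus curiosity baselines.
  • results: The tests, including visualizations of how curiosity shapes evolving exploration patterns, show that the proposed algorithm learns an optimal policy and maximizes reward where the other algorithms fail to do so.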
    Abstract Curiosity is one of the main motives in many of the natural creatures with measurable levels of intelligence for exploration and, as a result, more efficient learning. It makes it possible for humans and many animals to explore efficiently by searching for being in states that make them surprised with the goal of learning more about what they do not know. As a result, while being curious, they learn better. In the machine learning literature, curiosity is mostly combined with reinforcement learning-based algorithms as an intrinsic reward. This work proposes an algorithm based on the drive of curiosity for autonomous learning to control by generating proper motor speeds from odometry data. The quadcopter controlled by our proposed algorithm can pass through obstacles while controlling the Yaw direction of the quad-copter toward the desired location. To achieve that, we also propose a new curiosity approach based on prediction error. We ran tests using on-policy, off-policy, on-policy plus curiosity, and the proposed algorithm and visualized the effect of curiosity in evolving exploration patterns. Results show the capability of the proposed algorithm to learn optimal policy and maximize reward where other algorithms fail to do so.
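The prediction-error notion of curiosity can be illustrated with a toy forward model. The agent predicts the next state from the current state and action, and the squared prediction error is added to the extrinsic reward as an intrinsic bonus. The sketch assumes a linear forward model trained by one SGD step per transition, plus a weighting `beta`. Both are hypothetical choices, not the paper's network or hyperparameters.

```python
from typing import List, Sequence


class CuriosityModule:
    """Linear forward model s' ~ W [s; a]; its squared error is the intrinsic reward."""

    def __init__(self, state_dim: int, action_dim: int, lr: float = 0.05, beta: float = 0.1):
        self.w: List[List[float]] = [[0.0] * (state_dim + action_dim) for _ in range(state_dim)]
        self.lr = lr
        self.beta = beta

    def reward(self, state: Sequence[float], action: Sequence[float],
               next_state: Sequence[float], extrinsic: float) -> float:
        x = list(state) + list(action)
        pred = [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
        err = [p - t for p, t in zip(pred, next_state)]
        intrinsic = sum(e * e for e in err)  # surprise = squared prediction error
        # One SGD step on the squared error: familiar transitions become less surprising.
        for row, e in zip(self.w, err):
            for k in range(len(row)):
                row[k] -= self.lr * 2.0 * e * x[k]
        return extrinsic + self.beta * intrinsic
```

Repeating the same transition yields a shrinking bonus. This is what pushes a curious agent toward states it cannot yet predict.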

ETHER: Aligning Emergent Communication for Hindsight Experience Replay

  • paper_url: http://arxiv.org/abs/2307.15494
  • repo_url: None
  • paper_authors: Kevin Denamganaï, Daniel Hernandez, Ozan Vardal, Sondess Missaoui, James Alfred Walker
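  • for: Natural language instruction following, which enables collaboration between artificial agents and humans, in sparse-reward settings handled with Hindsight Experience Replay (HER).
  • methods: ETHER builds on HIGhER, a natural-language-conditioned RL agent that combines language conditioning with HER. It adds (i) a discriminative visual referential game from Emergent Communication (EC), used as an unsupervised auxiliary task, and (ii) a semantic grounding scheme that aligns the emergent language with the natural language of the instruction-following benchmark.
  • results: The referential game's agents develop an artificial language that is aligned with the natural-like language used to describe goals in the BabyAI benchmark. This language is expressive enough to describe unsuccessful trajectories, so the RL agent can leverage linguistic information from all trajectories without relying on an oracle predicate function. EC is thus shown to be a viable unsupervised auxiliary task for RL.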
    Abstract Natural language instruction following is paramount to enable collaboration between artificial agents and human beings. Natural language-conditioned reinforcement learning (RL) agents have shown how natural languages' properties, such as compositionality, can provide a strong inductive bias to learn complex policies. Previous architectures like HIGhER combine the benefit of language-conditioning with Hindsight Experience Replay (HER) to deal with sparse rewards environments. Yet, like HER, HIGhER relies on an oracle predicate function to provide a feedback signal highlighting which linguistic description is valid for which state. This reliance on an oracle limits its application. Additionally, HIGhER only leverages the linguistic information contained in successful RL trajectories, thus hurting its final performance and data-efficiency. Without early successful trajectories, HIGhER is no better than DQN upon which it is built. In this paper, we propose the Emergent Textual Hindsight Experience Replay (ETHER) agent, which builds on HIGhER and addresses both of its limitations by means of (i) a discriminative visual referential game, commonly studied in the subfield of Emergent Communication (EC), used here as an unsupervised auxiliary task and (ii) a semantic grounding scheme to align the emergent language with the natural language of the instruction-following benchmark. We show that the referential game's agents make an artificial language emerge that is aligned with the natural-like language used to describe goals in the BabyAI benchmark and that it is expressive enough so as to also describe unsuccessful RL trajectories and thus provide feedback to the RL agent to leverage the linguistic, structured information contained in all trajectories. Our work shows that EC is a viable unsupervised auxiliary task for RL and provides missing pieces to make HER more widely applicable.
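The core hindsight idea that HIGhER and ETHER build on can be sketched as follows. When an episode fails, it is stored a second time under an instruction that describes what was actually achieved, which makes the sparse reward informative. `describe` is a hypothetical callable. It stands in for HIGhER's oracle predicate function or, in ETHER's case, a learned speaker grounded in the benchmark's language. The `Transition` layout is also an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Transition:
    state: object
    action: int
    reward: float
    next_state: object
    instruction: str


def hindsight_relabel(episode: List[Transition],
                      describe: Callable[[object], Optional[str]],
                      success_reward: float = 1.0) -> List[Transition]:
    """For a failed episode, return copies relabeled with a description of what was achieved."""
    if not episode or episode[-1].reward > 0:
        return []  # empty or already successful: nothing to relabel
    caption = describe(episode[-1].next_state)
    if caption is None:
        return []  # the describer could not say anything about the final state
    relabeled = [Transition(t.state, t.action, 0.0, t.next_state, caption) for t in episode]
    relabeled[-1].reward = success_reward  # the achieved "goal" is reached at the last step
    return relabeled
```

The relabeled copies go into the replay buffer alongside the originals. The quality of `describe` therefore directly limits how much signal failed episodes contribute.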

A Semantic Approach to Decidability in Epistemic Planning (Extended Version)

  • paper_url: http://arxiv.org/abs/2307.15485
  • repo_url: None
  • paper_authors: Alessandro Burigana, Paolo Felli, Marco Montali, Nicolas Troquard
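  • for: The decidability of multi-agent epistemic planning based on Dynamic Epistemic Logic (DEL), whose expressive power makes planning undecidable in general.
  • methods: Instead of imposing syntactic restrictions on the action formalism, the authors take a semantic approach. They augment the logic of knowledge S5$_n$ with an interaction axiom called (knowledge) commutativity, which controls the ability of agents to reason unboundedly about other agents' knowledge.
  • results: The resulting epistemic planning problem is decidable, and the framework admits a finitary non-fixpoint characterization of common knowledge. The authors also study generalizations of the commutativity axiom to obtain decidability for more expressive DEL fragments. Finally, they show that two well-known epistemic planning systems based on action templates, interpreted under knowledge, satisfy commutativity and are therefore decidable.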
    Abstract The use of Dynamic Epistemic Logic (DEL) in multi-agent planning has led to a widely adopted action formalism that can handle nondeterminism, partial observability and arbitrary knowledge nesting. As such expressive power comes at the cost of undecidability, several decidable fragments have been isolated, mainly based on syntactic restrictions of the action formalism. In this paper, we pursue a novel semantic approach to achieve decidability. Namely, rather than imposing syntactical constraints, the semantic approach focuses on the axioms of the logic for epistemic planning. Specifically, we augment the logic of knowledge S5$_n$ with an interaction axiom called (knowledge) commutativity, which controls the ability of agents to unboundedly reason on the knowledge of other agents. We then provide a threefold contribution. First, we show that the resulting epistemic planning problem is decidable. In doing so, we prove that our framework admits a finitary non-fixpoint characterization of common knowledge, which is of independent interest. Second, we study different generalizations of the commutativity axiom, with the goal of obtaining decidability for more expressive fragments of DEL. Finally, we show that two well-known epistemic planning systems based on action templates, when interpreted under the setting of knowledge, conform to the commutativity axiom, hence proving their decidability.

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

  • paper_url: http://arxiv.org/abs/2307.15484
  • repo_url: None
  • paper_authors: Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang
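  • for: Minimally supervised text-to-speech (TTS) that combines two types of discrete speech representations and uses two sequence-to-sequence tasks. The aim is to address high dimensionality, waveform distortion, and prosody problems in existing approaches.
  • methods: Diff-LM-Speech models semantic embeddings into mel-spectrograms with diffusion models and adds a prompt encoder based on variational autoencoders and prosody bottlenecks. Tetra-Diff-Speech adds a duration diffusion model for diverse prosodic expression. Tri-Diff-Speech tests whether semantic coding is necessary at all.
  • results: The proposed methods outperform the baseline methods; audio samples are provided on a website.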
    Abstract Recently, there has been a growing interest in text-to-speech (TTS) methods that can be trained with minimal supervision by combining two types of discrete speech representations and using two sequence-to-sequence tasks to decouple TTS. To address the challenges associated with high dimensionality and waveform distortion in discrete representations, we propose Diff-LM-Speech, which models semantic embeddings into mel-spectrogram based on diffusion models and introduces a prompt encoder structure based on variational autoencoders and prosody bottlenecks to improve prompt representation capabilities. Autoregressive language models often suffer from missing and repeated words, while non-autoregressive frameworks face expression averaging problems due to duration prediction models. To address these issues, we propose Tetra-Diff-Speech, which designs a duration diffusion model to achieve diverse prosodic expressions. While we expect the information content of semantic coding to be between that of text and acoustic coding, existing models extract semantic coding with a lot of redundant information and dimensionality explosion. To verify that semantic coding is not necessary, we propose Tri-Diff-Speech. Experimental results show that our proposed methods outperform baseline methods. We provide a website with audio samples.

Non-invasive Diabetes Detection using Gabor Filter: A Comparative Analysis of Different Cameras

  • paper_url: http://arxiv.org/abs/2307.15480
  • repo_url: None
  • paper_authors: Christina A. Garcia, Patricia Angela R. Abu, Rosula SJ. Reyes
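  • for: Comparing mobile device cameras and a laptop camera as convenient tools for capturing images for non-invasive detection of Diabetes Mellitus (DM) from facial block texture features.
  • methods: Photos of participants aged 20 to 79 were taken under normal lighting with 12 MP and 7 MP mobile cameras and a laptop camera. The 100 images were preprocessed and filtered with Gabor filters, and the extracted facial blocks were classified with k-Nearest Neighbors (k-NN) and a Support Vector Machine (SVM).
  • results: The best performance (96.7% accuracy, 100% sensitivity, and 93% specificity) was achieved with the 12 MP back camera and an SVM on 100 images.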
    Abstract This paper compares and explores the performance of both mobile device camera and laptop camera as convenient tools for capturing images for non-invasive detection of Diabetes Mellitus (DM) using facial block texture features. Participants aged 20 to 79 years were chosen for the dataset. 12mp and 7mp mobile cameras, and a laptop camera were used to take the photo under normal lighting condition. Extracted facial blocks were classified using k-Nearest Neighbors (k-NN) and Support Vector Machine (SVM). 100 images were captured, preprocessed, filtered using Gabor, and iterated. Performance of the system was measured in terms of accuracy, specificity, and sensitivity. The best performance of 96.7% accuracy, 100% sensitivity, and 93% specificity was achieved from the 12mp back camera using SVM with 100 images.
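The three reported metrics follow from a binary confusion matrix, with DM as the positive class. The counts below are hypothetical, since the abstract does not give the test split. They are chosen only to show how 100% sensitivity can coexist with 96.7% accuracy and roughly 93% specificity.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # share of DM cases detected
        "specificity": tn / (tn + fp),  # share of non-DM cases correctly cleared
    }


# Hypothetical 30-image test split: 15 DM, 15 non-DM, one false positive.
m = diagnostic_metrics(tp=15, fn=0, tn=14, fp=1)
```

In this hypothetical split, a single false positive is enough to separate specificity from sensitivity. With small test sets like this, each misclassified image moves the percentages by several points.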

FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines

  • paper_url: http://arxiv.org/abs/2307.15475
  • repo_url: None
  • paper_authors: Matthew Barker, Emma Kallina, Dhananjay Ashok, Katherine M. Collins, Ashley Casovan, Adrian Weller, Ameet Talwalkar, Valerie Chen, Umang Bhatt
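  • for: Recording and incorporating input from the multiple stakeholders affected by machine learning (ML) pipelines, a topic that has received little attention.
  • methods: FeedbackLogs are addenda to existing ML pipeline documentation that track the input of multiple stakeholders. Each log records important details of the feedback collection process, the feedback itself, and how the feedback is used to update the ML pipeline. The paper introduces and formalizes a process for collecting a FeedbackLog.
  • results: Concrete use cases show FeedbackLogs serving as evidence for algorithmic auditing and as a tool for recording updates made in response to stakeholder feedback.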
    Abstract Even though machine learning (ML) pipelines affect an increasing array of stakeholders, there is little work on how input from stakeholders is recorded and incorporated. We propose FeedbackLogs, addenda to existing documentation of ML pipelines, to track the input of multiple stakeholders. Each log records important details about the feedback collection process, the feedback itself, and how the feedback is used to update the ML pipeline. In this paper, we introduce and formalise a process for collecting a FeedbackLog. We also provide concrete use cases where FeedbackLogs can be employed as evidence for algorithmic auditing and as a tool to record updates based on stakeholder feedback.
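A FeedbackLog is essentially a structured record. The sketch below shows one hypothetical way to hold the three kinds of information the abstract lists: the collection process, the feedback itself, and the resulting pipeline update. The class and field names are assumptions, not the paper's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeedbackEntry:
    stakeholder: str          # who gave the feedback
    collection_method: str    # how it was collected (e.g. interview, survey)
    feedback: str             # the feedback itself
    pipeline_update: Optional[str] = None  # how it changed the ML pipeline, if at all


@dataclass
class FeedbackLog:
    pipeline_name: str
    entries: List[FeedbackEntry] = field(default_factory=list)

    def record(self, entry: FeedbackEntry) -> None:
        self.entries.append(entry)

    def unaddressed(self) -> List[FeedbackEntry]:
        """Feedback with no recorded update: useful evidence during an audit."""
        return [e for e in self.entries if e.pipeline_update is None]
```

Keeping the update field optional makes it easy to see which stakeholder input has not yet been acted on. An auditor would likely ask about exactly those entries.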

Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective

  • paper_url: http://arxiv.org/abs/2307.16889
  • repo_url: https://github.com/fuxiailab/protosemi
  • paper_authors: Renyu Zhu, Haoyu Liu, Runze Wu, Minmin Lin, Tangjie Lv, Changjie Fan, Haobo Wang
  • for: investigate the problem of learning with noisy labels in real-world annotation scenarios
  • methods: propose a novel sample selection-based approach for noisy label learning called Proto-semi
  • results: demonstrate the effectiveness of Proto-semi in handling the problem of learning from noisy labels, and show that the prototype-based repartitioning strategy is effective in mitigating the adverse impact of label noise.
    Abstract In this paper, we investigate the problem of learning with noisy labels in real-world annotation scenarios, where noise can be categorized into two types: factual noise and ambiguity noise. To better distinguish these noise types and utilize their semantics, we propose a novel sample selection-based approach for noisy label learning, called Proto-semi. Proto-semi initially divides all samples into the confident and unconfident datasets via warm-up. By leveraging the confident dataset, prototype vectors are constructed to capture class characteristics. Subsequently, the distances between the unconfident samples and the prototype vectors are calculated to facilitate noise classification. Based on these distances, the labels are either corrected or retained, resulting in the refinement of the confident and unconfident datasets. Finally, we introduce a semi-supervised learning method to enhance training. Empirical evaluations on a real-world annotated dataset substantiate the robustness of Proto-semi in handling the problem of learning from noisy labels. Meanwhile, the prototype-based repartitioning strategy is shown to be effective in mitigating the adverse impact of label noise. Our code and data are available at https://github.com/fuxiAIlab/ProtoSemi.
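The prototype-based repartitioning step can be sketched as below. This is a simplified illustration under assumed rules, not the released ProtoSemi code:
- Prototypes are the class means of confident samples.
- An unconfident sample whose given label matches its nearest prototype is retained.
- A sample that lies much closer to another class's prototype is relabeled. "Much closer" means the distance ratio falls below a hypothetical `ratio`.
- All remaining samples stay unconfident for the semi-supervised stage.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, possibly noisy label)


def build_prototypes(confident: List[Sample]) -> Dict[int, List[float]]:
    """Class-mean feature vector of the confident samples."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for x, y in confident:
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}


def repartition(unconfident: List[Sample], prototypes: Dict[int, List[float]],
                ratio: float = 0.5) -> Tuple[List[Sample], List[Sample]]:
    """Split unconfident samples into (newly confident, still unconfident)."""
    promoted, kept = [], []
    for x, y in unconfident:
        d = {c: math.dist(x, p) for c, p in prototypes.items()}
        nearest = min(d, key=d.get)
        if nearest == y:
            promoted.append((x, y))          # label agrees with the prototypes: retain it
        elif d[nearest] < ratio * d[y]:
            promoted.append((x, nearest))    # clearly another class: correct the label
        else:
            kept.append((x, y))              # ambiguous: leave for semi-supervised training
    return promoted, kept
```

The middle branch roughly corresponds to factual noise, where the label is simply wrong. The last branch roughly corresponds to ambiguity noise, where the sample sits between classes.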

Testing the Depth of ChatGPT’s Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5’s Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking

  • paper_url: http://arxiv.org/abs/2307.16806
  • repo_url: None
  • paper_authors: David Bayani
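  • for: Testing the aptitude of GPT3.5 (the model underlying ChatGPT) for visual tasks whose inputs are given as ASCII art, without distilling them into a linguistic summary.
  • methods: Experiments on image recognition after transforms typical of visual settings, on knowledge of image parts, and on image generation.
  • results: The abstract does not report specific numbers. According to the title, GPT3.5's abilities to recognize and generate ASCII art are not totally lacking.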
    Abstract Over the eight months since its release, ChatGPT and its underlying model, GPT3.5, have garnered massive attention, due to their potent mix of capability and accessibility. While a niche-industry of papers have emerged examining the scope of capabilities these models possess, the information fed to and extracted from these networks has been either natural language text or stylized, code-like language. Drawing inspiration from the prowess we expect a truly human-level intelligent agent to have across multiple signal modalities, in this work we examine GPT3.5's aptitude for visual tasks, where the inputs feature content provided as ASCII-art without overt distillation into a lingual summary. We conduct experiments analyzing the model's performance on image recognition tasks after various transforms typical in visual settings, trials investigating knowledge of image parts, and tasks covering image generation.
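Image transforms such as flips and rotations have direct ASCII-art analogues, which is the kind of input manipulation the experiments describe. The helpers below are a hypothetical illustration, not the paper's code. Direction-bearing characters must be remapped under a mirror, so a mirrored `/` becomes `\`. Rotation is shown without such remapping, which is an acknowledged simplification.

```python
from typing import List

# Characters whose orientation changes under a left-right mirror.
MIRROR = str.maketrans("/\\()<>[]{}", "\\/)(><][}{")


def hflip(art: List[str]) -> List[str]:
    """Mirror ASCII art left-right, padding rows to equal width first."""
    width = max(len(row) for row in art)
    return [row.ljust(width)[::-1].translate(MIRROR) for row in art]


def rotate90(art: List[str]) -> List[str]:
    """Rotate ASCII art 90 degrees clockwise (characters themselves are not remapped)."""
    height, width = len(art), max(len(row) for row in art)
    padded = [row.ljust(width) for row in art]
    return ["".join(padded[height - 1 - i][j] for i in range(height)) for j in range(width)]
```

Padding rows first matters: ragged lines are common in ASCII art, and without padding a flip would misalign the picture.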

Worrisome Properties of Neural Network Controllers and Their Symbolic Representations

  • paper_url: http://arxiv.org/abs/2307.15456
  • repo_url: https://github.com/mimuw-rl/worrisome-nn
  • paper_authors: Jacek Cyranka, Kevin E M Church, Jean-Philippe Lessard
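  • for: This work examines the robustness of controllers on simple reinforcement learning benchmark problems.
  • methods: The study analyzes neural network controllers together with their low-neuron and symbolic abstractions.
  • results:
    - A typical controller that reaches high mean return values still generates an abundance of persistent low-return solutions. This is a highly undesirable property that an adversary can easily exploit.
    - Simpler controllers admit more persistent bad solutions.
    - The authors provide an algorithm for a systematic robustness study. Using a computer-assisted proof methodology, they prove the existence of persistent solutions and, in some cases, periodic orbits.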
    Abstract We raise concerns about controllers' robustness in simple reinforcement learning benchmark problems. We focus on neural network controllers and their low neuron and symbolic abstractions. A typical controller reaching high mean return values still generates an abundance of persistent low-return solutions, which is a highly undesirable property, easily exploitable by an adversary. We find that the simpler controllers admit more persistent bad solutions. We provide an algorithm for a systematic robustness study and prove existence of persistent solutions and, in some cases, periodic orbits, using a computer-assisted proof methodology.

From Probabilistic Programming to Complexity-based Programming

  • paper_url: http://arxiv.org/abs/2307.15453
  • repo_url: None
  • paper_authors: Giovanni Sileno, Jean-Louis Dessalles
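  • for: This paper presents CompLog, a novel computational framework inspired by the probabilistic programming system ProbLog. CompLog builds on the inferential mechanisms of Simplicity Theory and relies on computing two Kolmogorov complexities rather than on probabilistic inference.
  • methods: The two Kolmogorov complexities are implemented as min-path searches via ASP programs. They yield ex-post and ex-ante measures of unexpectedness, which map to posterior and prior subjective probabilities respectively. The computation relies on world and mental models, specified through causal and descriptive relations between predicates weighted by complexity.
  • results: The paper illustrates applications such as generating relevant descriptions and providing alternative approaches to disjunction and negation.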
    Abstract The paper presents the main characteristics and a preliminary implementation of a novel computational framework named CompLog. Inspired by probabilistic programming systems like ProbLog, CompLog builds upon the inferential mechanisms proposed by Simplicity Theory, relying on the computation of two Kolmogorov complexities (here implemented as min-path searches via ASP programs) rather than probabilistic inference. The proposed system enables users to compute ex-post and ex-ante measures of unexpectedness of a certain situation, mapping respectively to posterior and prior subjective probabilities. The computation is based on the specification of world and mental models by means of causal and descriptive relations between predicates weighted by complexity. The paper illustrates a few examples of application: generating relevant descriptions, and providing alternative approaches to disjunction and to negation.
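In Simplicity Theory, unexpectedness is commonly written as U = C_w - C_d, with subjective probability p = 2^(-U). Here C_w is generation (causal) complexity and C_d is description complexity; we assume this standard form below.

The sketch replaces CompLog's ASP min-path searches with Dijkstra's algorithm. It searches two hypothetical weighted graphs, a world model and a mental model, whose edge weights play the role of complexities in bits. It is not CompLog's implementation or syntax.

```python
import heapq

Graph = dict[str, list[tuple[str, float]]]


def min_path_cost(graph: Graph, start: str, goal: str) -> float:
    """Dijkstra shortest path; edge weights act as complexities (in bits)."""
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return float("inf")


def unexpectedness(world: Graph, mental: Graph, start: str, situation: str) -> tuple[float, float]:
    """Return (U, p) with U = C_w - C_d and p = 2 ** -U (assumed Simplicity Theory form)."""
    c_w = min_path_cost(world, start, situation)   # generation complexity via the world model
    c_d = min_path_cost(mental, start, situation)  # description complexity via the mental model
    u = c_w - c_d
    return u, 2.0 ** (-u)


# Hypothetical example: the draw 1-2-3-4-5-6 is costly to generate but cheap to describe.
world = {"start": [("draw", 1.0)], "draw": [("1-2-3-4-5-6", 20.0)]}
mental = {"start": [("1-2-3-4-5-6", 4.0)]}
u, p = unexpectedness(world, mental, "start", "1-2-3-4-5-6")
```

In this toy example U = 21 - 4 = 17 bits, so the situation feels about 2^17 times less probable than a typical draw.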

DELPHIC: Practical DEL Planning via Possibilities (Extended Version)

  • paper_url: http://arxiv.org/abs/2307.15451
  • repo_url: None
  • paper_authors: Alessandro Burigana, Paolo Felli, Marco Montali
  • for: This paper aims to improve the practicality of Dynamic Epistemic Logic (DEL) planning by questioning the traditional semantics and proposing an alternative, more compact approach called DELPHIC.
  • methods: The paper uses a new semantics defined using possibilities, which are non-well-founded objects representing both factual properties and what agents consider to be possible. The authors implement the DELPHIC approach in Answer Set Programming (ASP) and compare it with the traditional Kripke-based approach.
  • results: The experimental evaluation shows that DELPHIC outperforms the traditional approach in terms of space and time.
    Abstract Dynamic Epistemic Logic (DEL) provides a framework for epistemic planning that is capable of representing non-deterministic actions, partial observability, higher-order knowledge and both factual and epistemic change. The high expressivity of DEL challenges existing epistemic planners, which typically can handle only restricted fragments of the whole framework. The goal of this work is to push the envelope of practical DEL planning, ultimately aiming for epistemic planners to be able to deal with the full range of features offered by DEL. Towards this goal, we question the traditional semantics of DEL, defined in terms of Kripke models. In particular, we propose an equivalent semantics defined using, as main building block, so-called possibilities: non-well-founded objects representing both factual properties of the world, and what agents consider to be possible. We call the resulting framework DELPHIC. We argue that DELPHIC indeed provides a more compact representation of epistemic states. To substantiate this claim, we implement both approaches in ASP and we set up an experimental evaluation to compare DELPHIC with the traditional, Kripke-based approach. The evaluation confirms that DELPHIC outperforms the traditional approach in space and time.
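One way to see why possibilities can be more compact than Kripke models is that bisimilar worlds, which satisfy the same epistemic formulas, correspond to a single possibility. This is our reading, not the paper's algorithm.

The sketch below merges bisimilar worlds of a multi-agent Kripke model by naive partition refinement. The data layout, a valuation map plus per-agent accessibility maps, is an assumption for illustration.

```python
def bisimulation_classes(worlds, valuation, relations):
    """Group bisimilar worlds by naive partition refinement.

    worlds:    list of world ids
    valuation: dict world -> frozenset of atoms true there
    relations: dict agent -> dict world -> set of worlds the agent considers possible
    Returns a dict world -> block id; bisimilar worlds share a block id.
    """
    block = {w: valuation[w] for w in worlds}  # start: worlds agreeing on all atoms
    while True:
        # A world's signature: its current block plus, per agent, the blocks it can reach.
        signature = {
            w: (block[w], tuple(frozenset(block[v] for v in relations[a].get(w, ()))
                                for a in sorted(relations)))
            for w in worlds
        }
        ids = {}
        new_block = {w: ids.setdefault(signature[w], len(ids)) for w in worlds}
        if len(ids) == len(set(block.values())):
            return new_block  # no block was split: the partition is stable
        block = new_block


# Hypothetical model: agent a cannot tell w1, w2, w3 apart; w1 and w2 both satisfy p.
worlds = ["w1", "w2", "w3"]
valuation = {"w1": frozenset({"p"}), "w2": frozenset({"p"}), "w3": frozenset()}
relations = {"a": {w: {"w1", "w2", "w3"} for w in worlds}}
classes = bisimulation_classes(worlds, valuation, relations)
```

Here w1 and w2 collapse into one block, so three worlds become two states.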

Optimal Alignment of Temporal Knowledge Bases

  • paper_url: http://arxiv.org/abs/2307.15439
  • repo_url: None
  • paper_authors: Oliver Fernandez-Gil, Fabio Patrizi, Giuseppe Perelli, Anni-Yasmin Turhan
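  • for: This work targets ontology-based situation recognition. It addresses the problem that inaccurate data in a knowledge base can cause important query answers to be missed.
  • methods: The paper introduces the TKB Alignment problem. It asks for a variant of a temporalized Description Logic knowledge base (TKB) that changes the TKB minimally yet entails a given temporal conjunctive query (CQ), and is in that sense (cost-)optimal.
  • results: The problem is studied for ALC TKBs and conjunctive queries with LTL operators. The proposed solution technique computes (cost-optimal) alignments of TKBs by extending techniques for the alignment problem for propositional LTL over finite traces.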
    Abstract Answering temporal CQs over temporalized Description Logic knowledge bases (TKB) is a main technique to realize ontology-based situation recognition. In case the collected data in such a knowledge base is inaccurate, important query answers can be missed. In this paper we introduce the TKB Alignment problem, which computes a variant of the TKB that minimally changes the TKB, but entails the given temporal CQ and is in that sense (cost-)optimal. We investigate this problem for ALC TKBs and conjunctive queries with LTL operators and devise a solution technique to compute (cost-optimal) alignments of TKBs that extends techniques for the alignment problem for propositional LTL over finite traces.
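The paper's technique extends alignment for propositional LTL over finite traces. As a simplified, hypothetical illustration of that propositional special case, the sketch below repairs a finite trace so that a DFA encoding the temporal query accepts it, at minimum cost. The trace is a list of sets of true atoms.

Assumptions:
- the query is already compiled into a DFA;
- the only edit allowed is flipping atoms at a time point, each flip costing 1.

The paper's cost model and Description Logic setting are richer.

```python
from itertools import combinations


def all_letters(atoms):
    """Every possible set of true atoms at one time point."""
    atoms = sorted(atoms)
    return [frozenset(c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]


def min_cost_alignment(trace, atoms, delta, init, accepting):
    """Dynamic programming over (time point, DFA state).

    trace:  list of frozensets, the facts observed at each time point
    delta:  delta(state, letter) -> next DFA state
    Returns (cost, repaired_trace), or None if no repair is accepted.
    """
    letters = all_letters(atoms)
    best = {init: (0, [])}  # DFA state -> (cheapest cost so far, repaired prefix)
    for observed in trace:
        nxt = {}
        for state, (cost, prefix) in best.items():
            for letter in letters:
                c = cost + len(observed ^ letter)  # number of atoms flipped at this point
                s = delta(state, letter)
                if s not in nxt or c < nxt[s][0]:
                    nxt[s] = (c, prefix + [letter])
        best = nxt
    finals = [v for s, v in best.items() if s in accepting]
    return min(finals, key=lambda v: v[0]) if finals else None


# Query "eventually p" (F p) as a two-state DFA: state 1 is reached once p holds.
def eventually_p(state, letter):
    return 1 if state == 1 or "p" in letter else 0


cost, repaired = min_cost_alignment(
    [frozenset(), frozenset({"q"}), frozenset()], {"p", "q"}, eventually_p, 0, {1}
)
```

Enumerating all 2^|atoms| letters is only viable for a handful of atoms. A practical implementation would explore edits lazily.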

Improvable Gap Balancing for Multi-Task Learning

  • paper_url: http://arxiv.org/abs/2307.15429
  • repo_url: https://github.com/yanqidai/igb4mtl
  • paper_authors: Yanqi Dai, Nanyi Fei, Zhiwu Lu
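  • for: This paper revisits loss balancing for multi-task learning (MTL), which is more efficient than gradient balancing. It takes into account that tasks differ in their improvable gaps.
  • methods: The paper proposes two novel improvable gap balancing (IGB) algorithms. One uses a simple heuristic, and the other deploys deep reinforcement learning for MTL. Rather than balancing the losses directly, both dynamically assign task weights to balance the improvable gaps.
  • results: Experiments on two benchmark datasets show that the IGB algorithms achieve the best results among loss-balancing methods in MTL. They gain further improvements when combined with gradient balancing.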
    Abstract In multi-task learning (MTL), gradient balancing has recently attracted more research interest than loss balancing since it often leads to better performance. However, loss balancing is much more efficient than gradient balancing, and thus it is still worth further exploration in MTL. Note that prior studies typically ignore that there exist varying improvable gaps across multiple tasks, where the improvable gap per task is defined as the distance between the current training progress and desired final training progress. Therefore, after loss balancing, the performance imbalance still arises in many cases. In this paper, following the loss balancing framework, we propose two novel improvable gap balancing (IGB) algorithms for MTL: one takes a simple heuristic, and the other (for the first time) deploys deep reinforcement learning for MTL. Particularly, instead of directly balancing the losses in MTL, both algorithms choose to dynamically assign task weights for improvable gap balancing. Moreover, we combine IGB and gradient balancing to show the complementarity between the two types of algorithms. Extensive experiments on two benchmark datasets demonstrate that our IGB algorithms lead to the best results in MTL via loss balancing and achieve further improvements when combined with gradient balancing. Code is available at https://github.com/YanqiDai/IGB4MTL.
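The abstract defines a task's improvable gap as the distance between its current and desired final training progress, but gives no formula. The sketch below is a hypothetical heuristic in that spirit:
- the gap is the current loss minus a per-task target loss;
- task weights are proportional to the gaps, normalized to sum to the number of tasks.

The target losses and the normalization are assumptions. This is not either of the paper's IGB algorithms.

```python
def igb_weights(current_losses, target_losses, eps=1e-8):
    """Hypothetical improvable-gap weights: larger remaining gap -> larger task weight."""
    gaps = [max(c - t, 0.0) for c, t in zip(current_losses, target_losses)]
    k, total = len(gaps), sum(gaps)
    if total < eps:
        return [1.0] * k  # every task has reached its target: fall back to uniform weights
    return [k * g / total for g in gaps]  # weights sum to k, keeping the loss scale stable


def balanced_loss(losses, weights):
    """Weighted sum used as the training objective at this step."""
    return sum(w * l for w, l in zip(weights, losses))
```

In a training loop, the weights would be recomputed each step (or epoch) from the latest per-task losses. They would be treated as constants when back-propagating through the weighted loss.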

A Critical Review of Large Language Models: Sensitivity, Bias, and the Path Toward Specialized AI

  • paper_url: http://arxiv.org/abs/2307.15425
  • repo_url: None
  • paper_authors: Arash Hajikhani, Carolyn Cole
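  • for: This study compares how effectively a specialized compiled language model and a general-purpose model, such as OpenAI's GPT-3.5, detect Sustainable Development Goals (SDGs) in text data.
  • methods: The paper presents a critical review of large language models (LLMs), addressing challenges related to bias and sensitivity. It underlines the need for specialized training for precise, unbiased analysis. A case study on a dataset of company descriptions compares GPT-3.5 with the specialized SDG detection model.
  • results: GPT-3.5 offers broader coverage but may identify SDGs with limited relevance to the companies' activities, whereas the specialized model zeroes in on highly pertinent SDGs. The authors recommend selecting models according to task requirements, cost, complexity, and transparency.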
    Abstract This paper examines the comparative effectiveness of a specialized compiled language model and a general-purpose model like OpenAI's GPT-3.5 in detecting SDGs within text data. It presents a critical review of Large Language Models (LLMs), addressing challenges related to bias and sensitivity. The necessity of specialized training for precise, unbiased analysis is underlined. A case study using a company descriptions dataset offers insight into the differences between the GPT-3.5 and the specialized SDG detection model. While GPT-3.5 boasts broader coverage, it may identify SDGs with limited relevance to the companies' activities. In contrast, the specialized model zeroes in on highly pertinent SDGs. The importance of thoughtful model selection is emphasized, taking into account task requirements, cost, complexity, and transparency. Despite the versatility of LLMs, the use of specialized models is suggested for tasks demanding precision and accuracy. The study concludes by encouraging further research to find a balance between the capabilities of LLMs and the need for domain-specific expertise and interpretability.

Improving Social Media Popularity Prediction with Multiple Post Dependencies

  • paper_url: http://arxiv.org/abs/2307.15413
  • repo_url: None
  • paper_authors: Zhizhen Zhang, Xiaohui Xie, Mengyu Yang, Ye Tian, Yong Jiang, Yong Cui
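  • for: Predicting the popularity of social media posts, which matters for applications such as recommendation systems and multimedia advertising.
  • methods: The paper proposes the Dependency-aware Sequence Network (DSN), a prediction framework that exploits both intra-post and inter-post dependencies to extract content information from posts more fully.
  • results: Experiments on the Social Media Popularity Dataset show that DSN outperforms existing state-of-the-art models.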
    Abstract Social Media Popularity Prediction has drawn a lot of attention because of its profound impact on many different applications, such as recommendation systems and multimedia advertising. Despite recent efforts to leverage the content of social media posts to improve prediction accuracy, many existing models fail to fully exploit the multiple dependencies between posts, which are important to comprehensively extract content information from posts. To tackle this problem, we propose a novel prediction framework named Dependency-aware Sequence Network (DSN) that exploits both intra- and inter-post dependencies. For intra-post dependency, DSN adopts a multimodal feature extractor with an efficient fine-tuning strategy to obtain task-specific representations from images and textual information of posts. For inter-post dependency, DSN uses a hierarchical information propagation method to learn category representations that could better describe the difference between posts. DSN also exploits recurrent networks with a series of gating layers for more flexible local temporal processing abilities and multi-head attention for long-term dependencies. The experimental results on the Social Media Popularity Dataset demonstrate the superiority of our method compared to existing state-of-the-art models.
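As a generic illustration of the "multi-head attention for long-term dependencies" component, the sketch below applies causal multi-head self-attention to a time-ordered sequence of post feature vectors. Each post attends only to itself and to earlier posts. This is not DSN's actual architecture.

Assumptions: random projection matrices stand in for learned weights, and the shapes and the causal mask are illustrative choices.

```python
import numpy as np


def causal_multihead_attention(x, n_heads, rng):
    """x: (T, d) post features ordered by time. Returns (T, d) context vectors."""
    T, d = x.shape
    assert d % n_heads == 0, "feature size must split evenly across heads"
    dh = d // n_heads
    # Random projections stand in for learned query/key/value/output weights.
    wq, wk, wv, wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split(m):  # (T, d) -> (heads, T, dh)
        return m.reshape(T, n_heads, dh).transpose(1, 0, 2)

    query, key, value = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = query @ key.transpose(0, 2, 1) / np.sqrt(dh)    # (heads, T, T)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)       # True where the key is a later post
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)             # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = (attn @ value).transpose(1, 0, 2).reshape(T, d)    # concatenate heads
    return out @ wo
```

The causal mask means changing a later post never alters the context vectors of earlier posts.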

Agent-Based Model: Simulating a Virus Expansion Based on the Acceptance of Containment Measures

  • paper_url: http://arxiv.org/abs/2307.15723
  • repo_url: None
  • paper_authors: Alejandro Rodríguez-Arias, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas, Noelia Sánchez-Marroño
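  • for: This study proposes an agent-based modeling (ABM) architecture to analyze how a virus spreads in a society, depending on whether citizens accept containment measures.
  • methods: The architecture combines an adapted SEIRD (Susceptible-Exposed-Infected-Recovered-Dead) model with a decision-making model for citizens. It is illustrated on the progression of SARS-CoV-2 infections in A Coruña, Spain.
  • results: The approach makes it possible to analyze how individual citizens' behavior in accepting or rejecting public health measures affects the spread of the virus.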
    Abstract Compartmental epidemiological models categorize individuals based on their disease status, such as the SEIRD model (Susceptible-Exposed-Infected-Recovered-Dead). These models determine the parameters that influence the magnitude of an outbreak, such as contagion and recovery rates. However, they don't account for individual characteristics or population actions, which are crucial for assessing mitigation strategies like mask usage in COVID-19 or condom distribution in HIV. Additionally, studies highlight the role of citizen solidarity, interpersonal trust, and government credibility in explaining differences in contagion rates between countries. Agent-Based Modeling (ABM) offers a valuable approach to study complex systems by simulating individual components, their actions, and interactions within an environment. ABM provides a useful tool for analyzing social phenomena. In this paper, we propose an ABM architecture that allows us to analyze the evolution of virus infections in a society based on two components: 1) an adaptation of the SEIRD model and 2) a decision-making model for citizens. In this way, the evolution of infections is affected, in addition to the spread of the virus itself, by individual behavior when accepting or rejecting public health measures. We illustrate the designed model by examining the progression of SARS-CoV-2 infections in A Coruña, Spain. This approach makes it possible to analyze the effect of the individual actions of citizens during an epidemic on the spread of the virus.
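A minimal agent-based sketch of the idea: each agent moves through the SEIRD states, and a simple decision model determines whether the agent complies with a containment measure that scales transmission. The decision model is assumed here to be a fixed per-agent acceptance probability.

All parameter values and the random-mixing contact model are illustrative assumptions. They are not calibrated to A Coruña or taken from the paper.

```python
import random


def simulate_seird(n_agents=500, days=60, contacts=8, p_transmit=0.05,
                   compliance_factor=0.5, acceptance=0.6, incubation_days=3,
                   infectious_days=7, p_death=0.01, n_seed=5, seed=0):
    """Agent-based SEIRD with a toy decision model; returns daily state counts."""
    rng = random.Random(seed)
    state = ["S"] * n_agents
    timer = [0] * n_agents  # days left in the current E or I state
    # Toy decision model: each citizen accepts the containment measure with a fixed probability.
    complies = [rng.random() < acceptance for _ in range(n_agents)]
    for i in rng.sample(range(n_agents), n_seed):
        state[i], timer[i] = "I", infectious_days

    history = []
    for _ in range(days):
        # 1) Transmission: each infectious agent meets random contacts.
        newly_exposed = set()
        for i in [a for a in range(n_agents) if state[a] == "I"]:
            for j in rng.sample(range(n_agents), contacts):
                if state[j] != "S":
                    continue
                p = p_transmit
                if complies[i]:
                    p *= compliance_factor  # the measure reduces outgoing transmission
                if complies[j]:
                    p *= compliance_factor  # and incoming transmission
                if rng.random() < p:
                    newly_exposed.add(j)
        # 2) Disease progression for agents already in E or I.
        for a in range(n_agents):
            if state[a] in ("E", "I"):
                timer[a] -= 1
                if timer[a] == 0:
                    if state[a] == "E":
                        state[a], timer[a] = "I", infectious_days
                    else:
                        state[a] = "D" if rng.random() < p_death else "R"
        # 3) New exposures start their incubation period.
        for j in newly_exposed:
            state[j], timer[j] = "E", incubation_days
        history.append({s: state.count(s) for s in "SEIRD"})
    return history
```

One way to explore how individual acceptance of measures shapes the epidemic curve is to compare runs with acceptance=0.0 and acceptance=1.0 under the same seed.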

Co-attention Graph Pooling for Efficient Pairwise Graph Interaction Learning

  • paper_url: http://arxiv.org/abs/2307.15377
  • repo_url: https://github.com/leejunhyun/coattentiongraphpooling
  • paper_authors: Junhyun Lee, Bumsoo Kim, Minji Jeon, Jaewoo Kang
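  • for: Learning the interaction between pairs of graph-structured inputs, as required by applications such as scene graph matching, code searching, and drug-drug interaction prediction.
  • methods: Co-Attention Graph Pooling (CAGPool) extracts graph-level interaction representations using co-attention in graph pooling, instead of modeling interactions at the node level.
  • results: On real-world datasets, CAGPool achieves competitive performance relative to existing methods in both classification and regression tasks while maintaining lower computational complexity.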
    Abstract Graph Neural Networks (GNNs) have proven to be effective in processing and learning from graph-structured data. However, previous works mainly focused on understanding single graph inputs while many real-world applications require pair-wise analysis for graph-structured data (e.g., scene graph matching, code searching, and drug-drug interaction prediction). To this end, recent works have shifted their focus to learning the interaction between pairs of graphs. Despite their improved performance, these works were still limited in that the interactions were considered at the node-level, resulting in high computational costs and suboptimal performance. To address this issue, we propose a novel and efficient graph-level approach for extracting interaction representations using co-attention in graph pooling. Our method, Co-Attention Graph Pooling (CAGPool), exhibits competitive performance relative to existing methods in both classification and regression tasks using real-world datasets, while maintaining lower computational complexity.
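The abstract describes co-attention inside graph pooling without giving equations. The NumPy sketch below is a hypothetical simplification:
- each node is scored by its average scaled dot-product affinity to the other graph's nodes;
- the top-scoring fraction of nodes in each graph is kept and gated by tanh(score);
- a pair-level interaction vector is read out.

Node embeddings are assumed to come from an upstream GNN. The scoring rule, ratio, and readout are assumptions.

```python
import numpy as np


def co_attention_pool(h1, h2, ratio=0.5):
    """h1: (n1, d) and h2: (n2, d) node embeddings of the two graphs in a pair."""
    d = h1.shape[1]
    affinity = h1 @ h2.T / np.sqrt(d)   # (n1, n2) cross-graph node affinities
    score1 = affinity.mean(axis=1)      # how strongly each node of G1 relates to G2
    score2 = affinity.mean(axis=0)      # and each node of G2 to G1

    def pool(h, score):
        k = max(1, int(np.ceil(ratio * len(score))))
        idx = np.argsort(-score)[:k]                     # keep the top-k nodes
        gated = h[idx] * np.tanh(score[idx])[:, None]    # gate kept nodes by their score
        return gated.mean(axis=0), idx                   # mean readout -> graph vector

    g1, idx1 = pool(h1, score1)
    g2, idx2 = pool(h2, score2)
    # Pair-level interaction vector, e.g. for a downstream classifier or regressor.
    return np.concatenate([g1, g2, g1 * g2]), idx1, idx2
```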

Confident Feature Ranking

  • paper_url: http://arxiv.org/abs/2307.15361
  • repo_url: None
  • paper_authors: Bitya Neuhof, Yuval Benjamini
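  • for: Feature importance values are often interpreted through their ranking, which can be unstable when the importance values are computed from small samples. This work proposes that post-hoc importance methods report a ranking together with simultaneous confidence intervals for the ranks.
  • methods: The method is based on pairwise comparisons of the feature importance values.
  • results: The method is guaranteed to include the "true" (infinite-sample) ranking with high probability, and it allows selecting top-k sets.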
    Abstract Interpretation of feature importance values often relies on the relative order of the features rather than on the value itself, referred to as ranking. However, the order may be unstable due to the small sample sizes used in calculating the importance values. We propose that post-hoc importance methods produce a ranking and simultaneous confidence intervals for the rankings. Based on pairwise comparisons of the feature importance values, our method is guaranteed to include the "true" (infinite sample) ranking with high probability and allows for selecting top-k sets.
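A minimal sketch of the general idea. These are our assumptions, not the paper's exact procedure:
- importance values are observed for every feature on the same n replicates (for example, n bootstrap resamples);
- each pair of features is compared with a two-sided z-test on the paired differences (a simplifying normal approximation);
- a Bonferroni correction over all pairs makes the comparisons simultaneous.

A feature's rank interval then runs from 1 plus the number of features confidently above it, to p minus the number of features confidently below it.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def rank_intervals(samples, alpha=0.05):
    """samples[i][r]: importance of feature i on replicate r (paired across features).

    Returns one (best_rank, worst_rank) pair per feature; rank 1 = most important."""
    p = len(samples)
    if p < 2:
        return [(1, 1)] * p
    n = len(samples[0])
    n_pairs = p * (p - 1) // 2
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * n_pairs))  # Bonferroni, two-sided
    above = [[False] * p for _ in range(p)]  # above[i][j]: i confidently more important than j
    for i in range(p):
        for j in range(i + 1, p):
            diffs = [a - b for a, b in zip(samples[i], samples[j])]
            d_bar, se = mean(diffs), stdev(diffs) / sqrt(n)
            if se > 0:
                z = d_bar / se
            else:
                z = 0.0 if d_bar == 0 else (float("inf") if d_bar > 0 else float("-inf"))
            if z > z_crit:
                above[i][j] = True
            elif z < -z_crit:
                above[j][i] = True
    bounds = []
    for i in range(p):
        n_better = sum(above[j][i] for j in range(p))  # features confidently above i
        n_worse = sum(above[i][j] for j in range(p))   # features confidently below i
        bounds.append((1 + n_better, p - n_worse))
    return bounds
```

A top-k set can then be read off as the features whose worst rank is at most k.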

Med-HALT: Medical Domain Hallucination Test for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.15343
  • repo_url: None
  • paper_authors: Logesh Kumar Umapathi, Ankit Pal, Malaikannan Sankarasubbu
  • for: The paper is written to address the challenges of hallucinations in large language models (LLMs) in the medical domain, and to propose a new benchmark and dataset (Med-HALT) to evaluate and reduce hallucinations.
  • methods: The paper proposes a new benchmark and dataset (Med-HALT) that includes reasoning and memory-based hallucination tests to assess LLMs’ problem-solving and information retrieval abilities.
  • results: The study evaluates leading LLMs, including Text Davinci, GPT-3.5, LlaMa-2, MPT, and Falcon, and reveals significant differences in their performance. The paper provides detailed insights into the dataset, promoting transparency and reproducibility.
    Abstract This research paper focuses on the challenges posed by hallucinations in large language models (LLMs), particularly in the context of the medical domain. Hallucination, wherein these models generate plausible yet unverified or incorrect information, can have serious consequences in healthcare applications. We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), designed specifically to evaluate and reduce hallucinations. Med-HALT provides a diverse multinational dataset derived from medical examinations across various countries and includes multiple innovative testing modalities. Med-HALT includes two categories of tests, reasoning-based and memory-based hallucination tests, designed to assess LLMs' problem-solving and information retrieval abilities. Our study evaluated leading LLMs, including Text Davinci, GPT-3.5, LlaMa-2, MPT, and Falcon, revealing significant differences in their performance. The paper provides detailed insights into the dataset, promoting transparency and reproducibility. Through this work, we aim to contribute to the development of safer and more reliable language models in healthcare. Our benchmark can be found at medhalt.github.io

Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding

  • paper_url: http://arxiv.org/abs/2307.15337
  • repo_url: None
  • paper_authors: Xuefei Ning, Zinan Lin, Zixuan Zhou, Huazhong Yang, Yu Wang
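  • for: Reducing the end-to-end generation latency of large language models (LLMs), which largely stems from the sequential decoding used by almost all state-of-the-art LLMs.
  • methods: The paper proposes Skeleton-of-Thought (SoT), which guides the LLM to first generate the skeleton of the answer. It then completes the content of each skeleton point in parallel through parallel API calls or batched decoding.
  • results: Across 11 different LLMs, SoT yields considerable speed-ups (up to 2.39x). It can also potentially improve answer quality on several question categories in terms of diversity and relevance.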
    Abstract This work aims at decreasing the end-to-end generation latency of large language models (LLMs). One of the major causes of the high generation latency is the sequential decoding approach adopted by almost all state-of-the-art LLMs. In this work, motivated by the thinking and writing process of humans, we propose "Skeleton-of-Thought" (SoT), which guides LLMs to first generate the skeleton of the answer, and then conducts parallel API calls or batched decoding to complete the contents of each skeleton point in parallel. Not only does SoT provide considerable speed-up (up to 2.39x across 11 different LLMs), but it can also potentially improve the answer quality on several question categories in terms of diversity and relevance. SoT is an initial attempt at data-centric optimization for efficiency, and reveals the potential of pushing LLMs to think more like a human for answer quality.
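The two-stage flow can be sketched with a thread pool standing in for parallel API calls. Here `llm` is any callable that maps a prompt to a completion, a hypothetical stand-in for a real client. The prompt templates and the skeleton parser are illustrative assumptions rather than the paper's prompts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical prompt templates (not the paper's).
SKELETON_PROMPT = ("Give only a short numbered skeleton (3-5 points, a few words each) "
                   "for answering the question.\nQuestion: {question}\nSkeleton:")
EXPAND_PROMPT = ("Question: {question}\nSkeleton:\n{skeleton}\n"
                 "Expand only point {index} ('{point}') in 1-2 sentences.")


def parse_skeleton(text: str) -> list[str]:
    """Extract point texts from lines such as '1. Cause'."""
    points = []
    for line in text.splitlines():
        head, sep, rest = line.strip().partition(".")
        if sep and head.isdigit() and rest.strip():
            points.append(rest.strip())
    return points


def skeleton_of_thought(question: str, llm: Callable[[str], str], max_workers: int = 8) -> str:
    skeleton = llm(SKELETON_PROMPT.format(question=question))  # stage 1: one sequential call
    points = parse_skeleton(skeleton)
    prompts = [EXPAND_PROMPT.format(question=question, skeleton=skeleton, index=i, point=p)
               for i, p in enumerate(points, start=1)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:  # stage 2: expand points in parallel
        expansions = list(pool.map(llm, prompts))              # map preserves point order
    return "\n".join(f"{i}. {p}: {e}"
                     for i, (p, e) in enumerate(zip(points, expansions), start=1))
```

Because the stage-2 requests are independent, latency is roughly the skeleton call plus the slowest expansion, rather than the sum of all expansions. The trade-off is that points cannot build on one another's content.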

Tutorials on Stance Detection using Pre-trained Language Models: Fine-tuning BERT and Prompting Large Language Models

  • paper_url: http://arxiv.org/abs/2307.15331
  • repo_url: None
  • paper_authors: Yun-Shiuan Chuang
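  • for: This paper provides two self-contained tutorials on stance detection in Twitter data. One uses BERT fine-tuning and the other prompts large language models (LLMs).
  • methods:
    - The first tutorial explains BERT architecture and tokenization. It guides users through training, tuning, and evaluating standard and domain-specific BERT models with HuggingFace transformers.
    - The second constructs prompts and few-shot examples to elicit stances from ChatGPT and the open-source FLAN-T5 without fine-tuning.
  • results: Various prompting strategies are evaluated using confusion matrices and macro F1 scores. Few-shot ChatGPT and FLAN-T5 outperform the fine-tuned BERT models.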
    Abstract This paper presents two self-contained tutorials on stance detection in Twitter data using BERT fine-tuning and prompting large language models (LLMs). The first tutorial explains BERT architecture and tokenization, guiding users through training, tuning, and evaluating standard and domain-specific BERT models with HuggingFace transformers. The second focuses on constructing prompts and few-shot examples to elicit stances from ChatGPT and open-source FLAN-T5 without fine-tuning. Various prompting strategies are implemented and evaluated using confusion matrices and macro F1 scores. The tutorials provide code, visualizations, and insights revealing the strengths of few-shot ChatGPT and FLAN-T5 which outperform fine-tuned BERTs. By covering both model fine-tuning and prompting-based techniques in an accessible, hands-on manner, these tutorials enable learners to gain applied experience with cutting-edge methods for stance detection.
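The evaluation metrics mentioned above can be computed from scratch as follows. Per-class F1 is 2TP / (2TP + FP + FN), and macro F1 is the unweighted mean over classes, so rare stances count as much as common ones. The stance labels used in the example are illustrative.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows: true stance; columns: predicted stance."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    m = confusion_matrix(y_true, y_pred, labels)
    scores = []
    for k in range(len(labels)):
        tp = m[k][k]
        fp = sum(row[k] for row in m) - tp   # predicted k, but the true class differs
        fn = sum(m[k]) - tp                  # true class k, but predicted otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


labels = ["FAVOR", "AGAINST", "NONE"]
y_true = ["FAVOR", "FAVOR", "AGAINST", "NONE"]
y_pred = ["FAVOR", "AGAINST", "AGAINST", "NONE"]
```

With these four examples, FAVOR and AGAINST each score 2/3 and NONE scores 1, giving a macro F1 of 7/9. scikit-learn's `f1_score(y_true, y_pred, average="macro")` should agree on inputs where every class appears.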

Robust Visual Sim-to-Real Transfer for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2307.15320
  • repo_url: None
  • paper_authors: Ricardo Garcia, Robin Strudel, Shizhe Chen, Etienne Arlaud, Ivan Laptev, Cordelia Schmid
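  • for: This paper studies learning visuomotor policies in simulation, which is safer and cheaper than real-world training, and transferring them to real robots for manipulation.
  • methods: To bridge the visual sim-to-real domain gap, the authors systematically explore visual domain randomization (DR). They use an off-line proxy task of cube localization to select DR parameters for texture randomization, lighting randomization, object color variations, and camera parameters.
  • results:
    - DR parameters have a similar impact on the off-line proxy task and on on-line policies.
    - Policies are trained in simulation with the optimized DR parameters and applied directly to a real robot. They achieve a 93% average success rate on a diverse set of challenging manipulation tasks.
    - Under visual variations in real scenes, these policies outperform policies learned from real but limited data.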
    Abstract Learning visuomotor policies in simulation is much safer and cheaper than in the real world. However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots. One common approach to bridge the visual sim-to-real domain gap is domain randomization (DR). While previous work mainly evaluates DR for disembodied tasks, such as pose estimation and object detection, here we systematically explore visual domain randomization methods and benchmark them on a rich set of challenging robotic manipulation tasks. In particular, we propose an off-line proxy task of cube localization to select DR parameters for texture randomization, lighting randomization, variations of object colors and camera parameters. Notably, we demonstrate that DR parameters have similar impact on our off-line proxy task and on-line policies. We, hence, use off-line optimized DR parameters to train visuomotor policies in simulation and directly apply such policies to a real robot. Our approach achieves 93% success rate on average when tested on a diverse set of challenging manipulation tasks. Moreover, we evaluate the robustness of policies to visual variations in real scenes and show that our simulator-trained policies outperform policies learned using real but limited data. Code, simulation environment, real robot datasets and trained models are available at https://www.di.ens.fr/willow/research/robust_s2r/.
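A minimal sketch of per-episode visual domain randomization and proxy-based selection. The parameter families come from the abstract: texture, lighting, object color, and camera. The ranges, field names, and the `proxy_error` callable are hypothetical, and no particular simulator API is assumed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DRConfig:
    """Hypothetical ranges for visual domain randomization."""
    n_textures: int = 1000              # size of a texture bank
    light_intensity: tuple = (0.5, 1.5) # relative intensity
    color_jitter: float = 0.1           # max per-channel RGB offset
    cam_pos_std: float = 0.02           # camera position noise (metres)
    cam_fov: tuple = (40.0, 50.0)       # field of view (degrees)


def sample_scene(cfg, base_rgb, rng):
    """Draw one randomized rendering configuration for a simulated episode."""
    return {
        "texture_id": rng.randrange(cfg.n_textures),
        "light_intensity": rng.uniform(*cfg.light_intensity),
        "light_direction": [rng.uniform(-1.0, 1.0) for _ in range(3)],
        "object_rgb": [min(1.0, max(0.0, c + rng.uniform(-cfg.color_jitter, cfg.color_jitter)))
                       for c in base_rgb],
        "camera_offset": [rng.gauss(0.0, cfg.cam_pos_std) for _ in range(3)],
        "camera_fov": rng.uniform(*cfg.cam_fov),
    }


def select_config(candidates, proxy_error):
    """Keep the config whose proxy-task (e.g. cube localization) error is lowest.

    `proxy_error` is a hypothetical callable: train a localizer on images rendered
    with the config and return its error on held-out real images."""
    return min(candidates, key=proxy_error)
```

Selecting on a cheap proxy task and only then training policies mirrors the workflow described in the abstract. The exact search over configurations is left open here.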

Beyond Reality: The Pivotal Role of Generative AI in the Metaverse

  • paper_url: http://arxiv.org/abs/2308.06272
  • repo_url: None
  • paper_authors: Vinay Chamola, Gaurang Bansal, Tridib Kumar Das, Vikas Hassija, Naga Siva Sai Reddy, Jiacheng Wang, Sherali Zeadally, Amir Hussain, F. Richard Yu, Mohsen Guizani, Dusit Niyato
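  • for: This paper explores how generative AI technologies are shaping the Metaverse into a dynamic, immersive, and interactive virtual world.
  • methods: It surveys three families of generative models:
    - text generation models such as ChatGPT and GPT-3;
    - image generation models such as DALL-E and MidJourney;
    - 3D model generation technologies such as Point-E and Lumirithmic.
  • results: The paper discusses how these technologies can be applied in the Metaverse. It also addresses the associated challenges and ethical considerations, including the balance between user control and AI automation.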
    Abstract Imagine stepping into a virtual world that's as rich, dynamic, and interactive as our physical one. This is the promise of the Metaverse, and it's being brought to life by the transformative power of Generative Artificial Intelligence (AI). This paper offers a comprehensive exploration of how generative AI technologies are shaping the Metaverse, transforming it into a dynamic, immersive, and interactive virtual world. We delve into the applications of text generation models like ChatGPT and GPT-3, which are enhancing conversational interfaces with AI-generated characters. We explore the role of image generation models such as DALL-E and MidJourney in creating visually stunning and diverse content. We also examine the potential of 3D model generation technologies like Point-E and Lumirithmic in creating realistic virtual objects that enrich the Metaverse experience. But the journey doesn't stop there. We also address the challenges and ethical considerations of implementing these technologies in the Metaverse, offering insights into the balance between user control and AI automation. This paper is not just a study, but a guide to the future of the Metaverse, offering readers a roadmap to harnessing the power of generative AI in creating immersive virtual worlds.

DiffKendall: A Novel Approach for Few-Shot Learning with Differentiable Kendall’s Rank Correlation

  • paper_url: http://arxiv.org/abs/2307.15317
  • repo_url: None
  • paper_authors: Kaipeng Zheng, Huishuai Zhang, Weiran Huang
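  • for: This paper addresses few-shot learning, in which models trained on a base dataset must adapt to novel tasks whose categories were not seen during training.
  • methods: Instead of geometric similarity metrics such as cosine similarity and negative Euclidean distance, the paper measures feature similarity with Kendall's rank correlation over feature channels. It also proposes a carefully designed differentiable loss for meta-training, which addresses the non-differentiability of Kendall's rank correlation.
  • results: Replacing the geometric similarity metric with Kendall's rank correlation only at inference time already improves few-shot performance across a wide range of datasets from different domains. The full rank-correlation-based approach substantially enhances few-shot learning performance.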
    Abstract Few-shot learning aims to adapt models trained on the base dataset to novel tasks where the categories are not seen by the model before. This often leads to a relatively uniform distribution of feature values across channels on novel classes, posing challenges in determining channel importance for novel tasks. Standard few-shot learning methods employ geometric similarity metrics such as cosine similarity and negative Euclidean distance to gauge the semantic relatedness between two features. However, features with high geometric similarities may carry distinct semantics, especially in the context of few-shot learning. In this paper, we demonstrate that the importance ranking of feature channels is a more reliable indicator for few-shot learning than geometric similarity metrics. We observe that replacing the geometric similarity metric with Kendall's rank correlation only during inference is able to improve the performance of few-shot learning across a wide range of datasets with different domains. Furthermore, we propose a carefully designed differentiable loss for meta-training to address the non-differentiability issue of Kendall's rank correlation. Extensive experiments demonstrate that the proposed rank-correlation-based approach substantially enhances few-shot learning performance.
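Kendall's rank correlation (tau-a, ignoring ties) between two feature vectors x and y of length n is

tau = 2/(n(n-1)) * sum over i<j of sign(x_i - x_j) * sign(y_i - y_j)

It compares the ordering of channels rather than their magnitudes. Because sign has zero gradient almost everywhere, the sketch also shows one common smooth surrogate that replaces sign with tanh(alpha * .). Treat this surrogate, the value of alpha, and the nearest-prototype classifier as assumptions rather than the paper's exact loss.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a between two 1-D feature vectors (no tie correction)."""
    n = len(x)
    dx = np.sign(x[:, None] - x[None, :])  # pairwise orderings of channels in x
    dy = np.sign(y[:, None] - y[None, :])
    # Summing over all ordered pairs i != j counts each unordered pair twice.
    return float((dx * dy).sum() / (n * (n - 1)))


def soft_kendall_tau(x, y, alpha=10.0):
    """Differentiable surrogate: sign(.) replaced by tanh(alpha * .)."""
    n = len(x)
    dx = np.tanh(alpha * (x[:, None] - x[None, :]))
    dy = np.tanh(alpha * (y[:, None] - y[None, :]))
    return float((dx * dy).sum() / (n * (n - 1)))


def predict(query, prototypes):
    """Nearest prototype by rank correlation instead of cosine similarity."""
    return int(np.argmax([kendall_tau(query, p) for p in prototypes]))
```

Larger alpha makes the surrogate closer to the exact tau but gives sharper, less informative gradients. Choosing alpha is therefore a trade-off.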
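    Summary Few-shot learning aims to adapt models trained on a base dataset to novel tasks whose categories the model has not seen during training. This often produces a relatively uniform distribution of feature values across channels for novel classes, which makes it harder to determine channel importance. Standard few-shot learning methods typically use geometric similarity metrics, such as cosine similarity and negative Euclidean distance, to measure the semantic relatedness between two features. However, features with high geometric similarity may carry different semantics, especially in few-shot learning. In this paper, we show that the importance ranking of feature channels is a more reliable indicator for few-shot learning than geometric similarity metrics. We find that replacing the geometric similarity metric with Kendall's rank correlation at inference time improves few-shot learning performance on datasets from different domains. We further propose a carefully designed differentiable loss for meta-training to address the non-differentiability of Kendall's rank correlation. Extensive experiments show that our rank-correlation-based approach substantially improves few-shot learning performance.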
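A minimal sketch of the contrast the abstract draws, assuming features are plain NumPy vectors. Kendall's tau-a averages concordant (+1) and discordant (-1) channel pairs. The smooth variant replaces `sign` with `tanh(alpha * .)`, a common differentiable surrogate. The `alpha` value and the tanh relaxation are illustrative assumptions, not necessarily the loss used in the paper.

```python
import numpy as np


def cosine(x: np.ndarray, y: np.ndarray) -> float:
    """Geometric similarity used by standard few-shot methods."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def _pair_mean(dx: np.ndarray, dy: np.ndarray) -> float:
    # Average the product over all unordered channel pairs (i < j).
    iu = np.triu_indices(dx.shape[0], k=1)
    return float(np.mean(dx[iu] * dy[iu]))


def kendall_tau(x: np.ndarray, y: np.ndarray) -> float:
    """Exact Kendall tau-a: +1 per concordant channel pair, -1 per discordant pair."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    return _pair_mean(dx, dy)


def soft_kendall_tau(x: np.ndarray, y: np.ndarray, alpha: float = 10.0) -> float:
    """Smooth surrogate: sign(d) is replaced by tanh(alpha * d), so gradients exist."""
    dx = np.tanh(alpha * (x[:, None] - x[None, :]))
    dy = np.tanh(alpha * (y[:, None] - y[None, :]))
    return _pair_mean(dx, dy)


# Two near-uniform features whose channel rankings are exactly reversed.
a = np.array([1.0, 1.1, 1.2, 1.3])
b = np.array([1.3, 1.2, 1.1, 1.0])
print(cosine(a, b))       # high geometric similarity (about 0.98)
print(kendall_tau(a, b))  # -1.0: the channel rankings disagree completely
```

The example mirrors the abstract's point: near-uniform features can look almost identical to cosine similarity while ranking their channels in opposite order.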

Efficient Multiuser AI Downloading via Reusable Knowledge Broadcasting

  • paper_url: http://arxiv.org/abs/2307.15316
  • repo_url: None
  • paper_authors: Hai Wu, Qunsong Zeng, Kaibin Huang
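  • for: The paper addresses in-situ downloading of AI models to edge devices in sixth-generation (6G) mobile networks, aiming to reduce the communication overhead on wireless links.
  • methods: The paper proposes the Model Broadcasting and Assembling (MBA) framework, which leverages reusable knowledge (parameters shared among tasks) to enable parameter broadcasting and so reduce communication overhead. MBA has two key components: the MBA protocol, and the joint design of parameter selection and power control (PS-PC).
  • results: Compared with traditional model downloading, MBA substantially reduces downloading latency while guaranteeing devices' model performance.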
    Abstract For the 6G mobile networks, in-situ model downloading has emerged as an important use case to enable real-time adaptive artificial intelligence on edge devices. However, the simultaneous downloading of diverse and high-dimensional models to multiple devices over wireless links presents a significant communication bottleneck. To overcome the bottleneck, we propose the framework of model broadcasting and assembling (MBA), which represents the first attempt on leveraging reusable knowledge, referring to shared parameters among tasks, to enable parameter broadcasting to reduce communication overhead. The MBA framework comprises two key components. The first, the MBA protocol, defines the system operations including parameter selection from a model library, power control for broadcasting, and model assembling at devices. The second component is the joint design of parameter-selection-and-power-control (PS-PC), which provides guarantees on devices' model performance and minimizes the downloading latency. The corresponding optimization problem is simplified by decomposition into the sequential PS and PC sub-problems without compromising its optimality. The PS sub-problem is solved efficiently by designing two efficient algorithms. On one hand, the low-complexity algorithm of greedy parameter selection features the construction of candidate model sets and a selection metric, both of which are designed under the criterion of maximum reusable knowledge among tasks. On the other hand, the optimal tree-search algorithm gains its efficiency via the proposed construction of a compact binary tree pruned using model architecture constraints and an intelligent branch-and-bound search. Given optimal PS, the optimal PC policy is derived in closed form. Extensive experiments demonstrate the substantial reduction in downloading latency achieved by the proposed MBA compared to traditional model downloading.
    Summary The MBA framework consists of two key components: (1) the MBA protocol, which defines the system operations, including parameter selection from a model library, power control for broadcasting, and model assembling at devices; and (2) the joint design of parameter selection and power control (PS-PC), which guarantees devices' model performance and minimizes downloading latency. The optimization problem is simplified by decomposing it into sequential PS and PC sub-problems without compromising optimality. The PS sub-problem is solved efficiently with two algorithms. The first, greedy parameter selection, constructs candidate model sets and a selection metric, both designed under the criterion of maximum reusable knowledge among tasks. The second, an optimal tree-search algorithm, gains its efficiency from a compact binary tree pruned using model architecture constraints and an intelligent branch-and-bound search. Given the optimal PS, the optimal PC policy is derived in closed form. Extensive experiments show that MBA substantially reduces downloading latency compared with traditional model downloading.
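The sketch below illustrates the "maximum reusable knowledge" idea behind greedy parameter selection, under strong simplifying assumptions:

- each candidate model is a set of equal-sized parameter-block IDs;
- the broadcast cost is the number of distinct blocks sent;
- power control and the PS-PC performance guarantees are ignored.

The block names and the device-by-device processing order are hypothetical. The paper's actual candidate sets and selection metric may differ.

```python
from typing import Dict, List, Set, Tuple


def greedy_select(candidates: Dict[str, List[Set[str]]]) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """For each device (in dict order), pick the candidate model that adds the
    fewest new parameter blocks to the broadcast set, i.e. reuses the most."""
    broadcast: Set[str] = set()
    choice: Dict[str, Set[str]] = {}
    for device, models in candidates.items():
        best = min(models, key=lambda m: len(m - broadcast))
        choice[device] = best
        broadcast |= best  # blocks scheduled once can serve every later device
    return choice, broadcast


# Hypothetical library: a shared "backbone" block plus task-specific heads.
candidates = {
    "d1": [{"backbone", "head_a"}],
    "d2": [{"resnet", "head_b"}, {"backbone", "head_b"}],
    "d3": [{"backbone", "head_c"}],
}
choice, broadcast = greedy_select(candidates)
unicast_cost = sum(len(m) for m in choice.values())
print(len(broadcast), unicast_cost)  # 4 blocks broadcast vs. 6 sent device-by-device
```

Because the greedy pass depends on device order, it can miss the best joint choice. The abstract's optimal tree-search algorithm targets exactly that gap.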

WC-SBERT: Zero-Shot Text Classification via SBERT with Self-Training for Wikipedia Categories

  • paper_url: http://arxiv.org/abs/2307.15293
  • repo_url: https://github.com/seventychi/wc-sbert
  • paper_authors: Te-Yu Chi, Yu-Meng Tang, Chia-Wen Lu, Qiu-Xia Zhang, Jyh-Shing Roger Jang
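  • for: The paper addresses zero-shot text classification, with an emphasis on self-training strategies.
  • methods: The paper proposes a self-training strategy that trains on labels rather than text. Wikipedia categories serve as the training set, and the pre-trained SBERT model is used to establish positive correlations between pairs of categories that appear in the same text, enabling associative training.
  • results: Experiments show the method can adapt the model to a target dataset within minutes and achieves state-of-the-art results on the Yahoo Topic and AG News datasets. Compared with other BERT-based transformer models, it needs less training data, trains more efficiently, and allows rapid fine-tuning and inference across different datasets.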
    Abstract Our research focuses on solving the zero-shot text classification problem in NLP, with a particular emphasis on innovative self-training strategies. To achieve this objective, we propose a novel self-training strategy that uses labels rather than text for training, significantly reducing the model's training time. Specifically, we use categories from Wikipedia as our training set and leverage the SBERT pre-trained model to establish positive correlations between pairs of categories within the same text, facilitating associative training. For new test datasets, we have improved the original self-training approach, eliminating the need for prior training and testing data from each target dataset. Instead, we adopt Wikipedia as a unified training dataset to better approximate the zero-shot scenario. This modification allows for rapid fine-tuning and inference across different datasets, greatly reducing the time required for self-training. Our experimental results demonstrate that this method can adapt the model to the target dataset within minutes. Compared to other BERT-based transformer models, our approach significantly reduces the amount of training data by training only on labels, not the actual text, and greatly improves training efficiency by utilizing a unified training set. Additionally, our method achieves state-of-the-art results on both the Yahoo Topic and AG News datasets.
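    Summary Our research focuses on solving zero-shot text classification in NLP, with an emphasis on innovative self-training strategies. To this end, we propose a new self-training strategy that trains on labels rather than text, reducing model training time. Specifically, we use Wikipedia categories as our training set and use the pre-trained SBERT model to establish correlations between categories that appear in the same text, facilitating associative training. For new test datasets, we improve the original self-training approach by removing the need for prior training and testing data from each target dataset. Instead, we adopt Wikipedia as a unified training set to better approximate the zero-shot scenario. This modification allows rapid fine-tuning and inference and greatly reduces the time required for self-training. Our experiments show that the method can adapt the model to a target dataset within minutes. Compared with other BERT-based transformer models, our method reduces the amount of training data by training only on labels rather than the actual text, and greatly improves training efficiency. In addition, it achieves state-of-the-art results on the Yahoo Topic and AG News datasets.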
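As a rough illustration of how category-only training pairs can be formed, the sketch below treats categories that co-occur on the same Wikipedia article as positive pairs. The article-to-category mapping is toy data, and deduplicating unordered pairs is an assumption. Pairs like these could then be fed to a sentence-embedding trainer, for example sentence-transformers with `MultipleNegativesRankingLoss`, although the abstract does not state the paper's exact training objective.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def category_pairs(article_categories: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Collect unordered, deduplicated pairs of categories that appear on the same article."""
    pairs = set()
    for cats in article_categories.values():
        # Sorting makes each pair canonical, so ("A", "B") and ("B", "A") collapse.
        for a, b in combinations(sorted(set(cats)), 2):
            pairs.add((a, b))
    return sorted(pairs)


toy = {
    "Apollo 11": ["Spaceflight", "NASA", "1969"],
    "Voyager 1": ["Spaceflight", "NASA"],
}
print(category_pairs(toy))
# [('1969', 'NASA'), ('1969', 'Spaceflight'), ('NASA', 'Spaceflight')]
```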

Reasoning before Responding: Integrating Commonsense-based Causality Explanation for Empathetic Response Generation

  • paper_url: http://arxiv.org/abs/2308.00085
  • repo_url: None
  • paper_authors: Yahui Fu, Koji Inoue, Chenhui Chu, Tatsuya Kawahara
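  • for: The paper aims to improve the understanding of, and responses to, users' emotions in empathetic response generation.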
  • methods: incorporating commonsense knowledge and reasoning about the causes of emotions, integrating in-context learning with commonsense knowledge, and integrating with ChatGPT and T5-based models
  • results: outperforms other comparable methods on both automatic and human evaluations
    Abstract Recent approaches to empathetic response generation try to incorporate commonsense knowledge or reasoning about the causes of emotions to better understand the user's experiences and feelings. However, these approaches mainly focus on understanding the causalities of context from the user's perspective, ignoring the system's perspective. In this paper, we propose a commonsense-based causality explanation approach for diverse empathetic response generation that considers both the user's perspective (user's desires and reactions) and the system's perspective (system's intentions and reactions). We enhance ChatGPT's ability to reason for the system's perspective by integrating in-context learning with commonsense knowledge. Then, we integrate the commonsense-based causality explanation with both ChatGPT and a T5-based model. Experimental evaluations demonstrate that our method outperforms other comparable methods on both automatic and human evaluations.
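    Summary Recent approaches to empathetic response generation try to incorporate commonsense knowledge or reasoning to better understand the user's experiences and feelings. However, these approaches mainly focus on the user's perspective and ignore the system's perspective. In this paper, we propose a commonsense-based causality explanation approach for diverse empathetic response generation. It considers both the user's perspective (the user's desires and reactions) and the system's perspective (the system's intentions and reactions). We improve ChatGPT's ability to reason about the system's perspective by integrating in-context learning with commonsense knowledge. We then integrate the commonsense-based causality explanation with both ChatGPT and a T5-based model. Experimental evaluations show that our method outperforms comparable methods in both automatic and human evaluations.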

Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification

  • paper_url: http://arxiv.org/abs/2307.15254
  • repo_url: https://github.com/dearcaat/mhim-mil
  • paper_authors: Wenhao Tang, Sheng Huang, Xiaoxian Zhang, Fengtao Zhou, Yi Zhang, Bo Liu
  • for: The paper addresses whole slide image (WSI) classification formulated as a multiple instance learning (MIL) problem.
  • methods: The proposed method uses a novel instance masking strategy called Masked Hard Instance Mining (MHIM). It combines a Siamese teacher-student structure with a consistency constraint to explore potential hard instances. A momentum teacher masks instances based on attention scores so that the student trains on harder instances, and the student updates the teacher through an exponential moving average (EMA).
  • results: On the CAMELYON-16 and TCGA Lung Cancer datasets, MHIM-MIL outperforms other recent methods in both performance and training cost.
    Abstract The whole slide image (WSI) classification is often formulated as a multiple instance learning (MIL) problem. Since the positive tissue is only a small fraction of the gigapixel WSI, existing MIL methods intuitively focus on identifying salient instances via attention mechanisms. However, this leads to a bias towards easy-to-classify instances while neglecting hard-to-classify instances. Some literature has revealed that hard examples are beneficial for modeling a discriminative boundary accurately. By applying such an idea at the instance level, we elaborate a novel MIL framework with masked hard instance mining (MHIM-MIL), which uses a Siamese structure (Teacher-Student) with a consistency constraint to explore the potential hard instances. With several instance masking strategies based on attention scores, MHIM-MIL employs a momentum teacher to implicitly mine hard instances for training the student model, which can be any attention-based MIL model. This counter-intuitive strategy essentially enables the student to learn a better discriminating boundary. Moreover, the student is used to update the teacher with an exponential moving average (EMA), which in turn identifies new hard instances for subsequent training iterations and stabilizes the optimization. Experimental results on the CAMELYON-16 and TCGA Lung Cancer datasets demonstrate that MHIM-MIL outperforms other latest methods in terms of performance and training cost. The code is available at: https://github.com/DearCaat/MHIM-MIL.
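    Summary Whole slide image (WSI) classification is often formulated as a multiple instance learning (MIL) problem. Because positive tissue makes up only a small fraction of a gigapixel WSI, existing MIL methods tend to identify salient instances via attention mechanisms. However, this biases the model toward easy-to-classify instances while neglecting hard-to-classify ones. Prior work has shown that hard examples help to model a discriminative boundary accurately. Applying this idea at the instance level, we propose a new MIL framework with masked hard instance mining (MHIM-MIL). It uses a Siamese (teacher-student) structure with a consistency constraint to explore potential hard instances. Using several instance masking strategies based on attention scores, MHIM-MIL employs a momentum teacher to implicitly mine hard instances for training the student model, which can be any attention-based MIL model. This counter-intuitive strategy enables the student to learn a better discriminative boundary. In addition, the student updates the teacher through an exponential moving average (EMA), which in turn identifies new hard instances for subsequent training iterations and stabilizes optimization. Experiments on the CAMELYON-16 and TCGA Lung Cancer datasets show that MHIM-MIL outperforms other recent methods in both performance and training cost. The code is available at https://github.com/DearCaat/MHIM-MIL.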
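The two mechanisms the abstract relies on can be sketched in NumPy: attention-based masking of salient instances, and the EMA teacher update. Masking the top-attention fraction is only one possible reading of the "several instance masking strategies" mentioned. The `mask_ratio` and `momentum` values are illustrative assumptions.

```python
import numpy as np


def mask_salient_instances(features, teacher_attn, mask_ratio=0.4):
    """Remove the instances the teacher attends to most, so the student has to
    learn from the remaining (harder) instances of the bag."""
    n = len(teacher_attn)
    k = int(np.floor(mask_ratio * n))
    order = np.argsort(teacher_attn)[::-1]  # indices by descending attention
    keep = np.sort(order[k:])               # everything except the top-k
    return features[keep], keep


def ema_update(teacher, student, momentum=0.999):
    """teacher <- momentum * teacher + (1 - momentum) * student, per parameter."""
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}


bag = np.arange(10.0).reshape(5, 2)  # 5 instances with 2-dim features
attn = np.array([0.1, 0.5, 0.05, 0.3, 0.05])
masked_bag, kept = mask_salient_instances(bag, attn)
print(kept)  # [0 2 4]: the two highest-attention instances (1 and 3) are masked
```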

An Overview Of Temporal Commonsense Reasoning and Acquisition

  • paper_url: http://arxiv.org/abs/2308.00002
  • repo_url: None
  • paper_authors: Georg Wenzel, Adam Jatowt
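  • for: This survey reviews research on temporal commonsense reasoning in large language models and on approaches for improving their performance.
  • methods: The article covers a variety of augmentations for enhancing language model performance on temporal commonsense reasoning, and their evaluation across a growing number of datasets.
  • results: Despite these augmentations, the models still struggle to approach human performance on reasoning tasks over temporal commonsense properties, such as the typical occurrence times, orderings, or durations of events.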
    Abstract Temporal commonsense reasoning refers to the ability to understand the typical temporal context of phrases, actions, and events, and use it to reason over problems requiring such knowledge. This trait is essential in temporal natural language processing tasks, with possible applications such as timeline summarization, temporal question answering, and temporal natural language inference. Recent research on the performance of large language models suggests that, although they are adept at generating syntactically correct sentences and solving classification tasks, they often take shortcuts in their reasoning and fall prey to simple linguistic traps. This article provides an overview of research in the domain of temporal commonsense reasoning, particularly focusing on enhancing language model performance through a variety of augmentations and their evaluation across a growing number of datasets. However, these augmented models still struggle to approach human performance on reasoning tasks over temporal common sense properties, such as the typical occurrence times, orderings, or durations of events. We further emphasize the need for careful interpretation of research to guard against overpromising evaluation results in light of the shallow reasoning present in transformers. This can be achieved by appropriately preparing datasets and suitable evaluation metrics.
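    Summary Temporal commonsense reasoning refers to understanding the typical temporal context of phrases, actions, and events, and using that knowledge to solve problems. This ability is key to temporal natural language processing tasks, with possible applications including timeline summarization, temporal question answering, and temporal natural language inference. Recent research shows that large language models can generate syntactically correct sentences and solve classification tasks. Even so, they often take shortcuts in their reasoning and fall prey to simple linguistic traps. This article surveys research on temporal commonsense reasoning, focusing on enhancing language model performance through a variety of augmentations and on their evaluation across a growing number of datasets. However, these augmented models still fall short of human performance on temporal commonsense properties such as the typical occurrence times, orderings, or durations of events. We further stress the need to interpret research carefully, so that results are not overstated given the shallow reasoning of transformers. This can be achieved through appropriate dataset preparation and suitable evaluation metrics.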

A Practical Recipe for Federated Learning Under Statistical Heterogeneity Experimental Design

  • paper_url: http://arxiv.org/abs/2307.15245
  • repo_url: https://github.com/mmorafah/fedzoo-bench
  • paper_authors: Mahdi Morafah, Weijia Wang, Bill Lin
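  • for: The study examines federated learning (FL) under statistical (data) heterogeneity and aims to enable a comparable, consistent, and well-incentivized experimental setup.
  • methods: The study systematically analyzes how FL-specific experimental variables affect each other and the results. It also releases FedZoo-Bench, an open-source PyTorch library with pre-implementations of 22 state-of-the-art methods.
  • results: The study characterizes the effect of FL-specific experimental variables on performance and gives recommendations and standardized features for designing meaningful FL experiments. It also provides a comprehensive comparison of state-of-the-art methods.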
    Abstract Federated Learning (FL) has been an area of active research in recent years. There have been numerous studies in FL to make it more successful in the presence of data heterogeneity. However, despite the existence of many publications, the state of progress in the field is unknown. Many of the works use inconsistent experimental settings and there are no comprehensive studies on the effect of FL-specific experimental variables on the results and practical insights for a more comparable and consistent FL experimental setup. Furthermore, the existence of several benchmarks and confounding variables has further complicated the issue of inconsistency and ambiguity. In this work, we present the first comprehensive study on the effect of FL-specific experimental variables in relation to each other and performance results, bringing several insights and recommendations for designing a meaningful and well-incentivized FL experimental setup. We further aid the community by releasing FedZoo-Bench, an open-source library based on PyTorch with pre-implementation of 22 state-of-the-art methods, and a broad set of standardized and customizable features available at https://github.com/MMorafah/FedZoo-Bench. We also provide a comprehensive comparison of several state-of-the-art (SOTA) methods to better understand the current state of the field and existing limitations.
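    Summary Federated learning (FL) has been an active research area in recent years, with many studies aiming to make it more successful under data heterogeneity. Yet despite the many publications, the state of progress in the field is unclear. Many works use inconsistent experimental settings, and there is no systematic study of how FL-specific experimental variables affect results. Nor are there practical recommendations for a more comparable and consistent setup. Moreover, multiple benchmarks and confounding variables add to the inconsistency and ambiguity. In this work, we present the first comprehensive study of the effect of FL-specific experimental variables in relation to each other and to performance. The study yields several insights and recommendations for designing a meaningful and well-incentivized FL experimental setup. We also release FedZoo-Bench, an open-source PyTorch-based library with pre-implementations of 22 state-of-the-art methods and a broad set of standardized and customizable features, available at https://github.com/MMorafah/FedZoo-Bench. Finally, we provide a comprehensive comparison of several state-of-the-art methods to better understand the current state of the field and its limitations.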
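FL benchmarks commonly control statistical heterogeneity through label skew drawn from a Dirichlet distribution. The sketch below is a generic version of that convention, not FedZoo-Bench's API. A benchmark would need to report `alpha` (smaller means more heterogeneous), the client count, and the seed for results to be comparable.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients; for each class, client shares are
    drawn from Dir(alpha), so a small alpha gives strongly non-IID clients."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert proportions into cut points along this class's shuffled indices.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(part.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_idx]


labels = np.repeat(np.arange(5), 40)  # 5 classes x 40 samples
parts = dirichlet_partition(labels, num_clients=4, alpha=0.5)
print([len(p) for p in parts])        # client sizes vary with alpha and the seed
```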

BOURNE: Bootstrapped Self-supervised Learning Framework for Unified Graph Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.15244
  • repo_url: https://github.com/Jackson117/BOURNE
  • paper_authors: Jie Liu, Mengting He, Xuequn Shang, Jieming Shi, Bin Cui, Hongzhi Yin
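  • for: The paper proposes a unified graph anomaly detection method that detects both node and edge anomalies in graphs.
  • methods: BOURNE is a bootstrapped self-supervised learning framework. For each target node, it extracts a subgraph (graph view) as node context and transforms it into a dual hypergraph (hypergraph view) as edge context. Graph and hypergraph neural networks encode the two views. Swapping context embeddings between nodes and edges and measuring their agreement enables mutual detection of node and edge anomalies, without negative sampling.
  • results: On six benchmark datasets, BOURNE detects both node and edge anomalies with superior effectiveness and efficiency, and it can handle large graphs.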
    Abstract Graph anomaly detection (GAD) has gained increasing attention in recent years due to its critical application in a wide range of domains, such as social networks, financial risk management, and traffic analysis. Existing GAD methods can be categorized into node and edge anomaly detection models based on the type of graph objects being detected. However, these methods typically treat node and edge anomalies as separate tasks, overlooking their associations and frequent co-occurrences in real-world graphs. As a result, they fail to leverage the complementary information provided by node and edge anomalies for mutual detection. Additionally, state-of-the-art GAD methods, such as CoLA and SL-GAD, heavily rely on negative pair sampling in contrastive learning, which incurs high computational costs, hindering their scalability to large graphs. To address these limitations, we propose a novel unified graph anomaly detection framework based on bootstrapped self-supervised learning (named BOURNE). We extract a subgraph (graph view) centered on each target node as node context and transform it into a dual hypergraph (hypergraph view) as edge context. These views are encoded using graph and hypergraph neural networks to capture the representations of nodes, edges, and their associated contexts. By swapping the context embeddings between nodes and edges and measuring the agreement in the embedding space, we enable the mutual detection of node and edge anomalies. Furthermore, we adopt a bootstrapped training strategy that eliminates the need for negative sampling, enabling BOURNE to handle large graphs efficiently. Extensive experiments conducted on six benchmark datasets demonstrate the superior effectiveness and efficiency of BOURNE in detecting both node and edge anomalies.
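    Summary Graph anomaly detection (GAD) has received growing attention in recent years because of its important applications in many domains, such as social networks, financial risk management, and traffic analysis. Existing GAD methods fall into node and edge anomaly detection models, according to the type of graph object being detected. However, these methods usually treat node and edge anomalies as separate tasks, ignoring their associations and frequent co-occurrence in real-world graphs. They therefore cannot use the complementary information of node and edge anomalies for mutual detection. In addition, state-of-the-art GAD methods such as CoLA and SL-GAD rely on negative sampling, which increases computational cost and prevents them from scaling to large graphs. To address these limitations, we propose BOURNE, a unified graph anomaly detection framework based on bootstrapped self-supervised learning. We extract a subgraph (graph view) centered on each target node as node context and transform it into a dual hypergraph (hypergraph view) as edge context. These views are encoded with graph and hypergraph neural networks to capture representations of nodes, edges, and their contexts. By swapping context embeddings between nodes and edges and measuring their agreement, we enable mutual detection of node and edge anomalies. We further adopt a bootstrapped training strategy that requires no negative samples, allowing BOURNE to handle large graphs efficiently. Extensive experiments on six benchmark datasets show that BOURNE detects both node and edge anomalies effectively and efficiently.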
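A minimal NumPy sketch of agreement-based scoring without negative pairs, assuming the graph and hypergraph encoders have already produced the embeddings. Two choices here are our own illustrative reading of "swapping context embeddings", not BOURNE's published scoring function:

- the BYOL-style disagreement `2 - 2*cos`;
- scoring each node against the hypergraph-view contexts of its incident edges.

```python
import numpy as np


def disagreement(p, z):
    """BYOL-style normalized distance 2 - 2*cos(p, z): 0 when aligned, 4 when opposite."""
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    return 2.0 - 2.0 * np.sum(p * z, axis=-1)


def node_scores(node_emb, edge_ctx, edges):
    """Swapped contexts: compare each node embedding with the edge-context embedding
    of every incident edge; nodes that disagree get high anomaly scores."""
    scores = np.zeros(len(node_emb))
    counts = np.zeros(len(node_emb))
    for e, (u, v) in enumerate(edges):
        for n in (u, v):
            scores[n] += disagreement(node_emb[n], edge_ctx[e])
            counts[n] += 1
    return scores / np.maximum(counts, 1)  # isolated nodes keep a score of 0


node_emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
edge_ctx = np.array([[1.0, 0.0], [1.0, 0.0]])
edges = [(0, 1), (1, 2)]
print(node_scores(node_emb, edge_ctx, edges))  # node 2 stands out
```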

Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures

  • paper_url: http://arxiv.org/abs/2307.15220
  • repo_url: https://github.com/camma-public/surgvlp
  • paper_authors: Kun Yuan, Vinkle Srivastav, Tong Yu, Joel Lavanchy, Pietro Mascagni, Nassir Navab, Nicolas Padoy
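  • for: The study uses surgical video lectures from open surgical e-learning platforms as supervisory signals for multimodal representation learning, without manual annotations.
  • methods: Multiple complementary automatic speech recognition (ASR) systems generate transcriptions of the lectures. The proposed method, SurgVLP, uses a contrastive objective to align video clip embeddings with the corresponding multiple text embeddings in a joint latent space.
  • results: The learned representations are evaluated on vision-and-language tasks, including text-based video retrieval, temporal activity grounding, and video captioning. The approach can also be used, without any labeled ground truth, for traditional vision-only tasks such as surgical tool, phase, and triplet recognition.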
    Abstract Recent advancements in surgical computer vision applications have been driven by fully-supervised methods, primarily using only visual data. These methods rely on manually annotated surgical videos to predict a fixed set of object categories, limiting their generalizability to unseen surgical procedures and downstream tasks. In this work, we put forward the idea that the surgical video lectures available through open surgical e-learning platforms can provide effective supervisory signals for multi-modal representation learning without relying on manual annotations. We address the surgery-specific linguistic challenges present in surgical video lectures by employing multiple complementary automatic speech recognition systems to generate text transcriptions. We then present a novel method, SurgVLP - Surgical Vision Language Pre-training, for multi-modal representation learning. SurgVLP constructs a new contrastive learning objective to align video clip embeddings with the corresponding multiple text embeddings by bringing them together within a joint latent space. To effectively show the representation capability of the learned joint latent space, we introduce several vision-and-language tasks for surgery, such as text-based video retrieval, temporal activity grounding, and video captioning, as benchmarks for evaluation. We further demonstrate that without using any labeled ground truth, our approach can be employed for traditional vision-only surgical downstream tasks, such as surgical tool, phase, and triplet recognition. The code will be made available at https://github.com/CAMMA-public/SurgVLP
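    Summary Recent advances in surgical computer vision have been driven by fully supervised methods that mainly use visual data. These methods rely on manually annotated surgical videos to predict a fixed set of object categories, which limits their generalization to unseen surgical procedures and downstream tasks. In this work, we propose using the surgical video lectures available on open surgical e-learning platforms as effective supervisory signals for multimodal representation learning, without manual annotations. To address the surgery-specific linguistic challenges in these lectures, we use multiple complementary automatic speech recognition systems to generate text transcriptions. We then propose SurgVLP (Surgical Vision Language Pre-training), a new multimodal representation learning method. SurgVLP builds a new contrastive learning objective that aligns video clip embeddings with the corresponding multiple text embeddings in a joint latent space. To demonstrate the representation capability of the learned joint space, we introduce several vision-and-language tasks for surgery as evaluation benchmarks: text-based video retrieval, temporal activity grounding, and video captioning. We further show that, without any labeled data, our method can be applied to traditional vision-only surgical downstream tasks such as surgical tool, phase, and triplet recognition. The code will be made available at https://github.com/CAMMA-public/SurgVLP.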
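The contrastive objective described above can be sketched as an InfoNCE loss in which every clip has several transcripts, one per ASR system (an assumption here), and clip i's positive is text i from each source. The per-source losses are averaged. The temperature and the averaging scheme are illustrative; SurgVLP's actual objective may combine the texts differently.

```python
import numpy as np


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def multi_text_infonce(video, texts, temperature=0.07):
    """video: (B, D) clip embeddings; texts: (S, B, D), S transcripts per clip.
    Returns the video-to-text InfoNCE loss averaged over the S text sources."""
    v = video / np.linalg.norm(video, axis=-1, keepdims=True)
    losses = []
    for t in texts:
        t = t / np.linalg.norm(t, axis=-1, keepdims=True)
        logits = v @ t.T / temperature  # (B, B); matching pairs on the diagonal
        losses.append(-np.mean(np.diag(log_softmax(logits, axis=1))))
    return float(np.mean(losses))


video = np.eye(4)
aligned = np.stack([np.eye(4), np.eye(4)])                # two ASR sources, both matching
shuffled = np.stack([np.roll(np.eye(4), 1, axis=0)] * 2)  # every transcript paired with the wrong clip
print(multi_text_infonce(video, aligned) < multi_text_infonce(video, shuffled))  # True
```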

Reachability Poorman Discrete-Bidding Games

  • paper_url: http://arxiv.org/abs/2307.15218
  • repo_url: None
  • paper_authors: Guy Avni, Tobias Meggendorfer, Suman Sadhukhan, Josef Tkadlec, Đorđe Žikelić
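  • for: The paper studies bidding games, a class of two-player zero-sum graph games.
  • methods: The paper considers, for the first time, poorman discrete bidding, in which the granularity of bids is restricted and the higher bid is paid to the bank.
  • results: The paper shows that threshold budgets exist. On DAGs, threshold budgets can be approximated with error bounds by thresholds under continuous bidding, and they exhibit periodic behavior. Closed-form solutions are identified in special cases, and an algorithm to find threshold budgets is implemented and evaluated experimentally.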
    Abstract We consider bidding games, a class of two-player zero-sum graph games. The game proceeds as follows. Both players have bounded budgets. A token is placed on a vertex of a graph, in each turn the players simultaneously submit bids, and the higher bidder moves the token, where we break bidding ties in favor of Player 1. Player 1 wins the game iff the token visits a designated target vertex. We consider, for the first time, poorman discrete-bidding in which the granularity of the bids is restricted and the higher bid is paid to the bank. Previous work either did not impose granularity restrictions or considered Richman bidding (bids are paid to the opponent). While the latter mechanisms are technically more accessible, the former is more appealing from a practical standpoint. Our study focuses on threshold budgets, which is the necessary and sufficient initial budget required for Player 1 to ensure winning against a given Player 2 budget. We first show existence of thresholds. In DAGs, we show that threshold budgets can be approximated with error bounds by thresholds under continuous-bidding and that they exhibit a periodic behavior. We identify closed-form solutions in special cases. We implement and experiment with an algorithm to find threshold budgets.
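    Summary We consider bidding games, a class of two-player zero-sum graph games. The game proceeds as follows. Both players have bounded budgets. A token is placed on a vertex of a graph. In each turn the players simultaneously submit bids, and the higher bidder moves the token, with bidding ties broken in favor of Player 1. Player 1 wins the game if and only if the token visits a designated target vertex. We study, for the first time, poorman discrete bidding, in which the granularity of the bids is restricted and the higher bid is paid to the bank. Previous work either imposed no granularity restrictions or considered Richman bidding, where bids are paid to the opponent. Our study focuses on threshold budgets: the necessary and sufficient initial budget for Player 1 to guarantee winning against a given Player 2 budget. We first show that thresholds exist. On DAGs, we show that threshold budgets can be approximated with error bounds by thresholds under continuous bidding, and that they exhibit periodic behavior. We identify closed-form solutions in special cases. Finally, we implement and experiment with an algorithm to find threshold budgets.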
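To make the rules concrete, the brute-force sketch below works on a DAG with integer budgets. It decides whether Player 1 has a bid that wins the game no matter what Player 2 bids. Ties go to Player 1, and only the winner of each bidding pays, to the bank. A second function then searches for the smallest such Player 1 budget. This recursion is our illustration under those assumptions, not the paper's algorithm: it only checks guaranteed wins by pure bids and is exponential in the worst case.

```python
from functools import lru_cache
from typing import Dict, List, Optional


def make_solver(succ: Dict[str, List[str]], target: str):
    @lru_cache(maxsize=None)
    def p1_wins(v: str, b1: int, b2: int) -> bool:
        if v == target:
            return True
        if not succ.get(v):
            return False  # stuck at a non-target sink: Player 1 loses
        for bid in range(b1 + 1):
            # Any Player 2 bid <= bid loses the tie-break: Player 1 pays `bid`
            # to the bank and moves the token to a successor of its choice.
            if not any(p1_wins(u, b1 - bid, b2) for u in succ[v]):
                continue
            # Any Player 2 bid b > bid wins: Player 2 pays b to the bank and
            # moves the token to the successor that is worst for Player 1.
            if all(p1_wins(u, b1, b2 - b) for b in range(bid + 1, b2 + 1) for u in succ[v]):
                return True
        return False
    return p1_wins


def threshold(succ, target, start, b2, max_b1=1000) -> Optional[int]:
    """Smallest Player 1 budget that guarantees reaching `target` against budget b2."""
    p1_wins = make_solver(succ, target)
    for b1 in range(max_b1 + 1):
        if p1_wins(start, b1, b2):
            return b1
    return None


# Player 1 must win two consecutive biddings; losing either sends the token to "lose".
chain = {"v0": ["v1", "lose"], "v1": ["goal", "lose"], "lose": [], "goal": []}
print(threshold(chain, "goal", "v0", b2=2))  # 4
```

In the chain example Player 1 must outbid a full budget of 2 twice and, under poorman rules, pays both times, hence a threshold of 4. Under Richman rules the first payment would go to Player 2 instead, which changes the arithmetic.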

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

  • paper_url: http://arxiv.org/abs/2307.15217
  • repo_url: None
  • paper_authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
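  • for: The paper examines the open problems and fundamental limitations of reinforcement learning from human feedback (RLHF), with the aim of developing safer AI systems.
  • methods: The paper surveys RLHF and related methods, and gives an overview of techniques to understand, improve, and complement RLHF in practice.
  • results: The paper lays out open problems and fundamental limitations of RLHF and related methods, and proposes auditing and disclosure standards to improve societal oversight of RLHF systems.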
    Abstract Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.
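    Summary Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals, and it has become the central method for fine-tuning state-of-the-art large language models (LLMs). Despite this popularity, relatively little public work has systematized its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques for understanding, improving, and complementing RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to developing safer AI systems.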

PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.15199
  • repo_url: None
  • paper_authors: Junhyeong Cho, Gilhyun Nam, Sungyeon Kim, Hunmin Yang, Suha Kwak
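  • for: The paper proposes a method for source-free domain generalization that uses no images, instead synthesizing diverse styles in a joint vision-language space.
  • methods: Diverse style features are generated via prompts, using learnable style word vectors for pseudo-words S*. To keep styles from distorting content, style-content features ("a S* style of a [class]") are forced to lie near their corresponding content features ("[class]") in the joint space. A linear classifier is then trained on the synthesized style-content features.
  • results: The method achieves state-of-the-art results on PACS, VLCS, OfficeHome, and DomainNet without using any images for training.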
    Abstract In a joint vision-language space, a text feature (e.g., from "a photo of a dog") could effectively represent its relevant image features (e.g., from dog photos). Also, a recent study has demonstrated the cross-modal transferability phenomenon of this joint space. From these observations, we propose PromptStyler which simulates various distribution shifts in the joint space by synthesizing diverse styles via prompts without using any images to deal with source-free domain generalization. The proposed method learns to generate a variety of style features (from "a S* style of a") via learnable style word vectors for pseudo-words S*. To ensure that learned styles do not distort content information, we force style-content features (from "a S* style of a [class]") to be located nearby their corresponding content features (from "[class]") in the joint vision-language space. After learning style word vectors, we train a linear classifier using synthesized style-content features. PromptStyler achieves the state of the art on PACS, VLCS, OfficeHome and DomainNet, even though it does not require any images for training.
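    Summary In a joint vision-language space, a text feature (e.g., from "a photo of a dog") can effectively represent its relevant image features (e.g., from dog photos). A recent study has also demonstrated cross-modal transferability in this joint space. Based on these observations, we propose PromptStyler for source-free domain generalization. It simulates various distribution shifts in the joint space by synthesizing diverse styles via prompts, without using any images. Our method learns to generate a variety of style features (from "a S* style of a") via learnable style word vectors for pseudo-words S*. To ensure that the learned styles do not distort content information, we force style-content features (from "a S* style of a [class]") to lie near their corresponding content features (from "[class]") in the joint vision-language space. After learning the style word vectors, we train a linear classifier on the synthesized style-content features. PromptStyler achieves state-of-the-art results on PACS, VLCS, OfficeHome, and DomainNet without using any images for training.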
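The constraint that style-content features stay near their own content features can be sketched as a contrastive cross-entropy over cosine similarities in the joint space. Everything here is an assumption for illustration. The hand-picked toy vectors stand in for text-encoder outputs, and the `temperature` value is arbitrary. The paper's exact loss, including any style-diversity term, is not reproduced.

```python
import numpy as np


def content_consistency_loss(style_content, content, labels, temperature=0.1):
    """style_content: (N, D) features of 'a S* style of a [class]' prompts;
    content: (C, D) features of '[class]' prompts; labels: (N,) class of each row.
    Cross-entropy pulling each style-content feature toward its own class's content feature."""
    s = style_content / np.linalg.norm(style_content, axis=1, keepdims=True)
    c = content / np.linalg.norm(content, axis=1, keepdims=True)
    logits = s @ c.T / temperature  # (N, C) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


content = np.eye(2)  # toy '[class]' features for two classes
style_content = np.array([[0.9, 0.1], [0.2, 0.8]])
print(content_consistency_loss(style_content, content, np.array([0, 1])))  # low: styles keep their class
print(content_consistency_loss(style_content, content, np.array([1, 0])))  # high: content drifted
```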

One-shot Joint Extraction, Registration and Segmentation of Neuroimaging Data

  • paper_url: http://arxiv.org/abs/2307.15198
  • repo_url: https://github.com/anonymous4545/jers
  • paper_authors: Yao Su, Zhentian Qian, Lei Ma, Lifang He, Xiangnan Kong
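  • for: The study addresses one-shot joint brain extraction, registration, and segmentation of neuroimaging data. Training uses only one labeled template image (atlas) and a few unlabeled raw images.
  • methods: The paper proposes JERS, a unified end-to-end framework that jointly optimizes extraction, registration, and segmentation. Interconnected modules learn the extraction mask, the transformation, and the segmentation mask, and reinforce each other through self-supervision.
  • results: Experiments on real-world datasets show that the method performs exceptionally well on all three tasks while reducing the need for manual intervention and annotation. Code and data are available at https://github.com/Anonymous4545/JERS.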
    Abstract Brain extraction, registration and segmentation are indispensable preprocessing steps in neuroimaging studies. The aim is to extract the brain from raw imaging scans (i.e., extraction step), align it with a target brain image (i.e., registration step) and label the anatomical brain regions (i.e., segmentation step). Conventional studies typically focus on developing separate methods for the extraction, registration and segmentation tasks in a supervised setting. The performance of these methods is largely contingent on the quantity of training samples and the extent of visual inspections carried out by experts for error correction. Nevertheless, collecting voxel-level labels and performing manual quality control on high-dimensional neuroimages (e.g., 3D MRI) are expensive and time-consuming in many medical studies. In this paper, we study the problem of one-shot joint extraction, registration and segmentation in neuroimaging data, which exploits only one labeled template image (a.k.a. atlas) and a few unlabeled raw images for training. We propose a unified end-to-end framework, called JERS, to jointly optimize the extraction, registration and segmentation tasks, allowing feedback among them. Specifically, we use a group of extraction, registration and segmentation modules to learn the extraction mask, transformation and segmentation mask, where modules are interconnected and mutually reinforced by self-supervision. Empirical results on real-world datasets demonstrate that our proposed method performs exceptionally in the extraction, registration and segmentation tasks. Our code and data can be found at https://github.com/Anonymous4545/JERS
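    Summary Brain extraction, registration, and segmentation are indispensable preprocessing steps in neuroimaging studies. The goal is to extract the brain from raw imaging scans (extraction), align it with a target brain image (registration), and label the anatomical brain regions (segmentation). Conventional studies typically develop separate methods for these tasks and train them in a supervised setting. However, collecting voxel-level labels and performing manual quality control on high-dimensional neuroimages (e.g., 3D MRI) is expensive and time-consuming in many medical studies. In this paper, we study one-shot joint extraction, registration, and segmentation, which uses only one labeled template image (atlas) and a few unlabeled raw images for training. We propose JERS, a unified end-to-end framework that jointly optimizes the extraction, registration, and segmentation tasks and allows feedback among them. Specifically, a group of extraction, registration, and segmentation modules learns the extraction mask, the transformation, and the segmentation mask. The modules are interconnected and mutually reinforced through self-supervision. Experiments on real-world datasets show that our method performs exceptionally well on all three tasks. Our code and data are available at https://github.com/Anonymous4545/JERS.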

Learning in Repeated Multi-Unit Pay-As-Bid Auctions

  • paper_url: http://arxiv.org/abs/2307.15193
  • repo_url: None
  • paper_authors: Rigel Galgana, Negin Golrezaei
  • for: The paper studies how a bidder can learn to bid in repeated multi-unit pay-as-bid auctions.
  • methods: The paper solves the offline bidding problem with a polynomial-time dynamic programming (DP) scheme. It then uses the structure of that scheme to design online learning algorithms with polynomial time and space complexity under full-information and bandit feedback.
  • results: The algorithms achieve regret upper bounds of $O(M\sqrt{T\log |\mathcal{B}|})$ (full information) and $O(M\sqrt{|\mathcal{B}|T\log |\mathcal{B}|})$ (bandit feedback), complemented by a lower bound matching the linear dependence on $M$. Numerical results suggest that the resulting market dynamics mainly converge to a welfare-maximizing equilibrium in which bidders submit uniform bids. The pay-as-bid auction also consistently generates significantly higher revenue than its popular alternative, the uniform price auction.
    Abstract Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and Procurement Auctions, which all involve the auctioning of homogeneous multiple units, we consider the problem of learning how to bid in repeated multi-unit pay-as-bid auctions. In each of these auctions, a large number of (identical) items are to be allocated to the largest submitted bids, where the price of each of the winning bids is equal to the bid itself. The problem of learning how to bid in pay-as-bid auctions is challenging due to the combinatorial nature of the action space. We overcome this challenge by focusing on the offline setting, where the bidder optimizes their vector of bids while only having access to the past submitted bids by other bidders. We show that the optimal solution to the offline problem can be obtained using a polynomial time dynamic programming (DP) scheme. We leverage the structure of the DP scheme to design online learning algorithms with polynomial time and space complexity under full information and bandit feedback settings. We achieve an upper bound on regret of $O(M\sqrt{T\log |\mathcal{B}|})$ and $O(M\sqrt{|\mathcal{B}|T\log |\mathcal{B}|})$ respectively, where $M$ is the number of units demanded by the bidder, $T$ is the total number of auctions, and $|\mathcal{B}|$ is the size of the discretized bid space. We accompany these results with a regret lower bound, which matches the linear dependency in $M$. Our numerical results suggest that when all agents behave according to our proposed no regret learning algorithms, the resulting market dynamics mainly converge to a welfare maximizing equilibrium where bidders submit uniform bids. Lastly, our experiments demonstrate that the pay-as-bid auction consistently generates significantly higher revenue compared to its popular alternative, the uniform price auction.
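The allocation and payment rules in the abstract can be shown with a small sketch. It makes several assumptions:
- Each bidder submits a vector of per-unit bids.
- The M identical units go to the M highest bids overall.
- For the uniform-price comparison, every winning unit pays the highest losing bid. This is one common convention; others use the lowest winning bid.

The function names are hypothetical. This is not the paper's DP scheme or learning algorithm.

```python
from typing import Dict, List, Tuple


def allocate(bids: Dict[str, List[float]], units: int) -> List[Tuple[str, float]]:
    """Return (bidder, bid) pairs for the `units` highest bids across all bidders.

    Ties are broken by input order because Python's sort is stable.
    """
    flat = [(name, b) for name, vec in bids.items() for b in vec]
    flat.sort(key=lambda pair: pair[1], reverse=True)
    return flat[:units]


def pay_as_bid_revenue(bids: Dict[str, List[float]], units: int) -> float:
    # Pay-as-bid: each winning bid pays exactly its own amount.
    return sum(b for _, b in allocate(bids, units))


def uniform_price_revenue(bids: Dict[str, List[float]], units: int) -> float:
    # Assumed convention: every winning unit pays the highest losing bid (0 if no bid loses).
    flat = sorted((b for vec in bids.values() for b in vec), reverse=True)
    clearing = flat[units] if len(flat) > units else 0.0
    return clearing * min(units, len(flat))
```

Take bids `{"A": [10.0, 8.0, 3.0], "B": [9.0, 7.0, 2.0]}` and 3 units. Pay-as-bid revenue is 27.0 (10 + 9 + 8), while uniform-price revenue is 21.0 (three units at 7.0). Holding bids fixed like this ignores that bidders adapt their bids to the format. That is why the paper compares the formats through the dynamics of learning bidders rather than static bids.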

Med-Flamingo: a Multimodal Medical Few-shot Learner

  • paper_url: http://arxiv.org/abs/2307.15189
  • repo_url: https://github.com/snap-stanford/med-flamingo
  • paper_authors: Michael Moor, Qian Huang, Shirley Wu, Michihiro Yasunaga, Cyril Zakka, Yash Dalmia, Eduardo Pontes Reis, Pranav Rajpurkar, Jure Leskovec
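  • for: This study proposes a multimodal few-shot learner adapted to the medical domain, addressing the scarcity of data in many medical applications.
  • methods: Starting from OpenFlamingo-9B, the authors continue pre-training on paired and interleaved medical image-text data from publications and textbooks.
  • results: Med-Flamingo enables few-shot generative medical visual question answering (VQA) and improves performance by up to 20% in clinicians' ratings. It also enables multimodal medical few-shot adaptations such as rationale generation. The study includes the first human evaluation for generative medical VQA, in which physicians review the problems and blinded generations in an interactive app.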
    Abstract Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) make a first step in this direction and promise many exciting clinical applications. However, existing models typically have to be fine-tuned on sizeable downstream datasets, which poses a significant limitation as in many medical applications data is scarce, necessitating models that are capable of learning from few examples in real-time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities, which we evaluate on several datasets including a novel challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation for generative medical VQA where physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20% in clinicians' ratings and is the first to enable multimodal medical few-shot adaptations, such as rationale generation. We release our model, code, and evaluation app under https://github.com/snap-stanford/med-flamingo.
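Few-shot prompting of a Flamingo-style model interleaves images and text in a single prompt. The sketch below assumes OpenFlamingo-style placeholders: `<image>` marks where an image goes and `<|endofchunk|>` separates demonstrations. The "Question: ... Answer:" template and the function name are hypothetical; the abstract does not give Med-Flamingo's actual prompt format.

```python
from typing import List, Tuple

IMAGE_TOKEN = "<image>"          # assumed OpenFlamingo-style image placeholder
END_OF_CHUNK = "<|endofchunk|>"  # assumed separator between interleaved examples


def build_few_shot_prompt(shots: List[Tuple[str, str]], query: str) -> str:
    """Interleave (question, answer) demonstrations, one image slot per example.

    The images themselves are passed to the model separately, in the same order
    as the IMAGE_TOKEN placeholders appear in the returned text.
    """
    parts = [f"{IMAGE_TOKEN}Question: {q} Answer: {a}{END_OF_CHUNK}" for q, a in shots]
    # The final example leaves the answer open for the model to generate.
    parts.append(f"{IMAGE_TOKEN}Question: {query} Answer:")
    return "".join(parts)
```

With one demonstration, the prompt holds two image slots. The model's continuation after the final "Answer:" is the generated answer.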

Rotation-Invariant Random Features Provide a Strong Baseline for Machine Learning on 3D Point Clouds

  • paper_url: http://arxiv.org/abs/2308.06271
  • repo_url: https://github.com/meliao/rotation-invariant-random-features
  • paper_authors: Owen Melia, Eric Jonas, Rebecca Willett
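  • for: This paper asks whether the success of rotation-invariant machine learning methods comes from rotation invariance itself or from deep neural networks. To answer this, it proposes a simple, general-purpose method for learning rotation-invariant functions of 3D point cloud data.
  • methods: The paper extends the random features method of Rahimi & Recht (2007) to a version that is invariant to 3D rotations and fast to evaluate on point cloud data.
  • results: Experiments show the method matches or outperforms general-purpose rotation-invariant neural networks on the molecular property benchmarks QM7 and QM9. It also provides a rotation-invariant baseline on ModelNet40 shape classification and has an order of magnitude smaller prediction latency than competing kernel methods.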
    Abstract Rotational invariance is a popular inductive bias used by many fields in machine learning, such as computer vision and machine learning for quantum chemistry. Rotation-invariant machine learning methods set the state of the art for many tasks, including molecular property prediction and 3D shape classification. These methods generally either rely on task-specific rotation-invariant features, or they use general-purpose deep neural networks which are complicated to design and train. However, it is unclear whether the success of these methods is primarily due to the rotation invariance or the deep neural networks. To address this question, we suggest a simple and general-purpose method for learning rotation-invariant functions of three-dimensional point cloud data using a random features approach. Specifically, we extend the random features method of Rahimi & Recht 2007 by deriving a version that is invariant to three-dimensional rotations and showing that it is fast to evaluate on point cloud data. We show through experiments that our method matches or outperforms the performance of general-purpose rotation-invariant neural networks on standard molecular property prediction benchmark datasets QM7 and QM9. We also show that our method is general-purpose and provides a rotation-invariant baseline on the ModelNet40 shape classification task. Finally, we show that our method has an order of magnitude smaller prediction latency than competing kernel methods.
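Here is a deliberately simplified sketch of how random features can be made rotation invariant; it is not the paper's construction. It applies Rahimi & Recht-style random Fourier features to one rotation-invariant descriptor, each point's distance to the cloud's centroid. Averaging over points also makes the features independent of point order. The descriptor, the Gaussian frequency distribution, and all function names are assumptions. The paper derives a richer invariant feature map that keeps far more geometry than radial distances do.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sample_params(num_features: int, scale: float = 1.0, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw random frequencies w ~ N(0, scale^2) and phases b ~ U[0, 2*pi)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, scale), rng.uniform(0.0, 2 * math.pi)) for _ in range(num_features)]


def invariant_features(cloud: Sequence[Point], params: List[Tuple[float, float]]) -> List[float]:
    n = len(cloud)
    centroid = [sum(p[k] for p in cloud) / n for k in range(3)]
    # Distances to the centroid do not change under rotation or translation of the whole cloud.
    radii = [math.dist(p, centroid) for p in cloud]
    # Random Fourier feature sqrt(2) * cos(w * r + b), averaged over all points.
    return [math.sqrt(2) * sum(math.cos(w * r + b) for r in radii) / n for w, b in params]


def rotate_z(cloud: Sequence[Point], theta: float) -> List[Point]:
    """Rotate every point by angle theta about the z axis (used to check invariance)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in cloud]
```

A linear model (e.g. ridge regression) trained on these features inherits their invariance. This separation, an invariant feature map followed by a simple model, is what makes random features a useful baseline against deep invariant networks.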

RCT Rejection Sampling for Causal Estimation Evaluation

  • paper_url: http://arxiv.org/abs/2307.15176
  • repo_url: https://github.com/kakeith/rct_rejection_sampling
  • paper_authors: Katherine A. Keith, Sergey Feldman, David Jurgens, Jonathan Bragg, Rohit Bhattacharya
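  • for: This paper aims to improve the empirical evaluation of methods that estimate causal effects from observational data. Such methods adjust for confounding with high-dimensional covariates such as text, genomics, or behavioral social science data.
  • methods: The authors build on an existing evaluation strategy: subsample randomized controlled trials (RCTs) to create confounded observational datasets, and use the RCT's average causal effect as ground truth. They propose a new sampling algorithm, RCT rejection sampling, with theoretical guarantees that causal identification holds in the resulting observational data.
  • results: On synthetic data, the algorithm yields low bias when oracle estimators are evaluated on the confounded samples, which is not always the case for a previously proposed algorithm. The authors also highlight several finite-data considerations for evaluation designers. As a proof of concept, they implement an example evaluation pipeline on a novel, publicly released real-world RCT of approximately 70k observations with text data as high-dimensional covariates.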
    Abstract Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates -- such as text data, genomics, or the behavioral social sciences -- researchers have proposed methods to adjust for confounding by adapting machine learning methods to the goal of causal estimation. However, empirical evaluation of these adjustment methods has been challenging and limited. In this work, we build on a promising empirical evaluation strategy that simplifies evaluation design and uses real data: subsampling randomized controlled trials (RCTs) to create confounded observational datasets while using the average causal effects from the RCTs as ground-truth. We contribute a new sampling algorithm, which we call RCT rejection sampling, and provide theoretical guarantees that causal identification holds in the observational data to allow for valid comparisons to the ground-truth RCT. Using synthetic data, we show our algorithm indeed results in low bias when oracle estimators are evaluated on the confounded samples, which is not always the case for a previously proposed algorithm. In addition to this identification result, we highlight several finite data considerations for evaluation designers who plan to use RCT rejection sampling on their own datasets. As a proof of concept, we implement an example evaluation pipeline and walk through these finite data considerations with a novel, real-world RCT -- which we release publicly -- consisting of approximately 70k observations and text data as high-dimensional covariates. Together, these contributions build towards a broader agenda of improved empirical evaluation for causal estimation.
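The abstract does not spell out the sampling rule, so the following is a generic rejection-sampling sketch of the idea, not the paper's exact algorithm. In the RCT, treatment is assigned with a constant probability `p_rct`. Unit i is kept with probability P*(T = t_i | C = c_i) / (M · P_rct(T = t_i)), where P* is a chosen confounded propensity and M bounds that ratio. The kept units then look as if treatment depended on C. The logistic `target_propensity`, its `strength`, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def target_propensity(c: float, strength: float = 2.0) -> float:
    """Hypothetical confounded propensity P*(T=1 | C=c): treatment grows more likely as c grows."""
    return 1.0 / (1.0 + math.exp(-strength * c))


def rct_rejection_sample(t: Sequence[int], c: Sequence[float], p_rct: float, seed: int = 0) -> List[int]:
    """Return indices of RCT units kept in the confounded subsample.

    Accept unit i with probability P*(T=t_i | C=c_i) / (M * P_rct(T=t_i)).
    """
    rng = random.Random(seed)
    m = max(1.0 / p_rct, 1.0 / (1.0 - p_rct))  # valid bound on the ratio because P* <= 1
    kept = []
    for i, (ti, ci) in enumerate(zip(t, c)):
        p_star = target_propensity(ci) if ti == 1 else 1.0 - target_propensity(ci)
        p_obs = p_rct if ti == 1 else 1.0 - p_rct
        if rng.random() < p_star / (m * p_obs):
            kept.append(i)
    return kept
```

In the kept subsample, treated units have systematically higher covariate values than control units. A naive difference in means is therefore confounded, while the full RCT still supplies the ground-truth effect.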

VISU at WASSA 2023 Shared Task: Detecting Emotions in Reaction to News Stories Leveraging BERT and Stacked Embeddings

  • paper_url: http://arxiv.org/abs/2307.15164
  • repo_url: None
  • paper_authors: Vivek Kumar, Sushmita Singh, Prayag Tiwari
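  • for: This paper addresses emotion classification from essays written in reaction to news articles (WASSA 2023 Shared Task 3).
  • methods: The authors develop deep learning (DL) models that combine word embedding representations with tailored preprocessing strategies to capture the nuances of expressed emotions. Their experiments use static and contextual embeddings (individual and stacked) with BiLSTM and Transformer-based models.
  • results: The system ranked tenth in the emotion detection task with a Macro F1-Score of 0.2717. The authors take this as validating their approach for small, imbalanced datasets with mixed categories of target emotions.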
    Abstract Our system, VISU, participated in the WASSA 2023 Shared Task (3) of Emotion Classification from essays written in reaction to news articles. Emotion detection from complex dialogues is challenging and often requires context/domain understanding. Therefore, in this research, we have focused on developing deep learning (DL) models using the combination of word embedding representations with tailored preprocessing strategies to capture the nuances of emotions expressed. Our experiments used static and contextual embeddings (individual and stacked) with Bidirectional Long Short-Term Memory (BiLSTM) and Transformer-based models. We ranked tenth in the emotion detection task with a Macro F1-Score of 0.2717, validating the efficacy of our implemented approaches for small and imbalanced datasets with mixed categories of target emotions.
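Macro F1, the metric reported above, averages per-class F1 scores with equal weight regardless of class frequency. Rare emotions therefore count as much as common ones in an imbalanced dataset. A minimal sketch follows, assuming single-label predictions; the shared task's exact scoring (for example, how multi-label essays are handled) is not described here.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    scores = []
    for label in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN), defined as 0 for a class with no TP, FP, or FN.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Take `y_true = [0, 0, 1, 1, 2]` and `y_pred = [0, 1, 1, 1, 2]`. The per-class F1 scores are 2/3, 0.8, and 1.0, so macro F1 is about 0.822.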

Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation

  • paper_url: http://arxiv.org/abs/2308.07931
  • repo_url: None
  • paper_authors: William Shen, Ge Yang, Alan Yu, Jansen Wong, Leslie Pack Kaelbling, Phillip Isola
  • for: bridges the 2D-to-3D gap for robotic manipulation
  • methods: leverages distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models
  • results: achieves in-the-wild generalization to unseen objects using few-shot learning method for 6-DOF grasping and placing
    Abstract Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models. We present a few-shot learning method for 6-DOF grasping and placing that harnesses these strong spatial and semantic priors to achieve in-the-wild generalization to unseen objects. Using features distilled from a vision-language model, CLIP, we present a way to designate novel objects for manipulation via free-text natural language, and demonstrate its ability to generalize to unseen expressions and novel categories of objects.
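In a distilled feature field, 3D points carry feature vectors distilled from a 2D model such as CLIP. A free-text query embedded by the matching text encoder can then be compared against those features. The sketch assumes cosine similarity with a fixed threshold and uses toy 3-dimensional vectors in place of real CLIP embeddings. The function names are hypothetical; the paper's actual query and grasp-selection procedure may differ.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def select_points(point_feats: List[Sequence[float]], text_emb: Sequence[float], threshold: float) -> List[int]:
    """Indices of 3D points whose distilled feature is similar enough to the text query."""
    return [i for i, f in enumerate(point_feats) if cosine(f, text_emb) >= threshold]
```

The selected points localize the queried object in 3D. A downstream grasping or placing step would then operate on that region.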

Matching Patients to Clinical Trials with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.15051
  • repo_url: None
  • paper_authors: Qiao Jin, Zifeng Wang, Charalampos S. Floudas, Jimeng Sun, Zhiyong Lu
  • for: The paper aims to assist individual patients and referral physicians in identifying suitable clinical trials from an extensive selection, using large language models (LLMs).
  • methods: The paper introduces TrialGPT, a novel architecture that employs LLMs to predict criterion-level eligibility with detailed explanations. These predictions are then aggregated for ranking and excluding candidate clinical trials based on free-text patient notes.
  • results: TrialGPT achieves high criterion-level prediction accuracy with faithful explanations, and the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations. The scores are also effective in ranking clinical trials and excluding ineligible candidates. The paper acknowledges that current LLMs still make some mistakes due to limited medical knowledge and domain-specific context understanding.
    Abstract Clinical trials are vital in advancing drug development and evidence-based medicine, but their success is often hindered by challenges in patient recruitment. In this work, we investigate the potential of large language models (LLMs) to assist individual patients and referral physicians in identifying suitable clinical trials from an extensive selection. Specifically, we introduce TrialGPT, a novel architecture employing LLMs to predict criterion-level eligibility with detailed explanations, which are then aggregated for ranking and excluding candidate clinical trials based on free-text patient notes. We evaluate TrialGPT on three publicly available cohorts of 184 patients and 18,238 annotated clinical trials. The experimental results demonstrate several key findings: First, TrialGPT achieves high criterion-level prediction accuracy with faithful explanations. Second, the aggregated trial-level TrialGPT scores are highly correlated with expert eligibility annotations. Third, these scores prove effective in ranking clinical trials and excluding ineligible candidates. Our error analysis suggests that current LLMs still make some mistakes due to limited medical knowledge and domain-specific context understanding. Nonetheless, we believe the explanatory capabilities of LLMs are highly valuable. Future research is warranted on how such AI assistants can be integrated into the routine trial matching workflow in real-world settings to improve its efficiency.
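The abstract says criterion-level predictions are aggregated into trial-level scores for ranking and exclusion, but it does not give the aggregation formula. The sketch below assumes a simple hypothetical rule:
- A trial is excluded if any exclusion criterion is predicted to be met.
- Otherwise, its score is the fraction of inclusion criteria met.

`CriterionPrediction`, `trial_score`, and `rank_trials` are illustrative names, not TrialGPT's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CriterionPrediction:
    kind: str            # "inclusion" or "exclusion"
    met: Optional[bool]  # None when the patient note lacks the information
    explanation: str     # free-text rationale produced for this criterion


def trial_score(preds: List[CriterionPrediction]) -> Optional[float]:
    """Return None if the trial should be excluded, else the fraction of inclusion criteria met."""
    if any(p.kind == "exclusion" and p.met for p in preds):
        return None
    inclusion = [p for p in preds if p.kind == "inclusion"]
    if not inclusion:
        return 0.0  # no inclusion criteria to support eligibility
    return sum(1 for p in inclusion if p.met) / len(inclusion)


def rank_trials(trials: Dict[str, List[CriterionPrediction]]) -> List[str]:
    """Drop excluded trials and sort the rest by descending score."""
    scored = {tid: trial_score(preds) for tid, preds in trials.items()}
    eligible = [tid for tid, s in scored.items() if s is not None]
    return sorted(eligible, key=lambda tid: scored[tid], reverse=True)
```

Keeping the per-criterion `explanation` alongside the score lets a reviewer see why a trial was ranked or excluded.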

Universal and Transferable Adversarial Attacks on Aligned Language Models

  • paper_url: http://arxiv.org/abs/2307.15043
  • repo_url: https://github.com/llm-attacks/llm-attacks
  • paper_authors: Andy Zou, Zifan Wang, J. Zico Kolter, Matt Fredrikson
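  • for: The paper proposes a simple and effective attack method that causes aligned language models to generate objectionable behaviors.
  • methods: Rather than relying on manual engineering, the approach automatically produces adversarial suffixes using a combination of greedy and gradient-based search techniques, improving over past automatic prompt generation methods.
  • results: The generated adversarial suffixes are highly transferable. A suffix trained on multiple prompts and on Vicuna-7B and 13B induces objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as in open-source LLMs such as LLaMA-2-Chat, Pythia, and Falcon.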
    Abstract Because "out-of-the-box" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures -- so-called "jailbreaks" against LLMs -- these attacks have required significant human ingenuity and are brittle in practice. In this paper, we propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors. Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer). However, instead of relying on manual engineering, our approach automatically produces these adversarial suffixes by a combination of greedy and gradient-based search techniques, and also improves over past automatic prompt generation methods. Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable, including to black-box, publicly released LLMs. Specifically, we train an adversarial attack suffix on multiple prompts (i.e., queries asking for many different types of objectionable content), as well as multiple models (in our case, Vicuna-7B and 13B). When doing so, the resulting attack suffix is able to induce objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. In total, this work significantly advances the state-of-the-art in adversarial attacks against aligned language models, raising important questions about how such systems can be prevented from producing objectionable information. Code is available at github.com/llm-attacks/llm-attacks.
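The search component can be illustrated abstractly as greedy coordinate optimization over a discrete token sequence: repeatedly pick one position and keep the substitution that most improves a score. This toy sketch involves no language model. The score is simply the number of positions matching a hidden target. Every candidate token is tried exhaustively, whereas the paper also uses gradients to propose candidates. Function and parameter names are hypothetical.

```python
import random
from typing import Callable, List


def greedy_coordinate_search(score: Callable[[List[int]], float], vocab_size: int,
                             length: int, iters: int, seed: int = 0) -> List[int]:
    """Maximize `score` over integer sequences by improving one position at a time."""
    rng = random.Random(seed)
    seq = [rng.randrange(vocab_size) for _ in range(length)]
    best = score(seq)
    for _ in range(iters):
        pos = rng.randrange(length)      # pick one coordinate (token position)
        for tok in range(vocab_size):    # try every substitution at that position
            cand = seq[:pos] + [tok] + seq[pos + 1:]
            s = score(cand)
            if s > best:                 # keep a substitution only if it improves the score
                seq, best = cand, s
    return seq
```

Because the toy score decomposes over positions, coordinate-wise search recovers the target exactly. Real objectives over language model outputs have no such structure, which is why candidate proposal matters in practice.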

AI Literature Review Suite

  • paper_url: http://arxiv.org/abs/2308.02443
  • repo_url: https://github.com/datovar4/ai_literature_review_suite
  • paper_authors: David A. Tovar
  • for: automate and optimize the process of literature review in academic and industrial research
  • methods: leverages open access science, large language models (LLMs), natural language processing, semantic search queries, text embeddings, and summarization
  • results: provides a comprehensive literature review, enables searching, downloading, and organizing of PDF files, and extracts content from articles with succinct summaries
    Abstract The process of conducting literature reviews is often time-consuming and labor-intensive. To streamline this process, I present an AI Literature Review Suite that integrates several functionalities to provide a comprehensive literature review. This tool leverages the power of open access science, large language models (LLMs) and natural language processing to enable the searching, downloading, and organizing of PDF files, as well as extracting content from articles. Semantic search queries are used for data retrieval, while text embeddings and summarization using LLMs present succinct literature reviews. Interaction with PDFs is enhanced through a user-friendly graphical user interface (GUI). The suite also features integrated programs for bibliographic organization, interaction and query, and literature review summaries. This tool presents a robust solution to automate and optimize the process of literature review in academic and industrial research.
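Semantic search over a paper collection ranks documents by the similarity between the query's embedding and each document's embedding. In this minimal sketch, a toy bag-of-words vector stands in for the LLM text embeddings the suite uses, so it only captures word overlap, not meaning. The function names and file names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call a text-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, docs: Dict[str, str], top_k: int = 3) -> List[str]:
    """Return the names of the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda name: cosine(q, embed(docs[name])), reverse=True)
    return ranked[:top_k]
```

Swapping `embed` for a neural embedding model changes only that one function. The ranking logic stays the same.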

SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark

  • paper_url: http://arxiv.org/abs/2307.15020
  • repo_url: None
  • paper_authors: Liang Xu, Anqi Li, Lei Zhu, Hang Xue, Changtai Zhu, Kangkang Zhao, Haonan He, Xuanwei Zhang, Qiyue Kang, Zhenzhong Lan
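  • for: This paper aims to evaluate how well large language models perform in real-world use, where user preference is the most critical criterion. Existing benchmarks mainly measure accuracy on multiple-choice questions.
  • methods: The paper proposes SuperCLUE, a comprehensive Chinese benchmark with three sub-tasks: actual user queries and ratings from an LLM battle platform (CArena); open-ended questions with single- and multi-turn dialogues (OPEN); and closed-ended questions sharing stems with the open-ended single-turn ones (CLOSE).
  • results: Accuracy on closed-ended questions alone does not reflect human preferences on open-ended ones, but the two can complement each other to predict actual user preferences. GPT-4 is shown to be a reliable judge for automatically evaluating human preferences on open-ended questions in a Chinese context.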
    Abstract Large language models (LLMs) have shown the potential to be integrated into human daily lives. Therefore, user preference is the most critical criterion for assessing LLMs' performance in real-world scenarios. However, existing benchmarks mainly focus on measuring models' accuracy using multi-choice questions, which limits the understanding of their capabilities in real applications. We fill this gap by proposing a comprehensive Chinese benchmark SuperCLUE, named after another popular Chinese LLM benchmark CLUE. SuperCLUE encompasses three sub-tasks: actual users' queries and ratings derived from an LLM battle platform (CArena), open-ended questions with single and multiple-turn dialogues (OPEN), and closed-ended questions with the same stems as open-ended single-turn ones (CLOSE). Our study shows that accuracy on closed-ended questions is insufficient to reflect human preferences achieved on open-ended ones. At the same time, they can complement each other to predict actual user preferences. We also demonstrate that GPT-4 is a reliable judge to automatically evaluate human preferences on open-ended questions in a Chinese context. Our benchmark will be released at https://www.CLUEbenchmarks.com
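The abstract describes CArena as an LLM battle platform that supplies user queries and ratings, but it does not say how those ratings are computed. One common way to turn pairwise battle outcomes into model ratings is the Elo update. The sketch below shows it as an assumption, with a hypothetical K-factor of 32.

```python
from typing import Tuple


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Update two ratings after one battle; score_a is 1.0 (A wins), 0.5 (tie), or 0.0 (B wins)."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))  # A's predicted chance of winning
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta  # zero-sum: B loses what A gains
```

Two models rated 1500 move to 1516 and 1484 after one decisive battle. An upset against a higher-rated model shifts ratings by more than an expected win does.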

How Good is Google Bard’s Visual Understanding? An Empirical Study on Open Challenges

  • paper_url: http://arxiv.org/abs/2307.15016
  • repo_url: https://github.com/htqin/googlebard-visunderstand
  • paper_authors: Haotong Qin, Ge-Peng Ji, Salman Khan, Deng-Ping Fan, Fahad Shahbaz Khan, Luc Van Gool
  • for: This paper explores the ability of Google Bard to understand and interpret visual data (images) conditioned by text questions, with the goal of evaluating its performance in various task scenarios and identifying areas for improvement.
  • methods: The paper uses Google Bard to process text and image inputs and evaluate its performance in 15 diverse task scenarios, including regular, camouflaged, medical, under-water, and remote sensing data.
  • results: The primary finding of the study is that Bard struggles in vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. The study provides valuable insights for advancing future models and improving their capabilities in comprehending and interpreting fine-grained visual data.
    Abstract Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned by text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal Generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, under-water and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released on https://github.com/htqin/GoogleBard-VisUnderstand

Improved Neural Radiance Fields Using Pseudo-depth and Fusion

  • paper_url: http://arxiv.org/abs/2308.03772
  • repo_url: None
  • paper_authors: Jingliang Li, Qiang Zhou, Chaohui Yu, Zhengda Lu, Jun Xiao, Zhibin Wang, Fan Wang
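  • for: This study aims to improve generalizable radiance field reconstruction in Neural Radiance Fields (NeRF) for novel view synthesis. It focuses on real scenes containing objects and structures at various scales, whose geometry existing encoding volumes capture poorly.
  • methods: The authors construct multi-scale encoding volumes that give NeRF models multi-scale geometry information. They perform depth prediction and radiance field reconstruction simultaneously, using the predicted depth map to supervise the rendered depth, narrow the depth range, and guide point sampling. They also enhance point volume features through depth-guided neighbor feature fusion, to counter inaccuracies from occlusion, lighting, and similar effects.
  • results: Experiments show superior performance in both novel view synthesis and dense geometry modeling without per-scene optimization.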
    Abstract Since the advent of Neural Radiance Fields, novel view synthesis has received tremendous attention. The existing approach for the generalization of radiance field reconstruction primarily constructs an encoding volume from nearby source images as additional inputs. However, these approaches cannot efficiently encode the geometric information of real scenes with various scale objects/structures. In this work, we propose constructing multi-scale encoding volumes and providing multi-scale geometry information to NeRF models. To make the constructed volumes as close as possible to the surfaces of objects in the scene and the rendered depth more accurate, we propose to perform depth prediction and radiance field reconstruction simultaneously. The predicted depth map will be used to supervise the rendered depth, narrow the depth range, and guide points sampling. Finally, the geometric information contained in point volume features may be inaccurate due to occlusion, lighting, etc. To this end, we propose enhancing the point volume feature from depth-guided neighbor feature fusion. Experiments demonstrate the superior performance of our method in both novel view synthesis and dense geometry modeling without per-scene optimization.
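Two of the depth-related steps above can be sketched briefly. The first is the rendered depth that the predicted depth map supervises. It is computed here with the standard NeRF volume-rendering quadrature: the expected termination depth along a ray. The second is narrowing the sampling range around the predicted depth. The half-width of the band, uniform spacing inside it, and the function names are assumptions. The paper's multi-scale volumes and feature fusion are not modeled.

```python
import math
from typing import List, Sequence


def render_depth(ts: Sequence[float], sigmas: Sequence[float]) -> float:
    """Expected ray termination depth from NeRF-style volume rendering weights."""
    weights, transmittance = [], 1.0
    for i in range(len(ts)):
        # Interval to the next sample; the last interval is treated as effectively infinite.
        delta = ts[i + 1] - ts[i] if i + 1 < len(ts) else 1e10
        alpha = 1.0 - math.exp(-sigmas[i] * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return sum(w * t for w, t in zip(weights, ts))


def depth_guided_samples(pred_depth: float, half_width: float, n: int) -> List[float]:
    """Place n evenly spaced samples in [pred_depth - half_width, pred_depth + half_width]."""
    if n < 2:
        return [pred_depth]
    lo = max(pred_depth - half_width, 0.0)
    hi = pred_depth + half_width
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]
```

A depth-supervision loss could then penalize, for example, the L1 gap between `render_depth(...)` and the predicted depth for each ray.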

Thinker: Learning to Plan and Act

  • paper_url: http://arxiv.org/abs/2307.14993
  • repo_url: https://github.com/anonymous-scrl/thinker
  • paper_authors: Stephen Chung, Ivan Anokhin, David Krueger
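  • for: This paper develops the Thinker algorithm, which enables reinforcement learning agents to autonomously interact with and use a learned world model for planning, without hand-crafted planning algorithms.
  • methods: The algorithm wraps the environment with a world model and introduces new model-interaction actions. With these actions, the agent proposes alternative plans to the world model before selecting a final action to execute in the environment.
  • results: The Thinker algorithm achieves state-of-the-art performance on Sokoban and competitive results on the Atari 2600 benchmark. Visualizations of trained agents show that they have learned to plan effectively with the world model to select better actions.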
    Abstract We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for hand-crafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization. We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. The algorithm's generality opens a new research direction on how a world model can be used in reinforcement learning and how planning can be seamlessly integrated into an agent's decision-making process.
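The wrapping idea can be sketched with a toy interface: "imagine" actions roll the world model forward without touching the real environment, and "act" actions step the real environment. The sketch makes simplifying assumptions:
- The world model is a perfect copy of a 5-state line world.
- A fixed "repeat one action" rollout stands in for the plans that a Thinker agent would learn to propose.
- All class and function names are hypothetical.

```python
from typing import Callable, Tuple

State = int
StepFn = Callable[[State, int], Tuple[State, float]]  # (state, action) -> (next_state, reward)


class ThinkerStyleWrapper:
    """Hypothetical wrapper: 'imagine' steps run in the world model, 'act' steps run in the real env."""

    def __init__(self, env_step: StepFn, model: StepFn, start: State):
        self.env_step = env_step
        self.model = model
        self.real_state = start
        self.imagined_state = start

    def imagine(self, action: int) -> Tuple[State, float]:
        # Model-interaction action: advance the imagined rollout only.
        self.imagined_state, reward = self.model(self.imagined_state, action)
        return self.imagined_state, reward

    def reset_imagination(self) -> None:
        self.imagined_state = self.real_state

    def act(self, action: int) -> Tuple[State, float]:
        # Real action: step the environment, then restart imagination from the new real state.
        self.real_state, reward = self.env_step(self.real_state, action)
        self.reset_imagination()
        return self.real_state, reward


def line_world(state: State, action: int) -> Tuple[State, float]:
    """Toy dynamics on states 0..4 with reward 1 at the goal state 4."""
    nxt = max(0, min(4, state + action))
    return nxt, 1.0 if nxt == 4 else 0.0


def plan_then_act(w: ThinkerStyleWrapper, depth: int = 3) -> int:
    """Hand-coded stand-in planner: imagine repeating each action `depth` times, act on the best."""
    best_action, best_return = 0, float("-inf")
    for action in (-1, 1):
        w.reset_imagination()
        total = sum(w.imagine(action)[1] for _ in range(depth))
        if total > best_return:
            best_action, best_return = action, total
    w.reset_imagination()
    w.act(best_action)
    return best_action
```

Imagined steps never change `real_state`. This is what lets an agent explore plans freely before committing to a real action.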

Multilingual Code Co-Evolution Using Large Language Models

  • paper_url: http://arxiv.org/abs/2307.14991
  • repo_url: None
  • paper_authors: Jiyang Zhang, Pengyu Nie, Junyi Jessy Li, Milos Gligoric
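  • for: This work targets a novel task: translating code changes (e.g., bug fixes or new features) from one programming language to another. This keeps projects that implement the same APIs and algorithms in multiple languages in sync.
  • methods: The authors design Codeditor, an LLM that explicitly models code changes as edit sequences and learns to correlate changes across programming languages.
  • results: The corpus has 6,613 aligned code changes from 8 pairs of open-source projects implementing similar functionality in Java and C#. On it, Codeditor outperforms state-of-the-art approaches by a large margin on all commonly used automatic metrics. Codeditor is also complementary to existing generation-based models, and combining them yields even greater performance.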
    Abstract Many software projects implement APIs and algorithms in multiple programming languages. Maintaining such projects is tiresome, as developers have to ensure that any change (e.g., a bug fix or a new feature) is being propagated, timely and without errors, to implementations in other programming languages. In the world of ever-changing software, using rule-based translation tools (i.e., transpilers) or machine learning models for translating code from one language to another provides limited value. Translating each time the entire codebase from one language to another is not the way developers work. In this paper, we target a novel task: translating code changes from one programming language to another using large language models (LLMs). We design and implement the first LLM, dubbed Codeditor, to tackle this task. Codeditor explicitly models code changes as edit sequences and learns to correlate changes across programming languages. To evaluate Codeditor, we collect a corpus of 6,613 aligned code changes from 8 pairs of open-source software projects implementing similar functionalities in two programming languages (Java and C#). Results show that Codeditor outperforms the state-of-the-art approaches by a large margin on all commonly used automatic metrics. Our work also reveals that Codeditor is complementary to the existing generation-based models, and their combination ensures even greater performance.
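To make "code changes as edit sequences" concrete, this sketch extracts token-level edit operations between the old and new versions of a snippet. It uses Python's standard `difflib.SequenceMatcher`. The abstract does not specify Codeditor's actual edit representation, so the `(op, removed, inserted)` format and the function name are assumptions.

```python
import difflib
from typing import List, Tuple

Edit = Tuple[str, List[str], List[str]]  # (operation, removed tokens, inserted tokens)


def edit_sequence(old: List[str], new: List[str]) -> List[Edit]:
    """Describe how the `old` tokens become the `new` tokens, keeping only changed spans."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # op is "replace", "delete", or "insert"
            edits.append((op, old[i1:i2], new[j1:j2]))
    return edits
```

For a Java change from `if ( x > 0 )` to `if ( x >= 0 )`, the edit sequence is a single `("replace", [">"], [">="])`. Under this framing, a model would map that edit to the corresponding edit in the C# implementation rather than regenerating the whole file.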

Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

  • paper_url: http://arxiv.org/abs/2307.14971
  • repo_url: https://github.com/wangzy22/tap
  • paper_authors: Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu
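  • for: Improving the performance of 3D point cloud models through generative pre-training.
  • methods: A 3D-to-2D generative pre-training scheme, adaptable to any point cloud model, that generates view images from different instructed poses via a cross-attention mechanism.
  • results: Outperforms previous pre-training methods and boosts the performance of architecture-oriented approaches. It achieves state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation.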
    Abstract With the overwhelming trend of masked image modeling led by MAE, generative pre-training has shown remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TAP.
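The abstract says view images are generated from instructed poses via cross-attention. In generic terms, pose-conditioned queries (one per target-view pixel or patch) attend over the point cloud's features. The sketch shows plain single-head scaled dot-product cross-attention without learned projections. How the queries are derived from the pose, and the decoder that turns outputs into pixels, are not described here; this is only the attention step, with hypothetical names.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Each query (e.g. a target-view pixel) attends over all point features (keys/values)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted mix of point values, one output vector per query.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out
```

Because the queries come from the 2D view and the keys and values from the point cloud, the point backbone only needs to output per-point features. This is consistent with the claim that the scheme is adaptable to any point cloud model.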