cs.LG - 2023-09-15

BioinspiredLLM: Conversational Large Language Model for the Mechanics of Biological and Bio-inspired Materials

  • paper_url: http://arxiv.org/abs/2309.08788
  • repo_url: None
  • paper_authors: Rachel K. Luu, Markus J. Buehler
  • for: To accelerate discovery and guide insight by developing an open-source large language model (BioinspiredLLM) for search and research in the field of structural biological and bio-inspired materials.
  • methods: The model is an autoregressive transformer finetuned on a corpus of over a thousand peer-reviewed articles; it can be prompted to actively and interactively recall information, assist with research tasks, and serve as an engine for creativity.
  • results: The model not only recalls biological materials information accurately, but can also formulate biomaterials questions and answers to evaluate its own performance and develop sound hypotheses for biological materials design; it can further be combined with other generative AI models to reshape the traditional materials design process.
    Abstract The study of biological materials and bio-inspired materials science is well established; however, surprisingly little knowledge has been systematically translated to engineering solutions. To accelerate discovery and guide insights, an open-source autoregressive transformer large language model, BioinspiredLLM, is reported. The model was finetuned with a corpus of over a thousand peer-reviewed articles in the field of structural biological and bio-inspired materials and can be prompted to actively and interactively recall information, assist with research tasks, and function as an engine for creativity. The model has proven by example that it is not only able to accurately recall information about biological materials when queried but also formulate biomaterials questions and answers that can evaluate its own performance. BioinspiredLLM also has been shown to develop sound hypotheses regarding biological materials design and remarkably so for materials that have never been explicitly studied before. Lastly, the model showed impressive promise in collaborating with other generative artificial intelligence models in a workflow that can reshape the traditional materials design process. This collaborative generative artificial intelligence method can stimulate and enhance bio-inspired materials design workflows. Biological materials is at a critical intersection of multiple scientific fields and models like BioinspiredLLM help to connect knowledge domains.
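
For readers who want a concrete starting point, the sketch below shows one way such domain finetuning of an autoregressive transformer could be set up with the Hugging Face libraries. The base model, the file name `corpus.txt`, and the hyperparameters are placeholders, not the authors' configuration.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"                                  # small stand-in for an open LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# corpus.txt: plain-text dump of the peer-reviewed articles (assumed to exist).
dataset = load_dataset("text", data_files={"train": "corpus.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bioinspired-llm",
                           num_train_epochs=1, per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```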

Beyond Labels: Leveraging Deep Learning and LLMs for Content Metadata

  • paper_url: http://arxiv.org/abs/2309.08787
  • repo_url: None
  • paper_authors: Saurabh Agrawal, John Trenkle, Jaya Kawale
  • for: This paper focuses on the use of content metadata in movie recommender systems, specifically the analysis of genre labels for movies and TV series to support personalized recommendations and item cold starting.
  • methods: The paper proposes a new way of examining genre labels, called the Genre Spectrum, which captures the nuanced genres present in a title; its effectiveness is corroborated through offline and online experiments.
  • results: The experiments show that the Genre Spectrum captures nuanced genres better than flat labels and, together with LLM-augmented content metadata, can be used to organize recommendations effectively in the user's 2-D home-grid.
    Abstract Content metadata plays a very important role in movie recommender systems as it provides valuable information about various aspects of a movie such as genre, cast, plot synopsis, box office summary, etc. Analyzing the metadata can help understand the user preferences to generate personalized recommendations and item cold starting. In this talk, we will focus on one particular type of metadata - \textit{genre} labels. Genre labels associated with a movie or a TV series help categorize a collection of titles into different themes and correspondingly setting up the audience expectation. We present some of the challenges associated with using genre label information and propose a new way of examining the genre information that we call as the \textit{Genre Spectrum}. The Genre Spectrum helps capture the various nuanced genres in a title and our offline and online experiments corroborate the effectiveness of the approach. Furthermore, we also talk about applications of LLMs in augmenting content metadata which could eventually be used to achieve effective organization of recommendations in user's 2-D home-grid.
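
The paper does not spell out the Genre Spectrum construction at the code level; as a loose, hypothetical illustration of the general idea of soft genre scores, the sketch below scores a synopsis against hand-written genre descriptions with TF-IDF similarity. All names and data are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hand-written genre "descriptions" and a toy synopsis; both are illustrative.
genres = {
    "thriller": "suspense tension crime detective killer chase danger",
    "romance": "love relationship couple wedding heartbreak",
    "sci-fi": "space future technology aliens robots time travel",
}
synopsis = ("A detective races against time to stop a killer "
            "hiding aboard a space station.")

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(genres.values()) + [synopsis])
genre_vecs, title_vec = matrix[: len(genres)], matrix[len(genres):]
scores = cosine_similarity(title_vec, genre_vecs).ravel()

# Normalize into a soft distribution over genres rather than one hard label.
spectrum = dict(zip(genres, scores / scores.sum()))
print(spectrum)
```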

Long-term Neurological Sequelae in Post-COVID-19 Patients: A Machine Learning Approach to Predict Outcomes

  • paper_url: http://arxiv.org/abs/2309.09993
  • repo_url: None
  • paper_authors: Hayder A. Albaqer, Kadhum J. Al-Jibouri, John Martin, Fadhil G. Al-Amran, Salman Rawaf, Maitham G. Yousif
  • for: This study investigates the long-term neurological sequelae that may appear in patients after recovery from COVID-19.
  • methods: The study uses a machine learning approach based on diverse clinical data and neuroimaging parameters to predict patients' long-term neurological outcomes.
  • results: 68% of post-COVID-19 patients reported neurological symptoms, most commonly fatigue, headache, and anosmia, and 22% exhibited more severe complications, including encephalopathy and stroke; the machine learning models showed promise for predicting long-term neurological outcomes, with a Random Forest model reaching 85% accuracy, 80% sensitivity, and 90% specificity.
    Abstract The COVID-19 pandemic has brought to light a concerning aspect of long-term neurological complications in post-recovery patients. This study delved into the investigation of such neurological sequelae in a cohort of 500 post-COVID-19 patients, encompassing individuals with varying illness severity. The primary aim was to predict outcomes using a machine learning approach based on diverse clinical data and neuroimaging parameters. The results revealed that 68% of the post-COVID-19 patients reported experiencing neurological symptoms, with fatigue, headache, and anosmia being the most common manifestations. Moreover, 22% of the patients exhibited more severe neurological complications, including encephalopathy and stroke. The application of machine learning models showed promising results in predicting long-term neurological outcomes. Notably, the Random Forest model achieved an accuracy of 85%, sensitivity of 80%, and specificity of 90% in identifying patients at risk of developing neurological sequelae. These findings underscore the importance of continuous monitoring and follow-up care for post-COVID-19 patients, particularly in relation to potential neurological complications. The integration of machine learning-based outcome prediction offers a valuable tool for early intervention and personalized treatment strategies, aiming to improve patient care and clinical decision-making. In conclusion, this study sheds light on the prevalence of long-term neurological complications in post-COVID-19 patients and demonstrates the potential of machine learning in predicting outcomes, thereby contributing to enhanced patient management and better health outcomes. Further research and larger studies are warranted to validate and refine these predictive models and to gain deeper insights into the underlying mechanisms of post-COVID-19 neurological sequelae.
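
A minimal sketch of how the reported metrics (accuracy, sensitivity, specificity) are computed for a Random Forest classifier; the synthetic features below stand in for the clinical and neuroimaging data, which are not available here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))            # synthetic stand-in for clinical/imaging features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print("accuracy   :", (tp + tn) / (tp + tn + fp + fn))
print("sensitivity:", tp / (tp + fn))   # true positive rate
print("specificity:", tn / (tn + fp))   # true negative rate
```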

Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures

  • paper_url: http://arxiv.org/abs/2309.08765
  • repo_url: https://github.com/kosonocky/chef
  • paper_authors: Clayton W. Kosonocky, Claus O. Wilke, Edward M. Marcotte, Andrew D. Ellington
  • for: The goal of this paper is to develop a model that predicts chemical function from structure.
  • methods: The paper uses large language models to extract information about chemical function from the patent literature: a scalable ChatGPT-assisted patent summarization and word-embedding label cleaning pipeline produces the Chemical Function (CheF) dataset of 100,000 molecules and their patent-derived functional labels.
  • results: The paper finds a strong relationship between functional labels and chemical structural space, and the co-occurrence graph of functional labels reveals relationships among similar functions. A model trained on the CheF dataset assigns functional labels to new molecules; it retrodicted approved Hepatitis C antivirals, uncovered an antiviral mechanism undisclosed in the patent, and identified plausible serotonin-related drugs.
    Abstract Predicting chemical function from structure is a major goal of the chemical sciences, from the discovery and repurposing of novel drugs to the creation of new materials. Recently, new machine learning algorithms are opening up the possibility of general predictive models spanning many different chemical functions. Here, we consider the challenge of applying large language models to chemical patents in order to consolidate and leverage the information about chemical functionality captured by these resources. Chemical patents contain vast knowledge on chemical function, but their usefulness as a dataset has historically been neglected due to the impracticality of extracting high-quality functional labels. Using a scalable ChatGPT-assisted patent summarization and word-embedding label cleaning pipeline, we derive a Chemical Function (CheF) dataset, containing 100K molecules and their patent-derived functional labels. The functional labels were validated to be of high quality, allowing us to detect a strong relationship between functional label and chemical structural spaces. Further, we find that the co-occurrence graph of the functional labels contains a robust semantic structure, which allowed us in turn to examine functional relatedness among the compounds. We then trained a model on the CheF dataset, allowing us to assign new functional labels to compounds. Using this model, we were able to retrodict approved Hepatitis C antivirals, uncover an antiviral mechanism undisclosed in the patent, and identify plausible serotonin-related drugs. The CheF dataset and associated model offers a promising new approach to predict chemical functionality.
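
As a hedged illustration (not the authors' pipeline), mapping molecular structure to multi-label functional annotations can be sketched roughly as follows, assuming RDKit and scikit-learn; the SMILES strings and labels are toy placeholders.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy (molecule, functional labels) pairs standing in for CheF entries.
data = [
    ("CCO", ["solvent"]),                                            # ethanol
    ("CC(=O)Oc1ccccc1C(=O)O", ["analgesic", "anti-inflammatory"]),   # aspirin
    ("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", ["stimulant"]),                 # caffeine
]
labels = sorted({l for _, ls in data for l in ls})

def featurize(smiles):
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)   # structure fingerprint
    return np.array(list(fp))

X = np.array([featurize(s) for s, _ in data])
Y = np.array([[int(l in ls) for l in labels] for _, ls in data])

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
pred = clf.predict(featurize("CCO").reshape(1, -1))[0]
print(dict(zip(labels, pred)))
```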

Circular Clustering with Polar Coordinate Reconstruction

  • paper_url: http://arxiv.org/abs/2309.08757
  • repo_url: None
  • paper_authors: Xiaoxiao Sun, Paul Sajda
  • for: Characterizing circular data found in biological systems, such as signal phase in neural recordings and nucleotide sequences in round genomes.
  • methods: Proposes a new analysis framework that utilizes projections onto a cylindrical coordinate system to better represent objects in a polar coordinate system, leveraging the mathematical properties of circular data to always find the correct clustering result within the reconstructed dataset.
  • results: Demonstrates more appropriate and consistent clustering results compared to standard methods on synthetic and real data, providing a more accurate and efficient way to cluster circular data.
    Abstract There is a growing interest in characterizing circular data found in biological systems. Such data are wide ranging and varied, from signal phase in neural recordings to nucleotide sequences in round genomes. Traditional clustering algorithms are often inadequate due to their limited ability to distinguish differences in the periodic component. Current clustering schemes that work in a polar coordinate system have limitations, such as being only angle-focused or lacking generality. To overcome these limitations, we propose a new analysis framework that utilizes projections onto a cylindrical coordinate system to better represent objects in a polar coordinate system. Using the mathematical properties of circular data, we show our approach always finds the correct clustering result within the reconstructed dataset, given sufficient periodic repetitions of the data. Our approach is generally applicable and adaptable and can be incorporated into most state-of-the-art clustering algorithms. We demonstrate on synthetic and real data that our method generates more appropriate and consistent clustering results compared to standard methods. In summary, our proposed analysis framework overcomes the limitations of existing polar coordinate-based clustering methods and provides a more accurate and efficient way to cluster circular data.
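
A minimal sketch of the underlying idea: embedding angles on the unit circle lets a standard clustering algorithm respect the wrap-around at 0/2π, which naive clustering on raw angles does not. This illustrates the general principle only, not the authors' cylindrical-coordinate reconstruction.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two angular clusters, one straddling the 0/2*pi boundary.
theta = np.concatenate([
    rng.normal(0.0, 0.2, 200) % (2 * np.pi),
    rng.normal(np.pi, 0.2, 200),
])

# Naive clustering on raw angles splits the boundary-straddling cluster...
naive = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(theta[:, None])

# ...whereas clustering on the unit-circle embedding keeps it intact.
embedded = np.column_stack([np.cos(theta), np.sin(theta)])
circular = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedded)
print("naive cluster sizes   :", np.bincount(naive))
print("circular cluster sizes:", np.bincount(circular))
```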

Diverse Neural Audio Embeddings – Bringing Features back !

  • paper_url: http://arxiv.org/abs/2309.08751
  • repo_url: None
  • paper_authors: Prateek Verma
  • for: This work aims to learn audio embeddings through diverse feature representations that capture multiple audio properties, such as pitch and timbre.
  • methods: The work combines an end-to-end architecture with handcrafted, domain-specific features (e.g., pitch and timbre) to learn separate robust embeddings for different audio properties.
  • results: Handcrafted embeddings on their own do not beat a fully end-to-end representation, but adding them to the end-to-end embedding significantly improves performance over either alone.
    Abstract With the advent of modern AI architectures, a shift has happened towards end-to-end architectures. This pivot has led to neural architectures being trained without domain-specific biases/knowledge, optimized according to the task. We in this paper, learn audio embeddings via diverse feature representations, in this case, domain-specific. For the case of audio classification over hundreds of categories of sound, we learn robust separate embeddings for diverse audio properties such as pitch, timbre, and neural representation, along with also learning it via an end-to-end architecture. We observe handcrafted embeddings, e.g., pitch and timbre-based, although on their own, are not able to beat a fully end-to-end representation, yet adding these together with end-to-end embedding helps us, significantly improve performance. This work would pave the way to bring some domain expertise with end-to-end models to learn robust, diverse representations, surpassing the performance of just training end-to-end models.
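
A hedged sketch of the fusion idea: handcrafted pitch and timbre descriptors concatenated with a learned embedding. The librosa calls are standard; the "neural" vector is a random placeholder for an actual end-to-end embedding, and the bundled example clip is downloaded on first use.

```python
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))            # bundled example clip

f0 = librosa.yin(y, fmin=65, fmax=2093, sr=sr)          # pitch track
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # timbre descriptors

pitch_embed = np.array([np.nanmean(f0), np.nanstd(f0)])
timbre_embed = mfcc.mean(axis=1)
neural_embed = np.random.default_rng(0).normal(size=128)  # placeholder for learned embedding

fused = np.concatenate([pitch_embed, timbre_embed, neural_embed])
print(fused.shape)  # handcrafted + end-to-end features feed one downstream classifier
```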

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

  • paper_url: http://arxiv.org/abs/2309.08748
  • repo_url: None
  • paper_authors: Yi Shen, Pan Xu, Michael M. Zavlanos
  • for: This work proposes a Wasserstein-distance-based distributionally robust optimization method for evaluating and learning policies from offline data, without direct interaction with the environment.
  • methods: The work uses distributionally robust optimization with a Wasserstein uncertainty set, together with a regularized formulation that reduces computational cost and a practical (biased) stochastic gradient descent method to optimize the policy.
  • results: The proposed method comes with theoretical guarantees on finite sample complexity and iteration complexity, and is validated on a public dataset, demonstrating its feasibility and effectiveness.
    Abstract Off-policy evaluation and learning are concerned with assessing a given policy and learning an optimal policy from offline data without direct interaction with the environment. Often, the environment in which the data are collected differs from the environment in which the learned policy is applied. To account for the effect of different environments during learning and execution, distributionally robust optimization (DRO) methods have been developed that compute worst-case bounds on the policy values assuming that the distribution of the new environment lies within an uncertainty set. Typically, this uncertainty set is defined based on the KL divergence around the empirical distribution computed from the logging dataset. However, the KL uncertainty set fails to encompass distributions with varying support and lacks awareness of the geometry of the distribution support. As a result, KL approaches fall short in addressing practical environment mismatches and lead to over-fitting to worst-case scenarios. To overcome these limitations, we propose a novel DRO approach that employs the Wasserstein distance instead. While Wasserstein DRO is generally computationally more expensive compared to KL DRO, we present a regularized method and a practical (biased) stochastic gradient descent method to optimize the policy efficiently. We also provide a theoretical analysis of the finite sample complexity and iteration complexity for our proposed method. We further validate our approach using a public dataset that was recorded in a randomized stroke trial.

Experimental Assessment of a Forward-Collision Warning System Fusing Deep Learning and Decentralized Radio Sensing

  • paper_url: http://arxiv.org/abs/2309.08737
  • repo_url: None
  • paper_authors: Jorge D. Cardenas, Omar Contreras-Ponce, Carlos A. Gutierrez, Ruth Aguilar-Ponce, Francisco R. Castillo-Soria, Cesar A. Azurdia-Meza
  • for: This paper proposes an automatic forward-collision warning system based on a decentralized radio sensing (RS) approach.
  • methods: The system uses a continuous waveform (CW) transmitted by a second vehicle as a probe signal to detect oncoming vehicles and warn the driver of a potential forward collision; the CW can easily be incorporated as a pilot signal within current multicarrier vehicular communication systems. Detection is performed by a deep learning (DL) module that analyzes the Doppler signature imprinted on the CW probe signal by an approaching vehicle.
  • results: The concept was assessed with data from a series of field trials conducted on a two-lane high-speed highway; detection performance was evaluated with two DL models, a long short-term memory network and a convolutional neural network. The results demonstrate the feasibility of a forward-collision warning system based on the fusion of DL and decentralized CW RS.
    Abstract This paper presents the idea of an automatic forward-collision warning system based on a decentralized radio sensing (RS) approach. In this framework, a vehicle in receiving mode employs a continuous waveform (CW) transmitted by a second vehicle as a probe signal to detect oncoming vehicles and warn the driver of a potential forward collision. Such a CW can easily be incorporated as a pilot signal within the data frame of current multicarrier vehicular communication systems. Detection of oncoming vehicles is performed by a deep learning (DL) module that analyzes the features of the Doppler signature imprinted on the CW probe signal by a rapidly approaching vehicle. This decentralized CW RS approach was assessed experimentally using data collected by a series of field trials conducted in a two-lanes high-speed highway. Detection performance was evaluated for two different DL models: a long short-term memory network and a convolutional neural network. The obtained results demonstrate the feasibility of the envisioned forward-collision warning system based on the fusion of DL and decentralized CW RS.
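
As an illustrative sketch only (architecture, features, and data are assumptions, not the authors' setup), an LSTM classifier over short Doppler-feature sequences might look like this:

```python
import torch
import torch.nn as nn

class DopplerLSTM(nn.Module):
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                     # x: (batch, time, features)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1]).squeeze(-1)   # logit: oncoming vehicle or not

model = DopplerLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

x = torch.randn(64, 100, 1)                  # synthetic Doppler-like sequences
y = torch.randint(0, 2, (64,)).float()       # synthetic labels
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("final loss:", float(loss))
```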

Pointing the Way: Refining Radar-Lidar Localization Using Learned ICP Weights

  • paper_url: http://arxiv.org/abs/2309.08731
  • repo_url: None
  • paper_authors: Daniil Lisus, Johann Laconte, Keenan Burnett, Timothy D. Barfoot
  • for: To improve the matching of radar measurements against lidar maps, and thereby the robustness and safety of autonomous driving systems.
  • methods: Deep learning is used to weight radar points based on high-level scan information before they are matched against high-quality lidar maps, improving radar-lidar registration.
  • results: In experiments on real-world autonomous driving data, the approach reduces radar-lidar ICP localization errors by up to 54.94% in translation and 68.39% in rotation, while maintaining interpretability and robustness.
    Abstract This paper presents a novel deep-learning-based approach to improve localizing radar measurements against lidar maps. Although the state of the art for localization is matching lidar data to lidar maps, radar has been considered as a promising alternative, as it is potentially more resilient against adverse weather such as precipitation and heavy fog. To make use of existing high-quality lidar maps, while maintaining performance in adverse weather, matching radar data to lidar maps is of interest. However, owing in part to the unique artefacts present in radar measurements, radar-lidar localization has struggled to achieve comparable performance to lidar-lidar systems, preventing it from being viable for autonomous driving. This work builds on an ICP-based radar-lidar localization system by including a learned preprocessing step that weights radar points based on high-level scan information. Combining a proven analytical approach with a learned weight reduces localization errors in radar-lidar ICP results run on real-world autonomous driving data by up to 54.94% in translation and 68.39% in rotation, while maintaining interpretability and robustness.
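
A conceptual sketch of one weighted point-to-point ICP step on 2-D toy data: per-point weights, which the paper learns from high-level scan information, enter a weighted Kabsch/Procrustes alignment. This is not the authors' implementation; the weights here are random placeholders.

```python
import numpy as np
from scipy.spatial import cKDTree

def weighted_icp_step(radar_pts, lidar_map, weights):
    """One weighted point-to-point alignment step (2-D toy version)."""
    idx = cKDTree(lidar_map).query(radar_pts)[1]        # nearest-neighbour association
    tgt = lidar_map[idx]
    w = weights / weights.sum()
    mu_r, mu_t = w @ radar_pts, w @ tgt                 # weighted centroids
    H = (radar_pts - mu_r).T @ np.diag(w) @ (tgt - mu_t)
    U, _, Vt = np.linalg.svd(H)                         # weighted Kabsch/Procrustes
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                            # keep a proper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_t - R @ mu_r
    return R, t

rng = np.random.default_rng(0)
lidar_map = rng.uniform(-10, 10, size=(500, 2))
radar_scan = lidar_map[:100] + rng.normal(0, 0.05, size=(100, 2)) + np.array([0.3, -0.2])
weights = rng.uniform(0.1, 1.0, size=100)               # learned in the paper; random here
R, t = weighted_icp_step(radar_scan, lidar_map, weights)
print("estimated rotation:\n", R, "\nestimated translation:", t)
```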

Clustered Multi-Agent Linear Bandits

  • paper_url: http://arxiv.org/abs/2309.08710
  • repo_url: None
  • paper_authors: Hamza Cherkaoui, Merwan Barlier, Igor Colin
  • for: This paper addresses the multi-agent linear stochastic bandit problem with a clustered structure.
  • methods: The paper proposes a novel algorithm that accelerates the overall optimization problem through collaboration among agents: a network controller estimates the underlying cluster structure of the network and optimizes experience sharing among agents within the same group.
  • results: The paper provides theoretical and empirical results showing that the algorithm significantly reduces regret while correctly recovering the true underlying cluster partitioning.
    Abstract We address in this paper a particular instance of the multi-agent linear stochastic bandit problem, called clustered multi-agent linear bandits. In this setting, we propose a novel algorithm leveraging an efficient collaboration between the agents in order to accelerate the overall optimization problem. In this contribution, a network controller is responsible for estimating the underlying cluster structure of the network and optimizing the experiences sharing among agents within the same groups. We provide a theoretical analysis for both the regret minimization problem and the clustering quality. Through empirical evaluation against state-of-the-art algorithms on both synthetic and real data, we demonstrate the effectiveness of our approach: our algorithm significantly improves regret minimization while managing to recover the true underlying cluster partitioning.

Price of Safety in Linear Best Arm Identification

  • paper_url: http://arxiv.org/abs/2309.08709
  • repo_url: None
  • paper_authors: Xuedong Shang, Igor Colin, Merwan Barlier, Hamza Cherkaoui
  • for: This paper introduces a safe best-arm identification framework with linear feedback, in which the agent must act conservatively at every round to ensure that the safety constraint is not violated.
  • methods: The paper proposes a gap-based algorithm that exploits the linear structure to reduce error while guaranteeing stage-wise safety.
  • results: The paper shows that the algorithm achieves meaningful sample complexity while remaining stage-wise safe, at the cost of an extra term in the sample complexity caused by the forced exploration phase induced by the safety constraint; experimental results support the algorithm's design.
    Abstract We introduce the safe best-arm identification framework with linear feedback, where the agent is subject to some stage-wise safety constraint that linearly depends on an unknown parameter vector. The agent must take actions in a conservative way so as to ensure that the safety constraint is not violated with high probability at each round. Ways of leveraging the linear structure for ensuring safety has been studied for regret minimization, but not for best-arm identification to the best our knowledge. We propose a gap-based algorithm that achieves meaningful sample complexity while ensuring the stage-wise safety. We show that we pay an extra term in the sample complexity due to the forced exploration phase incurred by the additional safety constraint. Experimental illustrations are provided to justify the design of our algorithm.

Wasserstein Distributionally Robust Control Barrier Function using Conditional Value-at-Risk with Differentiable Convex Programming

  • paper_url: http://arxiv.org/abs/2309.08700
  • repo_url: None
  • paper_authors: Alaa Eddine Chriat, Chuangchuang Sun
  • for: This paper aims to design safe controllers that remain reliable under the distributional shift encountered in deployment environments.
  • methods: The paper uses a distributionally robust control barrier function (DR-CBF) to achieve resilient safety guarantees while retaining the advantages of control barrier functions, such as computational efficiency and forward invariance.
  • results: Simulations validate the chance-constrained safety guarantee under distributional shift in both first- and second-order systems, and an approximate variant extends the approach to higher-order systems.
    Abstract Control Barrier functions (CBFs) have attracted extensive attention for designing safe controllers for their deployment in real-world safety-critical systems. However, the perception of the surrounding environment is often subject to stochasticity and further distributional shift from the nominal one. In this paper, we present distributional robust CBF (DR-CBF) to achieve resilience under distributional shift while keeping the advantages of CBF, such as computational efficacy and forward invariance. To achieve this goal, we first propose a single-level convex reformulation to estimate the conditional value at risk (CVaR) of the safety constraints under distributional shift measured by a Wasserstein metric, which is by nature tri-level programming. Moreover, to construct a control barrier condition to enforce the forward invariance of the CVaR, the technique of differentiable convex programming is applied to enable differentiation through the optimization layer of CVaR estimation. We also provide an approximate variant of DR-CBF for higher-order systems. Simulation results are presented to validate the chance-constrained safety guarantee under the distributional shift in both first and second-order systems.
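
As a minimal, generic illustration (not the paper's single-level convex reformulation), the empirical CVaR of sampled constraint costs, i.e. the mean of the worst α-fraction, can be computed as follows.

```python
import numpy as np

def empirical_cvar(costs, alpha=0.1):
    """Mean of the worst alpha-fraction of sampled costs (constraint violations)."""
    costs = np.sort(np.asarray(costs))
    k = max(1, int(np.ceil(alpha * len(costs))))
    return costs[-k:].mean()

samples = np.random.default_rng(0).normal(loc=-1.0, scale=0.5, size=10_000)
print("mean cost:", samples.mean(), "| CVaR_0.1:", empirical_cvar(samples, 0.1))
```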

Evaluating the Impact of Local Differential Privacy on Utility Loss via Influence Functions

  • paper_url: http://arxiv.org/abs/2309.08678
  • repo_url: None
  • paper_authors: Alycia N. Carey, Minh-Hao Van, Xintao Wu
  • for: This paper focuses on improving the privacy parameter setting in differential privacy (DP) for randomized response-based local DP.
  • methods: The paper uses influence functions to approximate the change in test loss when randomized response is applied over features and/or labels, and provides a data curator with a way to select the best privacy parameter without requiring extensive computation.
  • results: The paper shows that influence functions can approximate the true change in test loss with small mean absolute error, especially when noise correction methods are applied, and provides an empirical analysis of the computational complexity of the proposed approach.
    Abstract How to properly set the privacy parameter in differential privacy (DP) has been an open question in DP research since it was first proposed in 2006. In this work, we demonstrate the ability of influence functions to offer insight into how a specific privacy parameter value will affect a model's test loss in the randomized response-based local DP setting. Our proposed method allows a data curator to select the privacy parameter best aligned with their allowed privacy-utility trade-off without requiring heavy computation such as extensive model retraining and data privatization. We consider multiple common randomization scenarios, such as performing randomized response over the features, and/or over the labels, as well as the more complex case of applying a class-dependent label noise correction method to offset the noise incurred by randomization. Further, we provide a detailed discussion over the computational complexity of our proposed approach inclusive of an empirical analysis. Through empirical evaluations we show that for both binary and multi-class settings, influence functions are able to approximate the true change in test loss that occurs when randomized response is applied over features and/or labels with small mean absolute error, especially in cases where noise correction methods are applied.
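
For concreteness, the standard binary randomized-response mechanism studied in this setting keeps a label with probability e^ε/(e^ε+1) and flips it otherwise; a small sketch:

```python
import numpy as np

def randomized_response(labels, eps, rng):
    """epsilon-LDP randomized response over binary labels."""
    p_keep = np.exp(eps) / (np.exp(eps) + 1.0)
    flip = rng.random(len(labels)) >= p_keep
    return np.where(flip, 1 - labels, labels)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10)
print("original :", y)
print("perturbed:", randomized_response(y, eps=1.0, rng=rng))
```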

Attention-Only Transformers and Implementing MLPs with Attention Heads

  • paper_url: http://arxiv.org/abs/2309.08593
  • repo_url: None
  • paper_authors: Robert Huben, Valerie Morris
  • for: This paper studies the multi-head attention mechanism of the transformer architecture and how attention alone can implement the linear transformations and activation functions of a traditional MLP.
  • methods: The paper proves that an MLP neuron can be implemented by a masked attention head with internal dimension 1 whenever the MLP's activation function comes from a restricted class that includes SiLU and close approximations of ReLU and GeLU.
  • results: The paper shows that an MLP-and-attention transformer can be converted into an attention-only transformer at the cost of greatly increasing the number of attention heads, that attention heads can perform the components of an MLP (linear transformations and activation functions) separately, and that attention heads can encode arbitrary masking patterns in their weight matrices to within arbitrarily small error.
    Abstract The transformer architecture is widely used in machine learning models and consists of two alternating sublayers: attention heads and MLPs. We prove that an MLP neuron can be implemented by a masked attention head with internal dimension 1 so long as the MLP's activation function comes from a restricted class including SiLU and close approximations of ReLU and GeLU. This allows one to convert an MLP-and-attention transformer into an attention-only transformer at the cost of greatly increasing the number of attention heads. We also prove that attention heads can perform the components of an MLP (linear transformations and activation functions) separately. Finally, we prove that attention heads can encode arbitrary masking patterns in their weight matrices to within arbitrarily small error.
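
The key identity can be checked numerically: softmax attention over a token with score x and an auxiliary zero-scoring token, with values x and 0, outputs x·sigmoid(x) = SiLU(x). The sketch below verifies only this identity, not the paper's full masked-attention construction.

```python
import numpy as np

def attention_silu(x):
    scores = np.array([x, 0.0])                 # query-key products for the two tokens
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax -> [sigmoid(x), 1 - sigmoid(x)]
    values = np.array([x, 0.0])
    return weights @ values                     # attention output = x * sigmoid(x)

def silu(x):
    return x / (1.0 + np.exp(-x))

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    print(x, attention_silu(x), silu(x))        # the two columns agree
```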

A Bayesian Approach to Robust Inverse Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.08571
  • repo_url: https://github.com/rw422scarlet/bmirl_tf
  • paper_authors: Ran Wei, Siliang Zeng, Chenliang Li, Alfredo Garcia, Anthony McDonald, Mingyi Hong
  • for: This paper proposes a Bayesian approach to offline model-based inverse reinforcement learning (IRL).
  • methods: The method simultaneously estimates the expert's reward function and subjective model of the environment dynamics, using a class of prior distributions that parameterize how accurate the expert's model of the environment is believed to be, together with efficient estimation algorithms for high-dimensional settings.
  • results: The analysis reveals that the estimated policy exhibits robust performance when the expert is believed (a priori) to have a highly accurate model of the environment; this observation is verified in the MuJoCo environments, where the algorithms outperform state-of-the-art offline IRL algorithms.
    Abstract We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL). The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics. We make use of a class of prior distributions which parameterizes how accurate the expert's model of the environment is to develop efficient algorithms to estimate the expert's reward and subjective dynamics in high-dimensional settings. Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed (a priori) to have a highly accurate model of the environment. We verify this observation in the MuJoCo environments and show that our algorithms outperform state-of-the-art offline IRL algorithms.

Neural Network Driven, Interactive Design for Nonlinear Optical Molecules Based on Group Contribution Method

  • paper_url: http://arxiv.org/abs/2309.08570
  • repo_url: None
  • paper_authors: Jinming Fan, Chao Qian, Shaodong Zhou
  • for: Rational design of D-Pi-A type organic small-molecule nonlinear optical materials.
  • methods: A multi-stage Bayesian neural network (msBNN) combined with a corrected Lewis-mode group contribution method (cLGC) predicts the optical properties of molecules accurately and efficiently.
  • results: Accurate and efficient property prediction is achieved using only a small training data set, and an evolutionary algorithm (EA) designed specifically for LGC makes structural search achievable.
    Abstract A Lewis-mode group contribution method (LGC) -- multi-stage Bayesian neural network (msBNN) -- evolutionary algorithm (EA) framework is reported for rational design of D-Pi-A type organic small-molecule nonlinear optical materials is presented. Upon combination of msBNN and corrected Lewis-mode group contribution method (cLGC), different optical properties of molecules are afforded accurately and efficiently - by using only a small data set for training. Moreover, by employing the EA model designed specifically for LGC, structural search is well achievable. The logical origins of the well performance of the framework are discussed in detail. Considering that such a theory guided, machine learning framework combines chemical principles and data-driven tools, most likely, it will be proven efficient to solve molecular design related problems in wider fields.

Local Differential Privacy in Graph Neural Networks: a Reconstruction Approach

  • paper_url: http://arxiv.org/abs/2309.08569
  • repo_url: https://github.com/karuna-bhaila/rgnn
  • paper_authors: Karuna Bhaila, Wen Huang, Yongkai Wu, Xintao Wu
  • for: This work aims to provide node-level privacy for users in graph neural networks while incurring low utility loss.
  • methods: The work adopts the decentralized notion of Local Differential Privacy and perturbs both feature and label data at the node level before collection by a central server. For high-dimensional feature settings, an LDP protocol with strict privacy guarantees is proposed, and reconstruction methods based on frequency estimation over the randomized data approximate features and labels from the perturbed data; frequency estimates of graph clusters are also used to supervise training at a sub-graph level.
  • results: Extensive experiments on real-world and semi-synthetic datasets demonstrate the validity of the proposed model.
    Abstract Graph Neural Networks have achieved tremendous success in modeling complex graph data in a variety of applications. However, there are limited studies investigating privacy protection in GNNs. In this work, we propose a learning framework that can provide node privacy at the user level, while incurring low utility loss. We focus on a decentralized notion of Differential Privacy, namely Local Differential Privacy, and apply randomization mechanisms to perturb both feature and label data at the node level before the data is collected by a central server for model training. Specifically, we investigate the application of randomization mechanisms in high-dimensional feature settings and propose an LDP protocol with strict privacy guarantees. Based on frequency estimation in statistical analysis of randomized data, we develop reconstruction methods to approximate features and labels from perturbed data. We also formulate this learning framework to utilize frequency estimates of graph clusters to supervise the training procedure at a sub-graph level. Extensive experiments on real-world and semi-synthetic datasets demonstrate the validity of our proposed model.

Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization

  • paper_url: http://arxiv.org/abs/2309.08546
  • repo_url: None
  • paper_authors: Jack Foster, Alexandra Brintrup
  • for: This work addresses the demands of long-term autonomy, where robotic agents must continuously adapt to changing environments and learn to solve new tasks.
  • methods: The work uses a prior-based continual learning approach to avoid catastrophic forgetting.
  • results: The work introduces Bayesian adaptive moment regularization (BAdam), a new prior-based method that better constrains parameter growth and thereby reduces catastrophic forgetting. BAdam is lightweight, task-label-free, fast to converge, and offers calibrated uncertainty, which matters for safe real-world deployment; it achieves state-of-the-art performance among prior-based methods on single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST without relying on task labels or discrete task boundaries.
    Abstract The pursuit of long-term autonomy mandates that robotic agents must continuously adapt to their changing environments and learn to solve new tasks. Continual learning seeks to overcome the challenge of catastrophic forgetting, where learning to solve new tasks causes a model to forget previously learnt information. Prior-based continual learning methods are appealing for robotic applications as they are space efficient and typically do not increase in computational complexity as the number of tasks grows. Despite these desirable properties, prior-based approaches typically fail on important benchmarks and consequently are limited in their potential applications compared to their memory-based counterparts. We introduce Bayesian adaptive moment regularization (BAdam), a novel prior-based method that better constrains parameter growth, leading to lower catastrophic forgetting. Our method boasts a range of desirable properties for robotic applications such as being lightweight and task label-free, converging quickly, and offering calibrated uncertainty that is important for safe real-world deployment. Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments such as Split MNIST and Split FashionMNIST, and does so without relying on task labels or discrete task boundaries.

Efficient and robust Sensor Placement in Complex Environments

  • paper_url: http://arxiv.org/abs/2309.08545
  • repo_url: None
  • paper_authors: Lukas Taus, Yen-Hsi Richard Tsai
  • for: solves the problem of efficient and unobstructed surveillance or communication in complex environments with a minimal number of sensors.
  • methods: proposes a greedy algorithm and explores deep learning techniques to accelerate the evaluation of the objective function formulated in the greedy algorithm.
  • results: discusses the differences in using greedy and $\epsilon$-greedy algorithms to generate data and their impact on the robustness of the network.
    Abstract We address the problem of efficient and unobstructed surveillance or communication in complex environments. On one hand, one wishes to use a minimal number of sensors to cover the environment. On the other hand, it is often important to consider solutions that are robust against sensor failure or adversarial attacks. This paper addresses these challenges of designing minimal sensor sets that achieve multi-coverage constraints -- every point in the environment is covered by a prescribed number of sensors. We propose a greedy algorithm to achieve the objective. Further, we explore deep learning techniques to accelerate the evaluation of the objective function formulated in the greedy algorithm. The training of the neural network reveals that the geometric properties of the data significantly impact the network's performance, particularly at the end stage. By taking into account these properties, we discuss the differences in using greedy and $\epsilon$-greedy algorithms to generate data and their impact on the robustness of the network.
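
A hedged sketch of the greedy multi-coverage idea: repeatedly add the candidate sensor that newly covers the most under-covered points until every point is covered by at least k sensors. Visibility here is plain Euclidean range on toy data, whereas the paper considers visibility in complex, obstructed environments.

```python
import numpy as np

def greedy_placement(points, candidates, radius, k=1):
    """Greedily pick candidate sensors until every point is covered k times."""
    visible = np.linalg.norm(points[:, None] - candidates[None, :], axis=2) <= radius
    cover = np.zeros(len(points), dtype=int)
    chosen = []
    while (cover < k).any():
        gains = visible[cover < k].sum(axis=0).astype(float)  # new coverage per candidate
        gains[chosen] = -1.0                     # each sensor location used at most once
        best = int(np.argmax(gains))
        if gains[best] <= 0:                     # remaining points cannot be covered
            break
        chosen.append(best)
        cover += visible[:, best]
    return chosen

rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(200, 2))          # environment points to cover
cands = rng.uniform(0, 10, size=(50, 2))         # candidate sensor locations
print("sensors chosen:", greedy_placement(pts, cands, radius=3.0, k=2))
```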

Towards Last-layer Retraining for Group Robustness with Fewer Annotations

  • paper_url: http://arxiv.org/abs/2309.08534
  • repo_url: https://github.com/tmlabonte/last-layer-retraining
  • paper_authors: Tyler LaBonte, Vidya Muthukumar, Abhishek Kumar
  • for: This paper examines empirical risk minimization (ERM) in deep neural networks, in particular the poor accuracy of such networks on minority groups caused by over-reliance on spurious correlations.
  • methods: The paper studies last-layer retraining and introduces selective last-layer finetuning (SELF), evaluating both on established benchmarks.
  • results: Last-layer retraining can substantially improve worst-group accuracy even without group annotations (other than for model selection) and with only a handful of class annotations; SELF, which constructs the reweighting dataset from misclassifications or disagreements, nearly matches deep feature reweighting (DFR) on four established vision and language benchmarks with no group annotations and less than 3% of the held-out class annotations.
    Abstract Empirical risk minimization (ERM) of neural networks is prone to over-reliance on spurious correlations and poor generalization on minority groups. The recent deep feature reweighting (DFR) technique achieves state-of-the-art group robustness via simple last-layer retraining, but it requires held-out group and class annotations to construct a group-balanced reweighting dataset. In this work, we examine this impractical requirement and find that last-layer retraining can be surprisingly effective with no group annotations (other than for model selection) and only a handful of class annotations. We first show that last-layer retraining can greatly improve worst-group accuracy even when the reweighting dataset has only a small proportion of worst-group data. This implies a "free lunch" where holding out a subset of training data to retrain the last layer can substantially outperform ERM on the entire dataset with no additional data or annotations. To further improve group robustness, we introduce a lightweight method called selective last-layer finetuning (SELF), which constructs the reweighting dataset using misclassifications or disagreements. Our empirical and theoretical results present the first evidence that model disagreement upsamples worst-group data, enabling SELF to nearly match DFR on four well-established benchmarks across vision and language tasks with no group annotations and less than 3% of the held-out class annotations. Our code is available at https://github.com/tmlabonte/last-layer-retraining.
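
A minimal sketch of last-layer retraining: freeze a trained backbone and refit only the final linear layer on a small held-out set (group-balanced in DFR, built from misclassifications or disagreements in SELF). The model and data below are placeholders, not the paper's benchmarks.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(num_classes=2)                  # stand-in for an ERM-trained model
for p in model.parameters():                     # freeze the backbone...
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)    # ...and retrain only a fresh last layer

held_out_x = torch.randn(32, 3, 64, 64)          # placeholder for the reweighting set
held_out_y = torch.randint(0, 2, (32,))

opt = torch.optim.SGD(model.fc.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
model.train()
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(model(held_out_x), held_out_y)
    loss.backward()
    opt.step()
print("last-layer retraining done, final loss:", float(loss))
```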

Scaling Laws for Sparsely-Connected Foundation Models

  • paper_url: http://arxiv.org/abs/2309.08520
  • repo_url: None
  • paper_authors: Elias Frantar, Carlos Riquelme, Neil Houlsby, Dan Alistarh, Utku Evci
  • for: This paper investigates the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets, specifically in the vision and language domains.
  • methods: The paper explores the relationship between weight sparsity, number of non-zero parameters, and amount of training data, and identifies a scaling law that describes this relationship. The authors also experiment with different sparsity structures and strategies.
  • results: The paper finds that the “optimal sparsity” (the sparsity level that yields the best performance for a given effective model size and training budget) increases with the amount of data used for training. The authors also extend their study to different sparsity structures and strategies, and shed light on the power and limitations of weight sparsity across various parameter and computational settings.
    Abstract We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets (i.e., "foundation models"), in both vision and language domains. In this setting, we identify the first scaling law describing the relationship between weight sparsity, number of non-zero parameters, and amount of training data, which we validate empirically across model and data scales; on ViT/JFT-4B and T5/C4. These results allow us to characterize the "optimal sparsity", the sparsity level which yields the best performance for a given effective model size and training budget. For a fixed number of non-zero parameters, we identify that the optimal sparsity increases with the amount of data used for training. We also extend our study to different sparsity structures (such as the hardware-friendly n:m pattern) and strategies (such as starting from a pretrained dense model). Our findings shed light on the power and limitations of weight sparsity across various parameter and computational settings, offering both theoretical understanding and practical implications for leveraging sparsity towards computational efficiency improvements.

Deep-learning-powered data analysis in plankton ecology

  • paper_url: http://arxiv.org/abs/2309.08500
  • repo_url: https://github.com/softmatterlab/deep-learning-in-plankton-ecology
  • paper_authors: Harshith Bachimanchi, Matthew I. M. Pinder, Chloé Robert, Pierre De Wit, Jonathan Havenhand, Alexandra Kinnby, Daniel Midtvedt, Erik Selander, Giovanni Volpe
  • for: This review discusses the application of deep learning algorithms in plankton ecology, along with the strengths and limitations of deep learning for plankton research.
  • methods: The review covers deep-learning-based detection and classification of phyto- and zooplankton images, analysis of foraging and swimming behaviour, and ecological modelling.
  • results: The review summarizes the opportunities and challenges of deep learning in plankton research and provides detailed tutorials and code samples so that readers can apply the described methods to their own data.
    Abstract The implementation of deep learning algorithms has brought new perspectives to plankton ecology. Emerging as an alternative approach to established methods, deep learning offers objective schemes to investigate plankton organisms in diverse environments. We provide an overview of deep-learning-based methods including detection and classification of phyto- and zooplankton images, foraging and swimming behaviour analysis, and finally ecological modelling. Deep learning has the potential to speed up the analysis and reduce the human experimental bias, thus enabling data acquisition at relevant temporal and spatial scales with improved reproducibility. We also discuss shortcomings and show how deep learning architectures have evolved to mitigate imprecise readouts. Finally, we suggest opportunities where deep learning is particularly likely to catalyze plankton research. The examples are accompanied by detailed tutorials and code samples that allow readers to apply the methods described in this review to their own data.

Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network

  • paper_url: http://arxiv.org/abs/2309.08489
  • repo_url: None
  • paper_authors: Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang
  • for: This work proposes a word-level end-to-end neural speaker diarization system that performs speech recognition and speaker labeling jointly.
  • methods: The system (WEEND, with an auxiliary network) uses a multi-task learning algorithm that performs end-to-end ASR and speaker diarization within the same neural architecture, predicting a speaker label for each recognized word.
  • results: Experimental results show that WEEND outperforms the turn-based diarization baseline on all 2-speaker short-form scenarios and generalizes to audio lengths of 5 minutes; with enough in-domain training data, it has the potential to deliver high-quality diarized text for conversations with 3 or more speakers.
    Abstract While standard speaker diarization attempts to answer the question "who spoken when", most of relevant applications in reality are more interested in determining "who spoken what". Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words. In this paper, we propose Word-level End-to-End Neural Diarization (WEEND) with auxiliary network, a multi-task learning algorithm that performs end-to-end ASR and speaker diarization in the same neural architecture. That is, while speech is being recognized, speaker labels are predicted simultaneously for each recognized word. Experimental results demonstrate that WEEND outperforms the turn-based diarization baseline system on all 2-speaker short-form scenarios and has the capability to generalize to audio lengths of 5 minutes. Although 3+speaker conversations are harder, we find that with enough in-domain training data, WEEND has the potential to deliver high quality diarized text.
    摘要 传统的说话人分离旨在回答"谁在什么时候说话"，但现实中的大多数相关应用更关心"谁说了什么"。无论是传统的模块化方法，还是较新的端到端神经分离（EEND），都需要额外的自动语音识别（ASR）模型和一套协调算法来把说话人标签与识别出的词语关联起来。本文提出带辅助网络的词级端到端神经分离（WEEND），这是一种多任务学习算法，在同一个神经网络架构中同时完成端到端ASR与说话人分离：在识别语音的同时，为每个识别出的词预测说话人标签。实验结果表明，WEEND在所有两人短音频场景中均优于基于话轮的分离基线系统，并能泛化到5分钟长度的音频。虽然三人及以上的对话更具挑战性，但我们发现在有足够领域内训练数据的情况下，WEEND有潜力给出高质量的带说话人标注文本。

On the limitations of data-driven weather forecasting models

  • paper_url: http://arxiv.org/abs/2309.08473
  • repo_url: None
  • paper_authors: Massimo Bonavita
  • for: This paper examines the forecasts produced by the Pangu-Weather machine learning model and compares them to traditional physics-based models in terms of fidelity and physical consistency.
  • methods: The paper uses Pangu-Weather forecasts and compares them to traditional physics-based models to evaluate their accuracy and physical consistency.
  • results: The paper finds that Pangu-Weather forecasts lack the fidelity and physical consistency of physics-based models, but they can still provide accurate forecasts for specific applications and have low computational cost during deployment.
  • for: 这篇论文研究了由Pangu-Weather机器学习模型生成的预测和传统物理模型的对比,以评估其准确性和物理一致性。
  • methods: 论文使用Pangu-Weather预测和传统物理模型进行对比,以评估它们的准确性和物理一致性。
  • results: 论文发现Pangu-Weather预测缺乏物理模型的准确性和物理一致性,但它们可以在特定应用场景提供准确的预测,并且在部署时的计算成本非常低。
    Abstract As in many other areas of engineering and applied science, Machine Learning (ML) is having a profound impact in the domain of Weather and Climate Prediction. A very recent development in this area has been the emergence of fully data-driven ML prediction models which routinely claim superior performance to that of traditional physics-based models. In this work, we examine some aspects of the forecasts produced by an exemplar of the current generation of ML models, Pangu-Weather, with a focus on the fidelity and physical consistency of those forecasts and how these characteristics relate to perceived forecast performance. The main conclusion is that Pangu-Weather forecasts, and by extension those of similar ML models, do not have the fidelity and physical consistency of physics-based models and their advantage in accuracy on traditional deterministic metrics of forecast skill can be attributed, to a large extent, to these peculiarities. Similarly to other current post-processing technologies, ML models appear to be able to add value to standard NWP outputs for specific forecast applications and combined with their extremely low computational cost during deployment, will likely provide an additional, useful source of forecast information.
    摘要 如同其他工程和应用科学领域一样,机器学习(ML)在气象和气候预测领域也有深远的影响。最近,这个领域出现了完全数据驱动的ML预测模型,这些模型常常超越传统的物理基础模型的性能。在这篇文章中,我们研究了一个ML模型的示例——Pangu-Weather,强调预测的准确性和物理一致性如何影响预测性能。结论是,Pangu-Weather的预测和类似的ML模型没有物理基础模型的准确性和物理一致性,但它们在传统的决定性指标上的准确性却很高。这种特点与其他当前的后处理技术类似,ML模型可以在特定预测应用场景中添加价值,并且在部署时的计算成本极低,因此将成为一个有用的预测信息来源。

Quantifying Credit Portfolio sensitivity to asset correlations with interpretable generative neural networks

  • paper_url: http://arxiv.org/abs/2309.08652
  • repo_url: None
  • paper_authors: Sergio Caprioli, Emanuele Cagliero, Riccardo Crupi
  • for: 本研究提出了一种新方法，利用深度学习模型生成的合成金融相关性矩阵，量化信用组合风险价值（VaR）对资产相关性的敏感性。
  • methods: 与此前使用生成对抗网络（GAN）的工作不同，本研究采用变分自编码器（VAE），以获得更易解释的潜在空间表示。
  • results: 分析表明，VAE的潜在空间能够捕捉影响组合分散化的关键因素，尤其是信用组合对资产相关性变化的敏感性。
    Abstract In this research, we propose a novel approach for the quantification of credit portfolio Value-at-Risk (VaR) sensitivity to asset correlations with the use of synthetic financial correlation matrices generated with deep learning models. In previous work Generative Adversarial Networks (GANs) were employed to demonstrate the generation of plausible correlation matrices, that capture the essential characteristics observed in empirical correlation matrices estimated on asset returns. Instead of GANs, we employ Variational Autoencoders (VAE) to achieve a more interpretable latent space representation. Through our analysis, we reveal that the VAE latent space can be a useful tool to capture the crucial factors impacting portfolio diversification, particularly in relation to credit portfolio sensitivity to asset correlations changes.
    摘要 在这项研究中，我们提出了一种新方法，利用深度学习模型生成的合成金融相关性矩阵，来量化信用组合风险价值（VaR）对资产相关性的敏感性。此前的工作采用生成对抗网络（GAN）生成合理的相关性矩阵，以捕捉基于资产收益估计的经验相关性矩阵的主要特征。本研究改用变分自编码器（VAE），以获得更易解释的潜在空间表示。通过分析，我们发现VAE的潜在空间可以用来捕捉影响组合分散化的关键因素，特别是信用组合对资产相关性变化的敏感性。
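To make the modelling idea above concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of a variational autoencoder over flattened correlation matrices; the layer sizes are assumptions, and the post-processing needed to map decoder outputs back to valid correlation matrices is omitted.

```python
# Hypothetical sketch: a small VAE over flattened correlation matrices, assuming
# inputs are the upper-triangular entries of empirical correlation matrices.
import torch
import torch.nn as nn

class CorrVAE(nn.Module):
    def __init__(self, n_assets: int, latent_dim: int = 2):
        super().__init__()
        d = n_assets * (n_assets - 1) // 2           # number of off-diagonal entries
        self.enc = nn.Sequential(nn.Linear(d, 64), nn.ReLU())
        self.mu, self.logvar = nn.Linear(64, latent_dim), nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, d), nn.Tanh())  # correlations lie in (-1, 1)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def elbo_loss(x, recon, mu, logvar, beta=1.0):
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kld

vae = CorrVAE(n_assets=10)
x = torch.rand(32, 45) * 2 - 1          # 45 = 10*9/2 synthetic correlation entries
recon, mu, logvar = vae(x)
print(elbo_loss(x, recon, mu, logvar).item())
```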

FedDCSR: Federated Cross-domain Sequential Recommendation via Disentangled Representation Learning

  • paper_url: http://arxiv.org/abs/2309.08420
  • repo_url: https://github.com/orion-orion/FedDCSR
  • paper_authors: Hongyu Zhang, Dongyi Zheng, Xu Yang, Jiyuan Feng, Qing Liao
  • for: 该论文目的是提出一种基于联合学习和跨Domains Sequential Recommendation的数据隐私保护推荐方法,以便全面利用不同Domains的用户序列数据,保持数据隐私。
  • methods: 该方法使用了联合学习(FL)和跨Domains Sequential Recommendation(CSR),并提出了一种解决序列特征差异性问题的方法,即在不同Domains中分离用户序列特征,并通过内域对用户序列进行数据增强来学习更丰富的用户特征。
  • results: 在三个实际场景中,FedDCSR比基于现有基eline的方法具有显著改善,表明该方法可以有效地利用不同Domains的用户序列数据,同时保持数据隐私。
    Abstract Cross-domain Sequential Recommendation (CSR) which leverages user sequence data from multiple domains has received extensive attention in recent years. However, the existing CSR methods require sharing origin user data across domains, which violates the General Data Protection Regulation (GDPR). Thus, it is necessary to combine federated learning (FL) and CSR to fully utilize knowledge from different domains while preserving data privacy. Nonetheless, the sequence feature heterogeneity across different domains significantly impacts the overall performance of FL. In this paper, we propose FedDCSR, a novel federated cross-domain sequential recommendation framework via disentangled representation learning. Specifically, to address the sequence feature heterogeneity across domains, we introduce an approach called inter-intra domain sequence representation disentanglement (SRD) to disentangle the user sequence features into domain-shared and domain-exclusive features. In addition, we design an intra domain contrastive infomax (CIM) strategy to learn richer domain-exclusive features of users by performing data augmentation on user sequences. Extensive experiments on three real-world scenarios demonstrate that FedDCSR achieves significant improvements over existing baselines.
    摘要 跨域序列推荐（CSR）利用来自多个域的用户序列数据，近年来受到广泛关注。然而，现有的CSR方法需要在域之间共享原始用户数据，这违反了《通用数据保护条例》（GDPR）。因此，有必要将联邦学习（FL）与CSR相结合，在保护数据隐私的同时充分利用不同域的知识。然而，各域之间的序列特征异质性会显著影响联邦学习的整体性能。在本文中，我们提出FedDCSR，一种基于解耦表示学习的新型联邦跨域序列推荐框架。具体而言，为了应对各域序列特征的异质性，我们提出了域间-域内序列表示解耦（SRD）方法，将用户序列特征分解为域共享特征和域独有特征。此外，我们设计了域内对比信息最大化（CIM）策略，通过对用户序列进行数据增强来学习更丰富的域独有用户特征。在三个真实场景上的大量实验表明，FedDCSR相比现有基线取得了显著提升。

A new method of modeling the multi-stage decision-making process of CRT using machine learning with uncertainty quantification

  • paper_url: http://arxiv.org/abs/2309.08415
  • repo_url: https://github.com/miilab-mtu/crt_multistageml_uncertainty
  • paper_authors: Kristoffer Larsen, Chen Zhao, Joyce Keyak, Qiuying Sha, Diana Paez, Xinwei Zhang, Jiangang Zou, Amalia Peix, Weihua Zhou
  • for: 本研究旨在为心力衰竭（HF）患者建立一个多阶段机器学习模型，以预测心脏再同步化治疗（CRT）的疗效反应。
  • methods: 研究纳入218名接受静息门控SPECT MPI检查的患者，将CRT反应定义为6个月随访时左心室射血分数（LVEF）提升超过5%。通过组合两个集成模型，构建了多阶段机器学习模型。
  • results: CRT反应率为55.5%（n = 121），男性占61.0%（n = 133），平均年龄62.0岁，LVEF为27.7。多阶段模型与使用额外SPECT数据的Ensemble 2表现相近：AUC为0.75对0.77，准确率0.71对0.69，灵敏度0.70对0.72，特异度0.72对0.65；但多阶段模型在所有折中仅需52.7%的患者提供SPECT MPI数据。
    Abstract Aims. The purpose of this study is to create a multi-stage machine learning model to predict cardiac resynchronization therapy (CRT) response for heart failure (HF) patients. This model exploits uncertainty quantification to recommend additional collection of single-photon emission computed tomography myocardial perfusion imaging (SPECT MPI) variables if baseline clinical variables and features from electrocardiogram (ECG) are not sufficient. Methods. 218 patients who underwent rest-gated SPECT MPI were enrolled in this study. CRT response was defined as an increase in left ventricular ejection fraction (LVEF) > 5% at a 6 month follow-up. A multi-stage ML model was created by combining two ensemble models. Results. The response rate for CRT was 55.5% (n = 121) with overall male gender 61.0% (n = 133), an average age of 62.0, and LVEF of 27.7. The multi-stage model performed similarly to Ensemble 2 (which utilized the additional SPECT data) with AUC of 0.75 vs. 0.77, accuracy of 0.71 vs. 0.69, sensitivity of 0.70 vs. 0.72, and specificity 0.72 vs. 0.65, respectively. However, the multi-stage model only required SPECT MPI data for 52.7% of the patients across all folds. Conclusions. By using rule-based logic stemming from uncertainty quantification, the multi-stage model was able to reduce the need for additional SPECT MPI data acquisition without sacrificing performance.
    摘要 目的：本研究旨在建立一个多阶段机器学习模型，以预测心力衰竭（HF）患者对心脏再同步化治疗（CRT）的反应。该模型利用不确定性量化，在基线临床变量和心电图（ECG）特征不足时，建议进一步采集单光子发射计算机断层扫描心肌灌注显像（SPECT MPI）变量。方法：本研究纳入218名接受静息门控SPECT MPI检查的患者。CRT反应定义为6个月随访时左心室射血分数（LVEF）提升超过5%。通过组合两个集成模型构建多阶段机器学习模型。结果：CRT反应率为55.5%（n = 121），男性占61.0%（n = 133），平均年龄62.0岁，LVEF为27.7。多阶段模型与使用额外SPECT数据的Ensemble 2表现相近：AUC为0.75对0.77，准确率0.71对0.69，灵敏度0.70对0.72，特异度0.72对0.65；但多阶段模型在所有折中仅需52.7%的患者提供SPECT MPI数据。结论：借助源于不确定性量化的规则逻辑，多阶段模型能够在不牺牲性能的前提下减少对额外SPECT MPI数据采集的需求。
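The uncertainty-gated routing described in the abstract can be illustrated with a small sketch. The snippet below is not the authors' pipeline; the features are synthetic stand-ins, and the probability band used as an uncertainty rule is an assumed placeholder.

```python
# Illustrative sketch (not the authors' code): a two-stage classifier that only
# requests the additional SPECT features when the first stage is uncertain.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_clin = rng.normal(size=(218, 10))      # stand-in clinical + ECG features
X_spect = rng.normal(size=(218, 8))      # stand-in SPECT MPI features
y = rng.integers(0, 2, size=218)         # CRT response labels (synthetic)

stage1 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_clin, y)
stage2 = RandomForestClassifier(n_estimators=200, random_state=0).fit(
    np.hstack([X_clin, X_spect]), y)

def predict_multistage(x_clin, x_spect, band=0.15):
    """Route to stage 2 only if the stage-1 probability falls in an uncertainty band."""
    p1 = stage1.predict_proba(x_clin)[:, 1]
    uncertain = np.abs(p1 - 0.5) < band            # crude uncertainty rule
    p = p1.copy()
    if uncertain.any():
        p[uncertain] = stage2.predict_proba(
            np.hstack([x_clin[uncertain], x_spect[uncertain]]))[:, 1]
    return p, uncertain.mean()                     # fraction needing SPECT data

probs, spect_fraction = predict_multistage(X_clin, X_spect)
print(f"SPECT requested for {spect_fraction:.1%} of patients")
```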

Make Deep Networks Shallow Again

  • paper_url: http://arxiv.org/abs/2309.08414
  • repo_url: None
  • paper_authors: Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh
  • for: 提高深度神经网络的性能和优化速度
  • methods: 使用残差连接来解决梯度消失问题
  • results: 发现广泛、浅层网络可以与深度网络匹配性能,并且可以简化网络结构、提高优化效率和加速训练过程
    Abstract Deep neural networks have a good success record and are thus viewed as the best architecture choice for complex applications. Their main shortcoming has been, for a long time, the vanishing gradient which prevented the numerical optimization algorithms from acceptable convergence. A breakthrough has been achieved by the concept of residual connections -- an identity mapping parallel to a conventional layer. This concept is applicable to stacks of layers of the same dimension and substantially alleviates the vanishing gradient problem. A stack of residual connection layers can be expressed as an expansion of terms similar to the Taylor expansion. This expansion suggests the possibility of truncating the higher-order terms and receiving an architecture consisting of a single broad layer composed of all initially stacked layers in parallel. In other words, a sequential deep architecture is substituted by a parallel shallow one. Prompted by this theory, we investigated the performance capabilities of the parallel architecture in comparison to the sequential one. The computer vision datasets MNIST and CIFAR10 were used to train both architectures for a total of 6912 combinations of varying numbers of convolutional layers, numbers of filters, kernel sizes, and other meta parameters. Our findings demonstrate a surprising equivalence between the deep (sequential) and shallow (parallel) architectures. Both layouts produced similar results in terms of training and validation set loss. This discovery implies that a wide, shallow architecture can potentially replace a deep network without sacrificing performance. Such substitution has the potential to simplify network architectures, improve optimization efficiency, and accelerate the training process.
    摘要 深度神经网络有着良好的成功记录，因此被视为复杂应用场景的首选架构。长期以来，其主要缺点是梯度消失问题，使数值优化算法难以令人满意地收敛。残差连接（一种与常规层并行的恒等映射）的提出带来了突破。该概念适用于维度相同的层堆叠，并显著缓解了梯度消失问题。残差连接层的堆叠可以表示为一种类似泰勒展开的逐项展开，这意味着可以截断高阶项，得到一个由所有原先堆叠的层并行组成的单一宽层。换言之，串行的深层架构被并行的浅层架构所替代。受此理论启发，我们比较了并行架构与串行架构的性能，在计算机视觉数据集MNIST和CIFAR10上训练了两种架构，共计6912种不同卷积层数、滤波器数量、卷积核大小及其他元参数的组合。结果表明，深层（串行）架构与浅层（并行）架构之间存在惊人的等价性：两种布局在训练集与验证集损失上均取得了相近的结果。这一发现意味着宽而浅的架构有可能在不牺牲性能的前提下取代深层网络，从而简化网络结构、提高优化效率并加速训练过程。
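A minimal sketch of the two layouts compared in the paper, assuming simple fully connected blocks: a sequential residual stack versus a single wide layer that applies the same blocks to the input in parallel and sums them (the first-order truncation of the expansion mentioned above).

```python
# Sketch contrasting a sequential residual stack with its parallel, shallow counterpart.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
    def forward(self, x):
        return self.f(x)

class DeepResidual(nn.Module):
    def __init__(self, dim, depth):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])
    def forward(self, x):
        for b in self.blocks:            # x <- x + f_k(x), applied sequentially
            x = x + b(x)
        return x

class WideParallel(nn.Module):
    """First-order truncation: all blocks see the same input and their outputs are summed."""
    def __init__(self, dim, depth):
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(depth)])
    def forward(self, x):
        return x + sum(b(x) for b in self.blocks)

x = torch.randn(4, 32)
print(DeepResidual(32, 6)(x).shape, WideParallel(32, 6)(x).shape)
```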

Constraint-Free Structure Learning with Smooth Acyclic Orientations

  • paper_url: http://arxiv.org/abs/2309.08406
  • repo_url: None
  • paper_authors: Riccardo Massidda, Francesco Landolfi, Martina Cinquini, Davide Bacciu
  • for: 研究结构学习问题，即根据有向无环图（DAG）生成的数据正确地重建其边。
  • methods: 我们提出COSMO，一种无需约束的连续优化方案，用于学习无环结构。我们定义了一个由单一优先级向量参数化的可微方向矩阵近似，它可以在不评估无环性的情况下给出平滑的方向矩阵以及相应的无环邻接矩阵。
  • results: 我们证明COSMO总是收敛到无环解，且在渐近意义上更快。实验表明，COSMO在图结构重建上的表现与竞争方法相比具有优势。
    Abstract The structure learning problem consists of fitting data generated by a Directed Acyclic Graph (DAG) to correctly reconstruct its arcs. In this context, differentiable approaches constrain or regularize the optimization problem using a continuous relaxation of the acyclicity property. The computational cost of evaluating graph acyclicity is cubic on the number of nodes and significantly affects scalability. In this paper we introduce COSMO, a constraint-free continuous optimization scheme for acyclic structure learning. At the core of our method, we define a differentiable approximation of an orientation matrix parameterized by a single priority vector. Differently from previous work, our parameterization fits a smooth orientation matrix and the resulting acyclic adjacency matrix without evaluating acyclicity at any step. Despite the absence of explicit constraints, we prove that COSMO always converges to an acyclic solution. In addition to being asymptotically faster, our empirical analysis highlights how COSMO performance on graph reconstruction compares favorably with competing structure learning methods.
    摘要 结构学习问题是指根据由有向无环图（DAG）生成的数据正确地重建其边。在这一设定下，可微方法通常利用无环性质的连续松弛来约束或正则化优化问题。然而，评估图无环性的计算代价是节点数的三次方，严重影响可扩展性。在本文中，我们提出COSMO，一种用于无环结构学习的无约束连续优化方案。方法的核心是定义一个由单一优先级向量参数化的可微方向矩阵近似。与以往工作不同，我们的参数化给出平滑的方向矩阵和相应的无环邻接矩阵，而无需在任何步骤中评估无环性。尽管没有显式约束，我们证明COSMO总是收敛到无环解。除了在渐近意义上更快之外，实验分析还表明，COSMO在图重建上的表现与其他结构学习方法相比具有优势。
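The following is a hedged sketch of the core parameterization idea as we read it, not the authors' code: a single priority vector induces a smooth orientation matrix, and no acyclicity check is ever evaluated. The sigmoid-with-temperature form is an assumption.

```python
# Hedged sketch: a priority vector induces a smooth orientation matrix without
# any explicit acyclicity constraint.
import torch

def smooth_orientation(priority: torch.Tensor, temperature: float = 0.1):
    """O[i, j] is close to 1 when node i has a lower priority value than node j."""
    diff = priority.unsqueeze(0) - priority.unsqueeze(1)   # entry (i, j) = p_j - p_i
    O = torch.sigmoid(diff / temperature)
    return O * (1 - torch.eye(len(priority)))              # zero the diagonal

d = 4
priority = torch.randn(d, requires_grad=True)          # learnable priorities
weights = torch.randn(d, d, requires_grad=True)        # learnable edge strengths
adjacency = weights * smooth_orientation(priority)     # oriented weighted graph

# As the temperature shrinks, the orientation hardens and the adjacency matrix
# corresponds to a DAG induced by the priority ordering.
print(smooth_orientation(priority, temperature=1e-3).round())
```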

Neural Metamaterial Networks for Nonlinear Material Design

  • paper_url: http://arxiv.org/abs/2309.10600
  • repo_url: https://github.com/asem010/legend-pice
  • paper_authors: Yue Li, Stelian Coros, Bernhard Thomaszewski
  • for: 本研究旨在开发一种基于神经网络的非线性超材料设计方法，应用于工程、医学、机器人等领域。
  • methods: 该方法使用神经网络表示整个超材料族的非线性力学性质，并通过基于梯度的非线性规划寻找满足高层性能目标的结构参数。
  • results: 该方法可以自动设计满足需求的超材料，包括指定的应力-应变曲线、指定的方向刚度和泊松比分布。此外，研究还表明该方法优于在原始尺度上直接优化。
    Abstract Nonlinear metamaterials with tailored mechanical properties have applications in engineering, medicine, robotics, and beyond. While modeling their macromechanical behavior is challenging in itself, finding structure parameters that lead to ideal approximation of high-level performance goals is a challenging task. In this work, we propose Neural Metamaterial Networks (NMN) -- smooth neural representations that encode the nonlinear mechanics of entire metamaterial families. Given structure parameters as input, NMN return continuously differentiable strain energy density functions, thus guaranteeing conservative forces by construction. Though trained on simulation data, NMN do not inherit the discontinuities resulting from topological changes in finite element meshes. They instead provide a smooth map from parameter to performance space that is fully differentiable and thus well-suited for gradient-based optimization. On this basis, we formulate inverse material design as a nonlinear programming problem that leverages neural networks for both objective functions and constraints. We use this approach to automatically design materials with desired strain-stress curves, prescribed directional stiffness and Poisson ratio profiles. We furthermore conduct ablation studies on network nonlinearities and show the advantages of our approach compared to native-scale optimization.
    摘要 具有定制力学性能的非线性超材料在工程、医学、机器人等领域有广泛应用。建模其宏观力学行为本身就具有挑战性，而寻找能够很好逼近高层性能目标的结构参数更是一项困难的任务。在这项工作中，我们提出神经超材料网络（NMN），一种平滑的神经表示，用于编码整个超材料族的非线性力学。给定结构参数作为输入，NMN返回连续可微的应变能密度函数，从而在构造上保证了保守力。尽管NMN是在仿真数据上训练的，它并不会继承有限元网格拓扑变化所带来的不连续性，而是提供了一个从参数空间到性能空间的平滑且完全可微的映射，因此非常适合基于梯度的优化。在此基础上，我们将逆向材料设计表述为一个非线性规划问题，目标函数和约束均由神经网络表示。我们利用该方法自动设计具有期望应力-应变曲线、指定方向刚度和泊松比分布的材料。我们还对网络非线性进行了消融研究，并展示了该方法相对于在原始尺度上直接优化的优势。

Optimizing Modular Robot Composition: A Lexicographic Genetic Algorithm Approach

  • paper_url: http://arxiv.org/abs/2309.08399
  • repo_url: None
  • paper_authors: Jonathan Külz, Matthias Althoff
  • for: 用于开发适应任务需求和环境变化的模块化工业机器人
  • methods: combinatorial lexicographic optimization和遗传算法
  • results: 能够超越现有基线方法，合成适用于工业任务的模块化机器人
    Abstract Industrial robots are designed as general-purpose hardware, which limits their ability to adapt to changing task requirements or environments. Modular robots, on the other hand, offer flexibility and can be easily customized to suit diverse needs. The morphology, i.e., the form and structure of a robot, significantly impacts the primary performance metrics acquisition cost, cycle time, and energy efficiency. However, identifying an optimal module composition for a specific task remains an open problem, presenting a substantial hurdle in developing task-tailored modular robots. Previous approaches either lack adequate exploration of the design space or the possibility to adapt to complex tasks. We propose combining a genetic algorithm with a lexicographic evaluation of solution candidates to overcome this problem and navigate search spaces exceeding those in prior work by magnitudes in the number of possible compositions. We demonstrate that our approach outperforms a state-of-the-art baseline and is able to synthesize modular robots for industrial tasks in cluttered environments.
    摘要 工业机器人通常被设计为通用硬件，这限制了它们适应任务需求或环境变化的能力。相比之下，模块化机器人具有灵活性，可以方便地按多种需求进行定制。机器人的形态，即其形式和结构，对采购成本、节拍时间和能效等主要性能指标有显著影响。然而，为特定任务确定最优的模块组合仍是一个未解决的问题，这给开发面向任务的模块化机器人带来了很大障碍。已有方法要么对设计空间的探索不足，要么无法适应复杂任务。我们提出将遗传算法与候选解的字典序评估相结合来克服这一问题，从而能够在可行组合数量比以往工作高出数个数量级的搜索空间中进行搜索。我们证明该方法优于最先进的基线，并能够为杂乱环境中的工业任务合成模块化机器人。

Exploring Meta Information for Audio-based Zero-shot Bird Classification

  • paper_url: http://arxiv.org/abs/2309.08398
  • repo_url: None
  • paper_authors: Alexander Gebhard, Andreas Triantafyllopoulos, Teresa Bez, Lukas Christ, Alexander Kathan, Björn W. Schuller
  • for: 本研究旨在探讨如何通过元信息提升零样本音频分类，并以鸟类为案例研究，因为鸟类拥有丰富且多样的元数据。
  • methods: 本研究使用三种元信息：经(S)BERT编码的鸟鸣文字描述、功能性状特征（AVONET）以及鸟类生活史特征（BLH）。通过单个线性层将音频特征投影到辅助信息的维度，然后使用点积作为相容性函数，并采用标准的零样本学习排序铰链损失来确定类别。
  • results: 研究发现，将AVONET与BLH特征拼接可取得最佳结果：在5个包含8至10类的不同测试集上，平均F1分数为0.233，标准误差为0.032。
    Abstract Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research. Nevertheless, data scarcity is still an issue for rare and underrepresented species. This study investigates how meta-information can improve zero-shot audio classification, utilising bird species as an example case study due to the availability of rich and diverse metadata. We investigate three different sources of metadata: textual bird sound descriptions encoded via (S)BERT, functional traits (AVONET), and bird life-history (BLH) characteristics. As audio features, we extract audio spectrogram transformer (AST) embeddings and project them to the dimension of the auxiliary information by adopting a single linear layer. Then, we employ the dot product as compatibility function and a standard zero-shot learning ranking hinge loss to determine the correct class. The best results are achieved by concatenating the AVONET and BLH features attaining a mean F1-score of .233 over five different test sets with 8 to 10 classes.
    摘要 被动声学监测与机器学习的进展为计算生物声学研究带来了海量数据。然而，对于稀有和代表性不足的物种，数据稀缺仍是一个问题。本研究探讨元信息如何改进零样本音频分类，并以鸟类为案例，因为鸟类拥有丰富而多样的元数据。我们研究了三种元信息来源：经(S)BERT编码的鸟鸣文字描述、功能性状特征（AVONET）以及鸟类生活史特征（BLH）。在音频特征方面，我们提取音频谱图Transformer（AST）嵌入，并通过单个线性层将其投影到辅助信息的维度。随后，我们使用点积作为相容性函数，并采用标准的零样本学习排序铰链损失来确定正确类别。将AVONET与BLH特征拼接可取得最佳结果：在5个包含8至10类的测试集上，平均F1分数为0.233。
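A small sketch of the compatibility-function setup described above, with placeholder dimensions and random stand-ins for the AST embeddings and the per-class trait vectors; the exact margin and projection details are assumptions.

```python
# Illustrative sketch: project audio embeddings onto the side-information space,
# score classes with a dot product, and apply a ranking hinge loss.
import torch
import torch.nn as nn

audio_dim, meta_dim, n_classes, batch = 768, 64, 10, 8
class_meta = torch.randn(n_classes, meta_dim)     # e.g. AVONET + BLH traits per class
project = nn.Linear(audio_dim, meta_dim)          # single linear projection layer

def zero_shot_hinge_loss(audio_emb, labels, margin=1.0):
    scores = project(audio_emb) @ class_meta.T            # dot-product compatibility
    correct = scores.gather(1, labels.unsqueeze(1))       # score of the true class
    hinge = torch.clamp(margin + scores - correct, min=0) # rank other classes below it
    mask = torch.ones_like(scores).scatter(1, labels.unsqueeze(1), 0.0)
    return (hinge * mask).mean()                          # ignore the true-class term

audio_emb = torch.randn(batch, audio_dim)                 # stand-in AST embeddings
labels = torch.randint(0, n_classes, (batch,))
loss = zero_shot_hinge_loss(audio_emb, labels)
loss.backward()
print(float(loss))
```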

A Unified View Between Tensor Hypergraph Neural Networks And Signal Denoising

  • paper_url: http://arxiv.org/abs/2309.08385
  • repo_url: None
  • paper_authors: Fuli Wang, Karelia Pena-Pena, Wei Qian, Gonzalo R. Arce
  • for: 本文研究了hypergraph neural networks (HyperGNNs)和hypergraph signal denoising (HyperGSD)的关系,以及如何基于HyperGSD的视角设计novel HyperGNNs。
  • methods: 本文提出了一种基于tensor-hypergraph convolutional network (T-HGCN)的HyperGSD问题Equivalence relation,以及一种基于这个关系的tensor-hypergraph iterative network (T-HGIN)方法。
  • results: Numerical experiments demonstrate the promising applications of the proposed T-HGIN approach in preserving higher-order interactions on hypergraphs.
    Abstract Hypergraph Neural networks (HyperGNNs) and hypergraph signal denoising (HyperGSD) are two fundamental topics in higher-order network modeling. Understanding the connection between these two domains is particularly useful for designing novel HyperGNNs from a HyperGSD perspective, and vice versa. In particular, the tensor-hypergraph convolutional network (T-HGCN) has emerged as a powerful architecture for preserving higher-order interactions on hypergraphs, and this work shows an equivalence relation between a HyperGSD problem and the T-HGCN. Inspired by this intriguing result, we further design a tensor-hypergraph iterative network (T-HGIN) based on the HyperGSD problem, which takes advantage of a multi-step updating scheme in every single layer. Numerical experiments are conducted to show the promising applications of the proposed T-HGIN approach.
    摘要 超图神经网络（HyperGNN）与超图信号去噪（HyperGSD）是高阶网络建模中的两个基本课题。理解这两个领域之间的联系，有助于从HyperGSD的视角设计新的HyperGNN，反之亦然。特别地，张量超图卷积网络（T-HGCN）已成为在超图上保留高阶交互的有力架构，本文证明了HyperGSD问题与T-HGCN之间的等价关系。受此启发，我们进一步基于HyperGSD问题设计了张量超图迭代网络（T-HGIN），在每一层中利用多步更新机制。数值实验表明了所提出的T-HGIN方法具有良好的应用前景。

Adaptive Priority Reweighing for Generalizing Fairness Improvement

  • paper_url: http://arxiv.org/abs/2309.08375
  • repo_url: https://github.com/che2198/apw
  • paper_authors: Zhihao Hu, Yiran Xu, Mengnan Du, Jindong Gu, Xinmei Tian, Fengxiang He
  • for: 提高机器学习应用中的公平性,减少风险偏见。
  • methods: 提出一种适应重量调整方法,根据样本预测结果和决策边界之间的距离进行权重调整,以提高模型的泛化能力和公平性。
  • results: 通过对多个标准 benchmark 进行广泛的实验 validate 了我们的适应优先重量调整法,并在几种公平度指标(等机会、等化机会、人口平衡)上达到了比较好的性能。此外,我们还展示了我们的方法在语言和视觉模型中的应用和改进公平性。代码可以在 GitHub 上找到。
    Abstract With the increasing penetration of machine learning applications in critical decision-making areas, calls for algorithmic fairness are more prominent. Although there have been various modalities to improve algorithmic fairness through learning with fairness constraints, their performance does not generalize well in the test set. A performance-promising fair algorithm with better generalizability is needed. This paper proposes a novel adaptive reweighing method to eliminate the impact of the distribution shifts between training and test data on model generalizability. Most previous reweighing methods propose to assign a unified weight for each (sub)group. Rather, our method granularly models the distance from the sample predictions to the decision boundary. Our adaptive reweighing method prioritizes samples closer to the decision boundary and assigns a higher weight to improve the generalizability of fair classifiers. Extensive experiments are performed to validate the generalizability of our adaptive priority reweighing method for accuracy and fairness measures (i.e., equal opportunity, equalized odds, and demographic parity) in tabular benchmarks. We also highlight the performance of our method in improving the fairness of language and vision models. The code is available at https://github.com/che2198/APW.
    摘要 随着机器学习在关键决策领域的应用不断深入，对算法公平性的要求愈发突出。尽管已有多种通过带公平约束的学习来提升算法公平性的方法，它们在测试集上的泛化表现并不理想，因此需要一种泛化能力更强、性能更有保障的公平算法。本文提出一种新的自适应重加权方法，以消除训练数据与测试数据之间分布偏移对模型泛化能力的影响。以往的重加权方法大多为每个（子）群体赋予统一的权重，而我们的方法细粒度地刻画样本预测值到决策边界的距离：自适应重加权优先考虑靠近决策边界的样本并赋予其更高权重，从而提升公平分类器的泛化能力。我们在表格基准上进行了大量实验，验证了该方法在准确率与多种公平性指标（机会均等、几率均等、人口统计均等）上的泛化能力，并展示了该方法在提升语言模型和视觉模型公平性方面的效果。代码见 https://github.com/che2198/APW。
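As a rough illustration of reweighing by distance to the decision boundary, the sketch below upweights boundary-near samples and refits a classifier. It omits the group- and fairness-aware parts of the actual method; the exponential weighting schedule is an assumed placeholder.

```python
# Simplified sketch: samples closer to the decision boundary receive larger
# weights, and the classifier is refit for a few rounds.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

for _ in range(5):                                   # a few reweigh-and-refit rounds
    margin = np.abs(clf.decision_function(X))        # distance to the decision boundary
    weights = np.exp(-margin)                        # prioritize boundary-near samples
    weights *= len(weights) / weights.sum()          # keep the average weight at 1
    clf = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)

print("train accuracy after reweighing:", clf.score(X, y))
```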

Understanding the limitations of self-supervised learning for tabular anomaly detection

  • paper_url: http://arxiv.org/abs/2309.08374
  • repo_url: None
  • paper_authors: Kimberly T. Mai, Toby Davies, Lewis D. Griffin
  • for: 该论文探讨了自动学习是否可以提高表格数据异常检测性能。
  • methods: 作者通过在26个标准数据集上进行多种预 Text task的实验来检验自动学习是否有利于表格异常检测。
  • results: 结果表明,使用 neural network 的表示不会提高表格异常检测性能,但是使用 neural network 的子空间可以恢复性能。
    Abstract While self-supervised learning has improved anomaly detection in computer vision and natural language processing, it is unclear whether tabular data can benefit from it. This paper explores the limitations of self-supervision for tabular anomaly detection. We conduct several experiments spanning various pretext tasks on 26 benchmark datasets to understand why this is the case. Our results confirm representations derived from self-supervision do not improve tabular anomaly detection performance compared to using the raw representations of the data. We show this is due to neural networks introducing irrelevant features, which reduces the effectiveness of anomaly detectors. However, we demonstrate that using a subspace of the neural network's representation can recover performance.
    摘要 尽管自监督学习已经提升了计算机视觉和自然语言处理中的异常检测，但表格数据能否从中受益尚不清楚。本文探讨自监督学习在表格异常检测中的局限性。我们在26个基准数据集上进行了涵盖多种预文本任务的实验，以理解其原因。结果证实，与直接使用数据的原始表示相比，自监督学习得到的表示并不能提升表格异常检测性能。我们指出，这是因为神经网络引入了无关特征，削弱了异常检测器的有效性。不过，我们证明使用神经网络表示的一个子空间可以恢复性能。

Convergence of ADAM with Constant Step Size in Non-Convex Settings: A Simple Proof

  • paper_url: http://arxiv.org/abs/2309.08339
  • repo_url: None
  • paper_authors: Alokendu Mazumder, Bhartendu Kumar, Manan Tayal, Punit Rathore
  • for: 研究ADAM算法在非凸设置下的收敛性。
  • methods: 分析使用固定步长的ADAM算法，并给出确定性ADAM在光滑非凸函数上达到近似临界点的运行时间上界。
  • results: 证明了步长选择对ADAM算法性能的影响，并在最少假设下给出了使梯度几乎必然渐近收敛到零的步长充分条件。
    Abstract In neural network training, RMSProp and ADAM remain widely favoured optimization algorithms. One of the keys to their performance lies in selecting the correct step size, which can significantly influence their effectiveness. It is worth noting that these algorithms performance can vary considerably, depending on the chosen step sizes. Additionally, questions about their theoretical convergence properties continue to be a subject of interest. In this paper, we theoretically analyze a constant stepsize version of ADAM in the non-convex setting. We show sufficient conditions for the stepsize to achieve almost sure asymptotic convergence of the gradients to zero with minimal assumptions. We also provide runtime bounds for deterministic ADAM to reach approximate criticality when working with smooth, non-convex functions.
    摘要 在神经网络训练中，RMSProp和ADAM仍是广受欢迎的优化算法。其性能的关键之一在于选择合适的步长，步长会显著影响算法的有效性；值得注意的是，这些算法的表现会因所选步长不同而差异很大。此外，它们的理论收敛性质也一直是研究的热点。本文从理论上分析了固定步长版本的ADAM在非凸设置下的行为。我们在最少假设下给出了使梯度几乎必然渐近收敛到零的步长充分条件，并给出了确定性ADAM在光滑非凸函数上达到近似临界点的运行时间上界。
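A minimal runnable sketch of ADAM with a constant step size on a simple non-convex objective, tracking the gradient norm that the analysis above is concerned with; the test function and hyper-parameters are arbitrary choices.

```python
# ADAM with a constant step size on f(x, y) = x^2 + 5*sin(y)^2 (non-convex in y).
import numpy as np

def grad(x):
    return np.array([2 * x[0], 10 * np.sin(x[1]) * np.cos(x[1])])

x = np.array([3.0, 2.0])
m = v = np.zeros(2)
alpha, beta1, beta2, eps = 1e-2, 0.9, 0.999, 1e-8   # constant step size alpha

for t in range(1, 2001):
    g = grad(x)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat, v_hat = m / (1 - beta1**t), v / (1 - beta2**t)   # bias correction
    x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)

print("final gradient norm:", np.linalg.norm(grad(x)))
```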

Estimation of Counterfactual Interventions under Uncertainties

  • paper_url: http://arxiv.org/abs/2309.08332
  • repo_url: None
  • paper_authors: Juliane Weilbach, Sebastian Gerwinn, Melih Kandemir, Martin Fraenzle
  • for: 本研究旨在解决counterfactual分析中的ambiguity问题,即在假设式的干扰下,对于一个系统的行为的可能性。
  • methods: 本研究使用 hierarchical Bayesian approach,以解决counterfactual分布的ambiguity问题。 Specifically, we derive counterfactual distributions for a Bayesian Warped Gaussian Process, allowing for non-Gaussian distributions and non-additive noise.
  • results: 我们在一个 synthetic 和一个 semi-synthetic example中 Illustrated the properties of our approach, and show its performance when used within an algorithmic recourse downstream task.
    Abstract Counterfactual analysis is intuitively performed by humans on a daily basis eg. "What should I have done differently to get the loan approved?". Such counterfactual questions also steer the formulation of scientific hypotheses. More formally it provides insights about potential improvements of a system by inferring the effects of hypothetical interventions into a past observation of the system's behaviour which plays a prominent role in a variety of industrial applications. Due to the hypothetical nature of such analysis, counterfactual distributions are inherently ambiguous. This ambiguity is particularly challenging in continuous settings in which a continuum of explanations exist for the same observation. In this paper, we address this problem by following a hierarchical Bayesian approach which explicitly models such uncertainty. In particular, we derive counterfactual distributions for a Bayesian Warped Gaussian Process thereby allowing for non-Gaussian distributions and non-additive noise. We illustrate the properties our approach on a synthetic and on a semi-synthetic example and show its performance when used within an algorithmic recourse downstream task.
    摘要 人们在日常生活中会直观地进行反事实分析，例如"我当初应该怎么做才能让贷款获批？"。这类反事实问题同样引导着科学假设的提出。更正式地说，反事实分析通过推断对系统过往观测行为施加假想干预所产生的影响，揭示系统潜在的改进空间，在众多工业应用中发挥着重要作用。由于这种分析的假想性质，反事实分布本质上是有歧义的。在连续设置下，同一个观测可能对应无穷多种解释，这使歧义问题尤为棘手。本文采用分层贝叶斯方法显式地对这种不确定性建模。具体而言，我们为贝叶斯扭曲高斯过程（Bayesian Warped Gaussian Process）推导了反事实分布，从而允许非高斯分布和非加性噪声。我们在一个合成示例和一个半合成示例上展示了该方法的性质，并展示了其在算法补救（algorithmic recourse）下游任务中的表现。

Heteroskedastic conformal regression

  • paper_url: http://arxiv.org/abs/2309.08313
  • repo_url: https://github.com/nmdwolf/heteroskedasticconformalregression
  • paper_authors: Nicolas Dewolf, Bernard De Baets, Willem Waegeman
  • for: 本文旨在研究如何为具有异方差噪声的数据构建自适应的预测区间，并给出具有统计保证的区间构建方法。
  • methods: 本文研究基于归一化与Mondrian共形预测的预测区间构建方法，并将其与标准的分割共形预测进行系统比较。
  • results: 理论与实验结果表明，这类方法可以构建覆盖率可靠且宽度自适应的预测区间，更好地适应异方差噪声的回归问题。
    Abstract Conformal prediction, and split conformal prediction as a specific implementation, offer a distribution-free approach to estimating prediction intervals with statistical guarantees. Recent work has shown that split conformal prediction can produce state-of-the-art prediction intervals when focusing on marginal coverage, i.e., on a calibration dataset the method produces on average prediction intervals that contain the ground truth with a predefined coverage level. However, such intervals are often not adaptive, which can be problematic for regression problems with heteroskedastic noise. This paper tries to shed new light on how adaptive prediction intervals can be constructed using methods such as normalized and Mondrian conformal prediction. We present theoretical and experimental results in which these methods are investigated in a systematic way.
    摘要 共形预测（conformal prediction），及其具体实现分割共形预测，提供了一种不依赖分布假设、具有统计保证的预测区间估计方法。近期工作表明，当关注边缘覆盖率时，分割共形预测可以给出最先进的预测区间：在校准数据集上，该方法给出的预测区间平均能以预先设定的覆盖水平包含真实值。然而，这类区间往往不具备自适应性，这对于具有异方差噪声的回归问题可能成为问题。本文试图阐明如何利用归一化共形预测和Mondrian共形预测等方法构建自适应的预测区间，并对这些方法进行系统的理论与实验研究。
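The sketch below illustrates normalized split conformal regression on synthetic heteroskedastic data: a second model estimates the residual scale, and the calibration quantile is applied to scale-normalized scores. A Mondrian variant would instead compute the quantile separately per group or bin. The model choices here are placeholders.

```python
# Normalized (locally adaptive) split conformal regression on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = X[:, 0] + rng.normal(scale=0.2 + 0.5 * np.abs(X[:, 0]))   # heteroskedastic noise

X_tr, y_tr = X[:1000], y[:1000]          # proper training set
X_cal, y_cal = X[1000:], y[1000:]        # calibration set

mean_model = GradientBoostingRegressor().fit(X_tr, y_tr)
scale_model = GradientBoostingRegressor().fit(          # crude model of |residual|
    X_tr, np.abs(y_tr - mean_model.predict(X_tr)))

alpha = 0.1
sigma_cal = np.maximum(scale_model.predict(X_cal), 1e-3)
scores = np.abs(y_cal - mean_model.predict(X_cal)) / sigma_cal    # normalized scores
level = np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores)    # finite-sample correction
q = np.quantile(scores, level)

def predict_interval(x_new):
    mu = mean_model.predict(x_new)
    sigma = np.maximum(scale_model.predict(x_new), 1e-3)
    return mu - q * sigma, mu + q * sigma            # width adapts to the local noise level

print(predict_interval(np.array([[0.1], [2.5]])))
```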

Sampling-Free Probabilistic Deep State-Space Models

  • paper_url: http://arxiv.org/abs/2309.08256
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Andreas Look, Melih Kandemir, Barbara Rakitsch, Jan Peters
  • for: 这个论文是为了描述不同的动态系统,使用状态空间模型(SSM)的形式来描述每个观察结果是由隐藏状态所emit的。
  • methods: 这个论文使用了概率深度状态空间模型(ProDSSM),它是一种基于神经网络的模型,用于描述动态系统的未知参数形式。在这种模型中,转移和发射模型都是由神经网络所描述,并且具有不确定参数。
  • results: 该论文提出了一种用于此类模型的确定性推断算法。该方法具有高效的近似能力，并在多个实验中展现出预测性能与计算开销之间的良好平衡。
    Abstract Many real-world dynamical systems can be described as State-Space Models (SSMs). In this formulation, each observation is emitted by a latent state, which follows first-order Markovian dynamics. A Probabilistic Deep SSM (ProDSSM) generalizes this framework to dynamical systems of unknown parametric form, where the transition and emission models are described by neural networks with uncertain weights. In this work, we propose the first deterministic inference algorithm for models of this type. Our framework allows efficient approximations for training and testing. We demonstrate in our experiments that our new method can be employed for a variety of tasks and enjoys a superior balance between predictive performance and computational budget.
    摘要 很多实际世界中的动态系统可以用状态空间模型(SSM)来描述。在这种形式中,每个观察结果是由隐藏状态发射出来的,隐藏状态follows first-order Markovian dynamics。一种 probabilistic deep SSM(ProDSSM)扩展了这种框架,用于未知参数形式的动态系统,转移和发射模型由神经网络的不确定参数来描述。在这项工作中,我们提出了首个确定性推理算法 для这种类型的模型。我们的框架允许有效的估计和测试。我们的实验表明,我们的新方法可以用于多种任务,并且具有更好的预测性和计算预算之间的平衡。

Deep Nonnegative Matrix Factorization with Beta Divergences

  • paper_url: http://arxiv.org/abs/2309.08249
  • repo_url: https://github.com/vleplat/deep-kl-nmf-public
  • paper_authors: Valentin Leplat, Le Thi Khanh Hien, Akwum Onwunta, Nicolas Gillis
  • for: 提取多层特征,包括图像、语音和文档等数据类型
  • methods: 使用 $\beta$-散度来评估近似质量，并开发新的深度非负矩阵分解模型与算法
  • results: 在面部特征提取、文档集合的主题识别以及高光谱图像中的材料识别等方面获得了良好的结果
    Abstract Deep Nonnegative Matrix Factorization (deep NMF) has recently emerged as a valuable technique for extracting multiple layers of features across different scales. However, all existing deep NMF models and algorithms have primarily centered their evaluation on the least squares error, which may not be the most appropriate metric for assessing the quality of approximations on diverse datasets. For instance, when dealing with data types such as audio signals and documents, it is widely acknowledged that $\beta$-divergences offer a more suitable alternative. In this paper, we develop new models and algorithms for deep NMF using $\beta$-divergences. Subsequently, we apply these techniques to the extraction of facial features, the identification of topics within document collections, and the identification of materials within hyperspectral images.
    摘要 深度非负矩阵分解（deep NMF）近来已成为在不同尺度上提取多层特征的重要技术。然而，现有的深度NMF模型与算法主要以最小二乘误差来评估近似质量，而这在多样的数据集上未必是最合适的度量。例如，对于音频信号和文档等数据类型，人们普遍认为 $\beta$-散度是更合适的选择。本文基于 $\beta$-散度为深度NMF开发了新的模型与算法，并将其应用于面部特征提取、文档集合中的主题识别以及高光谱图像中的材料识别。
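For reference, a single-layer sketch of NMF under a beta-divergence using the standard multiplicative updates (beta = 1 corresponds to Kullback-Leibler); the deep, multi-layer factorization proposed in the paper is not reproduced here.

```python
# Single-layer beta-divergence NMF with multiplicative updates.
import numpy as np

def beta_nmf(V, rank, beta=1.0, n_iter=200, eps=1e-9):
    m, n = V.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, rank)) + eps, rng.random((rank, n)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (WH ** (beta - 2) * V)) / (W.T @ WH ** (beta - 1) + eps)
        WH = W @ H + eps
        W *= ((WH ** (beta - 2) * V) @ H.T) / (WH ** (beta - 1) @ H.T + eps)
    return W, H

V = np.abs(np.random.default_rng(1).random((50, 40)))   # nonnegative data matrix
W, H = beta_nmf(V, rank=5, beta=1.0)
print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```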

Topological Node2vec: Enhanced Graph Embedding via Persistent Homology

  • paper_url: http://arxiv.org/abs/2309.08241
  • repo_url: https://github.com/killianfmeehan/topological_node2vec
  • paper_authors: Yasuaki Hiraoka, Yusuke Imoto, Killian Meehan, Théo Lacombe, Toshiaki Yachimura
  • for: 本 paper 是一篇研究 graph embedding 的文章,它的目的是学习一个权重图的节点vector表示,同时保持节点之间的相对距离和全局结构。
  • methods: 本 paper 使用 Node2vec 算法,但是它发现 Node2vec 在保持输入图的拓扑结构方面存在问题。因此,它引入了一个拓扑损失函数,以尝试将输入图的拓扑结构与输出 embedding 的拓扑结构进行最佳匹配。
  • results: 本 paper 通过对synthetic例子进行示例,显示了这种方法的优点。它可以准确地重建输入图的拓扑结构和几何结构。
    Abstract Node2vec is a graph embedding method that learns a vector representation for each node of a weighted graph while seeking to preserve relative proximity and global structure. Numerical experiments suggest Node2vec struggles to recreate the topology of the input graph. To resolve this we introduce a topological loss term to be added to the training loss of Node2vec which tries to align the persistence diagram (PD) of the resulting embedding as closely as possible to that of the input graph. Following results in computational optimal transport, we carefully adapt entropic regularization to PD metrics, allowing us to measure the discrepancy between PDs in a differentiable way. Our modified loss function can then be minimized through gradient descent to reconstruct both the geometry and the topology of the input graph. We showcase the benefits of this approach using demonstrative synthetic examples.
    摘要 Node2vec 是一种图嵌入方法，它为带权图的每个节点学习一个向量表示，同时尽量保持节点间的相对邻近性和全局结构。数值实验表明，Node2vec 难以重现输入图的拓扑结构。为此，我们在 Node2vec 的训练损失中引入一个拓扑损失项，力求使嵌入结果的持续图（persistence diagram, PD）尽可能接近输入图的持续图。借助计算最优传输中的结果，我们将熵正则化仔细地适配到PD度量上，从而能够以可微的方式度量两个PD之间的差异。修改后的损失函数可以通过梯度下降来最小化，从而同时重建输入图的几何与拓扑结构。我们用示例性的合成数据展示了该方法的优点。

Ensuring Toplogical Data-Structure Preservation under Autoencoder Compression due to Latent Space Regularization in Gauss–Legendre nodes

  • paper_url: http://arxiv.org/abs/2309.08228
  • repo_url: https://github.com/casus/autoencoder-regularisation
  • paper_authors: Chethan Krishnamurthy Ramanaik, Juan-Esteban Suarez Cardona, Anna Willmann, Pia Hanfeld, Nico Hoffmann, Michael Hecht
  • for: 这篇论文提出了一种与数据无关的潜在空间正则化约束，为一般的无监督自编码器提供可靠的低维表示。
  • methods: 该方法在Legendre节点（即Gauss-Legendre求积的中心）上采样自编码器的雅可比矩阵来实现正则化。这种正则化可以保证初始数据流形到潜在空间的一一重嵌入。
  • results: 实验表明，以往提出的正则化策略（如收缩自编码器和基于卷积的(变分)自编码器）在简单示例上就会出现拓扑缺陷；而借助本文的贡献，标准多层感知机在正则化后即可提供可靠的低维表示。这一结论从FashionMNIST数据集一直延伸到真实世界的MRI脑部影像编码问题。
    Abstract We formulate a data independent latent space regularisation constraint for general unsupervised autoencoders. The regularisation rests on sampling the autoencoder Jacobian in Legendre nodes, being the centre of the Gauss-Legendre quadrature. Revisiting this classic enables to prove that regularised autoencoders ensure a one-to-one re-embedding of the initial data manifold to its latent representation. Demonstrations show that prior proposed regularisation strategies, such as contractive autoencoding, cause topological defects already for simple examples, and so do convolutional based (variational) autoencoders. In contrast, topological preservation is ensured already by standard multilayer perceptron neural networks when being regularised due to our contribution. This observation extends through the classic FashionMNIST dataset up to real world encoding problems for MRI brain scans, suggesting that, across disciplines, reliable low dimensional representations of complex high-dimensional datasets can be delivered due to this regularisation technique.
    摘要 我们为一般的无监督自编码器提出了一种与数据无关的潜在空间正则化约束。该正则化基于在Legendre节点（即Gauss-Legendre求积的中心）上采样自编码器的雅可比矩阵。重新审视这一经典工具使我们能够证明：经过正则化的自编码器可以保证初始数据流形到其潜在表示的一一重嵌入。实验表明，以往提出的正则化策略（如收缩自编码器）在简单示例上就会产生拓扑缺陷，基于卷积的(变分)自编码器亦是如此。相比之下，借助我们的贡献，标准多层感知机在正则化后即可保证拓扑结构的保持。这一观察从经典的FashionMNIST数据集一直延伸到真实世界的MRI脑部扫描编码问题，表明在不同学科中，借助这一正则化技术都可以为复杂的高维数据集提供可靠的低维表示。
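One plausible reading of the regularizer, sketched below under assumptions: the decoder Jacobian is evaluated at Gauss-Legendre nodes of the latent cube and penalized. The plain Frobenius penalty is a placeholder; the paper's exact functional may differ.

```python
# Hedged sketch (not the authors' regularizer): penalize the decoder Jacobian
# sampled at Gauss-Legendre nodes of the latent space.
import numpy as np
import torch
import torch.nn as nn

latent_dim, data_dim, n_nodes = 2, 16, 5
decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(), nn.Linear(32, data_dim))

# 1D Gauss-Legendre nodes, combined into a latent-space grid.
nodes, _ = np.polynomial.legendre.leggauss(n_nodes)
grid = torch.tensor(np.stack(np.meshgrid(nodes, nodes), -1).reshape(-1, latent_dim),
                    dtype=torch.float32)

def jacobian_penalty(points):
    penalty = 0.0
    for z in points:
        J = torch.autograd.functional.jacobian(decoder, z, create_graph=True)
        penalty = penalty + (J ** 2).sum()       # placeholder Frobenius-norm penalty
    return penalty / len(points)

reg = jacobian_penalty(grid)
reg.backward()          # gradients flow into the decoder parameters
print(float(reg))
```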

Unified Risk Analysis for Weakly Supervised Learning

  • paper_url: http://arxiv.org/abs/2309.08216
  • repo_url: None
  • paper_authors: Chao-Kai Chiang, Masashi Sugiyama
  • for: 提供一个涵盖全面理解和统一方法论的概念框架,以探讨弱监督学习(WSL)的机制和问题。
  • methods: 使用污染观点来构建形式体系,涵盖了15种WSL设定;并提出了一个新的预期链策略,以进行问题重写。
  • results: 透过实践验证,证明了提案的框架可以回传现有的 rewrite 报告。
    Abstract Among the flourishing research of weakly supervised learning (WSL), we recognize the lack of a unified interpretation of the mechanism behind the weakly supervised scenarios, let alone a systematic treatment of the risk rewrite problem, a crucial step in the empirical risk minimization approach. In this paper, we introduce a framework providing a comprehensive understanding and a unified methodology for WSL. The formulation component of the framework, leveraging a contamination perspective, provides a unified interpretation of how weak supervision is formed and subsumes fifteen existing WSL settings. The induced reduction graphs offer comprehensive connections over WSLs. The analysis component of the framework, viewed as a decontamination process, provides a systematic method of conducting risk rewrite. In addition to the conventional inverse matrix approach, we devise a novel strategy called marginal chain aiming to decontaminate distributions. We justify the feasibility of the proposed framework by recovering existing rewrites reported in the literature.
    摘要 在弱监督学习（WSL）的蓬勃研究中，我们注意到学界对弱监督场景背后机制缺乏统一的解释，更谈不上对风险重写（risk rewrite）问题（经验风险最小化方法中的关键一步）的系统处理。本文提出一个框架，为WSL提供全面的理解和统一的方法论。框架的建模部分基于污染视角，给出了弱监督如何形成的统一解释，并涵盖了十五种现有的WSL设置；由此导出的归约图展示了各类WSL设置之间的广泛联系。框架的分析部分可视为一个去污过程，提供了进行风险重写的系统方法：除了传统的逆矩阵方法之外，我们还设计了一种旨在去污分布的新策略，称为边际链（marginal chain）。我们通过复现文献中已有的重写结果，验证了所提框架的可行性。

HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods

  • paper_url: http://arxiv.org/abs/2309.08208
  • repo_url: None
  • paper_authors: Hyun-seo Shin, Jungwoo Heo, Ju-ho Kim, Chan-yeong Lim, Wonbin Kim, Ha-Jin Yu
  • for: Audio deepfake detection (ADD) 即检测由文本至语音或语音转换系统生成的伪装攻击。
  • methods: 我们提议了两种组件:(1)层次池化方法,逐渐减少序列长度,以消除重复信息(2)多级分类 токен聚合方法,使用分类 токен来从不同块中收集信息。
  • results: 在ASVspoof 2021 Deepfake dataset上进行实验,HM-Conformer实现了15.71% EER,与当前系统相比表现竞争力强。
    Abstract Audio deepfake detection (ADD) is the task of detecting spoofing attacks generated by text-to-speech or voice conversion systems. Spoofing evidence, which helps to distinguish between spoofed and bona-fide utterances, might exist either locally or globally in the input features. To capture these, the Conformer, which consists of Transformers and CNN, possesses a suitable structure. However, since the Conformer was designed for sequence-to-sequence tasks, its direct application to ADD tasks may be sub-optimal. To tackle this limitation, we propose HM-Conformer by adopting two components: (1) Hierarchical pooling method progressively reducing the sequence length to eliminate duplicated information (2) Multi-level classification token aggregation method utilizing classification tokens to gather information from different blocks. Owing to these components, HM-Conformer can efficiently detect spoofing evidence by processing various sequence lengths and aggregating them. In experimental results on the ASVspoof 2021 Deepfake dataset, HM-Conformer achieved a 15.71% EER, showing competitive performance compared to recent systems.
    摘要 音频深度伪造检测（ADD）旨在检测由文本转语音或语音转换系统生成的欺骗攻击。有助于区分伪造语音与真实语音的欺骗证据，可能以局部或全局的形式存在于输入特征中。为同时捕捉二者，由Transformer和CNN组成的Conformer具有合适的结构。然而，Conformer最初是为序列到序列任务设计的，直接用于ADD任务可能并非最优。为克服这一局限，我们提出HM-Conformer，引入两个组件：(1) 层次池化方法，逐步缩短序列长度以消除冗余信息；(2) 多级分类token聚合方法，利用分类token汇聚来自不同模块的信息。借助这些组件，HM-Conformer能够处理不同长度的序列并加以聚合，从而高效地检测欺骗证据。在ASVspoof 2021 Deepfake数据集上的实验中，HM-Conformer取得了15.71%的EER，与近期系统相比具有竞争力。

Gaussian Processes with Linear Multiple Kernel: Spectrum Design and Distributed Learning for Multi-Dimensional Data

  • paper_url: http://arxiv.org/abs/2309.08201
  • repo_url: https://github.com/richardcsuwandi/distributed-gsm
  • paper_authors: Richard Cornelius Suwandi, Zhidi Lin, Feng Yin
  • for: 本文主要针对的是使用 Gaussian processes(GP)进行机器学习和信号处理,特别是Linear Multiple Kernels(LMK) kernel 的选择和模型化。
  • methods: 本文提出了一种新的 Grid Spectral Mixture(GSM) kernel 形式化,可以将多维数据模型化为 arbitrary stationary kernel,同时减少了超参数的数量,保持了有利的优化结构和近似能力。此外,为了使大规模超参数优化在 GSM kernel 中成为可行,我们首先引入了分布式 SCA(DSCA)算法,然后基于 ADMM 框架,提出了 doubly distributed SCA(D$^2$SCA)算法,可以在大数据上共同学习 GSM kernel,保持数据隐私。最后,我们解决了分布式框架中的内存带宽限制,通过对超参数进行量化,得到了量化 doubly distributed SCA(QD$^2$SCA)算法。
  • results: 理论分析表明了提posed algorithms 的收敛保证,而实验表明了我们的方法在多种数据集上具有更高的预测性能和效率。
    Abstract Gaussian processes (GPs) have emerged as a prominent technique for machine learning and signal processing. A key component in GP modeling is the choice of kernel, and linear multiple kernels (LMKs) have become an attractive kernel class due to their powerful modeling capacity and interpretability. This paper focuses on the grid spectral mixture (GSM) kernel, an LMK that can approximate arbitrary stationary kernels. Specifically, we propose a novel GSM kernel formulation for multi-dimensional data that reduces the number of hyper-parameters compared to existing formulations, while also retaining a favorable optimization structure and approximation capability. In addition, to make the large-scale hyper-parameter optimization in the GSM kernel tractable, we first introduce the distributed SCA (DSCA) algorithm. Building on this, we propose the doubly distributed SCA (D$^2$SCA) algorithm based on the alternating direction method of multipliers (ADMM) framework, which allows us to cooperatively learn the GSM kernel in the context of big data while maintaining data privacy. Furthermore, we tackle the inherent communication bandwidth restriction in distributed frameworks, by quantizing the hyper-parameters in D$^2$SCA, resulting in the quantized doubly distributed SCA (QD$^2$SCA) algorithm. Theoretical analysis establishes convergence guarantees for the proposed algorithms, while experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our methods.
    摘要 高斯过程（GP）已成为机器学习与信号处理中的重要技术。GP建模的关键之一在于核函数的选择，而线性多核（LMK）凭借强大的建模能力和可解释性成为一类颇具吸引力的核函数。本文关注网格谱混合（GSM）核，它是一种能够逼近任意平稳核的LMK。具体而言，我们为多维数据提出了一种新的GSM核表述，与已有表述相比减少了超参数数量，同时保留了有利的优化结构和逼近能力。此外，为使GSM核的大规模超参数优化变得可行，我们首先提出分布式SCA（DSCA）算法；在此基础上，基于交替方向乘子法（ADMM）框架提出双重分布式SCA（D$^2$SCA）算法，使我们能够在大数据场景下协同学习GSM核并保护数据隐私。进一步地，为应对分布式框架中固有的通信带宽限制，我们对D$^2$SCA中的超参数进行量化，得到量化双重分布式SCA（QD$^2$SCA）算法。理论分析为所提算法建立了收敛保证，而在多种数据集上的实验则展示了我们方法优越的预测性能与效率。

An Explainable Deep-learning Model of Proton Auroras on Mars

  • paper_url: http://arxiv.org/abs/2309.08195
  • repo_url: None
  • paper_authors: Dattaraj B. Dhuri, Dimitra Atri, Ahmed AlHantoobi
  • for: 本研究旨在研究火星上的质子极光。
  • methods: 本研究使用Mars Atmosphere and Volatile EvolutioN (MAVEN) 的原位观测数据和Ly alpha辐射的临边扫描数据，并使用人工神经网络来模拟质子极光。
  • results: 通过SHapley Additive exPlanations (SHAP) 分析发现，太阳天顶角、季节性CO2大气变化、太阳风温度和密度是模型中质子极光最重要的影响因素。此外，该模型还可作为一种低成本工具，用于模拟和刻画不同季节及上游太阳风条件下的Ly alpha响应。
    Abstract Proton auroras are widely observed on the day side of Mars, identified as a significant intensity enhancement in the hydrogen Ly alpha (121.6 nm) emission between 120 and 150~km altitudes. Solar wind protons penetrating as energetic neutral atoms into the Martian thermosphere are thought to be responsible for these auroras. Understanding proton auroras is therefore important for characterizing the solar wind interaction with the atmosphere of Mars. Recent observations of spatially localized "patchy" proton auroras suggest a possible direct deposition of protons into the atmosphere of Mars during unstable solar wind conditions. Here, we develop a purely data-driven model of proton auroras using Mars Atmosphere and Volatile EvolutioN (MAVEN) in situ observations and limb scans of Ly alpha emissions between 2014 and 2022. We train an artificial neural network that reproduces individual Ly alpha intensities with a Pearson correlation of 0.95 along with a faithful reconstruction of the observed Ly alpha emission altitude profiles. By performing a SHapley Additive exPlanations (SHAP) analysis, we find that Solar Zenith Angle, seasonal CO2 atmosphere variability, solar wind temperature, and density are the most important features for the modelled proton auroras. We also demonstrate that our model can serve as an inexpensive tool for simulating and characterizing Ly alpha response under a variety of seasonal and upstream solar wind conditions.
    摘要 质子极光在火星日侧被广泛观测到，表现为120至150公里高度之间氢Ly alpha（121.6 nm）辐射强度的显著增强。一般认为，太阳风质子以高能中性原子的形式进入火星热层是这类极光的成因，因此理解质子极光对于刻画太阳风与火星大气的相互作用十分重要。近期对空间上局域的"斑块状"质子极光的观测表明，在不稳定的太阳风条件下，质子可能直接沉降进入火星大气。本文利用2014至2022年间MAVEN的原位观测和Ly alpha辐射的临边扫描数据，建立了一个纯数据驱动的质子极光模型。我们训练了一个人工神经网络，它能以0.95的皮尔逊相关系数重现单次Ly alpha强度，并忠实地重建观测到的Ly alpha辐射高度廓线。通过SHapley Additive exPlanations（SHAP）分析，我们发现太阳天顶角、季节性CO2大气变化、太阳风温度和密度是模型中质子极光最重要的特征。我们还展示了该模型可以作为一种低成本工具，用于模拟和刻画多种季节及上游太阳风条件下的Ly alpha响应。

A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge

  • paper_url: http://arxiv.org/abs/2309.08186
  • repo_url: None
  • paper_authors: Longwei Huang, Chao Fang, Qiong Li, Jun Lin, Zhongfeng Wang
  • for: 提高智能应用程序的吞吐量和能效率,在具有限制的能量、存储和计算资源的边缘设备上运行量化深度学习模型。
  • methods: 使用多种方法,如FP16操作支持、多精度整数乘除器重复使用和FPGA资源均衡映射,以显著提高硬件资源利用率。
  • results: 在Xilinx ZCU102 FPGA上实验表明,我们的处理器可以提高推理吞吐量1.6-14.6倍和能效率1.1-14.6倍,相比之前艺术XpulpNN。此外,我们的处理器可以实现FP16操作支持,实现在设备上进行学习。
    Abstract Extreme edge platforms, such as in-vehicle smart devices, require efficient deployment of quantized deep neural networks (DNNs) to enable intelligent applications with limited amounts of energy, memory, and computing resources. However, many edge devices struggle to boost inference throughput of various quantized DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy. To tackle the challenges above, we propose a precision-scalable RISC-V DNN processor with on-device learning capability. It facilitates diverse precision levels of fixed-point DNN inference, spanning from 2-bit to 16-bit, and enhances on-device learning through improved support with FP16 operations. Moreover, we employ multiple methods such as FP16 multiplier reuse and multi-precision integer multiplier reuse, along with balanced mapping of FPGA resources, to significantly improve hardware resource utilization. Experimental results on the Xilinx ZCU102 FPGA show that our processor significantly improves inference throughput by 1.6$\sim$14.6$\times$ and energy efficiency by 1.1$\sim$14.6$\times$ across various DNNs, compared to the prior art, XpulpNN. Additionally, our processor achieves a 16.5$\times$ higher FP throughput for on-device learning.
    摘要 极端边缘平台（如车载智能设备）需要高效部署量化深度神经网络（DNN），以便在能耗、存储和计算资源都受限的情况下实现智能应用。然而，由于量化等级各不相同，许多边缘设备难以提升各类量化DNN的推理吞吐量；同时这些设备缺乏浮点（FP）支持，无法进行端上学习，也就无法在保证数据隐私的前提下提升模型精度。为解决上述挑战，我们提出一款精度可伸缩、具备端上学习能力的RISC-V DNN处理器。它支持从2位到16位的多种定点DNN推理精度，并通过改进的FP16运算支持增强了端上学习能力。此外，我们采用FP16乘法器复用、多精度整数乘法器复用以及FPGA资源的均衡映射等多种方法，显著提升了硬件资源利用率。在Xilinx ZCU102 FPGA上的实验结果表明，与此前的XpulpNN相比，我们的处理器在多种DNN上将推理吞吐量提升1.6～14.6倍、能效提升1.1～14.6倍；此外，其端上学习的FP吞吐量也高出16.5倍。

A Testbed for Automating and Analysing Mobile Devices and their Applications

  • paper_url: http://arxiv.org/abs/2309.08158
  • repo_url: None
  • paper_authors: Lachlan Simpson, Kyle Millar, Adriel Cheng, Hong Gunn Chew, Cheng-Chew Lim
  • for: 提高网络状况意识,增强网络安全性。
  • methods: 使用机器学习技术提供网络设备和活动的视图,自动生成和标注网络流量。
  • results: 创建了两个标注的网络流量集,对应应用类型的分类任务进行了分析和比较。
    Abstract The need for improved network situational awareness has been highlighted by the growing complexity and severity of cyber-attacks. Mobile phones pose a significant risk to network situational awareness due to their dynamic behaviour and lack of visibility on a network. Machine learning techniques enhance situational awareness by providing administrators insight into the devices and activities which form their network. Developing machine learning techniques for situational awareness requires a testbed to generate and label network traffic. Current testbeds, however, are unable to automate the generation and labelling of realistic network traffic. To address this, we describe a testbed which automates applications on mobile devices to generate and label realistic traffic. From this testbed, two labelled datasets of network traffic have been created. We provide an analysis of the testbed automation reliability and benchmark the datasets for the task of application classification.
    摘要 随着网络攻击的复杂性和严重性不断增加，提升网络态势感知的需求日益突出。移动设备由于行为动态且在网络上缺乏可见性，对网络态势感知构成了重大风险。机器学习技术可以让管理员洞察构成其网络的设备与活动，从而增强态势感知。开发面向态势感知的机器学习技术需要一个能够生成并标注网络流量的测试平台，但现有测试平台无法自动生成和标注真实的网络流量。为此，我们描述了一个通过自动运行移动设备上的应用程序来生成并标注真实流量的测试平台，并由此创建了两个带标注的网络流量数据集。我们分析了测试平台自动化的可靠性，并以应用分类任务为基准对这两个数据集进行了评测。

Two-Step Knowledge Distillation for Tiny Speech Enhancement

  • paper_url: http://arxiv.org/abs/2309.08144
  • repo_url: None
  • paper_authors: Rayan Daod Nathoo, Mikolaj Kegler, Marko Stamenovic
  • for: 这项研究旨在提出一种新的两步知识蒸馏方法，用于压缩微型语音增强模型。
  • methods: 该方法先仅用知识蒸馏目标对学生模型进行预训练，再切换到完全监督训练。此外，还提出了一种新的细粒度相似性保持蒸馏损失，使学生模型的批内激活Gram矩阵与教师模型相匹配。
  • results: 该方法带来了广泛的改进，在高压缩比和低信噪比（SNR）等恶劣条件下尤为突出：在-5 dB输入SNR和63倍压缩两种条件下，信号失真比分别提升0.9 dB和1.1 dB。
    Abstract Tiny, causal models are crucial for embedded audio machine learning applications. Model compression can be achieved via distilling knowledge from a large teacher into a smaller student model. In this work, we propose a novel two-step approach for tiny speech enhancement model distillation. In contrast to the standard approach of a weighted mixture of distillation and supervised losses, we firstly pre-train the student using only the knowledge distillation (KD) objective, after which we switch to a fully supervised training regime. We also propose a novel fine-grained similarity-preserving KD loss, which aims to match the student's intra-activation Gram matrices to that of the teacher. Our method demonstrates broad improvements, but particularly shines in adverse conditions including high compression and low signal to noise ratios (SNR), yielding signal to distortion ratio gains of 0.9 dB and 1.1 dB, respectively, at -5 dB input SNR and 63x compression compared to baseline.
    摘要 微小的因果模型对嵌入式音频机器学习应用至关重要。模型压缩可以通过把大型教师模型的知识蒸馏到较小的学生模型中来实现。在这项工作中，我们为微型语音增强模型蒸馏提出了一种新的两步方法。与知识蒸馏损失和监督损失加权混合的标准做法不同，我们先仅用知识蒸馏（KD）目标对学生模型进行预训练，然后切换到完全监督训练。我们还提出了一种新的细粒度相似性保持KD损失，旨在使学生模型的批内激活Gram矩阵与教师模型相匹配。我们的方法带来了广泛的改进，在高压缩比和低信噪比（SNR）等恶劣条件下尤为突出：在-5 dB输入SNR和63倍压缩下，相对基线的信号失真比分别提升0.9 dB和1.1 dB。
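A sketch of a similarity-preserving distillation term of the kind described above: the batch-wise Gram matrices of student and teacher activations are normalized and matched. The two-step schedule (KD-only pre-training followed by supervised training) would be training-loop orchestration around such a loss and is omitted.

```python
# Similarity-preserving distillation loss on batch-wise activation Gram matrices.
import torch
import torch.nn.functional as F

def gram_similarity_loss(student_act, teacher_act):
    """Both activations: (batch, features); returns the Frobenius gap of the Grams."""
    def normalized_gram(a):
        g = a @ a.T                                   # (batch, batch) Gram matrix
        return F.normalize(g, dim=1)
    gs, gt = normalized_gram(student_act), normalized_gram(teacher_act)
    return ((gs - gt) ** 2).sum() / student_act.shape[0] ** 2

student = torch.randn(8, 64, requires_grad=True)      # stand-in student features
teacher = torch.randn(8, 256)                          # stand-in teacher features
loss = gram_similarity_loss(student, teacher)
loss.backward()
print(float(loss))
```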

Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates

  • paper_url: http://arxiv.org/abs/2309.08125
  • repo_url: https://github.com/symbioticlab/oobleck
  • paper_authors: Insu Jang, Zhenning Yang, Zhen Zhang, Xin Jin, Mosharaf Chowdhury
  • for: 这篇论文是为了实现大型深度神经网络模型的分布式训练,并提供保证性缺陷tolerance的方法。
  • methods: 这篇论文采用了规划执行协同设计方法,首先生成一组异类管道模板,然后将至少有$f+1$个同等管道复制,以承受任何$f$个同时故障。在执行时,它利用已经复制的模型状态来提供快速恢复。
  • results: 评估表明,使用Oobleck实现大型深度神经网络模型的分布式训练可以保证资源不会浪费,并且可以在大规模的模型训练中提供高 durchput,与现有的缺陷tolerance解决方案相比,Oobleck可以提高到13.9倍。
    Abstract Oobleck enables resilient distributed training of large DNN models with guaranteed fault tolerance. It takes a planning-execution co-design approach, where it first generates a set of heterogeneous pipeline templates and instantiates at least $f+1$ logically equivalent pipeline replicas to tolerate any $f$ simultaneous failures. During execution, it relies on already-replicated model states across the replicas to provide fast recovery. Oobleck provably guarantees that some combination of the initially created pipeline templates can be used to cover all available resources after $f$ or fewer simultaneous failures, thereby avoiding resource idling at all times. Evaluation on large DNN models with billions of parameters shows that Oobleck provides consistently high throughput, and it outperforms state-of-the-art fault tolerance solutions like Bamboo and Varuna by up to $13.9x$.
    摘要 Oobleck实现了具有容错保证的大型DNN模型弹性分布式训练。它采用规划-执行协同设计：先生成一组异构流水线模板，并实例化至少 $f+1$ 条逻辑等价的流水线副本，以容忍任意 $f$ 个同时发生的故障；在执行过程中，利用各副本间已经复制的模型状态实现快速恢复。Oobleck可证明地保证：在不超过 $f$ 个故障同时发生后，总能用初始生成的流水线模板的某种组合覆盖所有可用资源，从而始终避免资源闲置。在拥有数十亿参数的大型DNN模型上的评估表明，Oobleck能够持续提供高吞吐量，并且比Bamboo和Varuna等最先进的容错方案最高提升13.9倍。

Supervised Stochastic Neighbor Embedding Using Contrastive Learning

  • paper_url: http://arxiv.org/abs/2309.08077
  • repo_url: https://github.com/imyizhang/manifold-learn
  • paper_authors: Yi Zhang
  • for: 本研究旨在把随机近邻嵌入（SNE）方法与对比学习相结合，在利用标签信息的全监督设置下进行降维。
  • methods: 本研究涉及 t-SNE 和 UMAP 等方法，并提出一种基于监督对比学习的随机近邻嵌入方法。
  • results: 研究发现，在保持数据集近邻信息的前提下，将对比学习引入降维能够更好地利用标签信息：同类样本在低维嵌入空间中聚拢得更紧密，不同类别的样本簇则被相互推开。
    Abstract Stochastic neighbor embedding (SNE) methods $t$-SNE, UMAP are two most popular dimensionality reduction methods for data visualization. Contrastive learning, especially self-supervised contrastive learning (SSCL), has showed great success in embedding features from unlabeled data. The conceptual connection between SNE and SSCL has been exploited. In this work, within the scope of preserving neighboring information of a dataset, we extend the self-supervised contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of samples belonging to the same class are pulled together in low-dimensional embedding space, while simultaneously pushing apart clusters of samples from different classes.
    摘要 随机近邻嵌入（SNE）类方法 t-SNE 和 UMAP 是数据可视化中最流行的两种降维方法。对比学习，尤其是自监督对比学习（SSCL），在从无标签数据中学习特征嵌入方面取得了巨大成功，且SNE与SSCL之间的概念联系已被加以利用。在这项工作中，我们在保持数据近邻信息的前提下，把自监督对比方法推广到全监督设置，从而有效利用标签信息：同类样本在低维嵌入空间中被拉近，不同类别的样本簇则被相互推开。
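A minimal sketch of a supervised contrastive objective acting directly on a low-dimensional embedding: same-class pairs are pulled together and different-class pairs pushed apart. The temperature and the absence of augmentations are simplifying assumptions, not necessarily the paper's choices.

```python
# Supervised contrastive loss over a low-dimensional embedding.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(emb, labels, temperature=0.5):
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.T / temperature                          # pairwise similarities
    mask_pos = (labels[:, None] == labels[None, :]).float()
    mask_pos.fill_diagonal_(0)                               # exclude self-pairs
    logits = sim - torch.eye(len(emb)) * 1e9                 # mask self in the softmax
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_per_row = mask_pos.sum(1).clamp(min=1)
    return -(mask_pos * log_prob).sum(1).div(pos_per_row).mean()

emb = torch.randn(16, 2, requires_grad=True)     # 2-D embedding being learned
labels = torch.randint(0, 3, (16,))
loss = supervised_contrastive_loss(emb, labels)
loss.backward()
print(float(loss))
```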