cs.LG - 2023-08-23

The Challenges of Machine Learning for Trust and Safety: A Case Study on Misinformation Detection

  • paper_url: http://arxiv.org/abs/2308.12215
  • repo_url: https://github.com/ramybaly/News-Media-Reliability
  • paper_authors: Madelyne Xiao, Jonathan Mayer
  • for: This paper examines the application of machine learning to trust and safety problems, using misinformation detection as a case study.
  • methods: The authors systematize the literature on automated misinformation detection across a corpus of 270 well-cited papers, then analyze subsets of these papers for data and code availability, design missteps, reproducibility, and generalizability.
  • results: The review finds significant shortcomings: detection tasks are often meaningfully distinct from the challenges online services actually face, datasets and model evaluation are frequently non-representative of real-world contexts, and models do not generalize to out-of-domain data, which calls claimed performance and practicality into question.
    Abstract We examine the disconnect between scholarship and practice in applying machine learning to trust and safety problems, using misinformation detection as a case study. We systematize literature on automated detection of misinformation across a corpus of 270 well-cited papers in the field. We then examine subsets of papers for data and code availability, design missteps, reproducibility, and generalizability. We find significant shortcomings in the literature that call into question claimed performance and practicality. Detection tasks are often meaningfully distinct from the challenges that online services actually face. Datasets and model evaluation are often non-representative of real-world contexts, and evaluation frequently is not independent of model training. Data and code availability is poor. Models do not generalize well to out-of-domain data. Based on these results, we offer recommendations for evaluating machine learning applications to trust and safety problems. Our aim is for future work to avoid the pitfalls that we identify.

Learning to Learn Financial Networks for Optimising Momentum Strategies

  • paper_url: http://arxiv.org/abs/2308.12212
  • repo_url: None
  • paper_authors: Xingyue Pu, Stefan Zohren, Stephen Roberts, Xiaowen Dong
  • for: This paper proposes network momentum, a novel type of risk premium that exploits the interconnections among assets in a financial network to predict future returns.
  • methods: It introduces L2GMOM, an end-to-end machine learning framework that simultaneously learns financial networks and optimises trading signals for network momentum strategies; the model is a neural network with a highly interpretable forward pass derived from algorithm unrolling and can be trained with portfolio-performance losses such as the negative Sharpe ratio.
  • results: Backtesting on 64 continuous futures contracts shows a significant improvement in portfolio profitability and risk control, with a Sharpe ratio of 1.74 over a 20-year period.
    Abstract Network momentum provides a novel type of risk premium, which exploits the interconnections among assets in a financial network to predict future returns. However, the current process of constructing financial networks relies heavily on expensive databases and financial expertise, limiting accessibility for small-sized and academic institutions. Furthermore, the traditional approach treats network construction and portfolio optimisation as separate tasks, potentially hindering optimal portfolio performance. To address these challenges, we propose L2GMOM, an end-to-end machine learning framework that simultaneously learns financial networks and optimises trading signals for network momentum strategies. The model of L2GMOM is a neural network with a highly interpretable forward propagation architecture, which is derived from algorithm unrolling. The L2GMOM is flexible and can be trained with diverse loss functions for portfolio performance, e.g. the negative Sharpe ratio. Backtesting on 64 continuous future contracts demonstrates a significant improvement in portfolio profitability and risk control, with a Sharpe ratio of 1.74 across a 20-year period.
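    Since L2GMOM can be trained with portfolio-performance losses such as the negative Sharpe ratio, a minimal sketch of such a loss is given below; the tensor shapes, annualisation factor, and function name are illustrative assumptions rather than the paper's implementation.

```python
import torch

def negative_sharpe_loss(weights: torch.Tensor, returns: torch.Tensor,
                         periods_per_year: float = 252.0,
                         eps: float = 1e-8) -> torch.Tensor:
    """Negative annualised Sharpe ratio of a portfolio (illustrative sketch).

    weights: (T, N) position weights produced by the model for each period
    returns: (T, N) realised next-period asset returns aligned with the weights
    """
    portfolio_returns = (weights * returns).sum(dim=1)   # (T,) per-period P&L
    mean = portfolio_returns.mean()
    std = portfolio_returns.std() + eps                  # avoid division by zero
    sharpe = mean / std * periods_per_year ** 0.5        # annualise
    return -sharpe                                       # minimise the negative Sharpe
```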

ULDP-FL: Federated Learning with Across Silo User-Level Differential Privacy

  • paper_url: http://arxiv.org/abs/2308.12210
  • repo_url: https://github.com/fumiyukikato/uldp-fl
  • paper_authors: Fumiyuki Kato, Li Xiong, Shun Takagi, Yang Cao, Masatoshi Yoshikawa
  • for: This paper focuses on providing user-level differential privacy (DP) in cross-silo federated learning (FL) settings, where a single user’s data may belong to multiple silos.
  • methods: The proposed algorithm, called ULDP-FL, ensures user-level DP through per-user weighted clipping, departing from group-privacy approaches. Its utility is further enhanced, and a private implementation is demonstrated using cryptographic building blocks.
  • results: The paper provides a theoretical analysis of the algorithm's privacy and utility, and empirical experiments on real-world datasets show substantial improvements in privacy-utility trade-offs under user-level DP, demonstrating that ULDP-FL achieves a better privacy-utility balance than baseline methods.
    Abstract Differentially Private Federated Learning (DP-FL) has garnered attention as a collaborative machine learning approach that ensures formal privacy. Most DP-FL approaches ensure DP at the record-level within each silo for cross-silo FL. However, a single user's data may extend across multiple silos, and the desired user-level DP guarantee for such a setting remains unknown. In this study, we present ULDP-FL, a novel FL framework designed to guarantee user-level DP in cross-silo FL where a single user's data may belong to multiple silos. Our proposed algorithm directly ensures user-level DP through per-user weighted clipping, departing from group-privacy approaches. We provide a theoretical analysis of the algorithm's privacy and utility. Additionally, we enhance the algorithm's utility and showcase its private implementation using cryptographic building blocks. Empirical experiments on real-world datasets show substantial improvements in our methods in privacy-utility trade-offs under user-level DP compared to baseline methods. To the best of our knowledge, our work is the first FL framework that effectively provides user-level DP in the general cross-silo FL setting.
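    As a rough illustration of per-user weighted clipping, the sketch below clips each user's weighted update before summing and adds Gaussian noise; the pre-aggregation of a user's records across silos, the weight assignment, and the noise calibration are simplifying assumptions and do not reproduce the paper's exact algorithm or its cryptographic protocol.

```python
import numpy as np

def aggregate_with_per_user_clipping(user_updates, user_weights,
                                     clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Illustrative user-level DP aggregation via per-user weighted clipping.

    user_updates: dict user_id -> flattened model update (np.ndarray), assumed
                  to already combine that user's records across all silos
    user_weights: dict user_id -> non-negative weight for that user
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    dim = next(iter(user_updates.values())).shape[0]
    aggregate = np.zeros(dim)
    for uid, update in user_updates.items():
        contribution = user_weights[uid] * update
        norm = np.linalg.norm(contribution)
        # Bound each user's influence so the sum has per-user sensitivity clip_norm.
        aggregate += contribution * min(1.0, clip_norm / (norm + 1e-12))
    # Gaussian noise scaled to the per-user sensitivity (privacy accounting omitted).
    aggregate += rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    return aggregate / max(len(user_updates), 1)
```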

Curriculum Learning with Adam: The Devil Is in the Wrong Details

  • paper_url: http://arxiv.org/abs/2308.12202
  • repo_url: None
  • paper_authors: Lucas Weber, Jaap Jumelet, Paul Michel, Elia Bruni, Dieuwke Hupkes
  • for: This paper aims to explore the limitations of curriculum learning (CL) methods in natural language processing (NLP) and understand why they often fail to achieve expected results.
  • methods: The paper uses a combination of replication and extension of recent CL methods, as well as a deep dive into the (in)effectiveness of these curricula in different scenarios.
  • results: The paper finds that CL methods often learn to adapt to suboptimally chosen optimisation parameters for the Adam algorithm, and none of the common hand-crafted and automated CL approaches outperforms optimisation with only Adam and well-chosen hyperparameters. The results contribute to understanding why CL methods work, but also urge caution when claiming positive results.
    Abstract Curriculum learning (CL) posits that machine learning models -- similar to humans -- may learn more efficiently from data that match their current learning progress. However, CL methods are still poorly understood and, in particular for natural language processing (NLP), have achieved only limited success. In this paper, we explore why. Starting from an attempt to replicate and extend a number of recent curriculum methods, we find that their results are surprisingly brittle when applied to NLP. A deep dive into the (in)effectiveness of the curricula in some scenarios shows us why: when curricula are employed in combination with the popular Adam optimisation algorithm, they oftentimes learn to adapt to suboptimally chosen optimisation parameters for this algorithm. We present a number of different case studies with different common hand-crafted and automated CL approaches to illustrate this phenomenon, and we find that none of them outperforms optimisation with only Adam with well-chosen hyperparameters. As such, our results contribute to understanding why CL methods work, but at the same time urge caution when claiming positive results.

Predicting Drug Solubility Using Different Machine Learning Methods – Linear Regression Model with Extracted Chemical Features vs Graph Convolutional Neural Network

  • paper_url: http://arxiv.org/abs/2308.12325
  • repo_url: None
  • paper_authors: John Ho, Zhao-Heng Yin, Colin Zhang, Henry Overhauser, Kyle Swanson, Yang Ha
  • for: This study aims to improve understanding of how chemical structure influences chemical properties, which is crucial when designing new drugs.
  • methods: Two machine learning models, a linear regression model and a graph convolutional neural network (GCNN) model, are applied to multiple experimental solubility datasets.
  • results: The GCNN model achieves the best performance across the datasets, but it remains a black box, whereas feature importance analysis of the linear regression model offers more insight into the influence of each functional group on solubility.
    Abstract Predicting the solubility of given molecules is an important task in the pharmaceutical industry, and consequently this is a well-studied topic. In this research, we revisited this problem with the advantage of modern computing resources. We applied two machine learning models, a linear regression model and a graph convolutional neural network model, on multiple experimental datasets. Both methods can make reasonable predictions while the GCNN model had the best performance. However, the current GCNN model is a black box, while feature importance analysis from the linear regression model offers more insights into the underlying chemical influences. Using the linear regression model, we show how each functional group affects the overall solubility. Ultimately, knowing how chemical structure influences chemical properties is crucial when designing new drugs. Future work should aim to combine the high performance of GCNNs with the interpretability of linear regression, unlocking new advances in next generation high throughput screening.
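    The interpretability argument for the linear model can be made concrete with a short sketch: fit a linear regression on extracted functional-group counts and read each group's contribution from its coefficient. The feature names and values below are placeholders, not the paper's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder design matrix: counts of functional groups per molecule.
feature_names = ["hydroxyl", "carboxyl", "amine", "aromatic_ring"]
X = np.array([[2, 0, 1, 0],
              [0, 1, 0, 2],
              [1, 1, 0, 1],
              [3, 0, 2, 0]], dtype=float)
y = np.array([-1.2, -3.5, -2.4, -0.8])   # placeholder logS solubility values

model = LinearRegression().fit(X, y)
for name, coef in zip(feature_names, model.coef_):
    # The sign and magnitude of each coefficient indicate how that functional
    # group shifts the predicted solubility.
    print(f"{name:>14s}: {coef:+.3f}")
```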

Self-Supervised Knowledge-Driven Deep Learning for 3D Magnetic Inversion

  • paper_url: http://arxiv.org/abs/2308.12193
  • repo_url: None
  • paper_authors: Yinshuo Li, Zhuo Jia, Wenkai Lu, Cao Song
  • for: This paper proposes a self-supervised deep learning approach to magnetic inversion, a non-destructive geophysical method for estimating the subsurface susceptibility distribution from surface magnetic anomaly data.
  • methods: The proposed self-supervised knowledge-driven method (SSKMI) learns on the target field data through a closed loop between the inversion and forward models, with a knowledge-driven module that makes the deep learning model more explicable.
  • results: Comparative experiments show that the method provides reliable subsurface susceptibility estimates from magnetic anomaly data, and the knowledge-driven module accelerates training and improves both results and interpretability.
    Abstract The magnetic inversion method is one of the non-destructive geophysical methods, which aims to estimate the subsurface susceptibility distribution from surface magnetic anomaly data. Recently, supervised deep learning methods have been widely utilized in lots of geophysical fields including magnetic inversion. However, these methods rely heavily on synthetic training data, whose performance is limited since the synthetic data is not independently and identically distributed with the field data. Thus, we proposed to realize magnetic inversion by self-supervised deep learning. The proposed self-supervised knowledge-driven 3D magnetic inversion method (SSKMI) learns on the target field data by a closed loop of the inversion and forward models. Given that the parameters of the forward model are preset, SSKMI can optimize the inversion model by minimizing the mean absolute error between observed and re-estimated surface magnetic anomalies. Besides, there is a knowledge-driven module in the proposed inversion model, which makes the deep learning method more explicable. Meanwhile, comparative experiments demonstrate that the knowledge-driven module can accelerate the training of the proposed method and achieve better results. Since magnetic inversion is an ill-pose task, SSKMI proposed to constrain the inversion model by a guideline in the auxiliary loop. The experimental results demonstrate that the proposed method is a reliable magnetic inversion method with outstanding performance.
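    Because the magnetic forward problem is linear in the susceptibility model, the closed inversion-forward loop can be sketched as follows: a network maps observed anomalies to a susceptibility model, the preset forward operator re-estimates the anomalies, and the mean absolute error between observed and re-estimated data is minimized. The forward matrix, network architecture, and omission of the knowledge-driven module and guideline constraint are simplifications, not the paper's implementation.

```python
import torch
import torch.nn as nn

n_obs, n_cells = 128, 512
G = torch.randn(n_obs, n_cells) * 0.01        # preset forward operator (placeholder)
d_obs = torch.randn(n_obs)                    # observed surface anomalies (placeholder)

inversion_net = nn.Sequential(                # maps anomalies -> susceptibility model
    nn.Linear(n_obs, 256), nn.ReLU(),
    nn.Linear(256, n_cells), nn.Softplus(),   # keep susceptibility non-negative
)
optimizer = torch.optim.Adam(inversion_net.parameters(), lr=1e-3)

for step in range(2000):
    m_pred = inversion_net(d_obs)             # inversion model output
    d_pred = G @ m_pred                       # forward model re-estimates anomalies
    loss = (d_obs - d_pred).abs().mean()      # self-supervised MAE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```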

Robustness Analysis of Continuous-Depth Models with Lagrangian Techniques

  • paper_url: http://arxiv.org/abs/2308.12192
  • repo_url: None
  • paper_authors: Sophie A. Neubauer, Radu Grosu
  • for: This work presents, in a unified fashion, Lagrangian verification techniques for formally quantifying the behavioral robustness of time-continuous processes formulated as continuous-depth models.
  • methods: The LRT-NG, SLR, and GoTube algorithms are reviewed for constructing tight reachtubes, i.e., over-approximations of the set of states reachable within a given time horizon, with deterministic and statistical guarantees on the reachtube bounds.
  • results: Experiments demonstrate the superior performance of the Lagrangian techniques compared with LRT, Flow*, and CAPD, and illustrate their use in the robustness analysis of various continuous-depth models.
    Abstract This paper presents, in a unified fashion, deterministic as well as statistical Lagrangian-verification techniques. They formally quantify the behavioral robustness of any time-continuous process, formulated as a continuous-depth model. To this end, we review LRT-NG, SLR, and GoTube, algorithms for constructing a tight reachtube, that is, an over-approximation of the set of states reachable within a given time-horizon, and provide guarantees for the reachtube bounds. We compare the usage of the variational equations, associated to the system equations, the mean value theorem, and the Lipschitz constants, in achieving deterministic and statistical guarantees. In LRT-NG, the Lipschitz constant is used as a bloating factor of the initial perturbation, to compute the radius of an ellipsoid in an optimal metric, which over-approximates the set of reachable states. In SLR and GoTube, we get statistical guarantees, by using the Lipschitz constants to compute local balls around samples. These are needed to calculate the probability of having found an upper bound, of the true maximum perturbation at every timestep. Our experiments demonstrate the superior performance of Lagrangian techniques, when compared to LRT, Flow*, and CAPD, and illustrate their use in the robustness analysis of various continuous-depth models.
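    The role of the Lipschitz constant as a bloating factor can be summarized by the classical Gronwall-style bound below, with L a Lipschitz constant of the vector field; the actual algorithms compute much tighter radii, either in an optimal metric (LRT-NG) or with statistical certification from local balls around samples (SLR, GoTube).

```latex
\|x(t) - \tilde{x}(t)\| \le \|x(0) - \tilde{x}(0)\|\, e^{L t},
\qquad \dot{x} = f(x), \quad \|f(x) - f(y)\| \le L \|x - y\| .
```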

Development and external validation of a lung cancer risk estimation tool using gradient-boosting

  • paper_url: http://arxiv.org/abs/2308.12188
  • repo_url: https://github.com/plbenveniste/lungcancerrisk
  • paper_authors: Pierre-Louis Benveniste, Julie Alberge, Lei Xing, Jean-Emmanuel Bibault
  • for: This study develops a machine learning tool to estimate the likelihood of developing lung cancer within five years.
  • methods: Data from the PLCO Cancer Screening Trial and the NLST are used, with feature selection, hyper-parameter optimization, and model calibration performed using the XGBoost ensemble learning algorithm.
  • results: The model is well calibrated (Brier score = 0.044) with an ROC-AUC of 82% on the PLCO dataset and 70% on the NLST dataset; compared with the USPSTF lung cancer screening guidelines, it provides the same recall with higher precision (13.1% vs. 9.3% on PLCO).
    Abstract Lung cancer is a significant cause of mortality worldwide, emphasizing the importance of early detection for improved survival rates. In this study, we propose a machine learning (ML) tool trained on data from the PLCO Cancer Screening Trial and validated on the NLST to estimate the likelihood of lung cancer occurrence within five years. The study utilized two datasets, the PLCO (n=55,161) and NLST (n=48,595), consisting of comprehensive information on risk factors, clinical measurements, and outcomes related to lung cancer. Data preprocessing involved removing patients who were not current or former smokers and those who had died of causes unrelated to lung cancer. Additionally, a focus was placed on mitigating bias caused by censored data. Feature selection, hyper-parameter optimization, and model calibration were performed using XGBoost, an ensemble learning algorithm that combines gradient boosting and decision trees. The ML model was trained on the pre-processed PLCO dataset and tested on the NLST dataset. The model incorporated features such as age, gender, smoking history, medical diagnoses, and family history of lung cancer. The model was well-calibrated (Brier score=0.044). ROC-AUC was 82% on the PLCO dataset and 70% on the NLST dataset. PR-AUC was 29% and 11% respectively. When compared to the USPSTF guidelines for lung cancer screening, our model provided the same recall with a precision of 13.1% vs. 9.3% on the PLCO dataset and 3.2% vs. 3.1% on the NLST dataset. The developed ML tool provides a freely available web application for estimating the likelihood of developing lung cancer within five years. By utilizing risk factors and clinical data, individuals can assess their risk and make informed decisions regarding lung cancer screening. This research contributes to the efforts in early detection and prevention strategies, aiming to reduce lung cancer-related mortality rates.
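    A minimal sketch of the training-and-evaluation pipeline described above, using the public XGBoost and scikit-learn APIs; the synthetic features, hyperparameters, and split are placeholders rather than the study's configuration.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import brier_score_loss, roc_auc_score

# Placeholder data: rows = participants, columns = risk factors
# (age, pack-years, years since quitting, family history, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))
y = (rng.random(5000) < 0.05).astype(int)      # synthetic ~5% positive rate

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = xgb.XGBClassifier(n_estimators=300, max_depth=4,
                          learning_rate=0.05, subsample=0.8)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]
print("Brier score:", brier_score_loss(y_test, proba))   # calibration
print("ROC-AUC:", roc_auc_score(y_test, proba))           # discrimination
```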

Unsupervised anomalies detection in IIoT edge devices networks using federated learning

  • paper_url: http://arxiv.org/abs/2308.12175
  • repo_url: None
  • paper_authors: Niyomukiza Thamar, Hossam Samy Elsaid Sharara
  • for: This study addresses data privacy concerns for IoT/IIoT devices by using federated learning (FL), a distributed machine learning approach, to train models without transmitting data to a central server.
  • methods: The FedAvg algorithm is used: participating devices train the model locally and send the results to a coordinating server, which averages the models.
  • results: FedAvg achieves results comparable to the centralized machine learning approach while addressing data privacy concerns; however, unfairness arises when struggling devices cannot participate in every stage of training, which can lead to a high number of false alarms in intrusion detection, so the authors propose a Fair FedAvg algorithm to address this.
    Abstract In a network of many IoT devices that each collect data, training a machine learning model normally involves transmitting the data to a central server, which requires strict privacy rules. However, some owners are reluctant to let their data leave the company due to data security concerns. Federated learning (FL), a distributed machine learning approach, trains the model on the device that gathered the data itself, so data is not shared over the network for training. FedAvg, one of the FL algorithms, copies the model to participating devices during a training session; devices may be chosen at random and may drop out. The resulting models are sent to the coordinating server, which averages the models from the devices that finished training, and the process is repeated until a desired model accuracy is achieved. In this way, FL addresses the privacy problem for IoT/IIoT devices that hold data sensitive to their owners. In this paper, we leverage the benefits of FL and implement the FedAvg algorithm on a recent dataset representative of modern IoT/IIoT device networks. The results are almost the same as those of the centralized machine learning approach. We also evaluate some shortcomings of FedAvg, such as the unfairness that arises when struggling devices do not participate in every stage of training; this inefficient training of the local or global model could lead to a high number of false alarms in intrusion detection systems for IoT/IIoT devices developed using FedAvg. Hence, after evaluating the FedAvg deep autoencoder against a centralized deep autoencoder, we further propose and design a Fair FedAvg algorithm that will be evaluated in future work.
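    The FedAvg aggregation step described in the abstract reduces to a weighted average of the models returned by the devices that finished the round; the helper below is a simplified illustration with one flat parameter vector per device, not the authors' implementation.

```python
import numpy as np

def fedavg(local_models, local_sample_counts):
    """Weighted average of device models, as in FedAvg.

    local_models: list of flattened parameter vectors from devices that
                  completed the round (aborted devices are simply absent)
    local_sample_counts: number of local training samples per device
    """
    weights = np.asarray(local_sample_counts, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(local_models)               # (num_devices, num_params)
    return (weights[:, None] * stacked).sum(axis=0)

# One round: the server averages the returned models and redistributes the result.
global_model = fedavg(local_models=[np.random.randn(10) for _ in range(5)],
                      local_sample_counts=[120, 80, 200, 50, 150])
```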

Data-driven decision-focused surrogate modeling

  • paper_url: http://arxiv.org/abs/2308.12161
  • repo_url: https://github.com/ddolab/decfocsurrmod
  • paper_authors: Rishabh Gupta, Qi Zhang
  • for: Solving computationally challenging nonlinear optimization problems in real-time settings.
  • methods: A data-driven framework learns a simpler (e.g., convex) surrogate optimization model trained to minimize the decision prediction error.
  • results: The framework is validated through numerical experiments and compared with standard data-driven surrogate modeling methods, showing greater data efficiency and higher decision prediction accuracy.
    Abstract We introduce the concept of decision-focused surrogate modeling for solving computationally challenging nonlinear optimization problems in real-time settings. The proposed data-driven framework seeks to learn a simpler, e.g. convex, surrogate optimization model that is trained to minimize the decision prediction error, which is defined as the difference between the optimal solutions of the original and the surrogate optimization models. The learning problem, formulated as a bilevel program, can be viewed as a data-driven inverse optimization problem to which we apply a decomposition-based solution algorithm from previous work. We validate our framework through numerical experiments involving the optimization of common nonlinear chemical processes such as chemical reactors, heat exchanger networks, and material blending systems. We also present a detailed comparison of decision-focused surrogate modeling with standard data-driven surrogate modeling methods and demonstrate that our approach is significantly more data-efficient while producing simple surrogate models with high decision prediction accuracy.
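    In generic notation, the decision-focused learning problem can be written as the bilevel program below: the surrogate parameters are chosen to minimize the decision prediction error, subject to the surrogate decision being optimal for the learned model. The symbols are generic and may not match the paper's notation.

```latex
\min_{\theta} \; \sum_{i} \big\| x^{*}(u_i) - \hat{x}_{\theta}(u_i) \big\|^{2}
\quad \text{s.t.} \quad
\hat{x}_{\theta}(u_i) \in \arg\min_{x \in \mathcal{X}(u_i)} f_{\theta}(x; u_i),
```

    where x*(u_i) is the optimal solution of the original optimization model for input u_i and f_theta is the simpler (e.g., convex) surrogate objective.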

A Probabilistic Fluctuation based Membership Inference Attack for Generative Models

  • paper_url: http://arxiv.org/abs/2308.12143
  • repo_url: None
  • paper_authors: Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang
  • for: This paper studies membership inference attacks (MIA), which determine whether a given record is contained in a machine learning model's training set, focusing on generative models.
  • methods: Existing MIAs for generative models, which mainly rely on overfitting of the target model, are extended to exploit the memorization phenomenon: the proposed PFAMI is a black-box attack that infers membership by analyzing probabilistic fluctuations around given records.
  • results: Extensive experiments across multiple generative models and datasets show that PFAMI improves the attack success rate (ASR) by about 27.9% compared with the best baseline.
    Abstract Membership Inference Attack (MIA) identifies whether a record exists in a machine learning model's training set by querying the model. MIAs on the classic classification models have been well-studied, and recent works have started to explore how to transplant MIA onto generative models. Our investigation indicates that existing MIAs designed for generative models mainly depend on the overfitting in target models. However, overfitting can be avoided by employing various regularization techniques, whereas existing MIAs demonstrate poor performance in practice. Unlike overfitting, memorization is essential for deep learning models to attain optimal performance, making it a more prevalent phenomenon. Memorization in generative models leads to an increasing trend in the probability distribution of generating records around the member record. Therefore, we propose a Probabilistic Fluctuation Assessing Membership Inference Attack (PFAMI), a black-box MIA that infers memberships by detecting these trends via analyzing the overall probabilistic fluctuations around given records. We conduct extensive experiments across multiple generative models and datasets, which demonstrate PFAMI can improve the attack success rate (ASR) by about 27.9% when compared with the best baseline.
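    The underlying intuition, that a memorized training record tends to sit near a local peak of the generative model's probability, can be sketched as below. The log_prob callable, perturbation scheme, and threshold are assumptions for illustration; PFAMI defines its own fluctuation statistic and probability estimation for each model family.

```python
import numpy as np

def probabilistic_fluctuation_score(x, log_prob, noise_scale=0.05,
                                    n_neighbors=32, rng=None):
    """Score how much log p(x) rises above its perturbed neighborhood.

    x        : candidate record (1-D np.ndarray)
    log_prob : callable returning an (approximate) log-probability of a record
               under the target generative model -- an assumed interface
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    neighbors = x + noise_scale * rng.standard_normal((n_neighbors, x.shape[0]))
    neighbor_lp = np.array([log_prob(z) for z in neighbors])
    # A clearly positive score means the record is a local probability peak,
    # which is taken as evidence of memorization (membership).
    return log_prob(x) - neighbor_lp.mean()

def infer_membership(x, log_prob, threshold=0.0):
    return probabilistic_fluctuation_score(x, log_prob) > threshold
```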

Masking Strategies for Background Bias Removal in Computer Vision Models

  • paper_url: http://arxiv.org/abs/2308.12127
  • repo_url: https://github.com/ananthu-aniraj/masking_strategies_bias_removal
  • paper_authors: Ananthu Aniraj, Cassio F. Dantas, Dino Ienco, Diego Marcos
  • for: This paper investigates how background-induced bias affects fine-grained image classification and how different masking strategies can reduce it.
  • methods: Two masking strategies are explored: early masking, which removes background information at the input image level, and late masking, which selectively masks high-level spatial features corresponding to the background.
  • results: Both strategies improve out-of-distribution (OOD) performance, with early masking consistently showing the best OOD performance; a ViT variant using GAP-pooled patch-token classification combined with early masking achieves the highest OOD robustness.
    Abstract Models for fine-grained image classification tasks, where the difference between some classes can be extremely subtle and the number of samples per class tends to be low, are particularly prone to picking up background-related biases and demand robust methods to handle potential examples with out-of-distribution (OOD) backgrounds. To gain deeper insights into this critical problem, our research investigates the impact of background-induced bias on fine-grained image classification, evaluating standard backbone models such as Convolutional Neural Network (CNN) and Vision Transformers (ViT). We explore two masking strategies to mitigate background-induced bias: Early masking, which removes background information at the (input) image level, and late masking, which selectively masks high-level spatial features corresponding to the background. Extensive experiments assess the behavior of CNN and ViT models under different masking strategies, with a focus on their generalization to OOD backgrounds. The obtained findings demonstrate that both proposed strategies enhance OOD performance compared to the baseline models, with early masking consistently exhibiting the best OOD performance. Notably, a ViT variant employing GAP-Pooled Patch token-based classification combined with early masking achieves the highest OOD robustness.
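    A minimal sketch of the two strategies, assuming a binary foreground mask is available from a segmentation step: early masking zeroes background pixels before the backbone, late masking zeroes background locations in a high-level feature map. The mask source and backbone are placeholders.

```python
import torch
import torch.nn.functional as F

def early_masking(image, fg_mask):
    """Remove background at the input level.
    image: (B, 3, H, W); fg_mask: (B, 1, H, W) with 1 = foreground, 0 = background."""
    return image * fg_mask

def late_masking(features, fg_mask):
    """Mask background locations in a high-level feature map.
    features: (B, C, h, w); the mask is resized to the feature resolution."""
    mask_small = F.interpolate(fg_mask, size=features.shape[-2:], mode="nearest")
    return features * mask_small
```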

An Accelerated Block Proximal Framework with Adaptive Momentum for Nonconvex and Nonsmooth Optimization

  • paper_url: http://arxiv.org/abs/2308.12126
  • repo_url: None
  • paper_authors: Weifeng Yang, Wenwen Min
  • for: solves nonconvex and nonsmooth optimization problems with block variables and adaptive momentum.
  • methods: uses an accelerated block proximal linear framework with adaptive momentum, enhances the comparison process, and allows random shuffling of update order for variable blocks.
  • results: monotonically decreases function value, globally converges, and achieves linear and sublinear convergence rates under mild assumptions. Additionally, the derivative set of the sequence generated by the algorithm is a critical point set, and the algorithm is effective and efficient in numerical experiments.
    Abstract We propose an accelerated block proximal linear framework with adaptive momentum (ABPL$^+$) for nonconvex and nonsmooth optimization. We analyze the potential causes of the extrapolation step failing in some algorithms, and resolve this issue by enhancing the comparison process that evaluates the trade-off between the proximal gradient step and the linear extrapolation step in our algorithm. Furthermore, we extends our algorithm to any scenario involving updating block variables with positive integers, allowing each cycle to randomly shuffle the update order of the variable blocks. Additionally, under mild assumptions, we prove that ABPL$^+$ can monotonically decrease the function value without strictly restricting the extrapolation parameters and step size, demonstrates the viability and effectiveness of updating these blocks in a random order, and we also more obviously and intuitively demonstrate that the derivative set of the sequence generated by our algorithm is a critical point set. Moreover, we demonstrate the global convergence as well as the linear and sublinear convergence rates of our algorithm by utilizing the Kurdyka-Lojasiewicz (K{\L}) condition. To enhance the effectiveness and flexibility of our algorithm, we also expand the study to the imprecise version of our algorithm and construct an adaptive extrapolation parameter strategy, which improving its overall performance. We apply our algorithm to multiple non-negative matrix factorization with the $\ell_0$ norm, nonnegative tensor decomposition with the $\ell_0$ norm, and perform extensive numerical experiments to validate its effectiveness and efficiency.

An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators

  • paper_url: http://arxiv.org/abs/2308.12120
  • repo_url: None
  • paper_authors: Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew B. Kahng, Joon Kyung Kim, Sean Kinzer, Sayak Kundu, Rohan Mahapatra, Susmita Dey Manasi, Sachin Sapatnekar, Zhiang Wang, Ziqing Zeng
  • for: This paper aims to enable design space exploration (DSE) for hardware accelerators of deep learning and non-deep-learning ML algorithms.
  • methods: A physical-design-driven, learning-based prediction framework combines backend power, performance, and area (PPA) analysis with frontend performance simulation to give realistic estimates of backend PPA and of system metrics such as runtime and energy; a fully automated DSE technique optimizes these metrics by searching over architectural and backend parameters.
  • results: Experiments show the framework predicts backend PPA and system metrics with an average error of 7% or less for ASIC implementations of two deep learning accelerator platforms, VTA and VeriGOOD-ML, in both a commercial 12 nm process and a research-oriented 45 nm process.
    Abstract Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for hardware-accelerated deep neural network (DNN) and non-DNN ML algorithms. It adopts a unified approach that combines backend power, performance, and area (PPA) analysis with frontend performance simulation, thereby achieving a realistic estimation of both backend PPA and system metrics such as runtime and energy. In addition, our framework includes a fully automated DSE technique, which optimizes backend and system metrics through an automated search of architectural and backend parameters. Experimental studies show that our approach consistently predicts backend PPA and system metrics with an average 7% or less prediction error for the ASIC implementation of two deep learning accelerator platforms, VTA and VeriGOOD-ML, in both a commercial 12 nm process and a research-oriented 45 nm process.

Less is More – Towards parsimonious multi-task models using structured sparsity

  • paper_url: http://arxiv.org/abs/2308.12114
  • repo_url: None
  • paper_authors: Richa Upadhyay, Ronald Phlypo, Rajkumar Saini, Marcus Liwicki
  • For: This paper aims to incorporate structured group sparsity into a Multi-Task Learning (MTL) framework to develop parsimonious models that can effectively address multiple tasks with fewer parameters while maintaining comparable or superior performance to a dense model.
  • Methods: The paper uses channel-wise l1/l2 group sparsity in the shared layers of a Convolutional Neural Network (CNN) to facilitate the elimination of extraneous groups (channels) and impose a penalty on the weights, thereby enhancing the learning of all tasks.
  • Results: The paper compares the outcomes of single-task and multi-task experiments under group sparsity on two publicly available MTL datasets, NYU-v2 and CelebAMask-HQ, and investigates how changing the sparsification degree impacts both the performance of the model and the sparsity of groups.
    Abstract Group sparsity in Machine Learning (ML) encourages simpler, more interpretable models with fewer active parameter groups. This work aims to incorporate structured group sparsity into the shared parameters of a Multi-Task Learning (MTL) framework, to develop parsimonious models that can effectively address multiple tasks with fewer parameters while maintaining comparable or superior performance to a dense model. Sparsifying the model during training helps decrease the model's memory footprint, computation requirements, and prediction time during inference. We use channel-wise l1/l2 group sparsity in the shared layers of the Convolutional Neural Network (CNN). This approach not only facilitates the elimination of extraneous groups (channels) but also imposes a penalty on the weights, thereby enhancing the learning of all tasks. We compare the outcomes of single-task and multi-task experiments under group sparsity on two publicly available MTL datasets, NYU-v2 and CelebAMask-HQ. We also investigate how changing the sparsification degree impacts both the performance of the model and the sparsity of groups.
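    The channel-wise l1/l2 (group lasso) penalty on shared layers can be sketched as below: the l2 norm of each output channel's weights is summed, i.e. an l1 norm over channel groups, which pushes entire channels to zero. The layer selection and penalty weight are placeholders.

```python
import torch
import torch.nn as nn

def channel_group_sparsity(conv: nn.Conv2d) -> torch.Tensor:
    """l1/l2 group penalty over the output channels of a shared conv layer."""
    w = conv.weight                                            # (out_ch, in_ch, kH, kW)
    per_channel_l2 = w.flatten(start_dim=1).norm(p=2, dim=1)   # one l2 norm per channel
    return per_channel_l2.sum()                                # l1 over channel groups

# Usage: add the penalty over the shared layers to the multi-task loss.
shared = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Conv2d(64, 64, 3))
lam = 1e-4                                                     # sparsification degree
sparsity_loss = lam * sum(channel_group_sparsity(m)
                          for m in shared.modules() if isinstance(m, nn.Conv2d))
```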

Generalized Continual Category Discovery

  • paper_url: http://arxiv.org/abs/2308.12112
  • repo_url: None
  • paper_authors: Daniel Marczak, Grzegorz Rypeść, Sebastian Cygert, Tomasz Trzciński, Bartłomiej Twardowski
  • for: This work studies continual learning (CL) in settings closer to real life, where a learning agent has access to large amounts of unlabeled data containing both entirely novel classes and examples from known classes.
  • methods: It introduces Generalized Continual Category Discovery (GCCD), a framework that allows novel and known classes within any task and relies on continual versions of unsupervised learning methods to discover them.
  • results: Experiments show that existing CL methods fail to accumulate knowledge from subsequent tasks containing unlabeled samples of novel classes; the proposed method combines supervised and unsupervised signals and mitigates forgetting through centroid adaptation, achieving superior representation learning performance.
    Abstract Most of Continual Learning (CL) methods push the limit of supervised learning settings, where an agent is expected to learn new labeled tasks and not forget previous knowledge. However, these settings are not well aligned with real-life scenarios, where a learning agent has access to a vast amount of unlabeled data encompassing both novel (entirely unlabeled) classes and examples from known classes. Drawing inspiration from Generalized Category Discovery (GCD), we introduce a novel framework that relaxes this assumption. Precisely, in any task, we allow for the existence of novel and known classes, and one must use continual version of unsupervised learning methods to discover them. We call this setting Generalized Continual Category Discovery (GCCD). It unifies CL and GCD, bridging the gap between synthetic benchmarks and real-life scenarios. With a series of experiments, we present that existing methods fail to accumulate knowledge from subsequent tasks in which unlabeled samples of novel classes are present. In light of these limitations, we propose a method that incorporates both supervised and unsupervised signals and mitigates the forgetting through the use of centroid adaptation. Our method surpasses strong CL methods adopted for GCD techniques and presents a superior representation learning performance.

Constrained Stein Variational Trajectory Optimization

  • paper_url: http://arxiv.org/abs/2308.12110
  • repo_url: None
  • paper_authors: Thomas Power, Dmitry Berenson
  • for: This paper proposes a constrained trajectory optimization algorithm that optimizes a set of trajectories in parallel.
  • methods: The algorithm uses Stein Variational Gradient Descent (SVGD) to find a set of particles that approximates a distribution over low-cost trajectories while obeying constraints, and includes a particle resampling step to escape local minima.
  • results: CSVTO outperforms baselines on highly constrained tasks, such as a 7-DoF wrench manipulation task where it succeeds in 20/20 trials versus 13/20 for the closest baseline; generating diverse constraint-satisfying trajectories improves robustness to disturbances and initialization.
    Abstract We present Constrained Stein Variational Trajectory Optimization (CSVTO), an algorithm for performing trajectory optimization with constraints on a set of trajectories in parallel. We frame constrained trajectory optimization as a novel form of constrained functional minimization over trajectory distributions, which avoids treating the constraints as a penalty in the objective and allows us to generate diverse sets of constraint-satisfying trajectories. Our method uses Stein Variational Gradient Descent (SVGD) to find a set of particles that approximates a distribution over low-cost trajectories while obeying constraints. CSVTO is applicable to problems with arbitrary equality and inequality constraints and includes a novel particle resampling step to escape local minima. By explicitly generating diverse sets of trajectories, CSVTO is better able to avoid poor local minima and is more robust to initialization. We demonstrate that CSVTO outperforms baselines in challenging highly-constrained tasks, such as a 7DoF wrench manipulation task, where CSVTO succeeds in 20/20 trials vs 13/20 for the closest baseline. Our results demonstrate that generating diverse constraint-satisfying trajectories improves robustness to disturbances and initialization over baselines.
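    The SVGD step at the core of the method can be sketched in generic form: each particle moves along a kernel-weighted combination of the other particles' score functions plus a repulsive kernel-gradient term. This is the standard unconstrained SVGD update with an RBF kernel; the constraint handling and resampling that define CSVTO are not shown.

```python
import numpy as np

def svgd_step(particles, score_fn, step_size=1e-2, bandwidth=1.0):
    """One Stein Variational Gradient Descent update with an RBF kernel.

    particles: (n, d) array, e.g. flattened trajectories
    score_fn : callable returning grad log p(x) for a single particle, where p
               is the (unnormalized) distribution over low-cost trajectories
    """
    diffs = particles[:, None, :] - particles[None, :, :]            # (n, n, d)
    sq_dists = (diffs ** 2).sum(-1)                                   # (n, n)
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))                    # k(x_j, x_i)
    grad_K = -diffs / bandwidth ** 2 * K[:, :, None]                  # grad_{x_j} k
    scores = np.stack([score_fn(x) for x in particles])               # (n, d)
    phi = (K[:, :, None] * scores[:, None, :] + grad_K).mean(axis=0)  # (n, d)
    return particles + step_size * phi
```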

Quantifying degeneracy in singular models via the learning coefficient

  • paper_url: http://arxiv.org/abs/2308.12108
  • repo_url: https://github.com/edmundlth/scalable_learning_coefficient_with_sgld
  • paper_authors: Edmund Lau, Daniel Murfet, Susan Wei
  • for: This paper studies degeneracy in deep neural networks (DNNs), which are singular statistical models.
  • methods: The learning coefficient from singular learning theory is used to quantify the degree of degeneracy in DNNs, and a computationally scalable approximation of a localized learning coefficient is proposed using stochastic gradient Langevin dynamics.
  • results: The learning coefficient captures degeneracy beyond simply counting "flat" directions; the approximation is accurate on low-dimensional models with known theoretical values, correctly recovers the ordering of degeneracy between parameter regions of interest, and an MNIST experiment shows the local learning coefficient can reveal the inductive bias of stochastic optimizers.
    Abstract Deep neural networks (DNN) are singular statistical models which exhibit complex degeneracies. In this work, we illustrate how a quantity known as the \emph{learning coefficient} introduced in singular learning theory quantifies precisely the degree of degeneracy in deep neural networks. Importantly, we will demonstrate that degeneracy in DNN cannot be accounted for by simply counting the number of "flat" directions. We propose a computationally scalable approximation of a localized version of the learning coefficient using stochastic gradient Langevin dynamics. To validate our approach, we demonstrate its accuracy in low-dimensional models with known theoretical values. Importantly, the local learning coefficient can correctly recover the ordering of degeneracy between various parameter regions of interest. An experiment on MNIST shows the local learning coefficient can reveal the inductive bias of stochastic opitmizers for more or less degenerate critical points.

Cached Operator Reordering: A Unified View for Fast GNN Training

  • paper_url: http://arxiv.org/abs/2308.12093
  • repo_url: None
  • paper_authors: Julia Bazinska, Andrei Ivanov, Tal Ben-Nun, Nikoli Dryden, Maciej Besta, Siyuan Shen, Torsten Hoefler
  • for: This paper proposes performance optimization strategies for graph neural network (GNN) training to alleviate bottlenecks in training large-scale GNN models.
  • methods: A unified view of GNN computation, I/O, and memory is used to analyze the computational graphs of the graph convolutional network (GCN) and graph attention (GAT) layers, and alternative computation strategies are proposed, including adaptive operator reordering with caching.
  • results: The proposed optimizations achieve speedups of up to 2.43x for GCN and 1.94x for GAT, save memory, and are easily implemented across hardware platforms.
    Abstract Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering. However, the sparse nature of GNN computation poses new challenges for performance optimization compared to traditional deep neural networks. We address these challenges by providing a unified view of GNN computation, I/O, and memory. By analyzing the computational graphs of the Graph Convolutional Network (GCN) and Graph Attention (GAT) layers -- two widely used GNN layers -- we propose alternative computation strategies. We present adaptive operator reordering with caching, which achieves a speedup of up to 2.43x for GCN compared to the current state-of-the-art. Furthermore, an exploration of different caching schemes for GAT yields a speedup of up to 1.94x. The proposed optimizations save memory, are easily implemented across various hardware platforms, and have the potential to alleviate performance bottlenecks in training large-scale GNN models.
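    One concrete instance of operator reordering in a GCN layer is choosing whether to aggregate before or after the dense transform: (AX)W and A(XW) are mathematically identical, but their costs differ when the input and output feature widths differ. The sketch below counts multiply-accumulate operations for both orderings; the paper's framework additionally covers caching and GAT-specific strategies.

```python
def gcn_layer_macs(num_nodes, num_edges, f_in, f_out):
    """Multiply-accumulate counts for the two orderings of H = A @ X @ W,
    with a sparse adjacency A holding num_edges non-zeros."""
    aggregate_first = num_edges * f_in + num_nodes * f_in * f_out   # (A X) W
    transform_first = num_nodes * f_in * f_out + num_edges * f_out  # A (X W)
    return aggregate_first, transform_first

# When f_out < f_in (common in later layers), transforming first is cheaper.
print(gcn_layer_macs(num_nodes=100_000, num_edges=1_000_000, f_in=512, f_out=64))
```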

Stabilizing RNN Gradients through Pre-training

  • paper_url: http://arxiv.org/abs/2308.12075
  • repo_url: None
  • paper_authors: Luca Herranz-Celotti, Jean Rouat
  • for: Preventing the gradient variance from growing exponentially with depth or time, in order to stabilize and improve training.
  • methods: Pre-training networks to local stability, and extending known stability theories to a broader family of deep recurrent networks through the Local Stability Condition (LSC).
  • results: Pre-training feed-forward and recurrent networks to satisfy the LSC often improves final performance, providing a way to stabilize networks of any complexity that can be used as an additional step before pre-training on large augmented datasets.
    Abstract Numerous theories of learning suggest to prevent the gradient variance from exponential growth with depth or time, to stabilize and improve training. Typically, these analyses are conducted on feed-forward fully-connected neural networks or single-layer recurrent neural networks, given their mathematical tractability. In contrast, this study demonstrates that pre-training the network to local stability can be effective whenever the architectures are too complex for an analytical initialization. Furthermore, we extend known stability theories to encompass a broader family of deep recurrent networks, requiring minimal assumptions on data and parameter distribution, a theory that we refer to as the Local Stability Condition (LSC). Our investigation reveals that the classical Glorot, He, and Orthogonal initialization schemes satisfy the LSC when applied to feed-forward fully-connected neural networks. However, analysing deep recurrent networks, we identify a new additive source of exponential explosion that emerges from counting gradient paths in a rectangular grid in depth and time. We propose a new approach to mitigate this issue, that consists on giving a weight of a half to the time and depth contributions to the gradient, instead of the classical weight of one. Our empirical results confirm that pre-training both feed-forward and recurrent networks to fulfill the LSC often results in improved final performance across models. This study contributes to the field by providing a means to stabilize networks of any complexity. Our approach can be implemented as an additional step before pre-training on large augmented datasets, and as an alternative to finding stable initializations analytically.

Identifying Reaction-Aware Driving Styles of Stochastic Model Predictive Controlled Vehicles by Inverse Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.12069
  • repo_url: None
  • paper_authors: Ni Dang, Tao Shi, Zengjie Zhang, Wanxin Jin, Marion Leibold, Martin Buss
  • for: This paper proposes a maximum entropy inverse reinforcement learning (ME-IRL) based method to identify the driving styles of autonomous vehicles (AVs).
  • methods: The ME-IRL approach is extended with novel features designed to capture an AV's reaction-aware characteristics.
  • results: Using the modified ME-IRL method with the newly proposed features, the driving styles are successfully identified from demonstration trajectories generated by stochastic model predictive control (SMPC), validated in MATLAB simulation and an off-the-shelf experiment.
    Abstract The driving style of an Autonomous Vehicle (AV) refers to how it behaves and interacts with other AVs. In a multi-vehicle autonomous driving system, an AV capable of identifying the driving styles of its nearby AVs can reliably evaluate the risk of collisions and make more reasonable driving decisions. However, there has not been a consistent definition of driving styles for an AV in the literature, although it is considered that the driving style is encoded in the AV's trajectories and can be identified using Maximum Entropy Inverse Reinforcement Learning (ME-IRL) methods as a cost function. Nevertheless, an important indicator of the driving style, i.e., how an AV reacts to its nearby AVs, is not fully incorporated in the feature design of previous ME-IRL methods. In this paper, we describe the driving style as a cost function of a series of weighted features. We design additional novel features to capture the AV's reaction-aware characteristics. Then, we identify the driving styles from the demonstration trajectories generated by the Stochastic Model Predictive Control (SMPC) using a modified ME-IRL method with our newly proposed features. The proposed method is validated using MATLAB simulation and an off-the-shelf experiment.
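    In the standard ME-IRL formulation that the method builds on, the driving style is a cost given by a weighted sum of trajectory features, and demonstrations are assumed exponentially more likely the lower their cost; in generic notation:

```latex
c_{\mathbf{w}}(\xi) = \mathbf{w}^{\top} \boldsymbol{\phi}(\xi),
\qquad
P(\xi \mid \mathbf{w}) = \frac{\exp\!\big(-c_{\mathbf{w}}(\xi)\big)}
                              {\int \exp\!\big(-c_{\mathbf{w}}(\xi')\big)\,\mathrm{d}\xi'},
```

    where phi(xi) stacks the hand-designed features, including the newly proposed reaction-aware ones, and the weights w are fit by maximizing the likelihood of the SMPC demonstration trajectories.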

InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4

  • paper_url: http://arxiv.org/abs/2308.12067
  • repo_url: None
  • paper_authors: Lai Wei, Zihao Jiang, Weiran Huang, Lichao Sun
  • for: This paper aims to improve the instruction-following capabilities of multimodal large language models by fine-tuning them on high-quality, limited data.
  • methods: The authors use a two-stage training process, pre-training on image-text pairs and fine-tuning on supervised vision-language instruction data, and propose several metrics to evaluate the quality of multimodal instruction data.
  • results: The fine-tuned model, InstructionGPT-4, outperforms the original MiniGPT-4 on various evaluations, demonstrating that less but high-quality instruction tuning data can be efficient in enabling multimodal large language models to generate better output.
    Abstract Multimodal large language models acquire their instruction-following capabilities through a two-stage training process: pre-training on image-text pairs and fine-tuning on supervised vision-language instruction data. Recent studies have shown that large language models can achieve satisfactory results even with a limited amount of high-quality instruction-following data. In this paper, we introduce InstructionGPT-4, which is fine-tuned on a small dataset comprising only 200 examples, amounting to approximately 6% of the instruction-following data used in the alignment dataset for MiniGPT-4. We first propose several metrics to access the quality of multimodal instruction data. Based on these metrics, we present a simple and effective data selector to automatically identify and filter low-quality vision-language data. By employing this method, InstructionGPT-4 outperforms the original MiniGPT-4 on various evaluations (e.g., visual question answering, GPT-4 preference). Overall, our findings demonstrate that less but high-quality instruction tuning data is efficient to enable multimodal large language models to generate better output.

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

  • paper_url: http://arxiv.org/abs/2308.12066
  • repo_url: None
  • paper_authors: Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang, Minsoo Rhu
  • for: This work targets the high computational and memory requirements of transformer-based large language models (LLMs), which limit their deployment despite strong algorithmic performance.
  • methods: Building on the Mixture-of-Experts (MoE) architecture, the authors propose Pre-gated MoE, an algorithm-system co-design whose novel pre-gating function alleviates the dynamic activation of sparse experts and tackles the compute and memory challenges of conventional MoE architectures.
  • results: Pre-gated MoE improves performance and reduces GPU memory consumption while maintaining model quality, enabling cost-effective deployment of large-scale LLMs on a single GPU with high performance.
    Abstract Large language models (LLMs) based on transformers have made significant strides in recent years, the success of which is driven by scaling up their model size. Despite their high algorithmic performance, the computational and memory requirements of LLMs present unprecedented challenges. To tackle the high compute requirements of LLMs, the Mixture-of-Experts (MoE) architecture was introduced which is able to scale its model size without proportionally scaling up its computational requirements. Unfortunately, MoE's high memory demands and dynamic activation of sparse experts restrict its applicability to real-world problems. Previous solutions that offload MoE's memory-hungry expert parameters to CPU memory fall short because the latency to migrate activated experts from CPU to GPU incurs high performance overhead. Our proposed Pre-gated MoE system effectively tackles the compute and memory challenges of conventional MoE architectures using our algorithm-system co-design. Pre-gated MoE employs our novel pre-gating function which alleviates the dynamic nature of sparse expert activation, allowing our proposed system to address the large memory footprint of MoEs while also achieving high performance. We demonstrate that Pre-gated MoE is able to improve performance, reduce GPU memory consumption, while also maintaining the same level of model quality. These features allow our Pre-gated MoE system to cost-effectively deploy large-scale LLMs using just a single GPU with high performance.

Ensembling Uncertainty Measures to Improve Safety of Black-Box Classifiers

  • paper_url: http://arxiv.org/abs/2308.12065
  • repo_url: None
  • paper_authors: Tommaso Zoppi, Andrea Ceccarelli, Andrea Bondavalli
  • for: Improving the safety of systems that embed machine learning classifiers by preventing misclassifications from propagating and causing critical failures.
  • methods: SPROUT computes an ensemble of uncertainty measures on the inputs and outputs of a black-box classifier; if a misclassification is suspected, it blocks the propagation of the classifier's output to the encompassing system.
  • results: SPROUT identifies a large fraction of misclassifications of supervised classifiers, detecting all of them in specific cases, and transforms erratic outputs into data omission failures that can easily be managed at the system level.
    Abstract Machine Learning (ML) algorithms that perform classification may predict the wrong class, experiencing misclassifications. It is well-known that misclassifications may have cascading effects on the encompassing system, possibly resulting in critical failures. This paper proposes SPROUT, a Safety wraPper thROugh ensembles of UncertainTy measures, which suspects misclassifications by computing uncertainty measures on the inputs and outputs of a black-box classifier. If a misclassification is detected, SPROUT blocks the propagation of the output of the classifier to the encompassing system. The resulting impact on safety is that SPROUT transforms erratic outputs (misclassifications) into data omission failures, which can be easily managed at the system level. SPROUT has a broad range of applications as it fits binary and multi-class classification, comprising image and tabular datasets. We experimentally show that SPROUT always identifies a huge fraction of the misclassifications of supervised classifiers, and it is able to detect all misclassifications in specific cases. SPROUT implementation contains pre-trained wrappers, it is publicly available and ready to be deployed with minimal effort.
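    A minimal sketch of the wrapper idea: several uncertainty measures are computed on the classifier's softmax output (SPROUT also uses measures on the inputs), and the prediction is withheld, turning a potential misclassification into a data omission, when the ensemble flags it as suspect. The specific measures, thresholds, and voting rule here are illustrative assumptions, not SPROUT's implementation.

```python
import numpy as np

def uncertainty_measures(probs):
    """A few common uncertainty measures computed on a softmax vector."""
    probs = np.clip(probs, 1e-12, 1.0)
    top_two = np.sort(probs)[-2:]
    return {
        "low_confidence": 1.0 - probs.max(),                 # small max probability
        "entropy": float(-(probs * np.log(probs)).sum()),    # predictive entropy
        "small_margin": 1.0 - (top_two[1] - top_two[0]),     # small top-2 gap
    }

def safety_wrapper(probs, thresholds, min_votes=2):
    """Return the predicted class, or None (data omission) if suspected wrong."""
    measures = uncertainty_measures(probs)
    votes = sum(measures[name] > thr for name, thr in thresholds.items())
    return None if votes >= min_votes else int(np.argmax(probs))

# Example: block the output when at least two measures exceed their thresholds.
print(safety_wrapper(np.array([0.4, 0.35, 0.25]),
                     {"low_confidence": 0.5, "entropy": 1.0, "small_margin": 0.9}))
```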
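A hedged sketch of the wrapper idea (not the released SPROUT implementation): several uncertainty measures are computed on the black-box classifier's output distribution and combined by majority vote; when the ensemble is suspicious, the prediction is withheld, turning a possible misclassification into a data omission. The thresholds and the particular measures are illustrative assumptions.

```python
import numpy as np

def entropy(p):                 # predictive entropy of a softmax vector
    return float(-(p * np.log(p + 1e-12)).sum())

def margin(p):                  # gap between the two largest class probabilities
    top2 = np.sort(p)[-2:]
    return float(top2[1] - top2[0])

def max_prob(p):
    return float(p.max())

class UncertaintyWrapper:
    """Suspects misclassifications of a black-box classifier; hypothetical sketch."""

    def __init__(self, predict_proba, thresholds):
        self.predict_proba = predict_proba   # callable: x -> softmax vector
        self.thresholds = thresholds         # e.g. {"entropy": 1.0, "margin": 0.2, "max_prob": 0.6}

    def __call__(self, x):
        p = np.asarray(self.predict_proba(x), dtype=float)
        votes = [
            entropy(p) > self.thresholds["entropy"],
            margin(p) < self.thresholds["margin"],
            max_prob(p) < self.thresholds["max_prob"],
        ]
        if sum(votes) >= 2:                  # majority of measures are suspicious
            return None                      # omit the output (data omission failure)
        return int(p.argmax())               # otherwise propagate the predicted class
```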

HarvestNet: A Dataset for Detecting Smallholder Farming Activity Using Harvest Piles and Remote Sensing

  • paper_url: http://arxiv.org/abs/2308.12061
  • repo_url: None
  • paper_authors: Jonathan Xu, Amna Elmustafa, Liya Weldegebriel, Emnet Negash, Richard Lee, Chenlin Meng, Stefano Ermon, David Lobell
  • For: Improve the accuracy of cropland mapping in smallholder farming regions, specifically in sub-Saharan Africa.
  • Methods: A new approach based on detecting harvest piles, which are characteristic of many smallholder systems. Expert knowledge and satellite images are used to collect HarvestNet, a dataset of 7,000 hand-labeled images and 2,000 ground-collected labels, and a set of baselines including state-of-the-art remote sensing models is benchmarked.
  • Results: Around 80% accuracy on the hand-labeled data and 90% and 98% accuracy on ground-truth data in Tigray and Amhara, respectively. A visual comparison with a pre-existing coverage map shows the model detects an additional 56,621 hectares of cropland in Tigray, suggesting that remote sensing of harvest piles can contribute to more timely and accurate cropland assessments in food-insecure regions.
    Abstract Small farms contribute to a large share of the productive land in developing countries. In regions such as sub-Saharan Africa, where 80% of farms are small (under 2 ha in size), the task of mapping smallholder cropland is an important part of tracking sustainability measures such as crop productivity. However, the visually diverse and nuanced appearance of small farms has limited the effectiveness of traditional approaches to cropland mapping. Here we introduce a new approach based on the detection of harvest piles characteristic of many smallholder systems throughout the world. We present HarvestNet, a dataset for mapping the presence of farms in the Ethiopian regions of Tigray and Amhara during 2020-2023, collected using expert knowledge and satellite images, totaling 7k hand-labeled images and 2k ground collected labels. We also benchmark a set of baselines including SOTA models in remote sensing, with our best models achieving around 80% classification performance on hand-labelled data and 90% and 98% accuracy on ground-truth data for Tigray and Amhara, respectively. We also perform a visual comparison with a widely used pre-existing coverage map and show that our model detects an extra 56,621 hectares of cropland in Tigray. We conclude that remote sensing of harvest piles can contribute to more timely and accurate cropland assessments in food-insecure regions.

Manipulating Embeddings of Stable Diffusion Prompts

  • paper_url: http://arxiv.org/abs/2308.12059
  • repo_url: https://github.com/webis-de/arxiv23-prompt-embedding-manipulation
  • paper_authors: Niklas Deckers, Julia Peters, Martin Potthast
  • for: This paper focuses on improving the process of generating images using text-to-image models, specifically by changing the embedding of a prompt instead of the prompt text.
  • methods: The proposed method treats the generative text-to-image model as a continuous function and passes gradients between the image space and the prompt embedding space, allowing for more fine-grained and targeted control of the generated images based on user intentions.
  • results: The authors demonstrate the feasibility of the proposed methods through experiments in three scenarios: optimization of a metric defined in image space, assistance of users in creative tasks, and inclusion of information in the prompt that the user has seen but finds difficult to describe.
    Abstract Generative text-to-image models such as Stable Diffusion allow users to generate images based on a textual description, the prompt. Changing the prompt is still the primary means for the user to change a generated image as desired. However, changing the image by reformulating the prompt remains a difficult process of trial and error, which has led to the emergence of prompt engineering as a new field of research. We propose and analyze methods to change the embedding of a prompt directly instead of the prompt text. It allows for more fine-grained and targeted control that takes into account user intentions. Our approach treats the generative text-to-image model as a continuous function and passes gradients between the image space and the prompt embedding space. By addressing different user interaction problems, we can apply this idea in three scenarios: (1) Optimization of a metric defined in image space that could measure, for example, image style. (2) Assistance of users in creative tasks by enabling them to navigate the image space along a selection of directions of "near" prompt embeddings. (3) Changing the embedding of the prompt to include information that the user has seen in a particular seed but finds difficult to describe in the prompt. Our experiments demonstrate the feasibility of the described methods.
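A minimal sketch of scenario (1) from the abstract: treat the generator as a continuous function and pass gradients from an image-space metric back into the prompt embedding. `generate` and `style_score` are hypothetical differentiable stand-ins, not the Stable Diffusion API.

```python
import torch

def optimize_prompt_embedding(emb0, generate, style_score, steps=50, lr=0.05):
    """Gradient ascent on a prompt embedding w.r.t. an image-space metric (sketch).

    emb0:        initial prompt embedding, shape [tokens, dim]
    generate:    differentiable stand-in for the text-to-image model (hypothetical)
    style_score: differentiable metric defined on images (hypothetical)
    """
    emb = emb0.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([emb], lr=lr)
    for _ in range(steps):
        image = generate(emb)          # prompt-embedding space -> image space
        loss = -style_score(image)     # maximize the image-space metric
        opt.zero_grad()
        loss.backward()                # gradients flow back into the embedding
        opt.step()
    return emb.detach()
```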

Sample Complexity of Robust Learning against Evasion Attacks

  • paper_url: http://arxiv.org/abs/2308.12054
  • repo_url: None
  • paper_authors: Pascale Gourdeau
  • For: Study the feasibility of adversarially robust learning from the perspective of learning theory, focusing on sample complexity.
  • Methods: Uses the exact-in-the-ball notion of robustness; analyses the setting where the learner only has access to random examples, as well as stronger learning models with local membership queries and a local equivalence query oracle.
  • Results: Robustly learning monotone conjunctions has sample complexity at least exponential in the adversary's budget, but if the adversary is restricted to perturbing $O(\log n)$ bits, conjunctions and decision lists can be robustly learned w.r.t. log-Lipschitz distributions. With a local equivalence query radius equal to the adversary's budget, robust empirical risk minimization algorithms are obtained in the distribution-free setting, together with general and concept-class-specific query complexity upper and lower bounds.
    Abstract It is becoming increasingly important to understand the vulnerability of machine learning models to adversarial attacks. One of the fundamental problems in adversarial machine learning is to quantify how much training data is needed in the presence of evasion attacks, where data is corrupted at test time. In this thesis, we work with the exact-in-the-ball notion of robustness and study the feasibility of adversarially robust learning from the perspective of learning theory, considering sample complexity. We first explore the setting where the learner has access to random examples only, and show that distributional assumptions are essential. We then focus on learning problems with distributions on the input data that satisfy a Lipschitz condition and show that robustly learning monotone conjunctions has sample complexity at least exponential in the adversary's budget (the maximum number of bits it can perturb on each input). However, if the adversary is restricted to perturbing $O(\log n)$ bits, then one can robustly learn conjunctions and decision lists w.r.t. log-Lipschitz distributions. We then study learning models where the learner is given more power. We first consider local membership queries, where the learner can query the label of points near the training sample. We show that, under the uniform distribution, the exponential dependence on the adversary's budget to robustly learn conjunctions remains inevitable. We then introduce a local equivalence query oracle, which returns whether the hypothesis and target concept agree in a given region around a point in the training sample, and a counterexample if it exists. We show that if the query radius is equal to the adversary's budget, we can develop robust empirical risk minimization algorithms in the distribution-free setting. We give general query complexity upper and lower bounds, as well as for concrete concept classes.

Layer-wise Feedback Propagation

  • paper_url: http://arxiv.org/abs/2308.12053
  • repo_url: None
  • paper_authors: Leander Weber, Jim Berend, Alexander Binder, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
  • for: An explanation-based training approach for neural-network-like predictors that uses Layer-wise Relevance Propagation (LRP) to assess the contribution of individual connections to solving a given task.
  • methods: LFP assigns rewards to individual connections via LRP instead of computing gradients; the reward signal is distributed through the model, strengthening structures that receive positive feedback and reducing the influence of structures that receive negative feedback.
  • results: Convergence is established theoretically and empirically; LFP achieves performance comparable to gradient descent on various models and datasets while overcoming some limitations of gradient-based methods, such as the reliance on meaningful derivatives (e.g., for step-function-activated spiking neural networks or transfer learning).
    Abstract In this paper, we present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors that utilizes explainability, specifically Layer-wise Relevance Propagation (LRP), to assign rewards to individual connections based on their respective contributions to solving a given task. This differs from traditional gradient descent, which updates parameters towards an estimated loss minimum. LFP distributes a reward signal throughout the model without the need for gradient computations. It then strengthens structures that receive positive feedback while reducing the influence of structures that receive negative feedback. We establish the convergence of LFP theoretically and empirically, and demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets. Notably, LFP overcomes certain limitations associated with gradient-based methods, such as reliance on meaningful derivatives. We further investigate how the different LRP-rules can be extended to LFP, what their effects are on training, as well as potential applications, such as training models with no meaningful derivatives, e.g., step-function activated Spiking Neural Networks (SNNs), or for transfer learning, to efficiently utilize existing knowledge.

A multiobjective continuation method to compute the regularization path of deep neural networks

  • paper_url: http://arxiv.org/abs/2308.12044
  • repo_url: https://github.com/aamakor/continuation-method
  • paper_authors: Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz
  • for: Promote sparsity in deep neural networks (DNNs), which improves numerical efficiency, model interpretability (fewer relevant features), and robustness.
  • methods: Treats the empirical loss and sparsity (the $\ell^1$ norm) as two conflicting objectives and solves the resulting multiobjective optimization problem, extending the notion of a regularization path from linear models to DNNs.
  • results: An algorithm that efficiently approximates the entire Pareto front, demonstrated with deterministic and stochastic gradients; knowledge of the regularization path yields well-generalizing network parametrizations.
    Abstract Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.
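A naive sketch of how a (loss, sparsity) trade-off curve can be traced by sweeping a scalarization weight with warm starts. This is only a simple baseline for intuition, not the multiobjective continuation method proposed in the paper.

```python
import torch

def l1_norm(model):
    return sum(p.abs().sum() for p in model.parameters())

def regularization_path(make_model, loss_fn, data, lambdas, epochs=200, lr=1e-2):
    """Approximate the (loss, l1-sparsity) front by sweeping the l1 weight.

    Warm-starts each solve at the previous solution; a weighted-sum sketch,
    not the continuation algorithm of the paper. Inputs are assumed callables
    and tensors supplied by the user.
    """
    X, y = data
    model = make_model()
    front = []
    for lam in lambdas:                       # e.g. swept from large to small
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            obj = loss_fn(model(X), y) + lam * l1_norm(model)
            obj.backward()
            opt.step()
        with torch.no_grad():
            front.append((float(loss_fn(model(X), y)), float(l1_norm(model))))
    return front
```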

IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning

  • paper_url: http://arxiv.org/abs/2308.12043
  • repo_url: https://github.com/feiyuzhang98/increlora
  • paper_authors: Feiyu Zhang, Liangzhi Li, Junhao Chen, Zhouqiang Jiang, Bowen Wang, Yiming Qian
  • for: Improve the efficiency of fine-tuning large pre-trained language models (PLMs), especially when many downstream tasks make full fine-tuning too costly to train and store.
  • methods: Builds on Low-Rank Adaptation (LoRA), which injects trainable rank-decomposition matrices into every target module but ignores how important the parameters of different modules are. Rather than pruning LoRA parameters, whose rank upper bound remains limited by the preset training budget, IncreLoRA incrementally adds trainable parameters during training according to per-module importance scores, so each parameter matrix has a higher rank upper bound for the same training overhead.
  • results: Extensive experiments on GLUE show higher parameter efficiency than the baselines, with the clearest gains in low-resource settings; the code is publicly available.
    Abstract With the increasing size of pre-trained language models (PLMs), fine-tuning all the parameters in the model is not efficient, especially when there are a large number of downstream tasks, which incur significant training and storage costs. Many parameter-efficient fine-tuning (PEFT) approaches have been proposed, among which, Low-Rank Adaptation (LoRA) is a representative approach that injects trainable rank decomposition matrices into every target module. Yet LoRA ignores the importance of parameters in different modules. To address this problem, many works have been proposed to prune the parameters of LoRA. However, under limited training conditions, the upper bound of the rank of the pruned parameter matrix is still affected by the preset values. We, therefore, propose IncreLoRA, an incremental parameter allocation method that adaptively adds trainable parameters during training based on the importance scores of each module. This approach is different from the pruning method as it is not limited by the initial number of training parameters, and each parameter matrix has a higher rank upper bound for the same training overhead. We conduct extensive experiments on GLUE to demonstrate the effectiveness of IncreLoRA. The results show that our method owns higher parameter efficiency, especially when under the low-resource settings where our method significantly outperforms the baselines. Our code is publicly available.
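An illustrative sketch of the incremental-allocation idea: each target module carries a growable low-rank update, and rank is added during training to the modules with the highest importance scores. The module, the gradient-times-weight importance score, and the growth schedule are assumptions for illustration, not the released IncreLoRA code.

```python
import torch
import torch.nn as nn

class IncLoRALinear(nn.Module):
    """Frozen linear layer plus a growable low-rank update (illustrative sketch)."""

    def __init__(self, base: nn.Linear, init_rank: int = 1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(init_rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, init_rank))

    def forward(self, x):
        return self.base(x) + x @ self.A.t() @ self.B.t()

    @torch.no_grad()
    def importance(self):
        # Assumed score: |gradient * parameter| summed over the low-rank factors.
        score = 0.0
        for p in (self.A, self.B):
            if p.grad is not None:
                score += float((p.grad * p).abs().sum())
        return score

    @torch.no_grad()
    def grow_rank(self):
        # Adds one rank; new B column is zero so the output is unchanged at growth
        # time. The optimizer must be re-created afterwards to track new params.
        d_out, d_in = self.base.weight.shape
        self.A = nn.Parameter(torch.cat([self.A, 0.01 * torch.randn(1, d_in)], dim=0))
        self.B = nn.Parameter(torch.cat([self.B, torch.zeros(d_out, 1)], dim=1))

# During training, every k steps one would rank the IncLoRALinear modules by
# importance() and call grow_rank() on the top few, adding capacity where it matters.
```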

CACTUS: a Comprehensive Abstraction and Classification Tool for Uncovering Structures

  • paper_url: http://arxiv.org/abs/2308.12031
  • repo_url: None
  • paper_authors: Luca Gherardini, Varun Ravi Varma, Karol Capala, Roger Woods, Jose Sousa
  • for: Improve the reliability and efficiency of secure analytics by using explainable artificial intelligence to make model decisions interpretable, particularly when only small data sets are available.
  • methods: The CACTUS model handles small data sets effectively, preserves the original meaning of categorical attributes, optimizes memory usage, and speeds up computation through parallelization.
  • results: Applied to the Wisconsin diagnostic breast cancer and Thyroid0387 data sets, CACTUS classifies the data effectively and reports the frequency of attributes in each class together with a ranking by discriminative power.
    Abstract The availability of large data sets is providing an impetus for driving current artificial intelligent developments. There are, however, challenges for developing solutions with small data sets due to practical and cost-effective deployment and the opacity of deep learning models. The Comprehensive Abstraction and Classification Tool for Uncovering Structures called CACTUS is presented for improved secure analytics by effectively employing explainable artificial intelligence. It provides additional support for categorical attributes, preserving their original meaning, optimising memory usage, and speeding up the computation through parallelisation. It shows to the user the frequency of the attributes in each class and ranks them by their discriminative power. Its performance is assessed by application to the Wisconsin diagnostic breast cancer and Thyroid0387 data sets.

Prompt-Based Length Controlled Generation with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.12030
  • repo_url: None
  • paper_authors: Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
  • for: Improve length-controlled generation in GPT-style models so that outputs of a desired length can be produced across different scenarios, which also reduces inference cost by limiting output length.
  • methods: Reinforcement learning with a trainable or rule-based reward model that scores whether generations match a pre-defined target length, steering the model toward length-compliant outputs.
  • results: On popular summarization datasets such as CNNDM and NYT, the method significantly improves the accuracy of prompt-based length control.
    Abstract Recently, large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising improvement and performance. Length controlled generation of LLMs emerges as an important topic, which also enables users to fully leverage the capability of LLMs in more real-world scenarios like generating a proper answer or essay of a desired length. In addition, the autoregressive generation in LLMs is extremely time-consuming, while the ability of controlling this generated length can arbitrarily reduce the inference cost by limiting the length, and thus satisfy different needs. Therefore, we aim to propose a prompt-based length control method to achieve this length controlled generation, which can also be widely applied in GPT-style LLMs. In particular, we adopt reinforcement learning with the reward signal given by either trainable or rule-based reward model, which further affects the generation of LLMs via rewarding a pre-defined target length. Experiments show that our method significantly improves the accuracy of prompt-based length control for summarization task on popular datasets like CNNDM and NYT. We believe this length-controllable ability can provide more potentials towards the era of LLMs.
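A bare-bones sketch of the rule-based variant: a reward that scores how close the generated length is to the target, plugged into a REINFORCE-style update. `sample_with_logprob` is a hypothetical helper standing in for autoregressive sampling; in practice a baseline or PPO-style update would be used.

```python
import torch

def length_reward(generated_len: int, target_len: int, tol: int = 5) -> float:
    """Rule-based reward: 1 inside the tolerance band, decaying linearly outside."""
    gap = abs(generated_len - target_len)
    return 1.0 if gap <= tol else max(0.0, 1.0 - (gap - tol) / target_len)

def reinforce_step(model, optimizer, prompts, target_len, sample_with_logprob):
    """One policy-gradient step toward length-compliant generations (sketch).

    sample_with_logprob(model, prompt) -> (token_ids, sum_logprob) is assumed.
    """
    losses = []
    for prompt in prompts:
        tokens, logprob = sample_with_logprob(model, prompt)
        r = length_reward(len(tokens), target_len)
        losses.append(-r * logprob)           # REINFORCE: maximize E[r * log pi]
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```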

A Scale-Invariant Task Balancing Approach for Multi-Task Learning

  • paper_url: http://arxiv.org/abs/2308.12029
  • repo_url: None
  • paper_authors: Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu
  • for: Address the task-balancing problem in multi-task learning (MTL), where disparities in loss and gradient scales often compromise performance.
  • methods: Scale-Invariant Multi-Task Learning (SI-MTL) applies a logarithm transformation to all task losses to achieve scale invariance at the loss level, together with a gradient balancing method (SI-G) that normalizes all task gradients to the magnitude of the maximum gradient norm.
  • results: Extensive experiments on several benchmark datasets consistently demonstrate the effectiveness of SI-G and the state-of-the-art performance of SI-MTL.
    Abstract Multi-task learning (MTL), a learning paradigm to learn multiple related tasks simultaneously, has achieved great success in various fields. However, task-balancing remains a significant challenge in MTL, with the disparity in loss/gradient scales often leading to performance compromises. In this paper, we propose a Scale-Invariant Multi-Task Learning (SI-MTL) method to alleviate the task-balancing problem from both loss and gradient perspectives. Specifically, SI-MTL contains a logarithm transformation which is performed on all task losses to ensure scale-invariant at the loss level, and a gradient balancing method, SI-G, which normalizes all task gradients to the same magnitude as the maximum gradient norm. Extensive experiments conducted on several benchmark datasets consistently demonstrate the effectiveness of SI-G and the state-of-the-art performance of SI-MTL.
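A sketch that follows the abstract directly: task losses are log-transformed for scale invariance, and each task gradient is rescaled to the magnitude of the largest gradient norm (SI-G) before the shared parameters are updated. Details such as the epsilon and the treatment of task-specific heads are assumptions.

```python
import torch

def si_mtl_step(model, optimizer, task_losses, eps=1e-8):
    """One scale-invariant multi-task step (sketch based on the abstract).

    task_losses: list of scalar losses, one per task (assumed positive).
    """
    shared = [p for p in model.parameters() if p.requires_grad]

    # Loss-level scale invariance: differentiate log(loss) for each task.
    per_task_grads = []
    for loss in task_losses:
        g = torch.autograd.grad(torch.log(loss + eps), shared, retain_graph=True)
        per_task_grads.append(g)

    # SI-G: rescale every task gradient to the largest gradient norm.
    norms = [torch.sqrt(sum((gi ** 2).sum() for gi in g)) for g in per_task_grads]
    target = max(float(n) for n in norms)

    optimizer.zero_grad()
    for p_idx, p in enumerate(shared):
        p.grad = sum((target / (norms[t] + eps)) * per_task_grads[t][p_idx]
                     for t in range(len(task_losses)))
    optimizer.step()
```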

Bias-Aware Minimisation: Understanding and Mitigating Estimator Bias in Private SGD

  • paper_url: http://arxiv.org/abs/2308.12018
  • repo_url: None
  • paper_authors: Moritz Knolle, Robert Dorfman, Alexander Ziller, Daniel Rueckert, Georgios Kaissis
  • for: Improve the utility of models trained with differentially private SGD (DP-SGD) while keeping its privacy guarantees.
  • methods: Establishes a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD, and proposes Bias-Aware Minimisation (BAM), which provably reduces the bias of the private gradient estimator and whose required quantities can be computed efficiently for large neural networks.
  • results: BAM not only reduces bias but also substantially improves privacy-utility trade-offs on CIFAR-10, CIFAR-100, and ImageNet-32.
    Abstract Differentially private SGD (DP-SGD) holds the promise of enabling the safe and responsible application of machine learning to sensitive datasets. However, DP-SGD only provides a biased, noisy estimate of a mini-batch gradient. This renders optimisation steps less effective and limits model utility as a result. With this work, we show a connection between per-sample gradient norms and the estimation bias of the private gradient oracle used in DP-SGD. Here, we propose Bias-Aware Minimisation (BAM) that allows for the provable reduction of private gradient estimator bias. We show how to efficiently compute quantities needed for BAM to scale to large neural networks and highlight similarities to closely related methods such as Sharpness-Aware Minimisation. Finally, we provide empirical evidence that BAM not only reduces bias but also substantially improves privacy-utility trade-offs on the CIFAR-10, CIFAR-100, and ImageNet-32 datasets.
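Not BAM itself, but a small numpy illustration of the estimator bias that per-sample clipping introduces in DP-SGD, which the paper links to per-sample gradient norms. The toy gradient distribution and clipping norm are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
per_sample_grads = rng.normal(loc=2.0, scale=3.0, size=(1000, 1))  # toy 1-D gradients
C = 1.0                                                            # clipping norm

norms = np.abs(per_sample_grads)
clipped = per_sample_grads * np.minimum(1.0, C / norms)

true_mean = per_sample_grads.mean()
clipped_mean = clipped.mean()        # what (noise-free) DP-SGD would average
print(f"true mean grad    : {true_mean:+.3f}")
print(f"clipped mean grad : {clipped_mean:+.3f}")
print(f"bias              : {clipped_mean - true_mean:+.3f}")
# Samples whose norms exceed C are shrunk the most, so the bias grows with the
# spread of per-sample gradient norms -- the quantity the paper ties the bias to.
```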

Graph Neural Stochastic Differential Equations

  • paper_url: http://arxiv.org/abs/2308.12316
  • repo_url: None
  • paper_authors: Richard Bergna, Felix Opolka, Pietro Liò, Jose Miguel Hernandez-Lobato
  • for: Propose Graph Neural Stochastic Differential Equations (Graph Neural SDEs), a model for assessing prediction uncertainty on graphs.
  • methods: Extends Graph Neural ODEs by embedding randomness into the data representation via Brownian motion; the Latent Graph Neural SDE variant is highlighted and evaluated empirically.
  • results: Latent Graph Neural SDEs surpass models such as graph convolutional networks and Graph Neural ODEs, especially in confidence prediction and out-of-distribution detection in both static and spatio-temporal settings.
    Abstract We present a novel model Graph Neural Stochastic Differential Equations (Graph Neural SDEs). This technique enhances the Graph Neural Ordinary Differential Equations (Graph Neural ODEs) by embedding randomness into data representation using Brownian motion. This inclusion allows for the assessment of prediction uncertainty, a crucial aspect frequently missed in current models. In our framework, we spotlight the \textit{Latent Graph Neural SDE} variant, demonstrating its effectiveness. Through empirical studies, we find that Latent Graph Neural SDEs surpass conventional models like Graph Convolutional Networks and Graph Neural ODEs, especially in confidence prediction, making them superior in handling out-of-distribution detection across both static and spatio-temporal contexts.
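A toy sketch of the model class: drift and diffusion terms given by simple adjacency-based graph layers, integrated with Euler-Maruyama. The layers and integrator settings are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    """Minimal message-passing layer: aggregate neighbours, then a linear map."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        return torch.tanh(self.lin(adj @ x))

class GraphNeuralSDE(nn.Module):
    """dX = f(X, A) dt + g(X, A) dW, integrated with Euler-Maruyama (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.drift = GraphLayer(dim)
        self.diffusion = GraphLayer(dim)

    def forward(self, x0, adj, t1=1.0, steps=20):
        dt = t1 / steps
        x = x0
        for _ in range(steps):
            dw = torch.randn_like(x) * dt ** 0.5        # Brownian increment
            x = x + self.drift(x, adj) * dt + self.diffusion(x, adj) * dw
        return x

# Sampling several trajectories for the same input yields a predictive
# distribution whose spread can be read as model uncertainty.
```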

MKL-$L_{0/1}$-SVM

  • paper_url: http://arxiv.org/abs/2308.12016
  • repo_url: https://github.com/maxis1718/simplemkl
  • paper_authors: Bin Zhu, Yijie Shi
  • for: Propose a Multiple Kernel Learning (MKL) framework for the support vector machine (SVM) with the $(0, 1)$ loss function.
  • methods: First-order optimality conditions are derived and then exploited to develop a fast ADMM solver for the resulting nonconvex and nonsmooth optimization problem.
  • results: Extensive experiments on synthetic and real datasets show that the performance of MKL-$L_{0/1}$-SVM is comparable with that of the leading SimpleMKL approach.
    Abstract This paper presents a Multiple Kernel Learning (abbreviated as MKL) framework for the Support Vector Machine (SVM) with the $(0, 1)$ loss function. Some first-order optimality conditions are given and then exploited to develop a fast ADMM solver to deal with the nonconvex and nonsmooth optimization problem. Extensive numerical experiments on synthetic and real datasets show that the performance of our MKL-$L_{0/1}$-SVM is comparable with the one of the leading approaches called SimpleMKL developed by Rakotomamonjy, Bach, Canu, and Grandvalet [Journal of Machine Learning Research, vol. 9, pp. 2491-2521, 2008].

Quantum-Noise-driven Generative Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.12013
  • repo_url: None
  • paper_authors: Marco Parigi, Stefano Martina, Filippo Caruso
  • for: Explore generative models that infer complex, unknown data distributions from a finite number of training samples in order to produce new synthetic data.
  • methods: Builds on diffusion models, an emerging framework that has recently surpassed generative adversarial networks for synthetic text and high-quality images, and proposes three quantum-noise-driven generative diffusion models that could be tested experimentally on real quantum systems.
  • results: The models harness quantum features, in particular the non-trivial interplay among coherence, entanglement, and noise in currently available noisy quantum processors, to overcome the main computational burdens of classical diffusion models during inference. The results are expected to pave the way for quantum-inspired or quantum-based generative diffusion algorithms with real-world applications ranging from climate forecasting to neuroscience, traffic flow analysis, and financial forecasting.
    Abstract Generative models realized with machine learning techniques are powerful tools to infer complex and unknown data distributions from a finite number of training samples in order to produce new synthetic data. Diffusion models are an emerging framework that have recently overcome the performance of the generative adversarial networks in creating synthetic text and high-quality images. Here, we propose and discuss the quantum generalization of diffusion models, i.e., three quantum-noise-driven generative diffusion models that could be experimentally tested on real quantum systems. The idea is to harness unique quantum features, in particular the non-trivial interplay among coherence, entanglement and noise that the currently available noisy quantum processors do unavoidably suffer from, in order to overcome the main computational burdens of classical diffusion models during inference. Hence, we suggest to exploit quantum noise not as an issue to be detected and solved but instead as a very remarkably beneficial key ingredient to generate much more complex probability distributions that would be difficult or even impossible to express classically, and from which a quantum processor might sample more efficiently than a classical one. Therefore, our results are expected to pave the way for new quantum-inspired or quantum-based generative diffusion algorithms addressing more powerfully classical tasks as data generation/prediction with widespread real-world applications ranging from climate forecasting to neuroscience, from traffic flow analysis to financial forecasting.

Neural oscillators for magnetic hysteresis modeling

  • paper_url: http://arxiv.org/abs/2308.12002
  • repo_url: None
  • paper_authors: Abhishek Chandra, Taniya Kapoor, Bram Daniels, Mitrofan Curti, Koen Tiels, Daniel M. Tartakovsky, Elena A. Lomonova
  • for: Model and quantify hysteresis in magnetic materials.
  • methods: An ordinary-differential-equation-based recurrent neural network, HystRNN, whose hidden-state updates draw inspiration from coupled-oscillatory RNNs and phenomenological hysteresis models.
  • results: HystRNN generalizes to previously untrained regions such as first-order reversal curves and minor loops, an essential feature of hysteresis models and an advantage over traditional rate-dependent methods, which cannot capture the intrinsic nonlinearity of magnetic materials.
    Abstract Hysteresis is a ubiquitous phenomenon in science and engineering; its modeling and identification are crucial for understanding and optimizing the behavior of various systems. We develop an ordinary differential equation-based recurrent neural network (RNN) approach to model and quantify the hysteresis, which manifests itself in sequentiality and history-dependence. Our neural oscillator, HystRNN, draws inspiration from coupled-oscillatory RNN and phenomenological hysteresis models to update the hidden states. The performance of HystRNN is evaluated to predict generalized scenarios, involving first-order reversal curves and minor loops. The findings show the ability of HystRNN to generalize its behavior to previously untrained regions, an essential feature that hysteresis models must have. This research highlights the advantage of neural oscillators over the traditional RNN-based methods in capturing complex hysteresis patterns in magnetic materials, where traditional rate-dependent methods are inadequate to capture intrinsic nonlinearity.

Trustworthy Representation Learning Across Domains

  • paper_url: http://arxiv.org/abs/2308.12315
  • repo_url: None
  • paper_authors: Ronghang Zhu, Dongliang Guo, Daiqing Qi, Zhixuan Chu, Xiang Yu, Sheng Li
  • For: How to make representation learning trustworthy across domains, so that machine learning models remain reliable under the diverse data and uncertainty of real-world applications.
  • Methods: Proposes the first trustworthy representation learning across domains framework, built on four concepts (robustness, privacy, fairness, and explainability), and reviews existing methods for each concept.
  • Results: A comprehensive literature review of this research direction, concluding with insights and discussions on future research directions.
    Abstract As AI systems have achieved performance sufficient for wide deployment in our daily lives and human society, people both enjoy the benefits brought by these technologies and suffer many social issues induced by these systems. To make AI systems good enough and trustworthy, a great deal of research has been done to build guidelines for trustworthy AI systems. Machine learning is one of the most important components of AI systems, and representation learning is the fundamental technology in machine learning. How to make representation learning trustworthy in real-world applications, e.g., cross-domain scenarios, is very valuable and necessary for both the machine learning and AI system fields. Inspired by the concepts in trustworthy AI, we propose the first trustworthy representation learning across domains framework, which includes four concepts, i.e., robustness, privacy, fairness, and explainability, to give a comprehensive literature review on this research direction. Specifically, we first introduce the details of the proposed trustworthy framework for representation learning across domains. Second, we provide basic notions and comprehensively summarize existing methods for the trustworthy framework from the four concepts. Finally, we conclude this survey with insights and discussions on future research directions.

On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget

  • paper_url: http://arxiv.org/abs/2308.12000
  • repo_url: None
  • paper_authors: Po-An Wang, Kaito Ariu, Alexandre Proutiere
  • for: Study best-arm identification with a fixed budget in stochastic two-armed bandits with Bernoulli rewards.
  • methods: Introduces the natural class of consistent and stable algorithms, shows that any algorithm performing as well as uniform sampling on all instances belongs to this class, derives a lower bound on the error rate of such algorithms, and shows that uniform sampling matches this bound.
  • results: Surprisingly, no algorithm can both perform as well as the uniform sampling algorithm on all instances and strictly outperform it on at least one instance; in short, no algorithm is better than uniform sampling. This resolves the two open problems posed in prior work.
    Abstract We study the problem of best-arm identification with fixed budget in stochastic two-arm bandits with Bernoulli rewards. We prove that surprisingly, there is no algorithm that (i) performs as well as the algorithm sampling each arm equally (this algorithm is referred to as the {\it uniform sampling} algorithm) on all instances, and that (ii) strictly outperforms this algorithm on at least one instance. In short, there is no algorithm better than the uniform sampling algorithm. Towards this result, we introduce the natural class of {\it consistent} and {\it stable} algorithms, and show that any algorithm that performs as well as the uniform sampling algorithm on all instances belongs to this class. The proof is completed by deriving a lower bound on the error rate satisfied by any consistent and stable algorithm, and by showing that the uniform sampling algorithm matches this lower bound. Our results provide a solution to the two open problems presented in \cite{qin2022open}.
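The uniform sampling algorithm that the paper shows cannot be uniformly beaten is simple to state: spend the budget (almost) equally on both arms and recommend the arm with the higher empirical mean. A minimal sketch:

```python
import random

def uniform_sampling_bai(pull, budget: int) -> int:
    """Best-arm identification with fixed budget in a two-armed Bernoulli bandit.

    pull(arm) -> 0 or 1 draws a Bernoulli reward; returns the recommended arm.
    """
    counts = [0, 0]
    sums = [0, 0]
    for t in range(budget):
        arm = t % 2                      # sample each arm (almost) equally often
        sums[arm] += pull(arm)
        counts[arm] += 1
    means = [sums[a] / max(counts[a], 1) for a in (0, 1)]
    if means[0] == means[1]:
        return random.randrange(2)       # break ties at random
    return 0 if means[0] > means[1] else 1

# Example with hypothetical arm means 0.6 and 0.5:
# recommended = uniform_sampling_bai(lambda a: int(random.random() < (0.6, 0.5)[a]), 200)
```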

Relational Concept Based Models

  • paper_url: http://arxiv.org/abs/2308.11991
  • repo_url: https://github.com/Aghoreshwar/Awesome-Customer-Analytics
  • paper_authors: Pietro Barbiero, Francesco Giannini, Gabriele Ciravegna, Michelangelo Diligenti, Giuseppe Marra
  • for: Provide interpretable deep learning models for task prediction in relational domains, where concept-based models do not apply and relational models are not interpretable.
  • methods: Relational Concept-Based Models, a family of relational deep learning methods that produce interpretable, concept-based task predictions.
  • results: In experiments ranging from image classification to link prediction in knowledge graphs, relational CBMs match the generalization performance of existing relational black boxes, support the generation of quantified concept-based explanations, respond effectively to test-time interventions, and withstand demanding settings such as out-of-distribution scenarios, limited training data, and scarce concept supervision.
    Abstract The design of interpretable deep learning models working in relational domains poses an open challenge: interpretable deep learning methods, such as Concept-Based Models (CBMs), are not designed to solve relational problems, while relational models are not as interpretable as CBMs. To address this problem, we propose Relational Concept-Based Models, a family of relational deep learning methods providing interpretable task predictions. Our experiments, ranging from image classification to link prediction in knowledge graphs, show that relational CBMs (i) match generalization performance of existing relational black-boxes (as opposed to non-relational CBMs), (ii) support the generation of quantified concept-based explanations, (iii) effectively respond to test-time interventions, and (iv) withstand demanding settings including out-of-distribution scenarios, limited training data regimes, and scarce concept supervisions.

Will More Expressive Graph Neural Networks do Better on Generative Tasks?

  • paper_url: http://arxiv.org/abs/2308.11978
  • repo_url: None
  • paper_authors: Xiandong Zou, Xiangyu Zhao, Pietro Liò, Yiren Zhao
  • for: Investigate whether more expressive Graph Neural Networks (GNNs) improve graph generative models, using molecular generation as the test bed.
  • methods: Replaces the underlying GNNs of two generative frameworks (GCPN and GraphAF) with six different GNNs and evaluates them on six molecular generation objectives on the ZINC-250k dataset.
  • results: Advanced GNNs improve the performance of GCPN and GraphAF on molecular generation, although GNN expressiveness is not a necessary condition for a good GNN-based generative model. With advanced GNNs, GCPN and GraphAF achieve state-of-the-art results against 17 non-GNN-based graph generative approaches (such as variational autoencoders and Bayesian optimisation models) on the proposed molecular generation objectives (DRD2, Median1, Median2), which are important metrics for de novo molecular design.
    Abstract Graph generation poses a significant challenge as it involves predicting a complete graph with multiple nodes and edges based on simply a given label. This task also carries fundamental importance to numerous real-world applications, including de-novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suffer from two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures used in these methods are often underexplored; and (2) these methods are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs under the context of the molecular graph generation task, by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs in two different generative frameworks (GCPN and GraphAF), on six different molecular generative objectives on the ZINC-250k dataset. Through our extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN and GraphAF on molecular generation tasks, but GNN expressiveness is not a necessary condition for a good GNN-based generative model. Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results across 17 other non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de-novo molecular design.

Approximating Score-based Explanation Techniques Using Conformal Regression

  • paper_url: http://arxiv.org/abs/2308.11975
  • repo_url: None
  • paper_authors: Amr Alkhatib, Henrik Boström, Sofiane Ennadir, Ulf Johansson
  • for: Approximate score-based explanation techniques such as SHAP with computationally cheaper regression models to improve explanation time.
  • methods: An inductive conformal prediction framework provides validity guarantees for the approximated values, and several non-conformity measures are proposed that account for the difficulty of approximating the explanations while keeping the computational cost low.
  • results: In a large-scale empirical investigation, the approach significantly improves execution time compared to TreeSHAP, the fast version of SHAP, while producing tight intervals with validity guarantees; it also allows comparing different approximation methods and selecting one based on how informative (tight) the predicted intervals are.
    Abstract Score-based explainable machine-learning techniques are often used to understand the logic behind black-box models. However, such explanation techniques are often computationally expensive, which limits their application in time-critical contexts. Therefore, we propose and investigate the use of computationally less costly regression models for approximating the output of score-based explanation techniques, such as SHAP. Moreover, validity guarantees for the approximated values are provided by the employed inductive conformal prediction framework. We propose several non-conformity measures designed to take the difficulty of approximating the explanations into account while keeping the computational cost low. We present results from a large-scale empirical investigation, in which the approximate explanations generated by our proposed models are evaluated with respect to efficiency (interval size). The results indicate that the proposed method can significantly improve execution time compared to the fast version of SHAP, TreeSHAP. The results also suggest that the proposed method can produce tight intervals, while providing validity guarantees. Moreover, the proposed approach allows for comparing explanations of different approximation methods and selecting a method based on how informative (tight) are the predicted intervals.
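A sketch of the general recipe under simple assumptions: fit a cheap regressor to precomputed SHAP values for one feature, then calibrate absolute residuals on a held-out set to obtain inductive conformal intervals. The plain absolute-residual nonconformity measure is one simple choice, not necessarily one of the measures proposed in the paper, and the SHAP arrays are assumed to be precomputed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_conformal_surrogate(X_train, shap_train, X_cal, shap_cal, alpha=0.1):
    """Approximate a score-based explanation with an inductive conformal regressor.

    shap_train / shap_cal: precomputed explanation scores for one feature.
    Returns a predictor that outputs (lo, hi) intervals with ~(1 - alpha) coverage.
    """
    model = GradientBoostingRegressor().fit(X_train, shap_train)

    # Nonconformity = absolute residual on the calibration set.
    residuals = np.abs(shap_cal - model.predict(X_cal))
    k = int(np.ceil((1 - alpha) * (len(residuals) + 1))) - 1
    q = np.sort(residuals)[min(k, len(residuals) - 1)]   # conformal quantile

    def predict_interval(X):
        point = model.predict(X)
        return point - q, point + q
    return predict_interval
```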

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE

  • paper_url: http://arxiv.org/abs/2308.11971
  • repo_url: None
  • paper_authors: Junyi Chen, Longteng Guo, Jia Sun, Shuai Shao, Zehuan Yuan, Liang Lin, Dongyu Zhang
  • for: This paper aims to introduce an efficient vision-language model that can learn from diverse, multimodal data.
  • methods: The proposed model, called EVE, uses a unified multimodal Transformer network with modality-aware sparse Mixture-of-Experts (MoE) modules to capture modality-specific information. The model is pre-trained using a simple yet effective masked signal modeling task, which accelerates training and enables better downstream performance.
  • results: EVE achieves state-of-the-art performance on various vision-language downstream tasks, including visual question answering, visual reasoning, and image-text retrieval, despite its simplicity.
    Abstract Building scalable vision-language models to learn from diverse, multimodal data remains an open challenge. In this paper, we introduce an Efficient Vision-languagE foundation model, namely EVE, which is one unified multimodal Transformer pre-trained solely by one unified pre-training task. Specifically, EVE encodes both vision and language within a shared Transformer network integrated with modality-aware sparse Mixture-of-Experts (MoE) modules, which capture modality-specific information by selectively switching to different experts. To unify pre-training tasks of vision and language, EVE performs masked signal modeling on image-text pairs to reconstruct masked signals, i.e., image pixels and text tokens, given visible signals. This simple yet effective pre-training objective accelerates training by 3.5x compared to the model pre-trained with Image-Text Contrastive and Image-Text Matching losses. Owing to the combination of the unified architecture and pre-training task, EVE is easy to scale up, enabling better downstream performance with fewer resources and faster training speed. Despite its simplicity, EVE achieves state-of-the-art performance on various vision-language downstream tasks, including visual question answering, visual reasoning, and image-text retrieval.

Anisotropic Hybrid Networks for liver tumor segmentation with uncertainty quantification

  • paper_url: http://arxiv.org/abs/2308.11969
  • repo_url: None
  • paper_authors: Benjamin Lambert, Pauline Roca, Florence Forbes, Senan Doyle, Michel Dojat
  • for: Delineate the liver and tumors to guide treatment strategies for hepatocellular carcinoma (HCC) and help track the liver tumor burden.
  • methods: Segmentation of the liver and tumors on contrast-enhanced magnetic resonance imaging (CE-MRI), comparing two pipelines based on anisotropic models: a multi-class model that segments liver and tumors simultaneously, and two distinct binary models, one for the liver and one for the tumors.
  • results: Both pipelines segment the liver and tumors but exhibit different strengths and weaknesses; an uncertainty quantification strategy is proposed to identify potential false-positive tumor lesions. Both solutions were submitted to the MICCAI 2023 Atlas challenge on liver and tumor segmentation.
    Abstract The burden of liver tumors is important, ranking as the fourth leading cause of cancer mortality. In case of hepatocellular carcinoma (HCC), the delineation of liver and tumor on contrast-enhanced magnetic resonance imaging (CE-MRI) is performed to guide the treatment strategy. As this task is time-consuming, needs high expertise and could be subject to inter-observer variability there is a strong need for automatic tools. However, challenges arise from the lack of available training data, as well as the high variability in terms of image resolution and MRI sequence. In this work we propose to compare two different pipelines based on anisotropic models to obtain the segmentation of the liver and tumors. The first pipeline corresponds to a baseline multi-class model that performs the simultaneous segmentation of the liver and tumor classes. In the second approach, we train two distinct binary models, one segmenting the liver only and the other the tumors. Our results show that both pipelines exhibit different strengths and weaknesses. Moreover we propose an uncertainty quantification strategy allowing the identification of potential false positive tumor lesions. Both solutions were submitted to the MICCAI 2023 Atlas challenge regarding liver and tumor segmentation.

Maintaining Plasticity via Regenerative Regularization

  • paper_url: http://arxiv.org/abs/2308.11958
  • repo_url: None
  • paper_authors: Saurabh Kumar, Henrik Marklund, Benjamin Van Roy
  • for: Propose a simple method, L2 Init, for maintaining plasticity when neural networks process non-stationary data streams.
  • methods: L2 Init adds an L2 regularization term toward the initial parameters to the loss function, in contrast to standard L2 regularization, which regularizes toward the origin; it is simple to implement and requires selecting only a single hyper-parameter.
  • results: On simple problems representative of different types of nonstationarity in continual learning, L2 Init consistently mitigates plasticity loss; the regularization term also reduces parameter magnitudes and maintains a high effective feature rank.
    Abstract In continual learning, plasticity refers to the ability of an agent to quickly adapt to new information. Neural networks are known to lose plasticity when processing non-stationary data streams. In this paper, we propose L2 Init, a very simple approach for maintaining plasticity by incorporating in the loss function L2 regularization toward initial parameters. This is very similar to standard L2 regularization (L2), the only difference being that L2 regularizes toward the origin. L2 Init is simple to implement and requires selecting only a single hyper-parameter. The motivation for this method is the same as that of methods that reset neurons or parameter values. Intuitively, when recent losses are insensitive to particular parameters, these parameters drift toward their initial values. This prepares parameters to adapt quickly to new tasks. On simple problems representative of different types of nonstationarity in continual learning, we demonstrate that L2 Init consistently mitigates plasticity loss. We additionally find that our regularization term reduces parameter magnitudes and maintains a high effective feature rank.
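A minimal sketch of L2 Init as described in the abstract: the usual task loss plus lambda * ||theta - theta_0||^2, i.e., L2 regularization toward the initial parameters rather than toward the origin. The hyper-parameter value is an arbitrary placeholder.

```python
import torch

class L2Init:
    """L2 regularization toward the *initial* parameters (L2 Init)."""

    def __init__(self, model, lam=1e-3):
        self.lam = lam
        self.init_params = [p.detach().clone() for p in model.parameters()]

    def penalty(self, model):
        return self.lam * sum(
            ((p - p0) ** 2).sum()
            for p, p0 in zip(model.parameters(), self.init_params)
        )

# Usage inside a training step:
#   loss = task_loss(model(x), y) + reg.penalty(model)
#   loss.backward(); optimizer.step()
```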

When MiniBatch SGD Meets SplitFed Learning: Convergence Analysis and Performance Evaluation

  • paper_url: http://arxiv.org/abs/2308.11953
  • repo_url: None
  • paper_authors: Chao Huang, Geng Tian, Ming Tang
  • For: Address the client-drift problem in split federated learning (SFL) by proposing MiniBatch-SFL, which incorporates mini-batch SGD into SFL.
  • Methods: SFL splits the model at a cut layer into a client-side part and a server-side part; in MiniBatch-SFL the clients train the client-side model in an FL fashion while the server trains the server-side model similarly to mini-batch SGD.
  • Results: A convergence analysis bounds the expected loss via the expected server-side and client-side model updates; the server-side updates do not depend on how non-IID the clients' data are, while the client-side model can be optimized by properly choosing the cut layer (a later cut empirically reduces the average gradient divergence). MiniBatch-SFL achieves higher accuracy than conventional SFL and FL, with improvements of up to 24.1% and 17.1%, respectively, on highly non-IID data.
    Abstract Federated learning (FL) enables collaborative model training across distributed clients (e.g., edge devices) without sharing raw data. Yet, FL can be computationally expensive as the clients need to train the entire model multiple times. SplitFed learning (SFL) is a recent distributed approach that alleviates computation workload at the client device by splitting the model at a cut layer into two parts, where clients only need to train part of the model. However, SFL still suffers from the \textit{client drift} problem when clients' data are highly non-IID. To address this issue, we propose MiniBatch-SFL. This algorithm incorporates MiniBatch SGD into SFL, where the clients train the client-side model in an FL fashion while the server trains the server-side model similar to MiniBatch SGD. We analyze the convergence of MiniBatch-SFL and show that the bound of the expected loss can be obtained by analyzing the expected server-side and client-side model updates, respectively. The server-side updates do not depend on the non-IID degree of the clients' datasets and can potentially mitigate client drift. However, the client-side model relies on the non-IID degree and can be optimized by properly choosing the cut layer. Perhaps counter-intuitive, our empirical result shows that a latter position of the cut layer leads to a smaller average gradient divergence and a better algorithm performance. Moreover, numerical results show that MiniBatch-SFL achieves higher accuracy than conventional SFL and FL. The accuracy improvement can be up to 24.1\% and 17.1\% with highly non-IID data, respectively.

Multi-scale Transformer Pyramid Networks for Multivariate Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2308.11946
  • repo_url: None
  • paper_authors: Yifan Zhang, Rui Wu, Sergiu M. Dascalu, Frederick C. Harris Jr
  • for: Propose a dimension-invariant embedding technique that captures short-term temporal dependencies and projects multivariate time series (MTS) data into a higher-dimensional space while preserving the dimensions of time steps and variables.
  • methods: A Multi-scale Transformer Pyramid Network (MTPNet) designed to capture temporal dependencies at multiple unconstrained scales; predictions are inferred from multi-scale latent representations obtained from transformers at various scales.
  • results: Extensive experiments on nine benchmark datasets show that MTPNet outperforms recent state-of-the-art methods.
    Abstract Multivariate Time Series (MTS) forecasting involves modeling temporal dependencies within historical records. Transformers have demonstrated remarkable performance in MTS forecasting due to their capability to capture long-term dependencies. However, prior work has been confined to modeling temporal dependencies at either a fixed scale or multiple scales that exponentially increase (most with base 2). This limitation hinders their effectiveness in capturing diverse seasonalities, such as hourly and daily patterns. In this paper, we introduce a dimension invariant embedding technique that captures short-term temporal dependencies and projects MTS data into a higher-dimensional space, while preserving the dimensions of time steps and variables in MTS data. Furthermore, we present a novel Multi-scale Transformer Pyramid Network (MTPNet), specifically designed to effectively capture temporal dependencies at multiple unconstrained scales. The predictions are inferred from multi-scale latent representations obtained from transformers at various scales. Extensive experiments on nine benchmark datasets demonstrate that the proposed MTPNet outperforms recent state-of-the-art methods.
    摘要 多变量时间序列（MTS）预测需要对历史记录中的时间相关性进行建模。Transformer 因其捕捉长期相关性的能力，在 MTS 预测中表现出色。然而，先前的工作仅限于在固定尺度或按指数递增（多以 2 为底）的多个尺度上建模时间相关性，这使其难以有效捕捉小时、日等多种季节性模式。本文提出了一种维度不变的嵌入技术，可以捕捉短期时间相关性，并将 MTS 数据映射到更高维度空间，同时保留时间步和变量两个维度。此外，我们提出了一种多尺度 Transformer 金字塔网络（MTPNet），专门用于在多个不受约束的尺度上有效捕捉时间相关性。预测结果由各尺度 Transformer 得到的多尺度潜在表示推断而来。在九个基准数据集上的大量实验表明，所提出的 MTPNet 优于近期的 state-of-the-art 方法。

RamseyRL: A Framework for Intelligent Ramsey Number Counterexample Searching

  • paper_url: http://arxiv.org/abs/2308.11943
  • repo_url: None
  • paper_authors: Steve Vott, Adam M. Lehavi
  • for: 这篇论文探讨了使用最佳搜索算法和强化学习(RL)技术来找到特定 Ramsey 数字的反例。
  • methods: 该论文引入了图向量化和基于深度神经网络（DNN）的启发式评估，用于衡量一个图成为反例的可能性。
  • results: 论文不主要是报道新的反例,而是介绍和评估一种用于 Ramsey 反例探索的框架,并提供了代码和方法。
    Abstract The Ramsey number is the minimum number of nodes, $n = R(s, t)$, such that all undirected simple graphs of order $n$, contain a clique of order $s$, or an independent set of order $t$. This paper explores the application of a best first search algorithm and reinforcement learning (RL) techniques to find counterexamples to specific Ramsey numbers. We incrementally improve over prior search methods such as random search by introducing a graph vectorization and deep neural network (DNN)-based heuristic, which gauge the likelihood of a graph being a counterexample. The paper also proposes algorithmic optimizations to confine a polynomial search runtime. This paper does not aim to present new counterexamples but rather introduces and evaluates a framework supporting Ramsey counterexample exploration using other heuristics. Code and methods are made available through a PyPI package and GitHub repository.
    摘要 拉姆齐数 $n = R(s, t)$ 是最小的节点数，使得所有阶数为 $n$ 的无向简单图都包含阶数为 $s$ 的团（clique）或阶数为 $t$ 的独立集。本文探讨了使用最佳优先搜索算法和强化学习（RL）技术来寻找特定拉姆齐数的反例。我们在随机搜索等已有搜索方法的基础上逐步改进，引入了图向量化和基于深度神经网络（DNN）的启发式，用于评估一个图成为反例的可能性。论文还提出了算法优化，将搜索运行时间限制在多项式范围内。本文的目的不是给出新的反例，而是介绍并评估一个支持拉姆齐反例探索的框架，并可结合其他启发式使用。代码和方法通过 PyPI 包和 GitHub 存储库提供。
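
The search loop described above can be sketched as a best-first search over single edge flips. The paper's DNN-based heuristic is replaced here by a simple hand-written counting score (a hypothetical stand-in); a graph is accepted once it contains neither a clique of order s nor an independent set of order t.

```python
import heapq
from itertools import combinations

def to_adj(edges, n):
    return [[(min(u, v), max(u, v)) in edges for v in range(n)] for u in range(n)]

def bad_subsets(adj, n, s, t):
    """Number of s-cliques plus t-independent-sets; 0 means counterexample."""
    cliques = sum(all(adj[u][v] for u, v in combinations(S, 2))
                  for S in combinations(range(n), s))
    indeps = sum(all(not adj[u][v] for u, v in combinations(S, 2))
                 for S in combinations(range(n), t))
    return cliques + indeps

def best_first(n, s, t, max_expansions=20000):
    """Best-first search for an n-vertex witness that R(s, t) > n."""
    start = frozenset()
    heap, seen, tie = [(bad_subsets(to_adj(start, n), n, s, t), 0, start)], {start}, 1
    while heap and max_expansions:
        max_expansions -= 1
        score, _, edges = heapq.heappop(heap)
        if score == 0:
            return edges                      # counterexample found
        for e in combinations(range(n), 2):   # expand by flipping one edge
            nxt = edges ^ {e}
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (bad_subsets(to_adj(nxt, n), n, s, t), tie, nxt))
                tie += 1
    return None

# R(3, 3) = 6, so a counterexample exists on 5 vertices (the 5-cycle or its complement).
print(sorted(best_first(5, 3, 3)))
```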

Audio Generation with Multiple Conditional Diffusion Model

  • paper_url: http://arxiv.org/abs/2308.11940
  • repo_url: None
  • paper_authors: Zhifang Guo, Jianguo Mao, Rui Tao, Long Yan, Kazushige Ouchi, Hong Liu, Xiangdong Wang
  • For: The paper aims to enhance the controllability of existing pre-trained text-to-audio models by incorporating additional conditions such as content (timestamp) and style (pitch contour and energy contour).
  • Methods: The proposed model uses a trainable control condition encoder enhanced by a large language model and a trainable Fusion-Net to encode and fuse the additional conditions, while keeping the weights of the pre-trained text-to-audio model frozen.
  • Results: The model achieves fine-grained control over the temporal order, pitch, and energy of generated audio, and experimental results demonstrate its success in accomplishing controllable audio generation.
    Abstract Text-based audio generation models have limitations as they cannot encompass all the information in audio, leading to restricted controllability when relying solely on text. To address this issue, we propose a novel model that enhances the controllability of existing pre-trained text-to-audio models by incorporating additional conditions including content (timestamp) and style (pitch contour and energy contour) as supplements to the text. This approach achieves fine-grained control over the temporal order, pitch, and energy of generated audio. To preserve the diversity of generation, we employ a trainable control condition encoder that is enhanced by a large language model and a trainable Fusion-Net to encode and fuse the additional conditions while keeping the weights of the pre-trained text-to-audio model frozen. Due to the lack of suitable datasets and evaluation metrics, we consolidate existing datasets into a new dataset comprising the audio and corresponding conditions and use a series of evaluation metrics to evaluate the controllability performance. Experimental results demonstrate that our model successfully achieves fine-grained control to accomplish controllable audio generation. Audio samples and our dataset are publicly available at https://conditionaudiogen.github.io/conditionaudiogen/
    摘要 基于文本的音频生成模型存在局限：文本无法涵盖音频中的全部信息，因此仅依赖文本时可控性受限。为解决这一问题，我们提出了一种新模型，通过在文本之外引入内容（时间戳）和风格（音高轮廓与能量轮廓）等补充条件，增强现有预训练文本到音频模型的可控性。该方法可对生成音频的时间顺序、音高和能量实现细粒度控制。为保持生成的多样性，我们使用由大语言模型增强的可训练控制条件编码器，以及可训练的 Fusion-Net 来编码和融合这些附加条件，同时冻结预训练文本到音频模型的权重。由于缺乏合适的数据集和评价指标，我们将现有数据集整合为一个包含音频及其对应条件的新数据集，并使用一系列评价指标来评估可控性表现。实验结果表明，我们的模型成功实现了细粒度控制，从而完成可控音频生成。音频样本和数据集公开于 https://conditionaudiogen.github.io/conditionaudiogen/

Retail Demand Forecasting: A Comparative Study for Multivariate Time Series

  • paper_url: http://arxiv.org/abs/2308.11939
  • repo_url: None
  • paper_authors: Md Sabbirul Haque, Md Shahedul Amin, Jonayet Miah
  • for: 预测零售需求精度的研究,以提高企业的财务表现和供应链效率。
  • methods: 将时间序列客户需求数据与宏观经济变量（如消费者价格指数 CPI、消费者信心指数 ICS 和失业率）相结合，用于需求预测。
  • results: 通过开发并比较多种回归和机器学习模型，实现精准的零售需求预测。
    Abstract Accurate demand forecasting in the retail industry is a critical determinant of financial performance and supply chain efficiency. As global markets become increasingly interconnected, businesses are turning towards advanced prediction models to gain a competitive edge. However, existing literature mostly focuses on historical sales data and ignores the vital influence of macroeconomic conditions on consumer spending behavior. In this study, we bridge this gap by enriching time series data of customer demand with macroeconomic variables, such as the Consumer Price Index (CPI), Index of Consumer Sentiment (ICS), and unemployment rates. Leveraging this comprehensive dataset, we develop and compare various regression and machine learning models to predict retail demand accurately.
    摘要 在零售行业中，精准的需求预测是决定财务表现和供应链效率的关键因素。随着全球市场日益互联，企业正转向先进的预测模型以获取竞争优势。然而，现有文献大多仅关注历史销售数据，忽视了宏观经济状况对消费者支出行为的重要影响。本研究通过将消费者价格指数（CPI）、消费者信心指数（ICS）和失业率等宏观经济变量引入客户需求时间序列数据来弥补这一空白。基于这一综合数据集，我们开发并比较了多种回归和机器学习模型，以实现精准的零售需求预测。
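
A minimal sketch of the setup described above: a demand series is augmented with macroeconomic covariates (CPI, consumer sentiment, unemployment) and a few regressors are compared. All data below are synthetic placeholders, not the paper's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "cpi": 100 + np.cumsum(rng.normal(0.2, 0.5, n)),
    "sentiment": 80 + rng.normal(0, 5, n),
    "unemployment": 5 + rng.normal(0, 0.5, n),
})
df["demand"] = (50 + 0.3 * df["sentiment"] - 2.0 * df["unemployment"]
                - 0.1 * (df["cpi"] - 100) + rng.normal(0, 2, n))
# A lagged demand column keeps the autoregressive signal used by classic baselines.
df["demand_lag1"] = df["demand"].shift(1)
df = df.dropna()

features = ["demand_lag1", "cpi", "sentiment", "unemployment"]
split = int(0.8 * len(df))              # chronological split, no shuffling
X_tr, y_tr = df[features][:split], df["demand"][:split]
X_te, y_te = df[features][split:], df["demand"][split:]

for name, model in [("linear", LinearRegression()),
                    ("svr", SVR(C=10.0)),
                    ("random_forest", RandomForestRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    print(name, "MAE:", round(mean_absolute_error(y_te, model.predict(X_te)), 3))
```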

System Identification for Continuous-time Linear Dynamical Systems

  • paper_url: http://arxiv.org/abs/2308.11933
  • repo_url: https://github.com/Jonas-Nicodemus/phdmd
  • paper_authors: Peter Halmos, Jonathan Pillow, David A. Knowles
  • for: 这篇论文旨在探讨 kalman 筛filter 的系统识别问题,以期减少 Observations 的时间间隔点击,以便在实际应用中更好地适用。
  • methods: 该论文使用 expectation-maximization (EM) 算法来学习底层系统的参数,并提出了一种新的 two-filter 解法,可以efficiently 计算 posterior 的 analytical 更新,不需要先行计算 forward-pass。
  • results: 该论文通过应用这种新的解法,成功地扩展了 kalman 筛filter 的学习范围,使其可以处理不规则的测量数据和偶极缺失值。此外,该论文还对 latent 线性动力系统 (LDS) 的学习进行扩展,并证明了这种扩展可以提高非线性系统的识别精度。
    Abstract The problem of system identification for the Kalman filter, relying on the expectation-maximization (EM) procedure to learn the underlying parameters of a dynamical system, has largely been studied assuming that observations are sampled at equally-spaced time points. However, in many applications this is a restrictive and unrealistic assumption. This paper addresses system identification for the continuous-discrete filter, with the aim of generalizing learning for the Kalman filter by relying on a solution to a continuous-time It\^o stochastic differential equation (SDE) for the latent state and covariance dynamics. We introduce a novel two-filter, analytical form for the posterior with a Bayesian derivation, which yields analytical updates which do not require the forward-pass to be pre-computed. Using this analytical and efficient computation of the posterior, we provide an EM procedure which estimates the parameters of the SDE, naturally incorporating irregularly sampled measurements. Generalizing the learning of latent linear dynamical systems (LDS) to continuous-time may extend the use of the hybrid Kalman filter to data which is not regularly sampled or has intermittent missing values, and can extend the power of non-linear system identification methods such as switching LDS (SLDS), which rely on EM for the linear discrete-time Kalman filter as a sub-unit for learning locally linearized behavior of a non-linear system. We apply the method by learning the parameters of a latent, multivariate Fokker-Planck SDE representing a toggle-switch genetic circuit using biologically realistic parameters, and compare the efficacy of learning relative to the discrete-time Kalman filter as the step-size irregularity and spectral-radius of the dynamics-matrix increases.
    摘要 依赖期望最大化（EM）过程来学习动力系统参数的卡尔曼滤波系统辨识问题，此前大多是在观测等间隔采样的假设下研究的。然而在许多应用中，这一假设过于严格且不现实。本文研究连续-离散滤波器的系统辨识，通过对潜在状态和协方差动力学求解连续时间 Itô 随机微分方程（SDE），将卡尔曼滤波的学习加以推广。我们给出了一种新的 two-filter 解析后验形式及其贝叶斯推导，其解析更新无需预先计算前向传递。基于这种解析且高效的后验计算，我们给出了估计 SDE 参数的 EM 过程，并自然地纳入不规则采样的测量。将潜在线性动力系统（LDS）的学习推广到连续时间，可以把混合卡尔曼滤波应用于非规则采样或存在间歇性缺失值的数据，也可以增强依赖离散时间卡尔曼滤波 EM 作为子单元的非线性系统辨识方法（如切换 LDS，SLDS）。我们将该方法应用于学习一个代表拨动开关基因线路的潜在多变量 Fokker-Planck SDE（采用符合生物学实际的参数），并在步长不规则程度和动力学矩阵谱半径增大的情况下，与离散时间卡尔曼滤波比较了学习效果。
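
Only the filtering piece of the continuous-discrete setting is sketched here: the latent state follows a linear SDE, and irregular observation gaps are handled by discretising the drift matrix and the diffusion over each gap with the matrix exponential (Van Loan's trick). The toy 2-D oscillator and noise levels are illustrative; the paper's EM parameter learning and two-filter posterior are not shown.

```python
import numpy as np
from scipy.linalg import expm

def discretize(A, Qc, dt):
    """Exact discretisation of dx = A x dt + dW with Cov(dW) = Qc dt."""
    d = A.shape[0]
    M = np.zeros((2 * d, 2 * d))
    M[:d, :d], M[:d, d:], M[d:, d:] = -A, Qc, A.T
    E = expm(M * dt)
    F = expm(A * dt)
    Qd = F @ E[:d, d:]          # Van Loan: process-noise covariance over the gap
    return F, Qd

# Damped oscillator dynamics; only the first coordinate is observed.
A = np.array([[0.0, 1.0], [-1.0, -0.1]])
Qc = np.diag([0.0, 0.05])
H = np.array([[1.0, 0.0]])
R = np.array([[0.1]])

rng = np.random.default_rng(0)
times = np.cumsum(rng.exponential(0.5, size=40))   # irregular observation times
m, P = np.zeros(2), np.eye(2)
t_prev, x_true = 0.0, np.array([1.0, 0.0])

for t in times:
    dt = t - t_prev
    F, Qd = discretize(A, Qc, dt)
    x_true = F @ x_true + rng.multivariate_normal(np.zeros(2), Qd)
    y = H @ x_true + rng.multivariate_normal(np.zeros(1), R)
    # Predict over the (possibly long) gap, then do the usual discrete update.
    m, P = F @ m, F @ P @ F.T + Qd
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    m = m + K @ (y - H @ m)
    P = (np.eye(2) - K @ H) @ P
    t_prev = t

print("final state estimate:", np.round(m, 3), " true:", np.round(x_true, 3))
```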

Dynamic landslide susceptibility mapping over recent three decades to uncover variations in landslide causes in subtropical urban mountainous areas

  • paper_url: http://arxiv.org/abs/2308.11929
  • repo_url: https://github.com/cli-de/d_lsm
  • paper_authors: Peifeng Ma, Li Chen, Chang Yu, Qing Zhu, Yulin Ding
  • for: 这项研究旨在提供一种能够反映滑坡诱发环境随时间变化的动态滑坡易发性评估方法，以便更好地降低滑坡风险。
  • methods: 这项研究使用多个预测模型实现逐年滑坡易发性评估，并使用 SHAP 解释各模型的输出及滑坡特征的重要性排序。此外，研究还使用 MT-InSAR 对滑坡易发性评估结果进行增强和验证。
  • results: 研究发现，滑坡的主要诱发因素是地形坡度和极端降雨，而滑坡成因的变化主要归结于全球气候变化引起的极端降雨事件，以及香港政府实施的防治山泥倾泻计划（LPMitP）。
    Abstract Landslide susceptibility assessment (LSA) is of paramount importance in mitigating landslide risks. Recently, there has been a surge in the utilization of data-driven methods for predicting landslide susceptibility due to the growing availability of aerial and satellite data. Nonetheless, the rapid oscillations within the landslide-inducing environment (LIE), primarily due to significant changes in external triggers such as rainfall, pose difficulties for contemporary data-driven LSA methodologies to accommodate LIEs over diverse timespans. This study presents dynamic landslide susceptibility mapping that simply employs multiple predictive models for annual LSA. In practice, this will inevitably encounter small sample problems due to the limited number of landslide samples in certain years. Another concern arises owing to the majority of the existing LSA approaches train black-box models to fit distinct datasets, yet often failing in generalization and providing comprehensive explanations concerning the interactions between input features and predictions. Accordingly, we proposed to meta-learn representations with fast adaptation ability using a few samples and gradient updates; and apply SHAP for each model interpretation and landslide feature permutation. Additionally, we applied MT-InSAR for LSA result enhancement and validation. The chosen study area is Lantau Island, Hong Kong, where we conducted a comprehensive dynamic LSA spanning from 1992 to 2019. The model interpretation results demonstrate that the primary factors responsible for triggering landslides in Lantau Island are terrain slope and extreme rainfall. The results also indicate that the variation in landslide causes can be primarily attributed to extreme rainfall events, which result from global climate change, and the implementation of the Landslip Prevention and Mitigation Programme (LPMitP) by the Hong Kong government.
    摘要 滑坡易发性评估（LSA）对于降低滑坡风险至关重要。近年来，随着航空和卫星数据日益丰富，数据驱动方法在滑坡易发性预测中的应用迅速增多。然而，滑坡诱发环境（LIE）主要受降雨等外部触发因素显著变化的影响而快速波动，这使得现有的数据驱动 LSA 方法难以适应不同时间跨度下的 LIE。本研究提出了动态滑坡易发性制图方法，直接采用多个预测模型进行逐年 LSA。在实践中，由于某些年份滑坡样本数量有限，这不可避免地会遇到小样本问题。另一个问题在于，现有 LSA 方法大多训练黑盒模型去拟合不同的数据集，却常常缺乏泛化能力，也难以对输入特征与预测之间的相互作用给出全面解释。为此，我们提出利用少量样本和梯度更新进行元学习，得到具备快速适应能力的表示；并对每个模型使用 SHAP 进行解释和滑坡特征置换分析。此外，我们还应用 MT-InSAR 对 LSA 结果进行增强和验证。所选研究区为香港大屿山，我们在该区开展了 1992 年至 2019 年的完整动态 LSA。模型解释结果表明，大屿山滑坡的主要诱发因素是地形坡度和极端降雨；滑坡成因的变化主要可归因于全球气候变化导致的极端降雨事件，以及香港政府实施的防治山泥倾泻计划（LPMitP）。

Solving Elliptic Optimal Control Problems using Physics Informed Neural Networks

  • paper_url: http://arxiv.org/abs/2308.11925
  • repo_url: None
  • paper_authors: Bangti Jin, Ramesh Sau, Luowei Yin, Zhi Zhou
  • for: 本文研究线性与半线性二阶椭圆型问题的最优控制问题（无框约束/带框约束）的数值求解。
  • methods: 本文基于最优控制问题的一阶最优性系统导出耦合系统，并使用物理信息神经网络（PINNs）求解该耦合系统。
  • results: 本文给出了状态、控制和伴随状态的 $L^2(\Omega)$ 误差界，该误差界以深度神经网络参数（如深度、宽度、参数界）以及域内和边界上采样点数量表示，并通过数值算例与三种现有方法进行了比较。
    Abstract In this work, we present and analyze a numerical solver for optimal control problems (without / with box constraint) for linear and semilinear second-order elliptic problems. The approach is based on a coupled system derived from the first-order optimality system of the optimal control problem, and applies physics informed neural networks (PINNs) to solve the coupled system. We present an error analysis of the numerical scheme, and provide $L^2(\Omega)$ error bounds on the state, control and adjoint state in terms of deep neural network parameters (e.g., depth, width, and parameter bounds) and the number of sampling points in the domain and on the boundary. The main tools in the analysis include offset Rademacher complexity and boundedness and Lipschitz continuity of neural network functions. We present several numerical examples to illustrate the approach and compare it with three existing approaches.
    摘要 本文提出并分析了一种求解线性与半线性二阶椭圆型问题最优控制问题（无框约束/带框约束）的数值方法。该方法基于由最优控制问题一阶最优性系统导出的耦合系统，并应用物理信息神经网络（PINNs）求解该耦合系统。我们给出了该数值格式的误差分析，并以深度神经网络参数（如深度、宽度和参数界）以及域内和边界上采样点数量为量，给出了状态、控制和伴随状态的 $L^2(\Omega)$ 误差界。分析中的主要工具包括 offset Rademacher 复杂度以及神经网络函数的有界性与 Lipschitz 连续性。我们给出了若干数值算例来说明该方法，并与三种现有方法进行了比较。
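
A minimal PINN sketch for the coupled first-order optimality system of a toy 1-D problem: minimize ½‖y−y_d‖² + (α/2)‖u‖² subject to −y″ = u with y(0)=y(1)=0, whose optimality system is −y″ = u, −p″ = y − y_d and u = −p/α. The target y_d, α, network sizes and sampling below are arbitrary choices for illustration, not the paper's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
alpha = 1e-2
y_d = lambda x: torch.sin(torch.pi * x)          # assumed target state

def mlp():
    return nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                         nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))

net_y, net_p = mlp(), mlp()                      # state and adjoint networks
opt = torch.optim.Adam(list(net_y.parameters()) + list(net_p.parameters()), lr=1e-3)

def second_derivative(net, x):
    out = net(x)
    g = torch.autograd.grad(out.sum(), x, create_graph=True)[0]
    return out, torch.autograd.grad(g.sum(), x, create_graph=True)[0]

x_b = torch.tensor([[0.0], [1.0]])               # Dirichlet boundary points

for step in range(3000):
    x = torch.rand(128, 1, requires_grad=True)   # interior collocation points
    y, y_xx = second_derivative(net_y, x)
    p, p_xx = second_derivative(net_p, x)
    u = -p / alpha                               # optimality condition
    res_state = -y_xx - u                        # residual of -y'' = u
    res_adj = -p_xx - (y - y_d(x))               # residual of -p'' = y - y_d
    loss = (res_state ** 2).mean() + (res_adj ** 2).mean() \
         + (net_y(x_b) ** 2).mean() + (net_p(x_b) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(step, float(loss))
```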

Diverse Policies Converge in Reward-free Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2308.11924
  • repo_url: https://github.com/openrl-lab/diversepolicies
  • paper_authors: Fanqi Lin, Shiyu Huang, Weiwei Tu
  • for: 这篇论文研究多样性强化学习，旨在解决现有多样性强化学习算法缺乏收敛性和效率的理论保证的问题。
  • methods: 该论文提出了一个统一的多样性强化学习框架，并研究了训练多样化策略的收敛性。此外，他们还提出了一种可证明高效的多样性强化学习算法。
  • results: 数值实验验证了该方法的有效性和高效性。
    Abstract Reinforcement learning has achieved great success in many decision-making tasks, and traditional reinforcement learning algorithms are mainly designed for obtaining a single optimal solution. However, recent works show the importance of developing diverse policies, which makes it an emerging research topic. Despite the variety of diversity reinforcement learning algorithms that have emerged, none of them theoretically answer the question of how the algorithm converges and how efficient the algorithm is. In this paper, we provide a unified diversity reinforcement learning framework and investigate the convergence of training diverse policies. Under such a framework, we also propose a provably efficient diversity reinforcement learning algorithm. Finally, we verify the effectiveness of our method through numerical experiments.
    摘要 强化学习已在许多决策任务中取得了巨大成功，而传统的强化学习算法主要针对获得单一最优解而设计。然而，近期研究表明发展多样化策略的重要性，使其成为一个新兴的研究课题。尽管已涌现出多种多样性强化学习算法，但没有一种从理论上回答算法如何收敛以及算法效率如何的问题。本文提出了一个统一的多样性强化学习框架，并研究了训练多样化策略的收敛性。在该框架下，我们还提出了一种可证明高效的多样性强化学习算法。最后，我们通过数值实验验证了方法的有效性。

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

  • paper_url: http://arxiv.org/abs/2308.11923
  • repo_url: None
  • paper_authors: Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino
  • for: 本研究提出了音频描述的一项新扩展任务：音频差异描述（ADC），用于描述成对输入的相似但略有不同的音频片段之间的语义差异。
  • methods: 我们提出了一种通过比较两个音频片段来提取差异的 cross-attention-concentrated transformer 编码器，并提出了一种相似性-差异解耦方法，用于在潜在空间中强调差异。
  • results: 我们构建了 AudioDiffCaps 数据集并进行实验，结果表明所提方法能有效解决 ADC 任务并提取差异，同时可以在 transformer 编码器中将注意力权重可视化。
    Abstract We proposed Audio Difference Captioning (ADC) as a new extension task of audio captioning for describing the semantic differences between input pairs of similar but slightly different audio clips. The ADC solves the problem that conventional audio captioning sometimes generates similar captions for similar audio clips, failing to describe the difference in content. We also propose a cross-attention-concentrated transformer encoder to extract differences by comparing a pair of audio clips and a similarity-discrepancy disentanglement to emphasize the difference in the latent space. To evaluate the proposed methods, we built an AudioDiffCaps dataset consisting of pairs of similar but slightly different audio clips with human-annotated descriptions of their differences. The experiment with the AudioDiffCaps dataset showed that the proposed methods solve the ADC task effectively and improve the attention weights to extract the difference by visualizing them in the transformer encoder.
    摘要 我们提出了音频描述的一项新扩展任务——音频差异描述（ADC），用于描述成对输入的相似但略有不同的音频片段之间的语义差异。ADC 解决了传统音频描述有时会为相似音频片段生成相似描述、从而无法描述内容差异的问题。我们还提出了一种 cross-attention-concentrated transformer 编码器，通过比较一对音频片段来提取差异，并提出了相似性-差异解耦方法，以在潜在空间中强调差异。为评估所提方法，我们构建了 AudioDiffCaps 数据集，其中包含成对的相似但略有不同的音频片段，以及由人工标注的差异描述。基于 AudioDiffCaps 数据集的实验表明，所提方法能有效解决 ADC 任务，并通过在 transformer 编码器中可视化注意力权重来改进差异的提取。
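
A short sketch of cross-attention between two audio-clip embeddings, the mechanism the ADC encoder concentrates on: tokens of clip A attend to clip B so that what differs can be emphasised. Feature sizes are arbitrary, and the residual "difference" below is a simple subtraction, not the paper's similarity-discrepancy disentanglement.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, n_frames = 64, 50
clip_a = torch.randn(1, n_frames, d_model)       # frame embeddings of audio clip A
clip_b = torch.randn(1, n_frames, d_model)       # frame embeddings of audio clip B

cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
attended, weights = cross_attn(query=clip_a, key=clip_b, value=clip_b)

# Frames of A that are poorly explained by B are candidates for "difference"
# content; here we simply keep the residual as the difference representation.
difference = clip_a - attended
print(difference.shape, weights.shape)           # (1, 50, 64), (1, 50, 50)
```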

Addressing Selection Bias in Computerized Adaptive Testing: A User-Wise Aggregate Influence Function Approach

  • paper_url: http://arxiv.org/abs/2308.11912
  • repo_url: https://github.com/riiid/useraif
  • paper_authors: Soonwoo Kwon, Sojung Kim, Seunghyun Lee, Jin-Young Kim, Suyeong An, Kyuseok Kim
  • for: 这篇论文旨在利用 CAT 服务中收集的响应数据生成项目（item）profile，以提高 CAT 的效率和准确性。
  • methods: 该方法利用CAT服务中收集的响应数据,首先通过训练一个诊断模型来生成项目profile。然而,由于CAT服务中的选择性偏见问题,这些数据具有明显的偏见特征,导致生成的项目profile与真实值有很大差异。为了解决这个问题,该方法提议使用用户wise总影响函数方法,即对每个用户的响应数据进行权重平衡,以便减少数据中偏见的影响。
  • results: 该方法在三个公共数据集和一个真实世界CAT响应数据集上进行了广泛的实验,并证明了其超越了传统方法的性能。
    Abstract Computerized Adaptive Testing (CAT) is a widely used, efficient test mode that adapts to the examinee's proficiency level in the test domain. CAT requires pre-trained item profiles, for CAT iteratively assesses the student real-time based on the registered items' profiles, and selects the next item to administer using candidate items' profiles. However, obtaining such item profiles is a costly process that involves gathering a large, dense item-response data, then training a diagnostic model on the collected data. In this paper, we explore the possibility of leveraging response data collected in the CAT service. We first show that this poses a unique challenge due to the inherent selection bias introduced by CAT, i.e., more proficient students will receive harder questions. Indeed, when naively training the diagnostic model using CAT response data, we observe that item profiles deviate significantly from the ground-truth. To tackle the selection bias issue, we propose the user-wise aggregate influence function method. Our intuition is to filter out users whose response data is heavily biased in an aggregate manner, as judged by how much perturbation the added data will introduce during parameter estimation. This way, we may enhance the performance of CAT while introducing minimal bias to the item profiles. We provide extensive experiments to demonstrate the superiority of our proposed method based on the three public datasets and one dataset that contains real-world CAT response data.
    摘要 计算机化自适应测试（CAT）是一种广泛使用的高效测试模式，能够根据考生在测试领域的能力水平进行自适应。CAT 需要预先训练好的项目 profile：CAT 基于已注册项目的 profile 实时评估学生水平，并利用候选项目的 profile 选择下一道要施测的题目。然而，获得这些项目 profile 的过程成本高昂，需要收集大规模、稠密的项目-响应数据，再在所收集的数据上训练诊断模型。本文探讨利用 CAT 服务中收集的响应数据的可能性。我们首先指出，由于 CAT 固有的选择偏差——能力更强的学生会收到更难的题目——这带来了独特的挑战。事实上，若直接用 CAT 响应数据训练诊断模型，得到的项目 profile 会显著偏离真实值。为解决选择偏差问题，我们提出了按用户聚合的影响函数方法。我们的直觉是：以添加某用户数据在参数估计中引入的扰动大小为依据，在聚合层面过滤掉响应数据偏差严重的用户。这样既能提升 CAT 的性能，又能将引入项目 profile 的偏差降到最低。我们在三个公共数据集和一个包含真实 CAT 响应数据的数据集上进行了大量实验，证明了所提方法的优越性。

Diagnosing Infeasible Optimization Problems Using Large Language Models

  • paper_url: http://arxiv.org/abs/2308.12923
  • repo_url: None
  • paper_authors: Hao Chen, Gonzalo E. Constante-Flores, Can Li
  • for: 该论文旨在帮助实践者理解和解释优化模型，尤其是当模型不可行（infeasible）的时候。
  • methods: 该论文提出了一种基于自然语言的系统 OptiChat，可以与用户进行交互式对话，描述优化模型，识别可能导致不可行性的来源，并提供使模型恢复可行的修改建议。
  • results: 实验表明，OptiChat 可以帮助专家和非专家用户更好地理解优化模型，快速识别不可行性的来源。
    Abstract Decision-making problems can be represented as mathematical optimization models, finding wide applications in fields such as economics, engineering and manufacturing, transportation, and health care. Optimization models are mathematical abstractions of the problem of making the best decision while satisfying a set of requirements or constraints. One of the primary barriers to deploying these models in practice is the challenge of helping practitioners understand and interpret such models, particularly when they are infeasible, meaning no decision satisfies all the constraints. Existing methods for diagnosing infeasible optimization models often rely on expert systems, necessitating significant background knowledge in optimization. In this paper, we introduce OptiChat, a first-of-its-kind natural language-based system equipped with a chatbot GUI for engaging in interactive conversations about infeasible optimization models. OptiChat can provide natural language descriptions of the optimization model itself, identify potential sources of infeasibility, and offer suggestions to make the model feasible. The implementation of OptiChat is built on GPT-4, which interfaces with an optimization solver to identify the minimal subset of constraints that render the entire optimization problem infeasible, also known as the Irreducible Infeasible Subset (IIS). We utilize few-shot learning, expert chain-of-thought, key-retrieve, and sentiment prompts to enhance OptiChat's reliability. Our experiments demonstrate that OptiChat assists both expert and non-expert users in improving their understanding of the optimization models, enabling them to quickly identify the sources of infeasibility.
    摘要 决策问题可以表示为数学优化模型，在经济、工程与制造、交通和医疗等领域有广泛应用。优化模型是在满足一系列要求或约束的前提下做出最优决策这一问题的数学抽象。在实践中部署这些模型的主要障碍之一，是帮助实践者理解和解释模型，尤其是当模型不可行（即不存在满足所有约束的决策）时。现有的不可行优化模型诊断方法通常依赖专家系统，需要大量优化背景知识。本文介绍 OptiChat——一个首创的基于自然语言、带有聊天机器人界面的系统，用于就不可行优化模型进行交互式对话。OptiChat 能够用自然语言描述优化模型本身，识别潜在的不可行性来源，并给出使模型恢复可行的建议。OptiChat 基于 GPT-4 实现，它与优化求解器对接，以确定使整个优化问题不可行的最小约束子集，即不可约不可行子集（IIS）。我们使用 few-shot 学习、专家思维链、关键信息检索和情感提示来增强 OptiChat 的可靠性。实验表明，OptiChat 能帮助专家和非专家用户更好地理解优化模型，快速识别不可行性的来源。
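
An Irreducible Infeasible Subset can be isolated with a simple deletion filter, shown below for a small infeasible LP whose feasibility is checked with scipy. This only illustrates the IIS step; the constraints are made up, and the paper's GPT-4 prompting and explanation layer are not shown (the paper's solver interface may differ).

```python
import numpy as np
from scipy.optimize import linprog

# Constraints of the form a.x <= b over x = (x1, x2), with x >= 0.
constraints = {
    "c1: x1 + x2 <= 4": ([1, 1], 4),
    "c2: x1 >= 6":      ([-1, 0], -6),   # rewritten as -x1 <= -6
    "c3: x2 >= 1":      ([0, -1], -1),
    "c4: x1 <= 10":     ([1, 0], 10),
}

def feasible(names):
    A = [constraints[n][0] for n in names]
    b = [constraints[n][1] for n in names]
    res = linprog(c=[0, 0], A_ub=A, b_ub=b,
                  bounds=[(0, None), (0, None)], method="highs")
    return res.status == 0          # 0 = optimal, so the system is feasible

assert not feasible(list(constraints))      # the full model is infeasible

# Deletion filter: permanently drop a constraint whenever the rest stays infeasible.
iis = list(constraints)
for name in list(constraints):
    trial = [n for n in iis if n != name]
    if not feasible(trial):
        iis = trial

print("IIS:", iis)   # expected: c1 and c2 alone are already contradictory
```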

Utilizing Admissible Bounds for Heuristic Learning

  • paper_url: http://arxiv.org/abs/2308.11905
  • repo_url: None
  • paper_authors: Carlos Núñez-Molina, Masataro Asai
  • for: 本研究旨在利用现代机器学习技术改进前向搜索算法中启发式函数的学习。
  • methods: 本研究将可采纳（admissible）启发式作为截断高斯分布的参数来参数化所学启发式，这相比普通高斯分布收紧了假设空间，并忠实遵循最大熵原则。
  • results: 实验表明，利用可采纳启发式可以学到更准确的启发式函数，并且训练收敛更快。
    Abstract While learning a heuristic function for forward search algorithms with modern machine learning techniques has been gaining interest in recent years, there has been little theoretical understanding of \emph{what} they should learn, \emph{how} to train them, and \emph{why} we do so. This lack of understanding leads to various literature performing an ad-hoc selection of datasets (suboptimal vs optimal costs or admissible vs inadmissible heuristics) and optimization metrics (e.g., squared vs absolute errors). Moreover, due to the lack of admissibility of the resulting trained heuristics, little focus has been put on the role of admissibility \emph{during} learning. This paper articulates the role of admissible heuristics in supervised heuristic learning using them as parameters of Truncated Gaussian distributions, which tightens the hypothesis space compared to ordinary Gaussian distributions. We argue that this mathematical model faithfully follows the principle of maximum entropy and empirically show that, as a result, it yields more accurate heuristics and converges faster during training.
    摘要 近年来，利用现代机器学习技术为前向搜索算法学习启发式函数受到越来越多的关注，但对于它们应当学习什么、如何训练以及为何如此训练，仍缺乏理论上的理解。这种理解的缺失导致各类文献临时性地选择数据集（次优代价还是最优代价、可采纳还是不可采纳的启发式）和优化指标（如平方误差还是绝对误差）。此外，由于训练得到的启发式不具备可采纳性，学习过程中可采纳性所扮演的角色也很少受到关注。本文阐明了可采纳启发式在监督启发式学习中的作用：将其作为截断高斯分布的参数，相比普通高斯分布收紧了假设空间。我们论证这一数学模型忠实遵循最大熵原则，并通过实验表明，它能学到更准确的启发式，且训练收敛更快。
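
A sketch of the core idea above: the admissible heuristic supplies a lower truncation point for the predictive distribution, so the model is fit by a truncated-Gaussian negative log-likelihood rather than a plain squared error. Synthetic data, a fixed scale and a grid search stand in for the paper's neural model and gradient training.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
n = 500
features = rng.uniform(0, 10, n)
h_admissible = features                       # known lower bound (e.g. a relaxed cost)
h_star = features + rng.gamma(2.0, 1.0, n)    # true cost-to-go, always >= the bound

sigma = 1.5                                   # fixed scale for the sketch

def nll(w):
    """NLL of h_star under N(w*x, sigma) truncated below at h_admissible."""
    mu = w * features
    a = (h_admissible - mu) / sigma           # lower truncation in standard units
    return -truncnorm.logpdf(h_star, a, np.inf, loc=mu, scale=sigma).sum()

# Crude 1-D grid search stands in for gradient-based training.
ws = np.linspace(0.5, 2.0, 151)
w_trunc = ws[np.argmin([nll(w) for w in ws])]
w_lsq = (features @ h_star) / (features @ features)   # ordinary squared-error fit

print(f"truncated-Gaussian fit: w = {w_trunc:.3f}")
print(f"squared-error fit:      w = {w_lsq:.3f}")
# The two fits differ because the truncated likelihood accounts for the fact
# that targets can never fall below the admissible bound.
```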

Rethinking Data Perturbation and Model Stabilization for Semi-supervised Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2308.11903
  • repo_url: https://github.com/zhenzhao/dpms
  • paper_authors: Zhen Zhao, Ye Liu, Meng Zhao, Di Yin, Yixuan Yuan, Luping Zhou
  • for: 本研究主要是为了提高半监督医学图像分割（SSMIS）的性能。
  • methods: 本研究提出了一种简单而有效的方法 DPMS。DPMS 采用教师-学生框架，使用标准的监督损失和无监督一致性损失；为产生足够的预测分歧，DPMS 对未标注数据施加强扰动增强；同时采用两次前向传播和动量更新策略来稳定归一化统计量，从而在未标注数据上进行有效训练。
  • results: DPMS 在公共的 2D ACDC 和 3D LA 数据集上的多种半监督设置中取得了新的 state-of-the-art 性能。例如，在仅使用 5% 标签的 ACDC 数据集上，DPMS 相比之前的 SOTA 取得了 22.62% 的显著提升。
    Abstract Studies on semi-supervised medical image segmentation (SSMIS) have seen fast progress recently. Due to the limited labelled data, SSMIS methods mainly focus on effectively leveraging unlabeled data to enhance the segmentation performance. However, despite their promising performance, current state-of-the-art methods often prioritize integrating complex techniques and loss terms rather than addressing the core challenges of semi-supervised scenarios directly. We argue that the key to SSMIS lies in generating substantial and appropriate prediction disagreement on unlabeled data. To this end, we emphasize the crutiality of data perturbation and model stabilization in semi-supervised segmentation, and propose a simple yet effective approach to boost SSMIS performance significantly, dubbed DPMS. Specifically, we first revisit SSMIS from three distinct perspectives: the data, the model, and the loss, and conduct a comprehensive study of corresponding strategies to examine their effectiveness. Based on these examinations, we then propose DPMS, which adopts a plain teacher-student framework with a standard supervised loss and unsupervised consistency loss. To produce appropriate prediction disagreements, DPMS perturbs the unlabeled data via strong augmentations to enlarge prediction disagreements considerably. On the other hand, using EMA teacher when strong augmentation is applied does not necessarily improve performance. DPMS further utilizes a forwarding-twice and momentum updating strategies for normalization statistics to stabilize the training on unlabeled data effectively. Despite its simplicity, DPMS can obtain new state-of-the-art performance on the public 2D ACDC and 3D LA datasets across various semi-supervised settings, e.g. obtaining a remarkable 22.62% improvement against previous SOTA on ACDC with 5% labels.
    摘要 半监督医学图像分割（SSMIS）的研究近来进展迅速。由于标注数据有限，SSMIS 方法主要关注如何有效利用未标注数据来提升分割性能。然而，尽管表现可观，当前最先进的方法往往倾向于堆叠复杂的技术和损失项，而不是直接面对半监督场景的核心挑战。我们认为，SSMIS 的关键在于在未标注数据上产生足够且恰当的预测分歧。为此，我们强调数据扰动和模型稳定在半监督分割中的关键作用，并提出了一种简单而有效的方法 DPMS，以显著提升 SSMIS 性能。具体而言，我们首先从数据、模型和损失三个角度重新审视 SSMIS，并对相应策略的有效性进行了全面研究。在此基础上，我们提出 DPMS：它采用朴素的教师-学生框架，配合标准的监督损失和无监督一致性损失。为产生恰当的预测分歧，DPMS 通过强增强来扰动未标注数据，从而显著扩大预测分歧。另一方面，在使用强增强时采用 EMA 教师并不一定能提升性能。DPMS 进一步利用两次前向传播和归一化统计量的动量更新策略，有效稳定未标注数据上的训练。尽管方法简单，DPMS 仍能在公共的 2D ACDC 和 3D LA 数据集上的多种半监督设置中取得新的 state-of-the-art 性能，例如在仅使用 5% 标签的 ACDC 上相比之前的 SOTA 取得 22.62% 的显著提升。
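
A generic teacher-student consistency sketch in the spirit of the description above: strong perturbation of unlabelled inputs, a supervised term, and an unsupervised consistency term. A toy 1-D per-position classifier stands in for the real segmentation networks; the EMA teacher is a common choice shown for illustration (the paper notes EMA under strong augmentation is not always beneficial and instead stabilises normalisation statistics via forward-twice and momentum updates, which this sketch omits).

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
student = nn.Sequential(nn.Conv1d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv1d(8, 2, 1))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def strong_aug(x):
    noise = 0.5 * torch.randn_like(x)
    mask = (torch.rand_like(x) > 0.2).float()      # dropout-like masking
    return (x + noise) * mask

def ema_update(teacher, student, m=0.99):
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(m).add_(ps, alpha=1 - m)

for step in range(200):
    x_l = torch.randn(4, 1, 64)                    # labelled batch
    y_l = (x_l.squeeze(1) > 0).long()              # synthetic per-position labels
    x_u = torch.randn(8, 1, 64)                    # unlabelled batch
    sup = F.cross_entropy(student(x_l), y_l)
    with torch.no_grad():                          # teacher targets on clean inputs
        pseudo = teacher(x_u).argmax(1)
    cons = F.cross_entropy(student(strong_aug(x_u)), pseudo)
    loss = sup + 0.5 * cons
    opt.zero_grad()
    loss.backward()
    opt.step()
    ema_update(teacher, student)

print("final loss:", float(loss))
```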

Shape-conditioned 3D Molecule Generation via Equivariant Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.11890
  • repo_url: None
  • paper_authors: Ziqi Chen, Bo Peng, Srinivasan Parthasarathy, Xia Ning
  • for: 这个论文的目的是用计算机方法来设计新的药物,具体来说是通过设计与已知活性分子相似的三维分子结构来找到新的药物候选者。
  • methods: 这个论文使用了一种叫做ShapeMol的方法,它是一种基于equivariant shape encoder和equivariant diffusion模型的三维分子生成方法。这种方法可以根据给定的分子形状生成新的三维分子结构,保持与给定分子的形状相似。
  • results: 实验结果表明,ShapeMol可以生成新、多样化的药物分子结构,这些结构与已知活性分子的三维形状相似。这些结果表明ShapeMol可以用于设计需要的三维形状卷积到蛋白质目标袋中的药物候选者。
    Abstract Ligand-based drug design aims to identify novel drug candidates of similar shapes with known active molecules. In this paper, we formulated an in silico shape-conditioned molecule generation problem to generate 3D molecule structures conditioned on the shape of a given molecule. To address this problem, we developed a translation- and rotation-equivariant shape-guided generative model ShapeMol. ShapeMol consists of an equivariant shape encoder that maps molecular surface shapes into latent embeddings, and an equivariant diffusion model that generates 3D molecules based on these embeddings. Experimental results show that ShapeMol can generate novel, diverse, drug-like molecules that retain 3D molecular shapes similar to the given shape condition. These results demonstrate the potential of ShapeMol in designing drug candidates of desired 3D shapes binding to protein target pockets.
    摘要 基于配体的药物设计旨在寻找与已知活性分子形状相似的新药候选分子。本文提出了一个计算机模拟的形状条件分子生成问题，即在给定分子形状的条件下生成三维分子结构。为解决该问题，我们开发了一种平移和旋转等变的形状引导生成模型 ShapeMol。ShapeMol 包含一个等变形状编码器（将分子表面形状映射为潜在嵌入）和一个等变扩散模型（基于这些嵌入生成三维分子）。实验结果表明，ShapeMol 能够生成新颖、多样、类药的分子，其三维分子形状与给定的形状条件相似。这些结果表明 ShapeMol 在设计具有期望三维形状、可结合蛋白质靶点口袋的药物候选分子方面具有潜力。

Adversarial Training Using Feedback Loops

  • paper_url: http://arxiv.org/abs/2308.11881
  • repo_url: None
  • paper_authors: Ali Haisam Muhammad Rafid, Adrian Sandu
  • for: 防御对抗攻击，构建对扰动具有鲁棒性的深度神经网络
  • methods: 基于控制理论的反馈神经网络（Feedback Neural Networks）架构，及相应的反馈环对抗训练方法（FLAT）
  • results: 在标准测试问题上的数值实验表明，FLAT 方法比现有最先进方法更有效地防御对抗攻击
    Abstract Deep neural networks (DNN) have found wide applicability in numerous fields due to their ability to accurately learn very complex input-output relations. Despite their accuracy and extensive use, DNNs are highly susceptible to adversarial attacks due to limited generalizability. For future progress in the field, it is essential to build DNNs that are robust to any kind of perturbations to the data points. In the past, many techniques have been proposed to robustify DNNs using first-order derivative information of the network. This paper proposes a new robustification approach based on control theory. A neural network architecture that incorporates feedback control, named Feedback Neural Networks, is proposed. The controller is itself a neural network, which is trained using regular and adversarial data such as to stabilize the system outputs. The novel adversarial training approach based on the feedback control architecture is called Feedback Looped Adversarial Training (FLAT). Numerical results on standard test problems empirically show that our FLAT method is more effective than the state-of-the-art to guard against adversarial attacks.
    摘要 深度神经网络（DNN）因能够准确学习非常复杂的输入-输出关系而在众多领域得到广泛应用。然而，尽管精度高、应用广，DNN 由于泛化能力有限，极易受到对抗样本的攻击。为了该领域的进一步发展，必须构建对数据点任意扰动都具有鲁棒性的 DNN。以往已有许多利用网络一阶导数信息来增强 DNN 鲁棒性的技术。本文提出了一种基于控制理论的新鲁棒化方法：我们提出了一种融合反馈控制的神经网络架构，称为反馈神经网络（Feedback Neural Networks）。其中的控制器本身也是一个神经网络，利用正常数据和对抗数据进行训练，以稳定系统输出。基于该反馈控制架构的新型对抗训练方法称为反馈环对抗训练（FLAT）。在标准测试问题上的数值结果表明，FLAT 方法在防御对抗攻击方面比现有最先进方法更有效。

SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets

  • paper_url: http://arxiv.org/abs/2308.11880
  • repo_url: https://github.com/csimo005/summit
  • paper_authors: Cody Simons, Dripta S. Raychaudhuri, Sk Miraj Ahmed, Suya You, Konstantinos Karydis, Amit K. Roy-Chowdhury
  • for: 该论文旨在利用多模态数据实现场景理解，以支持自动驾驶等需要多模态数据的各类应用。
  • methods: 该论文提出了一种切换框架，根据估计的领域差距在两种跨模态伪标签融合方法（一致性过滤与熵加权）之间自动选择，以适应由无标注多模态数据构成的目标域。
  • results: 在七个具有挑战性的适应场景上的实验表明，该方法可以取得与假设可访问源数据的方法相当、在某些情况下甚至更优的结果，mIoU 最高可提升 12%。
    Abstract Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both these assumptions may be problematic for many applications. Source data may not be available due to privacy, security, or economic concerns. Assuming the existence of paired multi-modal data for training also entails significant data collection costs and fails to take advantage of widely available freely distributed pre-trained uni-modal models. In this work, we relax both of these assumptions by addressing the problem of adapting a set of models trained independently on uni-modal data to a target domain consisting of unlabeled multi-modal data, without having access to the original source dataset. Our proposed approach solves this problem through a switching framework which automatically chooses between two complementary methods of cross-modal pseudo-label fusion -- agreement filtering and entropy weighting -- based on the estimated domain gap. We demonstrate our work on the semantic segmentation problem. Experiments across seven challenging adaptation scenarios verify the efficacy of our approach, achieving results comparable to, and in some cases outperforming, methods which assume access to source data. Our method achieves an improvement in mIoU of up to 12% over competing baselines. Our code is publicly available at https://github.com/csimo005/SUMMIT.
    摘要 在自动驾驶等许多应用中，需要利用多模态数据实现场景理解。要在各种情形下做到这一点，现有模型必须能够在无需繁重数据标注的情况下适应不断变化的数据分布。现有方法假设适应过程中源数据可用，并且源数据由成对的多模态数据构成。这两个假设在许多应用中都可能成问题：源数据可能因隐私、安全或经济原因而不可用；假设存在成对多模态训练数据则意味着高昂的数据采集成本，也无法利用广泛可得、免费发布的预训练单模态模型。在这项工作中，我们放宽了这两个假设，研究如何在无法访问原始源数据集的情况下，将一组分别在单模态数据上独立训练的模型适应到由无标注多模态数据构成的目标域。我们提出的方法通过一个切换框架来解决该问题：根据估计的领域差距，在两种互补的跨模态伪标签融合方法——一致性过滤与熵加权——之间自动选择。我们在语义分割问题上验证了该方法。在七个具有挑战性的适应场景上的实验证明了方法的有效性，其结果可与假设能访问源数据的方法相当，在某些情况下甚至更优；相比竞争基线，mIoU 最高可提升 12%。我们的代码公开于 https://github.com/csimo005/SUMMIT。
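
A numpy sketch of the two cross-modal pseudo-label fusion rules the framework switches between: agreement filtering (keep points where the 2-D and 3-D heads agree) and entropy weighting (mix the two softmax outputs with weights derived from their prediction entropies). Shapes and thresholds are illustrative, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)
C, N = 5, 1000                                  # classes, points/pixels
logits_2d = rng.normal(size=(N, C))             # uni-modal model on images
logits_3d = rng.normal(size=(N, C))             # uni-modal model on point clouds

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

p2d, p3d = softmax(logits_2d), softmax(logits_3d)

# Agreement filtering: a pseudo-label is kept only where both modalities
# predict the same class (optionally above a confidence threshold).
pred2d, pred3d = p2d.argmax(1), p3d.argmax(1)
keep = (pred2d == pred3d) & (np.maximum(p2d.max(1), p3d.max(1)) > 0.3)
pseudo_agree = np.where(keep, pred2d, -1)       # -1 marks ignored targets

# Entropy weighting: the lower-entropy (more confident) modality gets more say.
ent2d = -(p2d * np.log(p2d + 1e-12)).sum(1)
ent3d = -(p3d * np.log(p3d + 1e-12)).sum(1)
w2d = np.exp(-ent2d) / (np.exp(-ent2d) + np.exp(-ent3d))
fused = w2d[:, None] * p2d + (1 - w2d)[:, None] * p3d
pseudo_entropy = fused.argmax(1)

print("kept by agreement filtering:", int(keep.sum()), "of", N)
print("labels changed by entropy weighting vs. 2-D alone:",
      int((pseudo_entropy != pred2d).sum()))
```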

Cabrita: closing the gap for foreign languages

  • paper_url: http://arxiv.org/abs/2308.11878
  • repo_url: None
  • paper_authors: Celio Larcher, Marcos Piau, Paulo Finardi, Pedro Gengo, Piero Esposito, Vinicius Caridá
  • for: 这项研究旨在提升模型在特定语言或领域上的性能，并确保高效的分词。
  • methods: 该研究针对从零开始训练模型的高成本问题，提出了一种名为 Cabrita 的方法。
  • results: 研究表明，Cabrita 方法能够以较低成本同时改善模型性能和分词效率，并且在少样本学习任务中取得了与传统持续预训练方法以及 7B 英语预训练模型相近的结果。
    Abstract The strategy of training the model from scratch in a specific language or domain serves two essential purposes: i) enhancing performance in the particular linguistic or domain context, and ii) ensuring effective tokenization. The main limitation inherent to this approach lies in the associated cost, which can reach six to seven-digit dollar values, depending on the model size and the number of parameters involved. The main solution to overcome the cost challenge is to rely on available pre-trained models, which, despite recent advancements such as the LLaMA and LLaMA-2 models, still demonstrate inefficiency for certain specific domain problems or prove ineffective in scenarios involving conversational memory resources, given the large number of tokens required to represent text. To overcome this issue, we present a methodology named Cabrita, which, as our research demonstrates, successfully addresses the performance and efficient tokenization problem, all at an affordable cost. We believe that this methodology can be applied to any transformer-like architecture model. To validate the study, we conducted continuous pre-training exclusively using Portuguese text on a 3-billion-parameter model known as OpenLLaMA, resulting in a model named openCabrita 3B. The openCabrita 3B also features a new tokenizer that results in a significant reduction in the number of tokens required to represent the text. In our assessment, for few-shot learning tasks, we achieved similar results with this 3B model compared to a traditional continuous pre-training approach as well as to 7B models English pre-trained models.
    摘要 这种训练模型从零开始的策略在特定语言或领域中服务两个重要目的:一是提高特定语言或领域上表现,二是确保有效的分词。这种方法的主要局限性在于相关的成本,可以达到6到7位数字的值,具体取决于模型的大小和参数的数量。 为了解决成本挑战,我们可以依靠可用的预训练模型,尽管最近的进展,如LLaMA和LLaMA-2模型,仍然在特定领域问题上不具有效果,或者在需要大量字符的场景下表现不佳。 为此,我们提出了一种方法named Cabrita,该方法成功解决表现和有效分词问题,并且在成本上具有可持续性。我们认为这种方法可以应用于任何 transformer-like 架构模型。为验证研究,我们进行了独特的连续预训练,仅使用葡萄牙文本,在一个3亿参数的模型上进行了openCabrita 3B。openCabrita 3B还拥有一个新的分词器,该分词器可以减少表示文本的字符数量。在我们的评估中,对少量学习任务,我们使用3B模型和传统连续预训练方法以及7B英文预训练模型获得了类似的结果。

Integrating Large Language Models into the Debugging C Compiler for generating contextual error explanations

  • paper_url: http://arxiv.org/abs/2308.11873
  • repo_url: https://github.com/comp1511unsw/dcc
  • paper_authors: Andrew Taylor, Alexandra Vassar, Jake Renzella, Hammond Pearce
  • for: 这篇论文旨在使用大语言模型(LLM)生成改进的编译器错误说明,以便为 beginner 程序员提供更好的学习体验。
  • methods: 该论文使用了 DCC(Debugging C Compiler)和 LLM(大语言模型)组合,以生成在编译时和运行时的错误说明。
  • results: 经过专家评估,LLM生成的错误说明准确率为90%(编译时)和75%(运行时),而 DCC-help 工具也在学生中得到了普遍的承认和使用。
    Abstract This paper introduces a method for Large Language Models (LLM) to produce enhanced compiler error explanations, in simple language, within our Debugging C Compiler (DCC). It is well documented that compiler error messages have been known to present a barrier for novices learning how to program. Although our initial use of DCC in introductory programming (CS1) has been instrumental in teaching C to novice programmers by providing safeguards to commonly occurring errors and translating the usually cryptic compiler error messages at both compile- and run-time, we proposed that incorporating LLM-generated explanations would further enhance the learning experience for novice programmers. Through an expert evaluation, we observed that LLM-generated explanations for compiler errors were conceptually accurate in 90% of compile-time errors, and 75% of run-time errors. Additionally, the new DCC-help tool has been increasingly adopted by students, with an average of 1047 unique runs per week, demonstrating a promising initial assessment of using LLMs to complement compiler output to enhance programming education for beginners. We release our tool as open-source to the community.
    摘要 本文介绍了一种在我们的 Debugging C Compiler（DCC）中利用大语言模型（LLM）以浅显语言生成增强版编译器错误解释的方法。众所周知，编译器错误信息一直是编程初学者的一大障碍。我们最初在编程入门课程（CS1）中使用 DCC，通过对常见错误提供防护并在编译时和运行时翻译通常晦涩难懂的编译器错误信息，有效帮助初学者学习 C 语言；在此基础上，我们提出引入 LLM 生成的解释，以进一步提升初学者的学习体验。专家评估显示，LLM 生成的编译时错误解释在概念上的准确率为 90%，运行时错误为 75%。此外，新的 DCC-help 工具已被越来越多的学生采用，平均每周有 1047 次独立运行，初步表明利用 LLM 补充编译器输出来改进初学者编程教育是一个有前景的方向。我们已将该工具开源发布给社区。
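
A rough sketch of the DCC-help idea: wrap the compiler/runtime diagnostic, the offending source line and (for run-time errors) variable values into a prompt and ask an LLM for a beginner-friendly explanation. The model name, the prompt wording and the openai client usage below are assumptions for illustration only; DCC's actual prompts and integration differ.

```python
from openai import OpenAI

def explain_error(source_line: str, diagnostic: str, variables: dict) -> str:
    prompt = (
        "You are helping a first-year student learning C.\n"
        f"Compiler/runtime message:\n{diagnostic}\n"
        f"Offending line:\n{source_line}\n"
        f"Variable values at the time: {variables}\n"
        "Explain in simple language what went wrong and how to fix it. "
        "Do not write the corrected code for them."
    )
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",          # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(explain_error('printf("%d\\n", arr[10]);',
                        "runtime error: index 10 out of bounds for 'int arr[10]'",
                        {"i": 10, "arr": "[1..10]"}))
```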

Fast Exact NPN Classification with Influence-aided Canonical Form

  • paper_url: http://arxiv.org/abs/2308.12311
  • repo_url: None
  • paper_authors: Yonghe Zhang, Liwei Ni, Jiaxi Zhang, Guojie Luo, Huawei Li, Shenggen Zheng
  • for: 这篇论文旨在改进 NPN 分类算法，即利用布尔影响力（Boolean influence）来加速 NPN 分类。
  • methods: 该论文提出了一种新的规范形（canonical form）及其计算算法，通过引入布尔影响力来简化 NPN 规范形的构造和计算。
  • results: 实验结果表明，利用布尔影响力可以大幅提高 NPN 分类速度，与此前的算法相比最高可达 5.5 倍的加速。
    Abstract NPN classification has many applications in the synthesis and verification of digital circuits. The canonical-form-based method is the most common approach, designing a canonical form as representative for the NPN equivalence class first and then computing the transformation function according to the canonical form. Most works use variable symmetries and several signatures, mainly based on the cofactor, to simplify the canonical form construction and computation. This paper describes a novel canonical form and its computation algorithm by introducing Boolean influence to NPN classification, which is a basic concept in analysis of Boolean functions. We show that influence is input-negation-independent, input-permutation-dependent, and has other structural information than previous signatures for NPN classification. Therefore, it is a significant ingredient in speeding up NPN classification. Experimental results prove that influence plays an important role in reducing the transformation enumeration in computing the canonical form. Compared with the state-of-the-art algorithm implemented in ABC, our influence-aided canonical form for exact NPN classification gains up to 5.5x speedup.
    摘要 NPN 分类在数字电路的综合与验证中有许多应用。基于规范形的方法是最常见的途径：先为 NPN 等价类设计一个代表性的规范形，再根据规范形计算变换函数。多数工作利用变量对称性和若干主要基于余因子（cofactor）的特征量来简化规范形的构造与计算。本文通过将布尔函数分析中的基本概念——布尔影响力（Boolean influence）——引入 NPN 分类，提出了一种新的规范形及其计算算法。我们证明影响力与输入取反无关、与输入置换有关，并且相比以往用于 NPN 分类的特征量包含更多的结构信息，因此是加速 NPN 分类的重要因素。实验结果表明，影响力在计算规范形时能显著减少需要枚举的变换数量。与 ABC 中实现的最先进算法相比，我们基于影响力的精确 NPN 分类规范形最高可获得 5.5 倍的加速。
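
A small sketch of the Boolean influence used as a signature here: the influence of input i is the probability that flipping x_i flips f(x). The demo checks the property the paper relies on, namely that influences are unchanged by input or output negation and only permuted by input permutation. The 3-input function is an arbitrary example.

```python
from itertools import product

def influences(f, n):
    infl = [0] * n
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = list(x)
            y[i] ^= 1
            infl[i] += f(*x) != f(*y)     # count assignments where flipping x_i flips f
    return [v / 2 ** n for v in infl]

def and_or(a, b, c):                      # an asymmetric 3-input function: (a AND b) OR c
    return (a & b) | c

print("influences           :", influences(and_or, 3))
# Negate input a: each variable's influence stays the same.
print("after negating a     :", influences(lambda a, b, c: and_or(1 - a, b, c), 3))
# Swap inputs a and c: the influence vector is permuted accordingly.
print("after swapping a, c  :", influences(lambda a, b, c: and_or(c, b, a), 3))
# Negate the output: influences are unchanged as well, so they survive the
# N and output-negation parts of NPN transforms and only track permutations.
print("after negating output:", influences(lambda a, b, c: 1 - and_or(a, b, c), 3))
```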

KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods

  • paper_url: http://arxiv.org/abs/2308.11863
  • repo_url: None
  • paper_authors: Antoine Nzeyimana
  • for: 这个论文是为了提高基尼亚拉各语音识别的精度和可靠性而写的。
  • methods: 这篇论文使用了自监督预训练、简单的课程学习安排以及半监督学习，以利用大量未标注的语音数据来提升语音识别性能。
  • results: 这个论文的实验结果显示,使用公共领域数据only,收集了一个新的studio质量语音dataset,然后使用这个清洁基线模型来评估和排序来自更加多样化和噪音的公共数据集中的示例,并在四个Successive Generation中应用半监督学习来标注和学习大量未标注的语音数据,最终实现了3.2%的单词错误率(WER)和15.9%的WER在Mozilla Common Voice数据集上,与当前最佳状态相比。此外,实验还表明,使用 syllabic 而不是字符基本的分词可以提高基尼亚拉各语音识别的性能。
    Abstract Despite recent availability of large transcribed Kinyarwanda speech data, achieving robust speech recognition for Kinyarwanda is still challenging. In this work, we show that using self-supervised pre-training, following a simple curriculum schedule during fine-tuning and using semi-supervised learning to leverage large unlabelled speech data significantly improve speech recognition performance for Kinyarwanda. Our approach focuses on using public domain data only. A new studio-quality speech dataset is collected from a public website, then used to train a clean baseline model. The clean baseline model is then used to rank examples from a more diverse and noisy public dataset, defining a simple curriculum training schedule. Finally, we apply semi-supervised learning to label and learn from large unlabelled data in four successive generations. Our final model achieves 3.2% word error rate (WER) on the new dataset and 15.9% WER on Mozilla Common Voice benchmark, which is state-of-the-art to the best of our knowledge. Our experiments also indicate that using syllabic rather than character-based tokenization results in better speech recognition performance for Kinyarwanda.
    摘要 尽管近来已有大规模带转写的 Kinyarwanda 语音数据可用，实现鲁棒的 Kinyarwanda 语音识别仍然具有挑战性。在这项工作中，我们表明：使用自监督预训练、在微调阶段遵循简单的课程学习安排，并利用半监督学习来利用大量未标注语音数据，可以显著提升 Kinyarwanda 语音识别性能。我们的方法仅使用公共领域数据。我们先从一个公开网站收集了一个新的录音室级质量语音数据集，用它训练一个干净的基线模型；再用该基线模型对一个更加多样且含噪的公共数据集中的样本进行排序，从而定义一个简单的课程训练安排；最后应用半监督学习，在四轮连续迭代中对大量未标注数据进行标注和学习。我们的最终模型在新数据集上取得 3.2% 的词错误率（WER），在 Mozilla Common Voice 基准上取得 15.9% 的 WER，据我们所知达到了当前最佳水平。实验还表明，使用基于音节而非字符的分词能带来更好的 Kinyarwanda 语音识别性能。

Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0

  • paper_url: http://arxiv.org/abs/2308.11854
  • repo_url: None
  • paper_authors: Anmol Chaure, Ashok Kumar Behera, Sudip Bhattacharya
  • for: 本研究旨在评估非线性回归模型在气候数据预测中的表现,以便为政策制定者提供有用的信息。
  • methods: 本研究使用了数据驱动机器学习模型作为气候模拟器,以减少计算机中的时间和碳脚印。在这个方向下,气候Bench(ClimateBench)是一个最近被整理的气候数据 benchmarking 集合,用于评估机器学习模型的表现。
  • results: 研究发现，尽管被视为基础方法，回归模型在气候模拟中具有多种优势：借助核技巧（kernel trick），回归模型能够捕捉复杂关系、提升预测能力。在所比较的非线性回归模型中，Gaussian Process Regressor 在气候场模拟研究常用的标准评估指标上表现最佳，但其空间和时间复杂度较高；Support Vector 与 Kernel Ridge 模型也给出了具有竞争力的结果，但存在一定的权衡。此外，作者还在积极研究组合核（composite kernels）与变分推断（variational inference）等技术，以进一步提升回归模型的性能，更有效地建模包括降水在内的复杂非线性模式。
    Abstract Climate projections using data driven machine learning models acting as emulators, is one of the prevailing areas of research to enable policy makers make informed decisions. Use of machine learning emulators as surrogates for computationally heavy GCM simulators reduces time and carbon footprints. In this direction, ClimateBench [1] is a recently curated benchmarking dataset for evaluating the performance of machine learning emulators designed for climate data. Recent studies have reported that despite being considered fundamental, regression models offer several advantages pertaining to climate emulations. In particular, by leveraging the kernel trick, regression models can capture complex relationships and improve their predictive capabilities. This study focuses on evaluating non-linear regression models using the aforementioned dataset. Specifically, we compare the emulation capabilities of three non-linear regression models. Among them, Gaussian Process Regressor demonstrates the best-in-class performance against standard evaluation metrics used for climate field emulation studies. However, Gaussian Process Regression suffers from being computational resource hungry in terms of space and time complexity. Alternatively, Support Vector and Kernel Ridge models also deliver competitive results and but there are certain trade-offs to be addressed. Additionally, we are actively investigating the performance of composite kernels and techniques such as variational inference to further enhance the performance of the regression models and effectively model complex non-linear patterns, including phenomena like precipitation.
    摘要 利用数据驱动的机器学习模型作为模拟器进行气候预估，是当前帮助政策制定者做出明智决策的主要研究方向之一。以机器学习模拟器替代计算开销巨大的 GCM 模拟器，可以减少时间与碳足迹。在这一方向上，ClimateBench 是最近整理的用于评估面向气候数据的机器学习模拟器性能的基准数据集。近期研究表明，尽管回归模型被视为基础方法，它们在气候模拟方面具有多种优势：借助核技巧，回归模型能够捕捉复杂关系、提升预测能力。本研究基于上述数据集评估非线性回归模型，具体比较了三种非线性回归模型的模拟能力。其中，Gaussian Process Regressor 在气候场模拟研究常用的标准评估指标上表现最佳，但其空间和时间复杂度较高，计算资源消耗大；Support Vector 与 Kernel Ridge 模型也给出了具有竞争力的结果，但存在一定需要处理的权衡。此外，我们还在积极研究组合核与变分推断等技术，以进一步提升回归模型的性能，并有效建模包括降水在内的复杂非线性模式。
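
A sketch comparing the three emulators discussed above — Gaussian process regression, SVR and kernel ridge — on a synthetic stand-in for a ClimateBench-style task (predicting a climate variable from forcing inputs). The data, kernel choices and scores are illustrative only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 3))          # e.g. CO2, CH4, aerosol forcings
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.2 * X[:, 2] + rng.normal(0, 0.1, 400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "GaussianProcess": GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), random_state=0),
    "SVR": SVR(C=10.0, gamma="scale"),
    "KernelRidge": KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name:16s} R^2 = {r2_score(y_te, model.predict(X_te)):.3f}")
```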

A deep reinforcement learning approach for real-time demand-responsive railway rescheduling to mitigate station overcrowding using mobile data

  • paper_url: http://arxiv.org/abs/2308.11849
  • repo_url: None
  • paper_authors: Enze Liu, Zhiyuan Lin, Judith Y. T. Wang, Hong Chen
  • for: 提供一种基于实时旅客流动数据的铁路规划调整方法,以应对铁路紧急事件引起的长期干扰。
  • methods: 使用深度优化学习(DRL)框架,结合实时旅客流动数据,对多个路线的列车时间表进行调整,以满足实时旅客需求,避免站点拥堵和列车过载。
  • results: 提出了一种基于实时旅客流动数据的铁路规划调整方法,可以有效地应对铁路紧急事件引起的长期干扰,并且能够满足实时旅客需求。
    Abstract Real-time railway rescheduling is a timely and flexible technique to automatically alter the operation schedule in response to time-varying conditions. Current research lacks data-driven approaches that capture real-time passenger mobility during railway disruptions, relying mostly on OD-based data and model-based methods for estimating demands of trains. Meanwhile, the schedule-updating principles for a long-term disruption overlook the uneven distribution of demand over time. To fill this gap, this paper proposes a demand-responsive approach by inferring real-world passenger mobility from mobile data (MD) to facilitate real-time rescheduling. Unlike network-level approaches, this paper focuses on a heavy-demand station upstream of the disrupted area. The objective is to reschedule all trains on multiple routes passing through this target station, which have been affected by a severe emergency event such as a natural disaster. Particular attention should be given to avoiding the accumulation of overcrowded passengers at this station, to prevent additional accidents arising from overcrowding. This research addresses the challenges associated with this scenario, including the dynamics of arriving and leaving of passengers, station overcrowding, rolling stock shortage, open-ended disruption duration, integrated rescheduling on multiple routes, and delays due to detours. A deep reinforcement learning (DRL) framework is proposed to determine the optimal rescheduled timetable, route stops, and rolling stock allocation, while considering real-time demand satisfaction, station overcrowding, train capacity utilization, and headway safety.
    摘要 实时铁路重调度是一种及时且灵活的技术，可在条件随时间变化时自动调整运营时刻表。现有研究大多依赖基于 OD 的数据和基于模型的方法来估计列车需求，缺乏能够在铁路中断期间捕捉实时乘客流动的数据驱动方法；同时，针对长期中断的时刻表更新原则也忽视了需求在时间上的不均匀分布。为填补这一空白，本文提出一种需求响应式方法，通过从手机数据（MD）中推断真实世界的乘客流动来支持实时重调度。不同于网络层面的方法，本文关注位于受扰区段上游的一个高需求车站，目标是对途经该车站、受自然灾害等严重突发事件影响的多条线路上的所有列车进行重调度，并特别注意避免乘客在该站过度聚集，以防拥挤引发的次生事故。本研究应对了这一情景下的诸多挑战，包括乘客到离站的动态变化、车站拥挤、车辆不足、中断持续时间不确定、多线路一体化重调度以及绕行带来的延误。我们提出了一个深度强化学习（DRL）框架，在考虑实时需求满足、车站拥挤、列车运能利用和行车间隔安全的同时，求解最优的重调度时刻表、停站方案和车辆分配。

SEA: Shareable and Explainable Attribution for Query-based Black-box Attacks

  • paper_url: http://arxiv.org/abs/2308.11845
  • repo_url: None
  • paper_authors: Yue Gao, Ilia Shumailov, Kassem Fawaz
    for:This paper aims to provide a novel security system for Machine Learning (ML) systems to characterize black-box attacks for forensic purposes and facilitate human-explainable intelligence sharing.methods:The proposed security system, called SEA, leverages Hidden Markov Models (HMMs) to attribute observed query sequences to known attacks, allowing for a comprehensive understanding of the attack’s progression.results:SEA is effective at attack attribution, even on its second occurrence, and is robust to adaptive strategies designed to evade forensics analysis. The explanations provided by SEA allow for the identification of specific minor implementation bugs in attack libraries, and the system achieves 90+% Top-1 and 95+% Top-3 accuracy in recognizing the same attack’s second occurrence.
    Abstract Machine Learning (ML) systems are vulnerable to adversarial examples, particularly those from query-based black-box attacks. Despite various efforts to detect and prevent such attacks, there is a need for a more comprehensive approach to logging, analyzing, and sharing evidence of attacks. While classic security benefits from well-established forensics and intelligence sharing, Machine Learning is yet to find a way to profile its attackers and share information about them. In response, this paper introduces SEA, a novel ML security system to characterize black-box attacks on ML systems for forensic purposes and to facilitate human-explainable intelligence sharing. SEA leverages the Hidden Markov Models framework to attribute the observed query sequence to known attacks. It thus understands the attack's progression rather than just focusing on the final adversarial examples. Our evaluations reveal that SEA is effective at attack attribution, even on their second occurrence, and is robust to adaptive strategies designed to evade forensics analysis. Interestingly, SEA's explanations of the attack behavior allow us even to fingerprint specific minor implementation bugs in attack libraries. For example, we discover that the SignOPT and Square attacks implementation in ART v1.14 sends over 50% specific zero difference queries. We thoroughly evaluate SEA on a variety of settings and demonstrate that it can recognize the same attack's second occurrence with 90+% Top-1 and 95+% Top-3 accuracy.
    摘要 机器学习（ML）系统容易受到对抗样本的攻击，特别是基于查询的黑盒攻击。尽管已有各种检测和防御此类攻击的努力，仍然需要一种更全面的方法来记录、分析和共享攻击证据。传统安全领域受益于成熟的取证和情报共享机制，而机器学习领域尚未找到刻画攻击者并共享其信息的方法。为此，本文提出 SEA——一种新颖的 ML 安全系统，用于出于取证目的刻画针对 ML 系统的黑盒攻击，并促进可由人理解的情报共享。SEA 利用隐马尔可夫模型（HMM）框架，将观测到的查询序列归因于已知攻击，因此它理解的是攻击的演进过程，而不仅仅关注最终的对抗样本。评估表明，SEA 在攻击归因方面十分有效，即使攻击是第二次出现也能识别，并且对旨在规避取证分析的自适应策略具有鲁棒性。有趣的是，SEA 对攻击行为的解释甚至能帮助我们定位攻击库中特定的细小实现缺陷：例如，我们发现 ART v1.14 中 SignOPT 和 Square 攻击的实现会发送超过 50% 的特定零差查询。我们在多种设置下对 SEA 进行了全面评估，结果显示它能以 90% 以上的 Top-1 准确率和 95% 以上的 Top-3 准确率识别同一攻击的第二次出现。
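
A sketch of HMM-based attribution in the spirit of SEA: each known attack is summarised by a small discrete HMM over per-query features, and a new query sequence is attributed to the attack whose HMM assigns it the highest likelihood (forward algorithm in log space). The two HMMs below are hand-made stand-ins, not models fitted to real attacks.

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of an observation sequence under a discrete HMM."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = log_B[:, o] + np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
    return np.logaddexp.reduce(alpha)

def make_hmm(A, B, pi):
    return tuple(np.log(np.asarray(m)) for m in (pi, A, B))

# Observation symbols could encode e.g. "zero-difference query", "small step",
# "large step", derived from consecutive queries to the victim model.
attacks = {
    "SignOPT-like": make_hmm(A=[[0.8, 0.2], [0.3, 0.7]],
                             B=[[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]],
                             pi=[0.9, 0.1]),
    "Square-like":  make_hmm(A=[[0.5, 0.5], [0.5, 0.5]],
                             B=[[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
                             pi=[0.5, 0.5]),
}

observed = [0, 0, 1, 0, 2, 0, 0, 1]      # a new query sequence, symbol-encoded
scores = {name: log_forward(observed, *hmm) for name, hmm in attacks.items()}
print(max(scores, key=scores.get), scores)
```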

${\rm E}(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.11842
  • repo_url: https://github.com/dchen48/e3ac
  • paper_authors: Dingyang Chen, Qi Zhang
  • for: 本研究旨在利用合作多智能体强化学习(MARL)问题中固有的欧几里得对称性,以提升多智能体在各类应用中的合作行为。
  • methods: 我们采用嵌入对称约束作为归纳偏置的神经网络架构,用于多智能体actor-critic方法。
  • results: 实验结果表明,在多种合作MARL基准中,我们的方法可以实现更高的性能和更好的泛化能力,包括零样本学习和迁移学习。
    Abstract Identification and analysis of symmetrical patterns in the natural world have led to significant discoveries across various scientific fields, such as the formulation of gravitational laws in physics and advancements in the study of chemical structures. In this paper, we focus on exploiting Euclidean symmetries inherent in certain cooperative multi-agent reinforcement learning (MARL) problems and prevalent in many applications. We begin by formally characterizing a subclass of Markov games with a general notion of symmetries that admits the existence of symmetric optimal values and policies. Motivated by these properties, we design neural network architectures with symmetric constraints embedded as an inductive bias for multi-agent actor-critic methods. This inductive bias results in superior performance in various cooperative MARL benchmarks and impressive generalization capabilities such as zero-shot learning and transfer learning in unseen scenarios with repeated symmetric patterns. The code is available at: https://github.com/dchen48/E3AC.
    摘要 对自然界中对称模式的识别与分析曾在多个科学领域带来重大发现,例如物理学中引力定律的提出以及化学结构研究的进展。本文关注利用某些合作多智能体强化学习(MARL)问题中固有、且在许多应用中普遍存在的欧几里得对称性。我们首先以一种一般的对称性概念形式化刻画了一类马尔可夫博弈,证明其存在对称的最优值函数与策略。受这些性质启发,我们设计了嵌入对称约束作为归纳偏置的神经网络架构,用于多智能体actor-critic方法。这种归纳偏置在多个合作MARL基准上带来了更优的性能,并展现出令人印象深刻的泛化能力,例如在具有重复对称模式的未见场景中的零样本学习与迁移学习。代码见:https://github.com/dchen48/E3AC。

A Survey for Federated Learning Evaluations: Goals and Measures

  • paper_url: http://arxiv.org/abs/2308.11841
  • repo_url: None
  • paper_authors: Di Chai, Leye Wang, Liu Yang, Junxue Zhang, Kai Chen, Qiang Yang
  • for: 评估 Federated Learning(FL)系统的用途,包括评估FL的实用性、效率和安全性。
  • methods: 论文综述了现有研究中采用的主要评估目标,以及对每个目标的评估指标。同时,文章还介绍了FedEval,一个开源的评估平台,可以为FL算法提供标准化和全面的评估框架。
  • results: 文章评估了FL算法的实用性、效率和安全性,并提出了一些评估挑战和未来研究方向。
    Abstract Evaluation is a systematic approach to assessing how well a system achieves its intended purpose. Federated learning (FL) is a novel paradigm for privacy-preserving machine learning that allows multiple parties to collaboratively train models without sharing sensitive data. However, evaluating FL is challenging due to its interdisciplinary nature and diverse goals, such as utility, efficiency, and security. In this survey, we first review the major evaluation goals adopted in the existing studies and then explore the evaluation metrics used for each goal. We also introduce FedEval, an open-source platform that provides a standardized and comprehensive evaluation framework for FL algorithms in terms of their utility, efficiency, and security. Finally, we discuss several challenges and future research directions for FL evaluation.
    摘要 评估是一种系统的方法,用于评估系统是否达到其目标。联邦学习(FL)是一种新的隐私保护机器学习方法,允许多方共同训练模型而无需分享敏感数据。然而,评估FL具有多种目标和特点,如实用性、效率和安全性,这使得评估变得具有挑战性。在这篇文章中,我们首先评审了现有研究中采用的主要评估目标,然后探讨每个目标的评估指标。我们还介绍了FedEval,一个开源平台,提供了对FL算法的标准化和完整的评估框架,包括实用性、效率和安全性。最后,我们讨论了FL评估中的一些挑战和未来研究方向。

A Benchmark Study on Calibration

  • paper_url: http://arxiv.org/abs/2308.11838
  • repo_url: https://github.com/Aryia-Behroziuan/history1
  • paper_authors: Linwei Tao, Younan Zhu, Haolan Guo, Minjing Dong, Chang Xu
  • for: This paper aims to explore calibration properties within Neural Architecture Search (NAS) and answer several longstanding questions in the field, such as whether model calibration can be generalized across different tasks, whether robustness can be used as a calibration measurement, and how calibration interacts with accuracy.
  • methods: The paper leverages the NAS search space to create a model calibration dataset that evaluates 117,702 unique neural networks across 90 bin-based and 12 additional calibration measurements. The authors use this dataset to explore calibration properties and answer the aforementioned questions.
  • results: The paper provides a comprehensive analysis of calibration properties within NAS, including the impact of bin size on calibration measurement and the beneficial architectural designs for calibration. The authors also explore the relationship between calibration and accuracy, and investigate the robustness of calibration metrics.
    Abstract Deep neural networks are increasingly utilized in various machine learning tasks. However, as these models grow in complexity, they often face calibration issues, despite enhanced prediction accuracy. Many studies have endeavored to improve calibration performance through data preprocessing, the use of specific loss functions, and training frameworks. Yet, investigations into calibration properties have been somewhat overlooked. Our study leverages the Neural Architecture Search (NAS) search space, offering an exhaustive model architecture space for thorough calibration properties exploration. We specifically create a model calibration dataset. This dataset evaluates 90 bin-based and 12 additional calibration measurements across 117,702 unique neural networks within the widely employed NATS-Bench search space. Our analysis aims to answer several longstanding questions in the field, using our proposed dataset: (i) Can model calibration be generalized across different tasks? (ii) Can robustness be used as a calibration measurement? (iii) How reliable are calibration metrics? (iv) Does a post-hoc calibration method affect all models uniformly? (v) How does calibration interact with accuracy? (vi) What is the impact of bin size on calibration measurement? (vii) Which architectural designs are beneficial for calibration? Additionally, our study bridges an existing gap by exploring calibration within NAS. By providing this dataset, we enable further research into NAS calibration. As far as we are aware, our research represents the first large-scale investigation into calibration properties and the premier study of calibration issues within NAS.
    摘要 深度神经网络被越来越多地用于各类机器学习任务。然而,随着模型复杂度的增加,它们即使预测精度提升,也常常面临校准问题。许多研究尝试通过数据预处理、特定损失函数和训练框架来改善校准表现,但对校准性质本身的考察相对不足。我们的研究利用神经架构搜索(NAS)的搜索空间,提供了一个详尽的模型架构空间,用于全面探索校准性质。我们专门构建了一个模型校准数据集,在广泛使用的NATS-Bench搜索空间中对117,702个不同的神经网络评估了90种基于分箱的校准度量以及12种额外的校准度量。基于该数据集,我们试图回答该领域若干长期悬而未决的问题:(i)模型校准能否在不同任务间泛化?(ii)鲁棒性能否用作校准度量?(iii)校准指标有多可靠?(iv)事后校准方法是否对所有模型产生一致的影响?(v)校准与精度如何相互作用?(vi)分箱大小对校准度量有何影响?(vii)哪些架构设计有利于校准?此外,本研究还填补了在NAS中探索校准的空白;通过公开该数据集,我们希望推动NAS校准方向的后续研究。据我们所知,这是首个针对校准性质的大规模研究,也是首个研究NAS中校准问题的工作。
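
For context, the bin-based calibration measurements discussed above typically reduce to quantities like the expected calibration error (ECE); a minimal NumPy sketch on synthetic predictions (not the paper's dataset or code) is shown below.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Standard bin-based ECE: bin-weighted gap between mean confidence and accuracy."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 1000)
correct = (rng.uniform(size=1000) < conf * 0.9).astype(float)  # a slightly overconfident model
print(expected_calibration_error(conf, correct))
```

Changing `n_bins` is exactly the "impact of bin size" question the benchmark investigates.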

Characterizing normal perinatal development of the human brain structural connectivity

  • paper_url: http://arxiv.org/abs/2308.11836
  • repo_url: None
  • paper_authors: Yihan Wu, Lana Vasung, Camilo Calixto, Ali Gholipour, Davood Karimi
  • for: 这项研究用于刻画围产期脑结构连接组的发育趋势,以及这些连接组在脑发育和环境因素影响下的作用。
  • methods: 这项研究使用基于时空平均的计算框架,以确定围产期脑结构连接组指标的规范基线。
  • results: 研究发现,在孕后33-44周期间,脑结构连接组发育呈现明显而强烈的趋势:全局与局部效率增加,特征路径长度减少,脑叶内、脑叶间以及两半球内外的连接普遍增强。
    Abstract Early brain development is characterized by the formation of a highly organized structural connectome. The interconnected nature of this connectome underlies the brain's cognitive abilities and influences its response to diseases and environmental factors. Hence, quantitative assessment of structural connectivity in the perinatal stage is useful for studying normal and abnormal neurodevelopment. However, estimation of the connectome from diffusion MRI data involves complex computations. For the perinatal period, these computations are further challenged by the rapid brain development and imaging difficulties. Combined with high inter-subject variability, these factors make it difficult to chart the normal development of the structural connectome. As a result, there is a lack of reliable normative baselines of structural connectivity metrics at this critical stage in brain development. In this study, we developed a computational framework, based on spatio-temporal averaging, for determining such baselines. We used this framework to analyze the structural connectivity between 33 and 44 postmenstrual weeks using data from 166 subjects. Our results unveiled clear and strong trends in the development of structural connectivity in perinatal stage. Connection weighting based on fractional anisotropy and neurite density produced the most consistent results. We observed increases in global and local efficiency, a decrease in characteristic path length, and widespread strengthening of the connections within and across brain lobes and hemispheres. We also observed asymmetry patterns that were consistent between different connection weighting approaches. The new computational method and results are useful for assessing normal and abnormal development of the structural connectome early in life.
    摘要 早期大脑发育的特征是形成高度组织化的结构连接组。连接组的互联特性是大脑认知能力的基础,也影响大脑对疾病和环境因素的反应。因此,对围产期结构连接性的定量评估有助于研究正常与异常的神经发育。然而,从弥散磁共振成像数据估计连接组涉及复杂的计算;在围产期,大脑的快速发育和成像困难使这些计算更具挑战性。再加上个体间的高度差异,这些因素使得刻画结构连接组的正常发育轨迹十分困难,导致这一关键发育阶段缺乏可靠的结构连接性指标规范基线。在本研究中,我们开发了一个基于时空平均的计算框架来确定此类基线,并利用166名被试的数据分析了孕后33至44周之间的结构连接性。结果揭示了围产期结构连接性发育中清晰而强烈的趋势。基于分数各向异性和神经突密度的连接加权给出了最一致的结果。我们观察到全局与局部效率增加、特征路径长度减少,以及脑叶内、脑叶间和两半球之间连接的普遍增强;不同连接加权方法下还观察到一致的不对称模式。该计算方法与结果有助于评估生命早期结构连接组的正常与异常发育。
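
The graph metrics tracked in this study (global efficiency, local efficiency, characteristic path length) are standard and can be computed with networkx; the sketch below uses a small synthetic unweighted graph purely as a stand-in for a subject-averaged connectome.

```python
import networkx as nx

# Toy stand-in for a connectome (the paper uses weighted, subject-averaged brain graphs).
G = nx.connected_watts_strogatz_graph(n=40, k=4, p=0.2, seed=0)

print("global efficiency:", nx.global_efficiency(G))
print("local efficiency:", nx.local_efficiency(G))
print("characteristic path length:", nx.average_shortest_path_length(G))
```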

Performance Comparison and Implementation of Bayesian Variants for Network Intrusion Detection

  • paper_url: http://arxiv.org/abs/2308.11834
  • repo_url: None
  • paper_authors: Tosin Ige, Christopher Kiekintveld
  • for: 本研究旨在对网络入侵异常检测中朴素贝叶斯分类器的各个变体进行比较研究,并考察每个变体的假设对其性能的影响。
  • methods: 本研究实现并比较了朴素贝叶斯分类器的多种变体,包括Multinomial、Bernoulli和Gaussian。
  • results: 实验结果表明,Bernoulli的准确率为69.9%(训练集71%),Multinomial的准确率为31.2%(训练集31.2%),而Gaussian的准确率为81.69%(训练集82.84%)。进一步调查发现,每种朴素贝叶斯变体的性能主要取决于其假设:Gaussian分类器表现最佳,因为它假设特征服从连续的正态分布;而Multinomial分类器表现不佳,因为它只假设离散的多项分布。
    Abstract Bayesian classifiers perform well when each of the features is completely independent of the other which is not always valid in real world application. The aim of this study is to implement and compare the performances of each variant of Bayesian classifier (Multinomial, Bernoulli, and Gaussian) on anomaly detection in network intrusion, and to investigate whether there is any association between each variant assumption and their performance. Our investigation showed that each variant of Bayesian algorithm blindly follows its assumption regardless of feature property, and that the assumption is the single most important factor that influences their accuracy. Experimental results show that Bernoulli has accuracy of 69.9% test (71% train), Multinomial has accuracy of 31.2% test (31.2% train), while Gaussian has accuracy of 81.69% test (82.84% train). Going deeper, we investigated and found that each Naive Bayes variants performances and accuracy is largely due to each classifier assumption, Gaussian classifier performed best on anomaly detection due to its assumption that features follow normal distributions which are continuous, while multinomial classifier have a dismal performance as it simply assumes discreet and multinomial distribution.
    摘要 贝叶斯分类器在各特征彼此完全独立时表现良好,但这一条件在实际应用中并不总是成立。本研究的目的是实现并比较贝叶斯分类器的各个变体(Multinomial、Bernoulli和Gaussian)在网络入侵异常检测上的性能,并考察各变体的假设与其性能之间的关系。我们的调查发现,贝叶斯算法的每个变体都会不加区分地遵循其假设,而不论特征的实际性质如何,并且该假设是影响其准确率的最重要因素。实验结果表明,Bernoulli的准确率为69.9%(训练集71%),Multinomial的准确率为31.2%(训练集31.2%),而Gaussian的准确率为81.69%(训练集82.84%)。进一步调查发现,各朴素贝叶斯变体的性能和准确率主要源于各分类器的假设:Gaussian分类器在异常检测中表现最佳,因为它假设特征服从连续的正态分布;而Multinomial分类器表现不佳,因为它只假设离散的多项分布。
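
A hedged sketch of the kind of comparison the paper runs, using scikit-learn's three Naive Bayes variants on synthetic data rather than the intrusion-detection dataset; the point is only to show how each variant's distributional assumption is baked into the estimator.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)
X = MinMaxScaler().fit_transform(X)          # MultinomialNB requires non-negative features
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# Each variant blindly applies its own distributional assumption to the same features.
for clf in (GaussianNB(), BernoulliNB(), MultinomialNB()):
    acc = clf.fit(Xtr, ytr).score(Xte, yte)
    print(type(clf).__name__, round(acc, 3))
```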

Exploring the Effectiveness of GPT Models in Test-Taking: A Case Study of the Driver’s License Knowledge Test

  • paper_url: http://arxiv.org/abs/2308.11827
  • repo_url: None
  • paper_authors: Saba Rahimi, Tucker Balch, Manuela Veloso
  • for: 这项研究的目的是使用不在模型训练数据中包含的信息源来使GPT模型回答问题。
  • methods: 这种方法包括Contextual信息的预处理、查询和Context的Embedding的构建、基于Context embedding的提问、以及使用GPT模型回答问题。
  • results: 在一个控制测试场景中,使用加利福尼亚驾驶手册作为信息源,GPT-3模型在50个样本驾驶知识测试题上达到了96%的通过率,而无Context情况下的通过率为82%。然而,即使提供Context,模型仍然无法正确回答一些问题,表明还有改进空间。研究还研究了提问长度和Context格式对模型表现的影响。总的来说,这项研究提供了GPT模型在问题回答任务中的限制和改进空间。
    Abstract Large language models such as Open AI's Generative Pre-trained Transformer (GPT) models are proficient at answering questions, but their knowledge is confined to the information present in their training data. This limitation renders them ineffective when confronted with questions about recent developments or non-public documents. Our research proposes a method that enables GPT models to answer questions by employing context from an information source not previously included in their training data. The methodology includes preprocessing of contextual information, the embedding of contexts and queries, constructing prompt through the integration of context embeddings, and generating answers using GPT models. We applied this method in a controlled test scenario using the California Driver's Handbook as the information source. The GPT-3 model achieved a 96% passing score on a set of 50 sample driving knowledge test questions. In contrast, without context, the model's passing score fell to 82%. However, the model still fails to answer some questions correctly even with providing library of context, highlighting room for improvement. The research also examined the impact of prompt length and context format, on the model's performance. Overall, the study provides insights into the limitations and potential improvements for GPT models in question-answering tasks.
    摘要 大型语言模型如Open AI的生成预训练Transformer(GPT)模型在回答问题方面表现出色,但它们的知识仅仅受训练数据的限制。这个限制使得它们在面对最新的发展或非公开文档时无法回答问题。我们的研究提出了一种方法,让GPT模型通过不包括在训练数据中的信息来回答问题。这个方法包括对背景信息进行预处理、将查询和背景信息转换为嵌入、通过嵌入的集成建立提示,并使用GPT模型产生答案。我们在一个控制过的测试场景中将这种方法应用到加州驾照手册作为资料来源。GPT-3模型在50个验证驾照知识问题中获得96%的得分,而无 Context的模型仅获得82%的得分。然而,即使提供库存中的背景信息,模型仍然无法回答一些问题正确,这显示了改进的空间。研究也检查了提示长度和背景信息格式对模型表现的影响。总的来说,这些研究获得了GPT模型在问题回答任务中的限制和改进的可能性。
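
A rough sketch of the retrieve-then-prompt pipeline described above, with hypothetical handbook passages and a simple TF-IDF retriever standing in for the paper's embedding model; the resulting prompt string would then be sent to a GPT model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical passages; in the paper the source is the California Driver's Handbook.
passages = [
    "Always stop completely at a stop sign before the limit line.",
    "A solid yellow line means do not pass.",
    "Yield to pedestrians in marked and unmarked crosswalks.",
]
question = "What should a driver do at a stop sign?"

vec = TfidfVectorizer().fit(passages + [question])
sims = cosine_similarity(vec.transform([question]), vec.transform(passages))[0]
context = passages[sims.argmax()]            # pick the most relevant passage as context

prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer using only the context."
print(prompt)                                # this prompt is what gets sent to the GPT model
```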

Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks

  • paper_url: http://arxiv.org/abs/2308.11825
  • repo_url: https://github.com/xiexi1990/iccad-accel-gnn
  • paper_authors: Xi Xie, Hongwu Peng, Amit Hasan, Shaoyi Huang, Jiahui Zhao, Haowen Fang, Wei Zhang, Tong Geng, Omer Khan, Caiwen Ding
  • for: 提高GCN的计算效率和多级内存效率,以便更好地从图数据中提取潜在信息。
  • methods: 使用轻量级度排序、块级分区策略和组合线程束(warp)策略来提高共享内存局部性和工作负载均衡,并通过内存合并访问与对齐来优化内存带宽。
  • results: 对于18个 benchmark 图,Accel-GCN比cuSPARSE、GNNAdvisor和graph-BLAST高效性提高1.17倍、1.86倍和2.94倍。
    Abstract Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains, yet their acceleration on mainstream GPUs is challenged by workload imbalance and memory access irregularity. To address these challenges, we present Accel-GCN, a GPU accelerator architecture for GCNs. The design of Accel-GCN encompasses: (i) a lightweight degree sorting stage to group nodes with similar degree; (ii) a block-level partition strategy that dynamically adjusts warp workload sizes, enhancing shared memory locality and workload balance, and reducing metadata overhead compared to designs like GNNAdvisor; (iii) a combined warp strategy that improves memory coalescing and computational parallelism in the column dimension of dense matrices. Utilizing these principles, we formulated a kernel for sparse matrix multiplication (SpMM) in GCNs that employs block-level partitioning and combined warp strategy. This approach augments performance and multi-level memory efficiency and optimizes memory bandwidth by exploiting memory coalescing and alignment. Evaluation of Accel-GCN across 18 benchmark graphs reveals that it outperforms cuSPARSE, GNNAdvisor, and graph-BLAST by factors of 1.17 times, 1.86 times, and 2.94 times respectively. The results underscore Accel-GCN as an effective solution for enhancing GCN computational efficiency.
    摘要 图卷积网络(GCN)在从各领域的图数据中提取潜在信息方面发挥着关键作用,但其在主流GPU上的加速面临工作负载不均衡和内存访问不规则的挑战。为解决这些挑战,我们提出了Accel-GCN,一种面向GCN的GPU加速器架构。Accel-GCN的设计包括:1. 轻量级的度排序阶段,将度数相近的节点分组;2. 块级分区策略,动态调整线程束(warp)的工作量大小,提高共享内存局部性和工作负载均衡,并相比GNNAdvisor等设计减少元数据开销;3. 组合线程束策略,提升稠密矩阵列维度上的内存合并访问与计算并行性。基于这些原则,我们为GCN中的稀疏矩阵乘法(SpMM)实现了一个采用块级分区和组合线程束策略的内核。该方法提升了性能和多级内存效率,并通过内存合并访问与对齐优化了内存带宽。在18个基准图上的评估表明,Accel-GCN分别比cuSPARSE、GNNAdvisor和graph-BLAST快1.17倍、1.86倍和2.94倍。这些结果证明Accel-GCN是提升GCN计算效率的有效方案。
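
The degree-sorting stage can be illustrated independently of the CUDA kernel: the sketch below (synthetic, power-law-like degrees; not the paper's implementation) groups nodes of similar degree into fixed-size blocks so that row workloads within a block stay balanced.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, block = 1024, 32
degrees = rng.zipf(1.5, size=n_nodes).clip(max=256)   # skewed node degrees, illustrative only

# Lightweight degree sorting: nodes with similar degree land in the same block,
# so the warps assigned to one block see balanced row workloads in the SpMM kernel.
order = np.argsort(degrees)
blocks = [order[i:i + block] for i in range(0, n_nodes, block)]
imbalance = [degrees[b].max() - degrees[b].min() for b in blocks]
print("max intra-block degree spread:", max(imbalance))
```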

PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification

  • paper_url: http://arxiv.org/abs/2308.11822
  • repo_url: https://github.com/xaiveryuan/patchbackdoor
  • paper_authors: Yizhen Yuan, Rui Kong, Shenghao Xie, Yuanchun Li, Yunxin Liu
  • for: 防止深度学习系统中的攻击,特别是在安全关键场景中。
  • methods: 使用一个特制的补丁(namely backdoor patch),将其置于前置于摄像头,并将其与输入图像一起传输给模型。
  • results: 在实验中,patchbackdoor可以在常见的深度学习模型(VGG、MobileNet、ResNet)上实现攻击成功率为93%到99%。此外,我们在实际应用中也证明了攻击的可行性。
    Abstract Backdoor attack is a major threat to deep learning systems in safety-critical scenarios, which aims to trigger misbehavior of neural network models under attacker-controlled conditions. However, most backdoor attacks have to modify the neural network models through training with poisoned data and/or direct model editing, which leads to a common but false belief that backdoor attack can be easily avoided by properly protecting the model. In this paper, we show that backdoor attacks can be achieved without any model modification. Instead of injecting backdoor logic into the training data or the model, we propose to place a carefully-designed patch (namely backdoor patch) in front of the camera, which is fed into the model together with the input images. The patch can be trained to behave normally at most of the time, while producing wrong prediction when the input image contains an attacker-controlled trigger object. Our main techniques include an effective training method to generate the backdoor patch and a digital-physical transformation modeling method to enhance the feasibility of the patch in real deployments. Extensive experiments show that PatchBackdoor can be applied to common deep learning models (VGG, MobileNet, ResNet) with an attack success rate of 93% to 99% on classification tasks. Moreover, we implement PatchBackdoor in real-world scenarios and show that the attack is still threatening.
    摘要 深度学习系统在安全关键场景中面临着严重的后门攻击威胁,这种攻击的目标是在攻击者控制的条件下让神经网络模型产生不良行为。然而,大多数后门攻击需要通过使用被投毒的数据进行训练和/或直接修改模型来实现,这导致了一种常见但错误的观念,即只要妥善保护模型就能轻易避免后门攻击。在这篇论文中,我们展示了无需修改模型也能实现后门攻击。与在训练数据或模型中注入后门逻辑不同,我们提议在摄像头前放置一个精心设计的贴片(即后门贴片),该贴片与输入图像一起被送入模型。贴片可以被训练成在大多数时间内表现正常,而当输入图像中出现攻击者控制的触发物体时产生错误预测。我们的主要技术包括生成后门贴片的有效训练方法,以及增强贴片在实际部署中可行性的数字-物理变换建模方法。大量实验表明,PatchBackdoor可应用于常见的深度学习模型(VGG、MobileNet、ResNet),在分类任务上攻击成功率达93%至99%。此外,我们在真实场景中实现了PatchBackdoor,并证明该攻击依然具有威胁性。

Mitigating Health Disparity on Biased Electronic Health Records via Deconfounder

  • paper_url: http://arxiv.org/abs/2308.11819
  • repo_url: None
  • paper_authors: Zheng Liu, Xiaohan Li, Philip Yu
  • for: 这篇论文旨在解决临床数据模型中的公平问题,特别是在电子健康记录(EHR)上。因为EHR具有复杂的潜在结构和可能的选择偏见,因此需要同时维护健康差异和模型的总准确性。
  • methods: 本论文提出了一种名为“公平纵向医疗去混杂模型”(FLMD)的新模型,旨在在纵向EHR建模中同时兼顾公平性和整体准确性。借鉴去混杂(deconfounder)理论,FLMD采用两阶段训练过程:第一阶段,FLMD为每次就诊捕捉未观测的混杂因素,这些因素代表观测EHR之外的潜在医疗因素,例如患者基因型和生活习惯;第二阶段,FLMD将学习到的潜在表示与其他相关特征结合进行预测。通过引入合适的公平性准则(如反事实公平),FLMD在保持高预测精度的同时最小化健康差异。
  • results: 在两个真实世界EHR数据集上的实验结果表明,FLMD在公平性和准确性方面均优于基线方法及其各个变体。此外,FLMD在扰动/不平衡数据集和合成数据集上同样表现出色,显示了其在不同设定下的优越性。
    Abstract The fairness issue of clinical data modeling, especially on Electronic Health Records (EHRs), is of utmost importance due to EHR's complex latent structure and potential selection bias. It is frequently necessary to mitigate health disparity while keeping the model's overall accuracy in practice. However, traditional methods often encounter the trade-off between accuracy and fairness, as they fail to capture the underlying factors beyond observed data. To tackle this challenge, we propose a novel model called Fair Longitudinal Medical Deconfounder (FLMD) that aims to achieve both fairness and accuracy in longitudinal Electronic Health Records (EHR) modeling. Drawing inspiration from the deconfounder theory, FLMD employs a two-stage training process. In the first stage, FLMD captures unobserved confounders for each encounter, which effectively represents underlying medical factors beyond observed EHR, such as patient genotypes and lifestyle habits. This unobserved confounder is crucial for addressing the accuracy/fairness dilemma. In the second stage, FLMD combines the learned latent representation with other relevant features to make predictions. By incorporating appropriate fairness criteria, such as counterfactual fairness, FLMD ensures that it maintains high prediction accuracy while simultaneously minimizing health disparities. We conducted comprehensive experiments on two real-world EHR datasets to demonstrate the effectiveness of FLMD. Apart from the comparison of baseline methods and FLMD variants in terms of fairness and accuracy, we assessed the performance of all models on disturbed/imbalanced and synthetic datasets to showcase the superiority of FLMD across different settings and provide valuable insights into its capabilities.
    摘要 公平性问题在医疗数据建模中(尤其是电子健康记录,EHR)至关重要,因为EHR具有复杂的潜在结构和可能的选择偏倚。实践中,常常需要在缓解健康差异的同时保持模型的整体准确性。然而,传统方法往往面临准确性与公平性之间的取舍,因为它们无法捕捉观测数据之外的潜在因素。为解决这一挑战,我们提出了一种新模型,即公平纵向医疗去混杂模型(FLMD),旨在在纵向EHR建模中同时实现准确性和公平性。受去混杂(deconfounder)理论启发,FLMD采用两阶段训练过程。在第一阶段,FLMD为每次就诊捕捉未观测的混杂因素,这些因素对应观测EHR之外的潜在医疗因素,例如患者的基因型和生活习惯;这些未观测混杂因素是化解准确性/公平性两难的关键。在第二阶段,FLMD将学习到的潜在表示与其他相关特征结合进行预测。通过引入合适的公平性准则(如反事实公平),FLMD在保持高预测准确性的同时最小化健康差异。我们在两个真实EHR数据集上进行了广泛实验,以证明FLMD的有效性。除了在公平性和准确性方面与基线方法及FLMD变体进行比较外,我们还评估了所有模型在扰动/不平衡数据集和合成数据集上的性能,以展示FLMD在不同设定下的优势并提供有价值的洞见。

Incorporating Nonlocal Traffic Flow Model in Physics-informed Neural Networks

  • paper_url: http://arxiv.org/abs/2308.11818
  • repo_url: None
  • paper_authors: Archie J. Huang, Animesh Biswas, Shaurya Agarwal
  • for: 提高交通状况估算精度,提高交通管理策略效果
  • methods: 利用非本地LWR模型,physics-informed深度学习框架,提高交通状况估算精度
  • results: 相比基线方法提高了交通状态估计精度,能够更好地支持交通管理策略
    Abstract This research contributes to the advancement of traffic state estimation methods by leveraging the benefits of the nonlocal LWR model within a physics-informed deep learning framework. The classical LWR model, while useful, falls short of accurately representing real-world traffic flows. The nonlocal LWR model addresses this limitation by considering the speed as a weighted mean of the downstream traffic density. In this paper, we propose a novel PIDL framework that incorporates the nonlocal LWR model. We introduce both fixed-length and variable-length kernels and develop the required mathematics. The proposed PIDL framework undergoes a comprehensive evaluation, including various convolutional kernels and look-ahead windows, using data from the NGSIM and CitySim datasets. The results demonstrate improvements over the baseline PIDL approach using the local LWR model. The findings highlight the potential of the proposed approach to enhance the accuracy and reliability of traffic state estimation, enabling more effective traffic management strategies.
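
The nonlocal ingredient is the speed depending on a weighted mean of downstream density; a NumPy sketch with an assumed linearly decaying kernel and a Greenshields-type speed closure (both illustrative choices, not necessarily the paper's) is given below.

```python
import numpy as np

# Discretized road with density rho(x) in [0, 1]; eta is the downstream look-ahead horizon.
dx, eta = 0.01, 0.2
x = np.arange(0.0, 5.0, dx)
rho = 0.3 + 0.2 * np.exp(-((x - 2.5) ** 2) / 0.1)     # a congestion bump

def nonlocal_density(rho, dx, eta):
    """Weighted mean of *downstream* density with a linearly decaying kernel."""
    n = int(eta / dx)
    w = np.linspace(1.0, 0.0, n, endpoint=False)
    w /= w.sum()
    padded = np.concatenate([rho, np.full(n, rho[-1])])   # extend the last value past the boundary
    return np.array([w @ padded[i + 1:i + 1 + n] for i in range(len(rho))])

v_free = 1.0
speed = v_free * (1.0 - nonlocal_density(rho, dx, eta))   # Greenshields-type closure
```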

Evaluation of Deep Neural Operator Models toward Ocean Forecasting

  • paper_url: http://arxiv.org/abs/2308.11814
  • repo_url: None
  • paper_authors: Ellery Rajagopal, Anantha N. S. Babu, Tony Ryu, Patrick J. Haley Jr., Chris Mirabito, Pierre F. J. Lermusiaux
  • for: 研究数据驱动的深度学习模型在时间序列预测中的可用性,特别是在大气与海洋领域,乃至更广泛的流体研究社区。
  • methods: 使用数据驱动的深度神经算子模型来重现和预测经典流体流动以及真实海洋动力学模拟。
  • results: 训练好的深度神经算子模型可以预测理想化的周期性涡旋脱落,并在真实海洋表层流动的初步研究中显示出一定的能力和潜力。
    Abstract Data-driven, deep-learning modeling frameworks have been recently developed for forecasting time series data. Such machine learning models may be useful in multiple domains including the atmospheric and oceanic ones, and in general, the larger fluids community. The present work investigates the possible effectiveness of such deep neural operator models for reproducing and predicting classic fluid flows and simulations of realistic ocean dynamics. We first briefly evaluate the capabilities of such deep neural operator models when trained on a simulated two-dimensional fluid flow past a cylinder. We then investigate their application to forecasting ocean surface circulation in the Middle Atlantic Bight and Massachusetts Bay, learning from high-resolution data-assimilative simulations employed for real sea experiments. We confirm that trained deep neural operator models are capable of predicting idealized periodic eddy shedding. For realistic ocean surface flows and our preliminary study, they can predict several of the features and show some skill, providing potential for future research and applications.
    摘要 近年来,数据驱动的深度学习建模框架被开发用于时间序列数据的预测。此类机器学习模型可能在大气、海洋乃至更广泛的流体领域中发挥作用。本研究考察了此类深度神经算子模型在重现和预测经典流体流动以及真实海洋动力学模拟方面的可能有效性。我们首先简要评估了此类模型在模拟二维圆柱绕流上训练后的能力,随后研究了它们在预测中大西洋湾和马萨诸塞湾海表环流中的应用,训练数据来自用于真实海上实验的高分辨率数据同化模拟。我们证实,训练好的深度神经算子模型能够预测理想化的周期性涡旋脱落;对于真实的海表流动,在我们的初步研究中,它们能够预测若干特征并表现出一定的技巧,为未来的研究和应用提供了潜力。

Ceci n’est pas une pomme: Adversarial Illusions in Multi-Modal Embeddings

  • paper_url: http://arxiv.org/abs/2308.11804
  • repo_url: None
  • paper_authors: Eugene Bagdasaryan, Vitaly Shmatikov
  • for: 这篇论文旨在探讨多Modal Encoder如何受到攻击,以及这些攻击如何影响下游任务。
  • methods: 论文使用了多Modal Encoder将图像、声音、文本、视频等多种模式映射到单一的嵌入空间中,并证明这些嵌入可能受到攻击。
  • results: 论文通过使用ImageBind embeddings示例,展示了攻击者可以通过让输入具有相似的嵌入来让图像与文本、声音与文本等多种模式相互映射。这些攻击可以影响下游任务,如图像生成、文本生成和零例分类。
    Abstract Multi-modal encoders map images, sounds, texts, videos, etc. into a single embedding space, aligning representations across modalities (e.g., associate an image of a dog with a barking sound). We show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions." Given an input in any modality, an adversary can perturb it so as to make its embedding close to that of an arbitrary, adversary-chosen input in another modality. Illusions thus enable the adversary to align any image with any text, any text with any sound, etc. Adversarial illusions exploit proximity in the embedding space and are thus agnostic to downstream tasks. Using ImageBind embeddings, we demonstrate how adversarially aligned inputs, generated without knowledge of specific downstream tasks, mislead image generation, text generation, and zero-shot classification.
    摘要 多模态编码器将图像、声音、文本、视频等多种模态映射到同一个嵌入空间中,使不同模态的表示彼此对齐(例如,将一幅狗的图像与犬吠声关联起来)。我们展示了多模态嵌入可能受到一种我们称为“对抗性幻觉”(adversarial illusions)的攻击:给定任意模态的输入,攻击者可以对其进行扰动,使其嵌入接近另一模态中任意由攻击者选定的输入的嵌入。这种幻觉使攻击者可以将任意图像与任意文本、任意文本与任意声音等相互对齐。对抗性幻觉利用的是嵌入空间中的邻近性,因而与下游任务无关。基于ImageBind嵌入,我们演示了在不了解具体下游任务的情况下生成的对抗性对齐输入,如何误导图像生成、文本生成和零样本分类。
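
A toy PyTorch sketch of the underlying attack principle: projected gradient descent on an input perturbation so that its embedding moves toward an attacker-chosen embedding from another modality. The tiny linear encoder is a stand-in for ImageBind, and all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-ins for the multi-modal encoders (shared embedding dimension).
image_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
text_emb = torch.randn(64)                       # embedding of the attacker-chosen text

image = torch.rand(1, 3, 32, 32)
delta = torch.zeros_like(image, requires_grad=True)
eps, step = 8 / 255, 1 / 255

for _ in range(100):                             # PGD on cosine distance in embedding space
    emb = image_enc(image + delta).squeeze(0)
    loss = 1 - torch.cosine_similarity(emb, text_emb, dim=0)
    loss.backward()
    with torch.no_grad():
        delta -= step * delta.grad.sign()        # move the image embedding toward the text's
        delta.clamp_(-eps, eps)                  # keep the perturbation small
        delta.grad.zero_()
```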

Variational Density Propagation Continual Learning

  • paper_url: http://arxiv.org/abs/2308.11801
  • repo_url: None
  • paper_authors: Christopher Angelini, Nidhal Bouaynaya, Ghulam Rasool
  • for: 这篇论文旨在为部署于现实世界的深度神经网络(DNNs)提出一个持续学习框架,以应对现实世界中的各类噪声、不断演变的目标和数据分布漂移。
  • methods: 论文提出了一种持续学习方法,利用贝叶斯推断中的不确定性量化来缓解灾难性遗忘。该方法去除了先前方法中对模型权重进行蒙特卡洛采样以获得预测分布的需求,转而优化一个近似预测分布的闭式证据下界(ELBO)目标。
  • results: 结果显示,该方法能够在多个顺序基准数据集上进行持续学习,并在一系列任务上得到复杂度最小化的网络,从而缓解灾难性遗忘。
    Abstract Deep Neural Networks (DNNs) deployed to the real world are regularly subject to out-of-distribution (OoD) data, various types of noise, and shifting conceptual objectives. This paper proposes a framework for adapting to data distribution drift modeled by benchmark Continual Learning datasets. We develop and evaluate a method of Continual Learning that leverages uncertainty quantification from Bayesian Inference to mitigate catastrophic forgetting. We expand on previous approaches by removing the need for Monte Carlo sampling of the model weights to sample the predictive distribution. We optimize a closed-form Evidence Lower Bound (ELBO) objective approximating the predictive distribution by propagating the first two moments of a distribution, i.e. mean and covariance, through all network layers. Catastrophic forgetting is mitigated by using the closed-form ELBO to approximate the Minimum Description Length (MDL) Principle, inherently penalizing changes in the model likelihood by minimizing the KL Divergence between the variational posterior for the current task and the previous task's variational posterior acting as the prior. Leveraging the approximation of the MDL principle, we aim to initially learn a sparse variational posterior and then minimize additional model complexity learned for subsequent tasks. Our approach is evaluated for the task incremental learning scenario using density propagated versions of fully-connected and convolutional neural networks across multiple sequential benchmark datasets with varying task sequence lengths. Ultimately, this procedure produces a minimally complex network over a series of tasks mitigating catastrophic forgetting.
    摘要 部署于现实世界的深度神经网络(DNNs)经常遇到分布外(OoD)数据、各类噪声以及不断变化的概念目标。本文提出一个框架,用于适应由持续学习基准数据集刻画的数据分布漂移。我们开发并评估了一种持续学习方法,利用贝叶斯推断的不确定性量化来缓解灾难性遗忘。与以往方法不同,我们不再需要对模型权重进行蒙特卡洛采样来获得预测分布,而是优化一个闭式的证据下界(ELBO)目标,通过在所有网络层中传播分布的前两阶矩(即均值与协方差)来近似预测分布。利用闭式ELBO来近似最小描述长度(MDL)原则,通过最小化当前任务的变分后验与作为先验的上一任务变分后验之间的KL散度,对模型似然的变化进行内在惩罚,从而缓解灾难性遗忘。借助对MDL原则的近似,我们力求先学习一个稀疏的变分后验,再使后续任务新增的模型复杂度最小化。我们在多个顺序基准数据集、不同任务序列长度下,用密度传播版本的全连接网络和卷积网络,对任务增量学习场景评估了该方法。最终,该流程在一系列任务上得到一个复杂度最小的网络,缓解了灾难性遗忘。
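
The core "propagate the first two moments" step is exact for affine layers; the NumPy sketch below shows that single step (nonlinearities and the ELBO/KL terms of the full method would need additional approximations not shown here).

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W, b = rng.normal(size=(d_out, d_in)), rng.normal(size=d_out)

# Input distribution summarized by its first two moments.
mu_in = rng.normal(size=d_in)
cov_in = np.eye(d_in) * 0.1

# Moments after the affine layer y = W x + b (exact for linear maps).
mu_out = W @ mu_in + b
cov_out = W @ cov_in @ W.T
```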

Complex-valued neural networks for voice anti-spoofing

  • paper_url: http://arxiv.org/abs/2308.11800
  • repo_url: None
  • paper_authors: Nicolas M. Müller, Philip Sperl, Konstantin Böttinger
  • for: 防止声音恶意模仿(voice spoofing)和声音深层变换(audio deepfake)
  • methods: 使用复杂值神经网络处理CQT频域表示的输入声音,保留相位信息,并允许使用可解释AI方法
  • results: 该方法优于先前方法,在“in-the-wild”反欺骗数据集上取得了更好的表现,并且可以通过可解释AI方法对结果进行解释;消融实验证实模型学会了利用相位信息来检测语音欺骗。
    Abstract Current anti-spoofing and audio deepfake detection systems use either magnitude spectrogram-based features (such as CQT or Melspectrograms) or raw audio processed through convolution or sinc-layers. Both methods have drawbacks: magnitude spectrograms discard phase information, which affects audio naturalness, and raw-feature-based models cannot use traditional explainable AI methods. This paper proposes a new approach that combines the benefits of both methods by using complex-valued neural networks to process the complex-valued, CQT frequency-domain representation of the input audio. This method retains phase information and allows for explainable AI methods. Results show that this approach outperforms previous methods on the "In-the-Wild" anti-spoofing dataset and enables interpretation of the results through explainable AI. Ablation studies confirm that the model has learned to use phase information to detect voice spoofing.
    摘要 当前的语音反欺骗和音频深度伪造检测系统,要么使用基于幅度频谱图的特征(如CQT或Mel频谱图),要么将原始音频送入卷积层或sinc层处理。两种方法各有缺点:幅度频谱图丢弃了相位信息,影响音频自然度;而基于原始特征的模型无法使用传统的可解释AI方法。本文提出一种新方法,结合两者的优点,利用复值神经网络处理输入音频的复值CQT频域表示。该方法保留了相位信息,并允许使用可解释AI方法。结果表明,该方法在“In-the-Wild”反欺骗数据集上优于先前方法,且可通过可解释AI方法解释其结果;消融实验证实模型学会了利用相位信息检测语音欺骗。
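
A minimal NumPy sketch of a complex-valued affine layer with a modReLU-style activation acting on a CQT-like complex input; this is a generic illustration of phase-preserving processing, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
# A CQT-like complex input: 84 frequency bins, magnitude and phase both retained.
x = rng.normal(size=84) + 1j * rng.normal(size=84)

W = rng.normal(size=(32, 84)) + 1j * rng.normal(size=(32, 84))
b = rng.normal(size=32) + 1j * rng.normal(size=32)

z = W @ x + b                                                    # complex affine map
out = np.maximum(np.abs(z) - 0.1, 0) * np.exp(1j * np.angle(z))  # modReLU-style: shrink the
                                                                 # magnitude, keep the phase
```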

Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics

  • paper_url: http://arxiv.org/abs/2308.11792
  • repo_url: None
  • paper_authors: Dominik Scheinert, Philipp Wiesner, Thorsten Wittkopp, Lauritz Thamsen, Jonathan Will, Odej Kao
  • for: 提高大数据分析任务资源配置的自动化方法,以提高效率、降低成本和能效率。
  • methods: Karasu使用了分享数据的方法,通过将多个用户的运行时信息相互整合,以生成轻量级性能模型,并将其组合成ensemble方法来挖掘配置搜索空间中的内在知识。
  • results: Karasu可以帮助提高现有方法的性能、搜索时间和成本,即使只有少量相似的 profiling 运行可用,并且可以同时优化多个目标。
    Abstract Selecting the right resources for big data analytics jobs is hard because of the wide variety of configuration options like machine type and cluster size. As poor choices can have a significant impact on resource efficiency, cost, and energy usage, automated approaches are gaining popularity. Most existing methods rely on profiling recurring workloads to find near-optimal solutions over time. Due to the cold-start problem, this often leads to lengthy and costly profiling phases. However, big data analytics jobs across users can share many common properties: they often operate on similar infrastructure, using similar algorithms implemented in similar frameworks. The potential in sharing aggregated profiling runs to collaboratively address the cold start problem is largely unexplored. We present Karasu, an approach to more efficient resource configuration profiling that promotes data sharing among users working with similar infrastructures, frameworks, algorithms, or datasets. Karasu trains lightweight performance models using aggregated runtime information of collaborators and combines them into an ensemble method to exploit inherent knowledge of the configuration search space. Moreover, Karasu allows the optimization of multiple objectives simultaneously. Our evaluation is based on performance data from diverse workload executions in a public cloud environment. We show that Karasu is able to significantly boost existing methods in terms of performance, search time, and cost, even when few comparable profiling runs are available that share only partial common characteristics with the target job.
    摘要 Big data analytics tasks often have many similarities, such as operating on similar infrastructure, using similar algorithms, and implementing similar frameworks. By sharing aggregated profiling runs, we can collaboratively address the cold start problem and improve resource configuration profiling.We present Karasu, an approach to more efficient resource configuration profiling that promotes data sharing among users with similar infrastructures, frameworks, algorithms, or datasets. Karasu trains lightweight performance models using aggregated runtime information from collaborators and combines them into an ensemble method to exploit the inherent knowledge of the configuration search space. Karasu also allows for the optimization of multiple objectives simultaneously.Our evaluation is based on performance data from diverse workload executions in a public cloud environment. We show that Karasu is able to significantly improve existing methods in terms of performance, search time, and cost, even when there are few comparable profiling runs available that share only partial common characteristics with the target job.

HypBO: Expert-Guided Chemist-in-the-Loop Bayesian Search for New Materials

  • paper_url: http://arxiv.org/abs/2308.11787
  • repo_url: None
  • paper_authors: Abdoulatif Cisse, Xenophon Evangelopoulos, Sam Carruthers, Vladimir V. Gusev, Andrew I. Cooper
  • for: 该研究旨在使用人类专家知识来加速 Bayesian 优化算法,以更好地解决复杂多变量科学问题。
  • methods: 该方法使用专家假设来指导 Bayesian 搜索,并使用搜索结果来改进模型数据。
  • results: 实验结果表明,该方法可以在新、未探索的科学任务中更快地找到有价值的答案,并且可以更好地考虑专家的知识和假设。
    Abstract Robotics and automation offer massive accelerations for solving intractable, multivariate scientific problems such as materials discovery, but the available search spaces can be dauntingly large. Bayesian optimization (BO) has emerged as a popular sample-efficient optimization engine, thriving in tasks where no analytic form of the target function/property is known. Here we exploit expert human knowledge in the form of hypotheses to direct Bayesian searches more quickly to promising regions of chemical space. Previous methods have used underlying distributions derived from existing experimental measurements, which is unfeasible for new, unexplored scientific tasks. Also, such distributions cannot capture intricate hypotheses. Our proposed method, which we call HypBO, uses expert human hypotheses to generate an improved seed of samples. Unpromising seeds are automatically discounted, while promising seeds are used to augment the surrogate model data, thus achieving better-informed sampling. This process continues in a global versus local search fashion, organized in a bilevel optimization framework. We validate the performance of our method on a range of synthetic functions and demonstrate its practical utility on a real chemical design task where the use of expert hypotheses accelerates the search performance significantly.
    摘要 机器人技术与自动化为求解材料发现等难以处理的多变量科学问题提供了巨大的加速,但可用的搜索空间可能大得惊人。贝叶斯优化(BO)已成为一种流行的、样本高效的优化引擎,在目标函数/属性没有解析形式的任务中表现出色。在本文中,我们以假设的形式利用专家人类知识,引导贝叶斯搜索更快地到达化学空间中有希望的区域。以往的方法使用从现有实验测量中得到的基础分布,这对新的、尚未探索的科学任务并不可行;此外,这类分布也无法刻画复杂精细的假设。我们提出的方法称为HypBO,利用专家人类假设生成改进的种子样本:没有希望的种子会被自动降权,而有希望的种子则被用于扩充代理模型的数据,从而实现更有依据的采样。这一过程以全局对局部搜索的方式持续进行,并组织成一个双层优化框架。我们在一系列合成函数上验证了该方法的性能,并在一个真实的化学设计任务中展示了其实用性——使用专家假设显著加速了搜索。

Coarse-to-Fine Multi-Scene Pose Regression with Transformers

  • paper_url: http://arxiv.org/abs/2308.11783
  • repo_url: https://github.com/yolish/c2f-ms-transformer
  • paper_authors: Yoli Shavit, Ron Ferens, Yosi Keller
  • for: 估计摄像头的位置和 orientations,并在不同场景中学习多个场景的相对位置。
  • methods: 使用 transformer 来捕捉 activation map 和场景编码,并使用自注意力机制来混合多个场景的信息。
  • results: 在常用的室内和室外数据集上进行评估,并与多场景和单场景绝对位姿估计器进行比较,达到了更高的定位精度。
    Abstract Absolute camera pose regressors estimate the position and orientation of a camera given the captured image alone. Typically, a convolutional backbone with a multi-layer perceptron (MLP) head is trained using images and pose labels to embed a single reference scene at a time. Recently, this scheme was extended to learn multiple scenes by replacing the MLP head with a set of fully connected layers. In this work, we propose to learn multi-scene absolute camera pose regression with Transformers, where encoders are used to aggregate activation maps with self-attention and decoders transform latent features and scenes encoding into pose predictions. This allows our model to focus on general features that are informative for localization, while embedding multiple scenes in parallel. We extend our previous MS-Transformer approach \cite{shavit2021learning} by introducing a mixed classification-regression architecture that improves the localization accuracy. Our method is evaluated on commonly benchmark indoor and outdoor datasets and has been shown to exceed both multi-scene and state-of-the-art single-scene absolute pose regressors.
    摘要 绝对相机位姿回归器仅凭拍摄到的图像即可估计相机的位置与朝向。通常的做法是用图像和位姿标签训练一个带多层感知机(MLP)头的卷积骨干网络,每次嵌入单个参考场景。最近,这一方案被扩展为通过将MLP头替换为一组全连接层来学习多个场景。在本工作中,我们提出用Transformer学习多场景绝对相机位姿回归:编码器利用自注意力聚合激活图,解码器则将潜在特征和场景编码转换为位姿预测。这使得模型能够关注对定位有用的通用特征,同时并行地嵌入多个场景。我们在先前的MS-Transformer方法 \cite{shavit2021learning} 基础上引入了混合分类-回归架构,提高了定位精度。我们的方法在常用的室内和室外基准数据集上进行了评估,结果超越了多场景以及最先进的单场景绝对位姿回归器。

Addressing Dynamic and Sparse Qualitative Data: A Hilbert Space Embedding of Categorical Variables

  • paper_url: http://arxiv.org/abs/2308.11781
  • repo_url: None
  • paper_authors: Anirban Mukherjee, Hannah H. Chang
  • for: 本文提出了一种新的框架,用于在量化模型中包含质量数据。之前的方法通常使用分类变量来建立量化模型,但这会导致数据缺失和偏差估计。本文使用函数分析创造了一个更灵活和精细的框架。
  • methods: 本文将观察到的类别嵌入到一个潜在的Baire空间,并引入一个连续线性映射,将类别映射到再生核希尔伯特空间(RKHS)中的表示函数。通过里斯表示定理,我们证明了因果模型中对分类变量的标准处理可以转化为RKHS中的一个可识别结构。
  • results: 通过大量的 simulations 和一个真实的应用例,我们证明了这种方法的超越性。特别在分类信息具有复杂和细腻的场景下,我们的模型表现出色。
    Abstract We propose a novel framework for incorporating qualitative data into quantitative models for causal estimation. Previous methods use categorical variables derived from qualitative data to build quantitative models. However, this approach can lead to data-sparse categories and yield inconsistent (asymptotically biased) and imprecise (finite sample biased) estimates if the qualitative information is dynamic and intricate. We use functional analysis to create a more nuanced and flexible framework. We embed the observed categories into a latent Baire space and introduce a continuous linear map -- a Hilbert space embedding -- from the Baire space of categories to a Reproducing Kernel Hilbert Space (RKHS) of representation functions. Through the Riesz representation theorem, we establish that the canonical treatment of categorical variables in causal models can be transformed into an identified structure in the RKHS. Transfer learning acts as a catalyst to streamline estimation -- embeddings from traditional models are paired with the kernel trick to form the Hilbert space embedding. We validate our model through comprehensive simulation evidence and demonstrate its relevance in a real-world study that contrasts theoretical predictions from economics and psychology in an e-commerce marketplace. The results confirm the superior performance of our model, particularly in scenarios where qualitative information is nuanced and complex.
    摘要 我们提出一种新的框架,用于将定性数据纳入定量模型进行因果估计。以往的方法通常使用从定性数据导出的分类变量来构建定量模型,但当定性信息动态且复杂时,这种做法可能导致类别数据稀疏,并产生不一致(渐近有偏)且不精确(有限样本有偏)的估计。我们利用泛函分析构建了一个更细致、更灵活的框架:将观察到的类别嵌入一个潜在的Baire空间,并引入一个从类别的Baire空间到表示函数的再生核希尔伯特空间(RKHS)的连续线性映射(即希尔伯特空间嵌入)。借助里斯表示定理,我们证明了因果模型中对分类变量的标准处理可以转化为RKHS中的一个可识别结构。迁移学习则成为简化估计的催化剂——来自传统模型的嵌入与核技巧配合,构成希尔伯特空间嵌入。我们通过全面的模拟证据验证了该模型,并在一个电商平台的真实研究中对比经济学与心理学的理论预测,展示了其现实意义。结果证实了模型的优越性能,尤其是在定性信息细腻复杂的场景中。

Few-shot Anomaly Detection in Text with Deviation Learning

  • paper_url: http://arxiv.org/abs/2308.11780
  • repo_url: None
  • paper_authors: Anindya Sundar Das, Aravind Ajay, Sriparna Saha, Monowar Bhuyan
  • for: 本研究提出一种基于深度少样本学习的文本异常检测方法,利用有限的异常样本并直接端到端地学习异常分数,以提高异常检测的精度和效率。
  • methods: 本方法基于深度几个示例学习,并使用异常分数学习和多头自注意力层,以及多个实例学习方法来学习异常行为。
  • results: 经过对多个标准数据集的实验表明,本方法可以达到新的顶峰性能水平。
    Abstract Most current methods for detecting anomalies in text concentrate on constructing models solely relying on unlabeled data. These models operate on the presumption that no labeled anomalous examples are available, which prevents them from utilizing prior knowledge of anomalies that are typically present in small numbers in many real-world applications. Furthermore, these models prioritize learning feature embeddings rather than optimizing anomaly scores directly, which could lead to suboptimal anomaly scoring and inefficient use of data during the learning process. In this paper, we introduce FATE, a deep few-shot learning-based framework that leverages limited anomaly examples and learns anomaly scores explicitly in an end-to-end method using deviation learning. In this approach, the anomaly scores of normal examples are adjusted to closely resemble reference scores obtained from a prior distribution. Conversely, anomaly samples are forced to have anomalous scores that considerably deviate from the reference score in the upper tail of the prior. Additionally, our model is optimized to learn the distinct behavior of anomalies by utilizing a multi-head self-attention layer and multiple instance learning approaches. Comprehensive experiments on several benchmark datasets demonstrate that our proposed approach attains a new level of state-of-the-art performance.
    摘要 Current methods for detecting anomalies in text mainly rely on constructing models using only unlabeled data. These models assume that no labeled anomalous examples are available, which limits their ability to utilize prior knowledge of anomalies that are typically present in small numbers in real-world applications. Moreover, these models prioritize learning feature embeddings over optimizing anomaly scores directly, which could lead to suboptimal anomaly scoring and inefficient use of data during the learning process.In this paper, we propose FATE, a deep few-shot learning-based framework that leverages limited anomaly examples and learns anomaly scores explicitly in an end-to-end manner using deviation learning. In this approach, the anomaly scores of normal examples are adjusted to closely resemble reference scores obtained from a prior distribution. Conversely, anomaly samples are forced to have anomalous scores that considerably deviate from the reference score in the upper tail of the prior. Additionally, our model is optimized to learn the distinct behavior of anomalies by utilizing a multi-head self-attention layer and multiple instance learning approaches.Comprehensive experiments on several benchmark datasets demonstrate that our proposed approach achieves a new level of state-of-the-art performance.
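
A hedged PyTorch sketch of a deviation-learning loss of the kind described above: normal scores are pulled toward a Gaussian prior reference, while the few labeled anomalies are pushed beyond a margin in the upper tail. The margin and reference choices are illustrative, not the paper's exact settings.

```python
import torch

def deviation_loss(scores, labels, margin=5.0, n_ref=5000):
    """Deviation-learning loss: normal scores track an N(0, 1) reference,
    labeled anomalies are pushed beyond `margin` in the upper tail."""
    ref = torch.randn(n_ref)                          # prior reference scores
    dev = (scores - ref.mean()) / (ref.std() + 1e-8)
    inlier = (1 - labels) * dev.abs()                 # normals stay near the reference
    outlier = labels * torch.clamp(margin - dev, min=0.0)
    return (inlier + outlier).mean()

scores = torch.tensor([0.1, -0.2, 6.3, 0.05])         # model-produced anomaly scores
labels = torch.tensor([0.0, 0.0, 1.0, 0.0])           # 1 = labeled anomaly (few-shot)
print(deviation_loss(scores, labels))
```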

Understanding Hessian Alignment for Domain Generalization

  • paper_url: http://arxiv.org/abs/2308.11778
  • repo_url: https://github.com/huawei-noah/federated-learning
  • paper_authors: Sobhan Hemati, Guojun Zhang, Amir Estiri, Xi Chen
  • for: The paper focuses on improving out-of-distribution (OOD) generalization in deep learning models, specifically in healthcare and autonomous vehicles applications.
  • methods: The paper analyzes the role of Hessian and gradient alignment in domain generalization using recent OOD theory of transferability. It also proposes two simple yet effective methods to match the classifier's head Hessians and gradients, based on the Hessian Gradient Product (HGP) and Hutchinson's method, without directly calculating Hessians.
  • results: The paper shows that Hessian alignment methods achieve promising performance on various OOD benchmarks, including transferability, severe correlation shift, label shift, and diversity shift.
    Abstract Out-of-distribution (OOD) generalization is a critical ability for deep learning models in many real-world scenarios including healthcare and autonomous vehicles. Recently, different techniques have been proposed to improve OOD generalization. Among these methods, gradient-based regularizers have shown promising performance compared with other competitors. Despite this success, our understanding of the role of Hessian and gradient alignment in domain generalization is still limited. To address this shortcoming, we analyze the role of the classifier's head Hessian matrix and gradient in domain generalization using recent OOD theory of transferability. Theoretically, we show that spectral norm between the classifier's head Hessian matrices across domains is an upper bound of the transfer measure, a notion of distance between target and source domains. Furthermore, we analyze all the attributes that get aligned when we encourage similarity between Hessians and gradients. Our analysis explains the success of many regularizers like CORAL, IRM, V-REx, Fish, IGA, and Fishr as they regularize part of the classifier's head Hessian and/or gradient. Finally, we propose two simple yet effective methods to match the classifier's head Hessians and gradients in an efficient way, based on the Hessian Gradient Product (HGP) and Hutchinson's method (Hutchinson), and without directly calculating Hessians. We validate the OOD generalization ability of proposed methods in different scenarios, including transferability, severe correlation shift, label shift and diversity shift. Our results show that Hessian alignment methods achieve promising performance on various OOD benchmarks. The code is available at \url{https://github.com/huawei-noah/Federated-Learning/tree/main/HessianAlignment}.
    摘要 分布外(OOD)泛化是深度学习模型在医疗、自动驾驶等许多现实场景中的关键能力。近来,人们提出了多种提升OOD泛化的技术,其中基于梯度的正则化方法相对于其他方法表现出色。尽管如此,我们对Hessian与梯度对齐在域泛化中作用的理解仍然有限。为此,我们利用最新的OOD可迁移性理论,分析分类器头部的Hessian矩阵和梯度在域泛化中的作用。在理论上,我们证明了不同域之间分类器头部Hessian矩阵之差的谱范数是迁移度量(目标域与源域之间距离的一种度量)的上界。我们还分析了在鼓励Hessian与梯度相似时被对齐的各种属性。这一分析解释了CORAL、IRM、V-REx、Fish、IGA和Fishr等许多正则化方法的成功,因为它们都对分类器头部的Hessian和/或梯度进行了部分正则化。最后,我们提出了两种简单而有效的方法,基于Hessian-梯度乘积(HGP)和Hutchinson方法,在不直接计算Hessian的情况下高效地对齐分类器头部的Hessian和梯度。我们在迁移性、严重相关偏移、标签偏移和多样性偏移等多种OOD场景中验证了所提方法的泛化能力,结果表明Hessian对齐方法在各种OOD基准上取得了可观的性能。代码见:https://github.com/huawei-noah/Federated-Learning/tree/main/HessianAlignment。
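
Hutchinson-style estimation avoids forming the Hessian explicitly by combining Hessian-vector products with random probes; a small PyTorch autograd sketch on a least-squares loss (not the paper's models) is shown below.

```python
import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)
x, y = torch.randn(64, 10), torch.randn(64)
loss = ((x @ w - y) ** 2).mean()

grad = torch.autograd.grad(loss, w, create_graph=True)[0]

# Hutchinson estimate of tr(H) from Hessian-vector products, no explicit Hessian.
estimates = []
for _ in range(20):
    v = torch.randint(0, 2, w.shape).float() * 2 - 1              # Rademacher probe
    hvp = torch.autograd.grad(grad @ v, w, retain_graph=True)[0]  # H v via autograd
    estimates.append((v * hvp).sum())
print(torch.stack(estimates).mean())                              # ≈ trace of the Hessian
```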

3ET: Efficient Event-based Eye Tracking using a Change-Based ConvLSTM Network

  • paper_url: http://arxiv.org/abs/2308.11771
  • repo_url: None
  • paper_authors: Qinyu Chen, Zuowen Wang, Shih-Chii Liu, Chang Gao
  • for: 这个论文旨在提出一种稀疏变化基于卷积长短时间记忆(CB-ConvLSTM)模型,用于基于事件的眼动跟踪,这种技术是未来可穿戴医疗技术,如AR/VR头戴式设备的关键。
  • methods: 这个论文利用了眼睛类型的事件摄像头的优点,即快速响应和稀疏输出事件流,而不是传统的帧类型摄像头。CB-ConvLSTM架构效果地提取了眼动跟踪的空间时间特征,并且比传统的CNN结构更高效。
  • results: 该论文利用delta编码的循环路径增强激活稀疏性,在不损失精度的情况下将算术运算量降低约4.7倍。这种效率提升使其非常适合在资源受限设备上进行实时眼动跟踪。
    Abstract This paper presents a sparse Change-Based Convolutional Long Short-Term Memory (CB-ConvLSTM) model for event-based eye tracking, key for next-generation wearable healthcare technology such as AR/VR headsets. We leverage the benefits of retina-inspired event cameras, namely their low-latency response and sparse output event stream, over traditional frame-based cameras. Our CB-ConvLSTM architecture efficiently extracts spatio-temporal features for pupil tracking from the event stream, outperforming conventional CNN structures. Utilizing a delta-encoded recurrent path enhancing activation sparsity, CB-ConvLSTM reduces arithmetic operations by approximately 4.7$\times$ without losing accuracy when tested on a \texttt{v2e}-generated event dataset of labeled pupils. This increase in efficiency makes it ideal for real-time eye tracking in resource-constrained devices. The project code and dataset are openly available at \url{https://github.com/qinche106/cb-convlstm-eyetracking}.
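
A toy NumPy sketch of delta encoding on a recurrent input stream: only changes above a threshold are forwarded, which is what creates the activation sparsity exploited by CB-ConvLSTM. The threshold and data are illustrative.

```python
import numpy as np

def delta_encode(frames, threshold=0.05):
    """Forward only inputs whose change since the last forwarded value exceeds
    `threshold`; everything else stays zero and can be skipped downstream."""
    last = np.zeros_like(frames[0])
    deltas = []
    for f in frames:
        d = np.where(np.abs(f - last) > threshold, f - last, 0.0)
        last = last + d
        deltas.append(d)
    return np.stack(deltas)

frames = np.cumsum(np.random.default_rng(0).normal(0, 0.02, size=(50, 16, 16)), axis=0)
deltas = delta_encode(frames)
print("activation sparsity:", float((deltas == 0).mean()))
```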

Patient Clustering via Integrated Profiling of Clinical and Digital Data

  • paper_url: http://arxiv.org/abs/2308.11748
  • repo_url: None
  • paper_authors: Dongjin Choi, Andy Xiang, Ozgur Ozturk, Deep Shrestha, Barry Drake, Hamid Haidarian, Faizan Javed, Haesun Park
  • for: 这篇论文描述了一种基于画像(profile)的患者聚类模型,用于医疗临床数据分析。
  • methods: 该模型采用基于约束低秩近似的方法,利用患者的临床数据和数字交互数据(包括浏览和搜索)构建患者画像,并由此生成非负嵌入向量作为患者的低维表示。
  • results: 在来自医疗门户网站的真实患者数据上,与其他基线相比,该模型在聚类一致性和推荐准确率方面表现更优。
    Abstract We introduce a novel profile-based patient clustering model designed for clinical data in healthcare. By utilizing a method grounded on constrained low-rank approximation, our model takes advantage of patients' clinical data and digital interaction data, including browsing and search, to construct patient profiles. As a result of the method, nonnegative embedding vectors are generated, serving as a low-dimensional representation of the patients. Our model was assessed using real-world patient data from a healthcare web portal, with a comprehensive evaluation approach which considered clustering and recommendation capabilities. In comparison to other baselines, our approach demonstrated superior performance in terms of clustering coherence and recommendation accuracy.
    摘要 我们介绍了一种新颖的基于画像(profile)的患者聚类模型,适用于医疗临床数据。该模型采用基于约束低秩近似的方法,利用患者的临床数据和数字交互数据(包括浏览和搜索)构建患者画像,并由此生成非负嵌入向量,作为患者的低维表示。我们使用来自某医疗门户网站的真实患者数据对模型进行了评估,评估方式同时考量聚类与推荐能力。与其他基线相比,我们的方法在聚类一致性和推荐准确率方面表现更优。
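
A generic sketch of the profile-then-cluster idea using scikit-learn's plain NMF as a stand-in for the paper's constrained low-rank approximation, on synthetic count data; the feature layout and cluster count are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Rows = patients, columns = clinical codes plus digital-interaction counts (synthetic).
X = rng.poisson(lam=1.0, size=(200, 50)).astype(float)

# Low-rank nonnegative embedding of each patient, then clustering on the embedding.
embedding = NMF(n_components=8, init="nndsvda", random_state=0, max_iter=500).fit_transform(X)
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embedding)
```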

Animal3D: A Comprehensive Dataset of 3D Animal Pose and Shape

  • paper_url: http://arxiv.org/abs/2308.11737
  • repo_url: None
  • paper_authors: Jiacong Xu, Yi Zhang, Jiawei Peng, Wufei Ma, Artur Jesslen, Pengliang Ji, Qixin Hu, Jiehua Zhang, Qihao Liu, Jiahao Wang, Wei Ji, Chen Wang, Xiaoding Yuan, Prakhar Kaushik, Guofeng Zhang, Jie Liu, Yushan Xie, Yawen Cui, Alan Yuille, Adam Kortylewski
  • for: This paper introduces a comprehensive dataset for 3D animal pose and shape estimation, aimed at better understanding animal behavior, with potential benefits for downstream applications such as wildlife conservation.
  • methods: The dataset collects 3379 images of 40 mammal species with high-quality annotations of 26 keypoints and the pose and shape parameters of the SMAL model; all annotations were labeled and checked manually in a multi-stage process to ensure the highest quality.
  • results: Experiments show that predicting the 3D pose and shape of animals across species remains a very challenging task despite significant advances in human pose estimation, and that pre-training on synthetic data before transferring to real data is a viable strategy for boosting model performance.
    Abstract Accurately estimating the 3D pose and shape is an essential step towards understanding animal behavior, and can potentially benefit many downstream applications, such as wildlife conservation. However, research in this area is held back by the lack of a comprehensive and diverse dataset with high-quality 3D pose and shape annotations. In this paper, we propose Animal3D, the first comprehensive dataset for mammal animal 3D pose and shape estimation. Animal3D consists of 3379 images collected from 40 mammal species, high-quality annotations of 26 keypoints, and importantly the pose and shape parameters of the SMAL model. All annotations were labeled and checked manually in a multi-stage process to ensure highest quality results. Based on the Animal3D dataset, we benchmark representative shape and pose estimation models at: (1) supervised learning from only the Animal3D data, (2) synthetic to real transfer from synthetically generated images, and (3) fine-tuning human pose and shape estimation models. Our experimental results demonstrate that predicting the 3D shape and pose of animals across species remains a very challenging task, despite significant advances in human pose estimation. Our results further demonstrate that synthetic pre-training is a viable strategy to boost the model performance. Overall, Animal3D opens new directions for facilitating future research in animal 3D pose and shape estimation, and is publicly available.

Knowledge Graph Prompting for Multi-Document Question Answering

  • paper_url: http://arxiv.org/abs/2308.11730
  • repo_url: None
  • paper_authors: Yu Wang, Nedim Lipka, Ryan A. Rossi, Alexa Siu, Ruiyi Zhang, Tyler Derr
  • for: Improving the performance of large language models (LLMs) on multi-document question answering (MD-QA), especially when answers depend on the contents and structures of multiple documents.
  • methods: A Knowledge Graph Prompting (KGP) method with a graph construction module and a graph traversal module: the constructed knowledge graph uses nodes to represent passages or document structures and edges to encode semantic/lexical similarity or intra-document structural relations, while an LM-guided traverser navigates the graph to gather supporting passages for the LLM.
  • results: Experiments show that KGP effectively improves LLM performance on MD-QA, demonstrating that graphs can enhance prompt design for LLMs across multiple documents with diverse structures.
    Abstract The 'pre-train, prompt, predict' paradigm of large language models (LLMs) has achieved remarkable success in open-domain question answering (OD-QA). However, few works explore this paradigm in the scenario of multi-document question answering (MD-QA), a task demanding a thorough understanding of the logical associations among the contents and structures of different documents. To fill this crucial gap, we propose a Knowledge Graph Prompting (KGP) method to formulate the right context in prompting LLMs for MD-QA, which consists of a graph construction module and a graph traversal module. For graph construction, we create a knowledge graph (KG) over multiple documents with nodes symbolizing passages or document structures (e.g., pages/tables), and edges denoting the semantic/lexical similarity between passages or intra-document structural relations. For graph traversal, we design an LM-guided graph traverser that navigates across nodes and gathers supporting passages assisting LLMs in MD-QA. The constructed graph serves as the global ruler that regulates the transitional space among passages and reduces retrieval latency. Concurrently, the LM-guided traverser acts as a local navigator that gathers pertinent context to progressively approach the question and guarantee retrieval quality. Extensive experiments underscore the efficacy of KGP for MD-QA, signifying the potential of leveraging graphs in enhancing the prompt design for LLMs. Our code is at https://github.com/YuWVandy/KG-LLM-MDQA.
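
A compact sketch of the two KGP components follows: graph construction over passages via TF-IDF similarity (a simple stand-in for the paper's semantic/lexical edges) and a greedy LM-guided traversal loop. The `lm_score` callable is a hypothetical scoring model, not the paper's trained traverser, and the similarity threshold is an assumption.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_passage_graph(passages, sim_threshold=0.2):
    """Nodes are passages; edges connect passages whose TF-IDF cosine
    similarity exceeds a threshold (stand-in for semantic/lexical links)."""
    tfidf = TfidfVectorizer().fit_transform(passages)
    sims = cosine_similarity(tfidf)
    g = nx.Graph()
    g.add_nodes_from(range(len(passages)))
    for i in range(len(passages)):
        for j in range(i + 1, len(passages)):
            if sims[i, j] >= sim_threshold:
                g.add_edge(i, j, weight=float(sims[i, j]))
    return g


def traverse(graph, passages, question, lm_score, start, budget=5):
    """Greedy LM-guided walk: repeatedly move to the neighbor whose passage
    the scoring model deems most helpful for the question."""
    context, node, visited = [passages[start]], start, {start}
    for _ in range(budget):
        neighbors = [n for n in graph.neighbors(node) if n not in visited]
        if not neighbors:
            break
        node = max(neighbors, key=lambda n: lm_score(question, passages[n]))
        visited.add(node)
        context.append(passages[node])
    return context
```

The gathered `context` passages would then be placed in the LLM prompt alongside the question.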

When Are Two Lists Better than One?: Benefits and Harms in Joint Decision-making

  • paper_url: http://arxiv.org/abs/2308.11721
  • repo_url: https://github.com/kpdonahue/benefits_harms_joint_decision_making
  • paper_authors: Kate Donahue, Kostas Kollias, Sreenivas Gollapudi
  • for: This paper studies a human-algorithm collaboration in which the algorithm presents a subset of size k drawn from n items and the human selects the final item from that subset, a setting that models content recommendation, route planning, and similar labeling tasks.
  • methods: The authors analyze the optimal choice of k under several noise models of human and algorithmic judgment, including the Mallows model and random utility models.
  • results: Under multiple noise models, collaboration strictly increases the probability that the best item is ultimately selected, with the optimum attained for k in [2, n-1]; however, this pattern reverses when the human is anchored on the algorithm's presented ordering, in which case the joint system performs strictly worse.
    Abstract Historically, much of machine learning research has focused on the performance of the algorithm alone, but recently more attention has been focused on optimizing joint human-algorithm performance. Here, we analyze a specific type of human-algorithm collaboration where the algorithm has access to a set of $n$ items, and presents a subset of size $k$ to the human, who selects a final item from among those $k$. This scenario could model content recommendation, route planning, or any type of labeling task. Because both the human and algorithm have imperfect, noisy information about the true ordering of items, the key question is: which value of $k$ maximizes the probability that the best item will be ultimately selected? For $k=1$, performance is optimized by the algorithm acting alone, and for $k=n$ it is optimized by the human acting alone. Surprisingly, we show that for multiple noise models, it is optimal to set $k \in [2, n-1]$ - that is, there are strict benefits to collaborating, even when the human and algorithm have equal accuracy separately. We demonstrate this theoretically for the Mallows model and experimentally for the Random Utilities models of noisy permutations. However, we show this pattern is reversed when the human is anchored on the algorithm's presented ordering - the joint system always has strictly worse performance. We extend these results to the case where the human and algorithm differ in their accuracy levels, showing that there always exist regimes where a more accurate agent would strictly benefit from collaborating with a less accurate one, but these regimes are asymmetric between the human and the algorithm's accuracy.
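
A short Monte Carlo sketch of this setting under a random-utility noise model is given below: the algorithm shortlists its noisy top-k, the human picks from the shortlist under independent noise, and we estimate how often the truly best item survives. The noise scales and item utilities are illustrative assumptions.

```python
import numpy as np


def p_best_selected(n=10, k=3, algo_noise=1.0, human_noise=1.0,
                    trials=20000, seed=0):
    """Estimate P(best item chosen) when the algorithm shortlists its top-k
    items under noisy utilities and the human then picks from the shortlist
    under independent noisy utilities. Item 0 has the highest true utility."""
    rng = np.random.default_rng(seed)
    true_util = -np.arange(n, dtype=float)         # item 0 is best
    hits = 0
    for _ in range(trials):
        algo_util = true_util + algo_noise * rng.standard_normal(n)
        shortlist = np.argsort(-algo_util)[:k]
        human_util = true_util[shortlist] + human_noise * rng.standard_normal(k)
        if shortlist[np.argmax(human_util)] == 0:
            hits += 1
    return hits / trials


for k in (1, 2, 5, 9, 10):
    print(k, round(p_best_selected(k=k), 3))
```

With comparable noise levels, intermediate values of k typically beat both k=1 (algorithm alone) and k=n (human alone), matching the qualitative claim in the abstract.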

Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection

  • paper_url: http://arxiv.org/abs/2308.12885
  • repo_url: None
  • paper_authors: Oana Inel, Tim Draws, Lora Aroyo
  • for: This paper proposes a Responsible AI (RAI) methodology for systematically assessing the quality and reliability factors of the data collection process.
  • methods: The paper introduces a comprehensive set of metrics for evaluating the internal reliability of a dataset and its external stability over time, and validates their feasibility and effectiveness.
  • results: The experiments show that the RAI methodology helps surface quality and reliability issues in data collection and improves the quality and reliability of collected data; the analysis also reveals potential fairness and bias issues and offers recommendations for addressing them.
    Abstract The rapid entry of machine learning approaches in our daily activities and high-stakes domains demands transparency and scrutiny of their fairness and reliability. To help gauge machine learning models' robustness, research typically focuses on the massive datasets used for their deployment, e.g., creating and maintaining documentation for understanding their origin, process of development, and ethical considerations. However, data collection for AI is still typically a one-off practice, and oftentimes datasets collected for a certain purpose or application are reused for a different problem. Additionally, dataset annotations may not be representative over time, contain ambiguous or erroneous annotations, or be unable to generalize across issues or domains. Recent research has shown these practices might lead to unfair, biased, or inaccurate outcomes. We argue that data collection for AI should be performed in a responsible manner where the quality of the data is thoroughly scrutinized and measured through a systematic set of appropriate metrics. In this paper, we propose a Responsible AI (RAI) methodology designed to guide the data collection with a set of metrics for an iterative in-depth analysis of the factors influencing the quality and reliability of the generated data. We propose a granular set of measurements to inform on the internal reliability of a dataset and its external stability over time. We validate our approach across nine existing datasets and annotation tasks and four content modalities. This approach impacts the assessment of data robustness used for AI applied in the real world, where diversity of users and content is eminent. Furthermore, it deals with fairness and accountability aspects in data collection by providing systematic and transparent quality analysis for data collections.
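
As an illustration of the kind of measurements the methodology calls for, the sketch below computes one internal-reliability signal (inter-annotator agreement) and one external-stability signal (drift of the label distribution across collection rounds). Cohen's kappa and Jensen-Shannon distance are common choices used here for illustration, not necessarily the paper's exact metric set, and the simulated annotations are assumed data.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.spatial.distance import jensenshannon


def internal_reliability(annotator_a, annotator_b):
    """Pairwise inter-annotator agreement on the same items."""
    return cohen_kappa_score(annotator_a, annotator_b)


def external_stability(labels_round1, labels_round2, n_classes):
    """Distance between label distributions of two collection rounds;
    values near 0 indicate a stable collection process over time."""
    p = np.bincount(labels_round1, minlength=n_classes) / len(labels_round1)
    q = np.bincount(labels_round2, minlength=n_classes) / len(labels_round2)
    return jensenshannon(p, q)


rng = np.random.default_rng(0)
a = rng.integers(0, 3, 200)
b = np.where(rng.random(200) < 0.8, a, rng.integers(0, 3, 200))  # ~80% agreement
print("kappa:", round(internal_reliability(a, b), 3))
print("JS distance:", round(float(external_stability(a[:100], a[100:], 3)), 3))
```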

SuperCalo: Calorimeter shower super-resolution

  • paper_url: http://arxiv.org/abs/2308.11700
  • repo_url: https://github.com/ian-pang/supercalo
  • paper_authors: Ian Pang, John Andrew Raine, David Shih
  • for: This paper addresses calorimeter shower simulation, a major bottleneck in the Large Hadron Collider computational pipeline.
  • methods: The paper tackles the challenge with deep generative surrogate modeling, introducing SuperCalo, a flow-based super-resolution model that quickly upsamples high-dimensional fine-grained calorimeter showers from coarse-grained showers, reducing the computational cost, memory requirements, and generation time of fast calorimeter simulation.
  • results: The upsampled showers possess a high degree of variation, so a large number of high-fidelity fine-grained showers can be generated from far fewer coarse showers, yielding additional reductions in generation time.
    Abstract Calorimeter shower simulation is a major bottleneck in the Large Hadron Collider computational pipeline. There have been recent efforts to employ deep-generative surrogate models to overcome this challenge. However, many of best performing models have training and generation times that do not scale well to high-dimensional calorimeter showers. In this work, we introduce SuperCalo, a flow-based super-resolution model, and demonstrate that high-dimensional fine-grained calorimeter showers can be quickly upsampled from coarse-grained showers. This novel approach presents a way to reduce computational cost, memory requirements and generation time associated with fast calorimeter simulation models. Additionally, we show that the showers upsampled by SuperCalo possess a high degree of variation. This allows a large number of high-dimensional calorimeter showers to be upsampled from much fewer coarse showers with high-fidelity, which results in additional reduction in generation time.

Efficient Benchmarking (of Language Models)

  • paper_url: http://arxiv.org/abs/2308.11696
  • repo_url: https://github.com/sumankrsh/Sentiment-Analysis.ipynb
  • paper_authors: Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, Liat Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen
  • for: This paper formulates the problem of efficient benchmarking: intelligently reducing the computation costs of LM evaluation without compromising reliability.
  • methods: Using the HELM benchmark as a test case, the authors study how different benchmark design choices affect the computation-reliability tradeoff and propose a new measure, Decision Impact on Reliability (DIoR), to assess the reliability of such decisions.
  • results: The study finds that the current HELM leader can change merely by removing a low-ranked model from the benchmark, that a handful of examples suffice to obtain the correct ranking, and that a slightly different choice of HELM scenarios varies rankings widely; based on these findings, the authors give concrete recommendations for more efficient benchmark design and usage that often reduce computation by 100x or more with minimal loss of reliability.
    Abstract The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks are associated with massive computational costs, reaching thousands of GPU hours per model. However, the efficiency aspect of these evaluation efforts has received little discussion in the literature. In this work, we present the problem of Efficient Benchmarking, namely intelligently reducing the computation costs of LM evaluation without compromising reliability. Using the HELM benchmark as a test case, we investigate how different benchmark design choices affect the computation-reliability tradeoff. We propose to evaluate the reliability of such decisions by using a new measure, Decision Impact on Reliability (DIoR for short). We find, for example, that the current leader on HELM may change by merely removing a low-ranked model from the benchmark, and observe that a handful of examples suffice to obtain the correct benchmark ranking. Conversely, a slightly different choice of HELM scenarios varies rankings widely. Based on our findings, we outline a set of concrete recommendations for more efficient benchmark design and utilization practices, leading to dramatic cost savings with minimal loss of benchmark reliability, often reducing computation by 100x or more.
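
The computation-reliability tradeoff can be probed by re-ranking models on subsamples of the benchmark and comparing against the full ranking. The toy sketch below does this with Kendall's tau on simulated per-example correctness; DIoR itself is defined in the paper, so this is only a stand-in for that style of analysis, and the simulated skill levels are assumptions.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
n_models, n_examples = 20, 5000

# Simulated per-example correctness for models of gradually increasing skill.
skill = np.linspace(0.4, 0.8, n_models)
scores = rng.random((n_models, n_examples)) < skill[:, None]
full_scores = scores.mean(axis=1)                 # full-benchmark accuracy

for frac in (0.01, 0.05, 0.2, 1.0):
    idx = rng.choice(n_examples, int(frac * n_examples), replace=False)
    sub_scores = scores[:, idx].mean(axis=1)      # accuracy on the subsample
    tau, _ = kendalltau(full_scores, sub_scores)
    print(f"{frac:>4.0%} of examples -> Kendall tau vs full benchmark: {tau:.3f}")
```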

Semantic Multi-Resolution Communications

  • paper_url: http://arxiv.org/abs/2308.11604
  • repo_url: None
  • paper_authors: Matin Mortaheb, Mohammad A. Amir Khojastepour, Srimat T. Chakradhar, Sennur Ulukus
  • for: Improving data reconstruction while preserving semantic features
  • methods: A deep learning multi-resolution JSCC framework combined with multi-task learning
  • results: Experiments on the MNIST and CIFAR10 datasets show that the proposed method surpasses the SSCC approach in reconstructing data at different resolutions, and extracts semantic features with increasing confidence across successive layers.
    Abstract Deep learning based joint source-channel coding (JSCC) has demonstrated significant advancements in data reconstruction compared to separate source-channel coding (SSCC). This superiority arises from the suboptimality of SSCC when dealing with finite block-length data. Moreover, SSCC falls short in reconstructing data in a multi-user and/or multi-resolution fashion, as it only tries to satisfy the worst channel and/or the highest quality data. To overcome these limitations, we propose a novel deep learning multi-resolution JSCC framework inspired by the concept of multi-task learning (MTL). This proposed framework excels at encoding data for different resolutions through hierarchical layers and effectively decodes it by leveraging both current and past layers of encoded data. Moreover, this framework holds great potential for semantic communication, where the objective extends beyond data reconstruction to preserving specific semantic attributes throughout the communication process. These semantic features could be crucial elements such as class labels, essential for classification tasks, or other key attributes that require preservation. Within this framework, each level of encoded data can be carefully designed to retain specific data semantics. As a result, the precision of a semantic classifier can be progressively enhanced across successive layers, emphasizing the preservation of targeted semantics throughout the encoding and decoding stages. We conduct experiments on MNIST and CIFAR10 dataset. The experiment with both datasets illustrates that our proposed method is capable of surpassing the SSCC method in reconstructing data with different resolutions, enabling the extraction of semantic features with heightened confidence in successive layers. This capability is particularly advantageous for prioritizing and preserving more crucial semantic features within the datasets.
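
A simplified PyTorch sketch of a hierarchical JSCC model with a multi-task objective is shown below: every layer passes its latent through a noisy channel and decodes a reconstruction, while a classifier on the deepest latent preserves semantic (label) information. The layer widths, the AWGN channel, and decoding every layer to the full input dimension are simplifying assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn


class MultiResJSCC(nn.Module):
    """Hierarchical joint source-channel coder: each layer refines the latent
    and decodes a reconstruction of its own; a classification head on the
    deepest latent preserves semantic information."""

    def __init__(self, dim=784, latent=64, n_layers=3, n_classes=10):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Linear(dim if i == 0 else latent, latent) for i in range(n_layers)])
        self.decoders = nn.ModuleList(
            [nn.Linear(latent, dim) for _ in range(n_layers)])
        self.classifier = nn.Linear(latent, n_classes)

    def forward(self, x, snr_std=0.1):
        recons, z = [], x
        for enc, dec in zip(self.encoders, self.decoders):
            z = torch.tanh(enc(z))
            z_noisy = z + snr_std * torch.randn_like(z)   # AWGN channel per layer
            recons.append(dec(z_noisy))
        return recons, self.classifier(z_noisy)


# Multi-task objective: reconstruction at every layer plus semantic labels.
model = MultiResJSCC()
x = torch.rand(8, 784)
y = torch.randint(0, 10, (8,))
recons, logits = model(x)
loss = sum(nn.functional.mse_loss(r, x) for r in recons) \
       + nn.functional.cross_entropy(logits, y)
loss.backward()
```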

Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models

  • paper_url: http://arxiv.org/abs/2308.11601
  • repo_url: None
  • paper_authors: Surya Narayanan Hari, Matt Thomson
  • for: This paper aims to address the problem of selecting and optimizing language models for specific downstream tasks and data domains, and to provide a framework for efficient use of the expanding and evolving language model ecosystem.
  • methods: The proposed method, Tryage, leverages a language model router to predict down-stream model performance on prompts and makes a routing decision using an objective function that integrates performance predictions with user goals and constraints.
  • results: Tryage surpasses Gorilla and GPT3.5 turbo in dynamic model selection, identifying the optimal model with an accuracy of 50.9%, compared to 23.6% by GPT 3.5 Turbo and 10.8% by Gorilla, across heterogeneous data sets that include code, text, clinical data, and patents.
    Abstract The introduction of the transformer architecture and the self-attention mechanism has led to an explosive production of language models trained on specific downstream tasks and data domains. With over 200,000 models in the Hugging Face ecosystem, users grapple with selecting and optimizing models to suit multifaceted workflows and data domains while addressing computational, security, and recency concerns. There is an urgent need for machine learning frameworks that can eliminate the burden of model selection and customization and unleash the incredible power of the vast emerging model library for end users. Here, we propose a context-aware routing system, Tryage, that leverages a language model router for optimal selection of expert models from a model library based on analysis of individual input prompts. Inspired by the thalamic router in the brain, Tryage employs a perceptive router to predict down-stream model performance on prompts and, then, makes a routing decision using an objective function that integrates performance predictions with user goals and constraints that are incorporated through flags (e.g., model size, model recency). Tryage allows users to explore a Pareto front and automatically trade off between task accuracy and secondary goals including minimization of model size, recency, security, verbosity, and readability. Across heterogeneous data sets that include code, text, clinical data, and patents, the Tryage framework surpasses Gorilla and GPT3.5 turbo in dynamic model selection, identifying the optimal model with an accuracy of 50.9%, compared to 23.6% by GPT 3.5 Turbo and 10.8% by Gorilla. Conceptually, Tryage demonstrates how routing models can be applied to program and control the behavior of multi-model LLM systems to maximize efficient use of the expanding and evolving language model ecosystem.
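
A bare-bones sketch of the routing decision is given below: predicted downstream performance is combined with user-flagged constraints through a weighted objective and the arg-max model is selected. The attribute names, weights, and the stubbed performance predictor are assumptions for illustration; in the paper the predictor is itself a trained language-model router.

```python
from dataclasses import dataclass


@dataclass
class ExpertModel:
    name: str
    size_gb: float            # penalized when the user flags model size
    months_since_update: int  # penalized when the user flags recency


def route(prompt, models, predict_perf, weights):
    """Pick the expert model maximizing predicted performance minus weighted
    penalties for user-flagged constraints (size, recency, ...)."""
    def objective(m):
        return (predict_perf(prompt, m.name)
                - weights.get("size", 0.0) * m.size_gb
                - weights.get("recency", 0.0) * m.months_since_update)
    return max(models, key=objective)


# Usage with a stubbed performance predictor standing in for the router LM.
models = [ExpertModel("code-expert", 13.0, 2),
          ExpertModel("clinical-expert", 7.0, 10),
          ExpertModel("generalist", 3.0, 1)]
predict_perf = lambda prompt, name: 0.9 if "def " in prompt and "code" in name else 0.6
best = route("def parse(x): ...", models, predict_perf, {"size": 0.01, "recency": 0.005})
print(best.name)
```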

Low Tensor Rank Learning of Neural Dynamics

  • paper_url: http://arxiv.org/abs/2308.11567
  • repo_url: None
  • paper_authors: Arthur Pellegrino, N Alex Cayco-Gajic, Angus Chadwick
  • for: This paper studies how the connectivity of recurrently connected neural populations evolves collectively over the course of learning, and how this evolution shapes learning.
  • methods: The authors fit RNNs of varying rank to large-scale neural recordings during a motor learning task and characterize how the weight matrices change over learning through a low-tensor-rank decomposition of the 3-tensor they form.
  • results: The inferred weights are low-tensor-rank and therefore evolve within a fixed low-dimensional subspace throughout learning; additional mathematical results bound the matrix and tensor ranks of gradient-descent learning dynamics, constraining how population connectivity can evolve in both biological and artificial networks.
    Abstract Learning relies on coordinated synaptic changes in recurrently connected populations of neurons. Therefore, understanding the collective evolution of synaptic connectivity over learning is a key challenge in neuroscience and machine learning. In particular, recent work has shown that the weight matrices of task-trained RNNs are typically low rank, but how this low rank structure unfolds over learning is unknown. To address this, we investigate the rank of the 3-tensor formed by the weight matrices throughout learning. By fitting RNNs of varying rank to large-scale neural recordings during a motor learning task, we find that the inferred weights are low-tensor-rank and therefore evolve over a fixed low-dimensional subspace throughout the entire course of learning. We next validate the observation of low-tensor-rank learning on an RNN trained to solve the same task by performing a low-tensor-rank decomposition directly on the ground truth weights, and by showing that the method we applied to the data faithfully recovers this low rank structure. Finally, we present a set of mathematical results bounding the matrix and tensor ranks of gradient descent learning dynamics which show that low-tensor-rank weights emerge naturally in RNNs trained to solve low-dimensional tasks. Taken together, our findings provide novel constraints on the evolution of population connectivity over learning in both biological and artificial neural networks, and enable reverse engineering of learning-induced changes in recurrent network dynamics from large-scale neural recordings.
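
The analysis can be sketched as follows: stack the recurrent weight matrices saved across training into an (epochs x units x units) tensor and fit a low-rank CP (parafac) decomposition with `tensorly`. The toy weight trajectory and the chosen rank below are illustrative assumptions, not the paper's fitting procedure for neural recordings.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
n_units, n_epochs, true_rank = 50, 30, 3

# Toy weight trajectory: at every epoch the recurrent matrix is a mixture of
# a few fixed rank-1 components whose coefficients drift over learning.
u = rng.standard_normal((n_units, true_rank))
v = rng.standard_normal((n_units, true_rank))
coeffs = np.cumsum(rng.standard_normal((n_epochs, true_rank)), axis=0)
weights_over_training = np.einsum("tr,ir,jr->tij", coeffs, u, v)

# Fit a CP decomposition of the (epochs x units x units) tensor and check
# how well a low tensor rank explains the learning trajectory.
tensor = tl.tensor(weights_over_training)
cp = parafac(tensor, rank=true_rank, n_iter_max=500, tol=1e-10)
recon = tl.cp_to_tensor(cp)
rel_err = tl.norm(tensor - recon) / tl.norm(tensor)
print(f"relative reconstruction error at rank {true_rank}: {rel_err:.4f}")
```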

Practical Insights on Incremental Learning of New Human Physical Activity on the Edge

  • paper_url: http://arxiv.org/abs/2308.11691
  • repo_url: None
  • paper_authors: George Arvanitakis, Jingwei Zuo, Mthandazo Ndhlovu, Hakim Hacid
  • for: This paper examines Edge Machine Learning (Edge ML), highlighting the advantages and challenges of running computational intelligence on edge devices.
  • methods: Experiments are conducted with the MAGNETO system, which learns human activities from data collected by mobile sensors.
  • results: The experiments show that Edge ML brings challenges around constrained data storage, limited computational power for training, and the number of learning classes.
    Abstract Edge Machine Learning (Edge ML), which shifts computational intelligence from cloud-based systems to edge devices, is attracting significant interest due to its evident benefits including reduced latency, enhanced data privacy, and decreased connectivity reliance. While these advantages are compelling, they introduce unique challenges absent in traditional cloud-based approaches. In this paper, we delve into the intricacies of Edge-based learning, examining the interdependencies among: (i) constrained data storage on Edge devices, (ii) limited computational power for training, and (iii) the number of learning classes. Through experiments conducted using our MAGNETO system, that focused on learning human activities via data collected from mobile sensors, we highlight these challenges and offer valuable perspectives on Edge ML.
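
To illustrate the constraints rather than MAGNETO's actual algorithm, the sketch below adds a new activity class on-device with a nearest-class-mean classifier: one running mean per class, so incorporating a new class needs only O(d) memory and no retraining. The feature dimensionality and activity names are assumptions.

```python
import numpy as np


class NearestClassMean:
    """Class-incremental classifier: stores only one mean vector and a count
    per class, so adding a new activity is cheap in memory and compute."""

    def __init__(self):
        self.means, self.counts = {}, {}

    def partial_fit(self, x, label):
        if label not in self.means:
            self.means[label] = np.zeros_like(x, dtype=float)
            self.counts[label] = 0
        self.counts[label] += 1
        self.means[label] += (x - self.means[label]) / self.counts[label]

    def predict(self, x):
        return min(self.means, key=lambda c: np.linalg.norm(x - self.means[c]))


rng = np.random.default_rng(0)
clf = NearestClassMean()
for _ in range(100):                       # initial activities: walking, sitting
    clf.partial_fit(rng.normal(0.0, 1.0, 16), "walking")
    clf.partial_fit(rng.normal(3.0, 1.0, 16), "sitting")
for _ in range(20):                        # later, a new class arrives on-device
    clf.partial_fit(rng.normal(-3.0, 1.0, 16), "cycling")
print(clf.predict(rng.normal(-3.0, 1.0, 16)))
```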

Multi-event Video-Text Retrieval

  • paper_url: http://arxiv.org/abs/2308.11551
  • repo_url: https://github.com/gengyuanmax/mevtr
  • paper_authors: Gengyuan Zhang, Jisen Ren, Jindong Gu, Volker Tresp
  • for: This paper targets the problem of video-text retrieval (VTR) in the era of massive video-text data, specifically addressing the gap between previous models and real-world scenarios where videos contain multiple events.
  • methods: The proposed method, Me-Retriever, incorporates key event video representation and a new MeVTR loss for the Multi-event Video-Text Retrieval (MeVTR) task.
  • results: The straightforward framework outperforms other models in the Video-to-Text and Text-to-Video tasks, establishing a robust baseline for the MeVTR task.
    Abstract Video-Text Retrieval (VTR) is a crucial multi-modal task in an era of massive video-text data on the Internet. A plethora of work characterized by using a two-stream Vision-Language model architecture that learns a joint representation of video-text pairs has become a prominent approach for the VTR task. However, these models operate under the assumption of bijective video-text correspondences and neglect a more practical scenario where video content usually encompasses multiple events, while texts like user queries or webpage metadata tend to be specific and correspond to single events. This establishes a gap between the previous training objective and real-world applications, leading to the potential performance degradation of earlier models during inference. In this study, we introduce the Multi-event Video-Text Retrieval (MeVTR) task, addressing scenarios in which each video contains multiple different events, as a niche scenario of the conventional Video-Text Retrieval Task. We present a simple model, Me-Retriever, which incorporates key event video representation and a new MeVTR loss for the MeVTR task. Comprehensive experiments show that this straightforward framework outperforms other models in the Video-to-Text and Text-to-Video tasks, effectively establishing a robust baseline for the MeVTR task. We believe this work serves as a strong foundation for future studies. Code is available at https://github.com/gengyuanmax/MeVTR.
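
A sketch of a multi-positive contrastive objective in the spirit of the MeVTR setting is shown below, where a single video embedding has several event-level caption positives; this is an illustrative loss and pairing scheme, not necessarily the exact MeVTR loss, and the embedding sizes and positive mask are assumed.

```python
import torch
import torch.nn.functional as F


def multi_positive_contrastive_loss(video_emb, text_emb, pos_mask, temp=0.07):
    """video_emb: (V, d), text_emb: (T, d), pos_mask: (V, T) boolean where
    pos_mask[v, t] is True if caption t describes one of video v's events.
    Each video is pulled toward all of its event captions and pushed away
    from the captions of other videos."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temp                               # (V, T) similarities
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    return (-(log_prob * pos_mask).sum(dim=1) / pos_counts).mean()


# Toy usage: 4 videos, 10 captions, each video paired with 2-3 event captions.
torch.manual_seed(0)
video_emb = torch.randn(4, 32, requires_grad=True)
text_emb = torch.randn(10, 32)
pos_mask = torch.zeros(4, 10, dtype=torch.bool)
pos_mask[0, :3] = True
pos_mask[1, 3:5] = True
pos_mask[2, 5:8] = True
pos_mask[3, 8:] = True
loss = multi_positive_contrastive_loss(video_emb, text_emb, pos_mask)
loss.backward()
print(float(loss))
```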