paper_authors: Shruti Mishra, Ankit Anand, Jordan Hoffmann, Nicolas Heess, Martin Riedmiller, Abbas Abdolmaleki, Doina Precup
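for: This work aims to help reinforcement learning agents learn successful behavior policies by using relevant pre-existing teacher policies.
methods: The study uses the Multi-Objective Maximum a Posteriori Policy Optimization (MO-MPO) algorithm, with the task objective and the teacher policies as the multiple objectives.
results: With teacher policies, reinforcement learning agents learn tasks faster, particularly in the absence of shaping rewards. In two domains with continuous observation and action spaces, the agents successfully compose teacher policies in sequence and in parallel, and further extend the teacher policies to solve the task.
Abstract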
We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using the Multi-Objective Maximum a Posteriori Policy Optimization algorithm \citep{abdolmaleki2020distributional}, we show that teacher policies can help speed up learning, particularly in the absence of shaping rewards. In two domains with continuous observation and action spaces, our agents successfully compose teacher policies in sequence and in parallel, and are also able to further extend the policies of the teachers in order to solve the task. Depending on the specified combination of task and teacher(s), the teacher(s) may naturally act to limit the final performance of an agent. The extent to which agents are required to adhere to teacher policies is set by hyperparameters, which determine both the effect of teachers on learning speed and the eventual performance of the agent on the task. In the {\tt humanoid} domain \citep{deepmindcontrolsuite2018}, we also equip agents with the ability to control the selection of teachers. With this ability, agents are able to meaningfully compose from the teacher policies to achieve a higher task reward on the {\tt walk} task than in cases without access to the teacher policies. We show the resemblance of composed task policies with the corresponding teacher policies through videos.
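Summary
We use pre-existing teacher policies to help reinforcement learning agents learn successful behavior policies. The teacher policies are introduced as objectives alongside the task objective, using a multi-objective policy optimization algorithm \citep{abdolmaleki2020distributional}. In two domains with continuous observation and action spaces, we show that our agents can successfully compose teacher policies and further extend them to solve the task. For a given combination of task and teacher(s), the teachers may naturally limit the agent's final performance. The degree to which agents must adhere to the teacher policies is set by hyperparameters, which affect both the agent's learning speed and its final performance on the task. In the {\tt humanoid} domain \citep{deepmindcontrolsuite2018}, we also let agents control the selection of teachers. With this ability, agents meaningfully compose task policies from the teacher policies and perform better on the {\tt walk} task than without the teacher policies. Videos show the resemblance between the composed task policies and the corresponding teacher policies.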
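The idea of treating a teacher as one more objective can be illustrated with a toy sketch. This is not the authors' implementation: it uses a small discrete action set instead of continuous actions, fixes the per-objective temperatures by hand instead of solving for them from KL constraints, and assumes the teacher objective is the teacher's log-likelihood of each action. All names and numbers (`task_q`, `teacher_policy`, the temperatures) are hypothetical.

```python
import numpy as np

def improved_distribution(prior, q_values, temperature):
    """Reweight prior action probabilities by exp(Q / temperature) and renormalize."""
    logits = np.log(prior) + q_values / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def multi_objective_step(prior, q_per_objective, temperatures):
    """One toy policy-improvement step over several objectives."""
    targets = [
        improved_distribution(prior, q, t)
        for q, t in zip(q_per_objective, temperatures)
    ]
    # For a categorical policy with no trust region, maximizing the summed
    # expected log-likelihood under all targets yields their average.
    return np.mean(targets, axis=0)

prior = np.full(4, 0.25)                          # uniform starting policy
task_q = np.array([0.0, 0.0, 1.0, 0.0])           # hypothetical sparse task values
teacher_policy = np.array([0.1, 0.6, 0.2, 0.1])   # hypothetical teacher
teacher_q = np.log(teacher_policy)                # assumption: teacher objective = log-likelihood

# A lower teacher temperature stands in for a stricter adherence hyperparameter.
loose = multi_objective_step(prior, [task_q, teacher_q], temperatures=[1.0, 1.0])
strict = multi_objective_step(prior, [task_q, teacher_q], temperatures=[1.0, 0.5])
print("loose :", np.round(loose, 3))
print("strict:", np.round(strict, 3))
```

In this toy setting, lowering the teacher temperature shifts probability mass toward the teacher's preferred action, which mirrors the role the abstract assigns to the adherence hyperparameters.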
Random feature approximation for general spectral methods
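results: The paper studies the generalization properties of spectral regularization methods combined with random features, and obtains optimal learning rates over regularity classes.
Abstract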
Random feature approximation is arguably one of the most popular techniques to speed up kernel methods in large scale algorithms and provides a theoretical approach to the analysis of deep neural networks. We analyze generalization properties for a large class of spectral regularization methods combined with random features, containing kernel methods with implicit regularization such as gradient descent or explicit methods like Tikhonov regularization. For our estimators we obtain optimal learning rates over regularity classes (even for classes that are not included in the reproducing kernel Hilbert space), which are defined through appropriate source conditions. This improves or completes previous results obtained in related settings for specific kernel algorithms.
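Summary
Random feature approximation is one of the most popular techniques in large-scale algorithms, and it provides a theoretical approach to the analysis of deep neural networks. We analyze the generalization properties of a large class of spectral regularization methods combined with random features, including kernel methods such as gradient descent and Tikhonov regularization. Our estimators attain optimal learning rates over regularity classes, including classes not contained in the reproducing kernel Hilbert space. These results improve or complete earlier results obtained in related settings.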
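As a concrete illustration of the two regularization styles named in the abstract, the sketch below fits random Fourier features with Tikhonov regularization (explicit) and with plain gradient descent stopped after a fixed number of steps (implicit). It is a minimal example under assumptions that are not from the paper: a Gaussian kernel, synthetic one-dimensional data, and hand-picked values for the bandwidth `sigma`, the number of features `M`, the penalty `lam`, the step size, and the iteration count.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Random cosine features whose inner products approximate a Gaussian kernel."""
    M = W.shape[1]
    return np.sqrt(2.0 / M) * np.cos(X @ W + b)

def tikhonov_fit(Z, y, lam):
    """Explicit regularization: solve (Z^T Z / n + lam * I) theta = Z^T y / n."""
    n, M = Z.shape
    return np.linalg.solve(Z.T @ Z / n + lam * np.eye(M), Z.T @ y / n)

def gradient_descent_fit(Z, y, step, iters):
    """Implicit regularization: gradient descent from zero, stopped after `iters` steps."""
    n, M = Z.shape
    theta = np.zeros(M)
    for _ in range(iters):
        theta -= step * Z.T @ (Z @ theta - y) / n
    return theta

rng = np.random.default_rng(0)
n, d, M, sigma = 200, 1, 100, 0.5                 # illustrative sizes and bandwidth
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)  # synthetic target

W = rng.standard_normal((d, M)) / sigma           # random frequencies
b = rng.uniform(0.0, 2.0 * np.pi, size=M)         # random phases
Z = random_fourier_features(X, W, b)

theta_ridge = tikhonov_fit(Z, y, lam=1e-3)
# Each feature row has squared norm at most 2, so step = 0.5 keeps the iteration stable.
theta_gd = gradient_descent_fit(Z, y, step=0.5, iters=500)

mse_ridge = float(np.mean((Z @ theta_ridge - y) ** 2))
mse_gd = float(np.mean((Z @ theta_gd - y) ** 2))
print("training MSE (Tikhonov):", mse_ridge)
print("training MSE (gradient descent):", mse_gd)
```

In this sketch the penalty `lam` and the iteration count play the role of the regularization parameter of the corresponding spectral method; both fits only ever touch the `n x M` feature matrix rather than the full `n x n` kernel matrix.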
Probabilistic solar flare forecasting using historical magnetogram data
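results: Including historical daily magnetogram data from multiple instruments improves forecasting skill and reliability. Single-frame magnetograms do not contain significantly more relevant information than a small number of scalar features, and flaring history has greater predictive power than the CNN-extracted magnetogram features.
Abstract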
Solar flare forecasting research using machine learning (ML) has focused on high resolution magnetogram data from the SDO/HMI era covering Solar Cycle 24 and the start of Solar Cycle 25, with some efforts looking back to SOHO/MDI for data from Solar Cycle 23. In this paper, we consider over 4 solar cycles of daily historical magnetogram data from multiple instruments. This is the first attempt to take advantage of this historical data for ML-based flare forecasting. We apply a convolutional neural network (CNN) to extract features from full-disk magnetograms together with a logistic regression model to incorporate scalar features based on magnetograms and flaring history. We use an ensemble approach to generate calibrated probabilistic forecasts of M-class or larger flares in the next 24 hours. Overall, we find that including historical data improves forecasting skill and reliability. We show that single frame magnetograms do not contain significantly more relevant information than can be summarized in a small number of scalar features, and that flaring history has greater predictive power than our CNN-extracted features. This indicates the importance of including temporal information in flare forecasting models.
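Summary
Solar flare forecasting research using machine learning (ML) has focused on high-resolution magnetogram data from the SDO/HMI era, covering Solar Cycle 24 and the start of Solar Cycle 25, with some efforts going back to SOHO/MDI data. In this paper, we consider over 4 solar cycles of daily historical magnetogram data from multiple instruments. This is the first use of this historical data for ML-based flare forecasting. We use a convolutional neural network (CNN) to extract features from magnetograms, and a logistic regression model to incorporate scalar features based on magnetograms and flaring history. We use an ensemble approach to generate calibrated probabilistic forecasts of M-class or larger flares within the next 24 hours. Overall, we find that including historical data improves forecasting skill and reliability. We show that single-frame magnetograms do not contain significantly more relevant information than a few scalar features, and that flaring history has greater predictive power. This shows the importance of including temporal information in flare forecasting models.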
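The scalar-feature part of such a pipeline can be sketched as follows. This is an illustration under stated assumptions, not the paper's model: the CNN stage is omitted, the data are synthetic, the two feature columns are hypothetical stand-ins for a flaring-history feature and a magnetogram-derived scalar, and the ensemble is built by bootstrap resampling, which the abstract does not specify. The Brier score against a constant climatological forecast is used here as one simple measure of probabilistic skill.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, iters=300):
    """Logistic regression fitted by gradient descent on the mean log loss."""
    Xb = np.hstack([X, np.ones((len(X), 1))])     # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return sigmoid(Xb @ w)

def ensemble_forecast(X_train, y_train, X_test, n_members, rng):
    """Average the probabilities of models fitted on bootstrap resamples."""
    probs = []
    for _ in range(n_members):
        idx = rng.integers(0, len(y_train), size=len(y_train))
        w = fit_logistic(X_train[idx], y_train[idx])
        probs.append(predict_proba(w, X_test))
    return np.mean(probs, axis=0)

def brier_score(p, y):
    """Mean squared error between forecast probabilities and binary outcomes."""
    return float(np.mean((p - y) ** 2))

# Synthetic data: column 0 stands in for flaring history, column 1 for a
# magnetogram-derived scalar. Both are hypothetical and standardized.
rng = np.random.default_rng(0)
n_train, n_test = 2000, 1000
X = rng.standard_normal((n_train + n_test, 2))
true_logit = 2.0 * X[:, 0] + 1.0 * X[:, 1] - 1.0
y = (rng.uniform(size=len(X)) < sigmoid(true_logit)).astype(float)
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

forecast = ensemble_forecast(X_train, y_train, X_test, n_members=10, rng=rng)
climatology = np.full(n_test, y_train.mean())     # constant base-rate forecast
score_model = brier_score(forecast, y_test)
score_climatology = brier_score(climatology, y_test)
print("Brier score (ensemble):", score_model)
print("Brier score (climatology):", score_climatology)
```

A lower Brier score than climatology indicates that the forecast probabilities carry information beyond the base rate; a reliability diagram would be the natural next check for calibration.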