results: Under various MIMO configurations, and accounting for the effect of Co-Carrier Frequency Offset (Co-CFO), the proposed scheme outperforms the conventional ones, and a closed-form expression for its computational complexity is provided.
Abstract
Discrete Cosine Transform (DCT) can be used instead of the conventional Discrete Fourier Transform (DFT) for Orthogonal Frequency Division Multiplexing (OFDM) construction, which offers many advantages. In this paper, Multiple-Input-Multiple-Output (MIMO) DCT-OFDM is enhanced using a proposed Cosine Domain Equalizer (CDE) instead of a Frequency Domain Equalizer (FDE). The results are evaluated over a Rayleigh fading channel with Co-Carrier Frequency Offset (Co-CFO) for different MIMO configurations. The average bit error probability and the simulation time of the proposed scheme and the conventional one are compared, which indicates the importance of the proposed scheme. Also, a closed formula for the number of arithmetic operations of the proposed equalizer is developed. The proposed equalizer gives simulation time reductions of about 81.21% and 83.74% compared to the conventional LZF-FDE and LMMSE-FDE, respectively, for the 4x4 configuration.
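The contrast between DFT- and DCT-based OFDM modulation can be made concrete with a small sketch. The following is a minimal illustration of the two transforms only, not the paper's exact CDE construction; the subcarrier count and the choice to carry the real and imaginary parts on separate DCT multiplexes are assumptions.

```python
import numpy as np
from scipy.fft import ifft, idct

N = 64                                    # assumed number of subcarriers
bits = np.random.randint(0, 2, (N, 2))
# Map bit pairs to 4-QAM symbols.
syms = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

# Conventional OFDM: modulate with the inverse DFT.
ofdm_dft = ifft(syms, norm="ortho")

# DCT-based OFDM: one common construction applies the inverse DCT to the
# real and imaginary streams separately (details vary across the literature).
ofdm_dct = idct(syms.real, norm="ortho") + 1j * idct(syms.imag, norm="ortho")
```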
Study of MMSE-Based Resource Allocation for Clustered Cell-Free Massive MIMO Networks
methods: The researchers derive closed-form sum-rate expressions for the CF and CLCF networks, where linear precoders including zero forcing (ZF) and minimum mean square error (MMSE) precoding are implemented.
results: An MMSE-based resource allocation technique with an enhanced greedy scheduling strategy is proposed to improve the system performance of the CLCF network; numerical results show that it outperforms existing approaches while substantially reducing the computational cost and signaling load in the CLCF network.
Abstract
In this paper, a downlink cell-free massive multiple-input multiple-output (CF massive MIMO) system with network clustering is considered. Closed form sum-rate expressions are derived for CF and the clustered CF (CLCF) networks where linear precoders including zero forcing (ZF) and minimum mean square error (MMSE) are implemented. An MMSE-based resource allocation technique with multiuser scheduling based on an enhanced greedy technique and power allocation based on the gradient descent (GD) method is proposed in the CLCF network to improve the system performance. Numerical results show that the proposed technique is superior to the existing approaches and the computational cost and the signaling load are essentially reduced in the CLCF network.
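As background for the precoders named above, the textbook single-cell forms of ZF and MMSE precoding take only a few lines. This is a generic sketch under standard assumptions (perfect CSI, a K x M channel matrix, total power P), not the paper's clustered cell-free variants.

```python
import numpy as np

def zf_precoder(H):
    # Zero forcing: right pseudo-inverse of the K x M channel, which
    # nulls inter-user interference when K <= M.
    return H.conj().T @ np.linalg.inv(H @ H.conj().T)

def mmse_precoder(H, noise_var, total_power):
    # Regularized (MMSE) precoder; the regularizer K * sigma^2 / P is
    # the usual choice that balances interference against noise.
    K = H.shape[0]
    reg = K * noise_var / total_power
    return H.conj().T @ np.linalg.inv(H @ H.conj().T + reg * np.eye(K))
```

In both cases the precoder columns would still be scaled to meet the power constraint before transmission.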
Robust Joint Estimation of Galaxy Redshift and Spectral Templates using Online Dictionary Learning
methods: The algorithm is a non-linear extension of online dictionary learning that jointly estimates the underlying spectral features common across the entire dataset as well as the redshift of each galaxy. It scales to large datasets because only one spectrum is processed in memory at a time.
results: On a mock SPHEREx dataset the algorithm outperforms a state-of-the-art existing algorithm, achieving an NMAD standard deviation of 0.18% and a catastrophic error rate of 0.40% on noiseless data, and it delivers sub-percent NMAD and catastrophic error across a wide range of signal-to-noise ratios (SNR).
Abstract
We present a novel approach to analyzing astronomical spectral survey data using our non-linear extension of an online dictionary learning algorithm. Current and upcoming surveys such as SPHEREx will use spectral data to build a 3D map of the universe by estimating the redshifts of millions of galaxies. Existing algorithms rely on hand-curated external templates and have limited performance due to model mismatch error. Our algorithm addresses this limitation by jointly estimating both the underlying spectral features in common across the entire dataset, as well as the redshift of each galaxy. Our online approach scales well to large datasets since we only process a single spectrum in memory at a time. Our algorithm performs better than a state-of-the-art existing algorithm when analyzing a mock SPHEREx dataset, achieving an NMAD standard deviation of 0.18% and a catastrophic error rate of 0.40% when analyzing noiseless data. Our algorithm also performs well over a wide range of signal to noise ratios (SNR), delivering sub-percent NMAD and catastrophic error above a median SNR of 20. We released our algorithm publicly at github.com/HyperspectralDictionaryLearning/BryanEtAl2023.
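The core of any online dictionary learning method is a per-sample update, which keeps memory usage independent of dataset size. The sketch below shows a bare-bones gradient-style update for one spectrum; it omits the paper's non-linear redshift component and uses plain least squares where a sparse solver would normally sit, so treat it as a structural illustration only.

```python
import numpy as np

def online_dictionary_step(D, x, lr=0.01):
    # D: (n_wavelengths, n_atoms) dictionary; x: one spectrum.
    # 1) Code the sample against the current dictionary.
    a, *_ = np.linalg.lstsq(D, x, rcond=None)
    # 2) Take a gradient step on the reconstruction error.
    residual = x - D @ a
    D = D + lr * np.outer(residual, a)
    # 3) Renormalize atoms so the scale stays in the coefficients.
    D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, a
```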
Received Signal and Channel Parameter Estimation in Molecular Communications
paper_authors: O. Tansel Baydas, Ozgur B. Akan
for: This paper provides a novel molecular communication (MC) model that incorporates a spherical transmitter and receiver with partial absorption, offering a more realistic representation than previous receiver architectures in the literature.
methods: The paper employs an optimization-based technique using particle swarm optimization (PSO) to accurately estimate the cumulative number of molecules received, and iterative maximum likelihood estimation (MLE) for parameter estimation.
results: The proposed model and estimation technique achieve a significant improvement of 5 times in terms of root mean square error (RMSE), and provide an approximate analytical impulse response for estimating channel parameters such as distance, diffusion coefficient, or a combination of both.
Abstract
Molecular communication (MC) is a paradigm that employs molecules as information transmitters, hence requiring unconventional transceivers and detection techniques for the Internet of Bio-Nano Things (IoBNT). In this study, we provide a novel MC model that incorporates a spherical transmitter and receiver with partial absorption. This model offers a more realistic representation than receiver architectures in the literature, e.g. passive or entirely absorbing configurations. An optimization-based technique utilizing particle swarm optimization (PSO) is employed to accurately estimate the cumulative number of molecules received. This technique yields nearly constant correction parameters and demonstrates a significant improvement of 5 times in terms of root mean square error (RMSE). The estimated channel model provides an approximate analytical impulse response; hence, it is used for estimating channel parameters such as distance, diffusion coefficient, or a combination of both. We apply iterative maximum likelihood estimation (MLE) for the parameter estimation, which gives consistent errors compared to the estimated Cramer-Rao Lower Bound (CRLB).
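PSO itself is a generic population-based optimizer, and a minimal loop conveys how it would search for the correction parameters. The inertia and acceleration constants below (0.7, 1.5, 1.5) are common textbook defaults, not values from the paper, and the objective is left abstract.

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=100, bounds=(-1.0, 1.0)):
    lo, hi = bounds
    x = np.random.uniform(lo, hi, (n_particles, dim))     # particle positions
    v = np.zeros_like(x)                                  # particle velocities
    pbest = x.copy()                                      # per-particle best
    pbest_f = np.array([objective(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()                # swarm-wide best
    for _ in range(iters):
        r1, r2 = np.random.rand(2, n_particles, dim)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest
```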
Cooperative Multi-Monostatic Sensing for Object Localization in 6G Networks
paper_authors: Maximiliano Rivera Figueroa, Pradyumna Kumar Bishoyi, Marina Petrova
for: This paper explores a 5G New Radio (NR)-based cooperative multi-monostatic sensing framework for passive target localization in bandwidth-limited scenarios.
methods: The paper proposes a novel fusion-based estimation process that assigns appropriate weights to the range estimation of each base station (BS) to mitigate the effects of multipath and improve localization accuracy.
results: Extensive simulation results using ray-tracing demonstrate the efficacy of the proposed multi-sensing framework in bandwidth-limited scenarios.
Abstract
Enabling passive sensing of the environment using cellular base stations (BSs) will be one of the disruptive features of the sixth-generation (6G) networks. However, accurate localization and positioning of objects are challenging to achieve as multipath significantly degrades the reflected echoes. Existing localization techniques perform well under the assumption of large available bandwidth but perform poorly in bandwidth-limited scenarios. To alleviate this problem, in this work, we introduce a 5G New Radio (NR)-based cooperative multi-monostatic sensing framework for passive target localization that operates in the Frequency Range 1 (FR1) band. We propose a novel fusion-based estimation process that can mitigate the effect of multipath by assigning appropriate weight to the range estimation of each BS. Extensive simulation results using ray-tracing demonstrate the efficacy of the proposed multi-sensing framework in bandwidth-limited scenarios.
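The fusion step can be illustrated with the simplest weighting rule of this kind. Inverse-variance weighting is one standard choice and is shown here as an assumption; the paper's weights are designed specifically to down-weight multipath-corrupted BSs, which may differ.

```python
import numpy as np

def fuse_ranges(range_estimates, variances):
    # Combine per-BS range estimates so that BSs with cleaner
    # (lower-variance) measurements dominate the fused estimate.
    w = 1.0 / np.asarray(variances, dtype=float)
    return float(np.sum(w * np.asarray(range_estimates)) / np.sum(w))
```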
On RIS-Aided SIMO Gaussian Channels: Towards A Single-RF MIMO Transceiver Architecture
results: The study shows that, in certain regimes, the RIS can increase the capacity of a SIMO system. In addition, a new transceiver architecture with only a single RF front end is proposed, under which the channel can be regarded as a concatenation of a vector Gaussian channel and several phase-modulated channels.
Abstract
In this paper, for a single-input multiple-output (SIMO) system aided by a passive reconfigurable intelligent surface (RIS), the joint transmission accomplished by the single transmit antenna and the RIS with multiple controllable reflective elements is considered. Relying on a general capacity upper bound derived by a maximum-trace argument, we respectively characterize the capacity of such a channel in the low-SNR or the rank-one regimes, in which the optimal configuration of the RIS is proved to be beamforming with carefully-chosen phase shifts. To exploit the potential of modulating extra information on the RIS, based on the QR decomposition, successive interference cancellation, and a strategy named \textit{partially beamforming and partially information-carrying}, we propose a novel transceiver architecture with only a single RF front end at the transmitter, by which the considered channel can be regarded as a concatenation of a vector Gaussian channel and several phase-modulated channels. Especially, we investigate a class of vector Gaussian channels with a hypersphere input support constraint, and not only generalize the existing result to arbitrary-dimensional real spaces but also present its high-order capacity asymptotics, by which both capacities of hypersphere-constrained channels and achievable rates of the proposed transceiver with two different signaling schemes can be well-approximated. Information-theoretic analyses show that the transceiver architecture designed for the SIMO channel has a boosted multiplexing gain, rather than one for the conventionally-used optimized beamforming scheme. Numerical results verify our derived asymptotics and show notable superiority of the proposed transceiver.
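In the beamforming configuration that the capacity analysis identifies, each RIS element's phase is chosen so that the cascaded paths add coherently at the receiver's combiner output. A minimal sketch of that phase-alignment rule follows; the variable shapes and the fixed receive combiner are assumptions for illustration.

```python
import numpy as np

def ris_beamforming_phases(h_tx_ris, H_ris_rx, combiner):
    # h_tx_ris: (N,) transmit-antenna-to-RIS channel
    # H_ris_rx: (M, N) RIS-to-receive-array channel
    # combiner: (M,) receive combining vector
    cascade = (combiner.conj() @ H_ris_rx) * h_tx_ris  # per-element cascaded gain
    return -np.angle(cascade)  # phase shifts that co-phase all N reflections
```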
SNN Architecture for Differential Time Encoding Using Decoupled Processing Time
paper_authors: Daniel Windhager, Bernhard A. Moser, Michael Lunglmayr
for: This paper is written for researchers and developers who are interested in designing hardware accelerators for spiking neural networks (SNNs).
methods: The paper proposes a lightweight neuron layer architecture that allows SNNs to be directly mapped onto digital hardware, using differential time coding of spike sequences and decoupling processing time and spike timing.
results: The authors present synthesis and performance results showing that their architecture can be implemented for networks of more than 1000 neurons with high clock speeds on a state-of-the-art FPGA, and demonstrate the robustness of their approach to quantization.
Abstract
Spiking neural networks (SNNs) have gained attention in recent years due to their ability to handle sparse and event-based data better than regular artificial neural networks (ANNs). Since the structure of SNNs is less suited for typically used accelerators such as GPUs than conventional ANNs, there is a demand for custom hardware accelerators for processing SNNs. In the past, the main focus was on platforms that resemble the structure of multiprocessor systems. In this work, we propose a lightweight neuron layer architecture that allows network structures to be directly mapped onto digital hardware. Our approach is based on differential time coding of spike sequences and the decoupling of processing time and spike timing that allows the SNN to be processed on different hardware platforms. We present synthesis and performance results showing that this architecture can be implemented for networks of more than 1000 neurons with high clock speeds on a State-of-the-Art FPGA. We furthermore show results on the robustness of our approach to quantization. These results demonstrate that high-accuracy inference can be performed with bit widths as low as 4.
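One way to read "differential time coding" is that spike trains are stored as inter-spike intervals rather than absolute timestamps, which is what decouples processing time from spike timing. The sketch below encodes and decodes under that interpretation; it illustrates the coding idea only, not the paper's hardware datapath.

```python
import numpy as np

def differential_time_encode(spike_times):
    # Store the gaps between consecutive spikes instead of absolute times.
    t = np.sort(np.asarray(spike_times, dtype=float))
    return np.diff(t, prepend=0.0)

def differential_time_decode(intervals):
    # Recover absolute spike times from the interval representation.
    return np.cumsum(intervals)
```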
Exploiting Active RIS in NOMA Networks with Hardware Impairments
results: The findings show that: (i) the outage behavior and ergodic data rates of ARIS-NOMA-HIS networks over cascaded Nakagami-m fading channels exceed those of ARIS-aided OMA and PRIS-aided OMA, with the diversity orders and multiplexing gains of the non-orthogonal users derived in detail; (ii) as the reflection coefficient of the ARIS increases, ARIS-NOMA-HIS networks provide strengthened outage performance; and (iii) ARIS-NOMA-HIS networks are more energy efficient than ARIS/PRIS-OMA networks and conventional cooperative schemes.
Abstract
Active reconfigurable intelligent surface (ARIS) is a promising way to compensate for multiplicative fading attenuation by amplifying and reflecting incident signals to selected users. This paper investigates the performance of ARIS assisted non-orthogonal multiple access (NOMA) networks over cascaded Nakagami-m fading channels. The effects of hardware impairments (HIS) and reflection coefficients on ARIS-NOMA networks with imperfect successive interference cancellation (ipSIC) and perfect successive interference cancellation (pSIC) are considered. More specifically, we develop new precise and asymptotic expressions of outage probability and ergodic data rate with ipSIC/pSIC for ARIS-NOMA-HIS networks. According to the approximated analyses, the diversity orders and multiplexing gains for a couple of non-orthogonal users are attained in detail. Additionally, the energy efficiency of ARIS-NOMA-HIS networks is surveyed in delay-limited and delay-tolerant transmission schemes. The simulation findings are presented to demonstrate that: i) The outage behaviors and ergodic data rates of ARIS-NOMA-HIS networks outperform those of ARIS aided orthogonal multiple access (OMA) and passive reconfigurable intelligent surface (PRIS) aided OMA; ii) As the reflection coefficient of ARIS increases, ARIS-NOMA-HIS networks have the ability to provide the strengthened outage performance; and iii) ARIS-NOMA-HIS networks are more energy efficient than ARIS/PRIS-OMA networks and conventional cooperative schemes.
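The cascaded Nakagami-m channel in the analysis lends itself to a quick Monte Carlo check of outage probability. The sketch below models a bare single-user link whose end-to-end power gain is the product of two unit-mean Nakagami-m gains (Gamma-distributed with shape m), leaving out the HIS, SIC, and NOMA power allocation that the paper's closed-form expressions capture.

```python
import numpy as np

def outage_probability(snr_db, target_rate, m=2.0, n_trials=200_000):
    snr = 10 ** (snr_db / 10)
    # |h|^2 of a unit-mean Nakagami-m fade is Gamma(shape=m, scale=1/m).
    g1 = np.random.gamma(m, 1.0 / m, n_trials)
    g2 = np.random.gamma(m, 1.0 / m, n_trials)
    rate = np.log2(1.0 + snr * g1 * g2)   # cascaded two-hop gain
    return float(np.mean(rate < target_rate))
```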
An ADMM-Based Geometric Configuration Optimization in RSSD-Based Source Localization By UAVs with Spread Angle Constraint
methods: The paper casts the problem in an ADMM optimization framework and proposes two globally optimal solutions for its subproblems, based on the Von Neumann matrix trace inequality theorem and the majorize-minimize (MM) algorithm, respectively.
results: Extensive simulations demonstrate the effectiveness and practicality of the proposed ADMM-based optimization algorithm.
Abstract
Deploying multiple unmanned aerial vehicles (UAVs) to locate a signal-emitting source covers a wide range of military and civilian applications like rescue and target tracking. It is well known that the UAVs-source (sensors-target) geometry, namely the geometric configuration, significantly affects the final localization accuracy. This paper focuses on the geometric configuration optimization for received signal strength difference (RSSD)-based passive source localization by a drone swarm. Different from prior works, this paper considers a general measuring condition where the spread angle of the drone swarm centered on the source is constrained. Subject to this constraint, a geometric configuration optimization problem with the aim of maximizing the determinant of the Fisher information matrix (FIM) is formulated. After transforming this problem using matrix theory, an alternating direction method of multipliers (ADMM)-based optimization framework is proposed. To solve the subproblems in this framework, two globally optimal solutions based on the Von Neumann matrix trace inequality theorem and the majorize-minimize (MM) algorithm are proposed, respectively. Finally, the effectiveness as well as the practicality of the proposed ADMM-based optimization algorithm are demonstrated by extensive simulations.
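The objective being maximized is the determinant of the FIM, whose structure for range-type measurements depends on the unit vectors from the source to each UAV. The sketch below evaluates a simplified D-optimality criterion in which the range-dependent scaling factors specific to RSSD measurements are dropped; it conveys the shape of the objective rather than the paper's exact FIM.

```python
import numpy as np

def fim_determinant(uav_positions, source):
    # uav_positions: (n_uavs, dim); source: (dim,)
    diffs = uav_positions - source
    units = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    fim = units.T @ units          # simplified FIM for range-type measurements
    return np.linalg.det(fim)      # D-optimality: larger means better geometry
```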
results: 研究结果表明,使用PNP损失函数可以加速听控Synthesizer,并保持听控效果的高度相同性。此外,研究还评估了不同的设计选择的影响,包括参数缩放、预训练、听控表示和梯度截断,并发现PNP加速JTFS听控效果更大于任何其他设计选择。Abstract
Perceptual sound matching (PSM) aims to find the input parameters to a synthesizer so as to best imitate an audio target. Deep learning for PSM optimizes a neural network to analyze and reconstruct prerecorded samples. In this context, our article addresses the problem of designing a suitable loss function when the training set is generated by a differentiable synthesizer. Our main contribution is perceptual-neural-physical loss (PNP), which aims at addressing a tradeoff between perceptual relevance and computational efficiency. The key idea behind PNP is to linearize the effect of synthesis parameters upon auditory features in the vicinity of each training sample. The linearization procedure is massively parallelizable, can be precomputed, and offers a 100-fold speedup during gradient descent compared to differentiable digital signal processing (DDSP). We demonstrate PNP on two datasets of nonstationary sounds: an AM/FM arpeggiator and a physical model of rectangular membranes. We show that PNP is able to accelerate DDSP with joint time-frequency scattering transform (JTFS) as auditory feature, while preserving its perceptual fidelity. Additionally, we evaluate the impact of other design choices in PSM: parameter rescaling, pretraining, auditory representation, and gradient clipping. We report state-of-the-art results on both datasets and find that PNP-accelerated JTFS has greater influence on PSM performance than any other design choice.
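The linearization behind PNP turns the perceptual distance into a per-sample quadratic form: if the auditory feature map Phi is linearized around a training sample, then Phi(theta) - Phi(theta_hat) is approximately J (theta - theta_hat), and J^T J can be precomputed once per sample. A minimal sketch of that evaluation, with the Jacobian assumed to be supplied:

```python
import numpy as np

def pnp_loss(theta_pred, theta_true, jacobian):
    # jacobian: d(auditory features)/d(synthesis params) at theta_true.
    delta = theta_pred - theta_true
    metric = jacobian.T @ jacobian   # precomputable per training sample
    return float(delta @ metric @ delta)
```

Because `metric` is fixed during training, the loss and its gradient avoid differentiating through the synthesizer and feature extractor at every step, which is where the reported speedup over DDSP comes from.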
results: Experimental results show that using the scene-based mask estimator achieves comparable or even better performance than the manually designed mask and significantly improves the performance of SED models. The proposed approach achieved the top ranking in the DCASE 2023 Task4B Challenge.
Abstract
The emergence of soft-labeled data for sound event detection (SED) effectively overcomes the lack of traditional strong-labeled data. However, the performance of present SED systems based on such soft labels is still unsatisfactory. In this work, we introduce a dual-branch SED model designed to leverage the information within soft labels. Four variations of the interacted convolutional module are presented to investigate the effective mechanism for information interaction. Furthermore, we incorporate the scene-based mask generated by an estimator to directly apply to the prediction of SED models. Experimental results show that the mask estimator can achieve comparable or even better performance than the manually-designed mask and significantly improve the performance of SED. The proposed approach achieved the top ranking in the DCASE 2023 Task4B Challenge.
paper_authors: Karina Yang, Alexis Bennett, Dominique Duncan
for: This study aims to improve the speed and accuracy of artificial intelligence (AI) algorithms for diagnosing COVID-19 from chest X-ray (CXR) images.
methods: The study trains and evaluates 21 convolutional neural network (CNN) models on a diverse set of 33,000+ CXR images to classify healthy, COVID-19, and non-COVID-19 pneumonia CXRs.
results: Adversarial training improves model robustness and explainability, with the models reaching 3-way classification accuracy, recall, and precision of up to 97.03%, 97.97%, and 99.95%, respectively. Compared with expert radiologist findings, the saliency maps of adversarially trained models better localize clinically relevant features and are more robust against extraneous artifacts.
Abstract
The novel 2019 Coronavirus disease (COVID-19) global pandemic is a defining health crisis. Recent efforts have been increasingly directed towards achieving quick and accurate detection of COVID-19 across symptomatic patients to mitigate the intensity and spread of the disease. Artificial intelligence (AI) algorithms applied to chest X-ray (CXR) images have emerged as promising diagnostic tools, and previous work has demonstrated impressive classification performances. However, such methods have faced criticisms from physicians due to their black-box reasoning process and unpredictable nature. In contrast to professional radiologist diagnosis, AI systems often lack generalizability, explainability, and robustness in the clinical decision making process. In our work, we address these issues by first proposing an extensive baseline study, training and evaluating 21 convolutional neural network (CNN) models on a diverse set of 33,000+ CXR images to classify between healthy, COVID-19, and non-COVID-19 pneumonia CXRs. Our resulting models achieved a 3-way classification accuracy, recall, and precision of up to 97.03\%, 97.97\%, and 99.95\%, respectively. Next, we investigate the effectiveness of adversarial training on model robustness and explainability via Gradient-weighted Class Activation Mapping (Grad-CAM) heatmaps. We find that adversarially trained models not only significantly outperform their standard counterparts on classifying perturbed images, but also yield saliency maps that 1) better specify clinically relevant features, 2) are robust against extraneous artifacts, and 3) agree considerably more with expert radiologist findings.
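Adversarial training in this setting typically augments each batch with perturbed images crafted from the model's own gradients. The sketch below uses the FGSM attack, one of the simplest such constructions, as an assumed stand-in; the paper does not necessarily use this exact attack or epsilon.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.01):
    # Perturb the input along the sign of the input gradient of the loss,
    # producing an adversarial example to train on alongside the clean batch.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + epsilon * grad.sign()).clamp(0, 1).detach()
```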
A New Benchmark and Model for Challenging Image Manipulation Detection
results: Compared with existing image manipulation detection methods, the proposed model shows significant improvements under challenging conditions and achieves the highest detection accuracy on the new CIMD dataset.
Abstract
The ability to detect manipulation in multimedia data is vital in digital forensics. Existing Image Manipulation Detection (IMD) methods are mainly based on detecting anomalous features arisen from image editing or double compression artifacts. All existing IMD techniques encounter challenges when it comes to detecting small tampered regions from a large image. Moreover, compression-based IMD approaches face difficulties in cases of double compression of identical quality factors. To investigate the State-of-The-Art (SoTA) IMD methods in those challenging conditions, we introduce a new Challenging Image Manipulation Detection (CIMD) benchmark dataset, which consists of two subsets, for evaluating editing-based and compression-based IMD methods, respectively. The dataset images were manually taken and tampered with high-quality annotations. In addition, we propose a new two-branch network model based on HRNet that can better detect both the image-editing and compression artifacts in those challenging conditions. Extensive experiments on the CIMD benchmark show that our model significantly outperforms SoTA IMD methods on CIMD.
ECRF: Entropy-Constrained Neural Radiance Fields Compression with Frequency Domain Optimization
results: The model achieves superior compression performance across multiple datasets. The source code will be made publicly available.
Abstract
Explicit feature-grid based NeRF models have shown promising results in terms of rendering quality and significant speed-up in training. However, these methods often require a significant amount of data to represent a single scene or object. In this work, we present a compression model that aims to minimize the entropy in the frequency domain in order to effectively reduce the data size. First, we propose using the discrete cosine transform (DCT) on the tensorial radiance fields to compress the feature-grid. This feature-grid is transformed into coefficients, which are then quantized and entropy encoded, following a similar approach to the traditional video coding pipeline. Furthermore, to achieve a higher level of sparsity, we propose using an entropy parameterization technique for the frequency domain, specifically for DCT coefficients of the feature-grid. Since the transformed coefficients are optimized during the training phase, the proposed model does not require any fine-tuning or additional information. Our model only requires a lightweight compression pipeline for encoding and decoding, making it easier to apply volumetric radiance field methods for real-world applications. Experimental results demonstrate that our proposed frequency domain entropy model can achieve superior compression performance across various datasets. The source code will be made publicly available.
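The pipeline described above — DCT of the feature grid, quantization of the coefficients, then entropy coding — parallels a classical transform-coding loop. A minimal sketch of the transform and quantization stages follows (entropy coding of the integer symbols is left out, and the quantization step size is an arbitrary placeholder):

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_feature_grid(grid, step=0.05):
    # Move the grid into the frequency domain and quantize; the resulting
    # integer symbols are what an entropy coder would compress.
    coeffs = dctn(grid, norm="ortho")
    return np.round(coeffs / step).astype(np.int32)

def decompress_feature_grid(symbols, step=0.05):
    # Dequantize and invert the transform to recover an approximate grid.
    return idctn(symbols.astype(np.float64) * step, norm="ortho")
```

The paper's contribution goes further by optimizing the coefficients under an entropy parameterization during training, so no post-hoc fine-tuning is needed.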
Enhancing mTBI Diagnosis with Residual Triplet Convolutional Neural Network Using 3D CT
paper_authors: Hanem Ellethy, Shekhar S. Chandra, Viktor Vegh
for: The paper aims to improve the accuracy and efficiency of mild traumatic brain injury (mTBI) diagnosis using 3D computed tomography (CT) images and a residual triplet convolutional neural network (RTCNN) model.
methods: The RTCNN model uses a triplet loss function to optimize feature representations and distinguish between mTBI cases and healthy ones. The model shows promising performance in mTBI diagnosis, with an average accuracy of 94.3%, sensitivity of 94.1%, and specificity of 95.2%.
results: Compared to the conventional residual convolutional neural network (RCNN) model, the RTCNN exhibits significant improvements of 22.5% in specificity, 16.2% in accuracy, and 11.3% in sensitivity. Additionally, the RTCNN requires lower memory resources, making it both highly effective and resource-efficient.
Abstract
Mild Traumatic Brain Injury (mTBI) is a common and challenging condition to diagnose accurately. Timely and precise diagnosis is essential for effective treatment and improved patient outcomes. Traditional diagnostic methods for mTBI often have limitations in terms of accuracy and sensitivity. In this study, we introduce an innovative approach to enhance mTBI diagnosis using 3D Computed Tomography (CT) images and a metric learning technique trained with triplet loss. To address these challenges, we propose a Residual Triplet Convolutional Neural Network (RTCNN) model to distinguish between mTBI cases and healthy ones by embedding 3D CT scans into a feature space. The triplet loss function maximizes the margin between similar and dissimilar image pairs, optimizing feature representations. This facilitates better context placement of individual cases, aids informed decision-making, and has the potential to improve patient outcomes. Our RTCNN model shows promising performance in mTBI diagnosis, achieving an average accuracy of 94.3%, a sensitivity of 94.1%, and a specificity of 95.2%, as confirmed through a five-fold cross-validation. Importantly, when compared to the conventional Residual Convolutional Neural Network (RCNN) model, the RTCNN exhibits a significant improvement, showcasing a remarkable 22.5% increase in specificity, a notable 16.2% boost in accuracy, and an 11.3% enhancement in sensitivity. Moreover, RTCNN requires lower memory resources, making it not only highly effective but also resource-efficient in minimizing false positives while maximizing its diagnostic accuracy in distinguishing normal CT scans from mTBI cases. The quantitative performance metrics provided and utilization of occlusion sensitivity maps to visually explain the model's decision-making process further enhance the interpretability and transparency of our approach.
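The triplet loss used for the metric learning has a standard form: pull an anchor scan's embedding toward a same-class (positive) embedding and push it at least a margin away from a different-class (negative) one. The margin value below is a common default, not the paper's setting.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=1.0):
    # anchor/positive share a class (e.g., both mTBI); negative differs.
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()   # hinge on the margin
```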
HACD: Hand-Aware Conditional Diffusion for Monocular Hand-Held Object Reconstruction
results: Experiments on the synthetic ObMan dataset and two real-world datasets, HO3D and MOW, show that the method surpasses all existing methods by a large margin.
Abstract
Reconstructing hand-held objects from a single RGB image without known 3D object templates, category prior, or depth information is a vital yet challenging problem in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, which make it hard to account for the uncertainties introduced by hand- and self-occlusion, we employ a probabilistic point cloud denoising diffusion model to tackle the above challenge. In this work, we present Hand-Aware Conditional Diffusion for monocular hand-held object reconstruction (HACD), modeling the hand-object interaction in two aspects. First, we introduce hand-aware conditioning to model hand-object interaction from both semantic and geometric perspectives. Specifically, a unified hand-object semantic embedding compensates for the 2D local feature deficiency induced by hand occlusion, and a hand articulation embedding further encodes the relationship between object vertices and hand joints. Second, we propose a hand-constrained centroid fixing scheme, which utilizes hand vertices priors to restrict the centroid deviation of partially denoised point cloud during diffusion and reverse process. Removing the centroid bias interference allows the diffusion models to focus on the reconstruction of shape, thus enhancing the stability and precision of local feature projection. Experiments on the synthetic ObMan dataset and two real-world datasets, HO3D and MOW, demonstrate our approach surpasses all existing methods by a large margin.
TCuPGAN: A novel framework developed for optimizing human-machine interactions in citizen science
paper_authors: Ramanakumar Sankar, Kameswara Mantha, Lucy Fortson, Helen Spiers, Thomas Pengo, Douglas Mashek, Myat Mo, Mark Sanders, Trace Christensen, Jeffrey Salisbury, Laura Trouille
results: Through iterative human-machine optimization with citizen science projects hosted on the Zooniverse platform, only a fraction of the 2D slices from the image cubes are seen by volunteers; the patch-wise discriminator estimates which machine proposals are poor so that only those are presented for correction, substantially reducing volunteer effort.
Abstract
In the era of big data in scientific research, there is a necessity to leverage techniques which reduce human effort in labeling and categorizing large datasets by involving sophisticated machine tools. To combat this problem, we present a novel, general purpose model for 3D segmentation that leverages patch-wise adversariality and Long Short-Term Memory to encode sequential information. Using this model alongside citizen science projects which use 3D datasets (image cubes) on the Zooniverse platforms, we propose an iterative human-machine optimization framework where only a fraction of the 2D slices from these cubes are seen by the volunteers. We leverage the patch-wise discriminator in our model to provide an estimate of which slices within these image cubes have poorly generalized feature representations, and correspondingly poor machine performance. These images with corresponding machine proposals would be presented to volunteers on Zooniverse for correction, leading to a drastic reduction in the volunteer effort on citizen science projects. We trained our model on ~2300 liver tissue 3D electron micrographs. Lipid droplets were segmented within these images through human annotation via the `Etch A Cell - Fat Checker' citizen science project, hosted on the Zooniverse platform. In this work, we demonstrate this framework and the selection methodology which resulted in a measured reduction in volunteer effort by more than 60%. We envision this type of joint human-machine partnership will be of great use on future Zooniverse projects.
GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence
results: The method estimates object pose from RGB images quickly and accurately, achieving a 38x speedup over the state of the art while also being significantly more robust to segmentation errors.
Abstract
We present GigaPose, a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images. GigaPose first leverages discriminative templates, rendered images of the CAD models, to recover the out-of-plane rotation and then uses patch correspondences to estimate the four remaining parameters. Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three and matches the input image to the templates using fast nearest neighbor search in feature space, resulting in a speedup factor of 38x compared to the state of the art. Moreover, GigaPose is significantly more robust to segmentation errors. Our extensive evaluation on the seven core datasets of the BOP challenge demonstrates that it achieves state-of-the-art accuracy and can be seamlessly integrated with a refinement method. Additionally, we show the potential of GigaPose with 3D models predicted by recent work on 3D reconstruction from a single image, relaxing the need for CAD models and making 6D pose object estimation much more convenient. Our source code and trained models are publicly available at https://github.com/nv-nguyen/gigaPose
Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN)
results: The researchers evaluate TCuP-GAN on data from four challenges of the 2023 Brain Tumor Segmentation (BraTS) Challenge (adult glioma, meningioma, pediatric tumors, and the Sub-Saharan Africa subset), quantifying performance with the LesionWise Dice similarity and 95% Hausdorff distance metrics, and demonstrate that the framework learns to predict robust multi-class segmentation masks across all challenges.
Abstract
Development of robust general purpose 3D segmentation frameworks using the latest deep learning techniques is one of the active topics in various bio-medical domains. In this work, we introduce Temporal Cubic PatchGAN (TCuP-GAN), a volume-to-volume translational model that marries the concepts of a generative feature learning framework with Convolutional Long Short-Term Memory Networks (LSTMs), for the task of 3D segmentation. We demonstrate the capabilities of our TCuP-GAN on the data from four segmentation challenges (Adult Glioma, Meningioma, Pediatric Tumors, and Sub-Saharan Africa subset) featured within the 2023 Brain Tumor Segmentation (BraTS) Challenge and quantify its performance using LesionWise Dice similarity and $95\%$ Hausdorff Distance metrics. We demonstrate the successful learning of our framework to predict robust multi-class segmentation masks across all the challenges. This benchmarking work serves as a stepping stone for future efforts towards applying TCuP-GAN on other multi-class tasks such as multi-organelle segmentation in electron microscopy imaging.
Class Balanced Dynamic Acquisition for Domain Adaptive Semantic Segmentation using Active Learning
results: The method improves model performance for larger active learning budgets, outperforming the previous baseline by 0.6, 1.7, and 2.4 mIoU for budgets of 5%, 10%, and 20%, respectively, and improving minimum class performance by 0.5, 2.9, and 4.6 IoU.
Abstract
Domain adaptive active learning is leading the charge in label-efficient training of neural networks. For semantic segmentation, state-of-the-art models jointly use two criteria of uncertainty and diversity to select training labels, combined with a pixel-wise acquisition strategy. However, we show that such methods currently suffer from a class imbalance issue which degrades their performance for larger active learning budgets. We then introduce Class Balanced Dynamic Acquisition (CBDA), a novel active learning method that mitigates this issue, especially in high-budget regimes. The more balanced labels increase minority class performance, which in turn allows the model to outperform the previous baseline by 0.6, 1.7, and 2.4 mIoU for budgets of 5%, 10%, and 20%, respectively. Additionally, the focus on minority classes leads to improvements of the minimum class performance of 0.5, 2.9, and 4.6 IoU respectively. The top-performing model even exceeds the fully supervised baseline, showing that a more balanced label than the entire ground truth can be beneficial.
for: The paper aims to improve the speed and quality of diffusion-based image generation models, specifically addressing the slow generation speeds of existing methods.
methods: The paper proposes Adversarial Consistency Training (ACT), which incorporates a discriminator into the consistency training framework to directly minimize the Jensen-Shannon divergence between the target and generated distributions at each timestep.
results: The method achieves improved FID scores on CIFAR10 and ImageNet 64x64, retains zero-shot image inpainting capabilities, and uses less than 1/6 of the original batch size and fewer than 1/2 of the model parameters and training steps compared to the baseline method, resulting in a substantial reduction in resource consumption.
Abstract
Though diffusion models excel in image generation, their step-by-step denoising leads to slow generation speeds. Consistency training addresses this issue with single-step sampling but often produces lower-quality generations and requires high training costs. In this paper, we show that optimizing consistency training loss minimizes the Wasserstein distance between target and generated distributions. As timestep increases, the upper bound accumulates previous consistency training losses. Therefore, larger batch sizes are needed to reduce both current and accumulated losses. We propose Adversarial Consistency Training (ACT), which directly minimizes the Jensen-Shannon (JS) divergence between distributions at each timestep using a discriminator. Theoretically, ACT enhances generation quality, and convergence. By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on CIFAR10 and ImageNet 64$\times$64, retains zero-shot image inpainting capabilities, and uses less than $1/6$ of the original batch size and fewer than $1/2$ of the model parameters and training steps compared to the baseline method, this leads to a substantial reduction in resource consumption.
paper_authors: Anikeit Sethi, Krishanu Saini, Sai Mounika Mididoddi
for: This study aims to provide automatic detection and recognition of abnormal events in surveillance footage, improving assessments of public safety.
methods: The study uses machine learning techniques, specifically a generative adversarial network (GAN)-based model, to identify abnormal events automatically.
results: Compared against the state of the art on four benchmark datasets, the proposed network performs favourably on all datasets.
Abstract
Accounting for the increased concern for public safety, automatic abnormal event detection and recognition in a surveillance scene is crucial. It is a current open study subject because of its intricacy and utility. Identifying aberrant events automatically is a difficult undertaking because everyone's idea of abnormality is different. A typical occurrence in one circumstance could be seen as aberrant in another. Automatic anomaly identification becomes particularly challenging in surveillance footage with a large crowd due to congestion and high occlusion. With the use of machine learning techniques, this thesis study aims to offer a solution for this use case so that human resources won't be required to keep an eye out for any unusual activity in the surveillance system records. We have developed a novel generative adversarial network (GAN) based anomaly detection model. This model is trained such that it learns together about constructing a high dimensional picture space and determining the latent space from the video's context. The generator uses a residual Autoencoder architecture made up of a multi-stage channel attention-based decoder and a two-stream, deep convolutional encoder that can realise both spatial and temporal data. We have also offered a technique for refining the GAN model that reduces training time while also generalising the model by utilising transfer learning between datasets. Using a variety of assessment measures, we compare our model to the current state-of-the-art techniques on four benchmark datasets. The empirical findings indicate that, in comparison to existing techniques, our network performs favourably on all datasets.
Class Uncertainty: A Measure to Mitigate Class Imbalance
results: The study finds that the proposed class uncertainty measure captures the differences across classes better than cardinality alone, including on long-tailed training distributions.
Abstract
Class-wise characteristics of training examples affect the performance of deep classifiers. A well-studied example is when the number of training examples of classes follows a long-tailed distribution, a situation that is likely to yield sub-optimal performance for under-represented classes. This class imbalance problem is conventionally addressed by approaches relying on the class-wise cardinality of training examples, such as data resampling. In this paper, we demonstrate that considering solely the cardinality of classes does not cover all issues causing class imbalance. To measure class imbalance, we propose "Class Uncertainty" as the average predictive uncertainty of the training examples, and we show that this novel measure captures the differences across classes better than cardinality. We also curate SVCI-20 as a novel dataset in which the classes have equal number of training examples but they differ in terms of their hardness; thereby causing a type of class imbalance which cannot be addressed by the approaches relying on cardinality. We incorporate our "Class Uncertainty" measure into a diverse set of ten class imbalance mitigation methods to demonstrate its effectiveness on long-tailed datasets as well as on our SVCI-20. Code and datasets will be made available.
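The measure itself is simple to compute once a notion of predictive uncertainty is fixed. The sketch below uses the entropy of the softmax output as that notion, which is an assumption on our part (the paper defines class uncertainty as the average predictive uncertainty of a class's training examples without binding it to a single estimator here).

```python
import numpy as np

def class_uncertainty(probs, labels, num_classes):
    # probs: (n_samples, num_classes) predicted distributions
    # labels: (n_samples,) ground-truth class indices
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.array([entropy[labels == c].mean() for c in range(num_classes)])
```

Classes whose examples the model is consistently unsure about receive a high score even when their cardinality is large, which is exactly the case (hard but balanced classes, as in SVCI-20) that cardinality-based methods miss.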
paper_authors: Roman Stoklasa, Ioannis Stathopoulos, Efstratios Karavasilis, Efstathios Efstathopoulos, Marek Dostál, Miloš Keřkovský, Michal Kozubek, Luigi Serio
results: The paper demonstrates the reliability and accuracy of the automatic brain MRI screening tool, which can help radiologists diagnose tumor-like pathologies faster.
Abstract
In clinical practice, we often see significant delays between MRI scans and the diagnosis made by radiologists, even for severe cases. In some cases, this may be caused by the lack of additional information and clues, so even the severe cases need to wait in the queue for diagnosis. This can be avoided if there is an automatic software tool, which would supplement additional information, alerting radiologists that the particular patient may be a severe case. We are presenting an automatic brain MRI Screening Tool and we are demonstrating its capabilities for detecting tumor-like pathologies. It is the first version on the path toward a robust multi-pathology screening solution. The tool supports Federated Learning, so multiple institutions may contribute to the model without disclosing their private data.
SinSR: Diffusion-Based Image Super-Resolution in a Single Step
results: Extensive experiments on synthetic and real-world datasets show performance comparable to or better than both previous state-of-the-art methods and the teacher model, with a remarkable up-to-x10 inference speedup.
Abstract
While super-resolution (SR) methods based on diffusion models exhibit promising results, their practical application is hindered by the substantial number of required inference steps. Recent methods utilize degraded images in the initial state, thereby shortening the Markov chain. Nevertheless, these solutions either rely on a precise formulation of the degradation process or still necessitate a relatively lengthy generation path (e.g., 15 iterations). To enhance inference speed, we propose a simple yet effective method for achieving single-step SR generation, named SinSR. Specifically, we first derive a deterministic sampling process from the most recent state-of-the-art (SOTA) method for accelerating diffusion-based SR. This allows the mapping between the input random noise and the generated high-resolution image to be obtained in a reduced and acceptable number of inference steps during training. We show that this deterministic mapping can be distilled into a student model that performs SR within only one inference step. Additionally, we propose a novel consistency-preserving loss to simultaneously leverage the ground-truth image during the distillation process, ensuring that the performance of the student model is not solely bound by the feature manifold of the teacher model, resulting in further performance improvement. Extensive experiments conducted on synthetic and real-world datasets demonstrate that the proposed method can achieve comparable or even superior performance compared to both previous SOTA methods and the teacher model, in just one sampling step, resulting in a remarkable up to x10 speedup for inference. Our code will be released at https://github.com/wyf0912/SinSR
results: Experimental results show that YO-ReX can explain the outputs of YOLO, SSD, and Faster R-CNN accurately and with negligible overhead.
Abstract
In this paper, we propose a new black-box explainability algorithm and tool, YO-ReX, for efficient explanation of the outputs of object detectors. The new algorithm computes explanations for all objects detected in the image simultaneously. Hence, compared to the baseline, the new algorithm reduces the number of queries by a factor of 10X for the case of ten detected objects. The speedup increases further with with the number of objects. Our experimental results demonstrate that YO-ReX can explain the outputs of YOLO with a negligible overhead over the running time of YOLO. We also demonstrate similar results for explaining SSD and Faster R-CNN. The speedup is achieved by avoiding backtracking by combining aggressive pruning with a causal analysis.
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
paper_authors: Peng Xia, Xingtong Yu, Ming Hu, Lie Ju, Zhiyong Wang, Peibo Duan, Zongyuan Ge
for: The paper proposes a new framework (HGCLIP) for classification across hierarchy levels that combines CLIP with graph representation learning to better exploit the class hierarchy and improve image recognition performance.
methods: The class hierarchy is constructed as a graph whose nodes represent the textual or image features of each category and is passed through a graph encoder; the textual features then incorporate hierarchical structure information, while the image features emphasize class-aware features derived from prototypes through an attention mechanism.
results: The approach achieves significant improvements on both generic and fine-grained visual recognition benchmarks. The code is available on GitHub.
Object categories are typically organized into a multi-granularity taxonomic hierarchy. When classifying categories at different hierarchy levels, traditional uni-modal approaches focus primarily on image features, revealing limitations in complex scenarios. Recent studies integrating Vision-Language Models (VLMs) with class hierarchies have shown promise, yet they fall short of fully exploiting the hierarchical relationships. These efforts are constrained by their inability to perform effectively across varied granularity of categories. To tackle this issue, we propose a novel framework (HGCLIP) that effectively combines CLIP with a deeper exploitation of the Hierarchical class structure via Graph representation learning. We explore constructing the class hierarchy into a graph, with its nodes representing the textual or image features of each category. After passing through a graph encoder, the textual features incorporate hierarchical structure information, while the image features emphasize class-aware features derived from prototypes through the attention mechanism. Our approach demonstrates significant improvements on both generic and fine-grained visual recognition benchmarks. Our codes are fully available at https://github.com/richard-peng-xia/HGCLIP.
摘要
对象类别通常被组织为多粒度的分类层次体系。在对不同层级的类别进行分类时,传统的单模态方法主要依赖图像特征,在复杂场景下表现有限。近期将视觉-语言模型(VLM)与类别层次结构相结合的研究已显示出潜力,但它们未能充分利用层次关系,在不同粒度的类别上难以保持有效。为解决这一问题,我们提出了一种新的框架(HGCLIP),通过图表示学习更深入地利用类别层次结构,并与 CLIP 有效结合。我们将类别层次构建为图,其节点表示各类别的文本或图像特征。经过图编码器后,文本特征融入了层次结构信息,而图像特征则通过注意力机制强调源自原型的类别感知特征。我们的方法在通用与细粒度视觉识别基准上均取得显著提升。代码见 https://github.com/richard-peng-xia/HGCLIP。
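As a rough illustration of the graph step, the sketch below propagates CLIP text features over a class-hierarchy graph by simple normalized-adjacency averaging. The paper's actual graph encoder and attention-based image branch are more elaborate; every name and the two-step propagation here are assumptions.

```python
import torch
import torch.nn.functional as F

def propagate_hierarchy(text_feats, edges, steps=2):
    """Hedged sketch: inject hierarchy information into per-class CLIP text
    features by averaging over neighbors in the class-hierarchy graph.

    text_feats: (C, d) per-class text embeddings.
    edges:      list of (parent_idx, child_idx) pairs from the taxonomy.
    """
    C = text_feats.size(0)
    adj = torch.eye(C)                       # self-loops keep own identity
    for p, c in edges:
        adj[p, c] = adj[c, p] = 1.0
    adj = adj / adj.sum(dim=1, keepdim=True) # row-normalise

    h = text_feats
    for _ in range(steps):                   # stand-in for a graph encoder
        h = adj @ h
    return F.normalize(h, dim=-1)            # hierarchy-aware class features
```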
Hardware Resilience Properties of Text-Guided Image Classifiers
paper_authors: Syed Talal Wasim, Kabila Haile Saboka, Abdulrahman Mahmoud, Salman Khan, David Brooks, Gu-Yeon Wei
for: 这篇论文的目的是提高图像分类模型在部署阶段面对瞬态硬件错误时的可靠性。
methods: 论文使用由 GPT-3 按类别问题提示生成的丰富文本,经 CLIP 预训练文本编码器得到嵌入,作为分类层的初始化。
results: 该方法可以在不同的架构上使最关键层的硬件可靠性平均提高 $5.5\times$(最高达 $14\times$),同时参数与 FLOPs 开销很小,且可以轻松地与任何图像分类骨干网络集成。
Abstract
This paper presents a novel method to enhance the reliability of image classification models during deployment in the face of transient hardware errors. By utilizing enriched text embeddings derived from GPT-3 with question prompts per class and CLIP pretrained text encoder, we investigate their impact as an initialization for the classification layer. Our approach achieves a remarkable $5.5\times$ average increase in hardware reliability (and up to 14x) across various architectures in the most critical layer, with minimal accuracy drop (0.3% on average) compared to baseline PyTorch models. Furthermore, our method seamlessly integrates with any image classification backbone, showcases results across various network architectures, decreases parameter and FLOPs overhead, and follows a consistent training recipe. This research offers a practical and efficient solution to bolster the robustness of image classification models against hardware failures, with potential implications for future studies in this domain. Our code and models are released at https://github.com/TalalWasim/TextGuidedResilience.
摘要
这篇论文提出了一种利用 GPT-3 与 CLIP 预训练文本编码器得到的丰富文本嵌入来增强图像分类模型硬件可靠性的方法。该方法可以在不同的网络架构上显著提升最关键层的硬件可靠性(平均 5.5 倍,最高达 14 倍),而与基线 PyTorch 模型相比准确率仅平均下降 0.3%。该方法可以轻松地与任何图像分类骨干网络集成,在多种网络架构上均给出了结果,参数与 FLOPs 开销很低,并遵循一致的训练流程。这项研究可能对该领域的后续工作产生影响。代码和模型可在 https://github.com/TalalWasim/TextGuidedResilience 下载。
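The core idea, initializing the classification layer from per-class text embeddings, can be sketched in a few lines. This is a hedged approximation: the GPT-3 prompt enrichment and CLIP encoding are assumed to have already produced `text_embeddings`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def init_classifier_from_text(text_embeddings, feature_dim):
    """Hedged sketch of the paper's key idea: initialise the final linear
    classification layer from CLIP-encoded per-class text embeddings
    instead of random weights. text_embeddings: (num_classes, feature_dim).
    """
    num_classes = text_embeddings.size(0)
    fc = nn.Linear(feature_dim, num_classes, bias=False)
    with torch.no_grad():
        # Each class row of the weight matrix becomes its text embedding.
        fc.weight.copy_(F.normalize(text_embeddings, dim=-1))
    return fc
```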
Assessment of Deep Learning Segmentation for Real-Time Free-Breathing Cardiac Magnetic Resonance Imaging
paper_authors: Martin Schilling, Christina Unterberg-Buchwald, Joachim Lotz, Martin Uecker
for: This paper aims to assess the accuracy of deep learning methods for volumetric analysis of the left ventricle in real-time free-breathing CMR at rest and under exercise stress.
methods: The paper uses a freely available neural network (nnU-Net) and compares it to a commercial software (comDL) for segmentation of the left ventricle, myocardium, and right ventricle in both cine and real-time free-breathing CMR.
results: The paper finds that nnU-Net achieves higher accuracy than comDL overall for real-time CMR, and that the performance of deep learning methods is comparable to inter-observer variability in cine CMR, making them a viable option for automatic segmentation. Additionally, the paper shows that nnU-Net can accurately segment the left ventricle, myocardium, and right ventricle in both rest and exercise conditions.
Abstract
In recent years, a variety of deep learning networks for cardiac MRI (CMR) segmentation have been developed and analyzed. However, nearly all of them are focused on cine CMR under breath-hold. In this work, the accuracy of deep learning methods is assessed for volumetric analysis (via segmentation) of the left ventricle in real-time free-breathing CMR at rest and under exercise stress. Data from healthy volunteers (n=15) for cine and real-time free-breathing CMR were analyzed retrospectively. Segmentations of a commercial software (comDL) and a freely available neural network (nnU-Net) were compared to a reference created via the manual correction of comDL segmentation. Segmentation of left ventricular endocardium (LV), left ventricular myocardium (MYO), and right ventricle (RV) is evaluated for both end-systolic and end-diastolic phases and analyzed with Dice's coefficient (DC). The volumetric analysis includes LV end-diastolic volume (EDV), LV end-systolic volume (ESV), and LV ejection fraction (EF). For cine CMR, nnU-Net and comDL achieve a DC above 0.95 for LV and above 0.9 for MYO and RV. For real-time CMR, the accuracy of nnU-Net exceeds that of comDL overall. For real-time CMR at rest, nnU-Net achieves a DC of 0.94 for LV, 0.89 for MYO, and 0.90 for RV; mean absolute differences between nnU-Net and reference are 2.9mL for EDV, 3.5mL for ESV and 2.6% for EF. For real-time CMR under exercise stress, nnU-Net achieves a DC of 0.92 for LV, 0.85 for MYO, and 0.83 for RV; mean absolute differences between nnU-Net and reference are 11.4mL for EDV, 2.9mL for ESV and 3.6% for EF. Deep learning methods designed or trained for cine CMR segmentation can perform well on real-time CMR. For real-time free-breathing CMR at rest, the performance of deep learning methods is comparable to inter-observer variability in cine CMR and is usable for fully automatic segmentation.
摘要
近年来,许多用于心脏 MRI(CMR)分割的深度学习网络被开发和分析。然而,其中绝大多数都针对屏气状态下的 cine CMR。在这项工作中,我们评估了深度学习方法在静息与运动负荷下的实时自由呼吸 CMR 中对左心室进行容积分析(通过分割)的准确性。我们回顾性分析了健康志愿者(n=15)的 cine 与实时自由呼吸 CMR 数据,将商业软件(comDL)和一个公开可用的神经网络(nnU-Net)的分割结果,与通过人工修正 comDL 分割得到的参考进行比较。在收缩末期与舒张末期两个时相评估左心室内膜(LV)、左心室心肌(MYO)和右心室(RV)的分割,并用 Dice 系数(DC)进行分析。容积分析包括左心室舒张末期容积(EDV)、左心室收缩末期容积(ESV)和左心室射血分数(EF)。对于 cine CMR,nnU-Net 和 comDL 在 LV 上的 DC 超过 0.95,在 MYO 和 RV 上超过 0.9。对于实时 CMR,nnU-Net 的准确性总体上超过 comDL。在静息状态的实时 CMR 中,nnU-Net 的 DC 为 LV 0.94、MYO 0.89、RV 0.90;nnU-Net 与参考之间的平均绝对差为 EDV 2.9 mL、ESV 3.5 mL、EF 2.6%。在运动负荷下的实时 CMR 中,nnU-Net 的 DC 为 LV 0.92、MYO 0.85、RV 0.83;平均绝对差为 EDV 11.4 mL、ESV 2.9 mL、EF 3.6%。为 cine CMR 分割设计或训练的深度学习方法同样可以在实时 CMR 上表现良好。对于静息状态的实时自由呼吸 CMR,深度学习方法的性能与 cine CMR 中观察者间差异相当,可用于全自动分割。
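The two headline quantities above, the Dice coefficient and the ejection fraction, are standard and easy to reproduce; a small sketch:

```python
import numpy as np

def dice_coefficient(pred, ref):
    """Dice coefficient (DC) used above to compare segmentations.
    pred, ref: boolean masks of the same shape (e.g. LV, MYO or RV)."""
    intersection = np.logical_and(pred, ref).sum()
    total = pred.sum() + ref.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

def ejection_fraction(edv, esv):
    """LV ejection fraction (%) from end-diastolic/end-systolic volumes."""
    return 100.0 * (edv - esv) / edv
```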
Understanding the Vulnerability of CLIP to Image Compression
results: 研究发现,压缩引起的图像质量变化会导致 CLIP 模型的零样本识别精度下降;使用 Integrated Gradients 归因方法可以更好地理解压缩对模型的影响。
Abstract
CLIP is a widely used foundational vision-language model that is used for zero-shot image recognition and other image-text alignment tasks. We demonstrate that CLIP is vulnerable to change in image quality under compression. This surprising result is further analysed using an attribution method-Integrated Gradients. Using this attribution method, we are able to better understand both quantitatively and qualitatively exactly the nature in which the compression affects the zero-shot recognition accuracy of this model. We evaluate this extensively on CIFAR-10 and STL-10. Our work provides the basis to understand this vulnerability of CLIP and can help us develop more effective methods to improve the robustness of CLIP and other vision-language models.
摘要
CLIP 是一种广泛使用的基础视觉-语言模型,用于零样本图像识别和其他图像-文本对齐任务。我们发现,CLIP 对压缩导致的图像质量变化十分敏感,这是一个出人意料的结果。我们进一步使用归因方法 Integrated Gradients 进行分析,从定量和定性两方面更好地理解压缩对该模型零样本识别精度的具体影响。我们在 CIFAR-10 和 STL-10 上进行了广泛评估。这项工作为理解 CLIP 的这一脆弱性奠定了基础,有助于开发更有效的方法来提升 CLIP 及其他视觉-语言模型的鲁棒性。
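The experimental setup is straightforward to reproduce in outline. Below is a hedged sketch using the open-source OpenAI `clip` package; the prompt template, model choice, and quality sweep are assumptions, not the paper's exact protocol.

```python
import io
import clip
import torch
from PIL import Image

model, preprocess = clip.load("ViT-B/32")  # OpenAI CLIP package

def zero_shot_logits(pil_image, class_names, quality):
    """Sketch of the experiment above: re-encode an image as JPEG at a
    given quality and compute CLIP's zero-shot class logits on it."""
    buf = io.BytesIO()
    pil_image.save(buf, format="JPEG", quality=quality)  # lossy compression
    compressed = Image.open(io.BytesIO(buf.getvalue()))

    image = preprocess(compressed).unsqueeze(0)
    text = clip.tokenize([f"a photo of a {c}" for c in class_names])
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
    # Plotting accuracy against `quality` traces the vulnerability.
    return logits_per_image
```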
Creating and Benchmarking a Synthetic Dataset for Cloud Optical Thickness Estimation
paper_authors: Aleksis Pirinen, Nosheen Abid, Nuria Agues Paszkowsky, Thomas Ohlson Timoudas, Ronald Scheirer, Chiara Ceccobello, György Kovács, Anders Persson
results: 研究结果表明,使用该合成数据集可以提高云光学厚度估计的准确性,并能泛化到不同的云类型与环境。
Abstract
Cloud formations often obscure optical satellite-based monitoring of the Earth's surface, thus limiting Earth observation (EO) activities such as land cover mapping, ocean color analysis, and cropland monitoring. The integration of machine learning (ML) methods within the remote sensing domain has significantly improved performance on a wide range of EO tasks, including cloud detection and filtering, but there is still much room for improvement. A key bottleneck is that ML methods typically depend on large amounts of annotated data for training, which is often difficult to come by in EO contexts. This is especially true for the task of cloud optical thickness (COT) estimation. A reliable estimation of COT enables more fine-grained and application-dependent control compared to using pre-specified cloud categories, as is commonly done in practice. To alleviate the COT data scarcity problem, in this work we propose a novel synthetic dataset for COT estimation, where top-of-atmosphere radiances have been simulated for 12 of the spectral bands of the Multi-Spectral Instrument (MSI) sensor onboard Sentinel-2 platforms. These data points have been simulated under consideration of different cloud types, COTs, and ground surface and atmospheric profiles. Extensive experimentation of training several ML models to predict COT from the measured reflectivity of the spectral bands demonstrates the usefulness of our proposed dataset. Generalization to real data is also demonstrated on two satellite image datasets -- one that is publicly available, and one which we have collected and annotated. The synthetic data, the newly collected real dataset, code and models have been made publicly available at https://github.com/aleksispi/ml-cloud-opt-thick.
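A minimal sketch of the benchmark task described above, regressing COT from 12-band top-of-atmosphere reflectances; the random arrays are placeholders standing in for the released synthetic dataset, and the MLP configuration is an assumption.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Placeholder data: per-pixel reflectance in the 12 simulated MSI bands,
# paired with the cloud optical thickness (COT) used in the simulation.
X = np.random.rand(10000, 12)
y = np.random.rand(10000) * 50.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=200, random_state=0)
model.fit(X_tr, y_tr)
print("R^2 on held-out synthetic pixels:", model.score(X_te, y_te))
```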
Shadow: A Novel Loss Function for Efficient Training in Siamese Networks
paper_authors: Alif Elham Khan, Mohammad Junayed Hasan, Humayra Anjum, Nabeel Mohammed
for: 在内存受限条件下提高相似度检测任务的效率
methods: 提出一种名为 Shadow Loss 的新损失函数,通过在损失计算时压缩嵌入空间的维度来降低计算成本
results: 与 Triplet Margin Loss 相比,Shadow Loss 在多个数据集上将准确率提高 5%-10%,并且在不同的模型和数据集上均能保持性能
Abstract
Despite significant recent advances in similarity detection tasks, existing approaches pose substantial challenges under memory constraints. One of the primary reasons for this is the use of computationally expensive metric learning loss functions such as Triplet Loss in Siamese networks. In this paper, we present a novel loss function called Shadow Loss that compresses the dimensions of an embedding space during loss calculation without loss of performance. The distance between the projections of the embeddings is learned from inputs on a compact projection space where distances directly correspond to a measure of class similarity. Projecting on a lower-dimension projection space, our loss function converges faster, and the resulting classified image clusters have higher inter-class and smaller intra-class distances. Shadow Loss not only reduces embedding dimensions, favoring memory-constrained devices, but also consistently performs better than the state-of-the-art Triplet Margin Loss by an accuracy of 5\%-10\% across diverse datasets. The proposed loss function is also model-agnostic, upholding its performance across several tested models. Its effectiveness and robustness across balanced, imbalanced, medical, and non-medical image datasets suggest that it is not specific to a particular model or dataset but demonstrates superior performance consistently while using less memory and computation.
High-resolution Population Maps Derived from Sentinel-1 and Sentinel-2
paper_authors: Nando Metzger, Rodrigo Caye Daudt, Devis Tuia, Konrad Schindler
for: This paper aims to provide a timely and scalable method for generating population maps, particularly in data-scarce regions.
methods: The proposed method, called POPCORN, uses free, globally available satellite images from Sentinel-1 and Sentinel-2, and a small number of aggregate population counts over coarse census districts for calibration.
results: The method surpasses the mapping accuracy of existing schemes, including those that rely on high-resolution imagery, and produces population maps with an $R^2$ score of 66% and an average error of only $\pm$10 inhabitants/ha in Kigali. Additionally, the method provides interpretable results, such as maps of built-up areas and local building occupancy rates, and can be applied repeatedly to track population changes and transferred to geographically similar regions.
Abstract
Detailed population maps play an important role in diverse fields ranging from humanitarian action to urban planning. Generating such maps in a timely and scalable manner presents a challenge, especially in data-scarce regions. To address it we have developed POPCORN, a population mapping method whose only inputs are free, globally available satellite images from Sentinel-1 and Sentinel-2; and a small number of aggregate population counts over coarse census districts for calibration. Despite the minimal data requirements our approach surpasses the mapping accuracy of existing schemes, including several that rely on building footprints derived from high-resolution imagery. E.g., we were able to produce population maps for Rwanda with 100m GSD based on less than 400 regional census counts. In Kigali, those maps reach an $R^2$ score of 66% w.r.t. a ground truth reference map, with an average error of only $\pm$10 inhabitants/ha. Conveniently, POPCORN retrieves explicit maps of built-up areas and of local building occupancy rates, making the mapping process interpretable and offering additional insights, for instance about the distribution of built-up, but unpopulated areas, e.g., industrial warehouses. Moreover, we find that, once trained, the model can be applied repeatedly to track population changes; and that it can be transferred to geographically similar regions (e.g., from Uganda to Rwanda). With our work we aim to democratize access to up-to-date and high-resolution population maps, recognizing that some regions faced with particularly strong population dynamics may lack the resources for costly micro-census campaigns.
摘要
精细的人口地图在从人道主义行动到城市规划等多个领域发挥着重要作用。但及时、可扩展地生成这类地图是一个挑战,在数据匮乏地区尤为如此。为解决这一问题,我们开发了人口制图方法 POPCORN,其输入仅为免费、全球可用的 Sentinel-1 和 Sentinel-2 卫星图像,以及少量粗粒度普查区的人口总数用于校准。尽管数据需求极低,我们的方法仍超越了现有方案的制图精度,包括若干依赖高分辨率图像提取建筑物轮廓的方法。例如,我们仅凭不到 400 个区域普查计数,就为卢旺达生成了 100 米 GSD 的人口地图。在基加利,这些地图相对于真实参考地图达到 66% 的 $R^2$ 分数,平均误差仅为 ±10 人/公顷。此外,POPCORN 还能给出明确的建成区地图和局部建筑物占用率地图,使制图过程具有可解释性,并提供额外的洞察,例如已建成但无人居住区域(如工业仓库)的分布。我们还发现,模型一经训练即可反复应用以跟踪人口变化,并可迁移到地理上相似的地区(例如从乌干达到卢旺达)。我们希望通过这项工作使高分辨率、最新的人口地图的获取大众化,因为一些人口动态特别剧烈的地区可能缺乏开展昂贵微观普查的资源。
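The calibration signal described above, aggregate census counts over coarse districts, can be written as a simple weakly supervised loss. A hedged sketch follows; the actual POPCORN training objective may differ.

```python
import torch

def census_calibration_loss(density_map, district_ids, census_counts):
    """Sketch of POPCORN-style weak supervision: per-pixel population
    predictions are only constrained in aggregate, by requiring their sum
    over each coarse census district to match the reported count.

    density_map:   (H, W) predicted inhabitants per pixel.
    district_ids:  (H, W) integer district label per pixel.
    census_counts: (D,) aggregate population per district.
    """
    loss = 0.0
    for d, count in enumerate(census_counts):
        predicted_total = density_map[district_ids == d].sum()
        loss = loss + (predicted_total - count) ** 2
    return loss / len(census_counts)
```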
GRJointNET: Synergistic Completion and Part Segmentation on 3D Incomplete Point Clouds
paper_authors: Yigit Gurses, Melisa Taspinar, Mahmut Yurt, Sedat Ozer
for: 提高 3D 点云的实用性与应用价值
methods: 在 GRNet 基础上提出 GRJointNet 深度学习架构,对点云同时进行补全与部件分割
results: GRJointNet 在点云补全上优于 GRNet;此外,GRNet 无法进行分割,而 GRJointNet 可以同时完成补全与分割,从而提升点云的实用性。
Abstract
Segmentation of three-dimensional (3D) point clouds is an important task for autonomous systems. However, success of segmentation algorithms depends greatly on the quality of the underlying point clouds (resolution, completeness etc.). In particular, incomplete point clouds might reduce a downstream model's performance. GRNet is proposed as a novel and recent deep learning solution to complete point clouds, but it is not capable of part segmentation. On the other hand, our proposed solution, GRJointNet, is an architecture that can perform joint completion and segmentation on point clouds as a successor of GRNet. Features extracted for the two tasks are also utilized by each other to increase the overall performance. We evaluated our proposed network on the ShapeNet-Part dataset and compared its performance to GRNet. Our results demonstrate GRJointNet can outperform GRNet on point completion. It should also be noted that GRNet is not capable of segmentation while GRJointNet is. This study, therefore, holds promise to enhance practicality and utility of point clouds in 3D vision for autonomous systems.
摘要
三维(3D)点云分割是自主系统中的一项重要任务。然而,分割算法的效果在很大程度上取决于底层点云的质量(分辨率、完整性等)。特别是,不完整的点云可能会降低下游模型的性能。GRNet 是近期提出的一种用于点云补全的深度学习方案,但它无法进行部件分割。相比之下,我们提出的 GRJointNet 是在 GRNet 基础上构建的架构,可以对点云同时进行补全和分割,并且两项任务提取的特征相互利用,以提升整体性能。我们在 ShapeNet-Part 数据集上评估了所提网络,并与 GRNet 进行比较。结果表明,GRJointNet 在点云补全上可以超越 GRNet;此外,GRNet 不具备分割能力,而 GRJointNet 具备。因此,这项研究有望提升点云在自主系统三维视觉中的实用性与应用价值。
EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images
results: 实验表明,在仅有少量标注数据的情况下,EIGEN 方法可以显著提升最先进深度模型的性能。代码可在 https://github.com/ayushayush591/EIGEN-High-Fidelity-Extraction-Document-Images 下载。
Abstract
Information Extraction (IE) from document images is challenging due to the high variability of layout formats. Deep models such as LayoutLM and BROS have been proposed to address this problem and have shown promising results. However, they still require a large amount of field-level annotations for training these models. Other approaches using rule-based methods have also been proposed based on the understanding of the layout and semantics of a form such as geometric position, or type of the fields, etc. In this work, we propose a novel approach, EIGEN (Expert-Informed Joint Learning aGgrEatioN), which combines rule-based methods with deep learning models using data programming approaches to circumvent the requirement of annotation of large amounts of training data. Specifically, EIGEN consolidates weak labels induced from multiple heuristics through generative models and use them along with a small number of annotated labels to jointly train a deep model. In our framework, we propose the use of labeling functions that include incorporating contextual information thus capturing the visual and language context of a word for accurate categorization. We empirically show that our EIGEN framework can significantly improve the performance of state-of-the-art deep models with the availability of very few labeled data instances. The source code is available at https://github.com/ayushayush591/EIGEN-High-Fidelity-Extraction-Document-Images.
摘要
由于版面格式高度多变,从文档图像中进行信息提取(IE)具有挑战性。LayoutLM 和 BROS 等深度模型被提出以解决这一问题并取得了可观的效果,但它们的训练仍需要大量字段级标注。另有一些基于规则的方法,依赖对表单版面和语义的理解,例如几何位置或字段类型等。在这项工作中,我们提出了一种新方法 EIGEN(Expert-Informed Joint Learning aGgrEatioN),它借助数据编程思想将基于规则的方法与深度学习模型相结合,从而绕开对大量训练数据标注的需求。具体来说,EIGEN 通过生成式模型整合由多个启发式规则诱导出的弱标签,并将其与少量人工标注一起用于联合训练深度模型。在我们的框架中,标注函数引入了上下文信息,从而捕捉词语的视觉与语言上下文以实现准确分类。实验表明,在仅有极少量标注数据实例的情况下,EIGEN 框架可以显著提升现有最先进深度模型的性能。代码可在 https://github.com/ayushayush591/EIGEN-High-Fidelity-Extraction-Document-Images 下载。
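A hedged sketch of the data-programming step: several heuristic labeling functions vote on each field, and their votes are consolidated into pseudo-labels. EIGEN uses a learned generative model for this consolidation; plain majority voting stands in below.

```python
import numpy as np

def aggregate_weak_labels(lf_votes):
    """Consolidate heuristic labeling-function votes into pseudo-labels.

    lf_votes: (num_tokens, num_LFs) integer votes, -1 meaning "abstain".
    Returns one pseudo-label per token (-1 if every LF abstained).
    The paper learns a generative model over LFs; majority vote is a
    simplified stand-in for illustration only.
    """
    pseudo = np.full(lf_votes.shape[0], -1)
    for i, votes in enumerate(lf_votes):
        valid = votes[votes >= 0]
        if valid.size:
            pseudo[i] = np.bincount(valid).argmax()
    # These pseudo-labels are then mixed with the few gold labels to
    # jointly train the deep extraction model.
    return pseudo
```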
FViT-Grasp: Grasping Objects With Using Fast Vision Transformers
methods: 该方法使用 Fast Vision Transformer (FViT) 神经网络进行视觉数据处理,预测最佳抓取位置。
results: 研究显示,该方法在速度和准确率两方面均达到最先进水平,有望用于实时机器人抓取应用。
Abstract
This study addresses the challenge of manipulation, a prominent issue in robotics. We have devised a novel methodology for swiftly and precisely identifying the optimal grasp point for a robot to manipulate an object. Our approach leverages a Fast Vision Transformer (FViT), a type of neural network designed for processing visual data and predicting the most suitable grasp location. Demonstrating state-of-the-art performance in terms of speed while maintaining a high level of accuracy, our method holds promise for potential deployment in real-time robotic grasping applications. We believe that this study provides a baseline for future research in vision-based robotic grasp applications. Its high speed and accuracy bring researchers closer to real-life applications.
Low Latency Instance Segmentation by Continuous Clustering for Rotating LiDAR Sensors
results: 该方法可以降低实时点云实例分割的延迟,并且不会在扫描起点与终点之间产生问题性的不连续。代码将于 https://github.com/UniBwTAS/continuous_clustering 发布。
Abstract
Low-latency instance segmentation of LiDAR point clouds is crucial in real-world applications because it serves as an initial and frequently-used building block in a robot's perception pipeline, where every task adds further delay. Particularly in dynamic environments, this total delay can result in significant positional offsets of dynamic objects, as seen in highway scenarios. To address this issue, we employ continuous clustering of obstacle points in order to obtain an instance-segmented point cloud. Unlike most existing approaches, which use a full revolution of the LiDAR sensor, we process the data stream in a continuous and seamless fashion. More specifically, each column of a range image is processed as soon it is available. Obstacle points are clustered to existing instances in real-time and it is checked at a high-frequency which instances are completed and are ready to be published. An additional advantage is that no problematic discontinuities between the points of the start and the end of a scan are observed. In this work we describe the two-layered data structure and the corresponding algorithm for continuous clustering, which is able to cluster the incoming data in real time. We explain the importance of a large perceptive field of view. Furthermore, we describe and evaluate important architectural design choices, which could be relevant to design an architecture for deep learning based low-latency instance segmentation. We are publishing the source code at https://github.com/UniBwTAS/continuous_clustering.
摘要
LiDAR 点云的低延迟实例分割在实际应用中至关重要,因为它是机器人感知流水线中最先执行且被频繁使用的基础模块,其后的每个任务都会进一步增加延迟。特别是在动态环境中,总延迟可能导致动态目标出现明显的位置偏移,如高速公路场景所示。为解决这一问题,我们对障碍物点进行连续聚类,以获得实例分割后的点云。与大多数现有方法需要等待 LiDAR 传感器旋转完整一圈不同,我们以连续、无缝的方式处理数据流。更具体地说,距离图像的每一列一旦可用即被处理:障碍物点被实时归入已有实例,并以高频率检查哪些实例已经完成、可以发布。另一个优点是,扫描起点与终点之间不会出现问题性的不连续。在这项工作中,我们描述了用于连续聚类的两层数据结构及相应算法,能够实时聚类到来的数据。我们说明了大感知视场的重要性,并描述和评估了若干重要的架构设计选择,这些选择对设计基于深度学习的低延迟实例分割架构也可能具有参考意义。源代码发布于 https://github.com/UniBwTAS/continuous_clustering。
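A heavily simplified sketch of the column-wise idea: each range-image column is clustered the moment it arrives. The real system uses a two-layered data structure and range-image neighborhoods; the nearest-last-point heuristic and the radius below are purely illustrative.

```python
import numpy as np

class ContinuousClusterer:
    """Sketch of column-wise instance clustering for a rotating LiDAR:
    each range-image column is processed as soon as it is available, and
    obstacle points are attached to a nearby existing instance."""

    def __init__(self, radius=0.5):
        self.radius = radius
        self.instances = []  # each instance: list of xyz points

    def process_column(self, column_points):
        for p in column_points:  # p: np.array([x, y, z])
            best, best_d = None, self.radius
            for inst in self.instances:
                d = np.linalg.norm(inst[-1] - p)  # cheap recency heuristic
                if d < best_d:
                    best, best_d = inst, d
            if best is None:
                self.instances.append([p])        # start a new instance
            else:
                best.append(p)
        # Completed instances can be detected and published here at high
        # frequency, without waiting for a full sensor revolution.
```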
Investigating the use of publicly available natural videos to learn Dynamic MR image reconstruction
paper_authors: Olivier Jaubert, Michele Pascale, Javier Montalt-Tordera, Julius Akesson, Ruta Virsinskaite, Daniel Knight, Simon Arridge, Jennifer Steeden, Vivek Muthurangu
results: 在仿真实验(N=104 个数据集)中,以 MSE、PSNR 和 SSIM 指标比较了分别用心脏数据和自然视频训练的深度学习网络以及压缩感知(CS)重建的动态 MRI 图像(p<0.05)。在前瞻性实验中,对这些 DL 方法重建的图像进行了主观图像质量排序、SNR 和边缘锐度分析,并与 CS 方法进行了比较。
Abstract
Purpose: To develop and assess a deep learning (DL) pipeline to learn dynamic MR image reconstruction from publicly available natural videos (Inter4K). Materials and Methods: Learning was performed for a range of DL architectures (VarNet, 3D UNet, FastDVDNet) and corresponding sampling patterns (Cartesian, radial, spiral) either from true multi-coil cardiac MR data (N=692) or from pseudo-MR data simulated from Inter4K natural videos (N=692). Real-time undersampled dynamic MR images were reconstructed using DL networks trained with cardiac data and natural videos, and compressed sensing (CS). Differences were assessed in simulations (N=104 datasets) in terms of MSE, PSNR, and SSIM and prospectively for cardiac (short axis, four chambers, N=20) and speech (N=10) data in terms of subjective image quality ranking, SNR and Edge sharpness. Friedman Chi Square tests with post-hoc Nemenyi analysis were performed to assess statistical significance. Results: For all simulation metrics, DL networks trained with cardiac data outperformed DL networks trained with natural videos, which outperformed CS (p<0.05). However, in prospective experiments DL reconstructions using both training datasets were ranked similarly (and higher than CS) and presented no statistical differences in SNR and Edge Sharpness for most conditions. Additionally, high SSIM was measured between the DL methods with cardiac data and natural videos (SSIM>0.85). Conclusion: The developed pipeline enabled learning dynamic MR reconstruction from natural videos preserving DL reconstruction advantages such as high quality fast and ultra-fast reconstructions while overcoming some limitations (data scarcity or sharing). The natural video dataset, code and pre-trained networks are made readily available on github. Key Words: real-time; dynamic MRI; deep learning; image reconstruction; machine learning;
摘要
目的:开发并评估一个深度学习(DL)管线,用于从公开可用的自然视频(Inter4K)中学习动态 MR 图像重建。材料与方法:针对多种 DL 架构(VarNet、3D UNet、FastDVDNet)及相应的采样模式(笛卡尔、径向、螺旋)进行学习,训练数据分别来自真实的多线圈心脏 MR 数据(N=692)或由 Inter4K 自然视频仿真得到的伪 MR 数据(N=692)。分别用心脏数据和自然视频训练的 DL 网络以及压缩感知(CS)对实时欠采样动态 MR 图像进行重建。在仿真实验(N=104 个数据集)中以 MSE、PSNR 和 SSIM 评估差异;在前瞻性实验中,对心脏(短轴、四腔,N=20)和言语(N=10)数据以主观图像质量排序、SNR 和边缘锐度进行评估,并采用 Friedman 卡方检验及事后 Nemenyi 分析评估统计显著性。结果:在所有仿真指标上,用心脏数据训练的 DL 网络优于用自然视频训练的 DL 网络,后者又优于 CS(p<0.05)。然而,在前瞻性实验中,两种训练数据得到的 DL 重建排名相近(且均高于 CS),并且在大多数条件下 SNR 与边缘锐度没有统计学差异。此外,用心脏数据与自然视频训练的 DL 方法之间测得较高的 SSIM(SSIM>0.85)。结论:所开发的管线能够从自然视频学习动态 MR 重建,保留了 DL 重建高质量、快速乃至超快速重建等优点,同时克服了数据稀缺或共享受限等限制。自然视频数据集、代码和预训练网络已在 github 上公开。关键词:实时;动态 MRI;深度学习;图像重建;机器学习
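The pseudo-MR simulation can be outlined with a plain FFT. Below is a hedged sketch of retrospective Cartesian undersampling of a natural-video frame; the sampling pattern and autocalibration width are assumptions, not the study's exact protocol.

```python
import numpy as np

def simulate_undersampled_frame(frame, acceleration=4):
    """Turn a natural-video frame into a pseudo-MR training pair:
    go to k-space via FFT, keep a subset of Cartesian phase-encode lines,
    and reconstruct the aliased (zero-filled) image as network input."""
    kspace = np.fft.fftshift(np.fft.fft2(frame.astype(np.complex64)))

    mask = np.zeros(kspace.shape, dtype=bool)
    mask[::acceleration, :] = True           # regular Cartesian undersampling
    centre = kspace.shape[0] // 2
    mask[centre - 8:centre + 8, :] = True    # fully sampled low frequencies

    zero_filled = np.fft.ifft2(np.fft.ifftshift(kspace * mask))
    return np.abs(zero_filled), frame        # (network input, target)
```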
RankFeat&RankWeight: Rank-1 Feature/Weight Removal for Out-of-distribution Detection
methods: 本研究分析了特征矩阵的奇异值分布,发现 OOD 样本的特征矩阵往往具有更大的主奇异值,且 OOD 样本的类别预测在很大程度上由其决定。基于这一发现,本研究提出了一种简单而有效的事后方法 RankFeat,即从高层特征中去除由最大奇异值及其对应奇异向量构成的秩 1 矩阵。
results: 实验结果表明,所提方法可以大幅降低 OOD 检测中的假阳性率(FPR95),并且与其他方法相比具有更好的兼容性。此外,本研究还提出了基于权重矩阵的 RankWeight 方法,即从单个深层的参数矩阵中去除秩 1 权重;该方法可以单独使用,也可以与 RankFeat 结合使用,以达到更高的检测性能。
Abstract
The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of the in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature, and the class predictions of OOD samples are largely determined by it. This observation motivates us to propose \texttt{RankFeat}, a simple yet effective \emph{post hoc} approach for OOD detection by removing the rank-1 matrix composed of the largest singular value and the associated singular vectors from the high-level feature. \texttt{RankFeat} achieves \emph{state-of-the-art} performance and reduces the average false positive rate (FPR95) by 17.90\% compared with the previous best method. The success of \texttt{RankFeat} motivates us to investigate whether a similar phenomenon would exist in the parameter matrices of neural networks. We thus propose \texttt{RankWeight} which removes the rank-1 weight from the parameter matrices of a single deep layer. Our \texttt{RankWeight}is also \emph{post hoc} and only requires computing the rank-1 matrix once. As a standalone approach, \texttt{RankWeight} has very competitive performance against other methods across various backbones. Moreover, \texttt{RankWeight} enjoys flexible compatibility with a wide range of OOD detection methods. The combination of \texttt{RankWeight} and \texttt{RankFeat} refreshes the new \emph{state-of-the-art} performance, achieving the FPR95 as low as 16.13\% on the ImageNet-1k benchmark. Extensive ablation studies and comprehensive theoretical analyses are presented to support the empirical results.
摘要
在真实环境中部署机器学习模型离不开分布外(OOD)检测。本文发现,分布内(ID)与 OOD 特征的奇异值分布差异明显:OOD 特征矩阵往往具有更大的主奇异值,且 OOD 样本的类别预测在很大程度上由其决定。这一观察促使我们提出 RankFeat,一种简单而有效的事后方法,通过从高层特征中去除由最大奇异值及相应奇异向量构成的秩 1 矩阵来进行 OOD 检测。RankFeat 达到了最先进的性能,将平均假阳性率(FPR95)较此前最佳方法降低了 17.90%。RankFeat 的成功促使我们进一步研究神经网络参数矩阵中是否存在类似现象,由此提出 RankWeight,即从单个深层的参数矩阵中去除秩 1 权重。RankWeight 同样是事后方法,且仅需计算一次秩 1 矩阵。作为独立方法,RankWeight 在多种骨干网络上的性能与其他方法相比颇具竞争力,并且可以灵活地与多种 OOD 检测方法结合。RankWeight 与 RankFeat 的组合刷新了最先进性能,在 ImageNet-1k 基准上将 FPR95 降至 16.13%。大量消融实验和系统的理论分析支持了上述实验结果。
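RankFeat itself is compact enough to state directly. A sketch of the rank-1 removal on a feature map follows; the tensor layout and the downstream scoring step are assumptions.

```python
import torch

def rankfeat(feature):
    """RankFeat as described above: subtract the rank-1 matrix formed by
    the largest singular value and its singular vectors from a high-level
    feature map. feature: (B, C, H, W) activations from a late layer."""
    B, C, H, W = feature.shape
    x = feature.reshape(B, C, H * W)
    U, S, Vh = torch.linalg.svd(x, full_matrices=False)
    # Rank-1 component: s1 * u1 @ v1^T, per batch element.
    rank1 = S[:, :1, None] * (U[:, :, :1] @ Vh[:, :1, :])
    # The cleaned feature is then fed to the usual OOD scoring function.
    return (x - rank1).reshape(B, C, H, W)
```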
High-Order Tensor Recovery with A Tensor $U_1$ Norm
results: 理论分析表明,所提算法收敛于优化问题的 Karush-Kuhn-Tucker(KKT)点。实验表明,所提方法在高阶张量补全问题上表现出色,特别是在张量数据存在非光滑变化的情况下。
Abstract
Recently, numerous tensor SVD (t-SVD)-based tensor recovery methods have emerged, showing promise in processing visual data. However, these methods often suffer from performance degradation when confronted with high-order tensor data exhibiting non-smooth changes, commonly observed in real-world scenarios but ignored by the traditional t-SVD-based methods. Our objective in this study is to provide an effective tensor recovery technique for handling non-smooth changes in tensor data and efficiently explore the correlations of high-order tensor data across its various dimensions without introducing numerous variables and weights. To this end, we introduce a new tensor decomposition and a new tensor norm called the Tensor $U_1$ norm. We utilize these novel techniques in solving the problem of high-order tensor completion problem and provide theoretical guarantees for the exact recovery of the resulting tensor completion models. An optimization algorithm is proposed to solve the resulting tensor completion model iteratively by combining the proximal algorithm with the Alternating Direction Method of Multipliers. Theoretical analysis showed the convergence of the algorithm to the Karush-Kuhn-Tucker (KKT) point of the optimization problem. Numerical experiments demonstrated the effectiveness of the proposed method in high-order tensor completion, especially for tensor data with non-smooth changes.
摘要
近年来,出现了许多基于张量奇异值分解(t-SVD)的张量恢复方法,在处理视觉数据方面展现出潜力。然而,当高阶张量数据存在非光滑变化时,这些方法的性能往往下降;这类变化在真实场景中十分常见,却被传统的基于 t-SVD 的方法所忽视。本研究的目标是提供一种有效的张量恢复技术,既能处理张量数据中的非光滑变化,又能高效地挖掘高阶张量数据在各维度上的相关性,而无需引入大量变量和权重。为此,我们提出了一种新的张量分解和一种新的张量范数,称为张量 $U_1$ 范数。我们利用这些新技术求解高阶张量补全问题,并为所得补全模型的精确恢复提供了理论保证。我们提出了一种结合邻近算法与交替方向乘子法(ADMM)的优化算法,对补全模型进行迭代求解。理论分析表明,该算法收敛于优化问题的 KKT 点。数值实验验证了所提方法在高阶张量补全中的有效性,特别是对于存在非光滑变化的张量数据。
Electric Network Frequency Optical Sensing Devices
paper_authors: Christos Moysiadis, Georgios Karantaidis, Constantine Kotropoulos
for: 本研究的目的是在室内照明环境下使用光学传感设备来估算电网频率(ENF)。
methods: 本研究开发了首个基于光电二极管的光学传感设备来捕捉 ENF 变化,并实现了直接从电力干线采集 ENF 的设备作为真实参考;此外,还使用摄像机作为第二种光学传感器来估算 ENF。
results: 研究发现,使用光学设备估算 ENF 的精度取决于多种因素,包括照明环境、摄像机的位置和设备的配置;研究还提供了广泛的实验证据,表明光学传感设备可以在不同场景下(包括静态场景和包含人员活动的场景)估算 ENF。
Abstract
Electric Network Frequency (ENF) acts as a fingerprint in multimedia forensics applications. In indoor environments, ENF variations affect the intensity of light sources connected to power mains. Accordingly, the light intensity variations captured by sensing devices can be exploited to estimate the ENF. A first optical sensing device based on a photodiode is developed for capturing ENF variations in indoor lighting environments. In addition, a device that captures the ENF directly from power mains is implemented. This device serves as a ground truth ENF collector. Video recordings captured by a camera are also employed to estimate the ENF. The camera serves as a second optical sensor. The factors affecting the ENF estimation are thoroughly studied. The maximum correlation coefficient between the ENF estimated by the two optical sensors and that estimated directly from power mains is used to measure the estimation accuracy. The paper's major contribution is in the disclosure of extensive experimental evidence on ENF estimation in scenes ranging from static ones capturing a white wall to non-static ones, including human activity.
摘要
电网频率(ENF)在多媒体取证应用中充当指纹。在室内环境中,ENF 的变化会影响接入电力干线的光源的发光强度,因此可以利用传感设备捕捉的光强变化来估算 ENF。本文开发了首个基于光电二极管的光学传感设备,用于捕捉室内照明环境中的 ENF 变化;此外还实现了直接从电力干线采集 ENF 的设备,作为真实 ENF 参考。摄像机拍摄的视频同样被用于估算 ENF,作为第二种光学传感器。本文系统研究了影响 ENF 估算的各种因素,并以两种光学传感器估算的 ENF 与直接从电力干线估算的 ENF 之间的最大相关系数衡量估算精度。本文的主要贡献在于提供了广泛的实验证据,涵盖从拍摄白墙的静态场景到包含人类活动的非静态场景的 ENF 估算。
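ENF estimation from a light-intensity recording can be sketched as a spectrogram peak tracker. A hedged outline follows; the window length, search band, and the choice of nominal frequency or harmonic (lamps typically flicker at twice the mains frequency) are assumptions.

```python
import numpy as np
from scipy import signal

def estimate_enf(luminance, fs, nominal=50.0, band=0.5):
    """Track the ENF in a 1-D light-intensity signal (photodiode or
    camera-derived) by following the spectral peak near the nominal
    mains frequency over time."""
    f, t, Sxx = signal.spectrogram(luminance, fs=fs, nperseg=int(fs * 2))
    keep = (f > nominal - band) & (f < nominal + band)
    enf = f[keep][np.argmax(Sxx[keep, :], axis=0)]  # peak per time frame
    # Accuracy can be measured by the maximum correlation coefficient
    # against the ENF recorded directly from the power mains.
    return t, enf
```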
Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning
paper_authors: Seonghak Kim, Gyeongdo Ham, Yucheol Cho, Daeshik Kim
for: 提升高效轻量级模型(学生模型)的性能
methods: 使用知识蒸馏(KD)技术,将知识从更复杂的模型(教师模型)传递给学生模型,并结合相关性距离与网络剪枝
results: 在 CIFAR-100、FGVR、TinyImagenet 和 ImageNet 等多个图像数据集上,所提出的 R2KD 方法优于当前最先进技术,并能有效利用数据增强来提升性能
Abstract
The improvement in the performance of efficient and lightweight models (i.e., the student model) is achieved through knowledge distillation (KD), which involves transferring knowledge from more complex models (i.e., the teacher model). However, most existing KD techniques rely on Kullback-Leibler (KL) divergence, which has certain limitations. First, if the teacher distribution has high entropy, the KL divergence's mode-averaging nature hinders the transfer of sufficient target information. Second, when the teacher distribution has low entropy, the KL divergence tends to excessively focus on specific modes, which fails to convey an abundant amount of valuable knowledge to the student. Consequently, when dealing with datasets that contain numerous confounding or challenging samples, student models may struggle to acquire sufficient knowledge, resulting in subpar performance. Furthermore, in previous KD approaches, we observed that data augmentation, a technique aimed at enhancing a model's generalization, can have an adverse impact. Therefore, we propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages correlation distance and network pruning. This approach enables KD to effectively incorporate data augmentation for performance improvement. Extensive experiments on various datasets, including CIFAR-100, FGVR, TinyImagenet, and ImageNet, demonstrate our method's superiority over current state-of-the-art methods.
摘要
高效轻量级模型(即学生模型)的性能提升可以通过知识蒸馏(KD)实现,即从更复杂的模型(即教师模型)向学生模型传递知识。然而,现有的大多数 KD 技术依赖 Kullback-Leibler(KL)散度,这存在一定的局限性。首先,当教师分布熵较高时,KL 散度的模式平均特性阻碍了充分目标信息的传递;其次,当教师分布熵较低时,KL 散度往往过度聚焦于特定模式,无法向学生传递足够丰富的有价值知识。因此,在处理包含大量易混淆或困难样本的数据集时,学生模型可能难以获得充分的知识,导致性能欠佳。此外,我们在以往的 KD 方法中还观察到,旨在增强模型泛化能力的数据增强技术有时反而会带来负面影响。为此,我们提出了一种利用相关性距离和网络剪枝的鲁棒性增强知识蒸馏方法(R2KD),使 KD 能够有效结合数据增强以提升性能。在 CIFAR-100、FGVR、TinyImagenet 和 ImageNet 等多个数据集上的大量实验表明,我们的方法优于当前最先进的方法。
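The abstract does not spell out the loss, so the sketch below only illustrates a correlation-distance distillation term in the spirit of R2KD, replacing KL divergence with one minus the per-sample Pearson correlation; this is an assumption, not the paper's exact objective.

```python
import torch

def correlation_distance_loss(student_logits, teacher_logits, eps=1e-8):
    """Correlation-based distillation: match the *shape* of the teacher's
    prediction rather than its exact probabilities, which avoids the
    mode-averaging / mode-focusing issues of KL described above."""
    s = student_logits - student_logits.mean(dim=1, keepdim=True)
    t = teacher_logits - teacher_logits.mean(dim=1, keepdim=True)
    corr = (s * t).sum(dim=1) / (s.norm(dim=1) * t.norm(dim=1) + eps)
    return (1.0 - corr).mean()
```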
Periodically Exchange Teacher-Student for Source-Free Object Detection
results: 在多个 SFOD 基准测试集上,我们的方法取得了最先进的性能,优于其他相关方法,表明了该方法在 SFOD 任务中的有效性与优越性。
Abstract
Source-free object detection (SFOD) aims to adapt the source detector to unlabeled target domain data in the absence of source domain data. Most SFOD methods follow the same self-training paradigm using mean-teacher (MT) framework where the student model is guided by only one single teacher model. However, such paradigm can easily fall into a training instability problem that when the teacher model collapses uncontrollably due to the domain shift, the student model also suffers drastic performance degradation. To address this issue, we propose the Periodically Exchange Teacher-Student (PETS) method, a simple yet novel approach that introduces a multiple-teacher framework consisting of a static teacher, a dynamic teacher, and a student model. During the training phase, we periodically exchange the weights between the static teacher and the student model. Then, we update the dynamic teacher using the moving average of the student model that has already been exchanged by the static teacher. In this way, the dynamic teacher can integrate knowledge from past periods, effectively reducing error accumulation and enabling a more stable training process within the MT-based framework. Further, we develop a consensus mechanism to merge the predictions of two teacher models to provide higher-quality pseudo labels for student model. Extensive experiments on multiple SFOD benchmarks show that the proposed method achieves state-of-the-art performance compared with other related methods, demonstrating the effectiveness and superiority of our method on SFOD task.
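The exchange-and-EMA schedule described above is simple to sketch; the exchange period and EMA decay below are assumed values, not the paper's hyperparameters.

```python
import copy

def pets_update(student, static_teacher, dynamic_teacher,
                step, exchange_period=2000, ema=0.999):
    """Sketch of the PETS schedule: periodically exchange weights between
    the static teacher and the student, and keep the dynamic teacher as a
    moving average of the (exchanged) student."""
    if step % exchange_period == 0:
        # Periodic exchange: static teacher and student swap parameters.
        s_state = copy.deepcopy(student.state_dict())
        student.load_state_dict(static_teacher.state_dict())
        static_teacher.load_state_dict(s_state)

    # Dynamic teacher: exponential moving average of the student weights,
    # integrating knowledge from past periods to stabilise mean-teacher
    # training under domain shift.
    for p_t, p_s in zip(dynamic_teacher.parameters(), student.parameters()):
        p_t.data.mul_(ema).add_(p_s.data, alpha=1.0 - ema)
```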
Attribute-Aware Representation Rectification for Generalized Zero-Shot Learning
results: 在多个基准数据集上的大量实验验证了该方法的有效性。
Abstract
Generalized Zero-shot Learning (GZSL) has yielded remarkable performance by designing a series of unbiased visual-semantics mappings, wherein, the precision relies heavily on the completeness of extracted visual features from both seen and unseen classes. However, as a common practice in GZSL, the pre-trained feature extractor may easily exhibit difficulty in capturing domain-specific traits of the downstream tasks/datasets to provide fine-grained discriminative features, i.e., domain bias, which hinders the overall recognition performance, especially for unseen classes. Recent studies partially address this issue by fine-tuning feature extractors, while may inevitably incur catastrophic forgetting and overfitting issues. In this paper, we propose a simple yet effective Attribute-Aware Representation Rectification framework for GZSL, dubbed $\mathbf{(AR)^{2}$, to adaptively rectify the feature extractor to learn novel features while keeping original valuable features. Specifically, our method consists of two key components, i.e., Unseen-Aware Distillation (UAD) and Attribute-Guided Learning (AGL). During training, UAD exploits the prior knowledge of attribute texts that are shared by both seen/unseen classes with attention mechanisms to detect and maintain unseen class-sensitive visual features in a targeted manner, and meanwhile, AGL aims to steer the model to focus on valuable features and suppress them to fit noisy elements in the seen classes by attribute-guided representation learning. Extensive experiments on various benchmark datasets demonstrate the effectiveness of our method.
摘要
广义零样本学习(GZSL)通过设计一系列无偏的视觉-语义映射取得了出色的性能,其精度在很大程度上依赖于从可见类与不可见类中提取的视觉特征的完整性。然而,GZSL 中常用的预训练特征提取器往往难以捕捉下游任务/数据集的领域特定特性,无法提供细粒度的判别特征,即存在领域偏差,从而影响整体识别性能,对不可见类尤甚。近期研究通过微调特征提取器部分缓解了这一问题,但可能不可避免地引发灾难性遗忘和过拟合。在本文中,我们为 GZSL 提出了一个简单而有效的属性感知表示校正框架,记作 $\mathbf{(AR)^{2}$,自适应地校正特征提取器,使其在保留原有有价值特征的同时学习新特征。具体而言,我们的方法包含两个关键组件:不可见类感知蒸馏(UAD)和属性引导学习(AGL)。在训练过程中,UAD 借助可见/不可见类共享的属性文本先验知识,通过注意力机制有针对性地检测并保留对不可见类敏感的视觉特征;同时,AGL 通过属性引导的表示学习,引导模型关注有价值特征,并抑制其对可见类中噪声成分的拟合。在多个基准数据集上的大量实验证明了我们方法的有效性。
MetaFBP: Learning to Learn High-Order Predictor for Personalized Facial Beauty Prediction
results: 提出了一种 MetaFBP 框架,通过一种 universal feature extractor 捕捉美学共同部分,然后通过一种 meta-学机制来适应用户特有的美学偏好。经验表明,该方法可以快速适应用户偏好。Abstract
Predicting individual aesthetic preferences holds significant practical applications and academic implications for human society. However, existing studies mainly focus on learning and predicting the commonality of facial attractiveness, with little attention given to Personalized Facial Beauty Prediction (PFBP). PFBP aims to develop a machine that can adapt to individual aesthetic preferences with only a few images rated by each user. In this paper, we formulate this task from a meta-learning perspective that each user corresponds to a meta-task. To address such PFBP task, we draw inspiration from the human aesthetic mechanism that visual aesthetics in society follows a Gaussian distribution, which motivates us to disentangle user preferences into a commonality and an individuality part. To this end, we propose a novel MetaFBP framework, in which we devise a universal feature extractor to capture the aesthetic commonality and then optimize to adapt the aesthetic individuality by shifting the decision boundary of the predictor via a meta-learning mechanism. Unlike conventional meta-learning methods that may struggle with slow adaptation or overfitting to tiny support sets, we propose a novel approach that optimizes a high-order predictor for fast adaptation. In order to validate the performance of the proposed method, we build several PFBP benchmarks by using existing facial beauty prediction datasets rated by numerous users. Extensive experiments on these benchmarks demonstrate the effectiveness of the proposed MetaFBP method.
摘要
预测个体的美学偏好对人类社会具有重要的实际应用和学术意义。然而,现有研究主要集中于学习和预测人脸吸引力的共性,较少关注个性化人脸美丽预测(PFBP)。PFBP 的目标是开发一种仅凭每位用户评分的少量图像即可适应其个人美学偏好的机器。在本文中,我们从元学习的视角来刻画这一任务,即每位用户对应一个元任务。为解决 PFBP 任务,我们从人类美学机制中获得启发:社会中的视觉美学服从高斯分布,这促使我们将用户偏好解耦为共性部分和个性部分。为此,我们提出了一个新颖的 MetaFBP 框架,其中设计了通用特征提取器来捕捉美学共性,再通过元学习机制移动预测器的决策边界来适应美学个性。与可能面临适应缓慢或在极小支持集上过拟合问题的传统元学习方法不同,我们提出了一种优化高阶预测器的新方法以实现快速适应。为验证所提方法的性能,我们利用由大量用户评分的现有人脸美丽预测数据集构建了若干 PFBP 基准。在这些基准上的大量实验证明了 MetaFBP 方法的有效性。
Predicting Recovery or Decease of COVID-19 Patients with Clinical and RT-PCR Using Machine Learning Classification Algorithms
paper_authors: Mohammad Dehghani, Zahra Yazdanparast
for: This study aims to examine whether machine learning algorithms can predict the outcome of COVID-19 cases (recovery or death) based on the features present in the dataset, and to determine which feature set (clinical or RT-PCR) is more reliable for predicting recovery and decease.
methods: The study uses six machine learning methods to build prediction models, including random forest, which showed promising results with an accuracy of 78.7%.
results: The study finds that recovery and decease of patients are predictable using machine learning, with clinical features alone (without using RT-PCR), trained with the AdaBoost algorithm, being the most accurate with an accuracy of 82.1%.
Abstract
The COVID-19 pandemic has disrupted the global economy and people's daily lives in unprecedented ways. To make appropriate decisions, it is necessary to diagnose COVID-19 rapidly and accurately. Clinical decision making is influenced by data collected from patients. With the aid of artificial intelligence, COVID-19 has been diagnosed quickly by analyzing symptoms, polymerase chain reaction (PCR), computed tomography scans, chest X-rays, routine laboratory blood tests and even cough sounds. Furthermore, these data can be used to predict a patient's mortality, although there is a question about which data makes the most accurate predictions. Therefore, this study consists of two parts. Our first objective is to examine whether machine learning algorithms can predict the outcome of COVID-19 cases (recovery or death), based on the features present in the dataset. In the second part of the research, we investigated the impact of clinical and RT-PCR on prediction of recovery and decease to determine which one is more reliable. We defined four stages with different feature sets and use six machine learning methods to build prediction model. With an accuracy of 78.7%, random forest showed promising results for predicting death and recovery of patients. Based on this, it appears that recovery and decease of patients are predictable using machine learning. For second objective, results indicate that clinical alone (without using RT-PCR), trained with AdaBoost algorithm, is the most accurate with an accuracy of 82.1%. This study can provide guidance for medical professionals in the event of a crisis or outbreak similar to COVID-19.
摘要
COVID-19 大流行以前所未有的方式扰乱了全球经济和人们的日常生活。为了做出恰当的决策,需要快速而准确地诊断 COVID-19。临床决策依赖于从患者处收集的数据。借助人工智能,已能够通过分析症状、聚合酶链式反应(PCR)、CT 扫描、胸部 X 光、常规实验室血液检测乃至咳嗽声来快速诊断 COVID-19。此外,这些数据还可用于预测患者的死亡风险,但哪类数据能给出最准确的预测仍是一个问题。因此,本研究包含两个部分。第一个目标是检验机器学习算法能否根据数据集中的特征预测 COVID-19 病例的结局(康复或死亡)。在研究的第二部分,我们考察了临床数据与 RT-PCR 对康复与死亡预测的影响,以确定哪一种更可靠。我们定义了具有不同特征集的四个阶段,并使用六种机器学习方法构建预测模型。随机森林以 78.7% 的准确率在预测患者死亡与康复方面展现出良好效果。据此,患者的康复与死亡是可以通过机器学习预测的。就第二个目标而言,结果表明仅使用临床数据(不使用 RT-PCR)并以 AdaBoost 算法训练的模型最为准确,准确率达 82.1%。本研究可以在发生类似 COVID-19 的危机或疫情时为医疗专业人员提供指导。
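A hedged scikit-learn sketch of the experimental protocol; the feature matrix is a placeholder, and the study's exact preprocessing and validation splits are not reproduced.

```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

def evaluate_outcome_models(X, y):
    """Train and evaluate classifiers predicting COVID-19 case outcome
    (recovery vs. death). X: patient feature matrix (clinical features,
    optionally extended with RT-PCR); y: binary outcomes. The study
    reports 78.7% accuracy for random forest and 82.1% for AdaBoost on
    clinical-only features."""
    models = [
        ("random forest", RandomForestClassifier(random_state=0)),
        ("AdaBoost", AdaBoostClassifier(random_state=0)),
    ]
    for name, clf in models:
        acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
        print(f"{name}: {acc:.3f}")
```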
Expanding the deep-learning model to diagnosis LVNC: Limitations and trade-offs
results: 对于不同心肌病患者群体,DL-LVTQ 的准确率、特异性和 kappa 值均有所提高,同时保持了敏感性。心脏科医生评估认为,98.9% 的输出经临床验证可用于诊断。
Abstract
Hyper-trabeculation or non-compaction in the left ventricle of the myocardium (LVNC) is a recently classified form of cardiomyopathy. Several methods have been proposed to quantify the trabeculae accurately in the left ventricle, but there is no general agreement in the medical community to use a particular approach. In previous work, we proposed DL-LVTQ, a deep learning approach for left ventricular trabecular quantification based on a U-Net CNN architecture. DL-LVTQ was an automatic diagnosis tool developed from a dataset of patients with the same cardiomyopathy (hypertrophic cardiomyopathy). In this work, we have extended and adapted DL-LVTQ to cope with patients with different cardiomyopathies. The dataset consists of up 379 patients in three groups with different particularities and cardiomyopathies. Patient images were taken from different scanners and hospitals. We have modified and adapted the U-Net convolutional neural network to account for the different particularities of a heterogeneous group of patients with various unclassifiable or mixed and inherited cardiomyopathies. The inclusion of new groups of patients has increased the accuracy, specificity and kappa values while maintaining the sensitivity of the automatic deep learning method proposed. Therefore, a better-prepared diagnosis tool is ready for various cardiomyopathies with different characteristics. Cardiologists have considered that 98.9% of the evaluated outputs are verified clinically for diagnosis. Therefore, the high precision to segment the different cardiac structures allows us to make a robust diagnostic system objective and faster, decreasing human error and time spent.
results: 实验结果表明,我们的方法对所有类型的活动都有效。
Abstract
This paper focuses on activity retrieval from a video query in an imbalanced scenario. In current query-by-activity-video literature, a common assumption is that all activities have sufficient labelled examples when learning an embedding. This assumption does however practically not hold, as only a portion of activities have many examples, while other activities are only described by few examples. In this paper, we propose a visual-semantic embedding network that explicitly deals with the imbalanced scenario for activity retrieval. Our network contains two novel modules. The visual alignment module performs a global alignment between the input video and fixed-sized visual bank representations for all activities. The semantic module performs an alignment between the input video and fixed-sized semantic activity representations. By matching videos with both visual and semantic activity representations that are of equal size over all activities, we no longer ignore infrequent activities during retrieval. Experiments on a new imbalanced activity retrieval benchmark show the effectiveness of our approach for all types of activities.
摘要
Compositional Zero-shot Learning via Progressive Language-based Observations
methods: 该方法使用渐进式基于语言的观察(PLO),动态确定基元的更优观察顺序,并利用预训练的视觉-语言模型(VLM)和大型语言模型(LLM)逐步理解图像内容。
results: 实验结果显示,PLO 优于最先进的对比方法,体现了其组合识别能力。
Abstract
Compositional zero-shot learning aims to recognize unseen state-object compositions by leveraging known primitives (state and object) during training. However, effectively modeling interactions between primitives and generalizing knowledge to novel compositions remains a perennial challenge. There are two key factors: object-conditioned and state-conditioned variance, i.e., the appearance of states (or objects) can vary significantly when combined with different objects (or states). For instance, the state "old" can signify a vintage design for a "car" or an advanced age for a "cat". In this paper, we argue that these variances can be mitigated by predicting composition categories based on pre-observed primitive. To this end, we propose Progressive Language-based Observations (PLO), which can dynamically determine a better observation order of primitives. These observations comprise a series of concepts or languages that allow the model to understand image content in a step-by-step manner. Specifically, PLO adopts pre-trained vision-language models (VLMs) to empower the model with observation capabilities. We further devise two variants: 1) PLO-VLM: a two-step method, where a pre-observing classifier dynamically determines the observation order of two primitives. 2) PLO-LLM: a multi-step scheme, which utilizes large language models (LLMs) to craft composition-specific prompts for step-by-step observing. Extensive ablations on three challenging datasets demonstrate the superiority of PLO compared with state-of-the-art methods, affirming its abilities in compositional recognition.
摘要
组合零样本学习的目标是在训练中借助已知基元(状态与对象)来识别未见过的状态-对象组合。然而,有效建模基元之间的交互并将知识泛化到新组合始终是一大挑战。其中有两个关键因素:对象条件方差与状态条件方差,即状态(或对象)与不同的对象(或状态)组合时,其外观可能差异显著。例如,状态"老"对于"汽车"可能意味着复古的设计,而对于"猫"则表示高龄。在本文中,我们认为这些方差可以通过基于先行观察到的基元来预测组合类别而得到缓解。为此,我们提出了渐进式基于语言的观察(PLO),它能够动态确定基元的更优观察顺序。这些观察由一系列概念或语言构成,使模型能够逐步理解图像内容。具体而言,PLO 采用预训练的视觉-语言模型(VLM)赋予模型观察能力。我们进一步设计了两个变体:1)PLO-VLM:两步方法,由一个预观察分类器动态确定两个基元的观察顺序;2)PLO-LLM:多步方案,利用大型语言模型(LLM)生成特定于组合的提示词以进行逐步观察。在三个具有挑战性的数据集上的大量消融实验表明,PLO 优于最先进的方法,证实了其组合识别能力。
for: The paper proposes a computationally simpler and descriptor-richer point cloud quality assessment metric, PointPCA+.
methods: The method applies PCA only to the geometry data while enriching the existing geometry and texture descriptors, which are computed more efficiently.
results: 实验结果显示,与公开数据集的主观真实分数相比,PointPCA+ 可以达到较高的预测性能。
Abstract
A computationally-simplified and descriptor-richer Point Cloud Quality Assessment (PCQA) metric, namely PointPCA+, is proposed in this paper, which is an extension of PointPCA. PointPCA proposed a set of perceptually-relevant descriptors based on PCA decomposition that were applied to both the geometry and texture data of point clouds for full reference PCQA. PointPCA+ employs PCA only on the geometry data while enriching existing geometry and texture descriptors, that are computed more efficiently. Similarly to PointPCA, a total quality score is obtained through a learning-based fusion of individual predictions from geometry and texture descriptors that capture local shape and appearance properties, respectively. Before feature fusion, a feature selection module is introduced to choose the most effective features from a proposed super-set. Experimental results show that PointPCA+ achieves high predictive performance against subjective ground truth scores obtained from publicly available datasets. The code is available at \url{https://github.com/cwi-dis/pointpca_suite/}.
摘要
本文提出了一种计算更简单且描述符更丰富的点云质量评价(PCQA)指标 PointPCA+,它是 PointPCA 的扩展。PointPCA 基于 PCA 分解提出了一系列具有感知相关性的描述符,并将其应用于点云的几何与纹理数据,以进行全参考 PCQA。PointPCA+ 则仅对几何数据使用 PCA,同时丰富了现有的几何与纹理描述符,使其计算更加高效。与 PointPCA 类似,总质量分数通过基于学习的融合得到,分别融合来自几何与纹理描述符的预测,二者各自刻画局部形状与外观属性。在特征融合之前,引入了一个特征选择模块,从所提出的超集中挑选最有效的特征。实验结果表明,与公开数据集的主观真实分数相比,PointPCA+ 具有较高的预测性能。代码可在 \url{https://github.com/cwi-dis/pointpca_suite/} 获取。
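The PCA-on-geometry descriptors at the heart of the PointPCA family can be sketched as eigenvalue features of local neighborhood covariances; the neighborhood size and normalization below are assumptions, not the exact descriptor set of the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_pca_descriptors(points, k=32):
    """For every point, the eigenvalues of its neighbourhood covariance
    summarise the local shape (planarity, linearity, sphericity, ...).
    points: (N, 3) array of xyz coordinates."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    feats = np.empty((len(points), 3))
    for i, nbrs in enumerate(idx):
        nb = points[nbrs] - points[nbrs].mean(axis=0)
        # Eigenvalues of the 3x3 covariance, sorted descending.
        evals = np.linalg.eigvalsh(nb.T @ nb / k)[::-1]
        feats[i] = evals / (evals.sum() + 1e-12)
    # Such per-point features feed the learned fusion that produces a
    # total quality score in PointPCA+.
    return feats
```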
results: 在两个基准数据集上的实验表明,我们的方法为语言引导的少样本语义分割建立了新的基线,并可与近期的视觉引导方法相竞争。
Abstract
Few-shot learning is a promising way for reducing the label cost in new categories adaptation with the guidance of a small, well labeled support set. But for few-shot semantic segmentation, the pixel-level annotations of support images are still expensive. In this paper, we propose an innovative solution to tackle the challenge of few-shot semantic segmentation using only language information, i.e.image-level text labels. Our approach involves a vision-language-driven mask distillation scheme, which contains a vision-language pretraining (VLP) model and a mask refiner, to generate high quality pseudo-semantic masks from text prompts. We additionally introduce a distributed prototype supervision method and complementary correlation matching module to guide the model in digging precise semantic relations among support and query images. The experiments on two benchmark datasets demonstrate that our method establishes a new baseline for language-guided few-shot semantic segmentation and achieves competitive results to recent vision-guided methods.
摘要
少样本学习是在少量高质量标注支持集的引导下降低新类别适应标注成本的一种有前景的方式。但对于少样本语义分割,支持图像的像素级标注依然昂贵。在本文中,我们提出了一种创新方案,仅利用语言信息(即图像级文本标签)来应对少样本语义分割的挑战。我们的方法采用视觉-语言驱动的掩码蒸馏方案,包含一个视觉-语言预训练(VLP)模型和一个掩码精炼器,从文本提示生成高质量的伪语义掩码。我们还引入分布式原型监督方法和互补相关匹配模块,引导模型挖掘支持图像与查询图像之间精确的语义关系。在两个基准数据集上的实验表明,我们的方法为语言引导的少样本语义分割建立了新的基线,并取得了与近期视觉引导方法相竞争的结果。
Perceptual Image Compression with Cooperative Cross-Modal Side Information
paper_authors: Shiyu Qin, Bin Chen, Yujun Huang, Baoyi An, Tao Dai, Shu-Tao Xia
for: This paper aims to enhance image compression by utilizing text-level semantic dependencies to improve the rate-perception tradeoff and reduce semantic distortion.
methods: The proposed method employs the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse text and image features, and uses a text-conditional generative adversarial network to improve perceptual quality.
results: The proposed approach achieves superior results in terms of rate-perception tradeoff and semantic distortion, as demonstrated by extensive experiments on four datasets and ten image quality assessment metrics.
The explosion of data has resulted in more and more associated text being transmitted along with images. Inspired by distributed source coding, many works utilize image side information to enhance image compression. However, existing methods generally do not consider using text as side information to enhance perceptual compression of images, even though the benefits of multimodal synergy have been widely demonstrated in research. This begs the following question: How can we effectively transfer text-level semantic dependencies to help image compression, which is only available to the decoder? In this work, we propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff. Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features. This is done by predicting a semantic mask to guide the learned text-adaptive affine transformation at the pixel level. Furthermore, we design a text-conditional generative adversarial network to improve the perceptual quality of reconstructed images. Extensive experiments involving four datasets and ten image quality assessment metrics demonstrate that the proposed approach achieves superior results in terms of rate-perception trade-off and semantic distortion.
Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression
results: Compared to methods based on multiple separately optimized models, the proposed method reaches the same performance with 80% savings in parameter storage and 90% savings in datasets. It also outperforms existing variable-rate image compression methods and approaches fixed-rate image compression methods.
Abstract
In this paper, we propose a progressive learning paradigm for transformer-based variable-rate image compression. Our approach covers a wide range of compression rates with the assistance of the Layer-adaptive Prompt Module (LPM). Inspired by visual prompt tuning, we use the LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively, which are fed as additional information into the Swin Transformer layers of a pre-trained transformer-based image compression model to affect the allocation of attention regions and bits, which in turn changes the target compression ratio of the model. To keep the network lightweight, we integrate prompt networks with fewer convolutional layers. Exhaustive experiments show that, compared to methods based on multiple models optimized separately for different target rates, the proposed method arrives at the same performance with 80% savings in parameter storage and 90% savings in datasets. Meanwhile, our model outperforms all current variable bitrate image methods in terms of rate-distortion performance and approaches the state-of-the-art fixed bitrate image compression methods trained from scratch.
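As a rough illustration of prompt-driven rate control, the sketch below prepends a learnable set of prompt tokens, indexed by target rate, to the token sequence entering a (frozen) transformer layer, so the prompts can steer attention allocation. The fixed per-rate prompt bank and all names/shapes are simplifying assumptions; the paper's LPM instead extracts prompts from the input image and hidden features.

```python
import torch
import torch.nn as nn

class RatePrompts(nn.Module):
    """Toy prompt module: one learnable set of prompt tokens per rate level,
    prepended to the features entering a transformer layer."""
    def __init__(self, num_rates: int, num_prompts: int, dim: int):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_rates, num_prompts, dim) * 0.02)

    def forward(self, tokens: torch.Tensor, rate_idx: int) -> torch.Tensor:
        # tokens: (B, N, D) features entering one (frozen) transformer layer.
        b = tokens.size(0)
        p = self.prompts[rate_idx].unsqueeze(0).expand(b, -1, -1)
        return torch.cat([p, tokens], dim=1)  # prompts influence attention and bits

prompts = RatePrompts(num_rates=8, num_prompts=4, dim=96)
x = torch.randn(2, 64, 96)
print(prompts(x, rate_idx=3).shape)  # torch.Size([2, 68, 96])
```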
HOMOE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts
paper_authors: Do Huu Dat, Po Yuan Mao, Tien Hoang Nguyen, Wray Buntine, Mohammed Bennamoun
for: This paper focuses on zero-shot learning, specifically the challenge of managing unfamiliar combinations of seen and unseen classes. It proposes a novel framework that combines the Modern Hopfield Network with a Mixture of Experts (HOMOE) to classify the compositions of previously unseen objects.
methods: The proposed framework uses the Modern Hopfield Network to create a memory that stores label prototypes and identifies relevant labels for a given input image. The Mixture of Experts then integrates the image with the fitting prototype to produce the final composition classification.
results: The approach achieves state-of-the-art (SOTA) performance on several benchmarks, including MIT-States and UT-Zappos. The paper also examines how each component contributes to improved generalization.
Abstract
Compositional Zero-Shot Learning (CZSL) has emerged as an essential paradigm in machine learning, aiming to overcome the constraints of traditional zero-shot learning by incorporating compositional thinking into its methodology. Conventional zero-shot learning has difficulty managing unfamiliar combinations of seen and unseen classes because it depends on pre-defined class embeddings. In contrast, Compositional Zero-Shot Learning uses the inherent hierarchies and structural connections among classes, creating new class representations by combining attributes, components, or other semantic elements. In our paper, we propose a novel framework that, for the first time, combines the Modern Hopfield Network with a Mixture of Experts (HOMOE) to classify the compositions of previously unseen objects. Specifically, the Modern Hopfield Network creates a memory that stores label prototypes and identifies relevant labels for a given input image. Following this, the Mixture of Experts model integrates the image with the fitting prototype to produce the final composition classification. Our approach achieves SOTA performance on several benchmarks, including MIT-States and UT-Zappos. We also examine how each component contributes to improved generalization.
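The retrieval step of a Modern Hopfield Network has a compact closed form: the query is repeatedly replaced by a softmax-weighted combination of the stored patterns. A minimal toy sketch of this update (our own example, not the paper's code):

```python
import torch

def hopfield_retrieve(query: torch.Tensor, memory: torch.Tensor,
                      beta: float = 8.0, steps: int = 3) -> torch.Tensor:
    """Modern (continuous) Hopfield update: xi <- M^T softmax(beta * M xi).
    query: (D,); memory: (M, D), rows are stored label prototypes."""
    xi = query
    for _ in range(steps):
        attn = torch.softmax(beta * memory @ xi, dim=0)  # (M,) pattern weights
        xi = memory.t() @ attn                           # retrieved pattern
    return xi

memory = torch.randn(10, 64)                 # e.g. 10 label prototypes
noisy = memory[3] + 0.3 * torch.randn(64)    # corrupted query
retrieved = hopfield_retrieve(noisy, memory)
print(torch.cosine_similarity(retrieved, memory[3], dim=0))  # close to 1
```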
results: The method achieves a balance between conforming to the target attribute and preserving the identity of the source content, and enables sampling of targets in diverse parameter spaces.
Abstract
We introduce Posterior Distillation Sampling (PDS), a novel optimization method for parametric image editing based on diffusion models. Existing optimization-based methods, which leverage the powerful 2D prior of diffusion models to handle various parametric images, have mainly focused on generation. Unlike generation, editing requires a balance between conforming to the target attribute and preserving the identity of the source content. Recent 2D image editing methods have achieved this balance by leveraging the stochastic latent encoded in the generative process of diffusion models. To extend the editing capabilities of diffusion models shown in pixel space to parameter space, we reformulate the 2D image editing method into an optimization form named PDS. PDS matches the stochastic latents of the source and the target, enabling the sampling of targets in diverse parameter spaces that align with a desired attribute while maintaining the source's identity. We demonstrate that this optimization resembles running a generative process with the target attribute, but aligning this process with the trajectory of the source's generative process. Extensive editing results in Neural Radiance Fields and Scalable Vector Graphics representations demonstrate that PDS is capable of sampling targets to fulfill the aforementioned balance across various parameter spaces.
Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception
results: The study shows that introducing uncertainty improves active recognition performance, and proposes a sequential evidence-gathering process based on evidence combination theory that provides step-wise uncertainty quantification and reliable prediction.
Abstract
Active recognition enables robots to intelligently explore novel observations, thereby acquiring more information while circumventing undesired viewing conditions. Recent approaches favor learning policies from simulated or collected data, wherein appropriate actions are more frequently selected when the recognition is accurate. However, most recognition modules are developed under the closed-world assumption, which makes them ill-equipped to handle unexpected inputs, such as the absence of the target object in the current observation. To address this issue, we propose treating active recognition as a sequential evidence-gathering process, providing step-wise uncertainty quantification and reliable prediction under evidence combination theory. Additionally, the reward function developed in this paper effectively characterizes the merit of actions when operating in open-world environments. To evaluate the performance, we collect a dataset from an indoor simulator, encompassing various recognition challenges such as distance, occlusion levels, and visibility. Through a series of experiments on recognition and robustness analysis, we demonstrate the necessity of introducing uncertainties to active recognition and the superior performance of the proposed method.
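One standard way to obtain step-wise uncertainty from accumulated evidence, in the spirit of evidence combination theory, is subjective logic over a Dirichlet distribution: per-class evidence yields belief masses plus an explicit uncertainty (vacuity) mass that shrinks as informative views arrive. The sketch below is a generic illustration with naive additive fusion; the paper's exact combination rule may differ.

```python
import numpy as np

def dirichlet_uncertainty(evidence: np.ndarray):
    """Subjective-logic quantities from non-negative per-class evidence
    (e.g. the output of a ReLU/softplus recognition head)."""
    K = evidence.shape[0]
    alpha = evidence + 1.0      # Dirichlet concentration parameters
    S = alpha.sum()
    belief = evidence / S       # per-class belief mass
    u = K / S                   # vacuity: uncertainty mass
    prob = alpha / S            # expected class probabilities
    return belief, u, prob

def combine_views(ev_a: np.ndarray, ev_b: np.ndarray) -> np.ndarray:
    """Naive sequential fusion: simply accumulate evidence across views."""
    return ev_a + ev_b

_, u, _ = dirichlet_uncertainty(np.array([0.1, 0.2, 0.1]))
print(u)  # high: little evidence gathered so far
fused = combine_views(np.array([0.1, 0.2, 0.1]), np.array([9.0, 0.5, 0.2]))
_, u, _ = dirichlet_uncertainty(fused)
print(u)  # drops after an informative view
```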
All in One: RGB, RGB-D, and RGB-T Salient Object Detection
results: The AiOSOD model achieves high-speed (780 FPS) and efficient (6.25M parameters) salient object detection on RGB, RGB-D, and RGB-T datasets, with excellent detection performance across data types.
Abstract
Salient object detection (SOD) aims to identify the most attractive objects within an image. Depending on the type of data being detected, SOD can be categorized into various forms, including RGB, RGB-D (Depth), RGB-T (Thermal), and light field SOD. Previous research has focused on saliency detection for individual data types. If an RGB-D SOD model is forced to detect RGB-T data, it will perform poorly. We propose an innovative model framework that provides a unified solution for the salient object detection task of three types of data (RGB, RGB-D, and RGB-T). The three types of data can be handled in one model (all in one) with the same weight parameters. In this framework, the three types of data are concatenated in an ordered manner within a single input batch, and features are extracted using a transformer network. Based on this framework, we propose an efficient lightweight SOD model, namely AiOSOD, which can detect any RGB, RGB-D, and RGB-T data with high speed (780FPS for RGB data, 485FPS for RGB-D or RGB-T data). Notably, with only 6.25M parameters, AiOSOD achieves excellent performance on RGB, RGB-D, and RGB-T datasets.
Dynamic Compositional Graph Convolutional Network for Efficient Composite Human Motion Prediction
for: The paper addresses the task of human motion prediction, specifically the challenge of predicting composite actions.
methods: The paper proposes a Composite Action Generation (CAG) module to generate synthetic composite actions for training, and a Dynamic Compositional Graph Convolutional Network (DC-GCN) to handle the composite actions.
results: The paper achieves state-of-the-art motion prediction accuracies on the Human3.6M dataset and the newly collected CHAMP dataset, with few extra computational costs compared to traditional GCN-based human motion methods.
Abstract
With potential applications in fields including intelligent surveillance and human-robot interaction, the human motion prediction task has become a hot research topic and has achieved high success, especially using the recent Graph Convolutional Network (GCN). Current human motion prediction usually focuses on predicting human motions for atomic actions. Observing that atomic actions can happen at the same time, thus forming composite actions, we propose the composite human motion prediction task. To handle this task, we first present a Composite Action Generation (CAG) module to generate synthetic composite actions for training, thus avoiding the laborious work of collecting composite action samples. Moreover, we alleviate the need for a more complicated model to handle composite actions by presenting a Dynamic Compositional Graph Convolutional Network (DC-GCN). Extensive experiments on the Human3.6M dataset and our newly collected CHAMP dataset consistently verify the efficiency of our DC-GCN method, which achieves state-of-the-art motion prediction accuracies while needing few extra computational costs compared with traditional GCN-based human motion methods.
Detection and Identification Accuracy of PCA-Accelerated Real-Time Processing of Hyperspectral Imagery
results: The number of principal components can be reduced by a substantial amount without noticeably affecting detection rates, which enables faster processing.
Abstract
Real-time or near real-time hyperspectral detection and identification are extremely useful and needed in many fields. These data sets can be quite large, and the algorithms can require numerous computations that slow the process down. A common way of speeding up the process is to use principal component analysis (PCA) for dimension reduction. In the reduced dimensional space, provided by a subset of the principal components, fewer computations are needed to process the data, resulting in a faster run time. In this paper, we propose a way to further decrease the time required to use PCA by investigating how many principal components may be omitted with minimal impact on the detection rate. Using ACE for detection, and probability and spectral fit for identification, we find that the number of principal components can be reduced by a substantial amount before seeing a noticeable change in detection rates.
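For reference, below is a compact sketch of the standard ACE statistic combined with PCA dimension reduction, keeping only k of the principal components before detection. The data here are synthetic placeholders, and the paper's probability and spectral-fit identification steps are not reproduced.

```python
import numpy as np

def ace_scores(X: np.ndarray, s: np.ndarray, cov_inv: np.ndarray) -> np.ndarray:
    """Adaptive Cosine/Coherence Estimator,
    (s' C^-1 x)^2 / ((s' C^-1 s)(x' C^-1 x)),
    for each (already centered) spectrum in X: (N, D); target signature s: (D,)."""
    num = (X @ cov_inv @ s) ** 2
    den = (s @ cov_inv @ s) * np.einsum('nd,dk,nk->n', X, cov_inv, X)
    return num / den

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 100))      # hyperspectral cube flattened to (pixels, bands)
target = rng.normal(size=100)

# PCA: project onto the top-k principal components (k << number of bands).
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
k = 10
P = Vt[:k].T                          # (bands, k) projection matrix
Xk, tk = (X - mean) @ P, (target - mean) @ P
cov_inv = np.linalg.inv(np.cov(Xk, rowvar=False))

scores = ace_scores(Xk, tk, cov_inv)  # detection now runs in 10-D, not 100-D
print(scores.shape, float(scores.max()))
```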
GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence
results: Our method improves the accuracy and efficiency of pose estimation by combining geometric and semantic features, while requiring significantly less training data; we demonstrate this through a rich evaluation.
Abstract
Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics. Recently, deep-learning-based approaches have made great progress, but are typically hindered by the need for large datasets of either pose-labelled real images or carefully tuned photorealistic simulators. This can be avoided by using only geometry inputs such as depth images to reduce the domain gap, but these approaches suffer from a lack of semantic information, which can be vital in the pose estimation problem. To resolve this conflict, we propose to utilize both geometric and semantic features obtained from a pre-trained foundation model. Our approach projects 2D features from this foundation model into 3D for a single object model per category, and then performs matching against this for new single-view observations of unseen object instances with a trained matching network. This requires significantly less data to train than prior methods since the semantic features are robust to object texture and appearance. We demonstrate this with a rich evaluation, showing improved performance over prior methods with a fraction of the data required.
paper_authors: Shivam Gupta, Aditya Parulekar, Eric Price, Zhiyang Xun
for: This paper focuses on improving the efficiency of score-based diffusion models for deep generative modeling of images.
methods: The paper uses a score-matching objective to estimate the score function, which is then used for sampling. The authors show that estimating the score in $L^2$ requires a polynomial dependence on the data radius and desired Wasserstein accuracy, but that a polylogarithmic number of samples suffices for sampling.
results: The paper shows that with a polylogarithmic number of samples, the ERM of the score-matching objective is $L^2$ accurate on all but a probability $\delta$ fraction of the true distribution, and that this weaker guarantee is sufficient for efficient sampling.
Abstract
Score-based diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. Recently, a number of theoretical works \citep{chen2022, Chen2022ImprovedAO, Chenetal23flowode, benton2023linear} have shown that diffusion models can efficiently sample, assuming $L^2$-accurate score estimates. The score-matching objective naturally approximates the true score in $L^2$, but the sample complexity of existing bounds depends \emph{polynomially} on the data radius and desired Wasserstein accuracy. By contrast, the time complexity of sampling is only logarithmic in these parameters. We show that estimating the score in $L^2$ \emph{requires} this polynomial dependence, but that a number of samples that scales polylogarithmically in the Wasserstein accuracy actually do suffice for sampling. We show that with a polylogarithmic number of samples, the ERM of the score-matching objective is $L^2$ accurate on all but a probability $\delta$ fraction of the true distribution, and that this weaker guarantee is sufficient for efficient sampling.
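For orientation, the denoising form of the score-matching objective under discussion can be written (in generic notation, not copied from the paper) as

\[
J(\theta) \;=\; \mathbb{E}_{t}\,\mathbb{E}_{x_0 \sim p_{\mathrm{data}}}\,\mathbb{E}_{x_t \sim p_t(\cdot \mid x_0)}\!\left[\,\big\|\, s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \,\big\|_2^2\,\right],
\]

whose minimizer approximates the true score $\nabla_{x_t}\log p_t(x_t)$ in $L^2$. The paper's point is that the ERM of this objective, fit on only polylogarithmically many samples, is $L^2$-accurate on all but a $\delta$ fraction of the distribution, and that this weaker guarantee already suffices for sampling.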
for: The paper aims to improve the traditional expert-based approach to identifying and evaluating project risks in large infrastructure projects, and to provide a data-driven framework for risk management.
methods: The paper uses historical data and artificial intelligence techniques to automatically identify risks and evaluate the quality of early risk registers and risk assessments.
results: The study examines the evolution of risks over time and compares the effectiveness of risk identification and assessment in the initial phase versus project execution. The results provide insights into how project teams can improve their risk management practices and enhance the success of large infrastructure projects.
Abstract
Managing project risk is a key part of the successful implementation of any large project and is widely recognized as a best practice for public agencies to deliver infrastructure. The conventional method of identifying and evaluating project risks involves getting input from subject matter experts at risk workshops in the early phases of a project. As a project moves through its life cycle, these identified risks and their assessments evolve. Some risks are realized and become issues, some are mitigated, and some are retired as no longer important. Despite the value provided by conventional expert-based approaches, several challenges remain due to the time-consuming and expensive processes involved. Moreover, little is known about how risks evolve from ex-ante to ex-post over time. How well does the project team identify and evaluate risks in the initial phase compared to what happens during project execution? Using historical data and artificial intelligence techniques, this study addresses these limitations by introducing a data-driven framework to identify risks automatically and to examine the quality of early risk registers and risk assessments. Risk registers from more than 70 U.S. major transportation projects form the input dataset.
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024
paper_authors: Benjamin Kiefer, Lojze Žust, Matej Kristan, Janez Perš, Matija Teršek, Arnold Wiliem, Martin Messmer, Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Heng-Cheng Kuo, Jie Mei, Jenq-Neng Hwang, Daniel Stadler, Lars Sommer, Kaer Huang, Aiguo Zheng, Weitu Chong, Kanokphan Lertniphonphan, Jun Xie, Feng Chen, Jian Li, Zhepeng Wang, Luca Zedda, Andrea Loddo, Cecilia Di Ruberto, Tuan-Anh Vu, Hai Nguyen-Truong, Tan-Sang Ha, Quan-Dung Pham, Sai-Kit Yeung, Yuan Feng, Nguyen Thanh Thien, Lixin Tian, Sheng-Yao Kuan, Yuan-Hao Ho, Angel Bueno Rodriguez, Borja Carrillo-Perez, Alexander Klein, Antje Alex, Yannik Steiniger, Felix Sattler, Edgardo Solano-Carrillo, Matej Fabijanić, Magdalena Šumunec, Nadir Kapetanović, Andreas Michel, Wolfgang Gross, Martin Weinmann
results: The report offers a comprehensive overview of the findings from the challenges, providing statistical and qualitative analyses and evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are publicly available at https://macvi.org/workshop/macvi24.
Abstract
The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection category features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.
Appearance-based gaze estimation enhanced with synthetic images using deep neural networks
results: Using the MetaHuman tool, a large synthetic dataset of more than 57,000 human faces was generated and made publicly available. Including this dataset alongside the standard Columbia Gaze dataset during training reduced the mean average error of gaze estimation to below two degrees in the eye pitch and yaw directions, comparing favourably with related methods. The model's feasibility was also verified by preliminary testing in a real-world setting using the built-in 4K camera of the NICO semi-humanoid robot.
Abstract
Human eye gaze estimation is an important cognitive ingredient for successful human-robot interaction, enabling the robot to read and predict human behavior. We approach this problem using artificial neural networks and build a modular system estimating gaze from separately cropped eyes, taking advantage of existing well-functioning components for face detection (RetinaFace) and head pose estimation (6DRepNet). Our proposed method does not require any special hardware or infrared filters but uses a standard notebook built-in RGB camera, as is common for appearance-based methods. Using the MetaHuman tool, we also generated a large synthetic dataset of more than 57,000 human faces and made it publicly available. Including this dataset (with eye gaze and head pose information) on top of the standard Columbia Gaze dataset in training the model led to better accuracy, with a mean average error below two degrees in the eye pitch and yaw directions, which compares favourably to related methods. We also verified the feasibility of our model by preliminary testing in a real-world setting using the built-in 4K camera in the NICO semi-humanoid robot's eye.
Evaluating GPT-4’s Vision Capabilities on Brazilian University Admission Exams
results: The study shows that GPT-4 achieves human-comparable performance on complex multidisciplinary questions, and that text captions transcribing visual content outperform the direct use of images. However, mathematical questions remain a challenge for these state-of-the-art models. The code and data used in the experiments are available at https://github.com/piresramon/gpt-4-enem.
Abstract
Recent advancements in language models have showcased human-comparable performance in academic entrance exams. However, existing studies often overlook questions that require the integration of visual comprehension, thus compromising the full spectrum and complexity inherent in real-world scenarios. To address this gap, we present a comprehensive framework to evaluate language models on entrance exams, which incorporates both textual and visual elements. We evaluate the two most recent editions of Exame Nacional do Ensino Médio (ENEM), the main standardized entrance examination adopted by Brazilian universities. Our study not only reaffirms the capabilities of GPT-4 as the state of the art for handling complex multidisciplinary questions, but also pioneers in offering a realistic assessment of multimodal language models on Portuguese examinations. One of the highlights is that text captions transcribing visual content outperform the direct use of images, suggesting that the vision model has room for improvement. Yet, despite improvements afforded by images or captions, mathematical questions remain a challenge for these state-of-the-art models. The code and data used in the experiments are available at https://github.com/piresramon/gpt-4-enem.
Variational Annealing on Graphs for Combinatorial Optimization
paper_authors: Sebastian Sanokowski, Wilhelm Berghammer, Sepp Hochreiter, Sebastian Lehner
for: Solving combinatorial optimization (CO) problems with probabilistic approaches, which face performance limitations on difficult problem instances.
methods: Introduces subgraph tokenization, which represents the configuration of a set of solution variables with a single token, alleviating the drawback of the long sequential sampling procedure, and uses annealed entropy regularization to ensure efficient and stable learning.
results: Demonstrates superior performance on many popular CO problems compared to traditional probabilistic approaches, and provides theoretical motivation for the annealed entropy regularization.
Abstract
Several recent unsupervised learning methods use probabilistic approaches to solve combinatorial optimization (CO) problems based on the assumption of statistically independent solution variables. We demonstrate that this assumption imposes performance limitations in particular on difficult problem instances. Our results corroborate that an autoregressive approach which captures statistical dependencies among solution variables yields superior performance on many popular CO problems. We introduce subgraph tokenization in which the configuration of a set of solution variables is represented by a single token. This tokenization technique alleviates the drawback of the long sequential sampling procedure which is inherent to autoregressive methods without sacrificing expressivity. Importantly, we theoretically motivate an annealed entropy regularization and show empirically that it is essential for efficient and stable learning.
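To illustrate the annealing idea, here is a minimal score-function (REINFORCE-style) sketch of optimizing an annealed free energy F(theta) = E[cost] - T * H(p_theta), with the temperature T decayed over training. The surrogate-loss construction, baseline choice, and linear schedule are our own assumptions, not the paper's exact formulation.

```python
import torch

def annealed_surrogate(log_probs: torch.Tensor, costs: torch.Tensor,
                       T: float) -> torch.Tensor:
    """Surrogate loss whose gradient is a score-function estimate of
    grad F(theta), with F = E[cost] - T * H(p_theta) = E[cost + T * log p].
    log_probs: (B,) log p_theta(x_i) of sampled solutions (requires grad)
    costs:     (B,) CO objective values of those solutions
    T:         current temperature of the annealing schedule."""
    f = costs + T * log_probs.detach()      # per-sample free-energy term
    baseline = f.mean()                     # variance-reduction baseline
    return ((f - baseline) * log_probs).mean()

def temperature(step: int, total: int, T0: float = 1.0) -> float:
    """A simple linear schedule from T0 down to 0 (one common choice)."""
    return T0 * max(0.0, 1.0 - step / total)

# Inside a training loop (a model producing samples/log-probs is assumed):
#   loss = annealed_surrogate(log_probs, costs, temperature(step, total_steps))
#   loss.backward()
```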
Tube-NeRF: Efficient Imitation Learning of Visuomotor Policies from MPC using Tube-Guided Data Augmentation and NeRFs
results: The study shows that combining the Tube-NeRF data augmentation strategy with robust MPC efficiently trains visuomotor policies for multirotors, achieving accurate localization and tracking in real-world settings even under large disturbances.
Abstract
Imitation learning (IL) can train computationally-efficient sensorimotor policies from a resource-intensive Model Predictive Controller (MPC), but it often requires many samples, leading to long training times or limited robustness. To address these issues, we combine IL with a variant of robust MPC that accounts for process and sensing uncertainties, and we design a data augmentation (DA) strategy that enables efficient learning of vision-based policies. The proposed DA method, named Tube-NeRF, leverages Neural Radiance Fields (NeRFs) to generate novel synthetic images, and uses properties of the robust MPC (the tube) to select relevant views and to efficiently compute the corresponding actions. We tailor our approach to the task of localization and trajectory tracking on a multirotor, by learning a visuomotor policy that generates control actions using images from the onboard camera as only source of horizontal position. Our evaluations numerically demonstrate learning of a robust visuomotor policy with an 80-fold increase in demonstration efficiency and a 50% reduction in training time over current IL methods. Additionally, our policies successfully transfer to a real multirotor, achieving accurate localization and low tracking errors despite large disturbances, with an onboard inference time of only 1.5 ms.
Machine Learning For An Explainable Cost Prediction of Medical Insurance
results: All models performed well, but the XGBoost model achieved the best overall performance, albeit at a higher computational cost, while the RF model recorded a lower prediction error and consumed far fewer computing resources. Comparing the two XAI methods for identifying the key determinant features behind premium prices, the ICE plots showed the interactions between variables in more detail, whereas the SHAP analysis was more high-level.
Abstract
Predictive modeling in healthcare continues to be an active actuarial research topic as more insurance companies aim to maximize the potential of machine learning approaches to increase their productivity and efficiency. In this paper, the authors deployed three regression-based ensemble ML models that combine variations of decision trees through the Extreme Gradient Boosting, Gradient-Boosting Machine, and Random Forest methods to predict medical insurance costs. The explainable artificial intelligence methods SHapley Additive exPlanations (SHAP) and Individual Conditional Expectation (ICE) plots were deployed to discover and explain the key determinant factors that influence medical insurance premium prices in the dataset. The dataset comprises 986 records and is publicly available in the KAGGLE repository. The models were evaluated using four performance evaluation metrics: R-squared, Mean Absolute Error, Root Mean Squared Error, and Mean Absolute Percentage Error. The results show that all models produced impressive outcomes; however, the XGBoost model achieved a better overall performance, although it also consumed more computational resources, while the RF model recorded a lower prediction error and consumed far fewer computing resources than the XGBoost model. Furthermore, we compared the outcomes of both XAI methods in identifying the key determinant features that influenced the premium prices for each model. Whereas both XAI methods produced similar outcomes, the ICE plots showed the interactions between variables in more detail than the SHAP analysis, which was more high-level. The authors intend for the contributions of this study to help policymakers, insurers, and potential medical insurance buyers in their decision-making process when selecting the right policies that meet their specific needs.
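Below is a minimal sketch of the modeling-plus-explanation pipeline described above, using the public KAGGLE insurance data. The file path, column names, and hyperparameters are assumptions; the paper's tuned settings are not reproduced.

```python
import pandas as pd
import shap
import xgboost as xgb
from sklearn.inspection import PartialDependenceDisplay
from sklearn.model_selection import train_test_split

# Typical columns: age, sex, bmi, children, smoker, region, charges.
df = pd.read_csv("insurance.csv")                      # path is an assumption
X = pd.get_dummies(df.drop(columns=["charges"]), drop_first=True)
y = df["charges"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("R^2:", model.score(X_te, y_te))

# SHAP: global view of the drivers of predicted premiums.
explainer = shap.TreeExplainer(model)
shap.summary_plot(explainer.shap_values(X_te), X_te)

# ICE: per-record interaction detail for a single feature.
PartialDependenceDisplay.from_estimator(model, X_te, ["bmi"], kind="individual")
```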
Byzantine Robustness and Partial Participation Can Be Achieved Simultaneously: Just Clip Gradient Differences
results: Under quite general assumptions, the proposed method achieves convergence rates that match existing state-of-the-art (SOTA) theoretical results.
Abstract
Distributed learning has emerged as a leading paradigm for training large machine learning models. However, in real-world scenarios, participants may be unreliable or malicious, posing a significant challenge to the integrity and accuracy of the trained models. Byzantine fault tolerance mechanisms have been proposed to address these issues, but they often assume full participation from all clients, which is not always practical due to the unavailability of some clients or communication constraints. In our work, we propose the first distributed method with client sampling and provable tolerance to Byzantine workers. The key idea behind the developed method is the use of gradient clipping to control stochastic gradient differences in recursive variance reduction. This allows us to bound the potential harm caused by Byzantine workers, even during iterations when all sampled clients are Byzantine. Furthermore, we incorporate communication compression into the method to enhance communication efficiency. Under quite general assumptions, we prove convergence rates for the proposed method that match the existing state-of-the-art (SOTA) theoretical results.
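A toy sketch of the core idea follows: clip each client's stochastic gradient *difference* before robust aggregation into a recursive variance-reduction estimator, so Byzantine clients can shift the update by at most a bounded amount, even in iterations where every sampled client is Byzantine. The aggregator, clipping radius, and toy problem are our own assumptions, not the paper's exact algorithm.

```python
import numpy as np

def clip_by_norm(v: np.ndarray, tau: float) -> np.ndarray:
    n = np.linalg.norm(v)
    return v if n <= tau else v * (tau / n)

def robust_vr_step(x, x_prev, g_prev, clients, grad_fn, agg, tau, lr):
    """One iteration: g <- g_prev + agg(clip(grad_i(x) - grad_i(x_prev)))."""
    diffs = [clip_by_norm(grad_fn(i, x) - grad_fn(i, x_prev), tau) for i in clients]
    g = g_prev + agg(diffs)
    return x - lr * g, g

# Toy problem: f_i(x) = 0.5 * ||x - b_i||^2, with client 0 sending junk.
rng = np.random.default_rng(1)
b = rng.normal(size=(10, 5))
grad_fn = lambda i, x: -100.0 * (x - b[i]) if i == 0 else x - b[i]
agg = lambda msgs: np.median(np.stack(msgs), axis=0)   # coordinate-wise median

x = x_prev = np.ones(5)
g = np.mean([grad_fn(i, x) for i in range(1, 10)], axis=0)  # clean warm start assumed
for _ in range(300):
    x_new, g = robust_vr_step(x, x_prev, g, rng.choice(10, 4, replace=False),
                              grad_fn, agg, tau=1.0, lr=0.1)
    x_prev, x = x, x_new
print(np.linalg.norm(x - b[1:].mean(axis=0)))  # stays near the honest optimum
```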
Towards Auditing Large Language Models: Improving Text-based Stereotype Detection
results: Experiments show that training the stereotype classifier in a multi-class setting outperforms the one-vs-all binary counterpart, and that the classifier can be used to detect and assess stereotypical bias in large language models.
Abstract
Large Language Models (LLM) have made significant advances in the recent past becoming more mainstream in Artificial Intelligence (AI) enabled human-facing applications. However, LLMs often generate stereotypical output inherited from historical data, amplifying societal biases and raising ethical concerns. This work introduces i) the Multi-Grain Stereotype Dataset, which includes 52,751 instances of gender, race, profession and religion stereotypic text and ii) a novel stereotype classifier for English text. We design several experiments to rigorously test the proposed model trained on the novel dataset. Our experiments show that training the model in a multi-class setting can outperform the one-vs-all binary counterpart. Consistent feature importance signals from different eXplainable AI tools demonstrate that the new model exploits relevant text features. We utilise the newly created model to assess the stereotypic behaviour of the popular GPT family of models and observe the reduction of bias over time. In summary, our work establishes a robust and practical framework for auditing and evaluating the stereotypic bias in LLM.
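As an illustration of the multi-class setup (as opposed to one-vs-all binary classifiers), a text stereotype classifier can be fine-tuned with a standard sequence-classification head. The label set and base checkpoint below are assumptions, and the head is randomly initialized until fine-tuned on the stereotype dataset.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical label set; the dataset's exact label scheme may differ.
labels = ["no_stereotype", "gender", "race", "profession", "religion"]
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(labels))

texts = ["Women are bad drivers.", "The sky is blue today."]
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = model(**batch).logits.softmax(-1)
print([labels[i] for i in probs.argmax(-1)])  # meaningful only after fine-tuning
```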
methods: The paper designs a new set of debate protocols in which the honest strategy can always succeed using a simulation of a polynomial number of steps, while still being able to verify the alignment of stochastic AI systems.
results: The paper shows that under these new debate protocols, the honest strategy succeeds against stochastic AI systems even when the dishonest strategy is allowed to use exponentially many simulation steps.
Abstract
The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly. Irving et al. [2018] proposed a debate method in this direction with the goal of pitting the power of such AI models against each other until the problem of identifying (mis)-alignment is broken down into a manageable subtask. While the promise of this approach is clear, the original framework was based on the assumption that the honest strategy is able to simulate deterministic AI systems for an exponential number of steps, limiting its applicability. In this paper, we show how to address these challenges by designing a new set of debate protocols where the honest strategy can always succeed using a simulation of a polynomial number of steps, whilst being able to verify the alignment of stochastic AI systems, even when the dishonest strategy is allowed to use exponentially many simulation steps.
A density estimation perspective on learning from pairwise human preferences
results: The study finds that, for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preference feedback effectively models the annotator's implicit preference distribution; however, under annotator misspecification, where wrong modeling assumptions are made about annotator behavior, the approach can run into trouble.
Abstract
Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation which centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference distribution. Finally, we discuss and present findings on "annotator misspecification" -- failure cases where wrong modeling assumptions are made about annotator behavior, resulting in poorly-adapted models -- suggesting that approaches that learn from pairwise human preferences could have trouble learning from a population of annotators with diverse viewpoints.
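The standard generative model behind pairwise-preference reward learning is Bradley-Terry: p(a preferred over b) = sigmoid(r(a) - r(b)). Under the density-estimation reading above, minimizing the preference negative log-likelihood fits the annotator's implicit preference distribution. A toy sketch with synthetic data (our own, not the paper's code):

```python
import torch
import torch.nn.functional as F

def preference_nll(reward_model, x_win, x_lose):
    """Negative log-likelihood of pairwise preferences under Bradley-Terry."""
    return -F.logsigmoid(reward_model(x_win) - reward_model(x_lose)).mean()

reward_model = torch.nn.Linear(4, 1)          # toy reward over 4-D features
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-2)
x_win = torch.randn(64, 4) + 0.5              # "preferred" items, shifted up
x_lose = torch.randn(64, 4)
for _ in range(200):
    opt.zero_grad()
    loss = preference_nll(reward_model, x_win, x_lose)
    loss.backward()
    opt.step()
print(float(loss))                            # decreases as the fit improves
```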
When is Off-Policy Evaluation Useful? A Data-Centric Perspective
results: DataCOPE can evaluate the performance of both machine-learning and human expert policies, such as clinical guidelines. We demonstrate its capabilities through an empirical analysis of healthcare data in logged contextual bandit settings.
Abstract
Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging. On the one hand, it brings opportunities for safe policy improvement under high-stakes scenarios like clinical guidelines. On the other hand, such opportunities raise a need for precise off-policy evaluation (OPE). While previous work on OPE focused on improving the algorithm in value estimation, in this work, we emphasize the importance of the offline dataset, hence putting forward a data-centric framework for evaluating OPE problems. We propose DataCOPE, a data-centric framework for evaluating OPE, that answers the questions of whether and to what extent we can evaluate a target policy given a dataset. DataCOPE (1) forecasts the overall performance of OPE algorithms without access to the environment, which is especially useful before real-world deployment where evaluating OPE is impossible; (2) identifies the sub-group in the dataset where OPE can be inaccurate; (3) permits evaluations of datasets or data-collection strategies for OPE problems. Our empirical analysis of DataCOPE in the logged contextual bandit settings using healthcare datasets confirms its ability to evaluate both machine-learning and human expert policies like clinical guidelines.
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
results: Experiments across multiple benchmarks show that MC-CoT significantly improves multimodal reasoning performance, enabling even smaller base models to achieve results comparable to those of larger models.
Abstract
Multimodal reasoning is a challenging task that requires models to reason across multiple modalities to answer questions. Existing approaches have made progress by incorporating language and visual modalities into a two-stage reasoning framework, separating rationale generation from answer inference. However, these approaches often fall short due to the inadequate quality of the generated rationales. In this work, we delve into the importance of rationales in model reasoning. We observe that when rationales are completely accurate, the model's accuracy significantly improves, highlighting the need for high-quality rationale generation. Motivated by this, we propose MC-CoT, a self-consistency training strategy that generates multiple rationales and answers, subsequently selecting the most accurate through a voting process. This approach not only enhances the quality of generated rationales but also leads to more accurate and robust answers. Through extensive experiments, we demonstrate that our approach significantly improves model performance across various benchmarks. Remarkably, we show that even smaller base models, when equipped with our proposed approach, can achieve results comparable to those of larger models, illustrating the potential of our approach in harnessing the power of rationales for improved multimodal reasoning. The code is available at https://github.com/chengtan9907/mc-cot.
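At inference time, the self-consistency component reduces to sampling several rationale-answer chains and taking a majority vote. A minimal sketch (the model interface and toy stand-in are assumptions; MC-CoT's training-time use of voting is not shown):

```python
import random
from collections import Counter

def self_consistent_answer(model, question, n_samples=8, temperature=0.7):
    """Sample several (rationale, answer) chains and return the majority
    answer with its vote share. `model(question, temperature)` is assumed
    to return a (rationale, answer) tuple."""
    answers = [model(question, temperature)[1] for _ in range(n_samples)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n_samples

# Toy stand-in: noisy but biased toward the correct answer "B".
toy = lambda q, t: ("because ...", random.choices("ABC", weights=[2, 6, 2])[0])
print(self_consistent_answer(toy, "Which option is correct?"))
```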
results: 研究发现,GPT-4、3.5和3等模型具有英语和 протестан领教派国家的文化价值。 however, the mitigation strategy only reduces cultural bias in recent models for some countries/territories, but not for all.I hope this helps! Let me know if you have any other questions.Abstract
Culture fundamentally shapes people's reasoning, behavior, and communication. Generative artificial intelligence (AI) technologies may cause a shift towards a dominant culture. As people increasingly use AI to expedite and even automate various professional and personal tasks, cultural values embedded in AI models may bias authentic expression. We audit large language models for cultural bias, comparing their responses to nationally representative survey data, and evaluate country-specific prompting as a mitigation strategy. We find that GPT-4, 3.5 and 3 exhibit cultural values resembling English-speaking and Protestant European countries. Our mitigation strategy reduces cultural bias in recent models but not for all countries/territories. To avoid cultural bias in generative AI, especially in high-stakes contexts, we suggest using culture matching and ongoing cultural audits.
PortfolioMentor: Multimodal Generative AI Companion for Learning and Crafting Interactive Digital Art Portfolios
for: This paper is written for design students who struggle to translate their creative ideas into tangible codes and designs, with a focus on providing tailored resources and academic support for non-technical art students.
methods: The paper presents a coding companion chatbot called PortfolioMentor, which guides and collaborates with students through proactive suggestions and responsible Q&As for learning, inspiration, and support.
results: The system synthesizes the artist's visions, visual illustrations, audio or music suggestions, click-scroll effects, and creative vision conceptualization into a polished interactive digital portfolio.
Abstract
Digital art portfolios serve as impactful mediums for artists to convey their visions, weaving together visuals, audio, interactions, and narratives. However, without technical backgrounds, design students often find it challenging to translate creative ideas into tangible codes and designs, given the lack of tailored resources for the non-technical, academic support in art schools, and a comprehensive guiding tool throughout the mentally demanding process. Recognizing the role of companionship in code learning and leveraging generative AI models' capabilities in supporting creative tasks, we present PortfolioMentor, a coding companion chatbot for IDEs. This tool guides and collaborates with students through proactive suggestions and responsible Q&As for learning, inspiration, and support. In detail, the system starts with the understanding of the task and artist's visions, follows the co-creation of visual illustrations, audio or music suggestions and files, click-scroll effects for interactions, and creative vision conceptualization, and finally synthesizes these facets into a polished interactive digital portfolio.
AI-Generated Images Introduce Invisible Relevance Bias to Text-Image Retrieval
for: This paper investigates the issue of invisible relevance bias in cross-modal retrieval, particularly when AI-generated images are present in the search results.
methods: The authors construct a benchmark to explore the existence of the bias and conduct extensive experiments to reveal that AI-generated images introduce an invisible relevance bias to text-image retrieval models.
results: The study shows that text-image retrieval models tend to rank AI-generated images higher than real images, even though the AI-generated images do not exhibit more visually relevant features to the query. The inclusion of AI-generated images in the training data of the retrieval models exacerbates the invisible relevance bias.
Abstract
With the advancement of generation models, AI-generated content (AIGC) is becoming more realistic, flooding the Internet. A recent study suggests that this phenomenon has elevated the issue of source bias in text retrieval for web searches. Specifically, neural retrieval models tend to rank generated texts higher than human-written texts. In this paper, we extend the study of this bias to cross-modal retrieval. Firstly, we successfully construct a suitable benchmark to explore the existence of the bias. Subsequent extensive experiments on this benchmark reveal that AI-generated images introduce an invisible relevance bias to text-image retrieval models. Specifically, our experiments show that text-image retrieval models tend to rank the AI-generated images higher than the real images, even though the AI-generated images do not exhibit more visually relevant features to the query than real images. This invisible relevance bias is prevalent across retrieval models with varying training data and architectures. Furthermore, our subsequent exploration reveals that the inclusion of AI-generated images in the training data of the retrieval models exacerbates the invisible relevance bias. The above phenomenon triggers a vicious cycle, which makes the invisible relevance bias become more and more serious. To elucidate the potential causes of invisible relevance and address the aforementioned issues, we introduce an effective training method aimed at alleviating the invisible relevance bias. Subsequently, we apply our proposed debiasing method to retroactively identify the causes of invisible relevance, revealing that the AI-generated images induce the image encoder to embed additional information into their representation. This information exhibits a certain consistency across generated images with different semantics and can make the retriever estimate a higher relevance score.
results: Achieves metric scores on par with state-of-the-art approaches on the Salicon and MIT300 benchmarks.
Abstract
We present a novel approach for saliency prediction in images, leveraging parallel decoding in transformers to learn saliency solely from fixation maps. Models typically rely on continuous saliency maps to overcome the difficulty of optimizing for the discrete fixation map. We attempt to replicate the experimental setup that generates saliency datasets. Our approach treats saliency prediction as a direct set prediction problem, via a global loss that enforces unique fixation predictions through bipartite matching and a transformer encoder-decoder architecture. By utilizing a fixed set of learned fixation queries, the cross-attention reasons over the image features to directly output the fixation points, distinguishing it from other modern saliency predictors. Our approach, named Saliency TRansformer (SalTR), achieves metric scores on par with state-of-the-art approaches on the Salicon and MIT300 benchmarks.
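The unique-fixation constraint is enforced through bipartite matching, as in set-prediction detectors. A minimal sketch using the Hungarian algorithm with an L2 cost (the full training loss would also include classification/no-object terms, which we omit here):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fixation_matching_cost(pred: np.ndarray, gt: np.ndarray) -> float:
    """One-to-one match predicted fixation points to ground-truth fixations.
    pred: (P, 2) predicted (x, y); gt: (G, 2) ground-truth (x, y)."""
    cost = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (P, G)
    rows, cols = linear_sum_assignment(cost)   # optimal bipartite matching
    return cost[rows, cols].mean()

pred = np.array([[0.10, 0.20], [0.80, 0.90], [0.50, 0.50]])
gt = np.array([[0.82, 0.88], [0.12, 0.18]])
print(fixation_matching_cost(pred, gt))  # each GT fixation claimed by one prediction
```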
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
results: Despite using a lightweight architecture, our method is competitive with other single point-supervised alternatives, achieving 41.05%/27.62%/80.01% on the DOTA/DIOR/HRSC datasets.
Abstract
With the rapidly increasing demand for oriented object detection (OOD), recent research involving weakly-supervised detectors for learning rotated box (RBox) from the horizontal box (HBox) has attracted more and more attention. In this paper, we explore a more challenging yet label-efficient setting, namely single point-supervised OOD, and present our approach called Point2RBox. Specifically, we propose to leverage two principles: 1) Synthetic pattern knowledge combination: By sampling around each labelled point on the image, we transfer the object feature to synthetic visual patterns with the known bounding box to provide the knowledge for box regression. 2) Transform self-supervision: With a transformed input image (e.g. scaled/rotated), the output RBoxes are trained to follow the same transformation so that the network can perceive the relative size/rotation between objects. The detector is further enhanced by a few devised techniques to cope with peripheral issues, e.g. the anchor/layer assignment as the size of the object is not available in our point supervision setting. To our best knowledge, Point2RBox is the first end-to-end solution for point-supervised OOD. In particular, our method uses a lightweight paradigm, yet it achieves a competitive performance among point-supervised alternatives, 41.05%/27.62%/80.01% on DOTA/DIOR/HRSC datasets.
PointOBB: Learning Oriented Object Detection via Single Point Supervision
results: Experimental results on the DIOR-R and DOTA-v1.0 datasets show that PointOBB achieves promising performance and significantly outperforms potential single point-supervised baselines.
Abstract
Single point-supervised object detection is gaining attention due to its cost-effectiveness. However, existing approaches focus on generating horizontal bounding boxes (HBBs) while ignoring oriented bounding boxes (OBBs) commonly used for objects in aerial images. This paper proposes PointOBB, the first single Point-based OBB generation method, for oriented object detection. PointOBB operates through the collaborative utilization of three distinctive views: an original view, a resized view, and a rotated/flipped (rot/flp) view. Upon the original view, we leverage the resized and rot/flp views to build a scale augmentation module and an angle acquisition module, respectively. In the former module, a Scale-Sensitive Consistency (SSC) loss is designed to enhance the deep network's ability to perceive the object scale. For accurate object angle predictions, the latter module incorporates self-supervised learning to predict angles, which is associated with a scale-guided Dense-to-Sparse (DS) matching strategy for aggregating dense angles corresponding to sparse objects. The resized and rot/flp views are switched using a progressive multi-view switching strategy during training to achieve coupled optimization of scale and angle. Experimental results on the DIOR-R and DOTA-v1.0 datasets demonstrate that PointOBB achieves promising performance, and significantly outperforms potential point-supervised baselines.
Shortcut Bias Mitigation via Ensemble Diversity Using Diffusion Probabilistic Models
results: Generating images with novel feature combinations from DPMs increases ensemble diversity and improves generalization.
Abstract
Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to a phenomenon known as simplicity bias, where a model relies on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs) for shortcut bias mitigation. We show that at particular training intervals, DPMs can generate images with novel feature combinations, even when trained on images displaying correlated input features. We leverage this crucial property to generate synthetic counterfactuals to increase model diversity via ensemble disagreement. We show that DPM-guided diversification is sufficient to remove dependence on primary shortcut cues, without a need for additional supervised signals. We further empirically quantify its efficacy on several diversification objectives, and finally show improved generalization and diversification performance on par with prior work that relies on auxiliary data collection.
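The ensemble-disagreement idea can be sketched in a few lines: members are fit jointly on real data while a penalty rewards divergent predictions on DPM-generated counterfactuals. The loss form and weighting below are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def disagreement_loss(models, synthetic_x):
    """Reward divergent predictions on DPM-generated counterfactuals."""
    probs = [F.softmax(m(synthetic_x), dim=-1) for m in models]
    loss, pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            # Negative symmetric KL: lower loss means more disagreement.
            sym_kl = (F.kl_div(probs[i].log(), probs[j], reduction="batchmean")
                      + F.kl_div(probs[j].log(), probs[i], reduction="batchmean"))
            loss = loss - sym_kl
            pairs += 1
    return loss / max(pairs, 1)

def train_step(models, opt, x, y, synthetic_x, lam=0.1):
    task = sum(F.cross_entropy(m(x), y) for m in models)      # fit real data
    loss = task + lam * disagreement_loss(models, synthetic_x)
    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)
```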
paper_authors: Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Baoyuan Wu, Chun Yuan, Dacheng Tao
for: This paper aims to make Data-Free Meta-Learning (DFML) practical, i.e., learning new tasks quickly without access to the original training data.
methods: It proposes a robust DFML framework that stabilizes the task distribution by interpolating tasks within a compact task-memory buffer, and introduces an automated model-selection mechanism that parameterizes each pre-trained model's reliability as a learnable weight optimized with a policy-gradient algorithm during meta-training.
results: Experiments show that the framework effectively mitigates Task-Distribution Shift and Task-Distribution Corruption and improves DFML in practical settings.
Abstract
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. Existing inversion-based DFML methods construct pseudo tasks from a learnable dataset, which is inversely generated from the pre-trained model pool. For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC). TDS leads to a biased meta-learner because of the skewed task distribution towards newly generated tasks. TDC occurs when untrusted models characterized by misleading labels or poor quality pollute the task distribution. To tackle these issues, we introduce a robust DFML framework that ensures task distributional robustness. We propose to meta-learn from a pseudo task distribution, diversified through task interpolation within a compact task-memory buffer. This approach reduces the meta-learner's overreliance on newly generated tasks by maintaining consistent performance across a broader range of interpolated memory tasks, thus ensuring its generalization for unseen tasks. Additionally, our framework seamlessly incorporates an automated model selection mechanism into the meta-training phase, parameterizing each model's reliability as a learnable weight. This is optimized with a policy gradient algorithm inspired by reinforcement learning, effectively addressing the non-differentiable challenge posed by model selection. Comprehensive experiments across various datasets demonstrate the framework's effectiveness in mitigating TDS and TDC, underscoring its potential to improve DFML in real-world scenarios.
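The policy-gradient model selection admits a minimal sketch: treat each pre-trained model's reliability as a logit of a categorical policy and update it with REINFORCE, which sidesteps the non-differentiability of discrete selection. Names, pool size, and the reward choice below are assumptions for illustration, not the paper's code.

```python
import torch

num_models = 8                                    # size of the pre-trained pool
reliability = torch.zeros(num_models, requires_grad=True)
opt = torch.optim.Adam([reliability], lr=1e-2)

def select_and_update(evaluate_with_model):
    """One REINFORCE step on the learnable model-reliability weights."""
    policy = torch.distributions.Categorical(logits=reliability)
    idx = policy.sample()                         # discrete model selection
    reward = evaluate_with_model(idx.item())      # e.g. meta-validation accuracy
    loss = -policy.log_prob(idx) * reward         # policy-gradient surrogate
    opt.zero_grad(); loss.backward(); opt.step()
    return idx.item(), reward
```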
Towards Explainable Strategy Templates using NLP Transformers
for: bridges the gap between mathematical heuristic strategies learned from Deep Reinforcement Learning (DRL) and comprehensible, natural language explanations, making these strategies more accessible to non-experts.
methods: leverages traditional Natural Language Processing (NLP) techniques and Large Language Models (LLMs) equipped with Transformers to transform parts of DRL strategies into user-friendly, human-like English narratives.
results: presents a top-level algorithm that parses the mathematical expressions of strategy templates, semantically interprets their variables and structures, generates rule-based primary explanations, and uses a Generative Pre-trained Transformer (GPT) model to refine and contextualize them; an example with audience-specific customization and validation illustrates the approach's applicability and potential.
Abstract
This paper bridges the gap between mathematical heuristic strategies learned from Deep Reinforcement Learning (DRL) in automated agent negotiation, and comprehensible, natural language explanations. Our aim is to make these strategies more accessible to non-experts. By leveraging traditional Natural Language Processing (NLP) techniques and Large Language Models (LLMs) equipped with Transformers, we outline how parts of DRL strategies composed of parts within strategy templates can be transformed into user-friendly, human-like English narratives. To achieve this, we present a top-level algorithm that involves parsing mathematical expressions of strategy templates, semantically interpreting variables and structures, generating rule-based primary explanations, and utilizing a Generative Pre-trained Transformer (GPT) model to refine and contextualize these explanations. Subsequent customization for varied audiences and meticulous validation processes in an example illustrate the applicability and potential of this approach.
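The "parse, interpret, verbalize" front end of such a pipeline can be illustrated in a few lines with sympy, before any GPT refinement. The expression, variable meanings, and verbalization rule below are invented for the example.

```python
import sympy

expr = sympy.sympify("0.8*p_opp + 0.2*p_init")    # a strategy-template term
meanings = {"p_opp": "the opponent's last offer",
            "p_init": "your own opening price"}

parts = [f"{float(c) * 100:.0f}% of {meanings[str(v)]}"
         for v, c in expr.as_coefficients_dict().items()]
print("Offer " + " plus ".join(parts) + ".")
# e.g. "Offer 80% of the opponent's last offer plus 20% of your own opening price."
```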
Identification for Tree-shaped Structural Causal Models in Polynomial Time
for: This paper studies linear structural causal models (SCMs) and develops algorithms for identifying the causal parameters from correlations between the nodes.
methods: Building on an earlier PSPACE-algorithm, the paper gives a randomized polynomial-time algorithm for the identification problem in tree-shaped SCMs and expresses the identified parameters as fractional affine square root terms of polynomials (FASTPs).
results: The randomized polynomial-time algorithm identifies the causal parameters in tree-shaped SCMs and determines, for each parameter, whether it is generically identifiable, generically 2-identifiable, or generically unidentifiable.
Abstract
Linear structural causal models (SCMs) are used to express and analyse the relationships between random variables. Direct causal effects are represented as directed edges and confounding factors as bidirected edges. Identifying the causal parameters from correlations between the nodes is an open problem in artificial intelligence. In this paper, we study SCMs whose directed component forms a tree. Van der Zander et al. (AISTATS'22, PMLR 151, pp. 6770--6792, 2022) give a PSPACE-algorithm for the identification problem in this case, which is a significant improvement over the general Gröbner basis approach, which has doubly-exponential time complexity in the number of structural parameters. In this work, we present a randomized polynomial-time algorithm, which solves the identification problem for tree-shaped SCMs. For every structural parameter, our algorithm decides whether it is generically identifiable, generically 2-identifiable, or generically unidentifiable. (No other cases can occur.) In the first two cases, it provides one or two fractional affine square root terms of polynomials (FASTPs) for the corresponding parameter, respectively.
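For orientation, the identification problem is posed in the standard linear-SCM formulation (background notation, not taken from the paper): with $\Lambda$ collecting the directed-edge coefficients and $\Omega$ the noise covariance encoding the bidirected edges,

```latex
x = \Lambda^{\top} x + \varepsilon, \qquad
\operatorname{Cov}(\varepsilon) = \Omega, \qquad
\Sigma = \operatorname{Cov}(x) = (I - \Lambda)^{-\top}\, \Omega \,(I - \Lambda)^{-1}
```

Identification then asks which entries of $\Lambda$ are pinned down, up to finitely many solutions, by the observable covariance $\Sigma$.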
Assessing the Impact of Noise on Quantum Neural Networks: An Experimental Analysis
paper_authors: Erik B. Terres Escudero, Danel Arias Alamo, Oier Mentxaka Gómez, Pablo García Bringas
for: This paper investigates how various noise models affect the performance of quantum neural networks (QNNs).
methods: It analyzes the Mottonen state preparation algorithm under different noise models and studies the degradation of quantum states as they pass through multiple QNN layers.
results: Noise degrades QNN performance, highlighting the challenge that noise models pose for quantum computing.
Abstract
In the race towards quantum computing, the potential benefits of quantum neural networks (QNNs) have become increasingly apparent. However, Noisy Intermediate-Scale Quantum (NISQ) processors are prone to errors, which poses a significant challenge for the execution of complex algorithms or quantum machine learning. To ensure the quality and security of QNNs, it is crucial to explore the impact of noise on their performance. This paper provides a comprehensive analysis of the impact of noise on QNNs, examining the Mottonen state preparation algorithm under various noise models and studying the degradation of quantum states as they pass through multiple layers of QNNs. Additionally, the paper evaluates the effect of noise on the performance of pre-trained QNNs and highlights the challenges posed by noise models in quantum computing. The findings of this study have significant implications for the development of quantum software, emphasizing the importance of prioritizing stability and noise-correction measures when developing QNNs to ensure reliable and trustworthy results. This paper contributes to the growing body of literature on quantum computing and quantum machine learning, providing new insights into the impact of noise on QNNs and paving the way towards the development of more robust and efficient quantum algorithms.
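One of the simplest noise models featuring in such analyses is the depolarizing channel, which mixes the state toward the maximally mixed state. A minimal density-matrix sketch (plain numpy, no quantum SDK assumed) shows how purity decays as noise accumulates across layers.

```python
import numpy as np

def depolarize(rho: np.ndarray, p: float) -> np.ndarray:
    """With probability p, replace the state by the maximally mixed state."""
    d = rho.shape[0]
    return (1.0 - p) * rho + p * np.eye(d) / d

rho = np.array([[1.0, 0.0], [0.0, 0.0]])          # pure |0><0| state
for _ in range(5):                                # noise accumulates per layer
    rho = depolarize(rho, p=0.05)
print(f"purity after 5 noisy layers: {np.trace(rho @ rho).real:.3f}")  # < 1.0
```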
results: Achieves efficient communication compression with privacy preservation; under standard 5G networks it delivers over 300% of the throughput of device-only solutions and over 80% of an A100 GPU's performance.
Abstract
End users face a choice between privacy and efficiency in current Large Language Model (LLM) service paradigms. In cloud-based paradigms, users are forced to compromise data locality for generation quality and processing speed. Conversely, edge device paradigms maintain data locality but fail to deliver satisfactory performance. In this work, we propose a novel LLM service paradigm that distributes privacy-sensitive computation on edge devices and shared computation in the cloud. Only activations are transmitted between the central cloud and edge devices to ensure data locality. Our core innovation, PrivateLoRA, addresses the challenging communication overhead by exploiting the low rank of residual activations, achieving over 95% communication reduction. Consequently, PrivateLoRA effectively maintains data locality and is extremely resource efficient. Under standard 5G networks, PrivateLoRA achieves throughput over 300% of device-only solutions for 7B models and over 80% of an A100 GPU for 33B models. PrivateLoRA also provides tuning performance comparable to LoRA for advanced personalization. Our approach democratizes access to state-of-the-art generative AI for edge devices, paving the way for more tailored LLM experiences for the general public. To our knowledge, our proposed framework is the first efficient and privacy-preserving LLM solution in the literature.
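Why low-rank activations slash communication is easy to see numerically: transmitting the rank-r factor instead of the full hidden state shrinks traffic by roughly a factor of d/r. The sketch below illustrates that arithmetic only, not PrivateLoRA's actual protocol; the dimensions are invented.

```python
import torch

batch, d_model, r = 8, 4096, 16
x = torch.randn(batch, d_model)                   # hidden state on the device
A = torch.randn(d_model, r) / d_model ** 0.5      # device-side down-projection
B = torch.randn(r, d_model) / r ** 0.5            # cloud-side up-projection

msg = x @ A                                       # the only tensor transmitted
update = msg @ B                                  # reconstructed in the cloud

print(f"reduction: {1 - msg.numel() / x.numel():.2%}")   # ~99.6% fewer values
```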
Continual Learning of Diffusion Models with Generative Distillation
results: The paper shows that generative distillation significantly improves the continual-learning performance of diffusion models at only a moderate increase in computational cost.
Abstract
Diffusion models are powerful generative models that achieve state-of-the-art performance in tasks such as image synthesis. However, training them demands substantial amounts of data and computational resources. Continual learning would allow for incrementally learning new tasks and accumulating knowledge, thus reusing already trained models would be possible. One potentially suitable approach is generative replay, where a copy of a generative model trained on previous tasks produces synthetic data that are interleaved with data from the current task. However, standard generative replay applied to diffusion models results in a catastrophic loss in denoising capabilities. In this paper, we propose generative distillation, an approach that distils the entire reverse process of a diffusion model. We demonstrate that our approach significantly improves the continual learning performance of generative replay with only a moderate increase in the computational costs.
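A single distillation step can be sketched as the student matching the frozen teacher's noise prediction on the same diffused input. This is a hedged simplification: the paper distils the entire reverse process, and the epsilon-prediction interface below is an assumption.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x0, alphas_cumprod):
    """Student matches the frozen teacher's noise prediction at a random t."""
    t = torch.randint(0, len(alphas_cumprod), (x0.shape[0],))
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise          # forward diffusion
    with torch.no_grad():
        target = teacher(x_t, t)                          # teacher eps-prediction
    return F.mse_loss(student(x_t, t), target)
```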
When Side-Channel Attacks Break the Black-Box Property of Embedded Artificial Intelligence
results: Experiments show that the attack can efficiently generate adversarial examples in a true black-box setting and applies across different network architectures.
Abstract
Artificial intelligence, and specifically deep neural networks (DNNs), has rapidly emerged in the past decade as the standard for several tasks from specific advertising to object detection. The performance offered has led DNN algorithms to become a part of critical embedded systems, requiring both efficiency and reliability. In particular, DNNs are subject to malicious examples designed in a way to fool the network while being undetectable to the human observer: the adversarial examples. While previous studies propose frameworks to implement such attacks in black box settings, those often rely on the hypothesis that the attacker has access to the logits of the neural network, breaking the assumption of the traditional black box. In this paper, we investigate a real black box scenario where the attacker has no access to the logits. In particular, we propose an architecture-agnostic attack which solve this constraint by extracting the logits. Our method combines hardware and software attacks, by performing a side-channel attack that exploits electromagnetic leakages to extract the logits for a given input, allowing an attacker to estimate the gradients and produce state-of-the-art adversarial examples to fool the targeted neural network. Through this example of adversarial attack, we demonstrate the effectiveness of logits extraction using side-channel as a first step for more general attack frameworks requiring either the logits or the confidence scores.
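Once logits are recoverable, the downstream attack step is standard: estimate input gradients from logit queries and take an FGSM-style step. The sketch below is the generic zeroth-order recipe, not the paper's electromagnetic extraction; get_logits is a placeholder for the side-channel read-out.

```python
import numpy as np

def estimate_grad(get_logits, x, label, eps=1e-3, n_dirs=64):
    """Zeroth-order estimate of d(true-class margin)/dx from logit queries."""
    def margin(v):
        z = get_logits(v)
        return z[label] - np.max(np.delete(z, label))
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = np.random.randn(*x.shape)
        g += (margin(x + eps * u) - margin(x - eps * u)) / (2 * eps) * u
    return g / n_dirs

def fgsm_step(get_logits, x, label, step=0.03):
    g = estimate_grad(get_logits, x, label)
    return np.clip(x - step * np.sign(g), 0.0, 1.0)   # push the margin down
```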
Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
results: Experiments show that the method adapts quickly to user preferences and enables efficient search within multi-objective evolutionary algorithms.
Abstract
Optimization problems find widespread use in both single-objective and multi-objective scenarios. In practical applications, users aspire for solutions that converge to the region of interest (ROI) along the Pareto front (PF). While the conventional approach involves approximating a fitness function or an objective function to reflect user preferences, this paper explores an alternative avenue. Specifically, we aim to discover a method that sidesteps the need for calculating the fitness function, relying solely on human feedback. Our proposed approach entails conducting direct preference learning facilitated by an active dueling bandit algorithm. The experimental phase is structured into three sessions. Firstly, we assess the performance of our active dueling bandit algorithm. Secondly, we implement our proposed method within the context of Multi-objective Evolutionary Algorithms (MOEAs). Finally, we deploy our method in a practical problem, specifically in protein structure prediction (PSP). This research presents a novel interactive preference-based MOEA framework that not only addresses the limitations of traditional techniques but also unveils new possibilities for optimization problems.
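A toy dueling-bandit loop conveys the interaction pattern: maintain pairwise win counts and repeatedly duel the two most promising solutions by asking the user which one they prefer. This is a generic sketch, not the paper's active dueling bandit algorithm.

```python
import numpy as np

K = 5                                   # candidate solutions on the front
wins = np.ones((K, K))                  # wins[i, j]: times i beat j (prior 1)

def duel_pair(t):
    """Pick the two arms with the highest optimistic Borda scores."""
    n = wins + wins.T
    borda = (wins / n).mean(axis=1)
    ucb = borda + np.sqrt(np.log(t + 2) / n.sum(axis=1))
    i, j = np.argsort(ucb)[-2:]
    return int(i), int(j)

def record(i, j, i_preferred):
    """Update counts from one human preference query."""
    if i_preferred:
        wins[i, j] += 1
    else:
        wins[j, i] += 1
```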
Learning Dynamic Selection and Pricing of Out-of-Home Deliveries
paper_authors: Fabian Akkerman, Peter Dieter, Martijn Mes
for: This paper addresses last-mile logistics problems such as failed home deliveries, traffic congestion, and long handling times, external factors that account for up to 28% of total delivery costs and 25% of emissions.
methods: It proposes Dynamic Selection and Pricing of OOH (DSPO), an algorithmic pipeline that models customer choice behavior using a novel spatial-temporal state encoding fed into a convolutional neural network.
results: Benchmarked against three state-of-the-art methods, DSPO saves 20.8% in costs compared to a situation without OOH locations, 8.1% compared to a static selection and pricing policy, and 4.6% compared to a state-of-the-art demand-management baseline; the results suggest practitioners should adopt dynamic selection and pricing as OOH delivery gains market share.
Abstract
Home delivery failures, traffic congestion, and relatively large handling times have a negative impact on the profitability of last-mile logistics. These external factors contribute to up to $28\%$ of the overall costs and $25\%$ of emissions for the home delivery supply chain. A potential solution, showing annual growth rates up to $36\%$, is the delivery to parcel lockers or parcel shops, denoted by out-of-home (OOH) delivery. In the academic literature, models of customer behavior with respect to OOH delivery were so far limited to deterministic settings, contrasting with the stochastic nature of actual customer choices. We model the sequential decision-making problem of which OOH location to offer against what incentive for each incoming customer, taking into account future customer arrivals and choices. We propose Dynamic Selection and Pricing of OOH (DSPO), an algorithmic pipeline that uses a novel spatial-temporal state encoding as input to a convolutional neural network. We demonstrate the performance of our method by benchmarking it against three state-of-the-art approaches. Our extensive numerical study, guided by real-world data, reveals that DSPO can save $20.8\%$ in costs compared to a situation without OOH locations, $8.1\%$ compared to a static selection and pricing policy, and $4.6\%$ compared to a state-of-the-art demand management benchmark. We provide comprehensive insights into the complex interplay between OOH delivery dynamics and customer behavior influenced by pricing strategies. The implications of our findings suggest practitioners to adopt dynamic selection and pricing policies as OOH delivery gains a larger market share.
Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions
results: Experiments on three Complex QA datasets show that ProbTree significantly outperforms the previous state of the art, demonstrating the effectiveness of probabilistic tree-of-thought reasoning.
Abstract
Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning. However, they tend to generate factually incorrect reasoning steps when the required knowledge is not available or up-to-date in models' parameters. Recent works turn to retrieving external knowledge to augment CoT reasoning. Despite being promising, these chain-based methods suffer from: 1) Negative retrieval. Unnecessary or incorrect retrieval may mislead the reasoning; 2) Limited sight. Lacking the ability to look backward or forward, a local error in one step will propagate along the chain. In this paper, we propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree). First, LLMs translate a complex question into a query tree, in which each non-root node denotes a sub-question of its parent node. Then, probabilistic reasoning is conducted over the tree, by solving questions from leaf to root considering the confidence of both question decomposing and answering. During reasoning, for leaf nodes, LLMs choose a more confident answer from Closed-book QA that employs parametric knowledge and Open-book QA that employs retrieved external knowledge, thus eliminating the negative retrieval problem. For non-leaf nodes, with the hierarchical structure, LLMs have broader sights and are able to globally reason with the information from child nodes, thus recovering from local errors. The experiments on three Complex QA datasets under the open-domain setting show that our approach outperforms SOTA methods significantly, demonstrating the effect of probabilistic tree-of-thought reasoning.
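The tree-level control flow can be sketched recursively: each node answers via closed-book QA, open-book QA, and (for non-leaves) composition of child answers, keeping whichever candidate is most confident. The QA callables and the (answer, confidence) interface below are stand-ins, not the paper's modules.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    question: str
    children: list = field(default_factory=list)

def answer(node, closed_book_qa, open_book_qa, child_driven_qa):
    """Solve the query tree bottom-up; each QA callable returns (answer, conf)."""
    candidates = []
    if node.children:                             # non-leaf: compose children
        sub = [answer(c, closed_book_qa, open_book_qa, child_driven_qa)
               for c in node.children]
        candidates.append(child_driven_qa(node.question, sub))
    candidates.append(closed_book_qa(node.question))   # parametric knowledge
    candidates.append(open_book_qa(node.question))     # retrieved knowledge
    return max(candidates, key=lambda pair: pair[1])   # keep the most confident
```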
Cluster trajectory of SOFA score in predicting mortality in sepsis
paper_authors: Yuhe Ke, Matilda Swee Sun Tang, Celestine Jia Ling Loh, Hairil Rizal Abdullah, Nicholas Brian Shannon
for: This study investigates the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes.
methods: Group-based trajectory modelling with dynamic time warping and k-means clustering is used to identify distinct trajectory patterns in dynamic SOFA scores.
results: The cluster with persistently elevated scores (Cluster D) had the highest ICU and hospital mortality and the longest ICU and hospital stays; discharge rates for Cluster C were initially comparable but its transition to the ward was slower.
Abstract
Objective: Sepsis is a life-threatening condition. Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes. Design, setting, and participants: 3,253 patients in the Medical Information Mart for Intensive Care IV database who met the sepsis-3 criteria and were admitted from the emergency department with at least 72 hours of ICU admission and full-active resuscitation status were analysed. Group-based trajectory modelling with dynamic time warping and k-means clustering identified distinct trajectory patterns in dynamic SOFA scores. They were subsequently compared using Python. Main outcome measures: Outcomes including hospital and ICU mortality, length of stay in hospital and ICU, and readmission during hospital stay, were collected. Discharge time from ICU to wards and cut-offs at 7-day and 14-day were taken. Results: Four clusters were identified: A (consistently low SOFA scores), B (rapid increase followed by a decline in SOFA scores), C (higher baseline scores with gradual improvement), and D (persistently elevated scores). Cluster D had the longest ICU and hospital stays, highest ICU and hospital mortality. Discharge rates from ICU were similar for Clusters A and B, while Cluster C had initially comparable rates but a slower transition to ward. Conclusion: Monitoring dynamic changes in SOFA score is valuable for assessing sepsis severity and treatment responsiveness.
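The trajectory clustering itself is reproducible with off-the-shelf tooling; below is a hedged sketch using tslearn's DTW k-means (the file name, resampling, and preprocessing are assumptions, not the study's code).

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.preprocessing import TimeSeriesResampler

# sofa: (n_patients, n_timepoints, 1) hourly SOFA scores over the first 72 h.
sofa = np.load("sofa_trajectories.npy")           # hypothetical file name
sofa = TimeSeriesResampler(sz=72).fit_transform(sofa)

km = TimeSeriesKMeans(n_clusters=4, metric="dtw", random_state=0)
labels = km.fit_predict(sofa)                     # the four clusters (A-D)
```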
Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy
paper_authors: Zdravko Marinov, Paul F. Jäger, Jan Egger, Jens Kleesiek, Rainer Stiefelhagen
for: This paper provides a comprehensive overview of the emerging field of interactive segmentation in medical image analysis, with a focus on deep learning-based approaches.
methods: It reviews 121 methods proposed in the medical imaging domain, offering a systematic analysis of current practices through a comprehensive taxonomy.
results: It identifies challenges and opportunities in the field, notably a severe lack of comparison across methods that calls for standardized baselines and benchmarks.
Abstract
Interactive segmentation is a crucial research area in medical image analysis aiming to boost the efficiency of costly annotations by incorporating human feedback. This feedback takes the form of clicks, scribbles, or masks and allows for iterative refinement of the model output so as to efficiently guide the system towards the desired behavior. In recent years, deep learning-based approaches have propelled results to a new level causing a rapid growth in the field with 121 methods proposed in the medical imaging domain alone. In this review, we provide a structured overview of this emerging field featuring a comprehensive taxonomy, a systematic review of existing methods, and an in-depth analysis of current practices. Based on these contributions, we discuss the challenges and opportunities in the field. For instance, we find that there is a severe lack of comparison across methods which needs to be tackled by standardized baselines and benchmarks.
Human Machine Co-Creation. A Complementary Cognitive Approach to Creative Character Design Process Using GANs
paper_authors: Mohammad Lataifeh, Xavier A Carrasco, Ashraf M Elnagar, Naveed Ahmed, Imran Junejo
for: This research aims to create a complementary co-design process between humans and machines to augment character designers' abilities in visualizing and creating new characters for multimedia projects.
methods: The proposed approach uses Generative Adversarial Networks (GANs) to generate new visual content; the machine-generated concepts serve as a launching platform for character designers to conceptualize new characters.
results: The results substantiate the value of the proposed co-creation framework and elucidate how the generated concepts act as cognitive substances that interact with designers' competencies to influence the creative process of conceptualizing novel characters.
Abstract
Recent advances in Generative Adversarial Networks GANs applications continue to attract the attention of researchers in different fields. In such a framework, two neural networks compete adversely to generate new visual contents indistinguishable from the original dataset. The objective of this research is to create a complementary codesign process between humans and machines to augment character designers abilities in visualizing and creating new characters for multimedia projects such as games and animation. Driven by design cognitive scaffolding, the proposed approach aims to inform the process of perceiving, knowing, and making. The machine generated concepts are used as a launching platform for character designers to conceptualize new characters. A labelled dataset of 22,000 characters was developed for this work and deployed using different GANs to evaluate the most suited for the context, followed by mixed methods evaluation for the machine output and human derivations. The discussed results substantiate the value of the proposed cocreation framework and elucidate how the generated concepts are used as cognitive substances that interact with designers competencies in a versatile manner to influence the creative processes of conceptualizing novel characters.
Learning Uniform Clusters on Hypersphere for Deep Graph-level Clustering
results: Experiments on eight well-known datasets show that UDGC significantly outperforms state-of-the-art models, achieving a more uniform cluster-level distribution and mitigating cluster collapse.
Abstract
Graph clustering has been popularly studied in recent years. However, most existing graph clustering methods focus on node-level clustering, i.e., grouping nodes in a single graph into clusters. In contrast, graph-level clustering, i.e., grouping multiple graphs into clusters, remains largely unexplored. Graph-level clustering is critical in a variety of real-world applications, such as, properties prediction of molecules and community analysis in social networks. However, graph-level clustering is challenging due to the insufficient discriminability of graph-level representations, and the insufficient discriminability makes deep clustering be more likely to obtain degenerate solutions (cluster collapse). To address the issue, we propose a novel deep graph-level clustering method called Uniform Deep Graph Clustering (UDGC). UDGC assigns instances evenly to different clusters and then scatters those clusters on unit hypersphere, leading to a more uniform cluster-level distribution and a slighter cluster collapse. Specifically, we first propose Augmentation-Consensus Optimal Transport (ACOT) for generating uniformly distributed and reliable pseudo labels for partitioning clusters. Then we adopt contrastive learning to scatter those clusters. Besides, we propose Center Alignment Optimal Transport (CAOT) for guiding the model to learn better parameters, which further promotes the cluster performance. Our empirical study on eight well-known datasets demonstrates that UDGC significantly outperforms the state-of-the-art models.
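A common way to obtain uniformly distributed pseudo-labels is Sinkhorn normalization with uniform marginals, which forces (roughly) equal-sized clusters. The sketch below shows that generic balanced-assignment trick; ACOT in the paper is more involved.

```python
import numpy as np

def balanced_pseudo_labels(scores, n_iters=50, eps=0.05):
    """scores: (n_graphs, n_clusters) logits -> (roughly) equal-sized labels."""
    P = np.exp(scores / eps)
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)   # each graph carries unit mass
        P /= P.sum(axis=0, keepdims=True)   # equalize mass across clusters
    return P.argmax(axis=1)
```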
Parameter Exchange for Robust Dynamic Domain Generalization
for: The paper aims to improve the generalization ability of dynamic domain generalization (DDG) models on unknown target domains by disentangling the static and dynamic components more thoroughly from an optimization perspective.
methods: The proposed Parameter Exchange (PE) method perturbs the combination between the static and dynamic components, enabling the static component to learn domain-invariant features more comprehensively while the dynamic component focuses on adaptive domain-specific features; the model is optimized jointly with gradients from both the perturbed and non-perturbed forward passes.
results: Extensive experiments show that PE can be plugged into existing dynamic networks to improve their generalization ability without bells and whistles, resisting agnostic domain shifts and improving self-adaptability on unknown target domains.
Abstract
Agnostic domain shift is the main reason of model degradation on the unknown target domains, which brings an urgent need to develop Domain Generalization (DG). Recent advances at DG use dynamic networks to achieve training-free adaptation on the unknown target domains, termed Dynamic Domain Generalization (DDG), which compensates for the lack of self-adaptability in static models with fixed weights. The parameters of dynamic networks can be decoupled into a static and a dynamic component, which are designed to learn domain-invariant and domain-specific features, respectively. Based on the existing arts, in this work, we try to push the limits of DDG by disentangling the static and dynamic components more thoroughly from an optimization perspective. Our main consideration is that we can enable the static component to learn domain-invariant features more comprehensively by augmenting the domain-specific information. As a result, the more comprehensive domain-invariant features learned by the static component can then enforce the dynamic component to focus more on learning adaptive domain-specific features. To this end, we propose a simple yet effective Parameter Exchange (PE) method to perturb the combination between the static and dynamic components. We optimize the model using the gradients from both the perturbed and non-perturbed feed-forward jointly to implicitly achieve the aforementioned disentanglement. In this way, the two components can be optimized in a mutually-beneficial manner, which can resist the agnostic domain shifts and improve the self-adaptability on the unknown target domain. Extensive experiments show that PE can be easily plugged into existing dynamic networks to improve their generalization ability without bells and whistles.
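The Parameter Exchange idea reduces to two forward passes: one with each sample's own dynamic coefficients and one where those coefficients are shuffled across the batch, with both losses backpropagated jointly. The network interface below is an assumed simplification, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def pe_step(static_net, dynamic_head, x, y, opt):
    """One joint step over a normal and a parameter-exchanged forward pass."""
    coeff = dynamic_head(x)                     # per-sample dynamic component
    logits = static_net(x, coeff)               # normal combination
    perm = torch.randperm(x.shape[0])
    logits_pe = static_net(x, coeff[perm])      # exchanged combination
    loss = F.cross_entropy(logits, y) + F.cross_entropy(logits_pe, y)
    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)
```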
A DRL solution to help reduce the cost in waiting time of securing a traffic light for cyclists
paper_authors: Lucas Magnana, Hervé Rivano, Nicolas Chiabaut
for: This study aims to reduce cyclists' waiting time by optimizing a traffic light control algorithm with deep reinforcement learning.
methods: A deep reinforcement learning (DRL) solution adapts the green phase cycle of a traffic light to the traffic, and is compared against an actuated traffic light control algorithm using vehicle counter data.
results: DRL achieves better minimization of vehicle waiting time at almost all hours and is robust to moderate changes in bike traffic.
Abstract
Cyclists prefer to use infrastructure that separates them from motorized traffic. Using a traffic light to segregate car and bike flows, with the addition of bike-specific green phases, is a lightweight and cheap solution that can be deployed dynamically to assess the opportunity of a heavier infrastructure such as a separate bike lane. To compensate for the increased waiting time induced by these new phases, we introduce in this paper a deep reinforcement learning solution that adapts the green phase cycle of a traffic light to the traffic. Vehicle counter data are used to compare the DRL approach with the actuated traffic light control algorithm over whole days. Results show that DRL achieves better minimization of vehicle waiting time at almost all hours. Our DRL approach is also robust to moderate changes in bike traffic. The code of this paper is available at https://github.com/LucasMagnana/A-DRL-solution-to-help-reduce-the-cost-in-waiting-time-of-securing-a-traffic-light-for-cyclists.
General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level
results: Using a phrase filter stage that generates stereotypical phrases from Wikipedia pages and a model debias stage, the approach achieves state-of-the-art results on standard datasets and metrics, significantly reducing gender biases on both career and multiple disciplines across models of varying parameter sizes.
Abstract
The social biases and unwelcome stereotypes revealed by pretrained language models are becoming obstacles to their application. Compared to numerous debiasing methods targeting word level, there has been relatively less attention on biases present at phrase level, limiting the performance of debiasing in discipline domains. In this paper, we propose an automatic multi-token debiasing pipeline called \textbf{General Phrase Debiaser}, which is capable of mitigating phrase-level biases in masked language models. Specifically, our method consists of a \textit{phrase filter stage} that generates stereotypical phrases from Wikipedia pages as well as a \textit{model debias stage} that can debias models at the multi-token level to tackle bias challenges on phrases. The latter searches for prompts that trigger model's bias, and then uses them for debiasing. State-of-the-art results on standard datasets and metrics show that our approach can significantly reduce gender biases on both career and multiple disciplines, across models with varying parameter sizes.
Can Physics Informed Neural Operators Self Improve?
results: With self-training, FNOs trained exclusively with a physics loss approach the accuracy of FNOs trained with both data and physics loss (within 1.07x on 1D-Burgers and 1.02x on 2D-Darcy), and pseudo-labels enable self-training schedules that improve on PINO in both accuracy and time.
Abstract
Self-training techniques have shown remarkable value across many deep learning models and tasks. However, such techniques remain largely unexplored when considered in the context of learning fast solvers for systems of partial differential equations (Eg: Neural Operators). In this work, we explore the use of self-training for Fourier Neural Operators (FNO). Neural Operators emerged as a data driven technique, however, data from experiments or traditional solvers is not always readily available. Physics Informed Neural Operators (PINO) overcome this constraint by utilizing a physics loss for the training, however the accuracy of PINO trained without data does not match the performance obtained by training with data. In this work we show that self-training can be used to close this gap in performance. We examine canonical examples, namely the 1D-Burgers and 2D-Darcy PDEs, to showcase the efficacy of self-training. Specifically, FNOs, when trained exclusively with physics loss through self-training, approach 1.07x for Burgers and 1.02x for Darcy, compared to FNOs trained with both data and physics loss. Furthermore, we discover that pseudo-labels can be used for self-training without necessarily training to convergence in each iteration. A consequence of this is that we are able to discover self-training schedules that improve upon the baseline performance of PINO in terms of accuracy as well as time.
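The self-training loop is simple to sketch: periodically freeze the model's own predictions as pseudo-labels, then fit physics loss plus a pseudo-data loss for a bounded number of steps; the fixed inner step count reflects the paper's finding that training to convergence in each iteration is unnecessary. Function signatures below are placeholders.

```python
import torch

def self_train(model, opt, inputs, physics_loss, rounds=3, inner_steps=500):
    """Alternate between refreshing pseudo-labels and partially fitting them."""
    for _ in range(rounds):
        with torch.no_grad():
            pseudo = model(inputs)              # own predictions become labels
        for _ in range(inner_steps):            # no need to train to convergence
            pred = model(inputs)
            loss = physics_loss(pred, inputs) + torch.mean((pred - pseudo) ** 2)
            opt.zero_grad(); loss.backward(); opt.step()
    return model
```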
Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach
results: Evaluations on system resource allocation and robot grid transportation experiments demonstrate the considerable advantages of the proposed approach.
Abstract
The significant advancements in large language models (LLMs) have presented novel opportunities for tackling planning and decision-making within multi-agent systems. However, as the number of agents increases, the issues of hallucination in LLMs and coordination in multi-agent systems (MAS) have become increasingly pronounced. Additionally, the efficient utilization of tokens becomes a critical consideration when employing LLMs to facilitate the interactions of large numbers of agents. In this paper, we present a novel framework aimed at enhancing coordination and decision-making capabilities of LLMs within large-scale multi-agent environments. Our approach draws inspiration from the actor-critic framework employed in multi-agent reinforcement learning, and we develop a modular and token-efficient solution that effectively addresses challenges presented by LLMs and MAS. Through evaluations conducted in experiments involving system resource allocation and robot grid transportation, we demonstrate the considerable advantages afforded by our proposed approach.
A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs
results: Solutions based on the pre-trained BERT and RoBERTa language models achieve the best F2 scores (86.7% and 89.7%), while deep learning (BiLSTM) and few-shot learning (SetFit) alternatives reach comparable accuracy and are faster to develop.
Abstract
Specifying legal requirements for software systems to ensure their compliance with the applicable regulations is a major concern to requirements engineering (RE). Personal data which is collected by an organization is often shared with other organizations to perform certain processing activities. In such cases, the General Data Protection Regulation (GDPR) requires issuing a data processing agreement (DPA) which regulates the processing and further ensures that personal data remains protected. Violating GDPR can lead to huge fines reaching to billions of Euros. Software systems involving personal data processing must adhere to the legal obligations stipulated in GDPR and outlined in DPAs. Requirements engineers can elicit from DPAs legal requirements for regulating the data processing activities in software systems. Checking the completeness of a DPA according to the GDPR provisions is therefore an essential prerequisite to ensure that the elicited requirements are complete. Analyzing DPAs entirely manually is time consuming and requires adequate legal expertise. In this paper, we propose an automation strategy to address the completeness checking of DPAs against GDPR. Specifically, we pursue ten alternative solutions which are enabled by different technologies, namely traditional machine learning, deep learning, language modeling, and few-shot learning. The goal of our work is to empirically examine how these different technologies fare in the legal domain. We computed F2 score on a set of 30 real DPAs. Our evaluation shows that best-performing solutions yield F2 score of 86.7% and 89.7% are based on pre-trained BERT and RoBERTa language models. Our analysis further shows that other alternative solutions based on deep learning (e.g., BiLSTM) and few-shot learning (e.g., SetFit) can achieve comparable accuracy, yet are more efficient to develop.
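For reference, the F2 score reported above is the beta = 2 case of F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), weighting recall twice as heavily as precision, which suits compliance checking where missed provisions are costlier than false alarms. scikit-learn exposes it directly; the labels below are toy values.

```python
from sklearn.metrics import fbeta_score

y_true = [1, 1, 0, 1, 0, 1]   # provision present in the DPA (toy labels)
y_pred = [1, 0, 0, 1, 1, 1]
print(fbeta_score(y_true, y_pred, beta=2))   # recall-weighted F2
```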
Minimizing Factual Inconsistency and Hallucination in Large Language Models
results: In the life sciences domain, the framework improves the quality of responses to drug-related inquiries, making OpenAI GPT-3.5-turbo 14-25% more faithful and 16-22% more accurate; fine-tuning smaller open-access language models with samples from the framework improves their accuracy by 33-42%, making them competitive with commercial models.
Abstract
Large Language Models (LLMs) are widely used in critical fields such as healthcare, education, and finance due to their remarkable proficiency in various language-related tasks. However, LLMs are prone to generating factually incorrect responses or "hallucinations," which can lead to a loss of credibility and trust among users. To address this issue, we propose a multi-stage framework that generates the rationale first, verifies and refines incorrect ones, and uses them as supporting references to generate the answer. The generated rationale enhances the transparency of the answer and our framework provides insights into how the model arrived at this answer, by using this rationale and the references to the context. In this paper, we demonstrate its effectiveness in improving the quality of responses to drug-related inquiries in the life sciences industry. Our framework improves traditional Retrieval Augmented Generation (RAG) by enabling OpenAI GPT-3.5-turbo to be 14-25% more faithful and 16-22% more accurate on two datasets. Furthermore, fine-tuning samples based on our framework improves the accuracy of smaller open-access LLMs by 33-42% and competes with RAG on commercial models.
methods: This chapter analyzes legal requirements using a variety of methods, including creating machine-analyzable representations from regulations and surveying existing automated means for enabling compliance verification against regulations, while reflecting on the current challenges of legal requirements analysis.
results: It presents candidate methods for legal requirements analysis and exemplifies them on GDPR.
Abstract
Modern software has been an integral part of everyday activities in many disciplines and application contexts. Introducing intelligent automation by leveraging artificial intelligence (AI) led to break-throughs in many fields. The effectiveness of AI can be attributed to several factors, among which is the increasing availability of data. Regulations such as the general data protection regulation (GDPR) in the European Union (EU) are introduced to ensure the protection of personal data. Software systems that collect, process, or share personal data are subject to compliance with such regulations. Developing compliant software depends heavily on addressing legal requirements stipulated in applicable regulations, a central activity in the requirements engineering (RE) phase of the software development process. RE is concerned with specifying and maintaining requirements of a system-to-be, including legal requirements. Legal agreements which describe the policies organizations implement for processing personal data can provide an additional source to regulations for eliciting legal requirements. In this chapter, we explore a variety of methods for analyzing legal requirements and exemplify them on GDPR. Specifically, we describe possible alternatives for creating machine-analyzable representations from regulations, survey the existing automated means for enabling compliance verification against regulations, and further reflect on the current challenges of legal requirements analysis.
Challenges of Large Language Models for Mental Health Counseling
results: The paper identifies major challenges for applying LLMs to psychological counseling, including model hallucination, interpretability, bias, privacy, and clinical effectiveness, and argues that, if these pitfalls are carefully navigated, AI holds great promise for improving mental health care.
Abstract
The global mental health crisis is looming with a rapid increase in mental disorders, limited resources, and the social stigma of seeking treatment. As the field of artificial intelligence (AI) has witnessed significant advancements in recent years, large language models (LLMs) capable of understanding and generating human-like text may be used in supporting or providing psychological counseling. However, the application of LLMs in the mental health domain raises concerns regarding the accuracy, effectiveness, and reliability of the information provided. This paper investigates the major challenges associated with the development of LLMs for psychological counseling, including model hallucination, interpretability, bias, privacy, and clinical effectiveness. We explore potential solutions to these challenges that are practical and applicable to the current paradigm of AI. From our experience in developing and deploying LLMs for mental health, AI holds a great promise for improving mental health care, if we can carefully navigate and overcome pitfalls of LLMs.
A Cross Attention Approach to Diagnostic Explainability using Clinical Practice Guidelines for Depression
for: This paper aims to address the lack of explainability in Artificial Intelligence-powered analysis of unstructured clinical dialogue, specifically in the context of mental health (MH) and depression diagnosis.
methods: The authors develop a method called ProcesS knowledge-infused cross ATtention (PSAT) that incorporates clinical practice guidelines (CPGs) when computing attention, which enables the model to provide clinician-understandable explanations for classification.
results: The authors evaluate PSAT on three expert-curated datasets related to depression and demonstrate its ability to provide application-relevant explainability, surpassing the performance of nine baseline models and providing explanations where the baselines fall short.
Abstract
The lack of explainability using relevant clinical knowledge hinders the adoption of Artificial Intelligence-powered analysis of unstructured clinical dialogue. A wealth of relevant, untapped Mental Health (MH) data is available in online communities, providing the opportunity to address the explainability problem with substantial potential impact as a screening tool for both online and offline applications. We develop a method to enhance attention in popular transformer models and generate clinician-understandable explanations for classification by incorporating external clinical knowledge. Inspired by how clinicians rely on their expertise when interacting with patients, we leverage relevant clinical knowledge to model patient inputs, providing meaningful explanations for classification. This will save manual review time and engender trust. We develop such a system in the context of MH using clinical practice guidelines (CPG) for diagnosing depression, a mental health disorder of global concern. We propose an application-specific language model called ProcesS knowledge-infused cross ATtention (PSAT), which incorporates CPGs when computing attention. Through rigorous evaluation on three expert-curated datasets related to depression, we demonstrate application-relevant explainability of PSAT. PSAT also surpasses the performance of nine baseline models and can provide explanations where other baselines fall short. We transform a CPG resource focused on depression, such as the Patient Health Questionnaire (e.g. PHQ-9) and related questions, into a machine-readable ontology using SNOMED-CT. With this resource, PSAT enhances the ability of models like GPT-3.5 to generate application-relevant explanations.
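The knowledge-infused cross-attention can be sketched as a standard cross-attention block in which patient-text states attend over embedded guideline (e.g. PHQ-9) concepts, with the attention weights doubling as per-concept explanations. The layer below is a generic sketch, not PSAT's actual architecture.

```python
import torch
import torch.nn as nn

class KnowledgeCrossAttention(nn.Module):
    """Patient-text queries attend over embedded clinical-guideline concepts."""
    def __init__(self, d_model=768, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, text_states, guideline_states):
        # queries = clinical text; keys/values = CPG concept embeddings
        fused, weights = self.attn(text_states, guideline_states, guideline_states)
        return fused, weights   # weights double as per-concept explanations
```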
results: By revealing links between existing methods, the paper may help broaden the use of sampling approaches such as diffusion models, for example by mitigating the long inference time caused by diffusion simulation and the lack of diversity in generated samples.
Abstract
The number of sampling methods could be daunting for a practitioner looking to cast powerful machine learning methods to their specific problem. This paper takes a theoretical stance to review and organize many sampling approaches in the "generative modeling" setting, where one wants to generate new data that are similar to some training examples. By revealing links between existing methods, it might prove useful to overcome some of the current challenges in sampling with diffusion models, such as long inference time due to diffusion simulation, or the lack of diversity in generated samples.
Exact Combinatorial Optimization with Temporo-Attentional Graph Neural Networks
paper_authors: Mehdi Seyfi, Amin Banitalebi-Dehkordi, Zirui Zhou, Yong Zhang
for: The paper aims to improve the performance of the Branch and Bound algorithm by incorporating machine learning models.
methods: The paper uses two mechanisms to improve the B&B algorithm: temporal features and bipartite graph attention.
results: Experimental results show that the B&B algorithm’s performance is improved when these two mechanisms are used, and it can achieve optimal performance on several standard datasets.
Abstract
Combinatorial optimization finds an optimal solution within a discrete set of variables and constraints. The field has seen tremendous progress both in research and industry. With the success of deep learning in the past decade, a recent trend in combinatorial optimization has been to improve state-of-the-art combinatorial optimization solvers by replacing key heuristic components with machine learning (ML) models. In this paper, we investigate two essential aspects of machine learning algorithms for combinatorial optimization: temporal characteristics and attention. We argue that for the task of variable selection in the branch-and-bound (B&B) algorithm, incorporating the temporal information as well as the bipartite graph attention improves the solver's performance. We support our claims with intuitions and numerical results over several standard datasets used in the literature and competitions. Code is available at: https://developer.huaweicloud.com/develop/aigallery/notebook/detail?id=047c6cf2-8463-40d7-b92f-7b2ca998e935
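As a rough illustration of how bipartite graph attention can drive variable selection, the sketch below attends variable-node features to constraint-node features and emits one logit per branching candidate; the temporal component could be approximated by stacking features from successive B&B states into var_feats. Layer sizes and names are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BipartiteAttentionScorer(nn.Module):
    """Score branching candidates by attending variable nodes to constraints."""

    def __init__(self, var_dim: int, con_dim: int, hidden: int = 64):
        super().__init__()
        self.q = nn.Linear(var_dim, hidden)
        self.k = nn.Linear(con_dim, hidden)
        self.v = nn.Linear(con_dim, hidden)
        self.score = nn.Linear(var_dim + hidden, 1)

    def forward(self, var_feats, con_feats):
        # var_feats: (num_vars, var_dim)  e.g. features stacked over B&B states
        # con_feats: (num_cons, con_dim)  constraint-node features
        q, k, v = self.q(var_feats), self.k(con_feats), self.v(con_feats)
        attn = F.softmax(q @ k.t() / q.size(-1) ** 0.5, dim=-1)   # (V, C)
        context = attn @ v                                        # (V, hidden)
        logits = self.score(torch.cat([var_feats, context], -1))  # (V, 1)
        return logits.squeeze(-1)  # higher logit -> pick as branching variable
```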
HypUC: Hyperfine Uncertainty Calibration with Gradient-boosted Corrections for Reliable Regression on Imbalanced Electrocardiograms
results: The study performs strongly on large-scale medical time series, outperforming several conventional baseline models.
Abstract
The automated analysis of medical time series, such as the electrocardiogram (ECG), electroencephalogram (EEG), pulse oximetry, etc., has the potential to serve as a valuable tool for diagnostic decisions, allowing for remote monitoring of patients and more efficient use of expensive and time-consuming medical procedures. Deep neural networks (DNNs) have been demonstrated to process such signals effectively. However, previous research has primarily focused on classifying medical time series rather than attempting to regress the continuous-valued physiological parameters central to diagnosis. One significant challenge in this regard is the imbalanced nature of the dataset, as a low prevalence of abnormal conditions can lead to heavily skewed data that results in inaccurate predictions and a lack of certainty in such predictions when deployed. To address these challenges, we propose HypUC, a framework for imbalanced probabilistic regression in medical time series, making several contributions. (i) We introduce a simple kernel density-based technique to tackle the imbalanced regression problem with medical time series. (ii) Moreover, we employ a probabilistic regression framework that allows uncertainty estimation for the predicted continuous values. (iii) We also present a new approach to calibrate the predicted uncertainty further. (iv) Finally, we demonstrate a technique to use calibrated uncertainty estimates to improve the predicted continuous value and show the efficacy of the calibrated uncertainty estimates to flag unreliable predictions. HypUC is evaluated on a large, diverse, real-world dataset of ECGs collected from millions of patients, outperforming several conventional baselines on various diagnostic tasks, suggesting a potential use-case for the reliable clinical deployment of deep learning models.
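A kernel-density-based reweighting in the spirit of contribution (i) can be realized in a few lines; the sketch below is one plausible reading of the idea (not HypUC's exact formulation), upweighting rare target values by their inverse estimated density.

```python
import numpy as np
from scipy.stats import gaussian_kde

def inverse_density_weights(y: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Weight each training target by the inverse of its estimated density,
    so rare (abnormal) physiological values contribute more to the loss."""
    kde = gaussian_kde(y)
    density = kde(y)              # estimated p(y_i) for every training target
    w = 1.0 / (density + eps)
    return w / w.mean()           # normalize so the average weight is 1

# usage sketch: loss = (w * (y_pred - y_true) ** 2).mean()
```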
Fairness-Aware Domain Generalization under Covariate and Dependence Shifts
paper_authors: Chen Zhao, Kai Jiang, Xintao Wu, Haoliang Wang, Latifur Khan, Christan Grant, Feng Chen
for: This study aims to address domain shifts while simultaneously considering model fairness in machine learning, particularly in the context of invariant classifier generalization.
results: Extensive experiments on four benchmark datasets demonstrate that our method outperforms current state-of-the-art approaches in unseen domains.
Abstract
Achieving the generalization of an invariant classifier from source domains to shifted target domains while simultaneously considering model fairness is a substantial and complex challenge in machine learning. Existing domain generalization research typically attributes domain shifts to concept shift, which relates to alterations in class labels, and covariate shift, which pertains to variations in data styles. In this paper, by introducing another form of distribution shift, known as dependence shift, which involves variations in fair dependence patterns across domains, we propose a novel domain generalization approach that addresses domain shifts by considering both covariate and dependence shifts. We assert that an underlying transformation model exists that can transform data from one domain to another. By generating data in synthetic domains through the model, a fairness-aware invariant classifier is learned that enforces both model accuracy and fairness in unseen domains. Extensive empirical studies on four benchmark datasets demonstrate that our approach surpasses state-of-the-art methods.
Mechanical Characterization and Inverse Design of Stochastic Architected Metamaterials Using Neural Operators
results: The results show that combining neural operators with advanced micromechanical experimental techniques enables inverse design of target structures, with prediction errors in the 5-10% range. The approach remains effective under data scarcity and makes it feasible to design complex micro-architected materials with desired properties.
Abstract
Machine learning (ML) is emerging as a transformative tool for the design of architected materials, offering properties that far surpass those achievable through lab-based trial-and-error methods. However, a major challenge in current inverse design strategies is their reliance on extensive computational and/or experimental datasets, which becomes particularly problematic for designing micro-scale stochastic architected materials that exhibit nonlinear mechanical behaviors. Here, we introduce a new end-to-end scientific ML framework, leveraging deep neural operators (DeepONet), to directly learn the relationship between the complete microstructure and mechanical response of architected metamaterials from sparse but high-quality in situ experimental data. The approach facilitates the inverse design of structures tailored to specific nonlinear mechanical behaviors. Results obtained from spinodal microstructures, printed using two-photon lithography, reveal that the prediction error for mechanical responses is within a range of 5 - 10%. Our work underscores that by employing neural operators with advanced micro-mechanics experimental techniques, the design of complex micro-architected materials with desired properties becomes feasible, even in scenarios constrained by data scarcity. Our work marks a significant advancement in the field of materials-by-design, potentially heralding a new era in the discovery and development of next-generation metamaterials with unparalleled mechanical characteristics derived directly from experimental insights.
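For readers unfamiliar with deep neural operators, a minimal DeepONet has a branch network that encodes the input function (here, hypothetically, a sampled microstructure) and a trunk network that encodes the query point (e.g., a strain level), combined by a dot product. The sketch below shows only that structure; the authors' actual encoders and training setup are not reproduced.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal deep operator network: branch encodes the input function,
    trunk encodes query coordinates; their dot product is the output."""

    def __init__(self, n_sensors: int, query_dim: int,
                 width: int = 128, p: int = 64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(n_sensors, width), nn.ReLU(), nn.Linear(width, p))
        self.trunk = nn.Sequential(
            nn.Linear(query_dim, width), nn.ReLU(), nn.Linear(width, p))

    def forward(self, u, y):
        # u: (batch, n_sensors) sampled input function (e.g. microstructure)
        # y: (batch, query_dim)  query point (e.g. strain level)
        b = self.branch(u)
        t = self.trunk(y)
        return (b * t).sum(dim=-1, keepdim=True)  # (batch, 1) response
```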
Education Distillation: Getting Student Models to Learn in Schools
results: Compared with single distillation algorithms, combining education distillation strategies with distillation algorithms improves performance on the public CIFAR100, Caltech256, and Food-101 datasets.
Abstract
Knowledge distillation is one of the methods for model compression, and existing knowledge distillation techniques focus on how to improve the distillation algorithm so as to enhance distillation efficiency. This paper introduces dynamic incremental learning into knowledge distillation and proposes an education distillation strategy. Specifically, fragmented student models divided from the complete student model are treated as lower-grade models. As the grade level rises, fragmented student models deepen in conjunction with designed teaching reference layers, while learning and distilling from more teacher models. By moving from lower to higher grades, the fragmented student models are gradually integrated into a complete target student model, and their performance improves from stage to stage. Education distillation strategies combined with distillation algorithms outperform single distillation algorithms on the public CIFAR100, Caltech256, and Food-101 datasets.
Bridging Classical and Quantum Machine Learning: Knowledge Transfer From Classical to Quantum Neural Networks Using Knowledge Distillation
paper_authors: Mohammad Junayed Hasan, M. R. C. Mahdy
for: This paper aims to bridge the gap between classical machine learning and emergent quantum computing techniques by transferring knowledge from classical to quantum neural networks using knowledge distillation.
methods: The paper introduces a new method of transfer learning through classical-quantum integration using knowledge distillation, where classical convolutional neural network (CNN) architectures like LeNet and AlexNet serve as teacher networks to train student quantum models.
results: The approach yields significant performance improvements for the quantum models by solely depending on classical CNNs, with quantum models achieving an average accuracy improvement of 0.80% on the MNIST dataset and 5.40% on the more complex Fashion MNIST dataset.
Very recently, studies have shown that quantum neural networks surpass classical neural networks in tasks like image classification when a similar number of learnable parameters are used. However, the development and optimization of quantum models are currently hindered by issues such as qubit instability and limited qubit availability, leading to error-prone systems with weak performance. In contrast, classical models can exhibit high-performance owing to substantial resource availability. As a result, more studies have been focusing on hybrid classical-quantum integration. A line of research particularly focuses on transfer learning through classical-quantum integration or quantum-quantum approaches. Unlike previous studies, this paper introduces a new method to transfer knowledge from classical to quantum neural networks using knowledge distillation, effectively bridging the gap between classical machine learning and emergent quantum computing techniques. We adapt classical convolutional neural network (CNN) architectures like LeNet and AlexNet to serve as teacher networks, facilitating the training of student quantum models by sending supervisory signals during backpropagation through KL-divergence. The approach yields significant performance improvements for the quantum models by solely depending on classical CNNs, with quantum models achieving an average accuracy improvement of 0.80% on the MNIST dataset and 5.40% on the more complex Fashion MNIST dataset. Applying this technique eliminates the cumbersome training of huge quantum models for transfer learning in resource-constrained settings and enables re-using existing pre-trained classical models to improve performance.Thus, this study paves the way for future research in quantum machine learning (QML) by positioning knowledge distillation as a core technique for advancing QML applications.
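The supervisory signal described here is the standard knowledge-distillation objective; a minimal version is sketched below, assuming the quantum student exposes ordinary differentiable class logits (e.g., from a simulated variational circuit). The temperature and mixing weight are illustrative defaults, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.7):
    """Hinton-style distillation: KL between softened teacher and student
    distributions, blended with the usual cross-entropy on hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                       # standard T^2 scaling of the soft term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```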
Enhancing Intrusion Detection In Internet Of Vehicles Through Federated Learning
results: Using various performance metrics and comparisons with conventional classifiers, the authors demonstrate that the proposed method detects intrusions effectively while protecting sensitive data.
Abstract
Federated learning is a technique of decentralized machine learning that allows multiple parties to collaborate and learn a shared model without sharing their raw data. Our paper proposes a federated learning framework for intrusion detection in Internet of Vehicles (IOVs) using the CIC-IDS 2017 dataset. The proposed framework employs SMOTE for handling class imbalance, outlier detection for identifying and removing abnormal observations, and hyperparameter tuning to optimize the model's performance. The authors evaluated the proposed framework using various performance metrics and demonstrated its effectiveness in detecting intrusions with other datasets (KDD-Cup 99 and UNSW-NB-15) and conventional classifiers. Furthermore, the proposed framework can protect sensitive data while achieving high intrusion detection performance.
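For context, the aggregation step at the core of such a framework is typically federated averaging, sketched below under the assumption of float-valued model parameters; the names are illustrative and this is not the paper's implementation.

```python
import copy

def fed_avg(global_model, client_states, client_sizes):
    """Federated averaging: combine client parameter dicts weighted by local
    dataset size; raw (potentially sensitive) data never leaves a client."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(
            state[key] * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    global_model.load_state_dict(avg)
    return global_model
```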
Scalable AI Generative Content for Vehicular Network Semantic Communication
results: Higher detection rates than baseline methods, reliable compression of communication data, and the ability to integrate auxiliary information when bandwidth allows.
Abstract
Perceiving vehicles in a driver's blind spot is vital for safe driving. The detection of potentially dangerous vehicles in these blind spots can benefit from vehicular network semantic communication technology. However, efficient semantic communication involves a trade-off between accuracy and delay, especially in bandwidth-limited situations. This paper unveils a scalable Artificial Intelligence Generated Content (AIGC) system that leverages an encoder-decoder architecture. This system converts images into textual representations and reconstructs them into quality-acceptable images, optimizing transmission for vehicular network semantic communication. Moreover, when bandwidth allows, auxiliary information is integrated. The encoder-decoder aims to maintain semantic equivalence with the original images across various tasks. Then the proposed approach employs reinforcement learning to enhance the reliability of the generated contents. Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data. While this method is specifically designed for driving scenarios, this encoder-decoder architecture also holds potential for wide use across various semantic communication scenarios.
Archiving Body Movements: Collective Generation of Chinese Calligraphy
results: The results show that, through this interactive and generative approach, body movements can be transformed into calligraphy, and that the approach can stimulate further attention and discussion concerning Chinese characters and calligraphy.
Abstract
As a communication channel, body movements have been widely explored in behavioral studies and kinesics. Performing and visual arts share the same interests but focus on documenting and representing human body movements, such as for dance notation and visual work creation. This paper investigates body movements in oriental calligraphy and how to apply calligraphy principles to stimulate and archive body movements. Through an artwork (Wushu), the authors experiment with an interactive and generative approach to engage the audience's bodily participation and archive the body movements as a compendium of generated calligraphy. The audience assumes the role of both writers and readers; creating ("writing") and appreciating ("reading") the generated calligraphy becomes a cyclical process within this infinite "Book," which can motivate further attention and discussions concerning Chinese characters and calligraphy.
3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology
paper_authors: Asma Ben Abacha, Alberto Santamaria-Pang, Ho Hin Lee, Jameson Merkow, Qin Cai, Surya Teja Devarakonda, Abdullah Islam, Julia Gong, Matthew P. Lungren, Thomas Lin, Noel C Codella, Ivan Tarapov
results: The paper provides quantitative and qualitative analyses of the evaluated search strategies, along with recommendations for future research.
Abstract
The increasing use of medical imaging in healthcare settings presents a significant challenge due to the increasing workload for radiologists, yet it also offers opportunity for enhancing healthcare outcomes if effectively leveraged. 3D image retrieval holds potential to reduce radiologist workloads by enabling clinicians to efficiently search through diagnostically similar or otherwise relevant cases, resulting in faster and more precise diagnoses. However, the field of 3D medical image retrieval is still emerging, lacking established evaluation benchmarks, comprehensive datasets, and thorough studies. This paper attempts to bridge this gap by introducing a novel benchmark for 3D Medical Image Retrieval (3D-MIR) that encompasses four different anatomies imaged with computed tomography. Using this benchmark, we explore a diverse set of search strategies that use aggregated 2D slices, 3D volumes, and multi-modal embeddings from popular multi-modal foundation models as queries. Quantitative and qualitative assessments of each approach are provided alongside an in-depth discussion that offers insight for future research. To promote the advancement of this field, our benchmark, dataset, and code are made publicly available.
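One of the simplest strategies in this space, aggregating per-slice 2D embeddings into a volume vector and ranking candidates by cosine similarity, can be sketched as follows. This is an illustrative baseline, not necessarily how the benchmark's pipelines are implemented.

```python
import numpy as np

def volume_embedding(slice_embeddings: np.ndarray) -> np.ndarray:
    """Aggregate per-slice 2D embeddings of shape (n_slices, dim) by mean pooling."""
    return slice_embeddings.mean(axis=0)

def retrieve(query_vec: np.ndarray, corpus: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k corpus volumes most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return np.argsort(-(c @ q))[:k]
```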
Towards Transferable Multi-modal Perception Representation Learning for Autonomy: NeRF-Supervised Masked AutoEncoder
results: Representations learned with NS-MAE transfer well to diverse multi-modal and single-modal (camera-only and Lidar-only) perception models, performing strongly on 3D perception downstream tasks (3D object detection and BEV map segmentation) with only limited fine-tuning labels. We also find that NS-MAE benefits from the synergy of the masked autoencoder and NeRF mechanisms.
Abstract
This work proposes a unified self-supervised pre-training framework for transferable multi-modal perception representation learning via masked multi-modal reconstruction in Neural Radiance Field (NeRF), namely NeRF-Supervised Masked AutoEncoder (NS-MAE). Specifically, conditioned on certain view directions and locations, multi-modal embeddings extracted from corrupted multi-modal input signals, i.e., Lidar point clouds and images, are rendered into projected multi-modal feature maps via neural rendering. Then, original multi-modal signals serve as reconstruction targets for the rendered multi-modal feature maps to enable self-supervised representation learning. Extensive experiments show that the representation learned via NS-MAE shows promising transferability for diverse multi-modal and single-modal (camera-only and Lidar-only) perception models on diverse 3D perception downstream tasks (3D object detection and BEV map segmentation) with diverse amounts of fine-tuning labeled data. Moreover, we empirically find that NS-MAE enjoys the synergy of both the mechanism of masked autoencoder and neural radiance field. Our code shall be released upon acceptance.
Security and Privacy Challenges in Deep Learning Models
results: The study finds that deep learning models can be attacked at different stages of their lifecycle, threatening model security and data privacy, and identifies possible defense strategies that can help reduce the risk of such attacks.
Abstract
These days, deep learning models have achieved great success in multiple fields, from autonomous driving to medical diagnosis. These models have expanded the abilities of artificial intelligence by offering great solutions to complex problems that were very difficult to solve earlier. In spite of their success in various fields, research has identified that deep learning models can be subjected to various attacks that compromise model security and data privacy of deep neural network models. Deep learning models can be attacked at different stages of their lifecycle. During the testing phase, attackers can exploit vulnerabilities through different kinds of attacks such as Model Extraction Attacks, Model Inversion Attacks, and Adversarial Attacks. Model Extraction Attacks are aimed at reverse-engineering a trained deep learning model, with the primary objective of revealing its architecture and parameters. Model Inversion Attacks aim to compromise the privacy of the data used in the deep learning model. These attacks compromise the confidentiality of the model by recovering sensitive training data from the model's predictions. By analyzing the model's responses, attackers aim to reconstruct sensitive information. In this way, the model's data privacy is compromised. Adversarial attacks, mainly employed on computer vision models, corrupt models into confidently making incorrect predictions through malicious testing data. These attacks subtly alter the input data, making it look normal but misleading deep learning models into incorrect decisions. Such attacks can happen during both the model's evaluation and training phases. Data Poisoning Attacks add harmful data to the training set, disrupting the learning process and reducing the reliability of the deep learning model.
FinMe: A Performance-Enhanced Large Language Model Trading Agent with Layered Memory and Character Design
results: Compared to other algorithmic agents, FinMe delivers outstanding trading performance on a large-scale real-world financial dataset, and by self-adapting its professional knowledge to new investment cues it continuously improves trading returns.
Abstract
Recent advancements in Large Language Models (LLMs) have exhibited notable efficacy in question-answering (QA) tasks across diverse domains. Their prowess in integrating extensive web knowledge has fueled interest in developing LLM autonomous agents. While LLMs are efficient in decoding human instructions and deriving solutions by holistically processing historical inputs, transitioning to purpose-driven agents requires a supplementary rational architecture to process multi-source information, establish reasoning chains, and prioritize critical tasks. Addressing this, we introduce FinMe, a novel LLM-based agent framework devised for financial decision-making, encompassing three core modules: Profiling, to outline the agent's characteristics; Memory, with layered processing, to aid the agent in assimilating realistic hierarchical financial data; and Decision-making, to convert insights gained from memories into investment decisions. Notably, FinMe's memory module aligns closely with the cognitive structure of human traders, offering robust interpretability and real-time tuning. Its adjustable cognitive span allows for the retention of critical information beyond human perceptual limits, thereby enhancing trading outcomes. This framework enables the agent to self-evolve its professional knowledge, react agilely to new investment cues, and continuously refine trading decisions in the volatile financial environment. We first compare FinMe with various algorithmic agents on a scalable real-world financial dataset, underscoring its leading trading performance in stocks and funds. We then fine-tuned the agent's perceptual spans to achieve a significant trading performance. Collectively, FinMe presents a cutting-edge LLM agent framework for automated trading, boosting cumulative investment returns.
OASIS: Offsetting Active Reconstruction Attacks in Federated Learning
paper_authors: Tre’ R. Jeter, Truc Nguyen, Raed Alharbi, My T. Thai
for: The paper is written to address the challenge of active reconstruction attacks in Federated Learning (FL), which can compromise user privacy and threaten the security of model training.
methods: The proposed defense mechanism is based on image augmentation, which is used to undermine the attack principle of gradient inversion.
results: Comprehensive evaluations demonstrate the efficacy of OASIS, highlighting its feasibility as a solution to protect user privacy and enhance model training efficiency in FL.
Abstract
Federated Learning (FL) has garnered significant attention for its potential to protect user privacy while enhancing model training efficiency. However, recent research has demonstrated that FL protocols can be easily compromised by active reconstruction attacks executed by dishonest servers. These attacks involve the malicious modification of global model parameters, allowing the server to obtain a verbatim copy of users' private data by inverting their gradient updates. Tackling this class of attack remains a crucial challenge due to the strong threat model. In this paper, we propose OASIS, a defense mechanism based on image augmentation that effectively counteracts active reconstruction attacks while preserving model performance. We first uncover the core principle of gradient inversion that enables these attacks and theoretically identify the main conditions by which the defense can be robust regardless of the attack strategies. We then construct OASIS with image augmentation showing that it can undermine the attack principle. Comprehensive evaluations demonstrate the efficacy of OASIS highlighting its feasibility as a solution.
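Conceptually, an augmentation-based defense computes the client gradient on randomly augmented views, so a server inverting that gradient recovers augmented images rather than raw data. The sketch below illustrates the idea with standard torchvision transforms; it is a simplified reading, not OASIS's exact mechanism, and the transform choices are assumptions.

```python
import torch
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(32, scale=(0.6, 1.0)),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.4, contrast=0.4),
])

def local_update(model, loss_fn, images, labels):
    """Compute the client gradient on augmented views only, so a server
    inverting the gradient reconstructs augmented images, not raw data."""
    views = torch.stack([augment(img) for img in images])  # (N, C, H, W)
    loss = loss_fn(model(views), labels)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return grads
```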
methods: The study collects annotations of hate speech and offensive language under five experimental conditions of an annotation instrument, fine-tunes BERT models on each resulting dataset, and evaluates model performance and predictions on a holdout portion of each condition.
results: The study finds that different annotation instrument designs lead to different distributions of hate speech/offensive language annotations, as well as differences in model performance and predictions, showing that instrument design has an important impact on downstream models.
Abstract
When training data are collected from human annotators, the design of the annotation instrument, the instructions given to annotators, the characteristics of the annotators, and their interactions can impact training data. This study demonstrates that design choices made when creating an annotation instrument also impact the models trained on the resulting annotations. We introduce the term annotation sensitivity to refer to the impact of annotation data collection methods on the annotations themselves and on downstream model performance and predictions. We collect annotations of hate speech and offensive language in five experimental conditions of an annotation instrument, randomly assigning annotators to conditions. We then fine-tune BERT models on each of the five resulting datasets and evaluate model performance on a holdout portion of each condition. We find considerable differences between the conditions for 1) the share of hate speech/offensive language annotations, 2) model performance, 3) model predictions, and 4) model learning curves. Our results emphasize the crucial role played by the annotation instrument which has received little attention in the machine learning literature. We call for additional research into how and why the instrument impacts the annotations to inform the development of best practices in instrument design.
A Systematic Review of Deep Learning-based Research on Radiology Report Generation
results: This study provides a comprehensive review of deep learning-based free-text report generation from medical images, covering and comparing different approaches and analyzing future trends.
Abstract
Radiology report generation (RRG) aims to automatically generate free-text descriptions from clinical radiographs, e.g., chest X-ray images. RRG plays an essential role in promoting clinical automation and can provide practical assistance to inexperienced doctors while alleviating radiologists' workloads. Given this meaningful potential, research on RRG has experienced explosive growth in the past half-decade, especially with the rapid development of deep learning approaches. Existing studies perform RRG from the perspective of enhancing different modalities, provide insights on optimizing the report generation process with elaborated features from both visual and textual information, and further facilitate RRG with cross-modal interactions among them. In this paper, we present a comprehensive review of deep learning-based RRG from various perspectives. Specifically, we first cover pivotal RRG approaches based on the task-specific features of radiographs, reports, and the cross-modal relations between them, then illustrate the benchmark datasets conventionally used for this task along with evaluation metrics, subsequently analyze the performance of different approaches, and finally offer our summary of the challenges and trends in future directions. Overall, the goal of this paper is to serve as a tool for understanding existing literature and inspiring potentially valuable research in the field of RRG.
Question Answering in Natural Language: the Special Case of Temporal Expressions
results: Our evaluation shows that applying pattern-matching techniques to temporal questions makes it possible to answer them accurately.
Abstract
Although general question answering has been well explored in recent years, temporal question answering is a task which has not received as much focus. Our work aims to leverage a popular approach used for general question answering, answer extraction, in order to find answers to temporal questions within a paragraph. To train our model, we propose a new dataset, inspired by SQuAD, specifically tailored to provide rich temporal information. We chose to adapt the corpus WikiWars, which contains several documents on history's greatest conflicts. Our evaluation shows that a deep learning model trained to perform pattern matching, often used in general question answering, can be adapted to temporal question answering, if we accept to ask questions whose answers must be directly present within a text.
Searching for Snippets of Open-Domain Dialogue in Task-Oriented Dialogue Datasets
results: The study finds that these training sets already contain social-talk sequences, suggesting that chitchat can be incorporated into task-oriented dialogues to enhance them.
Abstract
Most existing dialogue corpora and models have been designed to fit into two predominant categories: task-oriented dialogues portray functional goals, such as making a restaurant reservation or booking a plane ticket, while chit-chat/open-domain dialogues focus on holding a socially engaging talk with a user. However, humans tend to seamlessly switch between modes and even use chitchat to enhance task-oriented conversations. To bridge this gap, new datasets have recently been created, blending both communication modes into conversation examples. The approaches used tend to rely on adding chit-chat snippets to pre-existing, human-generated task-oriented datasets. Given the tendencies observed in humans, we wonder, however, if the latter do not already hold chit-chat sequences. By using topic modeling and searching for topics which are most similar to a set of keywords related to social talk, we explore the training sets of Schema-Guided Dialogues and MultiWOZ. Our study shows that sequences related to social talk are indeed naturally present, motivating further research on ways chitchat is combined into task-oriented dialogues.
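A lightweight version of this analysis, fitting a topic model and scoring each topic by overlap with a small social-talk keyword list, might look like the sketch below; the keyword list and scoring rule are illustrative assumptions, and the paper's actual keywords and similarity measure may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

SOCIAL_KEYWORDS = {"thanks", "welcome", "great", "nice", "sorry", "hello"}

def social_topic_scores(utterances, n_topics: int = 20, top_n: int = 10):
    """Fit LDA over dialogue utterances and score each topic by how many of
    its top words overlap with a social-talk keyword list."""
    vec = CountVectorizer(stop_words="english", min_df=2)
    X = vec.fit_transform(utterances)
    lda = LatentDirichletAllocation(n_components=n_topics,
                                    random_state=0).fit(X)
    vocab = np.array(vec.get_feature_names_out())
    scores = []
    for topic in lda.components_:
        top_words = set(vocab[np.argsort(-topic)[:top_n]])
        scores.append(len(top_words & SOCIAL_KEYWORDS) / top_n)
    return scores  # high score -> topic likely reflects chitchat
```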
Enhancing Task-Oriented Dialogues with Chitchat: a Comparative Study Based on Lexical Diversity and Divergence
results: The study finds that the different chitchat-augmentation methods can improve the diversity and naturalness of task-oriented dialogues and help mitigate repetitive and predictable responses.
Abstract
As a recent development, task-oriented dialogues (TODs) have been enriched with chitchat in an effort to make dialogues more diverse and engaging. This enhancement is particularly valuable as TODs are often confined to narrow domains, making the mitigation of repetitive and predictable responses a significant challenge. This paper presents a comparative analysis of three chitchat enhancements, aiming to identify the most effective approach in terms of diversity. Additionally, we quantify the divergence between the added chitchat, the original task-oriented language, and chitchat typically found in chitchat datasets, highlighting the top 20 divergent keywords for each comparison. Our findings drive a discussion on future enhancements for augmenting TODs, emphasizing the importance of grounding dialogues beyond the task to achieve more diverse and natural exchanges.
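Lexical diversity in comparisons like these is commonly measured with distinct-n, the ratio of unique to total n-grams in a corpus; a minimal implementation is shown below as an example of the family of metrics involved, not necessarily the exact one used in the paper.

```python
from collections import Counter

def distinct_n(texts, n: int = 2) -> float:
    """Distinct-n: unique n-grams divided by total n-grams across a corpus,
    a common lexical-diversity measure for dialogue data."""
    ngrams = Counter()
    for text in texts:
        toks = text.lower().split()
        ngrams.update(zip(*(toks[i:] for i in range(n))))
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0
```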
results: Evaluating a range of publicly available VSR models shows markedly worse performance on the WildVSR test set than on LRS3, with the increase in word error rates attributed to the models' inability to generalize to slightly harder, in-the-wild lip sequences.
Abstract
The Lip Reading Sentences-3 (LRS3) benchmark has primarily been the focus of intense research in visual speech recognition (VSR) during the last few years. As a result, there is an increased risk of overfitting to its excessively used test set, which is only one hour in duration. To alleviate this issue, we build a new VSR test set named WildVSR, by closely following the LRS3 dataset creation processes. We then evaluate and analyse the extent to which the current VSR models generalize to the new test data. We evaluate a broad range of publicly available VSR models and find significant drops in performance on our test set, compared to their corresponding LRS3 results. Our results suggest that the increase in word error rates is caused by the models' inability to generalize to slightly harder and in-the-wild lip sequences than those found in the LRS3 test set. Our new test benchmark is made public in order to enable future research towards more robust VSR models.
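The word error rate referred to here is the word-level Levenshtein distance normalized by reference length; a compact reference implementation for readers who want to reproduce the comparison:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] = edit distance between ref[:i] and hyp[:j], updated row by row
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev_diag, d[j] = d[j], min(
                d[j] + 1,                  # deletion of r
                d[j - 1] + 1,              # insertion of h
                prev_diag + (r != h),      # substitution (free if a match)
            )
    return d[-1] / max(len(ref), 1)
```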
Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark
methods: The paper uses a new annotation guide that unifies music-industry guidelines, covering punctuation, line breaks, spelling, background vocals, and non-word sounds. It also introduces a suite of evaluation metrics to replace the traditional word error rate.
results: The results show that the new annotation guide and evaluation metrics enable better assessment of lyrics transcription systems, capturing details of the lyrics such as rhythm, emotional emphasis, rhyme, and high-level structure.
Abstract
Current automatic lyrics transcription (ALT) benchmarks focus exclusively on word content and ignore the finer nuances of written lyrics including formatting and punctuation, which leads to a potential misalignment with the creative products of musicians and songwriters as well as listeners' experiences. For example, line breaks are important in conveying information about rhythm, emotional emphasis, rhyme, and high-level structure. To address this issue, we introduce Jam-ALT, a new lyrics transcription benchmark based on the JamendoLyrics dataset. Our contribution is twofold. Firstly, a complete revision of the transcripts, geared specifically towards ALT evaluation by following a newly created annotation guide that unifies the music industry's guidelines, covering aspects such as punctuation, line breaks, spelling, background vocals, and non-word sounds. Secondly, a suite of evaluation metrics designed, unlike the traditional word error rate, to capture such phenomena. We hope that the proposed benchmark contributes to the ALT task, enabling more precise and reliable assessments of transcription systems and enhancing the user experience in lyrics applications such as subtitle renderings for live captioning or karaoke.
results: Across different datasets and models, the proposed method significantly improves the effectiveness of textual backdoor attacks; in particular, in the dirty-label setting it achieves an attack success rate above 90% with only 10 poisoned samples.
Abstract
With the boom in the natural language processing (NLP) field in recent years, backdoor attacks pose immense threats against deep neural network models. However, previous works hardly consider the effect of the poisoning rate. In this paper, our main objective is to reduce the number of poisoned samples while still achieving a satisfactory Attack Success Rate (ASR) in text backdoor attacks. To accomplish this, we propose an efficient trigger word insertion strategy in terms of trigger word optimization and poisoned sample selection. Extensive experiments on different datasets and models demonstrate that our proposed method can significantly improve attack effectiveness in text classification tasks. Remarkably, our approach achieves an ASR of over 90% with only 10 poisoned samples in the dirty-label setting and requires merely 1.5% of the training data in the clean-label setting.
MLLM-Bench, Evaluating Multi-modal LLMs using GPT-4V
for: The paper aims to address the challenge of evaluating the efficacy of multi-modal language models (MLLMs) due to the subjective nature of tasks that lack definitive answers.
methods: The paper introduces MLLM-Bench, a novel benchmark inspired by Vicuna, which spans a diverse array of scenarios, including Perception, Understanding, Applying, Analyzing, Evaluating, and Creation, to provide a more holistic assessment of model performance.
results: Comparative evaluations indicate a significant performance gap between existing open-source models and GPT-4V, demonstrating the effectiveness of MLLM-Bench in assessing the capabilities of vision-language models.
Abstract
In the pursuit of Artificial General Intelligence (AGI), the integration of vision in language models has marked a significant milestone. The advent of vision-language models (MLLMs) like GPT-4V has expanded AI applications, aligning with the multi-modal capabilities of the human brain. However, evaluating the efficacy of MLLMs poses a substantial challenge due to the subjective nature of tasks that lack definitive answers. Existing automatic evaluation methodologies on multi-modal large language models rely on objective queries that have standard answers, inadequately addressing the nuances of creative and associative multi-modal tasks. To address this, we introduce MLLM-Bench, an innovative benchmark inspired by Vicuna, spanning a diverse array of scenarios, including Perception, Understanding, Applying, Analyzing, Evaluating, and Creation, along with ethical considerations. MLLM-Bench is designed to reflect user experience more accurately and provide a more holistic assessment of model performance. Comparative evaluations indicate a significant performance gap between existing open-source models and GPT-4V. We posit that MLLM-Bench will catalyze progress in the open-source community towards developing user-centric vision-language models that meet a broad spectrum of real-world applications. See the online leaderboard at https://mllm-bench.llmzoo.com.
Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification
results: Evaluating detoxification tasks across multiple languages, the study identifies effective cross-lingual detoxification strategies and introduces new automatic detoxification evaluation metrics that correlate with human judgments better than previous benchmarks.
Abstract
Text detoxification is the task of transferring the style of text from toxic to neutral. While there are approaches yielding promising results in the monolingual setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer for this task remains a challenging open problem (Moskovskiy et al., 2022). In this work, we present a large-scale study of strategies for cross-lingual text detoxification -- given a parallel detoxification corpus for one language, the goal is to transfer detoxification ability to another language for which we do not have such a corpus. Moreover, we are the first to explore a new task where text translation and detoxification are performed simultaneously, providing several strong baselines for this task. Finally, we introduce new automatic detoxification evaluation metrics with higher correlations with human judgments than previous benchmarks. We assess the most promising approaches also with manual markup, determining the answer for the best strategy to transfer the knowledge of text detoxification between languages.
Some Like It Small: Czech Semantic Embedding Models for Industry Applications
results: Through intrinsic and extrinsic analyses, the researchers show that the small models are competitive with significantly larger counterparts while being roughly 8 times smaller and 5 times faster. The embedding models have also been deployed in the Seznam.cz search engine, improving the search experience in organic search, featured snippets, and image search.
Abstract
This article focuses on the development and evaluation of Small-sized Czech sentence embedding models. Small models are important components for real-time industry applications in resource-constrained environments. Given the limited availability of labeled Czech data, alternative approaches, including pre-training, knowledge distillation, and unsupervised contrastive fine-tuning, are investigated. Comprehensive intrinsic and extrinsic analyses are conducted, showcasing the competitive performance of our models compared to significantly larger counterparts, with approximately 8 times smaller size and 5 times faster speed than conventional Base-sized models. To promote cooperation and reproducibility, both the models and the evaluation pipeline are made publicly accessible. Ultimately, this article presents practical applications of the developed sentence embedding models in Seznam.cz, the Czech search engine. These models have effectively replaced previous counterparts, enhancing the overall search experience for instance, in organic search, featured snippets, and image search. This transition has yielded improved performance.
Dialogue Quality and Emotion Annotations for Customer Support Conversations
paper_authors: John Mendonça, Patrícia Pereira, Miguel Menezes, Vera Cabarrão, Ana C. Farinha, Helena Moniz, João Paulo Carvalho, Alon Lavie, Isabel Trancoso
results: The paper provides a holistic annotation approach for dialogue quality and emotion recognition, offering a valuable resource for the development of dialogue applications.
Abstract
Task-oriented conversational datasets often lack topic variability and linguistic diversity. However, with the advent of Large Language Models (LLMs) pretrained on extensive, multilingual and diverse text data, these limitations seem overcome. Nevertheless, their generalisability to different languages and domains in dialogue applications remains uncertain without benchmarking datasets. This paper presents a holistic annotation approach for emotion and conversational quality in the context of bilingual customer support conversations. By performing annotations that take into consideration the complete instances that compose a conversation, one can form a broader perspective of the dialogue as a whole. Furthermore, it provides a unique and valuable resource for the development of text classification models. To this end, we present benchmarks for Emotion Recognition and Dialogue Quality Estimation and show that further research is needed to leverage these models in a production setting.
Grammatical Error Correction via Mixed-Grained Weighted Training
results: In both the Seq2Seq and Seq2Edit settings, MainGEC achieves consistent and statistically significant performance improvements on two benchmark datasets, demonstrating the effectiveness and superiority of mixed-grained weighted training.
Abstract
The task of Grammatical Error Correction (GEC) aims to automatically correct grammatical errors in natural texts. Almost all previous works treat annotated training data equally, but inherent discrepancies in data are neglected. In this paper, the inherent discrepancies are manifested in two aspects, namely, accuracy of data annotation and diversity of potential annotations. To this end, we propose MainGEC, which designs token-level and sentence-level training weights based on inherent discrepancies in accuracy and potential diversity of data annotation, respectively, and then conducts mixed-grained weighted training to improve the training effect for GEC. Empirical evaluation shows that whether in the Seq2Seq or Seq2Edit manner, MainGEC achieves consistent and significant performance improvements on two benchmark datasets, demonstrating the effectiveness and superiority of the mixed-grained weighted training. Further ablation experiments verify the effectiveness of designed weights of both granularities in MainGEC.
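As a rough illustration of the training objective, the sketch below applies token-level and sentence-level weights to a standard cross-entropy loss in PyTorch. The weight tensors `token_w` and `sent_w` stand in for the annotation-derived weights the paper describes; how they are computed is not reproduced here.

```python
# Minimal sketch of mixed-grained weighted training for a Seq2Seq GEC model.
# token_w / sent_w are placeholders for weights derived from annotation
# accuracy and potential annotation diversity (see abstract).
import torch
import torch.nn.functional as F

def mixed_grained_loss(logits, targets, token_w, sent_w, pad_id=0):
    # logits: (batch, seq_len, vocab); targets: (batch, seq_len)
    # token_w: (batch, seq_len) per-token weights; sent_w: (batch,) per-sentence weights
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, ignore_index=pad_id, reduction="none"
    )  # (batch, seq_len)
    mask = (targets != pad_id).float()
    weighted = per_token * token_w * mask            # token-level weighting
    per_sent = weighted.sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
    return (sent_w * per_sent).mean()                # sentence-level weighting

# Toy usage with random tensors:
logits = torch.randn(2, 5, 100, requires_grad=True)
targets = torch.randint(1, 100, (2, 5))
loss = mixed_grained_loss(logits, targets,
                          token_w=torch.ones(2, 5), sent_w=torch.tensor([1.0, 0.5]))
loss.backward()
```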
Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
results: In a thorough user study, concepts generated with Lego were preferred over the baseline more than 70% of the time. In addition, visual question answering with a large language model indicated that Lego-generated concepts align better with the textual descriptions of the concepts.
Abstract
Diffusion models have revolutionized generative content creation and text-to-image (T2I) diffusion models in particular have increased the creative freedom of users by allowing scene synthesis using natural language. T2I models excel at synthesizing concepts such as nouns, appearances, and styles. To enable customized content creation based on a few example images of a concept, methods such as Textual Inversion and DreamBooth invert the desired concept and enable synthesizing it in new scenes. However, inverting more general concepts that go beyond object appearance and style (adjectives and verbs) through natural language, remains a challenge. Two key characteristics of these concepts contribute to the limitations of current inversion methods. 1) Adjectives and verbs are entangled with nouns (subject) and can hinder appearance-based inversion methods, where the subject appearance leaks into the concept embedding and 2) describing such concepts often extends beyond single word embeddings (being frozen in ice, walking on a tightrope, etc.) that current methods do not handle. In this study, we introduce Lego, a textual inversion method designed to invert subject entangled concepts from a few example images. Lego disentangles concepts from their associated subjects using a simple yet effective Subject Separation step and employs a Context Loss that guides the inversion of single/multi-embedding concepts. In a thorough user study, Lego-generated concepts were preferred over 70% of the time when compared to the baseline. Additionally, visual question answering using a large language model suggested Lego-generated concepts are better aligned with the text description of the concept.
results: Experimental results show that AdaTyper adapts to new semantic types and shifted data distributions, reaching an average precision of 0.6 after seeing only 5 examples.
Abstract
Understanding the semantics of relational tables is instrumental for automation in data exploration and preparation systems. A key source for understanding a table is the semantics of its columns. With the rise of deep learning, learned table representations are now available, which can be applied for semantic type detection and achieve good performance on benchmarks. Nevertheless, we observe a gap between this performance and its applicability in practice. In this paper, we propose AdaTyper to address one of the most critical deployment challenges: adaptation. AdaTyper uses weak-supervision to adapt a hybrid type predictor towards new semantic types and shifted data distributions at inference time, using minimal human feedback. The hybrid type predictor of AdaTyper combines rule-based methods and a light machine learning model for semantic column type detection. We evaluate the adaptation performance of AdaTyper on real-world database tables hand-annotated with semantic column types through crowdsourcing and find that the f1-score improves for new and existing types. AdaTyper approaches an average precision of 0.6 after only seeing 5 examples, significantly outperforming existing adaptation methods based on human-provided regular expressions or dictionaries.
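To make the hybrid predictor concrete, here is a minimal sketch combining a regex rule layer with a light logistic-regression fallback. The rules, features, and type labels are hypothetical examples, not AdaTyper's actual components.

```python
# Illustrative sketch of a hybrid (rules + light ML) semantic column-type
# predictor in the spirit of AdaTyper. Regexes, features, and types are
# hypothetical examples, not the paper's actual rule set.
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

RULES = [  # (type, regex applied to every cell value)
    ("email", re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$")),
    ("year",  re.compile(r"^(19|20)\d{2}$")),
]

def rule_predict(column):
    for type_name, rx in RULES:
        if all(rx.match(str(v)) for v in column):
            return type_name
    return None

def featurize(column):
    vals = [str(v) for v in column]
    lengths = [len(v) for v in vals]
    return [np.mean(lengths), np.std(lengths),
            np.mean([v.isdigit() for v in vals])]

# Light ML fallback, trained on featurized labeled columns.
X_train = [featurize(["alice", "bob", "carol"]), featurize(["12.5", "3.1", "7"])]
y_train = ["name", "measurement"]
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def hybrid_predict(column):
    hit = rule_predict(column)                 # rules take precedence
    return hit if hit else clf.predict([featurize(column)])[0]

print(hybrid_predict(["a@b.com", "c@d.org"]))  # -> "email" via rule
print(hybrid_predict(["dave", "erin"]))        # -> ML fallback
```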
DaG LLM ver 1.0: Pioneering Instruction-Tuned Language Modeling for Korean NLP
paper_authors: Dongjun Jang, Sangah Lee, Sungjoo Byun, Jinwoong Kim, Jean Seo, Minseok Kim, Soyeon Kim, Chaeyoung Oh, Jaeyoon Kim, Hyemi Jo, Hyopil Shin
results: After instruction tuning, the DaG LLM achieves strong results across all 41 tasks in 13 distinct categories.
Abstract
This paper presents the DaG LLM (David and Goliath Large Language Model), a language model specialized for Korean and fine-tuned through Instruction Tuning across 41 tasks within 13 distinct categories.
Transformer-based Named Entity Recognition in Construction Supply Chain Risk Management in Australia
results: By analysing news articles, the transformer models extract entities and insights related to specific risk taxonomies, providing valuable information for supply chain risk management in the Australian construction industry.
Abstract
The construction industry in Australia is characterized by its intricate supply chains and vulnerability to myriad risks. As such, effective supply chain risk management (SCRM) becomes imperative. This paper employs different transformer models, training them for Named Entity Recognition (NER) in the context of Australian construction SCRM. Utilizing NER, the transformer models identify and classify specific risk-associated entities in news articles, offering a detailed insight into supply chain vulnerabilities. By analysing news articles through different transformer models, we can extract relevant entities and insights related to risk taxonomies specific to the Australian construction landscape. This research emphasises the potential of NLP-driven solutions, like transformer models, in revolutionising SCRM for construction in geo-media specific contexts.
results: The analysis shows that in the subspace of small eigenvalues, ASGD decays the bias error faster than SGD, while in the subspace of large eigenvalues its bias error decays more slowly; moreover, the variance error of ASGD is always larger than that of SGD. Consequently, ASGD can outperform SGD when the difference between the initialization and the true weight vector is mostly confined to the subspace of small eigenvalues.
Abstract
Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning and often achieves better generalization performance than SGD. However, existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization. In this paper, we study the generalization of ASGD for overparameterized linear regression, which is possibly the simplest setting of learning with overparameterization. We establish an instance-dependent excess risk bound for ASGD within each eigen-subspace of the data covariance matrix. Our analysis shows that (i) ASGD outperforms SGD in the subspace of small eigenvalues, exhibiting a faster rate of exponential decay for bias error, while in the subspace of large eigenvalues, its bias error decays slower than SGD; and (ii) the variance error of ASGD is always larger than that of SGD. Our result suggests that ASGD can outperform SGD when the difference between the initialization and the true weight vector is mostly confined to the subspace of small eigenvalues. Additionally, when our analysis is specialized to linear regression in the strongly convex setting, it yields a tighter bound for bias error than the best-known result.
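The eigen-subspace claim can be checked numerically on a toy quadratic. The sketch below compares plain gradient descent with a Nesterov-accelerated variant on a two-dimensional quadratic with one large and one small eigenvalue; it illustrates the qualitative bias-decay behavior only, not the paper's stochastic analysis.

```python
# Toy illustration (not the paper's analysis): compare how plain GD and
# Nesterov-accelerated GD shrink the bias error in large- vs small-eigenvalue
# directions of a quadratic objective f(w) = 0.5 * (w - w*)^T H (w - w*).
import numpy as np

H = np.diag([1.0, 0.01])          # one large, one small eigenvalue
w_star = np.zeros(2)
lr = 0.9                          # stable since lr < 2 / lambda_max

def run(accelerated, steps=500):
    w, v = np.array([1.0, 1.0]), np.zeros(2)
    errs = []
    for _ in range(steps):
        if accelerated:           # Nesterov lookahead step
            grad = H @ (w + 0.9 * v - w_star)
            v = 0.9 * v - lr * grad
            w = w + v
        else:
            w = w - lr * (H @ (w - w_star))
        errs.append(np.abs(w - w_star))
    return np.array(errs)

gd, agd = run(False), run(True)
# In the small-eigenvalue direction (index 1) acceleration decays much faster:
print("GD  error (small-eig dir) after 500 steps:", gd[-1, 1])
print("AGD error (small-eig dir) after 500 steps:", agd[-1, 1])
```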
Assumption-lean and Data-adaptive Post-Prediction Inference
results: Simulations and large-scale genomic data demonstrate the superiority and applicability of the method, and show that its assumption-lean and data-adaptive properties yield efficiency gains over existing post-prediction inference methods.
Abstract
A primary challenge facing modern scientific research is the limited availability of gold-standard data which can be both costly and labor-intensive to obtain. With the rapid development of machine learning (ML), scientists have relied on ML algorithms to predict these gold-standard outcomes with easily obtained covariates. However, these predicted outcomes are often used directly in subsequent statistical analyses, ignoring imprecision and heterogeneity introduced by the prediction procedure. This will likely result in false positive findings and invalid scientific conclusions. In this work, we introduce an assumption-lean and data-adaptive Post-Prediction Inference (POP-Inf) procedure that allows valid and powerful inference based on ML-predicted outcomes. Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML-prediction, for a wide range of statistical quantities. Its "data-adaptive'" feature guarantees an efficiency gain over existing post-prediction inference methods, regardless of the accuracy of ML-prediction. We demonstrate the superiority and applicability of our method through simulations and large-scale genomic data.
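A generic sketch of the post-prediction inference idea for a population mean is given below: combine ML predictions on a large unlabeled set with a bias correction ("rectifier") estimated on a small gold-standard set. This follows the general rectified-estimator recipe and is not the authors' exact POP-Inf procedure.

```python
# Generic sketch of post-prediction inference for a population mean:
# ML predictions on a large unlabeled set plus a bias correction estimated
# on a small gold-standard set. Illustrative only; not the POP-Inf method.
import numpy as np

rng = np.random.default_rng(0)
n_lab, n_unlab = 200, 10000
f = lambda x: 2.0 * x + 0.3              # an imperfect "ML" predictor
x_lab = rng.normal(size=n_lab); y_lab = 2.0 * x_lab + rng.normal(size=n_lab)
x_unlab = rng.normal(size=n_unlab)

pred_mean = f(x_unlab).mean()            # prediction-based estimate
rectifier = (y_lab - f(x_lab)).mean()    # prediction bias, from gold data
theta_hat = pred_mean + rectifier        # rectified point estimate

# Variance: the two terms use independent samples, so variances add.
se = np.sqrt(f(x_unlab).var(ddof=1) / n_unlab
             + (y_lab - f(x_lab)).var(ddof=1) / n_lab)
print(f"estimate = {theta_hat:.3f} +/- {1.96 * se:.3f} (95% CI)")
```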
Extending Variability-Aware Model Selection with Bias Detection in Machine Learning Projects
results: The method is applied in a heart failure prediction case study and can turn model selection into an explainable, adaptive, and non ad hoc process.
Abstract
Data science projects often involve various machine learning (ML) methods that depend on data, code, and models. One of the key activities in these projects is the selection of a model or algorithm that is appropriate for the data analysis at hand. ML model selection depends on several factors, which include data-related attributes such as sample size, functional requirements such as the prediction algorithm type, and non-functional requirements such as performance and bias. However, the factors that influence such selection are often not well understood and explicitly represented. This paper describes ongoing work on extending an adaptive variability-aware model selection method with bias detection in ML projects. The method involves: (i) modeling the variability of the factors that affect model selection using feature models based on heuristics proposed in the literature; (ii) instantiating our variability model with added features related to bias (e.g., bias-related metrics); and (iii) conducting experiments that illustrate the method in a specific case study to illustrate our approach based on a heart failure prediction project. The proposed approach aims to advance the state of the art by making explicit factors that influence model selection, particularly those related to bias, as well as their interactions. The provided representations can transform model selection in ML projects into a non ad hoc, adaptive, and explainable process.
results: A novel deep neural network (DNN) architecture is proposed to classify individuals correctly. The DNN has three dense layers and uses many-to-many mapping techniques. When the new features are combined with the existing ones, the support vector machine (SVM) and k-nearest neighbor (kNN) classifiers reach classification accuracies of 94.7% and 94.6%, respectively. Seven other classifiers were also tested; of these, the decision tree and the proposed DNN both reach 100% accuracy, while logistic regression (LR), linear discriminant analysis (LDA), Gaussian Naive Bayes (NB), a neural network (NN), and VGGNet reach 94.7%, 95.9%, 31.9%, 88.8%, and 96.1%, respectively.
Abstract
Our research aims at classifying individuals based on their unique interactions on touchscreen-based smartphones. In this research, we use the Touch-Analytics datasets, which include 41 subjects and 30 different behavioral features. Furthermore, we derived new features from the raw data to improve the overall authentication performance. Previous research on the Touch-Analytics datasets with state-of-the-art classifiers, including Support Vector Machine (SVM) and k-nearest neighbor (kNN), achieved equal error rates (EERs) between 0% and 4%. Here, we propose a novel Deep Neural Net (DNN) architecture to classify the individuals correctly. The proposed DNN architecture has three dense layers and uses many-to-many mapping techniques. When we combine the new features with the existing ones, SVM and kNN achieve classification accuracies of 94.7% and 94.6%, respectively. This research explored seven other classifiers; of these, the decision tree and our proposed DNN classifier resulted in the highest accuracy of 100%. The other classifiers were Logistic Regression (LR), Linear Discriminant Analysis (LDA), Gaussian Naive Bayes (NB), Neural Network, and VGGNet, with accuracy scores of 94.7%, 95.9%, 31.9%, 88.8%, and 96.1%, respectively.
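The abstract specifies a three-dense-layer DNN; a minimal Keras sketch with that shape follows. Layer widths, activations, optimizer, and the toy data are assumptions, since the paper's exact configuration is not given here.

```python
# Sketch of the three-dense-layer classifier described in the abstract.
# Widths, activations, and training setup are assumptions; the paper only
# specifies three dense layers over 30 behavioral features for 41 subjects.
import numpy as np
import tensorflow as tf

n_features, n_subjects = 30, 41
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_subjects, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy stand-in data with the dataset's shape:
X = np.random.rand(820, n_features).astype("float32")
y = np.random.randint(0, n_subjects, size=820)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1]).argmax())   # predicted subject ID
```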
Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus
for: This paper addresses the problem of linear regression with l2-regularization, where a different regularization hyperparameter is associated with each input variable.
methods: The authors use a gradient-based approach to optimize the regularization hyperparameters, and introduce two strategies tailored for sparse model learning problems to reduce the risk of overfitting to the validation data.
results: The authors demonstrate that their multi-hyperparameter regularization approach outperforms LASSO, Ridge, and Elastic Net regression, and that the analytical computation of the gradient is more efficient in terms of computational time than automatic differentiation, especially when handling a large number of input variables.
Abstract
Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the tradeoff between minimizing the fitting error and the norm of the learned model coefficients. As this hyperparameter is scalar, it can be easily selected via random or grid search optimizing a cross-validation criterion. However, using a scalar hyperparameter limits the algorithm's flexibility and potential for better generalization. In this paper, we address the problem of linear regression with l2-regularization, where a different regularization hyperparameter is associated with each input variable. We optimize these hyperparameters using a gradient-based approach, wherein the gradient of a cross-validation criterion with respect to the regularization hyperparameters is computed analytically through matrix differential calculus. Additionally, we introduce two strategies tailored for sparse model learning problems aiming at reducing the risk of overfitting to the validation data. Numerical examples demonstrate that our multi-hyperparameter regularization approach outperforms LASSO, Ridge, and Elastic Net regression. Moreover, the analytical computation of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation, especially when handling a large number of input variables. Application to the identification of over-parameterized Linear Parameter-Varying models is also presented.
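The per-feature penalty setup can be sketched compactly: parameterize one log-penalty per input variable, solve the ridge problem in closed form, and descend the gradient of a validation loss. The sketch uses PyTorch autodiff through the linear solve, whereas the paper derives this gradient analytically via matrix differential calculus.

```python
# Sketch of multi-penalty ridge: one L2 hyperparameter per input variable,
# tuned by gradient descent on a validation loss. Autodiff differentiates
# through the closed-form solve here; the paper uses an analytic gradient.
import torch

torch.manual_seed(0)
n, d = 100, 5
X = torch.randn(n, d); w_true = torch.tensor([1.0, 0.0, 0.0, -2.0, 0.0])
y = X @ w_true + 0.1 * torch.randn(n)
Xtr, ytr, Xva, yva = X[:70], y[:70], X[70:], y[70:]

log_lam = torch.zeros(d, requires_grad=True)   # log-penalties, one per feature
opt = torch.optim.Adam([log_lam], lr=0.1)

for step in range(200):
    lam = torch.exp(log_lam)                   # positivity via log-param
    A = Xtr.T @ Xtr + torch.diag(lam)
    w = torch.linalg.solve(A, Xtr.T @ ytr)     # closed-form ridge solution
    val_loss = ((Xva @ w - yva) ** 2).mean()   # hyperparameter objective
    opt.zero_grad(); val_loss.backward(); opt.step()

print("learned per-feature penalties:", torch.exp(log_lam).detach())
# Irrelevant features (true weight 0) should receive large penalties.
```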
Fast Policy Learning for Linear Quadratic Regulator with Entropy Regularization
results: The paper proves that both methods converge linearly for the entropy-regularized LQR, and that IPO achieves a super-linear convergence rate once it enters a local region around the optimal policy. Moreover, when the optimal policy from a known environment is transferred to an RL problem with an unknown environment, IPO attains a super-linear convergence rate provided the two problems are sufficiently close.
Abstract
This paper proposes and analyzes two new policy learning methods: regularized policy gradient (RPG) and iterative policy optimization (IPO), for a class of discounted linear-quadratic regulator (LQR) problems over an infinite time horizon with entropy regularization. Assuming access to the exact policy evaluation, both proposed approaches are proved to converge linearly in finding optimal policies of the regularized LQR. Moreover, the IPO method can achieve a super-linear convergence rate once it enters a local region around the optimal policy. Finally, when the optimal policy from a well-understood environment in an RL problem is appropriately transferred as the initial policy to an RL problem with an unknown environment, the IPO method is shown to enable a super-linear convergence rate if the latter is sufficiently close to the former. The performances of these proposed algorithms are supported by numerical examples.
Efficient and Robust Jet Tagging at the LHC with Knowledge Distillation
results: The study finds that distilling knowledge from a teacher model with a strong inductive bias of Lorentz symmetry induces the same inductive bias in the student model, improving robustness against arbitrary Lorentz boosts.
Abstract
The challenging environment of real-time data processing systems at the Large Hadron Collider (LHC) strictly limits the computational complexity of algorithms that can be deployed. For deep learning models, this implies that only models with low computational complexity that have weak inductive bias are feasible. To address this issue, we utilize knowledge distillation to leverage both the performance of large models and the reduced computational complexity of small ones. In this paper, we present an implementation of knowledge distillation, demonstrating an overall boost in the student models' performance for the task of classifying jets at the LHC. Furthermore, by using a teacher model with a strong inductive bias of Lorentz symmetry, we show that we can induce the same inductive bias in the student model which leads to better robustness against arbitrary Lorentz boost.
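The distillation setup is the standard one: the student is trained to match the teacher's temperature-softened outputs alongside the hard labels. A minimal PyTorch sketch; the temperature and mixing weight are conventional choices, not values from the paper.

```python
# Standard knowledge-distillation loss (Hinton-style), sketching the setup
# from the abstract: a small student matches the softened outputs of a large
# teacher. T and alpha are conventional choices, not the paper's values.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                  # gradient scale correction
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(8, 2, requires_grad=True)        # student logits (e.g. jet classes)
t = torch.randn(8, 2)                            # frozen teacher logits
y = torch.randint(0, 2, (8,))
distillation_loss(s, t, y).backward()
```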
results: The approach provides privacy-preserving recourse paths that remain realistic, feasible, and actionable.
Abstract
When individuals are subject to adverse outcomes from machine learning models, providing a recourse path to help achieve a positive outcome is desirable. Recent work has shown that counterfactual explanations - which can be used as a means of single-step recourse - are vulnerable to privacy issues, putting an individuals' privacy at risk. Providing a sequential multi-step path for recourse can amplify this risk. Furthermore, simply adding noise to recourse paths found from existing methods can impact the realism and actionability of the path for an end-user. In this work, we address privacy issues when generating realistic recourse paths based on instance-based counterfactual explanations, and provide PrivRecourse: an end-to-end privacy preserving pipeline that can provide realistic recourse paths. PrivRecourse uses differentially private (DP) clustering to represent non-overlapping subsets of the private dataset. These DP cluster centers are then used to generate recourse paths by forming a graph with cluster centers as the nodes, so that we can generate realistic - feasible and actionable - recourse paths. We empirically evaluate our approach on finance datasets and compare it to simply adding noise to data instances, and to using DP synthetic data, to generate the graph. We observe that PrivRecourse can provide paths that are private and realistic.
A Blockchain Solution for Collaborative Machine Learning over IoT
paper_authors: Carlos Beis-Penedo, Francisco Troncoso-Pastoriza, Rebeca P. Díaz-Redondo, Ana Fernández-Vilas, Manuel Fernández-Veiga, Martín González Soto
results: The authors assess the system's performance through a series of experiments, showing that it can improve the accuracy and efficiency of machine learning tasks in IoT settings.
Abstract
The rapid growth of Internet of Things (IoT) devices and applications has led to an increased demand for advanced analytics and machine learning techniques capable of handling the challenges associated with data privacy, security, and scalability. Federated learning (FL) and blockchain technologies have emerged as promising approaches to address these challenges by enabling decentralized, secure, and privacy-preserving model training on distributed data sources. In this paper, we present a novel IoT solution that combines the incremental learning vector quantization algorithm (XuILVQ) with Ethereum blockchain technology to facilitate secure and efficient data sharing, model training, and prototype storage in a distributed environment. Our proposed architecture addresses the shortcomings of existing blockchain-based FL solutions by reducing computational and communication overheads while maintaining data privacy and security. We assess the performance of our system through a series of experiments, showcasing its potential to enhance the accuracy and efficiency of machine learning tasks in IoT settings.
Exactly conservative physics-informed neural networks and deep operator networks for dynamical systems
results: Exactly conservative physics-informed neural networks and deep operator networks vastly outperform their non-conservative counterparts on several real-world problems.
Abstract
We introduce a method for training exactly conservative physics-informed neural networks and physics-informed deep operator networks for dynamical systems. The method employs a projection-based technique that maps a candidate solution learned by the neural network solver for any given dynamical system possessing at least one first integral onto an invariant manifold. We illustrate that exactly conservative physics-informed neural network solvers and physics-informed deep operator networks for dynamical systems vastly outperform their non-conservative counterparts for several real-world problems from the mathematical sciences.
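The projection idea can be sketched directly: given a first integral H, map a candidate state onto the manifold {x : H(x) = H0} by Newton iterations of the least-change projection. The example below uses the energy of a harmonic oscillator; the paper's network-level implementation is not reproduced.

```python
# Sketch of the projection idea: map a candidate state onto the invariant
# manifold {x : H(x) = H0} defined by a first integral H. One Newton step of
# the least-change projection is x <- x - J^T (J J^T)^{-1} (H(x) - H0),
# iterated to tolerance. Example: energy of a harmonic oscillator.
import numpy as np

def H(x):              # first integral: energy of q'' = -q
    q, p = x
    return 0.5 * (q**2 + p**2)

def gradH(x):
    return np.array([x[0], x[1]])

def project(x, H0, tol=1e-12, max_iter=50):
    x = x.astype(float).copy()
    for _ in range(max_iter):
        r = H(x) - H0
        if abs(r) < tol:
            break
        J = gradH(x)                       # Jacobian of the scalar constraint
        x -= J * (r / (J @ J))             # Newton step along J^T
    return x

x_net = np.array([1.02, 0.05])             # e.g. a slightly off network prediction
x_cons = project(x_net, H0=0.5)
print(H(x_net), "->", H(x_cons))            # energy restored to 0.5
```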
Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation
results: For a single-layer network in the weakly oversampled regime, the spectrum of the noise covariance matrix of the SGD dynamics deviates notably from the Hessian, which can be attributed to the broken detailed balance of SGD. The single-layer weight fluctuations are anisotropic but experience an isotropic loss. In the two-layer network, the inter-layer coupling becomes an additional source of anisotropy: the weight fluctuations experience an anisotropic loss whose flatness is inversely related to the fluctuation variance. These results agree with the inverse variance-flatness relation recently observed in deep linear network models.
Abstract
We investigate the stationary (late-time) training regime of single- and two-layer linear neural networks within the continuum limit of stochastic gradient descent (SGD) for synthetic Gaussian data. In the case of a single-layer network in the weakly oversampled regime, the spectrum of the noise covariance matrix deviates notably from the Hessian, which can be attributed to the broken detailed balance of SGD dynamics. The weight fluctuations are in this case generally anisotropic, but experience an isotropic loss. For a two-layer network, we obtain the stochastic dynamics of the weights in each layer and analyze the associated stationary covariances. We identify the inter-layer coupling as a new source of anisotropy for the weight fluctuations. In contrast to the single-layer case, the weight fluctuations experience an anisotropic loss, the flatness of which is inversely related to the fluctuation variance. We thereby provide an analytical derivation of the recently observed inverse variance-flatness relation in a deep linear network model.
SySMOL: A Hardware-software Co-design Framework for Ultra-Low and Fine-Grained Mixed-Precision Neural Networks
results: The results show that, with the proposed hardware-software co-design approach and configurable CPU SIMD architectures, mixed-precision networks achieve accuracies closely matching full-precision ones while compressing the networks and improving run-time efficiency by 10-20x compared to full-precision networks.
Abstract
Recent advancements in quantization and mixed-precision techniques offer significant promise for improving the run-time and energy efficiency of neural networks. In this work, we further showed that neural networks, wherein individual parameters or activations can take on different precisions ranging between 1 and 4 bits, can achieve accuracies comparable to or exceeding the full-precision counterparts. However, the deployment of such networks poses numerous challenges, stemming from the necessity to manage and control the compute/communication/storage requirements associated with these extremely fine-grained mixed precisions for each piece of data. There is a lack of existing efficient hardware and system-level support tailored to these unique and challenging requirements. Our research introduces the first novel holistic hardware-software co-design approach for these networks, which enables a continuous feedback loop between hardware design, training, and inference to facilitate systematic design exploration. As a proof-of-concept, we illustrate this co-design approach by designing new, configurable CPU SIMD architectures tailored for these networks, tightly integrating the architecture with new system-aware training and inference techniques. We perform systematic design space exploration using this framework to analyze various tradeoffs. The design for mixed-precision networks that achieves optimized tradeoffs corresponds to an architecture that supports 1, 2, and 4-bit fixed-point operations with four configurable precision patterns, when coupled with system-aware training and inference optimization -- networks trained for this design achieve accuracies that closely match full-precision accuracies, while compressing and improving run-time efficiency of the neural networks drastically by 10-20x, compared to full-precision networks.
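A minimal sketch of the 1/2/4-bit fixed-point quantization patterns follows, using simple symmetric fake-quantization. The rounding scheme and the binary-case handling are illustrative assumptions; the paper's trained per-group precision assignment is not shown.

```python
# Sketch of fine-grained fixed-point quantization to 1/2/4 bits, the precision
# patterns explored by SySMOL. Simple symmetric fake-quantization for
# illustration; the paper's learned precision assignment is not reproduced.
import numpy as np

def fake_quant(w, bits):
    if bits == 1:                            # binary: sign * mean magnitude
        return np.sign(w) * np.abs(w).mean()
    qmax = 2 ** (bits - 1) - 1               # e.g. 1 for 2-bit, 7 for 4-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

w = np.random.randn(8)
for b in (1, 2, 4):
    wq = fake_quant(w, b)
    print(f"{b}-bit mean abs error: {np.abs(w - wq).mean():.4f}")
# Mixed precision assigns a different `bits` value per parameter group,
# trading accuracy against storage and compute.
```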
MINTY: Rule-based Models that Minimize the Need for Imputing Features with Missing Values
results: In experiments on synthetic and real-world datasets, MINTY's predictive performance is comparable or favorable to the baselines, with smaller reliance on features with missing values and better interpretability.
Abstract
Rule models are often preferred in prediction tasks with tabular inputs as they can be easily interpreted using natural language and provide predictive performance on par with more complex models. However, most rule models' predictions are undefined or ambiguous when some inputs are missing, forcing users to rely on statistical imputation models or heuristics like zero imputation, undermining the interpretability of the models. In this work, we propose fitting concise yet precise rule models that learn to avoid relying on features with missing values and, therefore, limit their reliance on imputation at test time. We develop MINTY, a method that learns rules in the form of disjunctions between variables that act as replacements for each other when one or more is missing. This results in a sparse linear rule model, regularized to have small dependence on features with missing values, that allows a trade-off between goodness of fit, interpretability, and robustness to missing values at test time. We demonstrate the value of MINTY in experiments using synthetic and real-world data sets and find its predictive performance comparable or favorable to baselines, with smaller reliance on features with missing values.
results: The findings show that this approach can greatly improve training efficiency, parametric utilization, and generalization performance while minimizing computational cost.
Abstract
Neural network ensembles have been effectively used to improve generalization by combining the predictions of multiple independently trained models. However, the growing scale and complexity of deep neural networks have led to these methods becoming prohibitively expensive and time consuming to implement. Low-cost ensemble methods have become increasingly important as they can alleviate the need to train multiple models from scratch while retaining the generalization benefits that traditional ensemble learning methods afford. This dissertation introduces and formalizes a low-cost framework for constructing Subnetwork Ensembles, where a collection of child networks are formed by sampling, perturbing, and optimizing subnetworks from a trained parent model. We explore several distinct methodologies for generating child networks and we evaluate their efficacy through a variety of ablation studies and established benchmarks. Our findings reveal that this approach can greatly improve training efficiency, parametric utilization, and generalization performance while minimizing computational cost. Subnetwork Ensembles offer a compelling framework for exploring how we can build better systems by leveraging the unrealized potential of deep neural networks.
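The sample-perturb-optimize recipe can be sketched in a few lines of PyTorch: children are pruned, noised copies of a trained parent, and their predictions are averaged. Mask sparsity and noise scale are illustrative choices; the brief per-child optimization step is left as a comment.

```python
# Sketch of Subnetwork Ensembles: children formed by sampling and perturbing
# subnetworks of a trained parent, then (optionally) fine-tuned briefly.
import copy
import torch
import torch.nn as nn

parent = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
# ... assume `parent` has already been trained on the task ...

def make_child(parent, keep=0.5, noise=0.01):
    child = copy.deepcopy(parent)
    with torch.no_grad():
        for p in child.parameters():
            mask = (torch.rand_like(p) < keep).float()   # sample a subnetwork
            p.mul_(mask)                                 # prune to the mask
            p.add_(noise * torch.randn_like(p) * mask)   # perturb kept weights
    return child  # each child would then be optimized briefly on the task

children = [make_child(parent) for _ in range(4)]
x = torch.randn(3, 10)
ensemble_logits = torch.stack([c(x) for c in children]).mean(dim=0)
print(ensemble_logits.shape)   # torch.Size([3, 2])
```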
Robust Decision Aggregation with Second-order Information
results: When the experts' signals are conditionally independent given the world state, second-order information can provide significant benefits: a deterministic aggregator that leverages it can outperform counterparts without it. With general information structures, however, second-order information provides no benefit.
Abstract
We consider a decision aggregation problem with two experts who each make a binary recommendation after observing a private signal about an unknown binary world state. An agent, who does not know the joint information structure between signals and states, sees the experts' recommendations and aims to match the action with the true state. In this setting, we study whether additionally supplying second-order information (each expert's forecast of the other's recommendation) enables better aggregation. We adopt a minimax regret framework to evaluate the aggregator's performance, comparing it to an omniscient benchmark that knows the joint information structure. With general information structures, we show that second-order information provides no benefit: no aggregator can improve over a trivial aggregator, which always follows the first expert's recommendation. However, positive results emerge when we assume the experts' signals are conditionally independent given the world state. First, when the aggregator is deterministic, we present a robust aggregator that leverages second-order information, which can significantly outperform counterparts without it. Second, when the two experts are homogeneous, by adding a non-degenerate assumption on the signals, we demonstrate that random aggregators using second-order information can surpass optimal ones without it. In the remaining settings, the second-order information is not beneficial. We also extend the above results to the setting where the aggregator's utility function is more general.
Forecasting Cryptocurrency Prices Using Deep Learning: Integrating Financial, Blockchain, and Text Data
results: Incorporating NLP data significantly enhances forecasting performance. The pre-trained Twitter-RoBERTa and BART MNLI models are highly effective at capturing market sentiment, and fine-tuning large language models also yields substantial forecasting improvements.
Abstract
This paper explores the application of Machine Learning (ML) and Natural Language Processing (NLP) techniques in cryptocurrency price forecasting, specifically Bitcoin (BTC) and Ethereum (ETH). Focusing on news and social media data, primarily from Twitter and Reddit, we analyse the influence of public sentiment on cryptocurrency valuations using advanced deep learning NLP methods. Alongside conventional price regression, we treat cryptocurrency price forecasting as a classification problem. This includes both the prediction of price movements (up or down) and the identification of local extrema. We compare the performance of various ML models, both with and without NLP data integration. Our findings reveal that incorporating NLP data significantly enhances the forecasting performance of our models. We discover that pre-trained models, such as Twitter-RoBERTa and BART MNLI, are highly effective in capturing market sentiment, and that fine-tuning Large Language Models (LLMs) also yields substantial forecasting improvements. Notably, the BART MNLI zero-shot classification model shows considerable proficiency in extracting bullish and bearish signals from textual data. All of our models consistently generate profit across different validation scenarios, with no observed decline in profits or reduction in the impact of NLP data over time. The study highlights the potential of text analysis in improving financial forecasts and demonstrates the effectiveness of various NLP techniques in capturing nuanced market sentiment.
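Extracting bullish/bearish signals with zero-shot classification is straightforward with the Hugging Face pipeline API, as sketched below. The candidate label set is an assumption; the paper's exact labels and the aggregation of scores into model features are not shown.

```python
# Sketch of extracting bullish/bearish signals from text with zero-shot
# classification, as the abstract describes for BART MNLI. The candidate
# labels below are illustrative, not the paper's exact label set.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

headline = "Bitcoin surges past resistance as institutional inflows grow"
result = classifier(headline,
                    candidate_labels=["bullish", "bearish", "neutral"])
print(result["labels"][0], result["scores"][0])
# The top label/score per news item or tweet can then be aggregated into a
# daily sentiment feature for the price-forecasting models.
```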
Empirical Comparison between Cross-Validation and Mutation-Validation in Model Selection
results: Both mutation validation (MV) and k-fold cross-validation select models with practically equivalent generalization performance, but MV has advantages in selecting simpler models and in computational cost. However, in some cases MV selects overly simplistic models, leading to underfitting, and shows instability in hyperparameter selection. These limitations of MV become more evident on a real-world neuroscientific task of predicting sex at birth from brain functional connectivity.
Abstract
Mutation validation (MV) is a recently proposed approach for model selection, garnering significant interest due to its unique characteristics and potential benefits compared to the widely used cross-validation (CV) method. In this study, we empirically compared MV and $k$-fold CV using benchmark and real-world datasets. By employing Bayesian tests, we compared generalization estimates yielding three posterior probabilities: practical equivalence, CV superiority, and MV superiority. We also evaluated the differences in the capacity of the selected models and computational efficiency. We found that both MV and CV select models with practically equivalent generalization performance across various machine learning algorithms and the majority of benchmark datasets. MV exhibited advantages in terms of selecting simpler models and lower computational costs. However, in some cases MV selected overly simplistic models leading to underfitting and showed instability in hyperparameter selection. These limitations of MV became more evident in the evaluation of a real-world neuroscientific task of predicting sex at birth using brain functional connectivity.
Machine learning-based decentralized TDMA for VLC IoT networks
results: Experimental results show that the proposed algorithm converges quickly and provides collision-free decentralized TDMA; compared with CSMA/CA, it delivers up to 61% more goodput and up to 49% less average delay.
Abstract
In this paper, a machine learning-based decentralized time division multiple access (TDMA) algorithm for visible light communication (VLC) Internet of Things (IoT) networks is proposed. The proposed algorithm is based on Q-learning, a reinforcement learning algorithm. This paper considers a decentralized condition in which there is no coordinator node for sending synchronization frames and assigning transmission time slots to other nodes. The proposed algorithm synchronizes in a decentralized manner, and each node uses the Q-learning algorithm to find the optimal transmission time slot for sending data without collisions. The proposed algorithm is implemented on a VLC hardware system, which had been designed and implemented in our laboratory. The evaluated parameters are average reward, convergence time, goodput, average delay, and data packet size. The results show that the proposed algorithm converges quickly and provides collision-free decentralized TDMA for the network. The proposed algorithm is compared with the carrier-sense multiple access with collision avoidance (CSMA/CA) algorithm as a potential choice for decentralized VLC IoT networks. The results show that the proposed algorithm provides up to 61% more goodput and up to 49% less average delay than CSMA/CA.
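A toy simulation of the core mechanism follows: each node independently runs epsilon-greedy Q-learning over transmission slots, receiving a reward for collision-free transmission. The reward values and learning schedule are illustrative assumptions, not the paper's exact design.

```python
# Toy simulation of decentralized slot selection with Q-learning: each node
# independently learns a collision-free TDMA slot, with no coordinator.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, n_slots, episodes = 4, 4, 2000
Q = np.zeros((n_nodes, n_slots))
alpha, eps = 0.1, 0.1                 # stateless, bandit-style Q-learning

for ep in range(episodes):
    # Each node picks a slot epsilon-greedily, without coordination.
    slots = [int(rng.integers(n_slots)) if rng.random() < eps
             else int(np.argmax(Q[i])) for i in range(n_nodes)]
    for i, s in enumerate(slots):
        collided = slots.count(s) > 1
        r = -1.0 if collided else 1.0           # collision penalty / success
        Q[i, s] += alpha * (r - Q[i, s])

final = [int(np.argmax(Q[i])) for i in range(n_nodes)]
print("learned slots:", final, "collision-free:", len(set(final)) == n_nodes)
# The nodes typically settle on distinct slots, i.e. a collision-free TDMA
# schedule, after enough episodes.
```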
RetroDiff: Retrosynthesis as Multi-stage Distribution Interpolation
results: Experimental results on the benchmark show that RetroDiff outperforms all other semi-template methods.
Abstract
Retrosynthesis poses a fundamental challenge in biopharmaceuticals, aiming to aid chemists in finding appropriate reactant molecules and synthetic pathways given determined product molecules. With the reactant and product represented as 2D graphs, retrosynthesis constitutes a conditional graph-to-graph generative task. Inspired by the recent advancements in discrete diffusion models for graph generation, we introduce Retrosynthesis Diffusion (RetroDiff), a novel diffusion-based method designed to address this problem. However, integrating a diffusion-based graph-to-graph framework while retaining essential chemical reaction template information presents a notable challenge. Our key innovation is to develop a multi-stage diffusion process. In this method, we decompose the retrosynthesis procedure to first sample external groups from the dummy distribution given products and then generate the external bonds to connect the products and generated groups. Interestingly, such a generation process is exactly the reverse of the widely adapted semi-template retrosynthesis procedure, i.e. from reaction center identification to synthon completion, which significantly reduces the error accumulation. Experimental results on the benchmark have demonstrated the superiority of our method over all other semi-template methods.
DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release
results: On the MNIST, FMNIST, CIFAR-10, and IMDB datasets, DPSUR achieves faster convergence and higher model utility than previous works while preserving differential privacy.
Abstract
Machine learning models are known to memorize private data to reduce their training loss, which can be inadvertently exploited by privacy attacks such as model inversion and membership inference. To protect against these attacks, differential privacy (DP) has become the de facto standard for privacy-preserving machine learning, particularly those popular training algorithms using stochastic gradient descent, such as DPSGD. Nonetheless, DPSGD still suffers from severe utility loss due to its slow convergence. This is partially caused by the random sampling, which brings bias and variance to the gradient, and partially by the Gaussian noise, which leads to fluctuation of gradient updates. Our key idea to address these issues is to apply selective updates to the model training, while discarding those useless or even harmful updates. Motivated by this, this paper proposes DPSUR, a Differentially Private training framework based on Selective Updates and Release, where the gradient from each iteration is evaluated based on a validation test, and only those updates leading to convergence are applied to the model. As such, DPSUR ensures the training in the right direction and thus can achieve faster convergence than DPSGD. The main challenges lie in two aspects -- privacy concerns arising from gradient evaluation, and gradient selection strategy for model update. To address the challenges, DPSUR introduces a clipping strategy for update randomization and a threshold mechanism for gradient selection. Experiments conducted on MNIST, FMNIST, CIFAR-10, and IMDB datasets show that DPSUR significantly outperforms previous works in terms of convergence speed and model utility.
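The selective-update idea can be sketched on DP logistic regression: take a DP-SGD step (per-sample clipping plus Gaussian noise) and apply it only if a validation check indicates progress. Note the acceptance test below is a plain loss comparison for illustration; in DPSUR the evaluation itself must be privatized.

```python
# Sketch of selective update and release on DP logistic regression:
# DP-SGD step (per-sample clipping + Gaussian noise), applied only when a
# validation check says it moves training in the right direction.
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 10
X = rng.normal(size=(n, d)); w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)
Xva, yva = X[:128], y[:128]; Xtr, ytr = X[128:], y[128:]

def loss(w, X, y):
    p = 1 / (1 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

w, lr, C, sigma = np.zeros(d), 0.5, 1.0, 1.0
for step in range(200):
    idx = rng.choice(len(Xtr), size=64, replace=False)
    g = (1 / (1 + np.exp(-Xtr[idx] @ w)) - ytr[idx])[:, None] * Xtr[idx]
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g / np.maximum(1.0, norms / C)                  # per-sample clipping
    noisy = g.sum(0) + sigma * C * rng.normal(size=d)   # Gaussian mechanism
    cand = w - lr * noisy / len(idx)
    if loss(cand, Xva, yva) <= loss(w, Xva, yva):       # selective update
        w = cand                                        # release useful steps only
print("val loss:", loss(w, Xva, yva))
```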
AdapterFL: Adaptive Heterogeneous Federated Learning for Resource-constrained Mobile Computing Systems
results: Experimental results show that AdapterFL achieves up to 12% accuracy improvement over state-of-the-art heterogeneous federated learning methods in resource-constrained scenarios.
Abstract
Federated Learning (FL) enables collaborative learning of large-scale distributed clients without data sharing. However, due to the disparity of computing resources among massive mobile computing devices, the performance of traditional homogeneous model-based Federated Learning (FL) is seriously limited. On the one hand, to achieve model training in all the diverse clients, mobile computing systems can only use small low-performance models for collaborative learning. On the other hand, devices with high computing resources cannot train a high-performance large model with their insufficient raw data. To address the resource-constrained problem in mobile computing systems, we present a novel heterogeneous FL approach named AdapterFL, which uses a model reassemble strategy to facilitate collaborative training of massive heterogeneous mobile devices adaptively. Specifically, we select multiple candidate heterogeneous models based on the computing performance of massive mobile devices and then divide each heterogeneous model into two partitions. By reassembling the partitions, we can generate models with varied sizes that are combined by the partial parameters of the large model with the partial parameters of the small model. Using these reassembled models for FL training, we can train the partial parameters of the large model using low-performance devices. In this way, we can alleviate performance degradation in large models due to resource constraints. The experimental results show that AdapterFL can achieve up to 12\% accuracy improvement compared to the state-of-the-art heterogeneous federated learning methods in resource-constrained scenarios.
Multivariate Scenario Generation of Day-Ahead Electricity Prices using Normalizing Flows
results: The results show that the normalizing flow generates high-quality scenarios that reproduce the true price distribution and yield highly accurate forecasts. Moreover, with periodic retraining and extended feature sets, the normalizing flow adapts to changing market conditions and continues to generate high-quality day-ahead price scenarios.
Abstract
Trading on electricity markets requires accurate information about the realization of electricity prices and the uncertainty attached to the predictions. We present a probabilistic forecasting approach for day-ahead electricity prices using the fully data-driven deep generative model called normalizing flows. Our modeling approach generates full-day scenarios of day-ahead electricity prices based on conditional features such as residual load forecasts. Furthermore, we propose extended feature sets of prior realizations and a periodic retraining scheme that allows the normalizing flow to adapt to the changing conditions of modern electricity markets. In particular, we investigate the impact of the energy crisis ensuing from the Russian invasion of Ukraine. Our results highlight that the normalizing flow generates high-quality scenarios that reproduce the true price distribution and yield highly accurate forecasts. Additionally, our analysis highlights how our improvements towards adaptations in changing regimes allow the normalizing flow to adapt to changing market conditions and enables continued sampling of high-quality day-ahead price scenarios.
ExCeL: Combined Extreme and Collective Logit Information for Enhancing Out-of-Distribution Detection
results: Experiments on the CIFAR100 and ImageNet-200 datasets show that, among twenty-one existing post-hoc baselines, the method is consistently in the top five for joint near-OOD and far-OOD performance (in terms of AUROC and FPR95). Moreover, it achieves the best overall performance across both datasets, unlike baselines that work best on one dataset but drop on the other.
Abstract
Deep learning models often exhibit overconfidence in predicting out-of-distribution (OOD) data, underscoring the crucial role of OOD detection in ensuring reliability in predictions. Among various OOD detection approaches, post-hoc detectors have gained significant popularity, primarily due to their ease of use and implementation. However, the effectiveness of most post-hoc OOD detectors has been constrained as they rely solely either on extreme information, such as the maximum logit, or on the collective information (i.e., information spanned across classes or training samples) embedded within the output layer. In this paper, we propose ExCeL that combines both extreme and collective information within the output layer for enhanced accuracy in OOD detection. We leverage the logit of the top predicted class as the extreme information (i.e., the maximum logit), while the collective information is derived in a novel approach that involves assessing the likelihood of other classes appearing in subsequent ranks across various training samples. Our idea is motivated by the observation that, for in-distribution (ID) data, the ranking of classes beyond the predicted class is more deterministic compared to that in OOD data. Experiments conducted on CIFAR100 and ImageNet-200 datasets demonstrate that ExCeL consistently is among the five top-performing methods out of twenty-one existing post-hoc baselines when the joint performance on near-OOD and far-OOD is considered (i.e., in terms of AUROC and FPR95). Furthermore, ExCeL shows the best overall performance across both datasets, unlike other baselines that work best on one dataset but has a performance drop in the other.
On the Hyperparameter Landscapes of Machine Learning Algorithms
paper_authors: Mingyu Huang, Ke Li
for: This paper aims to shed light on the intricate interplay between model hyperparameters (HPs) and predictive loss during hyperparameter optimization (HPO) of machine learning models, in order to improve the explainability of HPO and human trust in it.
methods: The authors conduct a large-scale fitness landscape analysis (FLA) of 1,500 HP loss landscapes of 6 ML models, covering more than 11 model configurations across 67 datasets and different levels of fidelity.
results: The analysis reveals a unified, comprehensive portrait of the landscapes' topography in terms of smoothness, neutrality, and modality, and shows that these properties are highly transferable across datasets and fidelities, providing fundamental evidence for the success of multi-fidelity and transfer learning methods.
Abstract
Despite the recent success in a plethora of hyperparameter optimization (HPO) methods for machine learning (ML) models, the intricate interplay between model hyperparameters (HPs) and predictive losses (a.k.a. fitness), which is a key prerequisite for understanding HPO, remains notably underexplored in our community. This results in limited explainability in the HPO process, rendering a lack of human trust and difficulties in pinpointing algorithm bottlenecks. In this paper, we aim to shed light on this black box by conducting large-scale fitness landscape analysis (FLA) on 1,500 HP loss landscapes of 6 ML models with more than 11 model configurations, across 67 datasets and different levels of fidelities. We reveal the first unified, comprehensive portrait of their topographies in terms of smoothness, neutrality and modality. We also show that such properties are highly transferable across datasets and fidelities, providing fundamental evidence for the success of multi-fidelity and transfer learning methods. These findings are made possible by developing a dedicated FLA framework that incorporates a combination of visual and quantitative measures. We further demonstrate the potential of this framework by analyzing the NAS-Bench-101 landscape, and we believe it is able to facilitate fundamental understanding of a broader range of AutoML tasks.
Docking Multirotors in Close Proximity using Learnt Downwash Models
results: Compensation with the learnt downwash model reduces the follower's tracking error to within 0.06 m, a 3-4x reduction over conventional/naive approaches, and the complete system achieves a successful physical docking between two airborne multirotors.
Abstract
Unmodeled aerodynamic disturbances pose a key challenge for multirotor flight when multiple vehicles are in close proximity to each other. However, certain missions require two multirotors to approach each other within 1-2 body-lengths of each other and hold formation -- we consider one such practical instance: vertically docking two multirotors in the air. In this leader-follower setting, the follower experiences significant downwash interference from the leader in its final docking stages. To compensate for this, we employ a learnt downwash model online within an optimal feedback controller to accurately track a docking maneuver and then hold formation. Through real-world flights with different maneuvers, we demonstrate that this compensation is crucial for reducing the large vertical separation otherwise required by conventional/naive approaches. Our evaluations show a tracking error of less than 0.06m for the follower (a 3-4x reduction) when approaching vertically within two body-lengths of the leader. Finally, we deploy the complete system to effect a successful physical docking between two airborne multirotors in a single smooth planned trajectory.
MedISure: Towards Assuring Machine Learning-based Medical Image Classifiers using Mixup Boundary Analysis
paper_authors: Adam Byfield, William Poulett, Ben Wallace, Anusha Jose, Shatakshi Tyagi, Smita Shembekar, Adnan Qayyum, Junaid Qadir, Muhammad Bilal
for: This study aims to provide assurance of the safety, fairness, robustness, and trustworthiness of machine learning (ML) models used in healthcare technologies.
methods: A novel technique called Mix-Up Boundary Analysis (MUBA) is used to evaluate the prediction fairness of image classifiers.
results: Promising results were achieved on two important medical imaging tasks: brain tumour classification and breast cancer classification.
Abstract
Machine learning (ML) models are becoming integral in healthcare technologies, presenting a critical need for formal assurance to validate their safety, fairness, robustness, and trustworthiness. These models are inherently prone to errors, potentially posing serious risks to patient health and could even cause irreparable harm. Traditional software assurance techniques rely on fixed code and do not directly apply to ML models since these algorithms are adaptable and learn from curated datasets through a training process. However, adapting established principles, such as boundary testing using synthetic test data can effectively bridge this gap. To this end, we present a novel technique called Mix-Up Boundary Analysis (MUBA) that facilitates evaluating image classifiers in terms of prediction fairness. We evaluated MUBA for two important medical imaging tasks -- brain tumour classification and breast cancer classification -- and achieved promising results. This research aims to showcase the importance of adapting traditional assurance principles for assessing ML models to enhance the safety and reliability of healthcare technologies. To facilitate future research, we plan to publicly release our code for MUBA.
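The mixup ingredient of such a boundary analysis is easy to sketch. The probe below linearly interpolates two inputs and records where the predicted class flips along the path; treating the flip locations as a crude boundary statistic is my illustration of the general idea, not the paper's exact MUBA metric, and `model` is assumed to be a callable returning class logits.

```python
import numpy as np

def mixup_boundary_probe(model, x_a, x_b, steps=50):
    """Interpolate between two inputs and record the model's prediction
    along the path; returns the mixing ratios at which the predicted
    class flips (a crude decision-boundary location)."""
    lambdas = np.linspace(0.0, 1.0, steps)
    preds = []
    for lam in lambdas:
        x_mix = lam * x_a + (1.0 - lam) * x_b   # standard mixup interpolation
        preds.append(int(np.argmax(model(x_mix))))
    flips = [float(lambdas[i]) for i in range(1, steps) if preds[i] != preds[i - 1]]
    return flips  # empty if the prediction never changes along the path
```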
Object Location Prediction in Real-time using LSTM Neural Network and Polynomial Regression
results: Real-time, high-precision object coordinate prediction with an average error of 0.11 m and low latency, a 76% error reduction compared to the traditional Kalman filter.
Abstract
This paper details the design and implementation of a system for predicting and interpolating object location coordinates. Our solution is based on processing inertial measurements and global positioning system data through a Long Short-Term Memory (LSTM) neural network and polynomial regression. LSTM is a type of recurrent neural network (RNN) particularly suited for processing data sequences and avoiding the long-term dependency problem. We employed data from real-world vehicles and the global positioning system (GPS) sensors. A critical pre-processing step was developed to address varying sensor frequencies and inconsistent GPS time steps and dropouts. The LSTM-based system's performance was compared with the Kalman Filter. The system was tuned to work in real-time with low latency and high precision. We tested our system on roads under various driving conditions, including acceleration, turns, deceleration, and straight paths. We tested our proposed solution's accuracy and inference time and showed that it could perform in real-time. Our LSTM-based system yielded an average error of 0.11 meters with an inference time of 2 ms. This represents a 76% reduction in error compared to the traditional Kalman filter method, which has an average error of 0.46 meters with a similar inference time to the LSTM-based system.
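A minimal PyTorch sketch of the kind of sequence model described, mapping a window of fused IMU/GPS features to the next 2-D position; the feature count, layer sizes, and training loop are illustrative placeholders, and the polynomial-regression interpolation stage is omitted.

```python
import torch
import torch.nn as nn

class LocationLSTM(nn.Module):
    """Maps a window of fused IMU/GPS features to the next (x, y) position."""
    def __init__(self, n_features=9, hidden=64, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, 2)           # predict (x, y)

    def forward(self, seq):                        # seq: (batch, time, n_features)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1, :])            # use the last time step

# Training step sketch: mean-squared error against GPS ground truth.
model = LocationLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seq, target = torch.randn(32, 20, 9), torch.randn(32, 2)   # dummy batch
opt.zero_grad()
loss = nn.functional.mse_loss(model(seq), target)
loss.backward()
opt.step()
```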
Optimal Power Flow in Highly Renewable Power System Based on Attention Neural Networks
results: Rigorous evaluations on aggregated European power systems show the method's superiority over existing data-driven OPF solution techniques, delivering a fast, robust, and efficient solution suited to real-time applications.
Abstract
The Optimal Power Flow (OPF) problem is pivotal for power system operations, guiding generator output and power distribution to meet demand at minimized costs, while adhering to physical and engineering constraints. The integration of renewable energy sources, like wind and solar, however, poses challenges due to their inherent variability. This variability, driven largely by changing weather conditions, demands frequent recalibrations of power settings, thus necessitating recurrent OPF resolutions. This task is daunting using traditional numerical methods, particularly for extensive power systems. In this work, we present a cutting-edge, physics-informed machine learning methodology, trained using imitation learning and historical European weather datasets. Our approach directly correlates electricity demand and weather patterns with power dispatch and generation, circumventing the iterative requirements of traditional OPF solvers. This offers a more expedient solution apt for real-time applications. Rigorous evaluations on aggregated European power systems validate our method's superiority over existing data-driven techniques in OPF solving. By presenting a quick, robust, and efficient solution, this research sets a new standard in real-time OPF resolution, paving the way for more resilient power systems in the era of renewable energy.
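Stripped of the attention architecture, the core recipe is supervised imitation: regress dispatch decisions produced offline by a conventional OPF solver onto demand and weather features. The toy below uses a generic scikit-learn regressor and synthetic arrays purely as placeholders for that pipeline; none of the shapes or names come from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Features: hourly demand plus wind/solar weather proxies (placeholders).
X = rng.normal(size=(5000, 12))
# Targets: generator set-points labelled offline by an OPF solver (placeholders).
Y = rng.normal(size=(5000, 8))

model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=300)
model.fit(X, Y)                      # imitation learning on solver outputs
dispatch = model.predict(X[:1])      # one forward pass replaces an iterative OPF solve
```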
Exploring the impact of social stress on the adaptive dynamics of COVID-19: Typing the behavior of naïve populations faced with epidemics
results: The cultural characteristics of different countries/regions matter for optimizing epidemic response strategies; studying the interplay between human behavior and natural factors enables better prediction of and response to global social disasters.
Abstract
In the context of natural disasters, human responses inevitably intertwine with natural factors. The COVID-19 pandemic, as a significant stress factor, has brought to light profound variations among different countries in terms of their adaptive dynamics in addressing the spread of infection outbreaks across different regions. This emphasizes the crucial role of cultural characteristics in natural disaster analysis. The theoretical understanding of large-scale epidemics primarily relies on mean-field kinetic models. However, conventional SIR-like models failed to fully explain the observed phenomena at the onset of the COVID-19 outbreak. These phenomena included the unexpected cessation of exponential growth, the reaching of plateaus, and the occurrence of multi-wave dynamics. In situations where an outbreak of a highly virulent and unfamiliar infection arises, it becomes crucial to respond swiftly at a non-medical level to mitigate the negative socio-economic impact. Here we present a theoretical examination of the first wave of the epidemic based on a simple SIRSS model (SIR with Social Stress). We conduct an analysis of the socio-cultural features of naïve population behaviors across various countries worldwide. The unique characteristics of each country/territory are encapsulated in only a few constants within our model, derived from the fitted COVID-19 statistics. These constants also reflect the societal response dynamics to the external stress factor, underscoring the importance of studying the mutual behavior of humanity and natural factors during global social disasters. Based on these distinctive characteristics of specific regions, local authorities can optimize their strategies to effectively combat epidemics until vaccines are developed.
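The abstract does not reproduce the SIRSS equations, so the sketch below shows only the backbone they extend: a standard SIR model whose transmission rate is modulated by a societal-response factor. The specific response law used here (contacts decaying with cumulative infections) and all parameter values are purely illustrative.

```python
import numpy as np
from scipy.integrate import odeint

def sir_with_stress(y, t, beta0, gamma, k_stress):
    S, I, R = y
    # Illustrative social-stress response: contacts drop as cumulative
    # infections grow (a naive population gradually self-restricting).
    beta = beta0 * np.exp(-k_stress * (I + R))
    dS = -beta * S * I
    dI = beta * S * I - gamma * I
    dR = gamma * I
    return [dS, dI, dR]

t = np.linspace(0, 300, 3001)
sol = odeint(sir_with_stress, [0.999, 0.001, 0.0], t,
             args=(0.3, 0.1, 5.0))   # beta0, gamma, stress gain (all illustrative)
# Columns of sol are S(t), I(t), R(t); the stress term flattens the epidemic
# curve, producing the plateau-like behavior plain SIR cannot reproduce.
```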
Unsupervised Learning for Topological Classification of Transportation Networks
results: PCA followed by K-means clustering classifies the transportation networks into five clusters, with a Silhouette score of 0.510.
Abstract
With increasing urbanization, transportation plays an increasingly critical role in city development. The number of studies on modeling, optimization, simulation, and data analysis of transportation systems is on the rise. Many of these studies utilize transportation test networks to represent real-world transportation systems in urban areas, examining the efficacy of their proposed approaches. Each of these networks exhibits unique characteristics in their topology, making their applications distinct for various study objectives. Despite their widespread use in research, there is a lack of comprehensive study addressing the classification of these networks based on their topological characteristics. This study aims to fill this gap by employing unsupervised learning methods, particularly clustering. We present a comprehensive framework for evaluating various topological network characteristics. Additionally, we employ two dimensionality reduction techniques, namely Principal Component Analysis (PCA) and Isometric Feature Mapping (ISOMAP), to reduce overlaps of highly correlated features and enhance the interpretability of the subsequent classification results. We then utilize two clustering algorithms, K-means and HDBSCAN, to classify 14 transportation networks. The PCA method, followed by the K-means clustering approach, outperforms other alternatives with a Silhouette score of $0.510$, enabling the classification of transportation networks into five clusters. We also provide a detailed discussion on the resulting classification.
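The described pipeline (topological feature table, dimensionality reduction, clustering, Silhouette scoring) maps directly onto scikit-learn. The random feature matrix below is a placeholder for the topological characteristics computed from each of the 14 networks; the cluster count follows the paper's result, the component count is illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Rows: 14 transportation networks; columns: topological features
# (e.g. density, clustering coefficient, degree statistics) -- placeholder data.
features = np.random.default_rng(1).normal(size=(14, 10))

X = StandardScaler().fit_transform(features)   # put features on comparable scales
Z = PCA(n_components=3).fit_transform(X)       # reduce overlap of correlated features
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Z)
print("silhouette:", silhouette_score(Z, labels))
```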
Leveraging Optimal Transport via Projections on Subspaces for Machine Learning Applications
results: The main results are extensions of the Sliced-Wasserstein distance to Riemannian manifolds, sliced distances between positive measures in the unbalanced OT problem, and a study of the Busemann function in the space of probability measures.
Abstract
Optimal Transport has received much attention in Machine Learning as it allows one to compare probability distributions by exploiting the geometry of the underlying space. However, in its original formulation, solving this problem suffers from a significant computational burden. Thus, a meaningful line of work consists in proposing alternatives that reduce this burden while still enjoying its properties. In this thesis, we focus on alternatives which use projections on subspaces. The main such alternative is the Sliced-Wasserstein distance, which we first propose to extend to Riemannian manifolds in order to use it in Machine Learning applications for which using such spaces has been shown to be beneficial in recent years. We also study sliced distances between positive measures in the so-called unbalanced OT problem. Returning to the original Euclidean Sliced-Wasserstein distance between probability measures, we study the dynamics of gradient flows when endowing the space with this distance in place of the usual Wasserstein distance. Then, we investigate the use of the Busemann function, a generalization of the inner product in metric spaces, in the space of probability measures. Finally, we extend the subspace detour approach to incomparable spaces using the Gromov-Wasserstein distance.
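For reference, the vanilla Euclidean Sliced-Wasserstein distance that the thesis extends reduces multi-dimensional optimal transport to sorting along random directions. A standard Monte Carlo estimator, assuming equal sample sizes, is sketched below.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, p=2, seed=0):
    """Monte Carlo estimate of SW_p between two empirical measures in R^d,
    given as (n, d) sample arrays of equal size."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    theta = rng.normal(size=(n_proj, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # directions on the sphere
    # 1-D optimal transport between sorted projections, averaged over directions.
    px = np.sort(X @ theta.T, axis=0)
    py = np.sort(Y @ theta.T, axis=0)
    return float(np.mean(np.abs(px - py) ** p)) ** (1.0 / p)
```

Each projection costs a sort, so the estimator scales as O(n_proj · n log n), which is the computational appeal over solving the full OT problem.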
Locally Optimal Descent for Dynamic Stepsize Scheduling
for: This paper aims to provide a theoretically grounded dynamic learning-rate scheduling scheme that simplifies the manual, time-consuming tuning of schedules in practice.
methods: The approach estimates the locally-optimal stepsize at each step, guaranteeing maximal descent in the direction of the stochastic gradient of the current step.
results: Across diverse datasets and optimization algorithms, the method matches existing state-of-the-art schedulers while needing minimal tuning, removing auxiliary manual schedules and warm-up phases.
Abstract
We introduce a novel dynamic learning-rate scheduling scheme grounded in theory with the goal of simplifying the manual and time-consuming tuning of schedules in practice. Our approach is based on estimating the locally-optimal stepsize, guaranteeing maximal descent in the direction of the stochastic gradient of the current step. We first establish theoretical convergence bounds for our method within the context of smooth non-convex stochastic optimization, matching state-of-the-art bounds while only assuming knowledge of the smoothness parameter. We then present a practical implementation of our algorithm and conduct systematic experiments across diverse datasets and optimization algorithms, comparing our scheme with existing state-of-the-art learning-rate schedulers. Our findings indicate that our method needs minimal tuning when compared to existing approaches, removing the need for auxiliary manual schedules and warm-up phases and achieving comparable performance with drastically reduced parameter tuning.
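The abstract does not state the stepsize estimator itself, so the following only illustrates the general notion of a locally optimal stepsize: fit a one-dimensional quadratic to the loss along the negative stochastic gradient and step to its minimizer. This two-probe rule is my illustration, not the paper's algorithm.

```python
def locally_optimal_step(loss, w, grad, eta_probe=1e-3):
    """Quadratic fit of phi(eta) = loss(w - eta*grad) ~ a*eta^2 + b*eta + c
    from three probes at eta = 0, eta_probe, 2*eta_probe; minimized at -b/(2a)."""
    phi0 = loss(w)
    phi1 = loss(w - eta_probe * grad)
    phi2 = loss(w - 2 * eta_probe * grad)
    a = (phi2 - 2 * phi1 + phi0) / (2 * eta_probe ** 2)  # curvature estimate
    b = (phi1 - phi0) / eta_probe - a * eta_probe        # slope at eta = 0
    if a <= 0:                 # non-convex slice: fall back to the probe step
        return eta_probe
    return max(-b / (2 * a), 0.0)

# Usage inside an SGD loop: eta = locally_optimal_step(batch_loss, w, g); w -= eta * g
```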
L(M)V-IQL: Multiple Intention Inverse Reinforcement Learning for Animal Behavior Characterization
results: Validated on simulated experiments and real mouse behavior datasets, the L(M)V-IQL algorithms surpass current benchmarks in animal decision prediction and produce interpretable reward functions, aiding animal psychology and neuroscience in probing the mechanisms underlying animal decision-making.
Abstract
In advancing the understanding of decision-making processes, mathematical models, particularly Inverse Reinforcement Learning (IRL), have proven instrumental in reconstructing an animal's multiple intentions amidst complex behaviors. Given the recent development of a continuous-time multi-intention IRL framework, there has been persistent inquiry into inferring discrete time-varying reward functions with multiple-intention IRL approaches. To tackle this challenge, we introduce the Latent (Markov) Variable Inverse Q-learning (L(M)V-IQL) algorithms, a novel IRL framework tailored for accommodating discrete intrinsic rewards. Leveraging an Expectation-Maximization approach, we cluster observed trajectories into distinct intentions and independently solve the IRL problem for each. We demonstrate the efficacy of L(M)V-IQL through simulated experiments and its application to different real mouse behavior datasets: our approach surpasses current benchmarks in animal behavior prediction, producing interpretable reward functions. This advancement holds promise for neuroscience and psychology, contributing to a deeper understanding of animal decision-making and uncovering underlying brain mechanisms.
Which Matters Most in Making Fund Investment Decisions? A Multi-granularity Graph Disentangled Learning Framework
results: Extensive experiments in offline and online environments show that MGDL yields stronger disentangled representations and more effective fund investment matching.
Abstract
In this paper, we highlight that both conformity and risk preference matter in making fund investment decisions beyond personal interest, and seek to jointly characterize these aspects in a disentangled manner. Consequently, we develop a novel Multi-granularity Graph Disentangled Learning framework named MGDL to effectively perform intelligent matching of fund investment products. Benefiting from the well-established fund graph and the attention module, multi-granularity user representations are derived from historical behaviors to separately express personal interest, conformity and risk preference in a fine-grained way. To attain stronger disentangled representations with specific semantics, MGDL explicitly involves two self-supervised signals, i.e., fund type based contrasts and fund popularity. Extensive experiments in offline and online environments verify the effectiveness of MGDL.
results: Stability ensures that model averaging has good consistency and generalization under reasonable conditions. Using 10-fold cross-validation to select tuning parameters, with a weighted average of the model-weight estimators to reduce the impact of the selection, further reduces risk. Monte Carlo simulations and an illustrative application demonstrate the usefulness of the proposed method.
Abstract
Model averaging has received much attention in the past two decades, as it integrates available information by averaging over potential models. Although various model averaging methods have been developed, there is little literature on the theoretical properties of model averaging from the perspective of stability, and the majority of these methods constrain model weights to a simplex. The aim of this paper is to introduce stability from statistical learning theory into model averaging. Thus, we define the stability, asymptotic empirical risk minimizer, generalization, and consistency of model averaging and study the relationship among them. Our results indicate that stability can ensure that model averaging has good generalization performance and consistency under reasonable conditions, where consistency means the model averaging estimator can asymptotically minimize the mean squared prediction error. We also propose an L2-penalty model averaging method without limiting model weights and prove that it has stability and consistency. In order to reduce the impact of tuning parameter selection, we use 10-fold cross-validation to select a candidate set of tuning parameters and perform a weighted average of the estimators of model weights based on estimation errors. The Monte Carlo simulation and an illustrative application demonstrate the usefulness of the proposed method.
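A minimal sketch of L2-penalized model averaging without a simplex constraint: stack each candidate model's predictions as columns of a matrix and solve a ridge-type problem for the weights. The closed form below is the standard ridge solution and serves only to illustrate the kind of estimator studied; the paper's exact criterion may differ.

```python
import numpy as np

def l2_model_averaging_weights(pred_matrix, y, lam):
    """pred_matrix: (n, M) predictions of M candidate models; y: (n,) responses.
    Minimizes ||y - P w||^2 + lam * ||w||^2 with no simplex constraint on w."""
    P = pred_matrix
    M = P.shape[1]
    return np.linalg.solve(P.T @ P + lam * np.eye(M), P.T @ y)

# Per the paper's recipe one would compute weights for each candidate lam
# chosen by 10-fold CV, then take an estimation-error-weighted average of
# the resulting weight vectors to dampen the effect of the lam choice.
```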
Molecular Identification and Peak Assignment: Leveraging Multi-Level Multimodal Alignment on NMR
paper_authors: Hao Xu, Zhengyang Zhou, Pengyu Hong
for: Providing valuable insights into molecular dynamics and interactions from NMR spectra.
methods: Multi-Level Multimodal Alignment with Knowledge-Guided Instance-Wise Discrimination (K-M3AID) establishes meaningful correspondences between molecular graphs (structures) and NMR spectra.
results: K-M3AID addresses multiple zero-shot tasks, offering a promising solution to bridge the gap between structural information and spectral data in complex NMR scenarios.
Abstract
Nuclear magnetic resonance (NMR) spectroscopy plays an essential role across various scientific disciplines, providing valuable insights into molecular dynamics and interactions. Despite the promise of AI-enhanced NMR prediction models, challenges persist in the interpretation of spectra for tasks such as molecular retrieval, isomer recognition, and peak assignment. In response, this paper introduces Multi-Level Multimodal Alignment with Knowledge-Guided Instance-Wise Discrimination (K-M3AID) to establish meaningful correspondences between two heterogeneous modalities: molecular graphs (structures) and NMR spectra. In particular, K-M3AID employs a dual-coordinated contrastive learning architecture, and incorporates a graph-level alignment module, a node-level alignment module, and a communication channel. Notably, the framework introduces knowledge-guided instance-wise discrimination into contrastive learning within the node-level alignment module, significantly enhancing accuracy in cross-modal alignment. Additionally, K-M3AID showcases its capability of meta-learning by demonstrating that skills acquired during node-level alignment positively impact graph-level alignment. Empirical validation underscores K-M3AID's effectiveness in addressing multiple zero-shot tasks, offering a promising solution to bridge the gap between structural information and spectral data in complex NMR scenarios.
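Cross-modal alignment modules of this kind typically rest on a contrastive objective. The symmetric InfoNCE below illustrates the general shape of graph-spectrum alignment; it is a standard choice, not the paper's exact loss, and the knowledge-guided instance-wise discrimination term is omitted.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(z_graph, z_spec, tau=0.1):
    """Symmetric InfoNCE between paired graph and spectrum embeddings:
    matched (graph, spectrum) pairs sit on the diagonal of the logit matrix."""
    z_g = F.normalize(z_graph, dim=1)
    z_s = F.normalize(z_spec, dim=1)
    logits = z_g @ z_s.t() / tau
    targets = torch.arange(z_g.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```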
Knowledge Distillation Based Semantic Communications For Multiple Users
results: KD significantly improves the robustness and generalization ability of the SemCom system under unexpected interference, and reduces the performance loss when compressing the model size.
Abstract
Deep learning (DL) has shown great potential in revolutionizing the traditional communications system. Many applications in communications have adopted DL techniques due to their powerful representation ability. However, the learning-based methods can be dependent on the training dataset and perform worse on unseen interference due to limited model generalizability and complexity. In this paper, we consider the semantic communication (SemCom) system with multiple users, where there is a limited number of training samples and unexpected interference. To improve the model generalization ability and reduce the model size, we propose a knowledge distillation (KD) based system where Transformer based encoder-decoder is implemented as the semantic encoder-decoder and fully connected neural networks are implemented as the channel encoder-decoder. Specifically, four types of knowledge transfer and model compression are analyzed. Important system and model parameters are considered, including the level of noise and interference, the number of interfering users and the size of the encoder and decoder. Numerical results demonstrate that KD significantly improves the robustness and the generalization ability when applied to unexpected interference, and it reduces the performance loss when compressing the model size.
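The distillation objective itself is standard. A typical temperature-scaled KD loss, combining the teacher's softened distribution with the hard-label cross-entropy, looks like this in PyTorch; which of the four analyzed transfer variants the paper adopts is not shown here, and T and alpha are illustrative.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style knowledge distillation: KL divergence to the teacher's
    softened distribution plus standard cross-entropy on the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                      # rescale gradients to match the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```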
Learning Hierarchical Polynomials with Three-Layer Neural Networks
results: For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network learns the target $h$ in $\widetilde{\mathcal{O}}(d^k)$ samples and polynomial time, surpassing the guarantees of kernel methods and two-layer networks. When $p$ is a quadratic, the information-theoretically optimal sample complexity $\widetilde{\mathcal{O}}(d^2)$ is achieved.
Abstract
We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form $h = g \circ p$ where $p : \mathbb{R}^d \rightarrow \mathbb{R}$ is a degree $k$ polynomial and $g: \mathbb{R} \rightarrow \mathbb{R}$ is a degree $q$ polynomial. This function class generalizes the single-index model, which corresponds to $k=1$, and is a natural class of functions possessing an underlying hierarchical structure. Our main result shows that for a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error in $\widetilde{\mathcal{O}}(d^k)$ samples and polynomial time. This is a strict improvement over kernel methods, which require $\widetilde{\Theta}(d^{kq})$ samples, as well as existing guarantees for two-layer networks, which require the target function to be low-rank. Our result also generalizes prior works on three-layer neural networks, which were restricted to the case of $p$ being a quadratic. When $p$ is indeed a quadratic, we achieve the information-theoretically optimal sample complexity $\widetilde{\mathcal{O}}(d^2)$, which is an improvement over prior work (Nichani et al., 2023) requiring a sample size of $\widetilde{\Theta}(d^4)$. Our proof proceeds by showing that during the initial stage of training the network performs feature learning to recover the feature $p$ with $\widetilde{\mathcal{O}}(d^k)$ samples. This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
A Unified Framework for Fair Spectral Clustering With Effective Graph Learning
results: Experiments on synthetic, benchmark, and real data show that the model is superior to state-of-the-art fair clustering methods.
Abstract
We consider the problem of spectral clustering under group fairness constraints, where samples from each sensitive group are approximately proportionally represented in each cluster. Traditional fair spectral clustering (FSC) methods consist of two consecutive stages, i.e., performing fair spectral embedding on a given graph and conducting $k$-means to obtain discrete cluster labels. However, in practice, the graph is usually unknown, and we need to construct the underlying graph from potentially noisy data, the quality of which inevitably affects subsequent fair clustering performance. Furthermore, performing FSC through separate steps breaks the connections among these steps, leading to suboptimal results. To this end, we first theoretically analyze the effect of the constructed graph on FSC. Motivated by the analysis, we propose a novel graph construction method with a node-adaptive graph filter to learn graphs from noisy data. Then, all independent stages of conventional FSC are integrated into a single objective function, forming an end-to-end framework that inputs raw data and outputs discrete cluster labels. An algorithm is developed to jointly and alternately update the variables in each stage. Finally, we conduct extensive experiments on synthetic, benchmark, and real data, which show that our model is superior to state-of-the-art fair clustering methods.
Learning Optimal and Fair Policies for Online Allocation of Scarce Societal Resources from Data Collected in Deployment
results: The data-driven policy asymptotically achieves the expected outcome of the optimal out-of-sample policy under mild technical assumptions, and the framework extends to various fairness constraints. For allocating scarce housing resources to people experiencing homelessness, the policies improve rates of exit from homelessness by 1.9%, and fairness in allocation or outcomes by race comes at a very low price.
Abstract
We study the problem of allocating scarce societal resources of different types (e.g., permanent housing, deceased donor kidneys for transplantation, ventilators) to heterogeneous allocatees on a waitlist (e.g., people experiencing homelessness, individuals suffering from end-stage renal disease, Covid-19 patients) based on their observed covariates. We leverage administrative data collected in deployment to design an online policy that maximizes expected outcomes while satisfying budget constraints, in the long run. Our proposed policy waitlists each individual for the resource maximizing the difference between their estimated mean treatment outcome and the estimated resource dual-price or, roughly, the opportunity cost of using the resource. Resources are then allocated as they arrive, in a first-come first-serve fashion. We demonstrate that our data-driven policy almost surely asymptotically achieves the expected outcome of the optimal out-of-sample policy under mild technical assumptions. We extend our framework to incorporate various fairness constraints. We evaluate the performance of our approach on the problem of designing policies for allocating scarce housing resources to people experiencing homelessness in Los Angeles based on data from the homeless management information system. In particular, we show that using our policies improves rates of exit from homelessness by 1.9% and that policies that are fair in either allocation or outcomes by race come at a very low price of fairness.
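The waitlisting rule can be stated compactly: queue each arrival for the resource maximizing estimated mean treatment outcome minus the resource's estimated dual price, then dispense resources first-come first-served. A toy version with all estimates supplied as arrays (the estimation itself, from administrative data, is the harder part and is not shown):

```python
import numpy as np

def assign_waitlist(outcome_hat, dual_price):
    """outcome_hat: (n, R) estimated mean outcome of person i under resource r;
    dual_price: (R,) estimated shadow price (opportunity cost) of each resource.
    Each person joins the queue of the resource with the largest net benefit."""
    net = outcome_hat - dual_price[None, :]   # opportunity-cost-adjusted value
    return np.argmax(net, axis=1)             # chosen resource per person

outcome_hat = np.array([[0.7, 0.4], [0.5, 0.45]])
dual_price = np.array([0.3, 0.05])
print(assign_waitlist(outcome_hat, dual_price))  # -> [0 1]
```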
Extraction of n = 0 pick-up by locked mode detectors based on neural networks in J-TEXT
for: This paper presents a new method for measuring the locked mode (LM) in studies of magnetohydrodynamic (MHD) instabilities and plasma disruption.
methods: Neural networks (NNs) predict the n = 0 pick-up $b_r^{n=0}$ on the LM detectors so it can be subtracted to obtain the amplitude and phase of the LM; the Power Multiple Time Scale (PMTS) approach fits $b_r^{n=0}$ with little error over multiple frequency ranges.
results: The n > 0 pick-up $b_r^{n>0}$ generated by resonant magnetic perturbations (RMPs) is obtained after subtracting the extracted $b_r^{n=0}$; the new method uses only one LM detector instead of four, so the layout of the LM detectors can be optimized.
Abstract
Measurement of the locked mode (LM) is important for the physical study of magnetohydrodynamic (MHD) instabilities and plasma disruption. The n = 0 pick-up $b_r^{n=0}$ needs to be extracted and subtracted to calculate the amplitude and phase of the LM. A new method to extract this pick-up has been developed in J-TEXT by predicting $b_r^{n=0}$ on the LM detectors with Neural Networks (NNs). An approach called Power Multiple Time Scale (PMTS) has been developed with outstanding regression performance over multiple frequency ranges. Three models have been built on PMTS NNs. PMTS can fit $b_r^{n=0}$ on the LM detectors with little error in both the time and frequency domains. The n > 0 pick-up $b_r^{n>0}$ generated by resonant magnetic perturbations (RMPs) can then be obtained after subtracting the extracted $b_r^{n=0}$. This new method uses only one LM detector instead of 4 to extract $b_r^{n=0}$. Therefore, the distribution of the LM detectors can also be optimized based on this new method.
results: A statistical mechanics of systems arbitrarily far from equilibrium makes it possible to understand and compute the structure and behavior of self-organizing systems.
Abstract
After more than a century of concerted effort, physics still lacks basic principles of spontaneous self-organization. To appreciate why, we first state the problem, outline historical approaches, and survey the present state of the physics of self-organization. This frames the particular challenges arising from mathematical intractability and the resulting need for computational approaches, as well as those arising from a chronic failure to define structure. Then, an overview of two modern mathematical formulations of organization -- intrinsic computation and evolution operators -- lays out a way to overcome these challenges. Together, the vantage point they afford shows how to account for the emergence of structured states via a statistical mechanics of systems arbitrarily far from equilibrium. The result is a constructive path forward to principles of organization that builds on mathematical identification of structure.
results: The lowest magnifications (1.25x and 2.5x) performed best in cross-validation, while intermediate magnifications (5x and 10x) performed best in hold-out testing (62% and 61% accuracy, respectively); lower magnifications also made training and evaluating ovarian cancer subtyping models substantially faster.
Abstract
Artificial intelligence has found increasing use for ovarian cancer morphological subtyping from histopathology slides, but the optimal magnification for computational interpretation is unclear. Higher magnifications offer abundant cytological information, whereas lower magnifications give a broader histoarchitectural overview. Using attention-based multiple instance learning, we performed the most extensive analysis of ovarian cancer tissue magnifications to date, with data at six magnifications subjected to the same preprocessing, hyperparameter tuning, cross-validation and hold-out testing procedures. The lowest magnifications (1.25x and 2.5x) performed best in cross-validation, and intermediate magnifications (5x and 10x) performed best in hold-out testing (62% and 61% accuracy, respectively). Lower magnification models were also significantly faster, with the 5x model taking 5% as long to train and 31% as long to evaluate slides compared to 40x. This indicates that the standard usage of high magnifications for computational ovarian cancer subtyping may be unnecessary, with lower magnifications giving faster, more accurate alternatives.
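Attention-based multiple instance learning of the kind used here is commonly implemented as an attention-pooling head over tile embeddings (in the style of Ilse et al.); the sketch below is that generic head with illustrative dimensions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Attention-based MIL head: tile features from one slide are pooled by a
    learned attention weighting, then classified into morphological subtypes."""
    def __init__(self, feat_dim=512, attn_dim=128, n_classes=5):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, attn_dim), nn.Tanh(),
                                  nn.Linear(attn_dim, 1))
        self.cls = nn.Linear(feat_dim, n_classes)

    def forward(self, tiles):                        # tiles: (n_tiles, feat_dim)
        a = torch.softmax(self.attn(tiles), dim=0)   # (n_tiles, 1) attention weights
        slide = (a * tiles).sum(dim=0)               # slide-level embedding
        return self.cls(slide), a                    # logits and tile attention
```

Lower magnifications yield fewer tiles per slide, which is consistent with the reported speedups: the bag shrinks, so both training and inference process fewer instances.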
results: Simulations show that the proposed design achieves higher broadband realized gain, with both the two- and three-layer designs outperforming the commonly used quarter-wave transformer; a two-layer binomial matching lens with 38 unit cells was fabricated and illuminated by an open-ended waveguide to validate the simulation results.
Abstract
This paper presents a non-zoned discrete dielectric lens comprising two or three matching layers to reduce reflections over the 50-110 GHz frequency range. Based on Chebyshev and binomial multi-section transformers, the designed models use matching layers at the top and bottom. In addition, the presented designs use pins instead of the conventional slots for the matching layers, thus easing the manufacturing process. The results show that the broadband realized gain obtained using the proposed design is higher for both the two- and three-layer designs than for the commonly used quarter-wave transformer. A binomial lens with two matching layers using 38 unit cells is fabricated and illuminated by an open-ended waveguide to validate the simulation results obtained using CST Microwave Studio. The fabrication process uses stereolithography additive manufacturing.
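The binomial multi-section transformer behind the layer design follows the standard textbook recursion for the section impedances, ln Z_{n+1} = ln Z_n + 2^{-N} C(N,n) ln(Z_L/Z_0) (Pozar). A quick calculation is below; mapping impedances to dielectric-layer permittivities via Z ~ Z0/sqrt(eps_r) is noted as an approximation, not the paper's procedure.

```python
from math import comb, log, exp

def binomial_sections(Z0, ZL, N):
    """Characteristic impedances of an N-section binomial matching transformer
    between source impedance Z0 and load impedance ZL (quarter-wave sections)."""
    Z = [Z0]
    for n in range(N):
        Z.append(exp(log(Z[-1]) + 2 ** (-N) * comb(N, n) * log(ZL / Z0)))
    return Z[1:]          # the N intermediate section impedances

# e.g. free space (377 ohm) into a medium with eps_r = 4 (188.5 ohm), N = 2:
print(binomial_sections(377.0, 188.5, 2))   # ~[317.0, 224.2]
```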
Threat-Based Resource Allocation Strategy for Target Tracking in a Cognitive Radar Network
methods: Threat levels are incorporated into cognitive radar resource allocation, and the dwell time allocation problem is solved using a Second-Order Cone Program (SOCP).
results: Numerical simulations show that the proposed scheme enhances the operational evaluation and target tracking performance of the cognitive radar.
Abstract
Cognitive radar uses feedback about its operating environment, obtained from a beam, to make resource allocation decisions by solving optimization problems. Previous works focused on target tracking accuracy by designing an evaluation metric for an optimization problem. However, in a real combat situation, not only the tracking performance of the target but also its operational perspective should be considered. In this study, the use of threats in the allocation of radar resources is proposed for a cognitive radar framework. Resource allocation of radar dwell time is considered to reflect the operational importance of target effects. The dwell time allocation problem is solved using a Second-Order Cone Program (SOCP). Numerical simulations are performed to verify the effectiveness of the proposed framework.
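A dwell-time allocation of this flavor can be posed as a small convex program. The toy below assumes, as a common radar heuristic rather than the paper's model, that tracking error scales as t^(-1/2); weighting errors by threat and capping the total dwell time yields an SOC-representable problem via cvxpy's power atom. All numbers are illustrative.

```python
import cvxpy as cp
import numpy as np

threat = np.array([3.0, 1.0, 2.0])      # operational threat weights per target
sigma = np.array([1.0, 0.8, 1.2])       # per-target measurement noise scale
T_total, t_min = 1.0, 0.05              # dwell-time budget and floor (seconds)

t = cp.Variable(3)
# Threat-weighted tracking error, assuming error ~ sigma * t^(-1/2);
# cp.power(t, -0.5) is convex on t > 0 and SOC-representable.
objective = cp.Minimize(threat @ cp.multiply(sigma, cp.power(t, -0.5)))
prob = cp.Problem(objective, [cp.sum(t) <= T_total, t >= t_min])
prob.solve()
print("dwell times:", t.value)          # more dwell goes to high-threat targets
```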
Beamforming Design for Hybrid IRS-aided AF Relay Wireless Networks
results: The rate achieved by the proposed LC-SCA-FP method surpasses those of the benchmark schemes, namely the passive IRS-aided AF relay and the AF-relay-only network.
Abstract
In this paper, a hybrid IRS-aided amplify-and-forward (AF) relay wireless network is put forward, where the hybrid IRS is made up of passive and active elements. For maximum signal-to-noise ratio (SNR), a low-complexity method based on successive convex approximation and fractional programming (LC-SCA-FP) is proposed to jointly optimize the beamforming matrix at the AF relay and the reflecting coefficient matrices at the IRS. Simulation results verify that the rate achieved by the proposed LC-SCA-FP method surpasses those of the benchmark schemes, namely the passive IRS-aided AF relay and the AF-relay-only network.
A Deep Reinforcement Learning Approach for Improving Age of Information in Mission-Critical IoT
results: Compared with related work, the proposed method improves information freshness, reducing both the average AoI and the AoI violation probability.
Abstract
The emerging mission-critical Internet of Things (IoT) play a vital role in remote healthcare, haptic interaction, and industrial automation, where timely delivery of status updates is crucial. The Age of Information (AoI) is an effective metric to capture and evaluate information freshness at the destination. A system design based solely on the optimization of the average AoI might not be adequate to capture the requirements of mission-critical applications, since averaging eliminates the effects of extreme events. In this paper, we introduce a Deep Reinforcement Learning (DRL)-based algorithm to improve AoI in mission-critical IoT applications. The objective is to minimize an AoI-based metric consisting of the weighted sum of the average AoI and the probability of exceeding an AoI threshold. We utilize the actor-critic method to train the algorithm to achieve optimized scheduling policy to solve the formulated problem. The performance of our proposed method is evaluated in a simulated setup and the results show a significant improvement in terms of the average AoI and the AoI violation probability compared to the related-work.
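The optimization target, a weighted sum of the average AoI and the probability of exceeding an AoI threshold, is easy to evaluate on a simulated delivery pattern. A discrete-time sketch with illustrative weights and threshold:

```python
import numpy as np

def aoi_metric(received, threshold=5, w_avg=1.0, w_viol=10.0):
    """received[k] = True if a fresh status update is delivered in slot k.
    AoI resets to 1 on delivery, otherwise grows by one per slot."""
    aoi, trace = 0, []
    for got_update in received:
        aoi = 1 if got_update else aoi + 1
        trace.append(aoi)
    trace = np.array(trace)
    avg_aoi = trace.mean()
    p_viol = float(np.mean(trace > threshold))  # AoI-threshold violation probability
    return w_avg * avg_aoi + w_viol * p_viol, avg_aoi, p_viol

rng = np.random.default_rng(0)
score, avg_aoi, p_viol = aoi_metric(rng.random(1000) < 0.3)  # 30% delivery rate
```

The violation term is what distinguishes this objective from pure average-AoI minimization: it penalizes the extreme staleness events that averaging hides.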
for: This paper proposes new bounds for parameter estimation in statistical signal processing, which can be used to evaluate the performance of estimation methods.
methods: The proposed bounds are based on the Bobrovsky–Mayer-Wolf–Zakai class of bounds and are optimized using a weighting function.
results: The proposed bounds are asymptotically attainable and coincide with the expected Cramér-Rao bound (ECRB) in several fundamental signal processing examples. Unlike the Bayesian Cramér-Rao bound (BCRB), the proposed bounds are valid for any estimator and do not require uniform unbiasedness.
Abstract
Performance bounds for parameter estimation play a crucial role in statistical signal processing theory and applications. Two widely recognized bounds are the Cramér-Rao bound (CRB) in the non-Bayesian framework, and the Bayesian CRB (BCRB) in the Bayesian framework. However, unlike the CRB, the BCRB is asymptotically unattainable in general, and its equality condition is restrictive. This paper introduces an extension of the Bobrovsky–Mayer-Wolf–Zakai class of bounds, also known as the weighted BCRB (WBCRB). The WBCRB is optimized by tuning the weighting function in the scalar case. Based on this result, we propose an asymptotically tight version of the bound called AT-BCRB. We prove that the AT-BCRB is asymptotically attained by the maximum a-posteriori probability (MAP) estimator. Furthermore, we extend the WBCRB and the AT-BCRB to the case of vector parameters. The proposed bounds are evaluated in several fundamental signal processing examples, such as variance estimation of white Gaussian process, direction-of-arrival estimation, and mean estimation of Gaussian process with unknown variance and prior statistical information. It is shown that unlike the BCRB, the proposed bounds are asymptotically attainable and coincide with the expected CRB (ECRB). The ECRB, which imposes uniform unbiasedness, cannot serve as a valid lower bound in the Bayesian framework, while the proposed bounds are valid for any estimator.
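For orientation, the classical scalar Bayesian CRB that the weighted family generalizes reads as follows; the WBCRB inserts a tunable weighting function into the expectation, and that exact weighted form is not reproduced here.

```latex
% Scalar Bayesian CRB (Van Trees): for any estimator \hat{\theta}(\mathbf{x}),
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\left[ \left( \hat{\theta}(\mathbf{x}) - \theta \right)^2 \right]
  \;\ge\;
  \left( \mathbb{E}\left[ \left( \frac{\partial \ln p(\mathbf{x},\theta)}{\partial \theta} \right)^{2} \right] \right)^{-1}
```

Here the expectation is over the joint density $p(\mathbf{x},\theta)$ of the data and the random parameter, which is why the bound holds for any estimator without an unbiasedness requirement.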
A Fast Power Spectrum Sensing Solution for Generalized Coprime Sampling
results: Simulations show the scheme has lower complexity and faster execution than existing methods while maintaining the same performance, and model mismatch has only a minor impact on estimation performance in a distributed swarm scenario.
Abstract
With the growing scarcity of spectrum resources, wideband spectrum sensing must process a prohibitive volume of data at a high sampling rate. For some applications, spectrum estimation only requires second-order statistics. In this case, a fast power spectrum sensing solution is proposed based on generalized coprime sampling. By exploiting the inherent structure of the sensing vector, the autocorrelation sequence of the inputs can be reconstructed from sub-Nyquist samples by utilizing only the parallel Fourier transform and simple multiplication operations. Thus, it takes less time than the state-of-the-art methods while maintaining the same performance, and it achieves higher performance than the existing methods within the same execution time, without the need for pre-estimating the number of inputs. Furthermore, model mismatch has only a minor impact on the estimation performance, which allows for more efficient use of the spectrum resource in a distributed swarm scenario. Simulation results demonstrate the low complexity in sampling and computation, making it a more practical solution for real-time and distributed wideband spectrum sensing applications.
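The identity the scheme exploits, that a power spectrum needs only second-order statistics, is the Wiener-Khinchin relation: the PSD is the Fourier transform of the autocorrelation. A full-rate (Nyquist) reference implementation is below; the paper's contribution is reconstructing the same autocorrelation from coprime sub-Nyquist samples using FFTs and multiplications, which is not reproduced here.

```python
import numpy as np

def psd_from_autocorrelation(x, n_lags=128):
    """Estimate the power spectrum as the FFT of the (biased) autocorrelation."""
    x = x - x.mean()
    # Biased autocorrelation estimate for lags 0 .. n_lags-1.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) for k in range(n_lags)])
    # Even-symmetric extension, then FFT (Wiener-Khinchin relation).
    r_sym = np.concatenate([r, r[-2:0:-1]])
    return np.abs(np.fft.rfft(r_sym))
```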
results: An HRTF individualization method based on anthropometric features automatically extracted from ear images, enabling a highly personalized spatial listening experience.
Abstract
Spatial audio and 3-dimensional sound rendering techniques play a pivotal and essential role in immersive audio experiences. Head-Related Transfer Functions (HRTFs) are acoustic filters which represent how sound interacts with an individual's unique head and ear anatomy. Using HRTFs compliant with the subject's anatomical traits is crucial to ensure a personalized and unique spatial experience. This work proposes the implementation of an HRTF individualization method based on anthropometric features automatically extracted from ear images using a Convolutional Neural Network (CNN). Firstly, a CNN is implemented and tested to assess the performance of machine learning at positioning landmarks on ear images. The I-BUG dataset, containing ear images with 55 corresponding landmarks, was used to train and test the neural network. Subsequently, 12 relevant landmarks were selected to correspond to 7 specific anthropometric measurements established by the HUTUBS database. These landmarks serve as references for computing distances in pixels in order to retrieve the anthropometric measurements from the ear images. Once the 7 distances in pixels are extracted from the ear image, they are converted to centimetres using conversion factors, and a best-match method is implemented, computing the Euclidean distance between the query and each set in a database of 116 ears with their corresponding 7 anthropometric measurements provided by the HUTUBS database. The closest anthropometric match can then be identified and the corresponding set of HRTFs obtained for personalized use. The method is evaluated in terms of its validity rather than the accuracy of its results. The conceptual scope of each stage has been verified and substantiated to function correctly. The various steps and available elements in the process are reviewed and challenged to define a larger algorithmic pipeline designed for the desired task.
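The matching stage reduces to a nearest-neighbour query: convert the 7 pixel distances to centimetres and pick the database entry minimizing the Euclidean distance. A sketch, with the conversion factors and the HUTUBS-style table as placeholder inputs:

```python
import numpy as np

def best_hrtf_match(pixel_dists, px_to_cm, database_cm):
    """pixel_dists: (7,) landmark distances measured on the ear image;
    px_to_cm: (7,) conversion factors; database_cm: (116, 7) anthropometric
    measurements with known HRTF sets. Returns the index of the closest ear."""
    query_cm = pixel_dists * px_to_cm
    d = np.linalg.norm(database_cm - query_cm, axis=1)  # Euclidean distance per entry
    return int(np.argmin(d))

db = np.random.default_rng(2).uniform(1.0, 7.0, size=(116, 7))  # placeholder table
idx = best_hrtf_match(np.arange(7) + 30.0, np.full(7, 0.1), db)
# idx selects the HRTF set of the anthropometrically closest subject.
```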