results: More efficient than previous methods, providing higher-quality CT images with fewer artifacts and less noise.
for: The paper is focused on developing a novel method for reconstructing high-quality CT images from sparse-view measurements.
methods: The proposed method uses Implicit Neural Representations (INRs) to establish a coordinate-based mapping between sinograms and CT images. Additionally, a self-supervised method called Anti-Aliasing Projection Representation Field (APRF) is proposed to improve the quality of the reconstructed images.
results: The proposed method outperforms state-of-the-art methods in terms of image quality, with fewer artifacts and more accurate details. The code for the proposed method will be publicly available soon.
Abstract
Sparse-view Computed Tomography (SVCT) reconstruction is an ill-posed inverse problem in imaging that aims to acquire high-quality CT images based on sparsely-sampled measurements. Recent works use Implicit Neural Representations (INRs) to build the coordinate-based mapping between sinograms and CT images. However, these methods have not considered the correlation between adjacent projection views, resulting in aliasing artifacts on SV sinograms. To address this issue, we propose a self-supervised SVCT reconstruction method -- Anti-Aliasing Projection Representation Field (APRF), which can build a continuous representation between adjacent projection views via spatial constraints. Specifically, APRF only needs SV sinograms for training: it first employs a line-segment sampling module to estimate the distribution of projection views in a local region, and then synthesizes the corresponding sinogram values using a center-based line integral module. After training on a single SV sinogram, APRF can synthesize the corresponding dense-view (DV) sinogram with consistent continuity. High-quality CT images can then be obtained by applying re-projection techniques to the predicted DV sinograms. Extensive experiments on CT images demonstrate that APRF outperforms state-of-the-art methods, yielding more accurate details and fewer artifacts. Our code will be publicly available soon.
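The abstract describes a coordinate-based INR that synthesizes sinogram values by sampling a short line segment spanning adjacent projection views and integrating toward its center. The sketch below is a minimal, hypothetical PyTorch rendition of that idea; the network size, the segment length `delta`, and the uniform sampling scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SinogramINR(nn.Module):
    """Hypothetical coordinate MLP: (view angle, detector offset) -> sinogram value."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords):  # coords: (N, 2)
        return self.net(coords)

def line_segment_sample(theta, s, delta=0.01, n_samples=8):
    """Sample points along a short segment of adjacent view angles around theta."""
    offsets = torch.linspace(-delta, delta, n_samples)
    thetas = theta.unsqueeze(-1) + offsets          # (N, n_samples)
    ss = s.unsqueeze(-1).expand_as(thetas)          # detector offset stays fixed
    return torch.stack([thetas, ss], dim=-1)        # (N, n_samples, 2)

def center_based_integral(model, coords):
    """Average the INR predictions over the segment (a stand-in for the
    center-based line integral named in the abstract)."""
    n, k, _ = coords.shape
    vals = model(coords.reshape(-1, 2)).reshape(n, k)
    return vals.mean(dim=1, keepdim=True)

# Self-supervised training on a single sparse-view sinogram (placeholder data).
model = SinogramINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
theta_sv = torch.rand(1024) * torch.pi   # sparse view angles
s_sv = torch.rand(1024) * 2 - 1          # detector offsets in [-1, 1]
target = torch.rand(1024, 1)             # measured sinogram values

for step in range(100):
    coords = line_segment_sample(theta_sv, s_sv)
    pred = center_based_integral(model, coords)
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```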
Does pre-training on brain-related tasks results in better deep-learning-based brain age biomarkers?
paper_authors: Bruno Machado Pacheco, Victor Hugo Rocha de Oliveira, Augusto Braga Fernandes Antunes, Saulo Domingos de Souza Pedro, Danilo Silva
for: Predicting the brain age of healthy subjects as an indicator of overall brain health and successful aging, and as a disease biomarker.
methods: Deep learning models are trained to predict the brain age of healthy subjects, with a pre-training step on brain-related tasks instead of natural image classification.
results: Models pre-trained on brain-related tasks achieve state-of-the-art results on the ADNI dataset, and the resulting brain age biomarkers are validated on images of patients with mild cognitive impairment and Alzheimer's disease.
Abstract
Brain age prediction using neuroimaging data has shown great potential as an indicator of overall brain health and successful aging, as well as a disease biomarker. Deep learning models have been established as reliable and efficient brain age estimators, being trained to predict the chronological age of healthy subjects. In this paper, we investigate the impact of a pre-training step on deep learning models for brain age prediction. More precisely, instead of the common approach of pre-training on natural imaging classification, we propose pre-training the models on brain-related tasks, which led to state-of-the-art results in our experiments on ADNI data. Furthermore, we validate the resulting brain age biomarker on images of patients with mild cognitive impairment and Alzheimer's disease. Interestingly, our results indicate that better-performing deep learning models in terms of brain age prediction on healthy patients do not result in more reliable biomarkers.
Encoder Complexity Control in SVT-AV1 by Speed-Adaptive Preset Switching
results: Encoding can meet a user-defined time constraint without introducing any additional latency.
Abstract
Current developments in video encoding technology lead to continuously improving compression performance, but at the expense of increasingly high computational demands. Given the growth of online video traffic in recent years and the concomitant need for video encoding, encoder complexity control mechanisms are required to restrict the processing time to a sufficient extent in order to find a reasonable trade-off between performance and complexity. We present a complexity control mechanism in SVT-AV1 that uses speed-adaptive preset switching to comply with the remaining time budget. This method enables encoding with a user-defined time constraint within the complete preset range with an average precision of 8.9 \% without introducing any additional latencies.
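The abstract's idea — monitor the remaining time budget while encoding and switch the encoder preset up or down to stay on schedule — can be sketched as a simple feedback loop. The skeleton below is a hypothetical illustration; `encode_chunk` and the adjustment thresholds stand in for the actual SVT-AV1 integration, which the paper implements inside the encoder itself.

```python
import time

PRESET_MIN, PRESET_MAX = 0, 13  # SVT-AV1 presets: lower = slower, better quality

def encode_with_time_budget(chunks, budget_s, encode_chunk, preset=8):
    """Hypothetical speed-adaptive preset switching: after each chunk, compare
    the projected remaining time against the remaining budget and adjust."""
    start = time.monotonic()
    for i, chunk in enumerate(chunks):
        encode_chunk(chunk, preset)
        elapsed = time.monotonic() - start
        done = i + 1
        projected_total = elapsed / done * len(chunks)
        remaining_budget = budget_s - elapsed
        projected_remaining = projected_total - elapsed
        if projected_remaining > remaining_budget:
            preset = min(preset + 1, PRESET_MAX)   # behind schedule: speed up
        elif projected_remaining < 0.9 * remaining_budget:
            preset = max(preset - 1, PRESET_MIN)   # ahead: spend time on quality
    return preset

# Usage with a dummy encoder whose speed depends on the preset.
def dummy_encode(chunk, preset):
    time.sleep(0.01 * (PRESET_MAX - preset + 1))

final_preset = encode_with_time_budget(list(range(20)), budget_s=1.5,
                                       encode_chunk=dummy_encode)
print("final preset:", final_preset)
```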
HistoColAi: An Open-Source Web Platform for Collaborative Digital Histology Image Annotation with AI-Driven Predictive Integration
results: The paper includes a use case centered on the diagnosis of spindle cell skin neoplasm with multiple annotators, together with a usability study validating the feasibility of the developed tool.
Abstract
Digital pathology has become a standard in the pathology workflow due to its many benefits. These include the level of detail of the whole slide images generated and the potential for immediate sharing of cases between hospitals. Recent advances in deep learning-based methods for image analysis make them a potential aid in digital pathology. However, a major limitation in developing computer-aided diagnostic systems for pathology is the lack of an intuitive and open web application for data annotation. This paper proposes a web service that efficiently provides a tool to visualize and annotate digitized histological images. In addition, to show and validate the tool, we include a use case centered on the diagnosis of spindle cell skin neoplasm for multiple annotators. A usability study of the tool is also presented, showing the feasibility of the developed tool.
Super-resolution imaging through a multimode fiber: the physical upsampling of speckle-driven
methods: A deep learning-based super-resolution imaging method that physically realizes the upsampling of low-resolution images to enhance the perceptual capabilities of the model.
results: Experiments show that the method effectively improves the accuracy and quality of endoscopic imaging, and that the physical upsampling compensates for the information missing from purely data-driven approaches.
Abstract
Following recent advancements in multimode fiber (MMF), miniaturization of imaging endoscopes has proven crucial for minimally invasive surgery in vivo. Recent progress enabled by super-resolution imaging methods with a data-driven deep learning (DL) framework has balanced the relationship between the core size and resolution. However, most of the DL approaches lack attention to the physical properties of the speckle, which are crucial for reconciling the relationship between the magnification of super-resolution imaging and the reconstruction quality. In this paper, we find that the interferometric process of speckle formation is an essential basis for creating DL models for super-resolution imaging. It physically realizes the upsampling of low-resolution (LR) images and enhances the perceptual capabilities of the models. This finding experimentally validates the role played by the physical, speckle-driven upsampling, effectively complementing the information missing from purely data-driven approaches. Experimentally, we break the restriction of poor reconstruction quality at great magnification by inputting speckles of the same size as the high-resolution (HR) image to the model. The guidance our research offers for endoscopic imaging may accelerate the further development of minimally invasive surgery.
Offline and Online Optical Flow Enhancement for Deep Video Compression
results: Experimental results show that the method achieves an average 12.8% bitrate saving on the tested videos without increasing the model or computational complexity of the decoder.
Abstract
Video compression relies heavily on exploiting the temporal redundancy between video frames, which is usually achieved by estimating and using the motion information. The motion information is represented as optical flows in most of the existing deep video compression networks. Indeed, these networks often adopt pre-trained optical flow estimation networks for motion estimation. The optical flows, however, may be less suitable for video compression due to the following two factors. First, the optical flow estimation networks were trained to perform inter-frame prediction as accurately as possible, but the optical flows themselves may cost too many bits to encode. Second, the optical flow estimation networks were trained on synthetic data, and may not generalize well enough to real-world videos. We address the twofold limitations by enhancing the optical flows in two stages: offline and online. In the offline stage, we fine-tune a trained optical flow estimation network with the motion information provided by a traditional (non-deep) video compression scheme, e.g. H.266/VVC, as we believe the motion information of H.266/VVC achieves a better rate-distortion trade-off. In the online stage, we further optimize the latent features of the optical flows with a gradient descent-based algorithm for the video to be compressed, so as to enhance the adaptivity of the optical flows. We conduct experiments on a state-of-the-art deep video compression scheme, DCVC. Experimental results demonstrate that the proposed offline and online enhancement together achieves on average 12.8% bitrate saving on the tested videos, without increasing the model or computational complexity of the decoder side.
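For the online stage, the abstract describes optimizing the latent features of the optical flow by gradient descent for each video being compressed, trading a little encoder-side compute for better rate-distortion. Below is a minimal, hypothetical sketch of such test-time latent refinement; the rate proxy, the distortion measure, and the `flow_decoder` module are placeholders for the actual DCVC components.

```python
import torch
import torch.nn as nn

def refine_flow_latent(latent, flow_decoder, warp, ref_frame, cur_frame,
                       steps=20, lr=1e-3, lam=0.01):
    """Hypothetical online refinement: adjust the flow latent so that the
    decoded flow warps the reference frame closer to the current frame, with a
    simple L1 penalty standing in for the bit cost of the latent."""
    latent = latent.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        flow = flow_decoder(latent)
        distortion = ((warp(ref_frame, flow) - cur_frame) ** 2).mean()
        rate_proxy = latent.abs().mean()       # crude stand-in for entropy cost
        loss = distortion + lam * rate_proxy
        opt.zero_grad(); loss.backward(); opt.step()
    return latent.detach()

def warp(img, flow):
    """Backward-warp img (B,C,H,W) by flow (B,2,H,W) via grid_sample."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().unsqueeze(0) + flow
    grid_x = 2 * grid[:, 0] / (w - 1) - 1
    grid_y = 2 * grid[:, 1] / (h - 1) - 1
    return nn.functional.grid_sample(img, torch.stack([grid_x, grid_y], dim=-1),
                                     align_corners=True)

# Toy usage with a dummy flow decoder.
decoder = nn.Conv2d(8, 2, 3, padding=1)
latent = torch.randn(1, 8, 16, 16)
ref = torch.rand(1, 3, 16, 16)
cur = torch.rand(1, 3, 16, 16)
refined = refine_flow_latent(latent, decoder, warp, ref, cur)
```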
SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation
results: Quantitative experiments on various datasets show that SAR-NeRF has strong multi-view representation and generalization capabilities, and that SAR-NeRF-augmented data can improve SAR target classification performance under few-shot learning setups.
Abstract
SAR images are highly sensitive to observation configurations, and they exhibit significant variations across different viewing angles, making it challenging to represent and learn their anisotropic features. As a result, deep learning methods often generalize poorly across different view angles. Inspired by the concept of neural radiance fields (NeRF), this study combines SAR imaging mechanisms with neural networks to propose a novel NeRF model for SAR image generation. Following the mapping and projection principles, a set of SAR images is modeled implicitly as a function of attenuation coefficients and scattering intensities in the 3D imaging space through a differentiable rendering equation. SAR-NeRF is then constructed to learn the distribution of attenuation coefficients and scattering intensities of voxels, where the vectorized form of the 3D voxel SAR rendering equation and the sampling relationship between the 3D space voxels and the 2D view ray grids are analytically derived. Through quantitative experiments on various datasets, we thoroughly assess the multi-view representation and generalization capabilities of SAR-NeRF. Additionally, it is found that a SAR-NeRF-augmented dataset can significantly improve SAR target classification performance under a few-shot learning setup, where a 10-type classification accuracy of 91.6\% can be achieved by using only 12 images per class.
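The abstract models SAR images as a function of per-voxel attenuation coefficients and scattering intensities through a differentiable rendering equation. As a rough analogy to NeRF-style volume rendering, one can accumulate scattering along each view ray while attenuating by the transmittance of the voxels already traversed. The sketch below is a generic, hypothetical discretization of that idea, not the paper's exact vectorized rendering equation.

```python
import torch

def render_sar(attenuation, scattering, step=1.0):
    """Accumulate intensity along rays, NeRF-style.

    attenuation, scattering: (n_rays, n_voxels) tensors sampled along each
    ray, ordered from the sensor outward. Returns (n_rays,) intensities."""
    optical_depth = torch.cumsum(attenuation * step, dim=1)
    # Transmittance up to (but not including) each voxel.
    transmittance = torch.exp(-(optical_depth - attenuation * step))
    # Each voxel contributes its scattering, attenuated by what lies in front.
    contrib = transmittance * (1 - torch.exp(-attenuation * step)) * scattering
    return contrib.sum(dim=1)

rays_att = torch.rand(4, 64) * 0.1   # placeholder network outputs
rays_sca = torch.rand(4, 64)
intensity = render_sar(rays_att, rays_sca)  # differentiable w.r.t. both fields
print(intensity.shape)  # torch.Size([4])
```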
Towards Anytime Optical Flow Estimation with Event Cameras
paper_authors: Yaozu Ye, Hao Shi, Kailun Yang, Ze Wang, Xiaoting Yin, Yaonan Wang, Kaiwei Wang
for: The paper develops a high-frame-rate, low-latency event representation for optical flow estimation using event cameras.
methods: The proposed method, EVA-Flow, represents events with a unified voxel grid and predicts temporally-dense optical flow with a stacked Spatiotemporal Motion Refinement (SMR) module; a Rectified Flow Warp Loss (RFWL) enables unsupervised evaluation of intermediate optical flow.
results: The method achieves competitive performance, with super-low latency (5 ms), the fastest inference (9.2 ms), time-dense motion estimation (200 Hz), and strong generalization.
Abstract
Event cameras are capable of responding to log-brightness changes in microseconds. Their characteristic of producing responses only in changing regions makes them particularly suitable for optical flow estimation. In contrast to the super-low-latency response of event cameras, existing datasets collected via event cameras only provide limited-frame-rate optical flow ground truth (e.g., at 10 Hz), greatly restricting the potential of event-driven optical flow. To address this challenge, we put forward a high-frame-rate, low-latency event representation, Unified Voxel Grid, sequentially fed into the network bin by bin. We then propose EVA-Flow, an EVent-based Anytime Flow estimation network, to produce high-frame-rate event optical flow with only low-frame-rate optical flow ground truth for supervision. The key component of EVA-Flow is the stacked Spatiotemporal Motion Refinement (SMR) module, which predicts temporally-dense optical flow and enhances the accuracy via spatial-temporal motion refinement. The time-dense feature warping utilized in the SMR module provides implicit supervision for the intermediate optical flow. Additionally, we introduce the Rectified Flow Warp Loss (RFWL) for the unsupervised evaluation of intermediate optical flow in the absence of ground truth. This is, to the best of our knowledge, the first work focusing on anytime optical flow estimation via event cameras. A comprehensive variety of experiments on MVSEC, DSEC, and our EVA-FlowSet demonstrates that EVA-Flow achieves competitive performance, super-low latency (5 ms), the fastest inference (9.2 ms), time-dense motion estimation (200 Hz), and strong generalization. Our code will be available at https://github.com/Yaozhuwa/EVA-Flow.
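A unified voxel grid, as named in the abstract, is in general built by distributing each event's polarity into temporal bins according to its normalized timestamp. The snippet below is a common, generic construction of an event voxel grid with bilinear temporal weighting, offered as an assumption of what such a representation looks like rather than the paper's exact definition.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, n_bins, height, width):
    """Accumulate events into a (n_bins, H, W) voxel grid with bilinear
    weighting along the time axis. ps holds polarities in {-1, +1}."""
    grid = np.zeros((n_bins, height, width), dtype=np.float32)
    t_norm = (ts - ts.min()) / max(ts.max() - ts.min(), 1e-9) * (n_bins - 1)
    t0 = np.floor(t_norm).astype(int)
    frac = t_norm - t0
    for b, w in ((t0, 1 - frac), (np.clip(t0 + 1, 0, n_bins - 1), frac)):
        np.add.at(grid, (b, ys, xs), ps * w)
    return grid

# Toy usage: 1000 random events on a 64x64 sensor, 5 temporal bins.
rng = np.random.default_rng(0)
n = 1000
voxels = events_to_voxel_grid(
    xs=rng.integers(0, 64, n), ys=rng.integers(0, 64, n),
    ts=np.sort(rng.random(n)), ps=rng.choice([-1.0, 1.0], n),
    n_bins=5, height=64, width=64)
print(voxels.shape)  # (5, 64, 64)
```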
Count-Free Single-Photon 3D Imaging with Race Logic
for: Developing an online approach for distance estimation in single-photon cameras (SPCs) without explicitly storing photon counts.
methods: The paper uses race logic to process photon streams in the time-delay domain and constructs count-free equi-depth histograms using a binner element to represent the distribution of photons.
results: The paper shows that the proposed method can provide an order-of-magnitude reduction in bandwidth and power consumption while maintaining distance reconstruction accuracy similar to that of conventional processing methods.
Abstract
Single-photon cameras (SPCs) have emerged as a promising technology for high-resolution 3D imaging. A single-photon 3D camera determines the round-trip time of a laser pulse by capturing the arrival of individual photons at each camera pixel. Constructing photon-timestamp histograms is a fundamental operation for a single-photon 3D camera. However, in-pixel histogram processing is computationally expensive and requires a large amount of memory per pixel. Digitizing and transferring photon timestamps to an off-sensor histogramming module is bandwidth- and power-hungry. Here we present an online approach for distance estimation without explicitly storing photon counts. The two key ingredients of our approach are (a) processing photon streams using race logic, which maintains photon data in the time-delay domain, and (b) constructing count-free equi-depth histograms. Equi-depth histograms are a succinct representation for ``peaky'' distributions, such as those obtained by an SPC pixel from a laser pulse reflected by a surface. Our approach uses a binner element that converges on the median (or, more generally, on another quantile) of a distribution. We cascade multiple binners to form an equi-depth histogrammer that produces multi-bin histograms. Our evaluation shows that this method can provide an order of magnitude reduction in bandwidth and power consumption while maintaining similar distance reconstruction accuracy as conventional processing methods.
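The binner element described in the abstract converges on the median (or another quantile) of a streaming distribution without storing counts. In software this resembles a stochastic-approximation quantile tracker that nudges its boundary up or down per sample; cascading such trackers splits the stream into equi-depth bins. The sketch below is a software analogy with an assumed step size, not a model of the race-logic hardware.

```python
import random

class Binner:
    """Streaming quantile tracker: the boundary drifts up when a sample
    exceeds it and down otherwise, with steps weighted so that the stationary
    point is the q-quantile (Robbins-Monro style)."""
    def __init__(self, q=0.5, step=0.05, init=0.0):
        self.q, self.step, self.boundary = q, step, init

    def update(self, x):
        self.boundary += self.step * (self.q if x > self.boundary else self.q - 1)
        return self.boundary

def equi_depth_edges(samples, n_bins):
    """Cascade binners at quantiles 1/n, 2/n, ... to form equi-depth bin edges."""
    binners = [Binner(q=k / n_bins, init=sum(samples[:10]) / 10)
               for k in range(1, n_bins)]
    for x in samples:
        for b in binners:
            b.update(x)
    return [b.boundary for b in binners]

# Toy stream: photon timestamps peaked around a surface return at t = 3.0.
rng = random.Random(0)
stream = [rng.gauss(3.0, 0.2) if rng.random() < 0.7 else rng.uniform(0, 10)
          for _ in range(20000)]
print(equi_depth_edges(stream, n_bins=8))  # edges crowd around the peak
```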
Kinematically-Decoupled Impedance Control for Fast Object Visual Servoing and Grasping on Quadruped Manipulators
results: Various experiments show that the proposed approach achieves robust searching, approaching, and grasping on a dynamically moving quadruped manipulator, and can withstand external disturbances.
Abstract
We propose a control pipeline for SAG (Searching, Approaching, and Grasping) of objects, based on a decoupled arm kinematic chain and impedance control, which integrates image-based visual servoing (IBVS). The kinematic decoupling allows for fast end-effector motions and recovery, which leads to robust visual servoing. The whole approach and pipeline can be generalized to any mobile platform (wheeled or tracked vehicles), but is most suitable for dynamically moving quadruped manipulators thanks to their reactivity against disturbances. The compliance of the impedance controller makes the robot safer for interactions with humans and the environment. We demonstrate the performance and robustness of the proposed approach with various experiments on our 140 kg HyQReal quadruped robot equipped with a 7-DoF manipulator arm. The experiments consider dynamic locomotion, tracking under external disturbances, and fast motions of the target object.
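A Cartesian impedance controller of the kind the abstract relies on typically maps the end-effector pose error through virtual stiffness and damping into joint torques via the manipulator Jacobian. The sketch below states that standard control law; the gain values and the `jacobian` callback are illustrative assumptions, not the HyQReal controller's actual parameters.

```python
import numpy as np

def impedance_torques(q, dq, x, dx, x_des, dx_des, jacobian,
                      K=np.diag([400.0] * 3 + [50.0] * 3),
                      D=np.diag([40.0] * 3 + [5.0] * 3)):
    """Classical Cartesian impedance law: tau = J(q)^T (K e + D e_dot),
    with e the 6D pose error (position + orientation, small-angle form)."""
    J = jacobian(q)                      # (6, n_joints)
    e = x_des - x                        # (6,) task-space error
    edot = dx_des - dx
    wrench = K @ e + D @ edot            # virtual spring-damper wrench
    return J.T @ wrench                  # joint torques (n_joints,)

# Toy usage with a hypothetical 7-DoF arm and a fixed random Jacobian.
J_fake = np.random.default_rng(0).normal(size=(6, 7))
tau = impedance_torques(
    q=np.zeros(7), dq=np.zeros(7),
    x=np.zeros(6), dx=np.zeros(6),
    x_des=np.array([0.1, 0.0, 0.2, 0.0, 0.0, 0.0]), dx_des=np.zeros(6),
    jacobian=lambda q: J_fake)
print(tau.shape)  # (7,)
```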
Rapid Deforestation and Burned Area Detection using Deep Multimodal Learning on Satellite Imagery
for: The purpose of this research paper is to propose a method based on multimodal satellite imagery and remote sensing technology for estimating deforestation and detecting wildfire in the Amazon region.
methods: The research uses convolutional neural networks (CNNs) and comprehensive data processing techniques to solve these problems. The dataset includes curated images and diverse channel bands from Sentinel, Landsat, VIIRS, and MODIS satellites.
results: The method achieves high-precision deforestation estimation and burned area detection on unseen images from the region.
Abstract
Deforestation estimation and fire detection in the Amazon forest poses a significant challenge due to the vast size of the area and the limited accessibility. However, these are crucial problems that lead to severe environmental consequences, including climate change, global warming, and biodiversity loss. To effectively address this problem, multimodal satellite imagery and remote sensing offer a promising solution for estimating deforestation and detecting wildfire in the Amazonia region. This research paper introduces a new curated dataset and a deep learning-based approach to solve these problems using convolutional neural networks (CNNs) and comprehensive data processing techniques. Our dataset includes curated images and diverse channel bands from Sentinel, Landsat, VIIRS, and MODIS satellites. We design the dataset considering different spatial and temporal resolution requirements. Our method successfully achieves high-precision deforestation estimation and burned area detection on unseen images from the region. Our code, models and dataset are open source: https://github.com/h2oai/cvpr-multiearth-deforestation-segmentation
KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization
results: Our model outperforms other models on the benchmark datasets provided by the BioNLP shared task and achieves first place on the RadSum23 leaderboard for the hidden test set.
Abstract
In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively learn the required knowledge and skills from limited resources in the domain. Demonstrating superior performance on the benchmark datasets provided by the BioNLP shared task, our model benefits from its training across multiple tasks and domains. With subtle techniques including ensemble and factual calibration, our system achieves first place on the RadSum23 leaderboard for the hidden test set.
results: During training, the Landsat-8 model achieved 93.45% training and validation pixel accuracy and the Sentinel-2 model achieved 83.87% pixel accuracy. On the test set, the model achieved 84.70% pixel accuracy with an F1-Score of 0.79 and an IoU of 0.69.
Abstract
In this paper, we present a deforestation estimation method based on an attention-guided UNet architecture using Electro-Optical (EO) and Synthetic Aperture Radar (SAR) satellite imagery. For optical images, Landsat-8 data, and for SAR imagery, Sentinel-1 data have been used to train and validate the proposed model. Due to the unavailability of temporally and spatially collocated data, an individual model has been trained for each sensor. During training, the Landsat-8 model achieved a training and validation pixel accuracy of 93.45% and the Sentinel-2 model achieved 83.87% pixel accuracy. During the test set evaluation, the model achieved a pixel accuracy of 84.70% with an F1-Score of 0.79 and an IoU of 0.69.
results: The study finds that men's voices are on average treated with less reverberation and occupy a narrower position in the stereo mix than women's voices.
Abstract
The Collaborative Song Dataset (CoSoD) is a corpus of 331 multi-artist collaborations from the 2010-2019 Billboard "Hot 100" year-end charts. The corpus is annotated with formal sections, aspects of vocal production (including reverberation, layering, panning, and gender of the performers), and relevant metadata. CoSoD complements other popular music datasets by focusing exclusively on musical collaborations between independent acts. In addition to facilitating the study of song form and vocal production, CoSoD allows for the in-depth study of gender as it relates to various timbral, pitch, and formal parameters in musical collaborations. In this paper, we detail the contents of the dataset and outline the annotation process. We also present an experiment using CoSoD that examines how the use of reverberation, layering, and panning are related to the gender of the artist. In this experiment, we find that men's voices are on average treated with less reverberation and occupy a more narrow position in the stereo mix than women's voices.
The NPU-MSXF Speech-to-Speech Translation System for IWSLT 2023 Speech-to-Speech Translation Task
results: Experimental results show that our system achieves high translation accuracy, speech naturalness, sound quality, and speaker similarity, as well as good robustness to multi-source data.
Abstract
This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-speech translation (S2ST) task which aims to translate from English speech of multi-source to Chinese speech. The system is built in a cascaded manner consisting of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS). We make tremendous efforts to handle the challenging multi-source input. Specifically, to improve the robustness to multi-source speech input, we adopt various data augmentation strategies and a ROVER-based score fusion on multiple ASR model outputs. To better handle the noisy ASR transcripts, we introduce a three-stage fine-tuning strategy to improve translation accuracy. Finally, we build a TTS model with high naturalness and sound quality, which leverages a two-stage framework, using network bottleneck features as a robust intermediate representation for speaker timbre and linguistic content disentanglement. Based on the two-stage framework, pre-trained speaker embedding is leveraged as a condition to transfer the speaker timbre in the source English speech to the translated Chinese speech. Experimental results show that our system has high translation accuracy, speech naturalness, sound quality, and speaker similarity. Moreover, it shows good robustness to multi-source data.
Exploiting an External Microphone for Binaural RTF-Vector-Based Direction of Arrival Estimation for Multiple Speakers
results: Exploiting an external microphone spatially separated from the hearing aid microphones, the proposed low-complexity DOA estimation method achieves performance comparable to the CW method for two speakers in noisy environments, at a lower computational complexity.
Abstract
In hearing aid applications, an important objective is to accurately estimate the direction of arrival (DOA) of multiple speakers in noisy and reverberant environments. Recently, we proposed a binaural DOA estimation method, where the DOAs of the speakers are estimated by selecting the directions for which the so-called Hermitian angle spectrum between the estimated relative transfer function (RTF) vector and a database of prototype anechoic RTF vectors is maximized. The RTF vector is estimated using the covariance whitening (CW) method, which requires a computationally complex generalized eigenvalue decomposition. The spatial spectrum is obtained by only considering frequencies where it is likely that one speaker dominates over the other speakers, noise and reverberation. In this contribution, we exploit the availability of an external microphone that is spatially separated from the hearing aid microphones and consider a low-complexity RTF vector estimation method that assumes a low spatial coherence between the undesired components in the external microphone and the hearing aid microphones. Using recordings of two speakers and diffuse-like babble noise in acoustic environments with mild reverberation and low signal-to-noise ratio, simulation results show that the proposed method yields a comparable DOA estimation performance as the CW method at a lower computational complexity.
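The DOA selection step the abstract describes amounts to comparing an estimated relative transfer function (RTF) vector against a database of prototype anechoic RTF vectors via the Hermitian angle, and picking the direction that maximizes the resulting spatial spectrum over the retained frequencies. Below is a minimal, hypothetical version of that comparison; the RTF estimation itself (covariance whitening or the proposed low-complexity variant) is abstracted into the input.

```python
import numpy as np

def hermitian_angle_spectrum(rtf_est, prototypes):
    """Cosine of the Hermitian angle between the estimated RTF vector and each
    prototype: |a^H b| / (||a|| ||b||). Larger means a better match.

    rtf_est: (n_mics,) complex; prototypes: (n_dirs, n_mics) complex."""
    num = np.abs(prototypes.conj() @ rtf_est)
    den = np.linalg.norm(prototypes, axis=1) * np.linalg.norm(rtf_est)
    return num / den

def estimate_doa(rtf_per_freq, prototypes, freq_mask):
    """Sum the spectrum over frequencies where one speaker likely dominates
    (freq_mask), then take the argmax over candidate directions."""
    spectrum = np.zeros(prototypes.shape[0])
    for f in np.flatnonzero(freq_mask):
        spectrum += hermitian_angle_spectrum(rtf_per_freq[f], prototypes)
    return int(np.argmax(spectrum)), spectrum

# Toy example: 4 mics, 37 candidate directions, 129 frequency bins.
rng = np.random.default_rng(1)
protos = rng.normal(size=(37, 4)) + 1j * rng.normal(size=(37, 4))
true_dir = 12
rtfs = np.tile(protos[true_dir], (129, 1)) + 0.1 * (
    rng.normal(size=(129, 4)) + 1j * rng.normal(size=(129, 4)))
mask = np.ones(129, dtype=bool)
doa_idx, _ = estimate_doa(rtfs, protos, mask)
print(doa_idx)  # should recover 12
```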
HCLAS-X: Hierarchical and Cascaded Lyrics Alignment System Using Multimodal Cross-Correlation
results: The proposed model shows a significant improvement in mean average error in comparative experiments, and performs well in practice after deployment in several music streaming services.
Abstract
In this work, we address the challenge of lyrics alignment, which involves aligning the lyrics and vocal components of songs. This problem requires the alignment of two distinct modalities, namely text and audio. To overcome this challenge, we propose a model that is trained in a supervised manner, utilizing the cross-correlation matrix of latent representations between vocals and lyrics. Our system is designed in a hierarchical and cascaded manner. It predicts synced time first on a sentence-level and subsequently on a word-level. This design enables the system to process long sequences, as the cross-correlation uses quadratic memory with respect to sequence length. In our experiments, we demonstrate that our proposed system achieves a significant improvement in mean average error, showcasing its robustness in comparison to the previous state-of-the-art model. Additionally, we conduct a qualitative analysis of the system after successfully deploying it in several music streaming services.
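The core mechanism — a cross-correlation matrix between latent representations of vocals and lyrics, decoded from sentence level down to word level — can be illustrated with a toy similarity matrix and a monotonic alignment pass. The sketch below uses random embeddings and a dynamic-programming monotonic alignment as a schematic stand-in, not the paper's trained model.

```python
import numpy as np

def cross_correlation(text_emb, audio_emb):
    """Normalized cross-correlation matrix between text units and audio frames.
    text_emb: (T_text, D), audio_emb: (T_audio, D)."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    return t @ a.T   # (T_text, T_audio)

def monotonic_align(sim):
    """DP for a strictly increasing assignment of text units to frames that
    maximizes total similarity."""
    n, m = sim.shape
    dp = np.full((n, m), -np.inf)
    dp[0] = np.maximum.accumulate(sim[0])
    for i in range(1, n):
        best_prev = np.maximum.accumulate(dp[i - 1])  # best score ending <= j
        dp[i, 1:] = sim[i, 1:] + best_prev[:-1]
    # Backtrack: each unit takes the best frame strictly before its successor.
    path = [int(np.argmax(dp[-1]))]
    for i in range(n - 2, -1, -1):
        path.append(int(np.argmax(dp[i, :path[-1]])))
    return path[::-1]   # frame index assigned to each text unit

rng = np.random.default_rng(0)
sim = cross_correlation(rng.normal(size=(6, 32)), rng.normal(size=(50, 32)))
print(monotonic_align(sim))  # strictly increasing frame indices
```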
results: The paper demonstrates the effectiveness of the proposed model through listening tests and objective metrics.
Abstract
Timbre transfer techniques aim at converting the sound of a musical piece generated by one instrument into the same one as if it was played by another instrument, while maintaining as much as possible the content in terms of musical characteristics such as melody and dynamics. Following their recent breakthroughs in deep learning-based generation, we apply Denoising Diffusion Models (DDMs) to perform timbre transfer. Specifically, we apply the recently proposed Denoising Diffusion Implicit Models (DDIMs) that enable to accelerate the sampling procedure. Inspired by the recent application of DDMs to image translation problems we formulate the timbre transfer task similarly, by first converting the audio tracks into log mel spectrograms and by conditioning the generation of the desired timbre spectrogram through the input timbre spectrogram. We perform both one-to-one and many-to-many timbre transfer, by converting audio waveforms containing only single instruments and multiple instruments, respectively. We compare the proposed technique with existing state-of-the-art methods both through listening tests and objective measures in order to demonstrate the effectiveness of the proposed model.
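DDIMs accelerate sampling by taking deterministic steps along a subsampled schedule: each update predicts the clean sample from the noise estimate and re-noises it at the earlier timestep. Conditioning on the source-timbre spectrogram would enter through the noise-prediction network. The snippet below states the standard DDIM (eta = 0) update for log-mel spectrograms, with a dummy noise predictor standing in for the trained conditional model.

```python
import torch

@torch.no_grad()
def ddim_sample(eps_model, cond, shape, alphas_cumprod, timesteps):
    """Deterministic DDIM sampling (eta = 0).

    eps_model(x_t, t, cond) -> predicted noise; cond is the input-timbre
    spectrogram used for conditioning. alphas_cumprod: (T,) noise schedule."""
    x = torch.randn(shape)
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t_prev] if t_prev >= 0 else torch.tensor(1.0)
        eps = eps_model(x, t, cond)
        x0_pred = (x - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)
        x = torch.sqrt(a_prev) * x0_pred + torch.sqrt(1 - a_prev) * eps
    return x  # generated log-mel spectrogram in the target timbre

# Toy schedule and dummy conditional noise predictor.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)
timesteps = list(range(T - 1, -1, -50)) + [-1]   # 20 steps instead of 1000

def dummy_eps(x, t, cond):
    return torch.zeros_like(x)   # placeholder for the trained U-Net

mel = ddim_sample(dummy_eps, cond=torch.rand(1, 80, 256),
                  shape=(1, 80, 256), alphas_cumprod=alphas_cumprod,
                  timesteps=timesteps)
```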
Study on the Correlation between Objective Evaluations and Subjective Speech Quality and Intelligibility
results: The study finds that the proposed deep learning model can accurately predict speech quality and intelligibility while reducing the amount of training data required, and it examines the impact of including subjective speech quality ratings on speech intelligibility prediction.
Abstract
Subjective tests are the gold standard for evaluating speech quality and intelligibility, but they are time-consuming and expensive. Thus, objective measures that align with human perceptions are crucial. This study evaluates the correlation between commonly used objective measures and subjective speech quality and intelligibility using a Chinese speech dataset. Moreover, new objective measures are proposed combining current objective measures using deep learning techniques to predict subjective quality and intelligibility. The proposed deep learning model reduces the amount of training data without significantly impacting prediction performance. We interpret the deep learning model to understand how objective measures reflect subjective quality and intelligibility. We also explore the impact of including subjective speech quality ratings on speech intelligibility prediction. Our findings offer valuable insights into the relationship between objective measures and human perceptions.
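Evaluating how well objective measures align with human perception usually comes down to computing linear and rank correlations between each measure and the subjective scores. The snippet below shows that routine with scipy; the measure names are common examples (PESQ- and STOI-like scores) used here as placeholders, since the paper's exact measure set is not listed in the abstract.

```python
import numpy as np
from scipy import stats

def correlate_with_subjective(objective_scores, subjective_scores):
    """Pearson (linear) and Spearman (rank) correlation of each objective
    measure against subjective ratings (e.g., MOS or intelligibility)."""
    results = {}
    for name, scores in objective_scores.items():
        r, _ = stats.pearsonr(scores, subjective_scores)
        rho, _ = stats.spearmanr(scores, subjective_scores)
        results[name] = {"pearson": r, "spearman": rho}
    return results

# Toy data: 50 utterances with placeholder objective measures and MOS ratings.
rng = np.random.default_rng(0)
mos = rng.uniform(1, 5, 50)
objective = {
    "pesq_like": mos + rng.normal(0, 0.5, 50),    # correlated by construction
    "stoi_like": mos / 5 + rng.normal(0, 0.2, 50),
}
for name, c in correlate_with_subjective(objective, mos).items():
    print(f"{name}: pearson={c['pearson']:.2f}, spearman={c['spearman']:.2f}")
```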
A Demand-Driven Perspective on Generative Audio AI
results: The survey indicates that the availability of datasets is currently the main bottleneck, alongside open challenges in audio quality and controllability; potential solutions to some of the revealed issues are suggested with empirical evidence.
Abstract
To achieve successful deployment of AI research, it is crucial to understand the demands of the industry. In this paper, we present the results of a survey conducted with professional audio engineers, in order to determine research priorities and define various research tasks. We also summarize the current challenges in audio quality and controllability based on the survey. Our analysis emphasizes that the availability of datasets is currently the main bottleneck for achieving high-quality audio generation. Finally, we suggest potential solutions for some revealed issues with empirical evidence.
paper_authors: Aixuan Li, Jing Zhang, Yunqiu Lv, Tong Zhang, Yiran Zhong, Mingyi He, Yuchao Dai
for: The paper studies an uncertainty-aware learning pipeline for the Salient Object Detection (SOD) and Camouflaged Object Detection (COD) tasks.
methods: Contradiction modeling at the data level and the task level, exploiting the correlation between the two datasets and tasks, together with a joint-task contrastive learning framework that improves the robustness and representation learning of the model.
results: Experiments show that the method achieves state-of-the-art performance on benchmark datasets while providing informative uncertainty estimation.
Abstract
Salient objects attract human attention and usually stand out clearly from their surroundings. In contrast, camouflaged objects share similar colors or textures with the environment. In this case, salient objects are typically non-camouflaged, and camouflaged objects are usually not salient. Due to this inherent contradictory attribute, we introduce an uncertainty-aware learning pipeline to extensively explore the contradictory information of salient object detection (SOD) and camouflaged object detection (COD) via data-level and task-wise contradiction modeling. We first exploit the dataset correlation of these two tasks and claim that the easy samples in the COD dataset can serve as hard samples for SOD to improve the robustness of the SOD model. Based on the assumption that these two models should lead to activation maps highlighting different regions of the same input image, we further introduce a contrastive module with a joint-task contrastive learning framework to explicitly model the contradictory attributes of these two tasks. Different from conventional intra-task contrastive learning for unsupervised representation learning, our contrastive module is designed to model the task-wise correlation, leading to cross-task representation learning. To better understand the two tasks from the perspective of uncertainty, we extensively investigate the uncertainty estimation techniques for modeling the main uncertainties of the two tasks, namely task uncertainty (for SOD) and data uncertainty (for COD), and aiming to effectively estimate the challenging regions for each task to achieve difficulty-aware learning. Experimental results on benchmark datasets demonstrate that our solution leads to both state-of-the-art performance and informative uncertainty estimation.
Multimodal brain age estimation using interpretable adaptive population-graph learning
results: On brain age estimation and classification with the UK Biobank dataset, the method outperforms competing static-graph approaches and other state-of-the-art adaptive methods; visualizing the attention weights also improves the interpretability of the graph.
Abstract
Brain age estimation is clinically important as it can provide valuable information in the context of neurodegenerative diseases such as Alzheimer's. Population graphs, which include multimodal imaging information of the subjects along with the relationships among the population, have been used in literature along with Graph Convolutional Networks (GCNs) and have proved beneficial for a variety of medical imaging tasks. A population graph is usually static and constructed manually using non-imaging information. However, graph construction is not a trivial task and might significantly affect the performance of the GCN, which is inherently very sensitive to the graph structure. In this work, we propose a framework that learns a population graph structure optimized for the downstream task. An attention mechanism assigns weights to a set of imaging and non-imaging features (phenotypes), which are then used for edge extraction. The resulting graph is used to train the GCN. The entire pipeline can be trained end-to-end. Additionally, by visualizing the attention weights that were the most important for the graph construction, we increase the interpretability of the graph. We use the UK Biobank, which provides a large variety of neuroimaging and non-imaging phenotypes, to evaluate our method on brain age regression and classification. The proposed method outperforms competing static graph approaches and other state-of-the-art adaptive methods. We further show that the assigned attention scores indicate that there are both imaging and non-imaging phenotypes that are informative for brain age estimation and are in agreement with the relevant literature.
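The graph-construction step the abstract outlines scores each phenotype with a learned attention weight and uses the weighted phenotype distance between subjects to decide which edges to keep. The sketch below is a minimal, hypothetical PyTorch version of that idea (k-nearest-neighbor edge extraction over an attention-weighted distance); the actual method is trained end-to-end together with the GCN.

```python
import torch
import torch.nn as nn

class AttentionGraphBuilder(nn.Module):
    """Learned per-phenotype attention weights; edges connect each subject to
    its k nearest neighbors under the weighted phenotype distance."""
    def __init__(self, n_phenotypes, k=10):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_phenotypes))
        self.k = k

    def forward(self, phenotypes):        # (n_subjects, n_phenotypes)
        w = torch.softmax(self.logits, dim=0)          # attention weights
        weighted = phenotypes * w.sqrt()               # scale features by sqrt(w)
        dist = torch.cdist(weighted, weighted)         # (N, N) weighted distance
        # k nearest neighbors (excluding self at distance 0).
        knn = dist.topk(self.k + 1, largest=False).indices[:, 1:]
        n = phenotypes.shape[0]
        rows = torch.arange(n).repeat_interleave(self.k)
        edge_index = torch.stack([rows, knn.reshape(-1)])
        return edge_index, w   # graph edges + interpretable weights

# Toy usage: 100 subjects, 12 imaging and non-imaging phenotypes.
builder = AttentionGraphBuilder(n_phenotypes=12, k=5)
edges, weights = builder(torch.randn(100, 12))
print(edges.shape, weights.shape)  # torch.Size([2, 500]) torch.Size([12])
```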
SPLAL: Similarity-based pseudo-labeling with alignment loss for semi-supervised medical image classification
results: Experiments on two public medical image classification benchmark datasets (ISIC 2018 and BCCD) show that the proposed method outperforms several state-of-the-art SSL methods across evaluation metrics; on ISIC 2018, Accuracy and F1 score improve over the state of the art by relative margins of 2.24% and 11.40%, respectively. Extensive ablation experiments validate the effectiveness of the method.
Abstract
Medical image classification is a challenging task due to the scarcity of labeled samples and class imbalance caused by the high variance in disease prevalence. Semi-supervised learning (SSL) methods can mitigate these challenges by leveraging both labeled and unlabeled data. However, SSL methods for medical image classification need to address two key challenges: (1) estimating reliable pseudo-labels for the images in the unlabeled dataset and (2) reducing biases caused by class imbalance. In this paper, we propose a novel SSL approach, SPLAL, that effectively addresses these challenges. SPLAL leverages class prototypes and a weighted combination of classifiers to predict reliable pseudo-labels over a subset of unlabeled images. Additionally, we introduce alignment loss to mitigate model biases toward majority classes. To evaluate the performance of our proposed approach, we conduct experiments on two publicly available medical image classification benchmark datasets: the skin lesion classification (ISIC 2018) and the blood cell classification dataset (BCCD). The experimental results empirically demonstrate that our approach outperforms several state-of-the-art SSL methods over various evaluation metrics. Specifically, our proposed approach achieves a significant improvement over the state-of-the-art approach on the ISIC 2018 dataset in both Accuracy and F1 score, with relative margins of 2.24\% and 11.40\%, respectively. Finally, we conduct extensive ablation experiments to examine the contribution of different components of our approach, validating its effectiveness.
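The pseudo-labeling step the abstract describes combines similarity to class prototypes with a weighted combination of classifier outputs, keeping only confident predictions. The sketch below is one plausible reading of that recipe under stated assumptions (cosine similarity to prototypes, a fixed mixing weight, and an agreement-plus-threshold selection rule); the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def pseudo_label(features, logits, prototypes, alpha=0.5, tau=0.9):
    """Select reliable pseudo-labels for unlabeled samples.

    features:   (N, D) embeddings of unlabeled images
    logits:     (N, C) classifier outputs
    prototypes: (C, D) class prototypes from labeled data
    Returns (indices, labels) of samples whose blended prediction is confident
    and whose classifier and prototype predictions agree."""
    proto_sim = F.normalize(features, dim=1) @ F.normalize(prototypes, dim=1).T
    proto_prob = F.softmax(proto_sim / 0.1, dim=1)   # temperature 0.1 (assumed)
    clf_prob = F.softmax(logits, dim=1)
    blended = alpha * clf_prob + (1 - alpha) * proto_prob
    conf, label = blended.max(dim=1)
    agree = clf_prob.argmax(dim=1) == proto_prob.argmax(dim=1)
    keep = (conf > tau) & agree
    return keep.nonzero(as_tuple=True)[0], label[keep]

# Toy usage: 7 classes (as in ISIC 2018), 256-d features.
feats = torch.randn(500, 256)
logits = torch.randn(500, 7)
protos = torch.randn(7, 256)
idx, labels = pseudo_label(feats, logits, protos)
print(len(idx), "confident pseudo-labels")
```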
Source-Free Open-Set Domain Adaptation for Histopathological Images via Distilling Self-Supervised Vision Transformer
results: Experiments show that the method significantly outperforms previous open set detection, test-time adaptation, and SF-OSDA methods, setting a new state of the art on three public histopathological datasets for colorectal cancer assessment (Kather-16, Kather-19, and CRCTP).
Abstract
There is a strong incentive to develop computational pathology models to i) ease the burden of tissue typology annotation from whole slide histological images; ii) transfer knowledge, e.g., tissue class separability from the withheld source domain to the distributionally shifted unlabeled target domain, and simultaneously iii) detect Open Set samples, i.e., unseen novel categories not present in the training source domain. This paper proposes a highly practical setting by addressing the abovementioned challenges in one fell swoop, i.e., source-free Open Set domain adaptation (SF-OSDA), which addresses the situation where a model pre-trained on the inaccessible source dataset can be adapted on the unlabeled target dataset containing Open Set samples. The central tenet of our proposed method is distilling knowledge from a self-supervised vision transformer trained in the target domain. We propose a novel style-based data augmentation used as hard positives for self-training a vision transformer in the target domain, yielding strongly contextualized embedding. Subsequently, semantically similar target images are clustered while the source model provides their corresponding weak pseudo-labels with unreliable confidence. Furthermore, we propose cluster relative maximum logit score (CRMLS) to rectify the confidence of the weak pseudo-labels and compute weighted class prototypes in the contextualized embedding space that are utilized for adapting the source model on the target domain. Our method significantly outperforms the previous methods, including open set detection, test-time adaptation, and SF-OSDA methods, setting the new state-of-the-art on three public histopathological datasets of colorectal cancer (CRC) assessment- Kather-16, Kather-19, and CRCTP. Our code is available at https://github.com/LTS5/Proto-SF-OSDA.
DWA: Differential Wavelet Amplifier for Image Super-Resolution
paper_authors: Brian B. Moser, Stanislav Frolov, Federico Raue, Sebastian Palacio, Andreas Dengel
for: Improving the sustainability and effectiveness of image Super-Resolution (SR) models.
methods: Discrete Wavelet Transformation (DWT) combined with the difference between two convolutional filters to improve feature extraction and noise suppression in the wavelet domain.
results: Integrated into existing SR models such as DWSR and MWCNN, the module demonstrates improved SR performance on classical SR tasks and enables direct application of these models to the input image space, reducing the DWT representation channel-wise.
Abstract
This work introduces Differential Wavelet Amplifier (DWA), a drop-in module for wavelet-based image Super-Resolution (SR). DWA invigorates an approach recently receiving less attention, namely Discrete Wavelet Transformation (DWT). DWT enables an efficient image representation for SR and reduces the spatial area of its input by a factor of 4, the overall model size, and computation cost, framing it as an attractive approach for sustainable ML. Our proposed DWA model improves wavelet-based SR models by leveraging the difference between two convolutional filters to refine relevant feature extraction in the wavelet domain, emphasizing local contrasts and suppressing common noise in the input signals. We show its effectiveness by integrating it into existing SR models, e.g., DWSR and MWCNN, and demonstrate a clear improvement in classical SR tasks. Moreover, DWA enables a direct application of DWSR and MWCNN to input image space, reducing the DWT representation channel-wise since it omits traditional DWT.
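The core operation — refining wavelet-domain features with the difference of two convolutional filters, which emphasizes local contrasts and suppresses noise common to both branches — can be written as a small drop-in module. The PyTorch sketch below is a hypothetical rendition of that idea (two parallel convolutions whose outputs are subtracted); the layer sizes and the residual connection are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DifferentialWaveletAmplifier(nn.Module):
    """Drop-in block: the difference of two learned convolutions acts as a
    tunable band-pass that highlights local contrast and cancels noise shared
    by both branches (an illustrative reading of the DWA idea)."""
    def __init__(self, channels):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        diff = self.conv_a(x) - self.conv_b(x)   # differential amplification
        return x + self.fuse(torch.relu(diff))   # residual refinement

# Toy usage on DWT subbands: 4 subbands (LL, LH, HL, HH) of a grayscale image
# stacked channel-wise, as wavelet-based SR models like DWSR commonly do.
dwa = DifferentialWaveletAmplifier(channels=4)
subbands = torch.randn(1, 4, 64, 64)   # placeholder DWT coefficients
refined = dwa(subbands)
print(refined.shape)  # torch.Size([1, 4, 64, 64])
```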
A Graph Multi-separator Problem for Image Segmentation
results: The proposed local search algorithms segment foam cells and filaments effectively in simulated volume images.
Abstract
We propose a novel abstraction of the image segmentation task in the form of a combinatorial optimization problem that we call the multi-separator problem. Feasible solutions indicate for every pixel whether it belongs to a segment or a segment separator, and indicate for pairs of pixels whether or not the pixels belong to the same segment. This is in contrast to the closely related lifted multicut problem, where every pixel is associated with a segment and no pixel explicitly represents a separating structure. While the multi-separator problem is NP-hard, we identify two special cases for which it can be solved efficiently. Moreover, we define two local search algorithms for the general case and demonstrate their effectiveness in segmenting simulated volume images of foam cells and filaments.
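To make the search idea concrete, here is a skeleton of a flip-based local search over pixel labels, assuming a user-supplied objective over the full labeling; the paper's two algorithms use more refined moves and incremental cost updates that this sketch omits.

```python
import numpy as np

def local_search(cost, labels, steps=20_000, seed=0):
    """Flip single pixels between 'segment' (0) and 'separator' (1),
    accepting a flip whenever it lowers the objective `cost(labels)`."""
    rng = np.random.default_rng(seed)
    labels = labels.copy()                 # integer (H, W) labeling
    best = cost(labels)
    h, w = labels.shape
    for _ in range(steps):
        y, x = rng.integers(h), rng.integers(w)
        labels[y, x] ^= 1                  # flip segment <-> separator
        c = cost(labels)
        if c < best:
            best = c                       # keep the improving flip
        else:
            labels[y, x] ^= 1              # revert
    return labels
```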
AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System
paper_authors: Yuzhe Qin, Wei Yang, Binghao Huang, Karl Van Wyk, Hao Su, Xiaolong Wang, Yu-Wei Chao, Dieter Fox
for: An intelligent teleoperation system that works across many robot models and deployment environments
methods: A unified and general teleoperation system supporting multiple arms, hands, realities, and camera configurations
results: In real-world experiments, AnyTeleop achieves a higher success rate than a previous system designed for specific robot hardware, and in simulation it yields better imitation learning performance than a previous system designed for that particular simulator
Abstract
Vision-based teleoperation offers the possibility of endowing robots with human-level intelligence to physically interact with the environment, while requiring only low-cost camera sensors. However, current vision-based teleoperation systems are designed and engineered towards a particular robot model and deployment environment, which scales poorly as the pool of robot models expands and the variety of operating environments increases. In this paper, we propose AnyTeleop, a unified and general teleoperation system that supports multiple different arms, hands, realities, and camera configurations within a single system. Although designed to provide great flexibility in the choice of simulators and real hardware, our system still achieves strong performance. In real-world experiments, AnyTeleop outperforms a previous system that was designed for a specific robot hardware, achieving a higher success rate on the same robot. For teleoperation in simulation, AnyTeleop leads to better imitation learning performance compared with a previous system designed specifically for that simulator. Project page: http://anyteleop.com/.
TFR: Texture Defect Detection with Fourier Transform using Normal Reconstructed Template of Simple Autoencoder
results: Experimental results show that the method detects texture defects accurately and effectively, with better performance and precision than existing approaches.
Abstract
Texture is essential information in image representation, capturing patterns and structures. As a result, texture plays a crucial role in the manufacturing industry and is extensively studied in the fields of computer vision and pattern recognition. However, real-world textures are susceptible to defects, which can degrade image quality and cause various issues. Therefore, there is a need for accurate and effective methods to detect texture defects. In this study, a simple autoencoder and the Fourier transform are employed for texture defect detection. The proposed method combines Fourier transform analysis with the reconstructed template obtained from the simple autoencoder. The Fourier transform is a powerful tool for analyzing the frequency domain of images and signals. Moreover, since texture defects often exhibit characteristic changes in specific frequency ranges, analyzing the frequency domain enables effective defect detection. The proposed method demonstrates effectiveness and accuracy in detecting texture defects. Experimental results are presented to evaluate its performance and compare it with existing approaches.
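A hedged sketch of the frequency-domain comparison: given a defect-free template (e.g., the autoencoder reconstruction) and a test image, spectral differences flag frequency bands where a defect changed the texture. The log-magnitude comparison and the 3-sigma threshold are assumptions, not the paper's exact procedure.

```python
import numpy as np

def fourier_defect_map(test_img, template_img):
    """Compare frequency content of a test texture against a defect-free
    template; back-project the strongly differing bands to localize them."""
    f_test = np.fft.fftshift(np.fft.fft2(test_img))
    f_ref = np.fft.fftshift(np.fft.fft2(template_img))
    diff = np.abs(np.log1p(np.abs(f_test)) - np.log1p(np.abs(f_ref)))
    mask = diff > (diff.mean() + 3 * diff.std())    # assumed threshold rule
    residual = np.fft.ifft2(np.fft.ifftshift(f_test * mask))
    return np.abs(residual)                         # high values = candidate defects
```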
Unraveling the Age Estimation Puzzle: Comparative Analysis of Deep Learning Approaches for Facial Age Estimation
results: The study finds that these factors frequently exert a more significant influence than the choice of age estimation method itself, and that consistent data preprocessing practices and standardized benchmarks are needed to ensure reliable and meaningful comparisons.
Abstract
Comparing different age estimation methods poses a challenge due to the unreliability of published results, stemming from inconsistencies in the benchmarking process. Previous studies have reported continuous performance improvements over the past decade using specialized methods; however, our findings challenge these claims. We argue that, for age estimation tasks outside of the low-data regime, designing specialized methods is unnecessary, and the standard approach of utilizing cross-entropy loss is sufficient. This paper aims to address the benchmark shortcomings by evaluating state-of-the-art age estimation methods in a unified and comparable setting. We systematically analyze the impact of various factors, including facial alignment, facial coverage, image resolution, image representation, model architecture, and the amount of data on age estimation results. Surprisingly, these factors often exert a more significant influence than the choice of the age estimation method itself. We assess the generalization capability of each method by evaluating the cross-dataset performance for publicly available age estimation datasets. The results emphasize the importance of using consistent data preprocessing practices and establishing standardized benchmarks to ensure reliable and meaningful comparisons. The source code is available at https://github.com/paplhjak/Facial-Age-Estimation-Benchmark.
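The "standard approach" the paper argues is sufficient fits in a few lines: treat discrete ages as classes, train with cross-entropy, and read out the expected age at test time. The ResNet-50 backbone and the 0-100 age range are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

num_ages = 101                                   # ages 0..100 as classes
model = resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, num_ages)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)             # dummy batch
ages = torch.randint(0, num_ages, (8,))
loss = criterion(model(images), ages)
loss.backward()

with torch.no_grad():                            # expected value over the softmax
    probs = model(images).softmax(dim=1)
    pred_age = (probs * torch.arange(num_ages)).sum(dim=1)
```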
Important Clues that Facilitate Visual Emergence: Three Psychological Experiments
results: The study finds that the density of speckles in local regions and the arrangement of certain key speckles are related to the recognition of emerging images.
Abstract
Visual emergence is the phenomenon in which the visual system obtains a holistic perception after grouping and reorganizing local signals. The picture Dalmatian dog is known for its use in explaining visual emergence. This type of image, which consists of a set of discrete black speckles, is called an emerging image. Not everyone can find the dog in Dalmatian dog, and among those who can, the time spent varies greatly. Although Gestalt theory summarizes perceptual organization into several principles, it remains ambiguous how these principles affect the perception of emerging images. This study therefore designed three psychological experiments to explore the factors that influence the perception of emerging images. In the first, we found that the density of speckles in the local area and the arrangement of some key speckles played a key role in the perception of an emerging image. We set parameters in the algorithm to characterize these two factors. We then automatically generated diversified emerging-test images (ETIs) through the algorithm and verified their effectiveness in two subsequent experiments.
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition
results: The study finds that magnitude-based pruning yields lightweight models that outperform their dense equivalents on visual speech recognition, especially in the presence of visual noise.
Abstract
Recent advances in deep neural networks have achieved unprecedented success in visual speech recognition. However, there remains substantial disparity between current methods and their deployment in resource-constrained devices. In this work, we explore different magnitude-based pruning techniques to generate a lightweight model that achieves higher performance than its dense model equivalent, especially under the presence of visual noise. Our sparse models achieve state-of-the-art results at 10% sparsity on the LRS3 dataset and outperform the dense equivalent up to 70% sparsity. We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute improvement of more than 2% WER compared to the dense equivalent. Our results confirm that sparse networks are more resistant to noise than dense networks.
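For reference, global magnitude pruning of the kind explored here can be sketched with PyTorch's pruning utilities; whether to prune globally or per layer, and which layers to include, are assumptions since the paper compares several magnitude-based schemes.

```python
import torch
import torch.nn.utils.prune as prune

def magnitude_prune(model: torch.nn.Module, sparsity: float) -> torch.nn.Module:
    """Remove the smallest-magnitude weights across all conv/linear layers."""
    params = [
        (m, "weight")
        for m in model.modules()
        if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d))
    ]
    prune.global_unstructured(
        params, pruning_method=prune.L1Unstructured, amount=sparsity
    )
    for m, name in params:
        prune.remove(m, name)   # bake the masks into the weights
    return model
```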
Customizing Synthetic Data for Data-Free Student Learning
results: Experiments show that the method is effective across different datasets and teacher-student models and improves student performance. Code is available at: $\href{https://github.com/luoshiya/CSD}{https://github.com/luoshiya/CSD}$
Abstract
Data-free knowledge distillation (DFKD) aims to obtain a lightweight student model without the original training data. Existing works generally synthesize data from the pre-trained teacher model to replace the original training data for student learning. To train the student model more effectively, the synthetic data should be customized to the current student's learning ability. However, existing DFKD methods ignore this, which negatively affects student training. To address this issue, we propose Customizing Synthetic Data for Data-Free Student Learning (CSD) in this paper, which achieves adaptive data synthesis using a self-supervised augmented auxiliary task to estimate the student's learning ability. Specifically, data synthesis is dynamically adjusted to enlarge the cross entropy between the labels and the predictions from the self-supervised augmented task, thus generating hard samples for the student model. Experiments on various datasets and teacher-student models show the effectiveness of our proposed method. Code is available at: $\href{https://github.com/luoshiya/CSD}{https://github.com/luoshiya/CSD}$
Cluster-Induced Mask Transformers for Effective Opportunistic Gastric Cancer Screening on Non-contrast CT Scans
paper_authors: Mingze Yuan, Yingda Xia, Xin Chen, Jiawen Yao, Junli Wang, Mingyan Qiu, Hexin Dong, Jingren Zhou, Bin Dong, Le Lu, Li Zhang, Zaiyi Liu, Ling Zhang
for: Detecting gastric cancer, the third leading cause of cancer-related mortality worldwide, for which no screening test is recommended; existing methods can be invasive, expensive, and insensitive to early-stage disease.
methods: A deep learning approach on non-contrast CT scans: a novel cluster-induced Mask Transformer that jointly segments the tumor and classifies abnormality in a multi-task manner, incorporating learnable clusters that encode the texture and shape prototypes of gastric cancer and interact with convolutional features through self- and cross-attention.
results: The method reaches a sensitivity of 85.0% and specificity of 92.6% on a hold-out test set, versus an average sensitivity of 73.5% and specificity of 84.3% for two radiologists, and a specificity of 97.7% on an external test set. It performs comparably to established screening tools such as blood testing and endoscopy while being more sensitive to early-stage cancer, suggesting a novel, non-invasive, low-cost, and accurate method for opportunistic gastric cancer screening.
Abstract
Gastric cancer is the third leading cause of cancer-related mortality worldwide, but no guideline-recommended screening test exists. Existing methods can be invasive, expensive, and lack sensitivity to identify early-stage gastric cancer. In this study, we explore the feasibility of using a deep learning approach on non-contrast CT scans for gastric cancer detection. We propose a novel cluster-induced Mask Transformer that jointly segments the tumor and classifies abnormality in a multi-task manner. Our model incorporates learnable clusters that encode the texture and shape prototypes of gastric cancer, utilizing self- and cross-attention to interact with convolutional features. In our experiments, the proposed method achieves a sensitivity of 85.0% and specificity of 92.6% for detecting gastric tumors on a hold-out test set consisting of 100 patients with cancer and 148 normal. In comparison, two radiologists have an average sensitivity of 73.5% and specificity of 84.3%. We also obtain a specificity of 97.7% on an external test set with 903 normal cases. Our approach performs comparably to established state-of-the-art gastric cancer screening tools like blood testing and endoscopy, while also being more sensitive in detecting early-stage cancer. This demonstrates the potential of our approach as a novel, non-invasive, low-cost, and accurate method for opportunistic gastric cancer screening.
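A rough sketch of the cluster-attention idea, with learnable cluster queries attending to convolutional features and masks read out as dot products, is given below; the dimensions, cluster count, and readout heads are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class ClusterMaskHead(nn.Module):
    """Learnable cluster queries interact with pixel features via cross-
    and self-attention; mask logits are cluster-pixel dot products."""
    def __init__(self, dim: int = 256, n_clusters: int = 8):
        super().__init__()
        self.clusters = nn.Parameter(torch.randn(n_clusters, dim))
        self.cross = nn.MultiheadAttention(dim, 8, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, 8, batch_first=True)
        self.cls_head = nn.Linear(dim, 2)              # abnormal vs. normal

    def forward(self, feats):                          # feats: (B, dim, H, W)
        b, _, h, w = feats.shape
        pix = feats.flatten(2).transpose(1, 2)         # (B, HW, dim)
        q = self.clusters.unsqueeze(0).expand(b, -1, -1)
        q, _ = self.cross(q, pix, pix)                 # clusters read the image
        q, _ = self.self_attn(q, q, q)                 # clusters interact
        masks = torch.einsum("bnd,bpd->bnp", q, pix).view(b, -1, h, w)
        return masks, self.cls_head(q.mean(dim=1))     # per-cluster masks + label
```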
Efficient Match Pair Retrieval for Large-scale UAV Images via Graph Indexed Global Descriptor
paper_authors: San Jiang, Yichen Ma, Qingquan Li, Wanshou Jiang, Bingxuan Guo, Lelin Li, Lizhe Wang
for: Improving Structure from Motion (SfM) methods for UAV image orientation.
methods: An efficient match pair selection method: an individual codebook is trained online, local features are aggregated into high-dimensional global descriptors with VLAD, and nearest neighbor search runs over an HNSW graph structure.
results: Tests show that the proposed solution greatly reduces the cost of match pair selection while achieving competitive accuracy in both relative and absolute orientation.
Abstract
SfM (Structure from Motion) has been extensively used for UAV (Unmanned Aerial Vehicle) image orientation. Its efficiency is directly influenced by feature matching. Although image retrieval has been extensively used for match pair selection, high computational costs are incurred due to the large number of local features and the large size of the used codebook. Thus, this paper proposes an efficient match pair retrieval method and implements an integrated workflow for parallel SfM reconstruction. First, an individual codebook is trained online by considering the redundancy of UAV images and local features, which avoids the ambiguity of training codebooks on other datasets. Second, the local features of each image are aggregated into a single high-dimensional global descriptor through VLAD (Vector of Locally Aggregated Descriptors) aggregation using the trained codebook, which remarkably reduces the number of features and the burden of nearest neighbor searching in image indexing. Third, the global descriptors are indexed via an HNSW (Hierarchical Navigable Small World) graph structure for nearest neighbor searching. Match pairs are then retrieved using an adaptive threshold selection strategy and utilized to create a view graph for divide-and-conquer based parallel SfM reconstruction. Finally, the performance of the proposed solution has been verified using three large-scale UAV datasets. The test results demonstrate that the proposed solution accelerates match pair retrieval with a speedup ratio ranging from 36 to 108 and improves the efficiency of SfM reconstruction with competitive accuracy in both relative and absolute orientation.
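The aggregation and indexing steps can be sketched as follows; hnswlib is one library exposing an HNSW index, and the codebook size (K=64) and SIFT dimension (128) are assumptions for illustration.

```python
import numpy as np
import hnswlib  # pip install hnswlib

def vlad(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Aggregate local descriptors (N, D) into one VLAD vector (K*D,) by
    summing residuals to the nearest codebook center, then normalizing."""
    k = codebook.shape[0]
    nearest = np.argmin(
        ((descriptors[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1
    )
    v = np.zeros((k, descriptors.shape[1]))
    for i in range(k):
        sel = descriptors[nearest == i]
        if len(sel):
            v[i] = (sel - codebook[i]).sum(0)
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))        # power normalization
    return v / (np.linalg.norm(v) + 1e-12)     # L2 normalization

dim, n_images = 64 * 128, 1000                 # assumed K=64 centers, D=128
index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n_images, ef_construction=200, M=16)
# index.add_items(vlad_matrix, ids)
# labels, dists = index.knn_query(query_vlad, k=30)   # candidate match pairs
```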
An Examination of Wearable Sensors and Video Data Capture for Human Exercise Classification
paper_authors: Ashish Singh, Antonio Bevilacqua, Timilehin B. Aderinola, Thach Le Nguyen, Darragh Whelan, Martin O’Reilly, Brian Caulfield, Georgiana Ifrim
for: Assessing human exercise performance, classifying exercises using Inertial Measurement Units (IMUs) and video.
methods: A single frontal-view camera is compared with 5 IMUs placed on different parts of the body, with multivariate time series classifiers applied to the data.
results: A single camera outperforms a single IMU by 10 percentage points on average, and at least 3 IMUs are required to outperform a single camera; combining data from a single camera with a single IMU achieves even higher performance.
Abstract
Wearable sensors such as Inertial Measurement Units (IMUs) are often used to assess the performance of human exercise. Common approaches use handcrafted features based on domain expertise or automatically extracted features using time series analysis. Multiple sensors are required to achieve high classification accuracy, which is not very practical. These sensors require calibration and synchronization and may lead to discomfort over longer time periods. Recent work utilizing computer vision techniques has shown similar performance using video, without the need for manual feature engineering, and avoiding some pitfalls such as sensor calibration and placement on the body. In this paper, we compare the performance of IMUs to a video-based approach for human exercise classification on two real-world datasets consisting of Military Press and Rowing exercises. We compare the performance using a single camera that captures video in the frontal view versus using 5 IMUs placed on different parts of the body. We observe that an approach based on a single camera can outperform a single IMU by 10 percentage points on average. Additionally, a minimum of 3 IMUs are required to outperform a single camera. We observe that working with the raw data using multivariate time series classifiers outperforms traditional approaches based on handcrafted or automatically extracted features. Finally, we show that an ensemble model combining the data from a single camera with a single IMU outperforms either data modality. Our work opens up new and more realistic avenues for this application, where a video captured using a readily available smartphone camera, combined with a single sensor, can be used for effective human exercise classification.
CoactSeg: Learning from Heterogeneous Data for New Multiple Sclerosis Lesion Segmentation
paper_authors: Yicheng Wu, Zhonghua Wu, Hengcan Shi, Bjoern Picker, Winston Chong, Jianfei Cai
for: New multiple sclerosis (MS) lesion segmentation, which is of great practical significance for diagnosis and treatment because it helps assess disease progression and treatment effects.
methods: A coaction segmentation (CoactSeg) framework that jointly exploits new-lesion annotated two-time-point data and all-lesion annotated single-time-point data to improve new-lesion segmentation, together with a simple yet effective temporal relation constraint that preserves the longitudinal relations between new lesions and all lesions to improve model learning.
results: Exploiting heterogeneous data together with the proposed temporal relation constraint clearly improves performance on both new-lesion and all-lesion segmentation tasks, validated extensively on different data.
Abstract
New lesion segmentation is essential to estimate the disease progression and therapeutic effects during multiple sclerosis (MS) clinical treatments. However, the expensive data acquisition and expert annotation restrict the feasibility of applying large-scale deep learning models. Since single-time-point samples with all-lesion labels are relatively easy to collect, exploiting them to train deep models is highly desirable to improve new lesion segmentation. Therefore, we proposed a coaction segmentation (CoactSeg) framework to exploit the heterogeneous data (i.e., new-lesion annotated two-time-point data and all-lesion annotated single-time-point data) for new MS lesion segmentation. The CoactSeg model is designed as a unified model, with the same three inputs (the baseline, follow-up, and their longitudinal brain differences) and the same three outputs (the corresponding all-lesion and new-lesion predictions), no matter which type of heterogeneous data is being used. Moreover, a simple and effective relation regularization is proposed to ensure the longitudinal relations among the three outputs to improve the model learning. Extensive experiments demonstrate that utilizing the heterogeneous data and the proposed longitudinal relation constraint can significantly improve the performance for both new-lesion and all-lesion segmentation tasks. Meanwhile, we also introduce an in-house MS-23v1 dataset, including 38 Oceania single-time-point samples with all-lesion labels. Codes and the dataset are released at https://github.com/ycwu1997/CoactSeg.
Exact Diffusion Inversion via Bi-directional Integration Approximation
results: Experiments on image reconstruction and image editing confirm that BDIA reduces computational overhead while achieving better performance.
Abstract
Recently, different methods have been proposed to address the inconsistency issue of DDIM inversion to enable image editing, such as EDICT \cite{Wallace23EDICT} and Null-text inversion \cite{Mokady23NullTestInv}. However, the above methods introduce considerable computational overhead. In this paper, we propose a new technique, named \emph{bi-directional integration approximation} (BDIA), to perform exact diffusion inversion with negligible computational overhead. Suppose we would like to estimate the next diffusion state $\boldsymbol{z}_{i-1}$ at timestep $t_i$ with the historical information $(i,\boldsymbol{z}_i)$ and $(i+1,\boldsymbol{z}_{i+1})$. We first obtain the estimated Gaussian noise $\hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i)$, and then apply the DDIM update procedure twice for approximating the ODE integration over the next time-slot $[t_i, t_{i-1}]$ in the forward manner and the previous time-slot $[t_i, t_{i+1}]$ in the backward manner. The DDIM step for the previous time-slot is used to refine the integration approximation made earlier when computing $\boldsymbol{z}_i$. One nice property of BDIA-DDIM is that the update expression for $\boldsymbol{z}_{i-1}$ is a linear combination of $(\boldsymbol{z}_{i+1}, \boldsymbol{z}_i, \hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i))$. This allows for exact backward computation of $\boldsymbol{z}_{i+1}$ given $(\boldsymbol{z}_i, \boldsymbol{z}_{i-1})$, thus leading to exact diffusion inversion. Experiments on both image reconstruction and image editing were conducted, confirming our statement. BDIA can also be applied to improve the performance of other ODE solvers in addition to DDIM. In our work, we find that applying BDIA to the EDM sampling procedure produces a slightly better FID score on CIFAR10.
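To make the construction concrete, the described update can be written as follows (a sketch consistent with the abstract; the paper's exact coefficients may differ). Let $\Delta(\boldsymbol{z}_i, t_i \to t_j)$ denote the standard DDIM increment

$$\Delta(\boldsymbol{z}_i, t_i \to t_j) = \Big(\sqrt{\tfrac{\bar\alpha_j}{\bar\alpha_i}} - 1\Big)\boldsymbol{z}_i + \Big(\sqrt{1-\bar\alpha_j} - \sqrt{\tfrac{\bar\alpha_j(1-\bar\alpha_i)}{\bar\alpha_i}}\Big)\hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i, i).$$

The bi-directional update then combines one backward and one forward increment around $\boldsymbol{z}_i$:

$$\boldsymbol{z}_{i-1} = \boldsymbol{z}_{i+1} - \Delta(\boldsymbol{z}_i, t_i \to t_{i+1}) + \Delta(\boldsymbol{z}_i, t_i \to t_{i-1}),$$

which is linear in $(\boldsymbol{z}_{i+1}, \boldsymbol{z}_i, \hat{\boldsymbol{\epsilon}}(\boldsymbol{z}_i,i))$ and can therefore be solved exactly for $\boldsymbol{z}_{i+1}$ given $(\boldsymbol{z}_i, \boldsymbol{z}_{i-1})$, which is what enables exact inversion.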
Partial Vessels Annotation-based Coronary Artery Segmentation with Self-training and Prototype Learning
results: Experiments on clinical data show that the proposed framework outperforms competing methods under PVA (24.29% of vessels annotated) and achieves trunk continuity comparable to a baseline model trained with full annotation (100% of vessels).
Abstract
Coronary artery segmentation on coronary-computed tomography angiography (CCTA) images is crucial for clinical use. Because the annotation process requires expertise and is labor-intensive, there is a growing demand for label-efficient learning algorithms. To this end, we propose partial vessels annotation (PVA), based on the challenges of coronary artery segmentation and clinical diagnostic characteristics. Further, we propose a progressive weakly supervised learning framework to achieve accurate segmentation under PVA. First, our proposed framework learns the local features of vessels to propagate the knowledge to unlabeled regions. Subsequently, it learns the global structure by utilizing the propagated knowledge, and corrects the errors introduced in the propagation process. Finally, it leverages the similarity between feature embeddings and the feature prototype to enhance testing outputs. Experiments on clinical data reveal that our proposed framework outperforms the competing methods under PVA (24.29% vessels), and achieves comparable performance in trunk continuity with the baseline model using full annotation (100% vessels).
Test-Time Adaptation for Nighttime Color-Thermal Semantic Segmentation
paper_authors: Yexin Liu, Weiming Zhang, Guoyang Zhao, Jinjing Zhu, Athanasios Vasilakos, Lin Wang
For: The paper focuses on improving the performance of RGB-Thermal (RGB-T) semantic segmentation in nighttime scenes, which is a challenging task due to the large day-night gap and the inconsistent performance of RGB images at night.
Methods: The proposed method, called Night-TTA, uses a test-time adaptation (TTA) framework to address the challenges of nighttime RGB-T semantic segmentation without requiring access to the source (daytime) data during adaptation. The method consists of three key technical parts: Imaging Heterogeneity Refinement (IHR), Class Aware Refinement (CAR), and a specific learning scheme.
Results: The proposed method achieves state-of-the-art (SoTA) performance with a 13.07% boost in mean Intersection over Union (mIoU) compared to the baseline method.
Abstract
The need for scene understanding in adverse visual conditions, e.g., nighttime, has sparked active research on RGB-Thermal (RGB-T) semantic segmentation. However, it is essentially hampered by two critical problems: 1) the day-night gap of RGB images is larger than that of thermal images, and 2) the class-wise performance of RGB images at night is not consistently higher or lower than that of thermal images. To address these problems, we propose the first test-time adaptation (TTA) framework, dubbed Night-TTA, for nighttime RGB-T semantic segmentation without access to the source (daytime) data during adaptation. Our method has three key technical parts. Firstly, as one modality (e.g., RGB) suffers from a larger domain gap than the other (e.g., thermal), Imaging Heterogeneity Refinement (IHR) employs an interaction branch on top of the RGB and thermal branches to prevent cross-modal discrepancy and performance degradation. Then, Class Aware Refinement (CAR) is introduced to obtain reliable ensemble logits based on pixel-level distribution aggregation of the three branches. In addition, we design a specific learning scheme for our TTA framework, which enables the ensemble logits and the three student logits to collaboratively learn to improve the quality of predictions during the testing phase of Night-TTA. Extensive experiments show that our method achieves state-of-the-art (SoTA) performance with a 13.07% boost in mIoU.
SAM-IQA: Can Segment Anything Boost Image Quality Assessment?
results: Experiments show that the proposed method outperforms the state of the art (SOTA) on four representative datasets, both qualitatively and quantitatively, confirming the powerful feature extraction of the Segment Anything model and the value of combining frequency-domain and spatial-domain features for IQA.
Abstract
Image Quality Assessment (IQA) is a challenging task that requires training on massive datasets to achieve accurate predictions. However, due to the lack of IQA data, deep learning-based IQA methods typically rely on pre-trained networks trained on massive datasets as feature extractors to enhance their generalization ability, such as the ResNet network trained on ImageNet. In this paper, we utilize the encoder of Segment Anything, a recently proposed segmentation model trained on a massive dataset, for high-level semantic feature extraction. Most IQA methods are limited to extracting spatial-domain features, while frequency-domain features have been shown to better represent noise and blur. Therefore, we leverage both spatial-domain and frequency-domain features by applying Fourier and standard convolutions on the extracted features, respectively. Extensive experiments are conducted to demonstrate the effectiveness of all the proposed components, and results show that our approach outperforms the state-of-the-art (SOTA) in four representative datasets, both qualitatively and quantitatively. Our experiments confirm the powerful feature extraction capabilities of Segment Anything and highlight the value of combining spatial-domain and frequency-domain features in IQA tasks. Code: https://github.com/Hedlen/SAM-IQA
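The combination of spatial- and frequency-domain features can be sketched as a two-branch block; the pointwise convolution on the Fourier coefficients and the additive fusion are assumptions, not necessarily SAM-IQA's exact design.

```python
import torch
import torch.nn as nn

class FreqSpatialBranch(nn.Module):
    """Fuse a spatial branch (standard convolution) with a frequency
    branch (1x1 convolution over real/imaginary Fourier coefficients)."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        self.freq = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        spec = torch.fft.rfft2(x, norm="ortho")
        f = self.freq(torch.cat([spec.real, spec.imag], dim=1))
        f = torch.fft.irfft2(
            torch.complex(f, torch.zeros_like(f)), s=x.shape[-2:], norm="ortho"
        )
        return self.spatial(x) + f      # additive fusion (assumed)
```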
results: Experiments show that DCA-NAS outperforms manually designed architectures for similar-sized models and is comparable to popular mobile architectures on several image classification datasets. Experiments with the DARTS and NAS-Bench-201 search spaces demonstrate its broad generalization ability, and evaluation on Hardware-NAS-Bench discovers device-specific architectures with inference latency below 10 ms and state-of-the-art performance.
Abstract
Edge computing aims to enable edge devices, such as IoT devices, to process data locally instead of relying on the cloud. However, deep learning techniques like computer vision and natural language processing can be computationally expensive and memory-intensive. Creating manual architectures specialized for each device is infeasible due to their varying memory and computational constraints. To address these concerns, we automate the construction of task-specific deep learning architectures optimized for device constraints through Neural Architecture Search (NAS). We present DCA-NAS, a principled method of fast neural network architecture search that incorporates edge-device constraints such as model size and floating-point operations. It incorporates weight sharing and channel bottleneck techniques to speed up the search time. Based on our experiments, we see that DCA-NAS outperforms manual architectures for similar sized models and is comparable to popular mobile architectures on various image classification datasets like CIFAR-10, CIFAR-100, and Imagenet-1k. Experiments with search spaces -- DARTS and NAS-Bench-201 show the generalization capabilities of DCA-NAS. On further evaluating our approach on Hardware-NAS-Bench, device-specific architectures with low inference latency and state-of-the-art performance were discovered.
Automatic diagnosis of knee osteoarthritis severity using Swin transformer
results: Experimental results demonstrate that the approach predicts KOA severity accurately.
Abstract
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint. Early detection and diagnosis are crucial for successful clinical intervention and management to prevent severe complications, such as loss of mobility. In this paper, we propose an automated approach that employs the Swin Transformer to predict the severity of KOA. Our model uses publicly available radiographic datasets with Kellgren and Lawrence scores to enable early detection and severity assessment. To improve the accuracy of our model, we employ a multi-prediction head architecture that utilizes multi-layer perceptron classifiers. Additionally, we introduce a novel training approach that reduces the data drift between multiple datasets to ensure the generalization ability of the model. The results of our experiments demonstrate the effectiveness and feasibility of our approach in predicting KOA severity accurately.
Global and Local Visual Processing: Influence of Perceptual Field Variables
results: The study finds that the PFVs CONGRUENCY and SIZE have significant effects, while SPARSITY has only a small effect; the interaction between the task paradigm and the PFVs is also significant, showing that task paradigms play an important role in evaluating the influence of PFVs on the GPE.
Abstract
The Global Precedence Effect (GPE) suggests that the processing of global properties of a visual stimulus precedes the processing of local properties. The generality of this theory has been debated for four decades across different known Perceptual Field Variables (PFVs). The effect sizes of various PFVs reported over these four decades were pooled in our recent meta-analysis study. Building on that study, in the present paper we explore the effects of Congruency, Size, and Sparsity and their interaction on the global advantage in two experiments with different task paradigms: matching judgment and similarity judgment. According to the results of these experiments, Congruency and Size have significant effects while Sparsity has small effects. The task paradigm and its interaction with the other PFVs also show significant effects, which highlights the prominent role of task paradigms in evaluating the effects of PFVs on the GPE. Moreover, we found that the effects of these parameters were not specific to the condition in which individuals were instructed to maintain retinal stabilization, so the findings extend more readily to everyday human behavior.
Identification of Hemorrhage and Infarct Lesions on Brain CT Images using Deep Learning
for: This paper aims to evaluate the performance of a deep learning-based algorithm in identifying intracranial hemorrhage (ICH) and infarct from head non-contrast computed tomography (NCCT) scans.
methods: The paper uses a deep learning-based algorithm to automatically identify ICH and infarct from head-NCCT scans. The dataset used for validation consists of head-NCCT scans collected from multiple diagnostic imaging centers across India.
results: The study shows the potential and limitations of using a DL-based algorithm for identifying ICH and infarct from head-NCCT scans. The algorithm demonstrated high accuracy in identifying ICH and infarct, but the results also highlighted the limitations of using this approach in routine clinical practice due to factors such as image quality and dataset variability.Abstract
Head non-contrast computed tomography (NCCT) scans remain the preferred primary imaging modality due to their widespread availability and speed. However, the current standard of manual annotation of abnormal brain tissue on head NCCT scans has significant disadvantages, such as the lack of cutoff standardization and of degeneration identification. The recent advancement of deep learning-based computer-aided diagnostic (CAD) models in the multidisciplinary domain has created vast opportunities in neurological medical imaging. Significant literature has been published earlier on the automated identification of brain tissue on different imaging modalities. However, determining intracranial hemorrhage (ICH) and infarct can be challenging due to variability in image texture, volume size, and scan quality. This retrospective validation study evaluated a DL-based algorithm identifying ICH and infarct from head-NCCT scans. The head-NCCT scans dataset was collected consecutively from multiple diagnostic imaging centers across India. The study exhibits the potential and limitations of such DL-based software for introduction into routine workflow in extensive healthcare facilities.
Towards Enabling Cardiac Digital Twins of Myocardial Infarction Using Deep Computational Models for Inverse Inference
results: The study finds that the proposed deep computational model can effectively capture the complex relationship between the QRS complex and the corresponding infarct regions, with promising potential for clinical application.
Abstract
Myocardial infarction (MI) demands precise and swift diagnosis. Cardiac digital twins (CDTs) have the potential to offer individualized evaluation of cardiac function in a non-invasive manner, making them a promising approach for personalized diagnosis and treatment planning of MI. The inference of accurate myocardial tissue properties is crucial in creating a reliable CDT platform, and particularly in the context of studying MI. In this work, we investigate the feasibility of inferring myocardial tissue properties from the electrocardiogram (ECG), focusing on the development of a comprehensive CDT platform specifically designed for MI. The platform integrates multi-modal data, such as cardiac MRI and ECG, to enhance the accuracy and reliability of the inferred tissue properties. We perform a sensitivity analysis based on computer simulations, systematically exploring the effects of infarct location, size, degree of transmurality, and electrical activity alteration on the simulated QRS complex of ECG, to establish the limits of the approach. We subsequently propose a deep computational model to infer infarct location and distribution from the simulated QRS. The in silico experimental results show that our model can effectively capture the complex relationships between the QRS signals and the corresponding infarct regions, with promising potential for clinical application in the future. The code will be released publicly once the manuscript is accepted for publication.
for: solve Video Object Segmentation (VOS) in an unsupervised setting
methods: incorporating intra-frame appearance and flow similarities, and inter-frame temporal continuation of the objects under consideration
results: results comparable (within a range of ~2 mIoU) to the existing top approaches in unsupervised VOS
Abstract
Segmentation of objects in a video is challenging due to the nuances such as motion blurring, parallax, occlusions, changes in illumination, etc. Instead of addressing these nuances separately, we focus on building a generalizable solution that avoids overfitting to the individual intricacies. Such a solution would also help us save enormous resources involved in human annotation of video corpora. To solve Video Object Segmentation (VOS) in an unsupervised setting, we propose a new pipeline (FODVid) based on the idea of guiding segmentation outputs using flow-guided graph-cut and temporal consistency. Basically, we design a segmentation model incorporating intra-frame appearance and flow similarities, and inter-frame temporal continuation of the objects under consideration. We perform an extensive experimental analysis of our straightforward methodology on the standard DAVIS16 video benchmark. Though simple, our approach produces results comparable (within a range of ~2 mIoU) to the existing top approaches in unsupervised VOS. The simplicity and effectiveness of our technique opens up new avenues for research in the video domain.
CT-based Subchondral Bone Microstructural Analysis in Knee Osteoarthritis via MR-Guided Distillation Learning
For: The paper aims to develop a novel method for subchondral bone microstructural analysis using easily-acquired CT images, which can enhance the accuracy of knee osteoarthritis classification.
Methods: The proposed method, named SRRD, leverages paired MR images to enhance the CT-based analysis model during training. The method uses a GAN-based generative model to transform MR images into CT images, and a distillation-learning technique to transfer MR structural information to the CT-based model.
Results: The proposed method achieved high reliability and validity in MR-CT registration, regression, and knee osteoarthritis classification, with an AUC score of 0.767 (95% CI, 0.681-0.853). The use of distillation learning significantly improved the performance of the CT-based knee osteoarthritis classification method using the CNN approach.
Abstract
Background: MR-based subchondral bone analysis effectively predicts knee osteoarthritis. However, its clinical application is limited by the cost and time of MR. Purpose: We aim to develop a novel distillation-learning-based method named SRRD for subchondral bone microstructural analysis using easily-acquired CT images, which leverages paired MR images to enhance the CT-based analysis model during training. Materials and Methods: Knee joint images of both CT and MR modalities were collected from October 2020 to May 2021. Firstly, we developed a GAN-based generative model to transform MR images into CT images, which was used to establish the anatomical correspondence between the two modalities. Next, we obtained numerous patches of subchondral bone regions of MR images, together with their trabecular parameters (BV / TV, Tb. Th, Tb. Sp, Tb. N) from the corresponding CT image patches via regression. The distillation-learning technique was used to train the regression model and transfer MR structural information to the CT-based model. The regressed trabecular parameters were further used for knee osteoarthritis classification. Results: A total of 80 participants were evaluated. CT-based regression results of trabecular parameters achieved intra-class correlation coefficients (ICCs) of 0.804, 0.773, 0.711, and 0.622 for BV / TV, Tb. Th, Tb. Sp, and Tb. N, respectively. The use of distillation learning significantly improved the performance of the CT-based knee osteoarthritis classification method using the CNN approach, yielding an AUC score of 0.767 (95% CI, 0.681-0.853) instead of 0.658 (95% CI, 0.574-0.742) (p<.001). Conclusions: The proposed SRRD method showed high reliability and validity in MR-CT registration, regression, and knee osteoarthritis classification, indicating the feasibility of subchondral bone microstructural analysis based on CT images.
Towards Generalizable Diabetic Retinopathy Grading in Unseen Domains
paper_authors: Haoxuan Che, Yuhan Cheng, Haibo Jin, Hao Chen
for: This paper aims to address the domain generalization problem in deep-learning-based automated diabetic retinopathy (DR) grading, which hinders the real-world deployment of DR grading systems.
methods: The proposed method, the Generalizable Diabetic Retinopathy Grading Network (GDRNet), consists of three components: fundus visual-artifact augmentation (FundusAug), dynamic hybrid-supervised loss (DahLoss), and domain-class-aware re-balancing (DCR).
results: GDRNet achieves state-of-the-art performance on a publicly available benchmark and demonstrates better generalization ability than existing methods through extensive comparison experiments and ablation studies.
Abstract
Diabetic Retinopathy (DR) is a common complication of diabetes and a leading cause of blindness worldwide. Early and accurate grading of its severity is crucial for disease management. Although deep learning has shown great potential for automated DR grading, its real-world deployment is still challenging due to distribution shifts among source and target domains, known as the domain generalization problem. Existing works have mainly attributed the performance degradation to limited domain shifts caused by simple visual discrepancies, which cannot handle complex real-world scenarios. Instead, we present preliminary evidence suggesting the existence of three-fold generalization issues: visual and degradation style shifts, diagnostic pattern diversity, and data imbalance. To tackle these issues, we propose a novel unified framework named Generalizable Diabetic Retinopathy Grading Network (GDRNet). GDRNet consists of three vital components: fundus visual-artifact augmentation (FundusAug), dynamic hybrid-supervised loss (DahLoss), and domain-class-aware re-balancing (DCR). FundusAug generates realistic augmented images via visual transformation and image degradation, while DahLoss jointly leverages pixel-level consistency and image-level semantics to capture the diverse diagnostic patterns and build generalizable feature representations. Moreover, DCR mitigates the data imbalance from a domain-class view and avoids undesired over-emphasis on rare domain-class pairs. Finally, we design a publicly available benchmark for fair evaluations. Extensive comparison experiments against advanced methods and exhaustive ablation studies demonstrate the effectiveness and generalization ability of GDRNet.
One-Shot Pruning for Fast-adapting Pre-trained Models on Devices
for: This paper proposes a scalable one-shot pruning method for deploying large-scale pre-trained models on low-capability devices.
methods: The method leverages the pruned models of similar tasks to extract a suitably-sized sub-network from the pre-trained model that can quickly adapt to a new task.
results: Experimental analysis shows that the proposed method achieves high accuracy and efficiency across datasets, consistently outperforming popular pruning baselines on diverse downstream tasks and devices.
Abstract
Large-scale pre-trained models have been remarkably successful in resolving downstream tasks. Nonetheless, deploying these models on low-capability devices still requires an effective approach, such as model pruning. However, pruning the model from scratch can pose a practical challenge given the limited resources of each downstream task or device. To tackle this issue, we present a scalable one-shot pruning method that leverages pruned knowledge of similar tasks to extract a sub-network from the pre-trained model for a new task. Specifically, we create a score mask using the pruned models of similar tasks to identify task-specific filters/nodes in the pre-trained model for the new task. Based on this mask, we conduct a single round of pruning to extract a suitably-sized sub-network that can quickly adapt to the new task with only a few training iterations. Our experimental analysis demonstrates the effectiveness of the proposed method on the convolutional neural networks (CNNs) and vision transformers (ViT) with various datasets. The proposed method consistently outperforms popular pruning baseline methods in terms of accuracy and efficiency when dealing with diverse downstream tasks with different memory constraints.
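One way to realize the described score mask, sketched under assumptions: weights that survived pruning in similar-task checkpoints (where pruned weights are exactly zero) vote for being task-relevant, and the votes are combined with weight magnitude; the exact scoring rule in the paper may differ.

```python
import torch

@torch.no_grad()
def one_shot_submask(pretrained, pruned_similar, keep_ratio=0.5):
    """Build per-tensor keep-masks for a new task from the pruned models
    of similar tasks (an illustrative voting scheme, not the paper's)."""
    masks = {}
    similar_states = [m.state_dict() for m in pruned_similar]
    for name, w in pretrained.named_parameters():
        if w.dim() < 2:                      # skip biases / norm parameters
            masks[name] = torch.ones_like(w)
            continue
        votes = sum((s[name] != 0).float() for s in similar_states)
        score = (votes + 1.0) * w.abs()      # votes weighted by magnitude
        k = max(1, int(keep_ratio * score.numel()))
        thresh = score.flatten().kthvalue(score.numel() - k + 1).values
        masks[name] = (score >= thresh).float()
    return masks    # apply as w.data *= mask, then fine-tune briefly
```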
InfLoR-SNN: Reducing Information Loss for Spiking Neural Networks
results: Experimental results show that combining the "Soft Reset" mechanism with the Membrane Potential Rectifier allows SNNs to outperform their vanilla counterparts on both static and dynamic datasets.
Abstract
The Spiking Neural Network (SNN) has attracted more and more attention recently. It adopts binary spike signals to transmit information. Benefitting from the information passing paradigm of SNNs, the multiplications of activations and weights can be replaced by additions, which are more energy-efficient. However, its "Hard Reset" mechanism for the firing activity ignores the differences among membrane potentials that exceed the firing threshold, causing information loss. Meanwhile, quantizing the membrane potential to 0/1 spikes at the firing instants inevitably introduces quantization error, bringing about information loss as well. To address these problems, we propose the "Soft Reset" mechanism for supervised-training-based SNNs, which drives the membrane potential to a dynamic reset potential according to its magnitude, and the Membrane Potential Rectifier (MPR), which reduces the quantization error by redistributing the membrane potential to a range close to the spikes. Results show that SNNs with the "Soft Reset" mechanism and MPR outperform their vanilla counterparts on both static and dynamic datasets.
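The difference between the two reset rules fits in one update step. Below is a minimal leaky integrate-and-fire sketch; the decay constant is an assumption, and the surrogate-gradient details needed for training are omitted.

```python
import torch

def lif_step(v, x, threshold=1.0, decay=0.5, soft_reset=True):
    """One LIF step. Hard reset forces v to 0 after a spike, discarding
    how far v exceeded the threshold; soft reset subtracts only the
    threshold, preserving that surplus for the next step."""
    v = decay * v + x                      # leaky integration
    spike = (v >= threshold).float()       # binary spike
    if soft_reset:
        v = v - spike * threshold          # subtract threshold only
    else:
        v = v * (1.0 - spike)              # hard reset to zero
    return v, spike
```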
Hierarchical Semantic Tree Concept Whitening for Interpretable Image Classification
results: The method improves model interpretability and yields better disentanglement of semantic concepts, without degrading classification performance.
Abstract
With the popularity of deep neural networks (DNNs), model interpretability is becoming a critical concern. Many approaches have been developed to tackle the problem through post-hoc analysis, such as explaining how predictions are made or understanding the meaning of neurons in middle layers. Nevertheless, these methods can only discover the patterns or rules that naturally exist in models. In this work, rather than relying on post-hoc schemes, we proactively instill knowledge to alter the representation of human-understandable concepts in hidden layers. Specifically, we use a hierarchical tree of semantic concepts to store the knowledge, which is leveraged to regularize the representations of image data instances while training deep models. The axes of the latent space are aligned with the semantic concepts, where the hierarchical relations between concepts are also preserved. Experiments on real-world image datasets show that our method improves model interpretability, showing better disentanglement of semantic concepts, without negatively affecting model classification performance.
New Variants of Frank-Wolfe Algorithm for Video Co-localization Problem
results: The proposed algorithms are shown to be efficient on the YouTube-Objects dataset and are compared against the conditional gradient sliding algorithm (CGS).
Abstract
The co-localization problem is a model that simultaneously localizes objects of the same class within a series of images or videos. In \cite{joulin2014efficient}, the authors present new variants of the Frank-Wolfe algorithm (aka conditional gradient) that increase the efficiency in solving the image and video co-localization problems. The authors show the efficiency of their methods via the rate of decrease of a value called the Wolfe gap in each iteration of the algorithm. In this project, inspired by the conditional gradient sliding algorithm (CGS) \cite{CGS:Lan}, we propose algorithms for solving such problems and demonstrate their efficiency through numerical experiments. The efficiency of these methods with respect to the Wolfe gap is compared by implementing them on the YouTube-Objects dataset for videos.
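For readers unfamiliar with the baseline method, here is a minimal Frank-Wolfe (conditional gradient) loop over the probability simplex that also computes the Wolfe gap the paper uses as a progress measure; the quadratic objective is arbitrary and chosen only for illustration.

```python
# A minimal Frank-Wolfe (conditional gradient) iteration over the probability
# simplex, reporting the Wolfe gap as a suboptimality certificate.
import numpy as np

def frank_wolfe_simplex(grad_f, x0, n_iters=100):
    x = x0.copy()
    for t in range(n_iters):
        g = grad_f(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0            # linear minimization oracle on the simplex
        wolfe_gap = g @ (x - s)          # upper-bounds f(x) - f(x*)
        if wolfe_gap < 1e-8:
            break
        gamma = 2.0 / (t + 2.0)          # standard step-size rule
        x = (1 - gamma) * x + gamma * s
    return x, wolfe_gap

# Illustrative quadratic: f(x) = 0.5 x'Ax - b'x, gradient Ax - b.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 0.2])
x, gap = frank_wolfe_simplex(lambda x: A @ x - b, np.array([0.5, 0.5]))
```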
Leveraging Multiple Descriptive Features for Robust Few-shot Image Learning
results: The study shows that this approach improves the accuracy and robustness of image classification in the few-shot learning setting, and achieves state-of-the-art performance on both in-distribution and out-of-distribution evaluations.
Abstract
Modern image classification is based upon directly predicting model classes via large discriminative networks, making it difficult to assess the intuitive visual ``features'' that may constitute a classification decision. At the same time, recent works in joint visual language models such as CLIP provide ways to specify natural language descriptions of image classes but typically focus on providing single descriptions for each class. In this work, we demonstrate that an alternative approach, arguably more akin to our understanding of multiple ``visual features'' per class, can also provide compelling performance in the robust few-shot learning setting. In particular, we automatically enumerate multiple visual descriptions of each class -- via a large language model (LLM) -- then use a vision-image model to translate these descriptions to a set of multiple visual features of each image; we finally use sparse logistic regression to select a relevant subset of these features to classify each image. This both provides an ``intuitive'' set of relevant features for each class, and in the few-shot learning setting, outperforms standard approaches such as linear probing. When combined with finetuning, we also show that the method is able to outperform existing state-of-the-art finetuning approaches on both in-distribution and out-of-distribution performance.
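The final selection step lends itself to a short sketch: assuming a precomputed matrix of image-to-description similarity scores (which a vision-language model such as CLIP would supply), L1-regularized logistic regression picks a small, interpretable subset of descriptors. All data below is synthetic.

```python
# A minimal sketch of the sparse selection step over LLM-generated class
# descriptions; the feature matrix is assumed precomputed and is random here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 120))   # 40 few-shot images x 120 description scores
y = rng.integers(0, 2, size=40)  # binary class labels for illustration

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_[0])  # descriptions the class relies on
print(f"{selected.size} of {X.shape[1]} descriptors selected")
```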
results: Experimental results show that the proposed method can efficiently and effectively enhance model robustness under severely noisy labels.
Abstract
Supervised learning of deep neural networks heavily relies on large-scale datasets annotated by high-quality labels. In contrast, mislabeled samples can significantly degrade the generalization of models and result in memorizing samples, further learning erroneous associations of data contents to incorrect annotations. To this end, this paper proposes an efficient approach to tackle noisy labels by learning robust feature representation based on unsupervised augmentation restoration and cluster regularization. In addition, progressive self-bootstrapping is introduced to minimize the negative impact of supervision from noisy labels. Our proposed design is generic and flexible in applying to existing classification architectures with minimal overheads. Experimental results show that our proposed method can efficiently and effectively enhance model robustness under severely noisy labels.
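The self-bootstrapping idea can be illustrated with a simplified stand-in: the training target is a convex mix of the (possibly wrong) given label and the model's own prediction. The fixed mixing weight below is an assumption; the paper uses a progressive schedule.

```python
# A simplified stand-in for self-bootstrapping on noisy labels; the fixed
# beta is an assumption, not the paper's progressive design.
import torch
import torch.nn.functional as F

def bootstrap_loss(logits, noisy_labels, beta=0.8):
    """beta weights the given label; (1 - beta) weights the model belief."""
    log_p = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(noisy_labels, logits.size(1)).float()
    target = beta * one_hot + (1.0 - beta) * log_p.exp().detach()
    return -(target * log_p).sum(dim=1).mean()

logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
loss = bootstrap_loss(logits, labels)
loss.backward()
```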
K-Space-Aware Cross-Modality Score for Synthesized Neuroimage Quality Assessment
results: Our method performs strongly on the NIRPS dataset, especially in comparison with radiologists. The experimental results show that the K-CROSS metric better assesses the quality of cross-modality medical image synthesis.
Abstract
The problem of how to assess cross-modality medical image synthesis has been largely unexplored. The most used measures like PSNR and SSIM focus on analyzing the structural features but neglect the crucial lesion location and fundamental k-space speciality of medical images. To overcome this problem, we propose a new metric K-CROSS to spur progress on this challenging problem. Specifically, K-CROSS uses a pre-trained multi-modality segmentation network to predict the lesion location, together with a tumor encoder for representing features, such as texture details and brightness intensities. To further reflect the frequency-specific information from the magnetic resonance imaging principles, both k-space features and vision features are obtained and employed in our comprehensive encoders with a frequency reconstruction penalty. The structure-shared encoders are designed and constrained with a similarity loss to capture the intrinsic common structural information for both modalities. As a consequence, the features learned from lesion regions, k-space, and anatomical structures are all captured, which serve as our quality evaluators. We evaluate the performance by constructing a large-scale cross-modality neuroimaging perceptual similarity (NIRPS) dataset with 6,000 radiologist judgments. Extensive experiments demonstrate that the proposed method outperforms other metrics, especially in comparison with the radiologists on NIRPS.
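As a small illustration of what "k-space features" means in practice, the sketch below extracts a frequency-domain representation of an image with a 2D FFT and computes a simple frequency discrepancy; K-CROSS itself feeds such features through learned encoders, which are omitted here.

```python
# A minimal illustration of frequency-domain (k-space) features via 2D FFT;
# the discrepancy measure is a toy stand-in for K-CROSS's learned encoders.
import numpy as np

def kspace_features(image):
    """Log-magnitude spectrum, centered so low frequencies sit in the middle."""
    k = np.fft.fftshift(np.fft.fft2(image))
    return np.log1p(np.abs(k))

real = np.random.rand(64, 64)    # stand-ins for a real and a synthesized slice
synth = np.random.rand(64, 64)
# A simple frequency-domain discrepancy between the two images.
freq_error = np.mean((kspace_features(real) - kspace_features(synth)) ** 2)
```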
results: The results show that the error of the proposed primitive representation is comparable to that of predicting depth from a single image.
Abstract
We describe a method to parse a complex, cluttered indoor scene into primitives which offer a parsimonious abstraction of scene structure. Our primitives are simple convexes. Our method uses a learned regression procedure to parse a scene into a fixed number of convexes from RGBD input, and can optionally accept segmentations to improve the decomposition. The result is then polished with a descent method which adjusts the convexes to produce a very good fit, and greedily removes superfluous primitives. Because the entire scene is parsed, we can evaluate using traditional depth, normal, and segmentation error metrics. Our evaluation procedure demonstrates that the error from our primitive representation is comparable to that of predicting depth from a single image.
Mx2M: Masked Cross-Modality Modeling in Domain Adaptation for 3D Semantic Segmentation
results: Evaluated on three DA scenarios (Day/Night, USA/Singapore, and A2D2/SemanticKITTI), the method brings large improvements over previous methods on many metrics.
Abstract
Existing methods of cross-modal domain adaptation for 3D semantic segmentation predict results only via 2D-3D complementarity that is obtained by cross-modal feature matching. However, as lacking supervision in the target domain, the complementarity is not always reliable. The results are not ideal when the domain gap is large. To solve the problem of lacking supervision, we introduce masked modeling into this task and propose a method Mx2M, which utilizes masked cross-modality modeling to reduce the large domain gap. Our Mx2M contains two components. One is the core solution, cross-modal removal and prediction (xMRP), which makes the Mx2M adapt to various scenarios and provides cross-modal self-supervision. The other is a new way of cross-modal feature matching, the dynamic cross-modal filter (DxMF) that ensures the whole method dynamically uses more suitable 2D-3D complementarity. Evaluation of the Mx2M on three DA scenarios, including Day/Night, USA/Singapore, and A2D2/SemanticKITTI, brings large improvements over previous methods on many metrics.
results: Our framework substantially reduces human annotation effort for video classification tasks while achieving high accuracy.
Abstract
Deep learning algorithms have pushed the boundaries of computer vision research and have depicted commendable performance in a variety of applications. However, training a robust deep neural network necessitates a large amount of labeled training data, acquiring which involves significant time and human effort. This problem is even more serious for an application like video classification, where a human annotator has to watch an entire video end-to-end to furnish a label. Active learning algorithms automatically identify the most informative samples from large amounts of unlabeled data; this tremendously reduces the human annotation effort in inducing a machine learning model, as only the few samples that are identified by the algorithm, need to be labeled manually. In this paper, we propose a novel active learning framework for video classification, with the goal of further reducing the labeling onus on the human annotators. Our framework identifies a batch of exemplar videos, together with a set of informative frames for each video; the human annotator needs to merely review the frames and provide a label for each video. This involves much less manual work than watching the complete video to come up with a label. We formulate a criterion based on uncertainty and diversity to identify the informative videos and exploit representative sampling techniques to extract a set of exemplar frames from each video. To the best of our knowledge, this is the first research effort to develop an active learning framework for video classification, where the annotators need to inspect only a few frames to produce a label, rather than watching the end-to-end video.
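The uncertainty-plus-diversity criterion can be sketched compactly: score each unlabeled video by predictive entropy, then greedily add videos that are both uncertain and far from the already-selected batch. The additive trade-off and Euclidean distance below are assumptions, not the paper's exact criterion.

```python
# A minimal sketch of uncertainty-plus-diversity batch selection for active
# learning; the scoring rule and distances are illustrative assumptions.
import numpy as np

def select_batch(probs, feats, batch_size=5):
    """probs: (N, C) class probabilities; feats: (N, D) video embeddings."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    chosen = [int(np.argmax(entropy))]
    while len(chosen) < batch_size:
        # Distance of each candidate to its nearest already-chosen video.
        d = np.min(np.linalg.norm(feats[:, None] - feats[chosen][None], axis=2), axis=1)
        score = entropy + d                  # uncertain AND far from the batch
        score[chosen] = -np.inf
        chosen.append(int(np.argmax(score)))
    return chosen

probs = np.random.dirichlet(np.ones(4), size=100)
feats = np.random.randn(100, 16)
batch = select_batch(probs, feats)
```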
Weakly-supervised positional contrastive learning: application to cirrhosis classification
results: The results show that the proposed method improves the classification AUC by 5% over a baseline model on the internal dataset and by 26% on the public LIHC dataset.
Abstract
Large medical imaging datasets can be cheaply and quickly annotated with low-confidence, weak labels (e.g., radiological scores). Access to high-confidence labels, such as histology-based diagnoses, is rare and costly. Pretraining strategies, like contrastive learning (CL) methods, can leverage unlabeled or weakly-annotated datasets. These methods typically require large batch sizes, which poses a difficulty in the case of large 3D images at full resolution, due to limited GPU memory. Nevertheless, volumetric positional information about the spatial context of each 2D slice can be very important for some medical applications. In this work, we propose an efficient weakly-supervised positional (WSP) contrastive learning strategy where we integrate both the spatial context of each 2D slice and a weak label via a generic kernel-based loss function. We illustrate our method on cirrhosis prediction using a large volume of weakly-labeled images, namely radiological low-confidence annotations, and small strongly-labeled (i.e., high-confidence) datasets. The proposed model improves the classification AUC by 5% with respect to a baseline model on our internal dataset, and by 26% on the public LIHC dataset from the Cancer Genome Atlas. The code is available at: https://github.com/Guerbet-AI/wsp-contrastive.
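A minimal version of the kernel-based positional loss might look as follows: pairs of slice embeddings are pulled together in proportion to a Gaussian kernel on their volumetric positions, gated by agreement of the weak labels. The kernel width, temperature, and function name are assumptions.

```python
# A minimal sketch of a kernel-weighted positional contrastive loss; sigma,
# tau, and the exact weighting scheme are assumptions.
import torch
import torch.nn.functional as F

def wsp_contrastive(z, positions, weak_labels, sigma=0.1, tau=0.07):
    """z: (N, D) slice embeddings; positions: (N,) normalized slice indices."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    eye = torch.eye(z.size(0), dtype=torch.bool)
    sim = sim.masked_fill(eye, -1e9)      # exclude self-pairs from the softmax
    # Gaussian kernel on slice positions, gated by weak-label agreement.
    kernel = torch.exp(-(positions[:, None] - positions[None, :]) ** 2
                       / (2 * sigma ** 2))
    weights = kernel * (weak_labels[:, None] == weak_labels[None, :]).float()
    weights = weights.masked_fill(eye, 0.0)
    log_p = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(weights * log_p).sum() / weights.sum().clamp_min(1e-8)

loss = wsp_contrastive(torch.randn(16, 32), torch.rand(16),
                       torch.randint(0, 2, (16,)))
```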
Code Generation for Machine Learning using Model-Driven Engineering and SysML
results: The study shows that this approach improves the modifiability and maintainability of the model transformation and reduces implementation effort. It also provides a theoretical basis for standardizing data-driven engineering practice.
Abstract
Data-driven engineering refers to systematic data collection and processing using machine learning to improve engineering systems. Currently, the implementation of data-driven engineering relies on fundamental data science and software engineering skills. At the same time, model-based engineering is gaining relevance for the engineering of complex systems. In previous work, a model-based engineering approach integrating the formalization of machine learning tasks using the general-purpose modeling language SysML is presented. However, formalized machine learning tasks still require the implementation in a specialized programming languages like Python. Therefore, this work aims to facilitate the implementation of data-driven engineering in practice by extending the previous work of formalizing machine learning tasks by integrating model transformation to generate executable code. The method focuses on the modifiability and maintainability of the model transformation so that extensions and changes to the code generation can be integrated without requiring modifications to the code generator. The presented method is evaluated for feasibility in a case study to predict weather forecasts. Based thereon, quality attributes of model transformations are assessed and discussed. Results demonstrate the flexibility and the simplicity of the method reducing efforts for implementation. Further, the work builds a theoretical basis for standardizing data-driven engineering implementation in practice.
MiVOLO: Multi-input Transformer for Age and Gender Estimation
paper_authors: Maksim Kuprashevich, Irina Tolstykh
for: This paper proposes a vision-transformer-based method for age and gender estimation that addresses the challenges of recognition in the wild.
methods: The method uses a recent vision transformer and integrates age and gender estimation into a unified dual input/output model that leverages person image data in addition to facial information.
results: Experiments show that the model achieves state-of-the-art performance on four popular benchmarks while supporting real-time processing. The authors also introduce a new benchmark based on the Open Images Dataset, whose ground-truth annotations were meticulously generated by human annotators, yielding highly accurate labels through smart vote aggregation. Finally, the models, together with validation and inference code, are publicly released.
Abstract
Age and gender recognition in the wild is a highly challenging task: apart from the variability of conditions, pose complexities, and varying image quality, there are cases where the face is partially or completely occluded. We present MiVOLO (Multi Input VOLO), a straightforward approach for age and gender estimation using the latest vision transformer. Our method integrates both tasks into a unified dual input/output model, leveraging not only facial information but also person image data. This improves the generalization ability of our model and enables it to deliver satisfactory results even when the face is not visible in the image. To evaluate our proposed model, we conduct experiments on four popular benchmarks and achieve state-of-the-art performance, while demonstrating real-time processing capabilities. Additionally, we introduce a novel benchmark based on images from the Open Images Dataset. The ground truth annotations for this benchmark have been meticulously generated by human annotators, resulting in high accuracy answers due to the smart aggregation of votes. Furthermore, we compare our model's age recognition performance with human-level accuracy and demonstrate that it significantly outperforms humans across a majority of age ranges. Finally, we grant public access to our models, along with the code for validation and inference. In addition, we provide extra annotations for used datasets and introduce our new benchmark.
methods: This paper uses reinforcement learning to learn effective variable scoring functions and noise parameters for local search.
results: Experimental results show improvements over both a WalkSAT baseline and another learned local-search heuristic.
Abstract
Local search algorithms are well-known methods for solving large, hard instances of the satisfiability problem (SAT). The performance of these algorithms crucially depends on heuristics for setting noise parameters and scoring variables. The optimal setting for these heuristics varies for different instance distributions. In this paper, we present an approach for learning effective variable scoring functions and noise parameters by using reinforcement learning. We consider satisfiability problems from different instance distributions and learn specialized heuristics for each of them. Our experimental results show improvements with respect to both a WalkSAT baseline and another local search learned heuristic.
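For context, here is a plain WalkSAT-style local search in which the noise parameter `p` (and, implicitly, the variable scoring rule) is exactly the quantity the paper's reinforcement learner would tune; here both are fixed heuristics.

```python
# A minimal WalkSAT-style local search. Clauses are lists of signed literals
# (1-indexed variables); the fixed noise p stands in for the learned policy.
import random

def walksat(clauses, n_vars, p=0.5, max_flips=10_000):
    assign = [random.random() < 0.5 for _ in range(n_vars + 1)]  # index 0 unused
    sat = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign
        clause = random.choice(unsat)
        if random.random() < p:                  # noise move: random variable
            var = abs(random.choice(clause))
        else:                                    # greedy move: fewest unsat after flip
            def unsat_after_flip(v):
                assign[v] = not assign[v]
                n = sum(not any(sat(l) for l in c) for c in clauses)
                assign[v] = not assign[v]
                return n
            var = min((abs(l) for l in clause), key=unsat_after_flip)
        assign[var] = not assign[var]
    return None

# (x1 or x2) and (not x1 or x2) and (x1 or not x2)
print(walksat([[1, 2], [-1, 2], [1, -2]], n_vars=2))
```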
A Memristor-Inspired Computation for Epileptiform Signals in Spheroids
results: This computational method can compute, on the fly and at low computational cost, an alert-level signal for the onset of epileptiform events.
Abstract
In this paper we present a memristor-inspired computational method for obtaining a type of running spectrogram or fingerprint of epileptiform activity generated by rodent hippocampal spheroids. It can be used to compute on the fly and with low computational cost an alert-level signal for epileptiform events onset. Here, we describe the computational method behind this fingerprint technique and illustrate it using epileptiform events recorded from hippocampal spheroids using a microelectrode array system.
DBFed: Debiasing Federated Learning Framework based on Domain-Independent
paper_authors: Jiale Li, Zhixin Li, Yibo Wang, Yao Li, Lei Wang
for: This paper focuses on using federated learning to protect data privacy and solve the data-island problem, while mitigating the model bias caused by differences in data quality.
methods: The paper proposes a debiasing federated learning framework based on domain independence, named DBFed, which explicitly encodes sensitive attributes during client-side training to mitigate model bias.
results: The paper conducts experiments on three real-world datasets, using five evaluation metrics of accuracy and fairness to quantify the effect of the model. Most metrics of DBFed exceed those of the three comparison methods, fully demonstrating DBFed's debiasing effect.
Abstract
As digital transformation continues, enterprises are generating, managing, and storing vast amounts of data, while artificial intelligence technology is rapidly advancing. However, it brings challenges in information security and data security. Data security refers to the protection of digital information from unauthorized access, damage, theft, etc. throughout its entire life cycle. With the promulgation and implementation of data security laws and the emphasis on data security and data privacy by organizations and users, Privacy-preserving technology represented by federated learning has a wide range of application scenarios. Federated learning is a distributed machine learning computing framework that allows multiple subjects to train joint models without sharing data to protect data privacy and solve the problem of data islands. However, the data among multiple subjects are independent of each other, and the data differences in quality may cause fairness issues in federated learning modeling, such as data bias among multiple subjects, resulting in biased and discriminatory models. Therefore, we propose DBFed, a debiasing federated learning framework based on domain-independent, which mitigates model bias by explicitly encoding sensitive attributes during client-side training. This paper conducts experiments on three real datasets and uses five evaluation metrics of accuracy and fairness to quantify the effect of the model. Most metrics of DBFed exceed those of the other three comparative methods, fully demonstrating the debiasing effect of DBFed.
Model-Driven Engineering for Artificial Intelligence – A Systematic Literature Review
results: The study finds that the use of MDE for AI is still in its early stages, with no single widely used tool or method. Existing approaches tend to focus on specific development stages rather than supporting the entire development process. Training and modeling of the AI algorithm are the most prominent AI-related concerns, while the time-consuming preparation of datasets receives little attention. Early project phases, such as the CRISP-DM Business Understanding phase, are rarely reflected.
Abstract
Objective: This study aims to investigate the existing body of knowledge in the field of Model-Driven Engineering MDE in support of AI (MDE4AI) to sharpen future research further and define the current state of the art. Method: We conducted a Systemic Literature Review (SLR), collecting papers from five major databases resulting in 703 candidate studies, eventually retaining 15 primary studies. Each primary study will be evaluated and discussed with respect to the adoption of (1) MDE principles and practices and (2) the phases of AI development support aligned with the stages of the CRISP-DM methodology. Results: The study's findings show that the pillar concepts of MDE (metamodel, concrete syntax and model transformation), are leveraged to define domain-specific languages (DSL) explicitly addressing AI concerns. Different MDE technologies are used, leveraging different language workbenches. The most prominent AI-related concerns are training and modeling of the AI algorithm, while minor emphasis is given to the time-consuming preparation of the data sets. Early project phases that support interdisciplinary communication of requirements, such as the CRISP-DM \textit{Business Understanding} phase, are rarely reflected. Conclusion: The study found that the use of MDE for AI is still in its early stages, and there is no single tool or method that is widely used. Additionally, current approaches tend to focus on specific stages of development rather than providing support for the entire development process. As a result, the study suggests several research directions to further improve the use of MDE for AI and to guide future research in this area.
A Semi-Automated Solution Approach Selection Tool for Any Use Case via Scopus and OpenAI: a Case Study for AI/ML in Oncology
results: The study reveals trends, relevant papers, and methods in solution approaches. In the oncology case study and several use cases, the tool obtained promising results when compared against a manual ground truth.
Abstract
In today's vast literature landscape, a manual review is very time-consuming. To address this challenge, this paper proposes a semi-automated tool for solution method review and selection. It caters to researchers, practitioners, and decision-makers while serving as a benchmark for future work. The tool comprises three modules: (1) paper selection and scoring, using a keyword selection scheme to query Scopus API and compute relevancy; (2) solution method extraction in papers utilizing OpenAI API; (3) sensitivity analysis and post-analyzes. It reveals trends, relevant papers, and methods. AI in the oncology case study and several use cases are presented with promising results, comparing the tool to manual ground truth.
Gradient Surgery for One-shot Unlearning on Generative Model
paper_authors: Seohui Bae, Seoyoon Kim, Hyemin Jung, Woohyung Lim
for: This study proposes a simple yet effective method for removing the influence of specific samples from a deep generative model.
methods: Inspired by works in multi-task learning, the method regularizes the interplay of influence among samples by projecting gradients onto the normal plane of the gradients to be retained.
results: The study provides a theoretical analysis and outperforms existing baselines.
Abstract
Recent regulation on the right to be forgotten has generated considerable interest in unlearning pre-trained machine learning models. Approximating the straightforward yet expensive approach of retraining from scratch, recent machine unlearning methods unlearn a sample by updating weights to remove its influence on the weight parameters. In this paper, we introduce a simple yet effective approach to remove a data influence on the deep generative model. Inspired by works in multi-task learning, we propose to manipulate gradients to regularize the interplay of influence among samples by projecting gradients onto the normal plane of the gradients to be retained. Our work is agnostic to the statistics of the removal samples, outperforming existing baselines while providing theoretical analysis for the first time in unlearning a generative model.
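The gradient-surgery step is small enough to show directly: when the gradient computed on the samples to forget conflicts with the gradient to be retained, its conflicting component is projected out, so unlearning does not damage retained knowledge. Flattened gradients are used for brevity; this is a sketch of the idea, not the paper's exact procedure.

```python
# A minimal sketch of gradient projection onto the normal plane of the
# retained gradient, applied only when the two gradients conflict.
import torch

def project_conflicting(g_forget, g_retain):
    """Remove the component of g_forget that opposes g_retain."""
    dot = torch.dot(g_forget, g_retain)
    if dot < 0:  # conflicting directions
        g_forget = g_forget - dot / g_retain.pow(2).sum().clamp_min(1e-12) * g_retain
    return g_forget

g_f = torch.tensor([-1.0, 0.5])
g_r = torch.tensor([1.0, 1.0])
print(project_conflicting(g_f, g_r))  # [-0.75, 0.75], orthogonal to g_r
```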
Hate Speech Detection via Dual Contrastive Learning
results: We conduct experiments on two publicly available English datasets, and the results show that the proposed model outperforms state-of-the-art models and precisely detects hate speech.
Abstract
The fast spread of hate speech on social media impacts the Internet environment and our society by increasing prejudice and hurting people. Detecting hate speech has aroused broad attention in the field of natural language processing. Although hate speech detection has been addressed in recent work, this task still faces two inherent unsolved challenges. The first challenge lies in the complex semantic information conveyed in hate speech, particularly the interference of insulting words in hate speech detection. The second challenge is the imbalanced distribution of hate speech and non-hate speech, which may significantly deteriorate the performance of models. To tackle these challenges, we propose a novel dual contrastive learning (DCL) framework for hate speech detection. Our framework jointly optimizes the self-supervised and the supervised contrastive learning loss for capturing span-level information beyond the token-level emotional semantics used in existing models, particularly detecting speech containing abusive and insulting words. Moreover, we integrate the focal loss into the dual contrastive learning framework to alleviate the problem of data imbalance. We conduct experiments on two publicly available English datasets, and experimental results show that the proposed model outperforms the state-of-the-art models and precisely detects hate speeches.
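The focal-loss component the paper integrates to counter class imbalance is standard and can be sketched as follows; `gamma` controls how strongly easy, well-classified examples are down-weighted.

```python
# A minimal focal loss, the component added to counter the imbalance between
# hate and non-hate examples.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of true class
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()

logits = torch.randn(8, 2)
targets = torch.randint(0, 2, (8,))
print(focal_loss(logits, targets))
```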
Learning Large Margin Sparse Embeddings for Open Set Medical Diagnosis
methods: The paper proposes two mechanisms to tackle OSR: Margin Loss with Adaptive Scale (MLAS), which introduces an angular margin to reinforce intra-class compactness and inter-class separability, together with an adaptive scaling factor to strengthen generalization; and Open-Space Suppression (OSS), which opens the classifier by recognizing sparse regions of the embedding space as unknowns.
results: Compared with state-of-the-art methods, MLAS achieves superior performance, measured by ACC, AUROC, and OSCR.
Abstract
Fueled by deep learning, computer-aided diagnosis achieves huge advances. However, out of controlled lab environments, algorithms could face multiple challenges. Open set recognition (OSR), as an important one, states that categories unseen in training could appear in testing. In medical fields, it could derive from incompletely collected training datasets and the constantly emerging new or rare diseases. OSR requires an algorithm to not only correctly classify known classes, but also recognize unknown classes and forward them to experts for further diagnosis. To tackle OSR, we assume that known classes could densely occupy small parts of the embedding space and the remaining sparse regions could be recognized as unknowns. Following it, we propose Open Margin Cosine Loss (OMCL) unifying two mechanisms. The former, called Margin Loss with Adaptive Scale (MLAS), introduces angular margin for reinforcing intra-class compactness and inter-class separability, together with an adaptive scaling factor to strengthen the generalization capacity. The latter, called Open-Space Suppression (OSS), opens the classifier by recognizing sparse embedding space as unknowns using proposed feature space descriptors. Besides, since medical OSR is still a nascent field, two publicly available benchmark datasets are proposed for comparison. Extensive ablation studies and feature visualization demonstrate the effectiveness of each design. Compared with state-of-the-art methods, MLAS achieves superior performances, measured by ACC, AUROC, and OSCR.
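An additive angular-margin loss in the spirit of MLAS can be sketched as below: a margin `m` tightens intra-class compactness and a scale `s` sharpens the softmax. The fixed scale here is an assumption; the paper's adaptive scaling rule and the OSS mechanism are not reproduced.

```python
# A simplified additive angular-margin loss in the spirit of MLAS; the fixed
# scale s stands in for the paper's adaptive scaling factor.
import torch
import torch.nn.functional as F

def margin_logits(features, class_weights, labels, m=0.35, s=16.0):
    cos = F.normalize(features, dim=1) @ F.normalize(class_weights, dim=1).t()
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target_mask = F.one_hot(labels, class_weights.size(0)).bool()
    # Add the margin only to the true-class angle.
    cos_m = torch.where(target_mask, torch.cos(theta + m), cos)
    return s * cos_m

feats = torch.randn(8, 64)
weights = torch.randn(5, 64)   # one prototype per known class
labels = torch.randint(0, 5, (8,))
loss = F.cross_entropy(margin_logits(feats, weights, labels), labels)
```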
Pathway toward prior knowledge-integrated machine learning in engineering
for: This study aims to integrate multidisciplinary domain professions into machine acknowledgeable, data-driven processes.
methods: The study examines information uncertainty sources in knowledge representation and explores knowledge decomposition with a three-tier knowledge-integrated machine learning paradigm.
results: The approach balances holist and reductionist perspectives in the engineering domain.
Abstract
Despite the digitalization trend and data volume surge, first-principles models (also known as logic-driven, physics-based, rule-based, or knowledge-based models) and data-driven approaches have existed in parallel, mirroring the ongoing AI debate on symbolism versus connectionism. Research for process development to integrate both sides to transfer and utilize domain knowledge in the data-driven process is rare. This study emphasizes efforts and prevailing trends to integrate multidisciplinary domain professions into machine acknowledgeable, data-driven processes in a two-fold organization: examining information uncertainty sources in knowledge representation and exploring knowledge decomposition with a three-tier knowledge-integrated machine learning paradigm. This approach balances holist and reductionist perspectives in the engineering domain.
Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving Perception
results: The proposed model achieves state-of-the-art performance on object detection and segmentation tasks (mAP@0.5 = 0.622) with low computational and memory requirements.
Abstract
In this work, we present an efficient and quantization-aware panoptic driving perception model (Q-YOLOP) for object detection, drivable area segmentation, and lane line segmentation, in the context of autonomous driving. Our model employs the Efficient Layer Aggregation Network (ELAN) as its backbone and task-specific heads for each task. We employ a four-stage training process that includes pretraining on the BDD100K dataset, finetuning on both the BDD100K and iVS datasets, and quantization-aware training (QAT) on BDD100K. During the training process, we use powerful data augmentation techniques, such as random perspective and mosaic, and train the model on a combination of the BDD100K and iVS datasets. Both strategies enhance the model's generalization capabilities. The proposed model achieves state-of-the-art performance with an mAP@0.5 of 0.622 for object detection and an mIoU of 0.612 for segmentation, while maintaining low computational and memory requirements.
QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
results: Evaluation on ImageNet shows that the proposed algorithm outperforms existing fixed- and mixed-precision methods under strict average bitwidth constraints.
Abstract
Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths, achieve better task performance for the same resource constraint compared to networks with homogeneous bitwidths. However, finding the optimal bitwidth allocation is a challenging problem as the search space grows exponentially with the number of layers in the network. In this paper, we propose QBitOpt, a novel algorithm for updating bitwidths during quantization-aware training (QAT). We formulate the bitwidth allocation problem as a constraint optimization problem. By combining fast-to-compute sensitivities with efficient solvers during QAT, QBitOpt can produce mixed-precision networks with high task performance guaranteed to satisfy strict resource constraints. This contrasts with existing mixed-precision methods that learn bitwidths using gradients and cannot provide such guarantees. We evaluate QBitOpt on ImageNet and confirm that we outperform existing fixed and mixed-precision methods under average bitwidth constraints commonly found in the literature.
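To make the bitwidth-allocation problem concrete, here is a greedy sketch: start every layer at the highest precision and repeatedly lower the bitwidth of the layer whose reduction increases a sensitivity-weighted error proxy the least, until the average-bitwidth budget holds. QBitOpt solves this with proper constrained-optimization solvers during QAT; the greedy rule and the error proxy below are assumptions.

```python
# A minimal greedy sketch of bitwidth allocation under an average-bitwidth
# budget; the error proxy (sensitivity * 4^-bits) is an assumption.
def allocate_bitwidths(sensitivities, avg_budget=4, bit_choices=(2, 4, 8)):
    bits = [max(bit_choices)] * len(sensitivities)
    cost = lambda i, b: sensitivities[i] * 4.0 ** (-b)  # quantization-error proxy
    while sum(bits) / len(bits) > avg_budget:
        # Cheapest single-layer reduction in terms of expected error increase.
        candidates = [(cost(i, b_next) - cost(i, b), i, b_next)
                      for i, b in enumerate(bits)
                      for b_next in bit_choices if b_next < b]
        _, i, b_next = min(candidates)
        bits[i] = b_next
    return bits

print(allocate_bitwidths([0.9, 0.1, 0.5, 0.05], avg_budget=4))
```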
Preventing Errors in Person Detection: A Part-Based Self-Monitoring Framework
results: The effectiveness of our framework is demonstrated through extensive experiments on the publicly available datasets DensePose and Pascal VOC. Code is available at https://github.com/FraunhoferIKS/smf-object-detection.
Abstract
The ability to detect learned objects regardless of their appearance is crucial for autonomous systems in real-world applications. Especially for detecting humans, which is often a fundamental task in safety-critical applications, it is vital to prevent errors. To address this challenge, we propose a self-monitoring framework that allows for the perception system to perform plausibility checks at runtime. We show that by incorporating an additional component for detecting human body parts, we are able to significantly reduce the number of missed human detections by factors of up to 9 when compared to a baseline setup, which was trained only on holistic person objects. Additionally, we found that training a model jointly on humans and their body parts leads to a substantial reduction in false positive detections by up to 50% compared to training on humans alone. We performed comprehensive experiments on the publicly available datasets DensePose and Pascal VOC in order to demonstrate the effectiveness of our framework. Code is available at https://github.com/FraunhoferIKS/smf-object-detection.
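The plausibility check can be sketched with plain box geometry: a cluster of detected body parts not covered by any person detection is flagged as a probable missed person. The coverage threshold and minimum part count below are assumptions, not the paper's exact rule.

```python
# A minimal sketch of a part-based plausibility check: orphan body parts
# (not covered by any person box) trigger a missed-detection warning.
def contained(part, person):
    """Fraction of the part box covered by the person box (x1, y1, x2, y2)."""
    ix1, iy1 = max(part[0], person[0]), max(part[1], person[1])
    ix2, iy2 = min(part[2], person[2]), min(part[3], person[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = (part[2] - part[0]) * (part[3] - part[1])
    return inter / (area + 1e-9)

def flag_missed_persons(person_boxes, part_boxes, min_parts=3, thr=0.5):
    orphan = [p for p in part_boxes
              if all(contained(p, q) < thr for q in person_boxes)]
    # Enough orphan parts => raise a warning for downstream safety logic.
    return len(orphan) >= min_parts

persons = [(10, 10, 50, 120)]
parts = [(12, 12, 20, 25), (200, 40, 215, 60),
         (205, 80, 220, 100), (198, 110, 214, 130)]
print(flag_missed_persons(persons, parts))  # True: three orphan parts
```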
SAGC-A68: a space access graph dataset for the classification of spaces and space elements in apartment buildings
For: The paper aims to provide a dataset and a method for automated classification of spaces and space elements in digital 3D models of apartment buildings, using Graph Deep Learning (GDL) techniques.
Methods: The paper introduces a new dataset, SAGC-A68, which comprises access graphs automatically generated from 68 digital 3D models of space layouts of apartment buildings. The dataset is well-suited for developing GDL models for space function and space element classification. The paper also employs a graph attention network (GAT) to predict 22 space function and 6 space element classes using the dataset.
Results: The paper demonstrates the potential of the dataset and the GAT method by achieving high accuracy in predicting space functions and space elements in digital 3D models of apartment buildings. The dataset and code used in the experiment are available online.
Abstract
The analysis of building models for usable area, building safety, and energy use requires accurate classification data of spaces and space elements. To reduce input model preparation effort and errors, automated classification of spaces and space elements is desirable. A barrier hindering the utilization of Graph Deep Learning (GDL) methods to space function and space element classification is a lack of suitable datasets. To bridge this gap, we introduce a dataset, SAGC-A68, which comprises access graphs automatically generated from 68 digital 3D models of space layouts of apartment buildings. This graph-based dataset is well-suited for developing GDL models for space function and space element classification. To demonstrate the potential of the dataset, we employ it to train and evaluate a graph attention network (GAT) that predicts 22 space function and 6 space element classes. The dataset and code used in the experiment are available online. https://doi.org/10.5281/zenodo.7805872, https://github.com/A2Amir/SAGC-A68.
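A minimal graph attention network for this kind of node classification might look as follows, sketched with PyTorch Geometric's `GATConv`; the 22-class output follows the paper's space-function task, while the layer sizes, input features, and toy graph are illustrative.

```python
# A minimal GAT for classifying spaces (nodes) in an access graph, sketched
# with PyTorch Geometric; sizes and features are illustrative.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class SpaceGAT(torch.nn.Module):
    def __init__(self, in_dim, hidden=64, n_classes=22, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden, heads=heads)
        self.gat2 = GATConv(hidden * heads, n_classes, heads=1)

    def forward(self, x, edge_index):
        x = F.elu(self.gat1(x, edge_index))
        return self.gat2(x, edge_index)

x = torch.randn(5, 16)                                   # 5 spaces, 16 features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])  # access edges
logits = SpaceGAT(16)(x, edge_index)
```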
Improving Heterogeneous Graph Learning with Weighted Mixed-Curvature Product Manifold
paper_authors: Tuc Nguyen-Van, Dung D. Le, The-Anh Ta
for: This paper aims to learn embeddings of graphs with varying structure, so as to improve representation quality on a variety of downstream tasks.
methods: The paper uses product manifolds in which each component space contributes differently to expressing structures in the input graph, and therefore determines the weight of each component automatically from the data.
results: Experiments show that the weighted product manifold method learns better graph representations from the input data and performs better on multiple downstream tasks, such as word similarity learning, top-$k$ recommendation, and knowledge graph embedding.
Abstract
In graph representation learning, it is important that the complex geometric structure of the input graph, e.g. hidden relations among nodes, is well captured in embedding space. However, standard Euclidean embedding spaces have a limited capacity in representing graphs of varying structures. A promising candidate for the faithful embedding of data with varying structure is product manifolds of component spaces of different geometries (spherical, hyperbolic, or euclidean). In this paper, we take a closer look at the structure of product manifold embedding spaces and argue that each component space in a product contributes differently to expressing structures in the input graph, hence should be weighted accordingly. This is different from previous works which consider the roles of different components equally. We then propose WEIGHTED-PM, a data-driven method for learning embedding of heterogeneous graphs in weighted product manifolds. Our method utilizes the topological information of the input graph to automatically determine the weight of each component in product spaces. Extensive experiments on synthetic and real-world graph datasets demonstrate that WEIGHTED-PM is capable of learning better graph representations with lower geometric distortion from input data, and performs better on multiple downstream tasks, such as word similarity learning, top-$k$ recommendation, and knowledge graph embedding.
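The weighted product-manifold distance can be illustrated with a Euclidean and a Poincaré-ball component, whose per-component distances are combined under weights. WEIGHTED-PM learns these weights from the graph topology; here they are fixed constants, and the combination rule is an assumption.

```python
# A minimal sketch of a weighted product-manifold distance combining a
# Euclidean and a Poincare-ball component; the fixed weights stand in for
# the weights WEIGHTED-PM would learn.
import torch

def poincare_dist(u, v, eps=1e-7):
    sq = ((u - v) ** 2).sum(-1)
    den = (1 - (u ** 2).sum(-1)) * (1 - (v ** 2).sum(-1))
    return torch.acosh(1 + 2 * sq / den.clamp_min(eps))

def product_dist(xu, xv, hu, hv, w_euc=0.3, w_hyp=0.7):
    """Weighted distance: Euclidean part (xu, xv) + hyperbolic part (hu, hv)."""
    d_euc = ((xu - xv) ** 2).sum(-1).sqrt()
    return torch.sqrt(w_euc * d_euc ** 2 + w_hyp * poincare_dist(hu, hv) ** 2)

xu, xv = torch.randn(2, 4), torch.randn(2, 4)
hu, hv = 0.1 * torch.randn(2, 4), 0.1 * torch.randn(2, 4)  # inside the unit ball
print(product_dist(xu, xv, hu, hv))
```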
Improving Factuality of Abstractive Summarization via Contrastive Reward Learning
results: Empirical studies show that the proposed framework enables summarization models to learn from the feedback of factuality metrics, producing more factual and reliable summaries.
Abstract
Modern abstractive summarization models often generate summaries that contain hallucinated or contradictory information. In this paper, we propose a simple but effective contrastive learning framework that incorporates recent developments in reward learning and factuality metrics. Empirical studies demonstrate that the proposed framework enables summarization models to learn from feedback of factuality metrics using contrastive reward learning, leading to more factual summaries by human evaluations. This suggests that further advances in learning and evaluation algorithms can feed directly into providing more factual summaries.
Deductive Controller Synthesis for Probabilistic Hyperproperties
results: Experimental results show that, compared with HyperProb, the proposed approach solves the controller synthesis problem faster and more efficiently, and it is the first approach able to combine probabilistic hyperproperties with both intra-controller constraints (e.g. partial observability) and inter-controller constraints (e.g. agreements on a common action).
Abstract
Probabilistic hyperproperties specify quantitative relations between the probabilities of reaching different target sets of states from different initial sets of states. This class of behavioral properties is suitable for capturing important security, privacy, and system-level requirements. We propose a new approach to solve the controller synthesis problem for Markov decision processes (MDPs) and probabilistic hyperproperties. Our specification language builds on top of the logic HyperPCTL and enhances it with structural constraints over the synthesized controllers. Our approach starts from a family of controllers represented symbolically and defined over the same copy of an MDP. We then introduce an abstraction refinement strategy that can relate multiple computation trees and that we employ to prune the search space deductively. The experimental evaluation demonstrates that the proposed approach considerably outperforms HyperProb, a state-of-the-art SMT-based model checking tool for HyperPCTL. Moreover, our approach is the first one that is able to effectively combine probabilistic hyperproperties with additional intra-controller constraints (e.g. partial observability) as well as inter-controller constraints (e.g. agreements on a common action).
Model-Driven Engineering Method to Support the Formalization of Machine Learning using SysML
paper_authors: Simon Raedler, Juergen Mangler, Stefanie Rinderle-Ma
for: This paper aims to support the collaborative definition of machine learning tasks by leveraging model-based engineering in the formalization of the systems modeling language SysML.
methods: The method introduced in this paper uses SysML to formalize knowledge from various domains, identify and integrate data sources, define semantic connections between data attributes, and define data processing steps within the machine learning support.
results: The method is evaluated through two use cases and a user study, demonstrating the potential of integrating machine-learning-specific properties into systems engineering techniques to support non-data scientists in defining specific aspects of a machine learning problem and documenting knowledge on the data. The results show that the method can consolidate knowledge from various domains and support the integration of machine learning in industry by involving several stakeholders.
Abstract
Methods: This work introduces a method supporting the collaborative definition of machine learning tasks by leveraging model-based engineering in the formalization of the systems modeling language SysML. The method supports the identification and integration of various data sources, the required definition of semantic connections between data attributes, and the definition of data processing steps within the machine learning support. Results: By consolidating the knowledge of domain and machine learning experts, a powerful tool to describe machine learning tasks by formalizing knowledge using the systems modeling language SysML is introduced. The method is evaluated based on two use cases, i.e., a smart weather system that allows to predict weather forecasts based on sensor data, and a waste prevention case for 3D printer filament that cancels the printing if the intended result cannot be achieved (image processing). Further, a user study is conducted to gather insights of potential users regarding perceived workload and usability of the elaborated method. Conclusion: Integrating machine learning-specific properties in systems engineering techniques allows non-data scientists to understand formalized knowledge and define specific aspects of a machine learning problem, document knowledge on the data, and to further support data scientists to use the formalized knowledge as input for an implementation using (semi-) automatic code generation. In this respect, this work contributes by consolidating knowledge from various domains and therefore, fosters the integration of machine learning in industry by involving several stakeholders.
Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations
for: This paper aims to explore the potential of large language models (LLMs) in understanding behavior graphs and enhancing job recommendations in online recruitment, including the promotion of out-of-distribution (OOD) applications.
methods: The proposed framework leverages the rich contextual information and semantic representations provided by LLMs to analyze behavior graphs and uncover underlying patterns and relationships. Specifically, it uses a meta-path prompt constructor to understand behavior graphs and a path augmentation module to alleviate prompt bias.
results: The approach is evaluated on a comprehensive dataset and demonstrates improved relevance and quality of recommended jobs compared to traditional path-based sequence input methods. The findings contribute to the growing field of natural language processing and offer practical implications for enhancing job search experiences.
Abstract
Large Language Models (LLMs) have revolutionized natural language processing tasks, demonstrating their exceptional capabilities in various domains. However, their potential for behavior graph understanding in job recommendations remains largely unexplored. This paper focuses on unveiling the capability of large language models in understanding behavior graphs and leveraging this understanding to enhance recommendations in online recruitment, including the promotion of out-of-distribution (OOD) application. We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs and uncover underlying patterns and relationships. Specifically, we propose a meta-path prompt constructor that leverages LLM recommender to understand behavior graphs for the first time and design a corresponding path augmentation module to alleviate the prompt bias introduced by path-based sequence input. By leveraging this capability, our framework enables personalized and accurate job recommendations for individual users. We evaluate the effectiveness of our approach on a comprehensive dataset and demonstrate its ability to improve the relevance and quality of recommendations. This research not only sheds light on the untapped potential of large language models but also provides valuable insights for developing advanced recommendation systems in the recruitment market. The findings contribute to the growing field of natural language processing and offer practical implications for enhancing job search experiences.
Proceeding of the 1st Workshop on Social Robots Personalisation At the crossroads between engineering and humanities (CONCATENATE)
paper_authors: Imene Tarakli, Georgios Angelopoulos, Mehdi Hellou, Camille Vindolet, Boris Abramovic, Rocco Limongelli, Dimitri Lacroix, Andrea Bertolini, Silvia Rossi, Alessandro Di Nuovo, Angelo Cangelosi, Gordon Cheng
for: This paper aims to discuss and propose guidelines for personalization in robotics, addressing questions such as how to define it, how to achieve it, and how it should be guided to fit legal and ethical requirements.
methods: The paper uses an interdisciplinary approach, bringing together researchers from various fields to discuss and propose guidelines for personalization in robotics.
results: The paper aims to provide a comprehensive understanding of personalization in robotics, including its definition, achievement, and ethical considerations, to ensure the large-scale adoption of social robotics.
Abstract
Nowadays, robots are expected to interact more physically, cognitively, and socially with people. They should adapt to unpredictable contexts alongside individuals with various behaviours. For this reason, personalisation is a valuable attribute for social robots as it allows them to act according to a specific user's needs and preferences and achieve natural and transparent robot behaviours for humans. If correctly implemented, personalisation could also be the key to the large-scale adoption of social robotics. However, achieving personalisation is arduous as it requires us to expand the boundaries of robotics by taking advantage of the expertise of various domains. Indeed, personalised robots need to analyse and model user interactions while considering their involvement in the adaptive process. It also requires us to address ethical and socio-cultural aspects of personalised HRI to achieve inclusive and diverse interaction and avoid deception and misplaced trust when interacting with the users. At the same time, policymakers need to ensure regulations in view of possible short-term and long-term adaptive HRI. This workshop aims to raise an interdisciplinary discussion on personalisation in robotics. It aims at bringing researchers from different fields together to propose guidelines for personalisation while addressing the following questions: how to define it, how to achieve it, and how it should be guided to fit legal and ethical requirements.
paper_authors: Veronika Solopova, Adrian Gruszczynski, Eiad Rostom, Fritz Cremer, Sascha Witte, Chengming Zhang, Fernando Ramos López, Lea Plößl, Florian Hofmann, Ralf Romeike, Michaela Gläser-Zikuda, Christoph Benzmüller, Tim Landgraf
for: Improving students' learning outcomes and complementing the teaching activities of lecturers.
methods: An automated feedback tool grounded in didactic theory, implemented as a hybrid AI system.
results: Presents an open-source automated feedback tool that complements lecturers' feedback activities and improves students' learning outcomes.
Abstract
Written reflective practice is a regular exercise pre-service teachers perform during their higher education. Usually, their lecturers are expected to provide individual feedback, which can be a challenging task to perform on a regular basis. In this paper, we present the first open-source automated feedback tool based on didactic theory and implemented as a hybrid AI system. We describe the components and discuss the advantages and disadvantages of our system compared to state-of-the-art generative large language models. The main objective of our work is to enable better learning outcomes for students and to complement the teaching activities of lecturers.
Digital Modeling for Everyone: Exploring How Novices Approach Voice-Based 3D Modeling
methods: A high-fidelity Wizard of Oz study conducted with 22 participants to understand how novices' mental models translate into voice-based 3D modeling.
results: Novice users often issue vague, incomplete, or wrong voice commands, so voice assistants must handle such commands and provide appropriate help. In addition, the study found that users need a set of straightforward commands to shape simple and composite objects, and different strategies to select 3D objects.
Abstract
Manufacturing tools like 3D printers have become accessible to the wider society, making the promise of digital fabrication for everyone seemingly reachable. While the actual manufacturing process is largely automated today, users still require knowledge of complex design applications to produce ready-designed objects and adapt them to their needs or design new objects from scratch. To lower the barrier to the design and customization of personalized 3D models, we explored novice mental models in voice-based 3D modeling by conducting a high-fidelity Wizard of Oz study with 22 participants. We performed a thematic analysis of the collected data to understand how the mental model of novices translates into voice-based 3D modeling. We conclude with design implications for voice assistants. For example, they have to: deal with vague, incomplete and wrong commands; provide a set of straightforward commands to shape simple and composite objects; and offer different strategies to select 3D objects.
results: Using the ChatGPT AI agent and appealing to a common-sense approach, the paper sketches a minimal composite logic strategy capable of handling unstable behaviour.
Abstract
Assuming that the term 'metaverse' could be understood as a computer-based implementation of multiverse applications, we started to look in the present work for a logic that would be powerful enough to handle the situations arising both in the real and in the fictional underlying application domains. Realizing that first-order logic fails to account for the unstable behavior of even the most simpleminded information system domains, we resorted to non-conventional extensions, in an attempt to sketch a minimal composite logic strategy. The discussion was kept at a rather informal level, always trying to convey the intuition behind the theoretical notions in natural language terms, and appealing to an AI agent, namely ChatGPT, in the hope that algorithmic and common-sense approaches can be usefully combined.
results: Experimental results show that the cognitive diagnosis models found by the proposed search exhibit significantly better performance on two real-world datasets than existing models, while remaining as interpretable as manually designed ones.
Abstract
Cognitive diagnosis plays a vital role in modern intelligent education platforms to reveal students' proficiency in knowledge concepts for subsequent adaptive tasks. However, due to the requirement of high model interpretability, existing manually designed cognitive diagnosis models hold too simple architectures to meet the demand of current intelligent education systems, where the bias of human design also limits the emergence of effective cognitive diagnosis models. In this paper, we propose to automatically design novel cognitive diagnosis models by evolutionary multi-objective neural architecture search (NAS). Specifically, we observe existing models can be represented by a general model handling three given types of inputs and thus first design an expressive search space for the NAS task in cognitive diagnosis. Then, we propose multi-objective genetic programming (MOGP) to explore the NAS task's search space by maximizing model performance and interpretability. In the MOGP design, each architecture is transformed into a tree architecture and encoded by a tree for easy optimization, and a tailored genetic operation based on four sub-genetic operations is devised to generate offspring effectively. Besides, an initialization strategy is also suggested to accelerate the convergence by evolving half of the population from existing models' variants. Experiments on two real-world datasets demonstrate that the cognitive diagnosis models searched by the proposed approach exhibit significantly better performance than existing models and also hold as good interpretability as human-designed models.
FedDCT: A Dynamic Cross-Tier Federated Learning Scheme in Wireless Communication Networks
results: Simulation results show that the scheme makes the model converge faster and achieve higher accuracy in wireless communication networks.
Abstract
With the rapid proliferation of Internet of Things (IoT) devices and the growing concern for data privacy among the public, Federated Learning (FL) has gained significant attention as a privacy-preserving machine learning paradigm. FL enables the training of a global model among clients without exposing local data. However, when a federated learning system runs on wireless communication networks, limited wireless resources, heterogeneity of clients, and network transmission failures affect its performance and accuracy. In this study, we propose a novel dynamic cross-tier FL scheme, named FedDCT to increase training accuracy and performance in wireless communication networks. We utilize a tiering algorithm that dynamically divides clients into different tiers according to specific indicators and assigns specific timeout thresholds to each tier to reduce the training time required. To improve the accuracy of the model without increasing the training time, we introduce a cross-tier client selection algorithm that can effectively select the tiers and participants. Simulation experiments show that our scheme can make the model converge faster and achieve a higher accuracy in wireless communication networks.
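The tiering algorithm is not specified in the abstract; the sketch below shows one plausible reading, grouping clients by a measured indicator (recent round latency) and deriving a per-tier timeout from the slowest member. The quantile split and slack factor are assumptions.

```python
import numpy as np

def assign_tiers(client_latencies, num_tiers=3, slack=1.2):
    """Group clients into tiers by an observed indicator (here, recent round
    latency) and give each tier a timeout threshold. One plausible reading of
    FedDCT's tiering, not the paper's exact algorithm.

    client_latencies: dict of client_id -> average round latency in seconds.
    """
    ids = list(client_latencies)
    lats = np.array([client_latencies[c] for c in ids])
    # Quantile cut points split clients into num_tiers groups (assumption).
    edges = np.quantile(lats, np.linspace(0, 1, num_tiers + 1)[1:-1])
    tiers = {c: int(np.searchsorted(edges, l)) for c, l in zip(ids, lats)}
    timeouts = []
    for t in range(num_tiers):
        members = [client_latencies[c] for c in ids if tiers[c] == t]
        # Timeout: slack factor times the tier's slowest member (assumption).
        timeouts.append(slack * max(members) if members else float("inf"))
    return tiers, timeouts

tiers, timeouts = assign_tiers(
    {"c1": 0.8, "c2": 1.1, "c3": 3.9, "c4": 0.7, "c5": 2.5})
print(tiers)     # e.g. {'c1': 0, 'c2': 1, 'c3': 2, 'c4': 0, 'c5': 2}
print(timeouts)  # per-tier timeout thresholds in seconds
```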
Unmasking the giant: A comprehensive evaluation of ChatGPT’s proficiency in coding algorithms and data structures
paper_authors: Sayed Erfan Arefin, Tasnia Ashrafi Heya, Hasan Al-Qudah, Ynes Ineza, Abdul Serwadda
for: The paper evaluates the coding capabilities of ChatGPT, a large language model, in the Python programming language, specifically focusing on data structures and algorithms.
methods: The paper conducts a comprehensive evaluation of ChatGPT's coding capabilities based on a large catalog of coding challenges, and investigates the quality of ChatGPT's code, the nature of its run-time errors, and whether ChatGPT might have directly memorized some of the data used to train it.
results: The paper investigates the above questions in the context of both underlying learning models (GPT-3.5 and GPT-4), across a vast array of sub-topics within the main topics, and compares with human performance whenever feasible.
The transformative influence of Large Language Models (LLMs) is profoundly reshaping the Artificial Intelligence (AI) technology domain. Notably, ChatGPT distinguishes itself within these models, demonstrating remarkable performance in multi-turn conversations and exhibiting code proficiency across an array of languages. In this paper, we carry out a comprehensive evaluation of ChatGPT's coding capabilities based on what is to date the largest catalog of coding challenges. Our focus is on the Python programming language and problems centered on data structures and algorithms, two topics at the very foundations of Computer Science. We evaluate ChatGPT for its ability to generate correct solutions to the problems fed to it, its code quality, and the nature of run-time errors thrown by its code. Where ChatGPT code successfully executes but fails to solve the problem at hand, we look into patterns in the test cases passed in order to gain some insights into how wrong ChatGPT code is in these kinds of situations. To infer whether ChatGPT might have directly memorized some of the data that was used to train it, we methodically design an experiment to investigate this phenomenon. Making comparisons with human performance whenever feasible, we investigate all the above questions in the context of both its underlying learning models (GPT-3.5 and GPT-4), on a vast array of sub-topics within the main topics, and on problems having varying degrees of difficulty.
Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
results: Experiments show that Ethicist significantly improves targeted data extraction performance, and the paper investigates how factors such as the decoding strategy, model scale, prefix length, and suffix length affect extraction performance. The code is available on GitHub.
Abstract
Large pre-trained language models achieve impressive results across many tasks. However, recent works point out that pre-trained language models may memorize a considerable fraction of their training data, leading to the privacy risk of information leakage. In this paper, we propose a method named Ethicist for targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation, investigating how to recover the suffix in the training data when given a prefix. To elicit memorization in the attacked model, we tune soft prompt embeddings while keeping the model fixed. We further propose a smoothing loss that smooths the loss distribution of the suffix tokens to make it easier to sample the correct suffix. In order to select the most probable suffix from a collection of sampled suffixes and estimate the prediction confidence, we propose a calibrated confidence estimation method, which normalizes the confidence of the generated suffixes with a local estimation. We show that Ethicist significantly improves the extraction performance on a recently proposed public benchmark. We also investigate several factors influencing the data extraction performance, including decoding strategy, model scale, prefix length, and suffix length. Our code is available at https://github.com/thu-coai/Targeted-Data-Extraction.
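The core mechanism, tuning soft prompt embeddings while the attacked model stays frozen, is standard prompt tuning; a PyTorch skeleton is sketched below under the assumption of a HuggingFace-style causal LM (accepting `inputs_embeds` and returning `.logits`). The paper's loss smoothing and calibrated confidence estimation are only marked by a comment, since the abstract does not give their exact form.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Trainable prompt embeddings prepended to the frozen LM's input
    embeddings. A generic prompt-tuning skeleton, not Ethicist's exact code."""

    def __init__(self, n_tokens, embed_dim):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds):  # input_embeds: (batch, seq, dim)
        p = self.prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return torch.cat([p, input_embeds], dim=1)

def tune_step(lm, soft_prompt, optimizer, prefix_ids, suffix_ids):
    """One optimization step: the LM is frozen; only the soft prompt learns
    to elicit the training-data suffix that follows the given prefix."""
    ids = torch.cat([prefix_ids, suffix_ids], dim=1)
    with torch.no_grad():
        embeds = lm.get_input_embeddings()(ids)
    logits = lm(inputs_embeds=soft_prompt(embeds)).logits
    start = soft_prompt.prompt.size(0) + prefix_ids.size(1)
    # In a causal LM, positions start-1 .. start+len(suffix)-2 predict suffix tokens.
    pred = logits[:, start - 1 : start - 1 + suffix_ids.size(1)]
    nll = nn.functional.cross_entropy(pred.transpose(1, 2), suffix_ids)
    # Ethicist additionally smooths this per-token loss distribution and
    # calibrates confidence over sampled suffixes (forms not in the abstract).
    optimizer.zero_grad()
    nll.backward()
    optimizer.step()
    return nll.item()
```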
Recent Advancements in End-to-End Autonomous Driving using Deep Learning: A Survey
for: This paper provides a comprehensive review of the End-to-End autonomous driving stack, including the entire driving process from perception to control.
methods: The paper employs neural networks in an End-to-End manner, addressing key challenges encountered in real-world applications.
results: The paper discusses recent developments in End-to-End autonomous driving, including sensorial input, main and auxiliary outputs, learning approaches ranging from imitation to reinforcement learning, and model evaluation techniques.
Abstract
End-to-End driving is a promising paradigm as it circumvents the drawbacks associated with modular systems, such as their overwhelming complexity and propensity for error propagation. Autonomous driving transcends conventional traffic patterns by proactively recognizing critical events in advance, ensuring passengers' safety and providing them with comfortable transportation, particularly in highly stochastic and variable traffic settings. This paper presents a comprehensive review of the End-to-End autonomous driving stack. It provides a taxonomy of automated driving tasks wherein neural networks have been employed in an End-to-End manner, encompassing the entire driving process from perception to control, while addressing key challenges encountered in real-world applications. Recent developments in End-to-End autonomous driving are analyzed, and research is categorized based on underlying principles, methodologies, and core functionality. These categories encompass sensorial input, main and auxiliary output, learning approaches ranging from imitation to reinforcement learning, and model evaluation techniques. The survey incorporates a detailed discussion of the explainability and safety aspects. Furthermore, it assesses the state-of-the-art, identifies challenges, and explores future possibilities. We maintained the latest advancements and their corresponding open-source implementations at https://github.com/Pranav-chib/Recent-Advancements-in-End-to-End-Autonomous-Driving-using-Deep-Learning.
ECS – an Interactive Tool for Data Quality Assurance
paper_authors: Christian Sieberichs, Simon Geerkens, Alexander Braun, Thomas Waschulzik
for: Ensuring high-quality data for use in safety-critical systems
methods: Novel approach using mathematical basics and multiple examples to detect potentially harmful data points
results: Detection of data points with potentially harmful properties for use in safety-critical systems.
Abstract
With the increasing capabilities of machine learning systems and their potential use in safety-critical systems, ensuring high-quality data is becoming increasingly important. In this paper we present a novel approach for the assurance of data quality. For this purpose, the mathematical basics are first discussed and the approach is presented using multiple examples. This results in the detection of data points with potentially harmful properties for use in safety-critical systems.
RLTF: Reinforcement Learning from Unit Test Feedback
results: Achieves state-of-the-art performance on the APPS and MBPP benchmarks.
Abstract
The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning (RL) to improve the performance of large language models (LLMs) for code. However, these RL methods have only used offline frameworks, limiting their exploration of new sample spaces. Additionally, current approaches that utilize unit test signals are rather simple, not accounting for specific error locations within the code. To address these issues, we proposed RLTF, i.e., Reinforcement Learning from Unit Test Feedback, a novel online RL framework with unit test feedback of multi-granularity for refining code LLMs. Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code. Extensive experiments show that RLTF achieves state-of-the-art performance on the APPS and the MBPP benchmarks. Our code can be found at: https://github.com/Zyq-scut/RLTF.
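The abstract describes multi-granularity unit-test feedback without giving a formula; below is a hedged sketch of how test outcomes might be mapped to a scalar reward, combining a coarse pass rate with a finer penalty that depends on where a runtime error occurred. The constants and shaping are illustrative, not RLTF's published reward.

```python
def rltf_style_reward(test_results, error_line=None, n_lines=None):
    """Map unit-test feedback to a scalar reward. An illustrative scheme, not
    RLTF's published one.

    test_results: list of booleans, one per unit test (pass/fail).
    error_line, n_lines: where a runtime error occurred, if any, so code that
    executes further before failing receives less of the fine-grained penalty.
    """
    if not test_results:
        return -1.0
    coarse = sum(test_results) / len(test_results)   # fraction of tests passed
    fine = 0.0
    if error_line is not None and n_lines:
        # Fine-grained term: partial credit for progress before the error.
        fine = -0.5 * (1.0 - error_line / n_lines)
    return coarse + fine

# Example: 3 of 4 tests pass, with a runtime error at line 18 of 20.
print(rltf_style_reward([True, True, True, False], error_line=18, n_lines=20))
```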
Injecting Logical Constraints into Neural Networks via Straight-Through Estimators
results: By leveraging GPUs and batch training, the method scales significantly better than existing neuro-symbolic approaches. It also applies to different types of neural networks, such as MLPs, CNNs, and GNNs, enabling them to learn with no or fewer labeled data by learning directly from known constraints.
Abstract
Injecting discrete logical constraints into neural network learning is one of the main challenges in neuro-symbolic AI. We find that a straight-through-estimator, a method introduced to train binary neural networks, could effectively be applied to incorporate logical constraints into neural network learning. More specifically, we design a systematic way to represent discrete logical constraints as a loss function; minimizing this loss using gradient descent via a straight-through-estimator updates the neural network's weights in the direction that the binarized outputs satisfy the logical constraints. The experimental results show that by leveraging GPUs and batch training, this method scales significantly better than existing neuro-symbolic methods that require heavy symbolic computation for computing gradients. Also, we demonstrate that our method applies to different types of neural networks, such as MLP, CNN, and GNN, making them learn with no or fewer labeled data by learning directly from known constraints.
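The straight-through estimator itself is concrete enough to sketch: binarize on the forward pass, but let gradients pass through unchanged on the backward pass, and penalize binarized outputs that violate a logical rule. The at-most-one constraint encoded below is a made-up example, not one from the paper.

```python
import torch

def ste_binarize(x):
    """Forward: hard 0/1 threshold. Backward: identity (straight-through)."""
    hard = (x > 0.5).float()
    return x + (hard - x).detach()

def at_most_one_loss(probs):
    """Penalty for violating 'at most one of these outputs is true'.
    Illustrative encoding of a logical constraint as a loss: for every pair
    (i, j) with i != j, the product b_i * b_j should be zero."""
    b = ste_binarize(probs)                            # (batch, n) in {0, 1}
    pair = b.unsqueeze(-1) * b.unsqueeze(-2)           # (batch, n, n)
    off_diag = pair.sum(dim=(-1, -2)) - b.sum(-1)      # drop diagonal b_i * b_i
    return off_diag.mean()

logits = torch.randn(8, 4, requires_grad=True)
loss = at_most_one_loss(torch.sigmoid(logits))
loss.backward()
# Gradients reach the network despite the hard threshold in the forward pass:
print(loss.item(), logits.grad.abs().mean().item())
```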
Continual Learning as Computationally Constrained Reinforcement Learning
results: The monograph presents a conceptual framework and a set of tools that help researchers better understand and address the continual learning problem.
Abstract
An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills over a long lifetime could advance the frontier of artificial intelligence capabilities. The design of such agents, which remains a long-standing challenge of artificial intelligence, is addressed by the subject of continual learning. This monograph clarifies and formalizes concepts of continual learning, introducing a framework and set of tools to stimulate further research.
Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration
results: Outperforms the baseline methods and accurately extracts the strokes of Chinese characters.
Here's a more detailed explanation of each point:
for: The paper aims to improve the accuracy of stroke extraction for Chinese characters, which is an important step in character recognition and generation.
methods: The proposed method uses deep learning and prior information to extract strokes. It consists of three parts: image registration-based stroke registration, image semantic segmentation-based stroke segmentation, and high-precision extraction of single strokes. The method uses a structure deformable image registration network to achieve structure-deformable transformation while maintaining the stable morphology of single strokes.
results: The experimental results show that the proposed method outperforms the baselines, demonstrating its effectiveness in stroke extraction for Chinese characters.
Abstract
Stroke extraction of Chinese characters plays an important role in the field of character recognition and generation. Most existing character stroke extraction methods focus on image morphological features. These methods usually lead to errors in cross-stroke extraction and stroke matching because they rarely use stroke semantics and prior information. In this paper, we propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration. This method consists of three parts: image registration-based stroke registration that establishes the rough registration of the reference strokes and the target as prior information; image semantic segmentation-based stroke segmentation that preliminarily separates target strokes into seven categories; and high-precision extraction of single strokes. In the stroke registration, we propose a structure deformable image registration network to achieve structure-deformable transformation while maintaining the stable morphology of single strokes for character images with complex structures. In order to verify the effectiveness of the method, we construct two datasets respectively for calligraphy characters and regular handwriting characters. The experimental results show that our method strongly outperforms the baselines. Code is available at https://github.com/MengLi-l1/StrokeExtraction.
Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU
results: On two edge GPU platforms, Miriam increases system throughput by 92% while incurring less than 10% latency overhead for critical tasks, compared with state-of-the-art baselines.
Abstract
Many applications such as autonomous driving and augmented reality, require the concurrent running of multiple deep neural networks (DNN) that poses different levels of real-time performance requirements. However, coordinating multiple DNN tasks with varying levels of criticality on edge GPUs remains an area of limited study. Unlike server-level GPUs, edge GPUs are resource-limited and lack hardware-level resource management mechanisms for avoiding resource contention. Therefore, we propose Miriam, a contention-aware task coordination framework for multi-DNN inference on edge GPU. Miriam consolidates two main components, an elastic-kernel generator, and a runtime dynamic kernel coordinator, to support mixed critical DNN inference. To evaluate Miriam, we build a new DNN inference benchmark based on CUDA with diverse representative DNN workloads. Experiments on two edge GPU platforms show that Miriam can increase system throughput by 92% while only incurring less than 10% latency overhead for critical tasks, compared to state-of-the-art baselines.
Source-Aware Embedding Training on Heterogeneous Information Networks
paper_authors: Tsai Hor Chan, Chi Ho Wong, Jiajun Shen, Guosheng Yin
for: Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding (SUMSHINE) is proposed to address the issue of distribution discrepancy among subgraphs in Heterogeneous Information Networks (HINs) from multiple sources.
methods: SUMSHINE uses a scalable unsupervised framework to align the embedding distributions among multiple sources of an HIN.
results: Experimental results on real-world datasets in a variety of downstream tasks validate the performance of SUMSHINE over state-of-the-art heterogeneous information network embedding algorithms.
Abstract
Heterogeneous information networks (HINs) have been extensively applied to real-world tasks, such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features in the network, little awareness was given to the distribution discrepancy of subgraphs within a single HIN. However, we find that ignoring such distribution discrepancy among subgraphs from multiple sources would hinder the effectiveness of graph embedding learning algorithms. This motivates us to propose SUMSHINE (Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding) -- a scalable unsupervised framework to align the embedding distributions among multiple sources of an HIN. Experimental results on real-world datasets in a variety of downstream tasks validate the performance of our method over the state-of-the-art heterogeneous information network embedding algorithms.
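The abstract says embedding distributions are aligned across sources but does not name the distance; maximum mean discrepancy (MMD) is a common choice for such alignment and is sketched below as an illustrative stand-in, not necessarily SUMSHINE's objective.

```python
import torch

def gaussian_mmd(x, y, sigma=1.0):
    """A (biased) MMD^2 estimate with an RBF kernel between two embedding
    batches x: (n, d) and y: (m, d). Used here as an illustrative alignment
    loss between sources."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def alignment_loss(source_embeddings):
    """Sum MMD over all pairs of sources (assumed usage, for illustration)."""
    srcs = list(source_embeddings.values())
    loss = torch.tensor(0.0)
    for i in range(len(srcs)):
        for j in range(i + 1, len(srcs)):
            loss = loss + gaussian_mmd(srcs[i], srcs[j])
    return loss

emb = {"source_a": torch.randn(64, 128), "source_b": torch.randn(64, 128)}
print(alignment_loss(emb).item())
```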
Enhancing Adversarial Robustness via Score-Based Optimization
results: On multiple datasets, including CIFAR10, CIFAR100, and ImageNet, the method outperforms existing adversarial defenses in terms of both robustness and inference speed.
Abstract
Adversarial attacks have the potential to mislead deep neural network classifiers by introducing slight perturbations. Developing algorithms that can mitigate the effects of these attacks is crucial for ensuring the safe use of artificial intelligence. Recent studies have suggested that score-based diffusion models are effective in adversarial defenses. However, existing diffusion-based defenses rely on the sequential simulation of the reversed stochastic differential equations of diffusion models, which are computationally inefficient and yield suboptimal results. In this paper, we introduce a novel adversarial defense scheme named ScoreOpt, which optimizes adversarial samples at test-time, towards original clean data in the direction guided by score-based priors. We conduct comprehensive experiments on multiple datasets, including CIFAR10, CIFAR100 and ImageNet. Our experimental results demonstrate that our approach outperforms existing adversarial defenses in terms of both robustness performance and inference speed.
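Score-guided test-time purification can be sketched generically: starting from the (possibly adversarial) input, take a few steps along the score (the gradient of the log-density under the diffusion prior) so the sample moves back toward the data manifold before classification. The update rule, step size, and noise level below are illustrative, not ScoreOpt's exact optimization.

```python
import torch

def score_purify(x_adv, score_model, sigma=0.1, step=0.05, n_steps=20):
    """Test-time purification sketch: ascend the score-based prior so the
    sample moves toward clean data before classification.

    score_model is assumed to return s(x, sigma) ~ grad_x log p_sigma(x),
    the usual interface of a noise-conditional score network.
    """
    x = x_adv.clone()
    for _ in range(n_steps):
        with torch.no_grad():
            s = score_model(x, sigma)
        x = (x + step * s).clamp(0.0, 1.0)  # keep pixels in valid range
    return x

# Assumed usage: logits = classifier(score_purify(x_adv, score_model))
```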
Learning to Generate Equitable Text in Dialogue from Biased Training Data
results: The paper validates its theory with empirical tests, accurately predicting the relative performance of several algorithms at generating equitable text.
Abstract
The ingrained principles of fairness in a dialogue system's decision-making process and generated responses are crucial for user engagement, satisfaction, and task achievement. Absence of equitable and inclusive principles can hinder the formation of common ground, which in turn negatively impacts the overall performance of the system. For example, misusing pronouns in a user interaction may cause ambiguity about the intended subject. Yet, there is no comprehensive study of equitable text generation in dialogue. Aptly, in this work, we use theories of computational learning to study this problem. We provide formal definitions of equity in text generation, and further, prove formal connections between learning human-likeness and learning equity: algorithms for improving equity ultimately reduce to algorithms for improving human-likeness (on augmented data). With this insight, we also formulate reasonable conditions under which text generation algorithms can learn to generate equitable text without any modifications to the biased training data on which they learn. To exemplify our theory in practice, we look at a group of algorithms for the GuessWhat?! visual dialogue game and, using this example, test our theory empirically. Our theory accurately predicts relative-performance of multiple algorithms in generating equitable text as measured by both human and automated evaluation.
A Demand-Driven Perspective on Generative Audio AI
results: The survey identifies dataset availability as the main bottleneck for audio quality and controllability, and suggests potential solutions to some of the revealed issues with empirical evidence.
Abstract
To achieve successful deployment of AI research, it is crucial to understand the demands of the industry. In this paper, we present the results of a survey conducted with professional audio engineers, in order to determine research priorities and define various research tasks. We also summarize the current challenges in audio quality and controllability based on the survey. Our analysis emphasizes that the availability of datasets is currently the main bottleneck for achieving high-quality audio generation. Finally, we suggest potential solutions for some revealed issues with empirical evidence.
Generalizing Graph ODE for Learning Complex System Dynamics across Environments
results: The model accurately predicts system dynamics, especially over long horizons, and generalizes well to new systems with only a few observations.
Abstract
Learning multi-agent system dynamics has been extensively studied for various real-world applications, such as molecular dynamics in biology. Most of the existing models are built to learn single system dynamics from observed historical data and predict the future trajectory. In practice, however, we might observe multiple systems that are generated across different environments, which differ in latent exogenous factors such as temperature and gravity. One simple solution is to learn multiple environment-specific models, but it fails to exploit the potential commonalities among the dynamics across environments and offers poor prediction results where per-environment data is sparse or limited. Here, we present GG-ODE (Generalized Graph Ordinary Differential Equations), a machine learning framework for learning continuous multi-agent system dynamics across environments. Our model learns system dynamics using neural ordinary differential equations (ODE) parameterized by Graph Neural Networks (GNNs) to capture the continuous interaction among agents. We achieve the model generalization by assuming the dynamics across different environments are governed by common physics laws that can be captured via learning a shared ODE function. The distinct latent exogenous factors learned for each environment are incorporated into the ODE function to account for their differences. To improve model performance, we additionally design two regularization losses to (1) enforce the orthogonality between the learned initial states and exogenous factors via mutual information minimization; and (2) reduce the temporal variance of learned exogenous factors within the same system via contrastive learning. Experiments over various physical simulations show that our model can accurately predict system dynamics, especially in the long range, and can generalize well to new systems with few observations.
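The abstract is specific enough for a skeleton: a single ODE function shared across environments, parameterized by a GNN and conditioned on a per-environment latent. The message-passing form and the fixed-step Euler integrator below are simplifying assumptions (the paper would presumably use a proper ODE solver).

```python
import torch
import torch.nn as nn

class SharedGraphODEFunc(nn.Module):
    """dz/dt = f_theta(z, neighbors, c_env): one message-passing layer shared
    across environments, conditioned on a learned environment latent c_env."""

    def __init__(self, state_dim, env_dim, hidden=64):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, state_dim))
        self.upd = nn.Sequential(nn.Linear(2 * state_dim + env_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, state_dim))

    def forward(self, z, adj, c_env):
        # z: (n_agents, d); adj: (n, n) binary adjacency; c_env: (env_dim,)
        n = z.size(0)
        pair = torch.cat([z.unsqueeze(1).expand(-1, n, -1),
                          z.unsqueeze(0).expand(n, -1, -1)], dim=-1)
        agg = (adj.unsqueeze(-1) * self.msg(pair)).sum(dim=1)  # sum over nbrs
        c = c_env.expand(n, -1)
        return self.upd(torch.cat([z, agg, c], dim=-1))

def euler_rollout(func, z0, adj, c_env, dt=0.1, steps=50):
    """Fixed-step Euler integration, a stand-in for an adaptive ODE solver."""
    traj, z = [z0], z0
    for _ in range(steps):
        z = z + dt * func(z, adj, c_env)
        traj.append(z)
    return torch.stack(traj)
```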
Cloud Render Farm Services Discovery Using NLP And Ontology Based Knowledge Graph
results: The ontology-based service discovery engine finds cloud render farm services matching project requirements significantly better than discovery without the ontology or via a general-purpose search engine.
Abstract
Cloud render farm services are animation-domain-specific Platform-as-a-Service (PaaS) cloud services that provide a complete platform to render animation files. However, identifying render farm services that are cost-effective and also match the functional requirements, which change with almost every project (the animation software, required plug-ins, etc.), is a challenge. This research work proposes an ontology-based service discovery engine named RenderSelect for cloud render farm services. The cloud render farm ontology semantically defines the relationships among cloud render farm services. Knowledge-based reasoning algorithms, namely concept similarity reasoning, equivalent reasoning, and numerical similarity reasoning, are applied to determine the similarity among cloud services. The service discovery engine was evaluated for finding services under three different scenarios, namely a) with the help of the ontology, b) without the help of the ontology, and c) using a common search engine on the internet. The results show that the proposed service discovery engine, which is specifically designed for cloud render farm services using the ontology, performs significantly better than the other two.
RidgeBase: A Cross-Sensor Multi-Finger Contactless Fingerprint Dataset
results: Experiments on the RidgeBase dataset show that the proposed protocol reduces the impact of finger-pose and background variation in contactless fingerprint recognition and improves recognition accuracy.
Abstract
Contactless fingerprint matching using smartphone cameras can alleviate major challenges of traditional fingerprint systems including hygienic acquisition, portability and presentation attacks. However, development of practical and robust contactless fingerprint matching techniques is constrained by the limited availability of large scale real-world datasets. To motivate further advances in contactless fingerprint matching across sensors, we introduce the RidgeBase benchmark dataset. RidgeBase consists of more than 15,000 contactless and contact-based fingerprint image pairs acquired from 88 individuals under different background and lighting conditions using two smartphone cameras and one flatbed contact sensor. Unlike existing datasets, RidgeBase is designed to promote research under different matching scenarios that include Single Finger Matching and Multi-Finger Matching for both contactless-to-contactless (CL2CL) and contact-to-contactless (C2CL) verification and identification. Furthermore, due to the high intra-sample variance in contactless fingerprints belonging to the same finger, we propose a set-based matching protocol inspired by the advances in facial recognition datasets. This protocol is specifically designed for pragmatic contactless fingerprint matching that can account for variances in focus, polarity and finger-angles. We report qualitative and quantitative baseline results for different protocols using a COTS fingerprint matcher (Verifinger) and a Deep CNN based approach on the RidgeBase dataset. The dataset can be downloaded here: https://www.buffalo.edu/cubs/research/datasets/ridgebase-benchmark-dataset.html
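The set-based matching protocol is stated but not formalized in the abstract; one natural instantiation, borrowed from set-based face recognition, scores a probe set against a gallery set by aggregating pairwise embedding similarities. The top-k mean aggregation below is an assumption.

```python
import numpy as np

def set_match_score(probe_feats, gallery_feats, top_k=3):
    """Set-to-set fingerprint matching sketch: cosine similarity between every
    probe/gallery pair, aggregated as the mean of the top-k pairs so that a
    few well-focused captures dominate blurred or off-angle ones.

    probe_feats: (n, d) and gallery_feats: (m, d) embedding matrices.
    """
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = (p @ g.T).ravel()
    k = min(top_k, sims.size)
    return float(np.sort(sims)[-k:].mean())

# Usage with hypothetical 256-d embeddings from any fingerprint encoder:
score = set_match_score(np.random.randn(4, 256), np.random.randn(5, 256))
print(score)
```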
The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence
paper_authors: Hector Zenil, Jesper Tegnér, Felipe S. Abrahão, Alexander Lavin, Vipin Kumar, Jeremy G. Frey, Adrian Weller, Larisa Soldatova, Alan R. Bundy, Nicholas R. Jennings, Koichi Takahashi, Lawrence Hunter, Saso Dzeroski, Andrew Briggs, Frederick D. Gregory, Carla P. Gomes, Christopher K. I. Williams, Jon Rowe, James Evans, Hiroaki Kitano, Joshua B. Tenenbaum, Ross King
for: The paper explores the potential of AI-driven automation in scientific discovery, particularly in fundamental deep science, and aims to mitigate current problems in the scientific process such as replication of findings and systematic production of data.
methods: The paper proposes an AI-driven, automated, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space.
results: The approach promises to unleash AI's potential for searching and discovering the fundamental structure of our world beyond what human scientists have been able to achieve, and could open doors for technological innovation to tackle some of the greatest challenges facing humanity today.
Abstract
Recent advances in machine learning and AI, including Generative AI and LLMs, are disrupting technological innovation, product development, and society as a whole. AI's contribution to technology can come from multiple approaches that require access to large training data sets and clear performance evaluation criteria, ranging from pattern recognition and classification to generative models. Yet, AI has contributed less to fundamental science in part because large data sets of high-quality data for scientific practice and model discovery are more difficult to access. Generative AI, in general, and Large Language Models in particular, may represent an opportunity to augment and accelerate the scientific discovery of fundamental deep science with quantitative models. Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space. Integrating AI-driven automation into the practice of science would mitigate current problems, including the replication of findings, systematic production of data, and ultimately democratisation of the scientific process. Realising these possibilities requires a vision for augmented AI coupled with a diversity of AI approaches able to deal with fundamental aspects of causality analysis and model discovery while enabling unbiased search across the space of putative explanations. These advances hold the promise to unleash AI's potential for searching and discovering the fundamental structure of our world beyond what human scientists have been able to achieve. Such a vision would push the boundaries of new fundamental science rather than automatize current workflows and instead open doors for technological innovation to tackle some of the greatest challenges facing humanity today.
ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey
paper_authors: Salman Mohamadi, Ghulam Mujtaba, Ngan Le, Gianfranco Doretto, Donald A. Adjeroh
for: The paper's primary goal is to provide a concise survey of current research on ChatGPT and its evolution.
methods: The paper studies ChatGPT from two perspectives: a glass-box view that aims to understand the inner workings and foundational components of the technology, and a black-box view that treats it as a complex system and examines its inputs, outputs, and effects.
results: The paper provides a comprehensive overview covering ChatGPT's components and foundational elements as well as its applications, impacts, and implications. It also lays out essential foundational literature on LLMs and GAI, assesses existing and missing research directions, and discusses ChatGPT's broad applications and significant concerns in fields such as education, research, healthcare, and finance.
Abstract
ChatGPT is a large language model (LLM) created by OpenAI that has been carefully trained on a large amount of data. It has revolutionized the field of natural language processing (NLP) and has pushed the boundaries of LLM capabilities. ChatGPT has played a pivotal role in enabling widespread public interaction with generative artificial intelligence (GAI) on a large scale. It has also sparked research interest in developing similar technologies and investigating their applications and implications. In this paper, our primary goal is to provide a concise survey on the current lines of research on ChatGPT and its evolution. We considered both the glass box and black box views of ChatGPT, encompassing the components and foundational elements of the technology, as well as its applications, impacts, and implications. The glass box approach focuses on understanding the inner workings of the technology, and the black box approach embraces it as a complex system, and thus examines its inputs, outputs, and effects. This paves the way for a comprehensive exploration of the technology and provides a road map for further research and experimentation. We also lay out essential foundational literature on LLMs and GAI in general and their connection with ChatGPT. This overview sheds light on existing and missing research lines in the emerging field of LLMs, benefiting both public users and developers. Furthermore, the paper delves into the broad spectrum of applications and significant concerns in fields such as education, research, healthcare, finance, etc.
A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing
results: The paper presents an end-to-end pipeline that first applies OCR to handwritten or printed text and then post-processes the output with NLP techniques to improve OCR accuracy.
Abstract
Optical Character Recognition (OCR) technology finds applications in digitizing books and unstructured documents, along with applications in other domains such as mobility statistics, law enforcement, traffic, security systems, etc. The state-of-the-art methods work well with the OCR with printed text on license plates, shop names, etc. However, applications such as printed textbooks and handwritten texts have limited accuracy with existing techniques. The reason may be attributed to similar-looking characters and variations in handwritten characters. Since these issues are challenging to address with OCR technologies exclusively, we propose a post-processing approach using Natural Language Processing (NLP) tools. This work presents an end-to-end pipeline that first performs OCR on the handwritten or printed text and then improves its accuracy using NLP.
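A minimal version of such a pipeline can be assembled from off-the-shelf parts: Tesseract for the OCR stage and a pretrained seq2seq model for the NLP correction stage. The sketch below assumes Tesseract is installed; the correction checkpoint named is an assumption, and any text-correction model with the same interface would do.

```python
# Minimal OCR + NLP post-processing pipeline sketch.
from PIL import Image
import pytesseract
from transformers import pipeline

# The checkpoint name is illustrative; substitute any seq2seq
# spelling/grammar-correction model.
corrector = pipeline(
    "text2text-generation",
    model="oliverguhr/spelling-correction-english-base",
)

def ocr_with_postprocessing(image_path: str) -> str:
    # Stage 1: OCR converts the scanned page to raw (possibly noisy) text.
    raw = pytesseract.image_to_string(Image.open(image_path))
    # Stage 2: NLP correction cleans up each non-empty line.
    fixed = []
    for line in raw.splitlines():
        if line.strip():
            fixed.append(corrector(line, max_length=256)[0]["generated_text"])
    return "\n".join(fixed)

print(ocr_with_postprocessing("scanned_page.png"))  # hypothetical input file
```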
TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement
results: Compared with other methods in the literature, the results show that the proposed technique achieves markedly higher accuracy in fruit-picking applications.
Abstract
As demand for robotics manipulation application increases, accurate vision-based 6D pose estimation becomes essential for autonomous operations. Convolutional Neural Networks (CNNs) based approaches for pose estimation have been previously introduced. However, the quest for better performance still persists especially for accurate robotics manipulation. This quest extends to the Agri-robotics domain. In this paper, we propose TransPose, an improved Transformer-based 6D pose estimation with a depth refinement module. The architecture takes in only an RGB image as input with no additional supplementing modalities such as depth or thermal images. The architecture encompasses an innovative lighter depth estimation network that estimates depth from an RGB image using feature pyramid with an up-sampling method. A transformer-based detection network with additional prediction heads is proposed to directly regress the object's centre and predict the 6D pose of the target. A novel depth refinement module is then used alongside the predicted centers, 6D poses and depth patches to refine the accuracy of the estimated 6D pose. We extensively compared our results with other state-of-the-art methods and analysed our results for fruit-picking applications. The results we achieved show that our proposed technique outperforms the other methods available in the literature.
Real-time Human Detection in Fire Scenarios using Infrared and Thermal Imaging Fusion
paper_authors: Truong-Dong Do, Nghe-Nhan Truong, My-Ha Le
for: Improving search-and-rescue efficiency by using a vision-based human detection system to increase survival chances in low-visibility scenarios.
methods: Images captured by multiple cameras are fused via thermal and infrared imaging to extract useful features for human detection.
results: Experiments show that the proposed method processes at a reasonable speed and achieves a mAP@0.5 of 95%.
Abstract
Fire is considered one of the most serious threats to human lives which results in a high probability of fatalities. Those severe consequences stem from the heavy smoke emitted from a fire that mostly restricts the visibility of escaping victims and rescuing squad. In such hazardous circumstances, the use of a vision-based human detection system is able to improve the ability to save more lives. To this end, a thermal and infrared imaging fusion strategy based on multiple cameras for human detection in low-visibility scenarios caused by smoke is proposed in this paper. By processing with multiple cameras, vital information can be gathered to generate more useful features for human detection. Firstly, the cameras are calibrated using a Light Heating Chessboard. Afterward, the features extracted from the input images are merged prior to being passed through a lightweight deep neural network to perform the human detection task. The experiments conducted on an NVIDIA Jetson Nano computer demonstrated that the proposed method can process with reasonable speed and can achieve favorable performance with a mAP@0.5 of 95%.
LakeBench: Benchmarks for Data Discovery over Data Lakes
results: None of the existing tabular foundational models had been trained on these data discovery tasks, and their performance shows significant room for improvement; the results suggest that establishing such benchmarks may help the community build tabular models usable for data discovery in data lakes.
Abstract
Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private datasets. In LakeBench, we develop multiple benchmarks for these tasks by using the tables that are drawn from a diverse set of data sources such as government data from CKAN, Socrata, and the European Central Bank. We compare the performance of 4 publicly available tabular foundational models on these tasks. None of the existing models had been trained on the data discovery tasks that we developed for this benchmark; not surprisingly, their performance shows significant room for improvement. The results suggest that the establishment of such benchmarks may be useful to the community to build tabular models usable for data discovery in data lakes.
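The three table relationships the benchmark targets can be illustrated with simple set statistics over column values: high containment suggests joinability or a subset relation, while high symmetric Jaccard suggests unionable columns. These heuristics are baseline intuition, not LakeBench's evaluation protocol.

```python
def column_overlap(a, b):
    """Set containment and Jaccard between two columns' value sets.
    High containment of a in b suggests a join key or subset relation;
    high symmetric Jaccard suggests unionable columns."""
    sa, sb = set(a), set(b)
    inter = len(sa & sb)
    return {
        "containment_a_in_b": inter / len(sa) if sa else 0.0,
        "jaccard": inter / len(sa | sb) if (sa or sb) else 0.0,
    }

# Hypothetical columns from two tables in a data lake:
orders_customer_id = [101, 102, 103, 104]
customers_id = [100, 101, 102, 103, 104, 105]
print(column_overlap(orders_customer_id, customers_id))
# containment 1.0 -> orders.customer_id likely joins to customers.id
```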
Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data
results: Achieves a high compression ratio while maintaining high reconstruction quality. Here's a more detailed explanation of each point:
for: The paper is focused on compressing large-scale scientific data, which is a growing challenge in many domains. The authors propose a neural network-based approach to address this issue.
methods: The proposed method uses an Autoencoder-based neural network to compress the data. The network is trained to reconstruct the original data from the compressed representation, allowing for efficient compression and reconstruction.
results: The authors test their method on several benchmark data sets and achieve a high compression ratio (140) without compromising the reconstruction quality. They also apply the method to a large-scale high-resolution climate modeling data set and achieve a compression ratio of 200 with negligible reconstruction error.
Abstract
Lossy compression has become an important technique to reduce data size in many domains. This type of compression is especially valuable for large-scale scientific data, whose size ranges up to several petabytes. Although Autoencoder-based models have been successfully leveraged to compress images and videos, such neural networks have not widely gained attention in the scientific data domain. Our work presents a neural network that not only significantly compresses large-scale scientific data but also maintains high reconstruction quality. The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set. Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality. Simulation data from the High-Resolution Community Earth System Model (CESM) Version 1.3 over 500 years are also being compressed with a compression ratio of 200 while the reconstruction error is negligible for scientific analysis.
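To make the compression mechanism concrete, below is a minimal autoencoder sketch in the spirit of the paper: the stored latent code is much smaller than the input, and the decoder reconstructs the field. The flattened-patch input, layer widths, and latent size are illustrative assumptions; the paper's hierarchical design is not reproduced here.

```python
# A minimal sketch of autoencoder-based lossy compression for gridded
# scientific data. The latent size controls the compression ratio.
import torch
import torch.nn as nn

class CompressionAE(nn.Module):
    def __init__(self, in_dim: int = 4096, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim))

    def forward(self, x):
        z = self.encoder(x)   # store z instead of x (ratio ~ in_dim / latent_dim)
        return self.decoder(z)

model = CompressionAE()
x = torch.rand(8, 4096)                      # e.g., flattened 64x64 field patches
loss = nn.functional.mse_loss(model(x), x)   # reconstruction objective
loss.backward()
```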
results: The authors compiled safety meta-labels for 30,207 question-answer pairs and 30,144 pairs of expert comparison data, and applied BeaverTails to content moderation and reinforcement learning with human feedback (RLHF), demonstrating its potential for practical safety measures in LLMs.
Abstract
In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs). This dataset uniquely separates annotations of helpfulness and harmlessness for question-answering pairs, thus offering distinct perspectives on these crucial attributes. In total, we have compiled safety meta-labels for 30,207 question-answer (QA) pairs and gathered 30,144 pairs of expert comparison data for both the helpfulness and harmlessness metrics. We further showcase applications of BeaverTails in content moderation and reinforcement learning with human feedback (RLHF), emphasizing its potential for practical safety measures in LLMs. We believe this dataset provides vital resources for the community, contributing towards the safe development and deployment of LLMs. Our project page is available at the following URL: https://sites.google.com/view/pku-beavertails.
Measuring Lexical Diversity in Texts: The Twofold Length Problem
results: Analysis of three datasets of English language-learners' texts shows that these indices solve the length-dependency problem, but none addresses the second problem: sensitivity to the length parameter.
Abstract
The impact of text length on the estimation of lexical diversity has captured the attention of the scientific community for more than a century. Numerous indices have been proposed, and many studies have been conducted to evaluate them, but the problem remains. This methodological review provides a critical analysis not only of the most commonly used indices in language learning studies, but also of the length problem itself, as well as of the methodology for evaluating the proposed solutions. The analysis of three datasets of English language-learners' texts revealed that indices that reduce all texts to the same length using a probabilistic or an algorithmic approach solve the length dependency problem; however, all these indices failed to address the second problem, which is their sensitivity to the parameter that determines the length to which the texts are reduced. The paper concludes with recommendations for optimizing lexical diversity analysis.
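As one concrete instance of an index that reduces all texts to a common length scale, here is a small sketch of the moving-average type-token ratio (MATTR); its window size w is exactly the kind of reduction-length parameter whose sensitivity the paper identifies as the unsolved second problem. This is a generic formulation, not code from the study.

```python
# Moving-average type-token ratio: average the TTR over all fixed-size windows.
def mattr(tokens: list[str], w: int = 50) -> float:
    if len(tokens) < w:
        return len(set(tokens)) / len(tokens)  # fall back to plain TTR
    ratios = [len(set(tokens[i:i + w])) / w for i in range(len(tokens) - w + 1)]
    return sum(ratios) / len(ratios)

text = "the cat sat on the mat and the dog sat on the rug".split()
print(mattr(text, w=5))  # varying w shifts the score for the same text
```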
On the Computational Modeling of Meaning: Embodied Cognition Intertwined with Emotion
results: The paper sets out requirements a language-learning agent must satisfy and offers suggestions for future language models.
Abstract
This document chronicles this author's attempt to explore how words come to mean what they do, with a particular focus on child language acquisition and what that means for models of language understanding. (I say "historical" because I synthesize the ideas based on when I discovered them and how those ideas influenced my later thinking.) I explain the setting for child language learning, how embodiment -- being able to perceive and enact in the world, including knowledge of concrete and abstract concepts -- is crucial, and how emotion and cognition relate to each other and the language learning process. I end with what I think are some of the requirements for a language-learning agent that learns language in a setting similar to that of children. This paper can act as a potential guide for ongoing and future work in modeling language.
Detecting LLM-Generated Text in Computing Education: A Comparative Study for ChatGPT Cases
paper_authors: Michael Sheinman Orenstrakh, Oscar Karnalim, Carlos Anibal Suarez, Michael Liut
for: This paper aims to evaluate the effectiveness of eight publicly-available LLM-generated text detectors in detecting LLM-generated text in computer science submissions.
methods: The authors collected 124 submissions from computer science students and generated 40 ChatGPT submissions to evaluate the eight LLM-generated text detectors using accuracy, false positives, and resilience measures.
results: The results show that CopyLeaks is the most accurate LLM-generated text detector, GPTKit is the best LLM-generated text detector to reduce false positives, and GLTR is the most resilient LLM-generated text detector. However, the authors also note that all LLM-generated text detectors are less accurate with code, other languages, and after the use of paraphrasing tools.
Abstract
Due to the recent improvements and wide availability of Large Language Models (LLMs), they have posed a serious threat to academic integrity in education. Modern LLM-generated text detectors attempt to combat the problem by offering educators with services to assess whether some text is LLM-generated. In this work, we have collected 124 submissions from computer science students before the creation of ChatGPT. We then generated 40 ChatGPT submissions. We used this data to evaluate eight publicly-available LLM-generated text detectors through the measures of accuracy, false positives, and resilience. The purpose of this work is to inform the community of what LLM-generated text detectors work and which do not, but also to provide insights for educators to better maintain academic integrity in their courses. Our results find that CopyLeaks is the most accurate LLM-generated text detector, GPTKit is the best LLM-generated text detector to reduce false positives, and GLTR is the most resilient LLM-generated text detector. We also express concerns over 52 false positives (of 114 human written submissions) generated by GPTZero. Finally, we note that all LLM-generated text detectors are less accurate with code, other languages (aside from English), and after the use of paraphrasing tools (like QuillBot). Modern detectors are still in need of improvements so that they can offer a foolproof solution to help maintain academic integrity. Further, their usability can be improved by facilitating a smooth API integration, providing clear documentation of their features and the understandability of their model(s), and supporting more commonly used languages.
Enhancing Biomedical Text Summarization and Question-Answering: On the Utility of Domain-Specific Pre-Training
results: The results indicate that a large language model without domain-specific pre-training can have a significant edge in some domain-specific biomedical text generation tasks.
Abstract
Biomedical summarization requires large datasets to train for text generation. We show that while transfer learning offers a viable option for addressing this challenge, an in-domain pre-training does not always offer advantages in a BioASQ summarization task. We identify a suitable model architecture and use it to show a benefit of a general-domain pre-training followed by a task-specific fine-tuning in the context of a BioASQ summarization task, leading to a novel three-step fine-tuning approach that works with only a thousand in-domain examples. Our results indicate that a Large Language Model without domain-specific pre-training can have a significant edge in some domain-specific biomedical text generation tasks.
TIM: Teaching Large Language Models to Translate with Comparison
paper_authors: Jiali Zeng, Fandong Meng, Yongjing Yin, Jie Zhou
for: To improve the performance of large language models (LLMs) on translation tasks.
methods: Teaches LLMs to translate using examples in comparison.
results: Learning translation from comparison examples outperforms existing methods and improves LLM performance on translation tasks.
Abstract
Open-sourced large language models (LLMs) have demonstrated remarkable efficacy in various tasks with instruction tuning. However, these models can sometimes struggle with tasks that require more specialized knowledge such as translation. One possible reason for such deficiency is that instruction tuning aims to generate fluent and coherent text that continues from a given instruction without being constrained by any task-specific requirements. Moreover, it can be more challenging for tuning smaller LLMs with lower-quality training data. To address this issue, we propose a novel framework using examples in comparison to teach LLMs to learn translation. Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model's learning. We evaluate our method on WMT2022 test sets and show that it outperforms existing methods. Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations. Please refer to Github for more details: https://github.com/lemon0830/TIM.
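The abstract describes presenting correct and incorrect translations and guiding the model with a preference loss. A hedged sketch of one such loss, a margin between sequence log-likelihoods, is given below; TIM's actual objective may differ, so treat this as an illustration of the idea rather than the paper's implementation.

```python
# Margin-based preference loss over correct vs. incorrect translations.
import torch
import torch.nn.functional as F

def sequence_logprob(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq, vocab); labels: (batch, seq)
    logp = F.log_softmax(logits, dim=-1)
    return logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1).sum(dim=1)

def preference_loss(good_logits, good_labels, bad_logits, bad_labels, margin=1.0):
    lp_good = sequence_logprob(good_logits, good_labels)
    lp_bad = sequence_logprob(bad_logits, bad_labels)
    # Push the model to score the correct translation higher by a margin.
    return F.relu(margin - (lp_good - lp_bad)).mean()

B, T, V = 2, 6, 100
loss = preference_loss(torch.randn(B, T, V), torch.randint(0, V, (B, T)),
                       torch.randn(B, T, V), torch.randint(0, V, (B, T)))
```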
Enhancing Cross-lingual Transfer via Phonemic Transcription Integration
results: Experiments show that PhoneXL improves cross-lingual transfer, especially among the CJKV languages, yielding consistent improvements over orthographic-based multilingual PLMs on two token-level tasks: Named Entity Recognition and Part-of-Speech Tagging.
Abstract
Previous cross-lingual transfer methods are restricted to orthographic representation learning via textual scripts. This limitation hampers cross-lingual transfer and is biased towards languages sharing similar well-known scripts. To alleviate the gap between languages from different writing scripts, we propose PhoneXL, a framework incorporating phonemic transcriptions as an additional linguistic modality beyond the traditional orthographic transcriptions for cross-lingual transfer. Particularly, we propose unsupervised alignment objectives to capture (1) local one-to-one alignment between the two different modalities, (2) alignment via multi-modality contexts to leverage information from additional modalities, and (3) alignment via multilingual contexts where additional bilingual dictionaries are incorporated. We also release the first phonemic-orthographic alignment dataset on two token-level tasks (Named Entity Recognition and Part-of-Speech Tagging) among the understudied but interconnected Chinese-Japanese-Korean-Vietnamese (CJKV) languages. Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer and bridge the gap among CJKV languages, leading to consistent improvements on cross-lingual token-level tasks over orthographic-based multilingual PLMs.
Event Extraction as Question Generation and Answering
results: Experiments show that QGA-EE outperforms all prior single-task-based models on the ACE05 English dataset, indicating that the proposed method effectively improves the accuracy and efficiency of event extraction.
Abstract
Recent work on Event Extraction has reframed the task as Question Answering (QA), with promising results. The advantage of this approach is that it addresses the error propagation issue found in traditional token-based classification approaches by directly predicting event arguments without extracting candidates first. However, the questions are typically based on fixed templates and they rarely leverage contextual information such as relevant arguments. In addition, prior QA-based approaches have difficulty handling cases where there are multiple arguments for the same role. In this paper, we propose QGA-EE, which enables a Question Generation (QG) model to generate questions that incorporate rich contextual information instead of using fixed templates. We also propose dynamic templates to assist the training of QG model. Experiments show that QGA-EE outperforms all prior single-task-based models on the ACE05 English dataset.
HistRED: A Historical Document-Level Relation Extraction Dataset
for: To promote research on historical relation extraction (RE) and explore potential applications in historical data.
methods: The study uses the HistRED dataset, built from Yeonhaengnok, which provides bilingual annotations of Hanja and Korean texts to support research on historical RE.
results: A bilingual RE model that leverages context from both Korean and Hanja texts to predict relations between entities outperforms monolingual baselines on HistRED, showing that multilingual context supplements RE predictions.
Abstract
Despite the extensive applications of relation extraction (RE) tasks in various domains, little has been explored in the historical context, which contains promising data across hundreds and thousands of years. To promote the historical RE research, we present HistRED constructed from Yeonhaengnok. Yeonhaengnok is a collection of records originally written in Hanja, the classical Chinese writing, which has later been translated into Korean. HistRED provides bilingual annotations such that RE can be performed on Korean and Hanja texts. In addition, HistRED supports various self-contained subtexts with different lengths, from a sentence level to a document level, supporting diverse context settings for researchers to evaluate the robustness of their RE models. To demonstrate the usefulness of our dataset, we propose a bilingual RE model that leverages both Korean and Hanja contexts to predict relations between entities. Our model outperforms monolingual baselines on HistRED, showing that employing multiple language contexts supplements the RE predictions. The dataset is publicly available at: https://huggingface.co/datasets/Soyoung/HistRED under CC BY-NC-ND 4.0 license.
Automated Essay Scoring in Argumentative Writing: DeBERTeachingAssistant
paper_authors: Yann Hicke, Tonghua Tian, Karan Jha, Choong Hee Kim
for: This paper aims to improve the assessment of argumentative writing by developing a transformer-based architecture that can annotate discourse elements for their persuasiveness quality.
methods: The proposed method uses a transformer-based architecture to analyze argumentative writing and provide annotations for the persuasiveness quality of various discourse elements.
results: The proposed method achieved above-human accuracy in annotating argumentative writing discourse elements for their persuasiveness quality.
Abstract
Automated Essay scoring has been explored as a research and industry problem for over 50 years. It has drawn a lot of attention from the NLP community because of its clear educational value as a research area that can engender the creation of valuable time-saving tools for educators around the world. Yet, these tools are generally focused on detecting good grammar, spelling mistakes, and organization quality but tend to fail at incorporating persuasiveness features in their final assessment. The responsibility to give actionable feedback to the student to improve the strength of their arguments is left solely on the teacher's shoulders. In this work, we present a transformer-based architecture capable of achieving above-human accuracy in annotating argumentative writing discourse elements for their persuasiveness quality and we expand on planned future work investigating the explainability of our model so that actionable feedback can be offered to the student and thus potentially enable a partnership between the teacher's advice and the machine's advice.
Augmenters at SemEval-2023 Task 1: Enhancing CLIP in Handling Compositionality and Ambiguity for Zero-Shot Visual WSD through Prompt Augmentation and Text-To-Image Diffusion
results: Experiments show that Augment-CLIP and SD Sampling improve image-text matching and mitigate the many-to-many problem.
Abstract
This paper describes our zero-shot approaches for the Visual Word Sense Disambiguation (VWSD) Task in English. Our preliminary study shows that the simple approach of matching candidate images with the phrase using CLIP suffers from the many-to-many nature of image-text pairs. We find that the CLIP text encoder may have limited abilities in capturing the compositionality in natural language. Conversely, the descriptive focus of the phrase varies from instance to instance. We address these issues in our two systems, Augment-CLIP and Stable Diffusion Sampling (SD Sampling). Augment-CLIP augments the text prompt by generating sentences that contain the context phrase with the help of large language models (LLMs). We further explore CLIP models in other languages, as the an ambiguous word may be translated into an unambiguous one in the other language. SD Sampling uses text-to-image Stable Diffusion to generate multiple images from the given phrase, increasing the likelihood that a subset of images match the one that paired with the text.
Assessing the efficacy of large language models in generating accurate teacher responses
results: The study finds that GPT-4 performs best on a subset of the Teacher-Student Chatroom Corpus, measured using BERTScore and DialogRPT. Additionally, it finds that certain dataset characteristics, such as sampling, representativeness, and dialog completeness, can pose challenges to fine-tuning and contribute to the poor generalizability of the fine-tuned models.
Abstract
(Tack et al., 2023) organized the shared task hosted by the 18th Workshop on Innovative Use of NLP for Building Educational Applications on generation of teacher language in educational dialogues. Following the structure of the shared task, in this study, we attempt to assess the generative abilities of large language models in providing informative and helpful insights to students, thereby simulating the role of a knowledgeable teacher. To this end, we present an extensive evaluation of several benchmarking generative models, including GPT-4 (few-shot, in-context learning), fine-tuned GPT-2, and fine-tuned DialoGPT. Additionally, to optimize for pedagogical quality, we fine-tuned the Flan-T5 model using reinforcement learning. Our experimental findings on the Teacher-Student Chatroom Corpus subset indicate the efficacy of GPT-4 over other fine-tuned models, measured using BERTScore and DialogRPT. We hypothesize that several dataset characteristics, including sampling, representativeness, and dialog completeness, pose significant challenges to fine-tuning, thus contributing to the poor generalizability of the fine-tuned models. Finally, we note the need for these generative models to be evaluated with a metric that relies not only on dialog coherence and matched language modeling distribution but also on the model's ability to showcase pedagogical skills.
Automatic Coding at Scale: Design and Deployment of a Nationwide System for Normalizing Referrals in the Chilean Public Healthcare System
methods: A two-step approach: a state-of-the-art NER model recognizes disease mentions, and an Elasticsearch-based search engine assigns the most relevant disease codes to those mentions.
results: Experiments show the system assigns disease codes automatically and accurately, with MAP scores of 0.63 at the subcategory level and 0.83 at the category level.
Abstract
The disease coding task involves assigning a unique identifier from a controlled vocabulary to each disease mentioned in a clinical document. This task is relevant since it allows information extraction from unstructured data to perform, for example, epidemiological studies about the incidence and prevalence of diseases in a determined context. However, the manual coding process is subject to errors as it requires medical personnel to be competent in coding rules and terminology. In addition, this process consumes a lot of time and energy, which could be allocated to more clinically relevant tasks. These difficulties can be addressed by developing computational systems that automatically assign codes to diseases. In this way, we propose a two-step system for automatically coding diseases in referrals from the Chilean public healthcare system. Specifically, our model uses a state-of-the-art NER model for recognizing disease mentions and a search engine system based on Elasticsearch for assigning the most relevant codes associated with these disease mentions. The system's performance was evaluated on referrals manually coded by clinical experts. Our system obtained a MAP score of 0.63 for the subcategory level and 0.83 for the category level, close to the best-performing models in the literature. This system could be a support tool for health professionals, optimizing the coding and management process. Finally, to guarantee reproducibility, we publicly release the code of our models and experiments.
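To illustrate the second step, here is a minimal sketch that queries an Elasticsearch index of code descriptions with an NER-detected mention and returns the top-scoring codes. The index name, field names, and local endpoint are hypothetical; only the general elasticsearch-py (8.x) search pattern is assumed, not the deployed system's configuration.

```python
# Candidate code retrieval: match a disease mention against code descriptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local instance

def candidate_codes(mention: str, k: int = 5):
    resp = es.search(index="disease_codes",                 # hypothetical index
                     query={"match": {"description": mention}},
                     size=k)
    return [(h["_source"]["code"], h["_score"]) for h in resp["hits"]["hits"]]

# mention produced by the NER step, e.g.:
print(candidate_codes("diabetes mellitus tipo 2"))
```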
results: The study finds that GNNs whose activation functions are not piecewise polynomial can distinguish any two non-isomorphic rooted trees within two iterations. It also answers an open question posed by [Grohe, 2021], proving a strict separation between bounded- and unbounded-size GNNs.
Abstract
In this article we present new results about the expressivity of Graph Neural Networks (GNNs). We prove that for any GNN with piecewise polynomial activations, whose architecture size does not grow with the graph input sizes, there exists a pair of non-isomorphic rooted trees of depth two such that the GNN cannot distinguish their root vertex up to an arbitrary number of iterations. The proof relies on tools from the algebra of symmetric polynomials. In contrast, it was already known that unbounded GNNs (those whose size is allowed to change with the graph sizes) with piecewise polynomial activations can distinguish these vertices in only two iterations. Our results imply a strict separation between bounded and unbounded size GNNs, answering an open question formulated by [Grohe, 2021]. We next prove that if one allows activations that are not piecewise polynomial, then in two iterations a single neuron perceptron can distinguish the root vertices of any pair of nonisomorphic trees of depth two (our results hold for activations like the sigmoid, hyperbolic tan and others). This shows how the power of graph neural networks can change drastically if one changes the activation function of the neural networks. The proof of this result utilizes the Lindemann-Weierstrass theorem from transcendental number theory.
Active Learning for Video Classification with Frame Level Queries
results: The approach cuts annotators' effort to reviewing only a few frames instead of watching the entire video, showing that it can reduce the number of labels needed to train machine learning models and make better use of annotators' time and effort.
Abstract
Deep learning algorithms have pushed the boundaries of computer vision research and have depicted commendable performance in a variety of applications. However, training a robust deep neural network necessitates a large amount of labeled training data, acquiring which involves significant time and human effort. This problem is even more serious for an application like video classification, where a human annotator has to watch an entire video end-to-end to furnish a label. Active learning algorithms automatically identify the most informative samples from large amounts of unlabeled data; this tremendously reduces the human annotation effort in inducing a machine learning model, as only the few samples that are identified by the algorithm, need to be labeled manually. In this paper, we propose a novel active learning framework for video classification, with the goal of further reducing the labeling onus on the human annotators. Our framework identifies a batch of exemplar videos, together with a set of informative frames for each video; the human annotator needs to merely review the frames and provide a label for each video. This involves much less manual work than watching the complete video to come up with a label. We formulate a criterion based on uncertainty and diversity to identify the informative videos and exploit representative sampling techniques to extract a set of exemplar frames from each video. To the best of our knowledge, this is the first research effort to develop an active learning framework for video classification, where the annotators need to inspect only a few frames to produce a label, rather than watching the end-to-end video.
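A sketch of the generic uncertainty-plus-diversity recipe the abstract describes follows: score videos by mean predictive entropy over frames, then pick representative frames by clustering frame embeddings. The concrete scoring and sampling functions below are illustrative stand-ins, not the paper's exact criterion.

```python
# Uncertainty scoring per video plus diversity-based exemplar frame selection.
import numpy as np
from sklearn.cluster import KMeans

def mean_entropy(frame_probs: np.ndarray) -> float:
    # frame_probs: (num_frames, num_classes) softmax outputs
    ent = -(frame_probs * np.log(frame_probs + 1e-12)).sum(axis=1)
    return float(ent.mean())

def exemplar_frames(frame_feats: np.ndarray, k: int = 3) -> list[int]:
    km = KMeans(n_clusters=k, n_init=10).fit(frame_feats)
    # closest frame to each cluster center = representative frame
    dists = ((frame_feats[:, None, :] - km.cluster_centers_[None]) ** 2).sum(-1)
    return sorted(set(int(i) for i in dists.argmin(axis=0)))

probs = np.random.dirichlet(np.ones(10), size=40)   # 40 frames, 10 classes
feats = np.random.rand(40, 64)                      # 40 frame embeddings
print(mean_entropy(probs), exemplar_frames(feats))
```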
Multimodal brain age estimation using interpretable adaptive population-graph learning
results: Compared with static graph construction and other adaptive methods, the approach performs better on brain age estimation and classification, and assigning attention weights to imaging and non-imaging features (phenotypes) improves the interpretability of the constructed graph.
Abstract
Brain age estimation is clinically important as it can provide valuable information in the context of neurodegenerative diseases such as Alzheimer's. Population graphs, which include multimodal imaging information of the subjects along with the relationships among the population, have been used in literature along with Graph Convolutional Networks (GCNs) and have proved beneficial for a variety of medical imaging tasks. A population graph is usually static and constructed manually using non-imaging information. However, graph construction is not a trivial task and might significantly affect the performance of the GCN, which is inherently very sensitive to the graph structure. In this work, we propose a framework that learns a population graph structure optimized for the downstream task. An attention mechanism assigns weights to a set of imaging and non-imaging features (phenotypes), which are then used for edge extraction. The resulting graph is used to train the GCN. The entire pipeline can be trained end-to-end. Additionally, by visualizing the attention weights that were the most important for the graph construction, we increase the interpretability of the graph. We use the UK Biobank, which provides a large variety of neuroimaging and non-imaging phenotypes, to evaluate our method on brain age regression and classification. The proposed method outperforms competing static graph approaches and other state-of-the-art adaptive methods. We further show that the assigned attention scores indicate that there are both imaging and non-imaging phenotypes that are informative for brain age estimation and are in agreement with the relevant literature.
Weakly-supervised positional contrastive learning: application to cirrhosis classification
results: The proposed model improves classification AUC by 5% over a baseline on an internal dataset and by 26% on the public LIHC dataset.
Abstract
Large medical imaging datasets can be cheaply and quickly annotated with low-confidence, weak labels (e.g., radiological scores). Access to high-confidence labels, such as histology-based diagnoses, is rare and costly. Pretraining strategies, like contrastive learning (CL) methods, can leverage unlabeled or weakly-annotated datasets. These methods typically require large batch sizes, which poses a difficulty in the case of large 3D images at full resolution, due to limited GPU memory. Nevertheless, volumetric positional information about the spatial context of each 2D slice can be very important for some medical applications. In this work, we propose an efficient weakly-supervised positional (WSP) contrastive learning strategy where we integrate both the spatial context of each 2D slice and a weak label via a generic kernel-based loss function. We illustrate our method on cirrhosis prediction using a large volume of weakly-labeled images, namely radiological low-confidence annotations, and small strongly-labeled (i.e., high-confidence) datasets. The proposed model improves the classification AUC by 5% with respect to a baseline model on our internal dataset, and by 26% on the public LIHC dataset from the Cancer Genome Atlas. The code is available at: https://github.com/Guerbet-AI/wsp-contrastive.
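Below is a hedged sketch of a kernel-weighted contrastive objective in the spirit of WSP: pairs of 2D-slice embeddings are treated as positives in proportion to a Gaussian kernel on slice position combined with weak-label agreement. The specific kernel and weighting are assumptions; the paper's generic kernel-based loss may take a different form.

```python
# Kernel-weighted contrastive loss over slice embeddings with weak labels.
import torch
import torch.nn.functional as F

def wsp_loss(z, pos, y, sigma=0.1, tau=0.1):
    # z: (N, d) slice embeddings; pos: (N,) normalized slice positions in [0, 1];
    # y: (N,) weak labels (e.g., radiological score bins)
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                                    # cosine similarities
    k_pos = torch.exp(-(pos[:, None] - pos[None, :]) ** 2 / (2 * sigma ** 2))
    w = k_pos * (y[:, None] == y[None, :]).float()           # kernel x label match
    w.fill_diagonal_(0)
    log_p = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(w * log_p).sum() / w.sum().clamp(min=1e-8)

loss = wsp_loss(torch.randn(16, 128), torch.rand(16), torch.randint(0, 3, (16,)))
```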
MiVOLO: Multi-input Transformer for Age and Gender Estimation
results: Experiments show state-of-the-art performance on four popular benchmarks with real-time processing. The authors also introduce a new benchmark based on the Open Images Dataset with carefully curated human annotations, and show that the model's age recognition significantly outperforms humans across most age ranges.
Abstract
Age and gender recognition in the wild is a highly challenging task: apart from the variability of conditions, pose complexities, and varying image quality, there are cases where the face is partially or completely occluded. We present MiVOLO (Multi Input VOLO), a straightforward approach for age and gender estimation using the latest vision transformer. Our method integrates both tasks into a unified dual input/output model, leveraging not only facial information but also person image data. This improves the generalization ability of our model and enables it to deliver satisfactory results even when the face is not visible in the image. To evaluate our proposed model, we conduct experiments on four popular benchmarks and achieve state-of-the-art performance, while demonstrating real-time processing capabilities. Additionally, we introduce a novel benchmark based on images from the Open Images Dataset. The ground truth annotations for this benchmark have been meticulously generated by human annotators, resulting in high accuracy answers due to the smart aggregation of votes. Furthermore, we compare our model's age recognition performance with human-level accuracy and demonstrate that it significantly outperforms humans across a majority of age ranges. Finally, we grant public access to our models, along with the code for validation and inference. In addition, we provide extra annotations for used datasets and introduce our new benchmark.
EchoVest: Real-Time Sound Classification and Depth Perception Expressed through Transcutaneous Electrical Nerve Stimulation
paper_authors: Jesse Choe, Siddhant Sood, Ryan Park
for: The paper aims to develop an assistive device for blind/deaf individuals to enhance their awareness of their environment, with a focus on sound classification and localization.
methods: The paper employs a novel audio pipeline that combines the Audio Spectrogram Transformer (AST) model and Fast Fourier Transforms for noise reduction, as well as Otsu’s Method for background noise sound filtering and Complex Time Difference of Arrival algorithms for direction and depth calculation.
results: The final algorithm achieved state-of-the-art results on numerous checkpoints, including a 95.7% accuracy on the ESC-50 dataset for environmental sound classification.Here’s the simplified Chinese text for the three key points:
results: 最终算法在多个检查点上达到了顶尖的结果,包括ESC-50数据集上的声音分类准确率95.7%。Abstract
Over 1.5 billion people worldwide live with hearing impairment. Despite various technologies that have been created for individuals with such disabilities, most of these technologies are either extremely expensive or inaccessible for everyday use in low-medium income countries. In order to combat this issue, we have developed a new assistive device, EchoVest, for blind/deaf people to intuitively become more aware of their environment. EchoVest transmits vibrations to the user's body by utilizing transcutaneous electric nerve stimulation (TENS) based on the source of the sounds. EchoVest also provides various features, including sound localization, sound classification, noise reduction, and depth perception. We aimed to outperform CNN-based machine-learning models, the most commonly used machine learning model for classification tasks, in accuracy and computational costs. To do so, we developed and employed a novel audio pipeline that adapts the Audio Spectrogram Transformer (AST) model, an attention-based model, for our sound classification purposes, and Fast Fourier Transforms for noise reduction. The application of Otsu's Method helped us find the optimal thresholds for background noise sound filtering and gave us much greater accuracy. In order to calculate direction and depth accurately, we applied Complex Time Difference of Arrival algorithms and SOTA localization. Our last improvement was to use blind source separation to make our algorithms applicable to multiple microphone inputs. The final algorithm achieved state-of-the-art results on numerous checkpoints, including a 95.7\% accuracy on the ESC-50 dataset for environmental sound classification.
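To make the Otsu-based noise filtering concrete, here is a minimal sketch: short-time FFT magnitudes are summarized per frame, and Otsu's method picks the threshold separating background-noise frames from foreground sound. The frame and hop sizes and the mean-magnitude summary are illustrative assumptions, not EchoVest's settings.

```python
# Otsu-thresholded frame energies as a simple foreground-sound gate.
import numpy as np
from skimage.filters import threshold_otsu

def frame_energies(signal: np.ndarray, frame: int = 1024, hop: int = 512):
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame, hop)]
    mags = [np.abs(np.fft.rfft(f)) for f in frames]   # spectral magnitudes
    return np.array([m.mean() for m in mags])

rng = np.random.default_rng(0)
sig = rng.normal(0, 0.05, 16000)
sig[6000:9000] += np.sin(2 * np.pi * 440 * np.arange(3000) / 16000)  # a "sound"
e = frame_energies(sig)
mask = e > threshold_otsu(e)   # True where a foreground sound is present
print(mask.astype(int))
```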
DBFed: Debiasing Federated Learning Framework based on Domain-Independent
results: Experimental results show that DBFed exceeds the three comparison methods on most metrics, fully demonstrating its debiasing effect.
Abstract
As digital transformation continues, enterprises are generating, managing, and storing vast amounts of data, while artificial intelligence technology is rapidly advancing. However, it brings challenges in information security and data security. Data security refers to the protection of digital information from unauthorized access, damage, theft, etc. throughout its entire life cycle. With the promulgation and implementation of data security laws and the emphasis on data security and data privacy by organizations and users, Privacy-preserving technology represented by federated learning has a wide range of application scenarios. Federated learning is a distributed machine learning computing framework that allows multiple subjects to train joint models without sharing data to protect data privacy and solve the problem of data islands. However, the data among multiple subjects are independent of each other, and the data differences in quality may cause fairness issues in federated learning modeling, such as data bias among multiple subjects, resulting in biased and discriminatory models. Therefore, we propose DBFed, a debiasing federated learning framework based on domain-independent, which mitigates model bias by explicitly encoding sensitive attributes during client-side training. This paper conducts experiments on three real datasets and uses five evaluation metrics of accuracy and fairness to quantify the effect of the model. Most metrics of DBFed exceed those of the other three comparative methods, fully demonstrating the debiasing effect of DBFed.
AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System
results: In both real-world and simulated experiments, AnyTeleop achieves higher success rates and better imitation learning performance than previous systems designed for specific robot hardware.
Abstract
Vision-based teleoperation offers the possibility to endow robots with human-level intelligence to physically interact with the environment, while only requiring low-cost camera sensors. However, current vision-based teleoperation systems are designed and engineered towards a particular robot model and deploy environment, which scales poorly as the pool of the robot models expands and the variety of the operating environment increases. In this paper, we propose AnyTeleop, a unified and general teleoperation system to support multiple different arms, hands, realities, and camera configurations within a single system. Although being designed to provide great flexibility to the choice of simulators and real hardware, our system can still achieve great performance. For real-world experiments, AnyTeleop can outperform a previous system that was designed for a specific robot hardware with a higher success rate, using the same robot. For teleoperation in simulation, AnyTeleop leads to better imitation learning performance, compared with a previous system that is particularly designed for that simulator. Project page: http://anyteleop.com/.
A Semi-Automated Solution Approach Selection Tool for Any Use Case via Scopus and OpenAI: a Case Study for AI/ML in Oncology
results: The study shows the tool enables semi-automated evaluation and selection of solution approaches, with sensitivity analysis and post-analyses across use cases. A case study in oncology and several other use cases show promising results against a manual ground truth.
Abstract
In today's vast literature landscape, a manual review is very time-consuming. To address this challenge, this paper proposes a semi-automated tool for solution method review and selection. It caters to researchers, practitioners, and decision-makers while serving as a benchmark for future work. The tool comprises three modules: (1) paper selection and scoring, using a keyword selection scheme to query Scopus API and compute relevancy; (2) solution method extraction in papers utilizing OpenAI API; (3) sensitivity analysis and post-analyzes. It reveals trends, relevant papers, and methods. AI in the oncology case study and several use cases are presented with promising results, comparing the tool to manual ground truth.
Unraveling the Age Estimation Puzzle: Comparative Analysis of Deep Learning Approaches for Facial Age Estimation
results: The study finds that these factors often exert a greater influence on age estimation results than the choice of method itself. It also assesses each method's generalization via cross-dataset performance on publicly available age estimation datasets, underscoring the need for consistent data preprocessing and standardized benchmarks to ensure reliable and meaningful comparisons.
Abstract
Comparing different age estimation methods poses a challenge due to the unreliability of published results, stemming from inconsistencies in the benchmarking process. Previous studies have reported continuous performance improvements over the past decade using specialized methods; however, our findings challenge these claims. We argue that, for age estimation tasks outside of the low-data regime, designing specialized methods is unnecessary, and the standard approach of utilizing cross-entropy loss is sufficient. This paper aims to address the benchmark shortcomings by evaluating state-of-the-art age estimation methods in a unified and comparable setting. We systematically analyze the impact of various factors, including facial alignment, facial coverage, image resolution, image representation, model architecture, and the amount of data on age estimation results. Surprisingly, these factors often exert a more significant influence than the choice of the age estimation method itself. We assess the generalization capability of each method by evaluating the cross-dataset performance for publicly available age estimation datasets. The results emphasize the importance of using consistent data preprocessing practices and establishing standardized benchmarks to ensure reliable and meaningful comparisons. The source code is available at https://github.com/paplhjak/Facial-Age-Estimation-Benchmark.
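As a concrete instance of the "standard approach" the paper argues is sufficient, here is a sketch of age estimation as classification over one-year bins trained with plain cross-entropy, with continuous age decoded as the expectation over bin probabilities. The bin count and expectation decoding are common conventions, assumed here rather than taken from the paper.

```python
# Cross-entropy over age bins with expected-value decoding.
import torch
import torch.nn.functional as F

NUM_BINS = 101                             # ages 0..100
head = torch.nn.Linear(512, NUM_BINS)      # on top of any face-feature backbone

feats = torch.randn(8, 512)                # stand-in backbone features
ages = torch.randint(0, NUM_BINS, (8,))
logits = head(feats)
loss = F.cross_entropy(logits, ages)       # plain cross-entropy training

probs = logits.softmax(dim=1)
bins = torch.arange(NUM_BINS, dtype=torch.float32)
predicted_age = (probs * bins).sum(dim=1)  # expected-value decoding
```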
Interpreting and generalizing deep learning in physics-based problems with functional linear models
results: The model matches the accuracy of deep learning models while improving generalization to out-of-distribution datasets and offering greater transparency and interpretability, with tests in solid mechanics, fluid mechanics, and transport.
Abstract
Although deep learning has achieved remarkable success in various scientific machine learning applications, its black-box nature poses concerns regarding interpretability and generalization capabilities beyond the training data. Interpretability is crucial and often desired in modeling physical systems. Moreover, acquiring extensive datasets that encompass the entire range of input features is challenging in many physics-based learning tasks, leading to increased errors when encountering out-of-distribution (OOD) data. In this work, motivated by the field of functional data analysis (FDA), we propose generalized functional linear models as an interpretable surrogate for a trained deep learning model. We demonstrate that our model could be trained either based on a trained neural network (post-hoc interpretation) or directly from training data (interpretable operator learning). A library of generalized functional linear models with different kernel functions is considered and sparse regression is used to discover an interpretable surrogate model that could be analytically presented. We present test cases in solid mechanics, fluid mechanics, and transport. Our results demonstrate that our model can achieve comparable accuracy to deep learning and can improve OOD generalization while providing more transparency and interpretability. Our study underscores the significance of interpretability in scientific machine learning and showcases the potential of functional linear models as a tool for interpreting and generalizing deep learning.
Automatically detecting activities of daily living from in-home sensors as indicators of routine behaviour in an older population
methods: An Action Research Cycle (ARC) trial with 23 participants, each with roughly 20 IoT sensors in their homes. During the trial, participants took part in two data-informed briefings that presented visualizations of their in-home activities and gathered feedback on the accuracy of the detected activities.
results: Using association rule mining, participants' activities of daily living (ADLs) can be detected independently, and a single set of rules per ADL works across participants. This reduces the need for participants to provide training data and allows more participants to join the system.
Abstract
Objective: The NEX project has developed an integrated Internet of Things (IoT) system coupled with data analytics to offer unobtrusive health and wellness monitoring supporting older adults living independently at home. Monitoring currently involves visualising a set of automatically detected activities of daily living (ADLs) for each participant. The detection of ADLs is achieved in a way that allows the incorporation of additional participants whose ADLs are detected without re-training the system. Methods: Following an extensive User Needs and Requirements study involving 426 participants, a pilot trial and a friendly trial of the deployment, an Action Research Cycle (ARC) trial was completed. This involved 23 participants over a 10-week period each with c.20 IoT sensors in their homes. During the ARC trial, participants each took part in two data-informed briefings which presented visualisations of their own in-home activities. The briefings also gathered training data on the accuracy of detected activities. Association rule mining was then used on the combination of data from sensors and participant feedback to improve the automatic detection of ADLs. Results: Association rule mining was used to detect a range of ADLs for each participant independently of others and was then used to detect ADLs across participants using a single set of rules for each ADL. This allows additional participants to be added without the necessity of them providing training data. Conclusions: Additional participants can be added to the NEX system without the necessity to re-train the system for automatic detection of the set of their activities of daily living.
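A toy sketch of the rule-mining step follows: sensor events per time window become boolean transactions, and association rules whose consequent is an ADL are mined. The sensors, windowing, and thresholds are invented for illustration; mlxtend's apriori/association_rules stand in for whatever miner the project used.

```python
# Mining "sensor events -> ADL" association rules from windowed transactions.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

windows = pd.DataFrame([
    {"kettle": True,  "fridge": True,  "tv": False, "prepare_drink": True},
    {"kettle": True,  "fridge": True,  "tv": False, "prepare_drink": True},
    {"kettle": False, "fridge": False, "tv": True,  "prepare_drink": False},
    {"kettle": True,  "fridge": False, "tv": True,  "prepare_drink": False},
])

itemsets = apriori(windows, min_support=0.25, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.9)
# Keep rules whose consequent is the ADL of interest.
adl_rules = rules[rules["consequents"] == frozenset({"prepare_drink"})]
print(adl_rules[["antecedents", "consequents", "support", "confidence"]])
```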
Gradient Surgery for One-shot Unlearning on Generative Model
results: The paper compares against existing baselines and provides a theoretical analysis, demonstrating that the method can efficiently remove a data sample's influence.
Abstract
Recent regulation on right-to-be-forgotten emerges tons of interest in unlearning pre-trained machine learning models. While approximating a straightforward yet expensive approach of retrain-from-scratch, recent machine unlearning methods unlearn a sample by updating weights to remove its influence on the weight parameters. In this paper, we introduce a simple yet effective approach to remove a data influence on the deep generative model. Inspired by works in multi-task learning, we propose to manipulate gradients to regularize the interplay of influence among samples by projecting gradients onto the normal plane of the gradients to be retained. Our work is agnostic to statistics of the removal samples, outperforming existing baselines while providing theoretical analysis for the first time in unlearning a generative model.
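The core operation, projecting an unlearning gradient onto the normal plane of the gradient to be retained, reduces to subtracting the parallel component. A minimal sketch is below; whether the paper applies the projection unconditionally or only for conflicting directions (as in PCGrad-style methods) is a design choice not specified here, so this shows just the geometry.

```python
# Remove from g its component along h: project g onto the normal plane of h.
import torch

def project_to_normal_plane(g: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    h_flat, g_flat = h.flatten(), g.flatten()
    coef = torch.dot(g_flat, h_flat) / h_flat.dot(h_flat).clamp(min=1e-12)
    return (g_flat - coef * h_flat).view_as(g)

g = torch.tensor([1.0, 1.0])
h = torch.tensor([1.0, 0.0])
print(project_to_normal_plane(g, h))   # tensor([0., 1.]) -- orthogonal to h
```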
StyleGAN2-based Out-of-Distribution Detection for Medical Imaging
results: The study finds that the method distinguishes liver from non-liver CT very well while being completely unable to reconstruct liver abnormalities such as needles and ascites. With an AUROC above 90%, the method achieves strong OOD detection.
Abstract
One barrier to the clinical deployment of deep learning-based models is the presence of images at runtime that lie far outside the training distribution of a given model. We aim to detect these out-of-distribution (OOD) images with a generative adversarial network (GAN). Our training dataset was comprised of 3,234 liver-containing computed tomography (CT) scans from 456 patients. Our OOD test data consisted of CT images of the brain, head and neck, lung, cervix, and abnormal livers. A StyleGAN2-ADA architecture was employed to model the training distribution. Images were reconstructed using backpropagation. Reconstructions were evaluated using the Wasserstein distance, mean squared error, and the structural similarity index measure. OOD detection was evaluated with the area under the receiver operating characteristic curve (AUROC). Our paradigm distinguished between liver and non-liver CT with greater than 90% AUROC. It was also completely unable to reconstruct liver artifacts, such as needles and ascites.
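A sketch of the scoring side of this paradigm: reconstruct each image (the GAN-inversion step via backpropagation is abstracted into a placeholder function), compute MSE and SSIM against the original, and summarize OOD separability with AUROC. The combined score and the toy reconstructor are assumptions for illustration.

```python
# Reconstruction-based OOD scoring with MSE/SSIM and AUROC summary.
import numpy as np
from skimage.metrics import structural_similarity as ssim
from sklearn.metrics import roc_auc_score

def ood_scores(images: np.ndarray, reconstruct) -> np.ndarray:
    scores = []
    for img in images:
        rec = reconstruct(img)          # placeholder for GAN inversion
        mse = float(((img - rec) ** 2).mean())
        s = ssim(img, rec, data_range=img.max() - img.min())
        scores.append(mse - s)          # higher = more likely OOD
    return np.array(scores)

rng = np.random.default_rng(0)
imgs = rng.random((10, 64, 64))
scores = ood_scores(imgs, reconstruct=lambda x: np.clip(x + rng.normal(0, 0.05, x.shape), 0, 1))
labels = np.array([0] * 5 + [1] * 5)    # 1 = OOD (toy labels)
print(roc_auc_score(labels, scores))
```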
Pathway toward prior knowledge-integrated machine learning in engineering
results: The study balances holist and reductionist perspectives, meeting the needs of professionals across domains while leveraging domain knowledge to improve the accuracy and reliability of data-driven processes.
Abstract
Despite the digitalization trend and data volume surge, first-principles models (also known as logic-driven, physics-based, rule-based, or knowledge-based models) and data-driven approaches have existed in parallel, mirroring the ongoing AI debate on symbolism versus connectionism. Research for process development to integrate both sides to transfer and utilize domain knowledge in the data-driven process is rare. This study emphasizes efforts and prevailing trends to integrate multidisciplinary domain professions into machine acknowledgeable, data-driven processes in a two-fold organization: examining information uncertainty sources in knowledge representation and exploring knowledge decomposition with a three-tier knowledge-integrated machine learning paradigm. This approach balances holist and reductionist perspectives in the engineering domain.
DADO – Low-Cost Selection Strategies for Deep Active Design Optimization
results: Improves the efficiency of design optimization and reduces computational cost.
Abstract
In this experience report, we apply deep active learning to the field of design optimization to reduce the number of computationally expensive numerical simulations. We are interested in optimizing the design of structural components, where the shape is described by a set of parameters. If we can predict the performance based on these parameters and consider only the promising candidates for simulation, there is an enormous potential for saving computing power. We present two selection strategies for self-optimization to reduce the computational cost in multi-objective design optimization problems. Our proposed methodology provides an intuitive approach that is easy to apply, offers significant improvements over random sampling, and circumvents the need for uncertainty estimation. We evaluate our strategies on a large dataset from the domain of fluid dynamics and introduce two new evaluation metrics to determine the model's performance. Findings from our evaluation highlights the effectiveness of our selection strategies in accelerating design optimization. We believe that the introduced method is easily transferable to other self-optimization problems.
摘要
在这份经验报告中,我们将深度主动学习应用于设计优化领域,以减少计算代价高昂的数值模拟次数。我们关注结构部件的设计优化,其形状由一组参数描述。如果能够基于这些参数预测性能,从而只对有前景的候选方案进行模拟,就能节省大量计算资源。我们提出了两种用于自优化的选择策略,以降低多目标设计优化问题的计算成本。所提方法直观、易于应用,明显优于随机采样,并且无需不确定性估计。我们在流体动力学领域的大规模数据集上评估了这些策略,并引入两种新的评价指标来衡量模型性能。评估结果表明,我们的选择策略能有效加速设计优化。我们相信该方法可以轻松迁移到其他自优化问题。
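A minimal sketch of the kind of low-cost selection loop described above: rank unsimulated candidates by a surrogate's predicted objective and simulate only the top of the ranking, with no uncertainty estimate involved. The random-forest surrogate, batch sizes, and the single scalar objective are assumptions for illustration, not the authors' exact strategies.

```python
# Surrogate-guided active design optimization: spend the expensive
# simulator only on candidates the cheap model already considers promising.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def active_design_loop(simulate, candidates, n_init=32, n_rounds=5, batch=16):
    """`simulate` is the expensive solver; `candidates` is an (N, d) array.

    Returns the indices of simulated designs and their objective values
    (lower is assumed better).
    """
    rng = np.random.default_rng(0)
    chosen = list(rng.choice(len(candidates), n_init, replace=False))
    y = [simulate(candidates[i]) for i in chosen]
    for _ in range(n_rounds):
        model = RandomForestRegressor().fit(candidates[chosen], y)
        scores = model.predict(candidates)       # surrogate's predicted objective
        seen = set(chosen)
        ranked = [i for i in np.argsort(scores) if i not in seen]
        for i in ranked[:batch]:                 # simulate only the best batch
            chosen.append(i)
            y.append(simulate(candidates[i]))
    return np.array(chosen), np.array(y)
```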
QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
results: 我们在ImageNet上评估了QBitOpt,并证明在文献中常见的平均位宽约束下,其表现优于现有的固定精度和混合精度方法。Abstract
Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths, achieve better task performance for the same resource constraint compared to networks with homogeneous bitwidths. However, finding the optimal bitwidth allocation is a challenging problem as the search space grows exponentially with the number of layers in the network. In this paper, we propose QBitOpt, a novel algorithm for updating bitwidths during quantization-aware training (QAT). We formulate the bitwidth allocation problem as a constraint optimization problem. By combining fast-to-compute sensitivities with efficient solvers during QAT, QBitOpt can produce mixed-precision networks with high task performance guaranteed to satisfy strict resource constraints. This contrasts with existing mixed-precision methods that learn bitwidths using gradients and cannot provide such guarantees. We evaluate QBitOpt on ImageNet and confirm that we outperform existing fixed and mixed-precision methods under average bitwidth constraints commonly found in the literature.
摘要
“量化神经网络是在移动和嵌入式设备上实现高效推理的最有效方法之一。特别是混合精度量化(MPQ)网络,其各层可以量化到不同的位宽,在相同的资源约束下比统一位宽的网络取得更好的任务性能。然而,寻找最优的位宽分配是一个困难的问题,因为搜索空间随网络层数呈指数增长。在本文中,我们提出了QBitOpt,一种在量化感知训练(QAT)期间更新位宽的新算法。我们将位宽分配问题形式化为约束优化问题。通过在QAT过程中将快速可计算的敏感度与高效的求解器相结合,QBitOpt能够生成任务性能高、且保证满足严格资源约束的混合精度网络。这与现有的通过梯度学习位宽、无法提供此类保证的混合精度方法形成对比。我们在ImageNet上评估了QBitOpt,并证实在文献中常见的平均位宽约束下,其表现优于现有的固定精度和混合精度方法。”
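To make the constrained-allocation idea concrete, here is a hedged sketch of greedy bitwidth allocation under an average-bitwidth budget. The cost model (quantization error roughly halving per extra bit) and the greedy solver are illustrative assumptions; QBitOpt itself computes sensitivities during QAT and uses efficient solvers rather than this toy loop.

```python
# Greedy mixed-precision allocation: repeatedly remove one bit from the
# layer where doing so is predicted to hurt the loss the least, until the
# average-bitwidth budget is satisfied.
def allocate_bitwidths(sensitivities, avg_budget, b_max=8, b_min=2):
    """`sensitivities[i]` ~ loss increase per unit quantization error in layer i."""
    bits = [b_max] * len(sensitivities)
    def drop_cost(i):
        # Assumed cost model: quantization error roughly halves per extra bit.
        return sensitivities[i] * 2.0 ** -(bits[i] - 1)
    while sum(bits) / len(bits) > avg_budget and any(b > b_min for b in bits):
        i = min((j for j in range(len(bits)) if bits[j] > b_min), key=drop_cost)
        bits[i] -= 1
    return bits

# e.g. allocate_bitwidths([0.9, 0.1, 0.4, 0.05], avg_budget=4) gives the
# low-sensitivity layers fewer bits while meeting the 4-bit average budget.
```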
Self-Expanding Neural Networks
results: 研究人员通过实验证明了自扩展神经网络的有效性,在分类和回归问题中均有良好表现。此外,作者还证明了这种网络在合适的架构规模事先高度不确定的情况下同样表现良好。Abstract
The results of training a neural network are heavily dependent on the architecture chosen; and even a modification of only the size of the network, however small, typically involves restarting the training process. In contrast to this, we begin training with a small architecture, only increase its capacity as necessary for the problem, and avoid interfering with previous optimization while doing so. We thereby introduce a natural gradient based approach which intuitively expands both the width and depth of a neural network when this is likely to substantially reduce the hypothetical converged training loss. We prove an upper bound on the "rate" at which neurons are added, and a computationally cheap lower bound on the expansion score. We illustrate the benefits of such Self-Expanding Neural Networks in both classification and regression problems, including those where the appropriate architecture size is substantially uncertain a priori.
摘要
训练神经网络的结果在很大程度上取决于所选的网络架构;即便只是对网络规模做很小的修改,通常也需要重新开始训练。与此相反,我们从一个小架构开始训练,仅在问题需要时增加其容量,并在此过程中避免干扰先前的优化。为此,我们提出一种基于自然梯度的方法,当扩展有望显著降低假设收敛后的训练损失时,它会直观地同时扩展神经网络的宽度和深度。我们证明了神经元添加"速率"的上界,以及一个计算代价低廉的扩展得分下界。我们在分类和回归问题上展示了这种自扩展神经网络的优势,其中包括合适架构规模事先高度不确定的问题。
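The key mechanism that avoids interfering with previous optimization is function-preserving expansion. The sketch below shows one standard way to widen a hidden layer without changing the network's output (the new unit gets zero outgoing weights); the paper's natural-gradient expansion score, which decides *when* to expand, is not reproduced here.

```python
# Function-preserving width expansion: append one hidden unit between two
# Linear layers. Zeroing the new unit's outgoing weights keeps the network's
# output identical, so earlier training is not disturbed.
import torch
import torch.nn as nn

def widen_linear(layer_in: nn.Linear, layer_out: nn.Linear):
    """Add one hidden unit between two Linear layers, preserving the function."""
    new_in = nn.Linear(layer_in.in_features, layer_in.out_features + 1)
    new_out = nn.Linear(layer_out.in_features + 1, layer_out.out_features)
    with torch.no_grad():
        new_in.weight[:-1] = layer_in.weight    # copy existing fan-in weights
        new_in.bias[:-1] = layer_in.bias        # last row keeps its random init
        new_out.weight[:, :-1] = layer_out.weight
        new_out.weight[:, -1] = 0.0             # zero fan-out => same output
        new_out.bias.copy_(layer_out.bias)
    return new_in, new_out
```

After widening, training simply continues; gradients flowing into the zeroed fan-out weights let the new unit become useful without disturbing what was already learned.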
Cluster-Induced Mask Transformers for Effective Opportunistic Gastric Cancer Screening on Non-contrast CT Scans
results: 我们的方法在一个包含100名癌症患者和148名正常人的保留测试集上实现了85.0%的敏感度和92.6%的特异度,优于两名放射科医生的平均敏感度(73.5%)和特异度(84.3%)。此外,我们在一个外部测试集上实现了97.7%的特异度。这表明我们的方法有望成为一种新型、无创、低成本且准确的机会性胃癌筛查手段。Abstract
Gastric cancer is the third leading cause of cancer-related mortality worldwide, but no guideline-recommended screening test exists. Existing methods can be invasive, expensive, and lack sensitivity to identify early-stage gastric cancer. In this study, we explore the feasibility of using a deep learning approach on non-contrast CT scans for gastric cancer detection. We propose a novel cluster-induced Mask Transformer that jointly segments the tumor and classifies abnormality in a multi-task manner. Our model incorporates learnable clusters that encode the texture and shape prototypes of gastric cancer, utilizing self- and cross-attention to interact with convolutional features. In our experiments, the proposed method achieves a sensitivity of 85.0% and specificity of 92.6% for detecting gastric tumors on a hold-out test set consisting of 100 patients with cancer and 148 normal. In comparison, two radiologists have an average sensitivity of 73.5% and specificity of 84.3%. We also obtain a specificity of 97.7% on an external test set with 903 normal cases. Our approach performs comparably to established state-of-the-art gastric cancer screening tools like blood testing and endoscopy, while also being more sensitive in detecting early-stage cancer. This demonstrates the potential of our approach as a novel, non-invasive, low-cost, and accurate method for opportunistic gastric cancer screening.
摘要
胃癌是全球癌症相关死亡的第三大原因,但目前尚无指南推荐的筛查手段。现有方法可能具有侵入性、成本高昂,且对早期胃癌的检出灵敏度不足。在这项研究中,我们探索了在平扫(非增强)CT上使用深度学习方法检测胃癌的可行性。我们提出了一种新的聚类诱导Mask Transformer,以多任务方式同时分割肿瘤并对异常进行分类。我们的模型引入可学习的聚类来编码胃癌的纹理与形状原型,并利用自注意力和交叉注意力与卷积特征交互。实验中,所提方法在包含100名癌症患者和148名正常人的保留测试集上实现了85.0%的敏感度和92.6%的特异度;相比之下,两名放射科医生的平均敏感度为73.5%、特异度为84.3%。我们还在包含903例正常病例的外部测试集上取得了97.7%的特异度。我们的方法与血液检测、内窥镜检查等现有最先进的胃癌筛查工具性能相当,同时对早期胃癌更加敏感。这显示了该方法作为一种新型、无创、低成本且准确的机会性胃癌筛查手段的潜力。
SAGC-A68: a space access graph dataset for the classification of spaces and space elements in apartment buildings
for: The paper is written for researchers and practitioners who are interested in developing Graph Deep Learning (GDL) models for space function and space element classification in the context of building design and analysis.
methods: The paper introduces a new dataset, SAGC-A68, which comprises access graphs automatically generated from 68 digital 3D models of space layouts of apartment buildings. The authors use this dataset to train and evaluate a graph attention network (GAT) that predicts 22 space function and 6 space element classes.
results: The authors demonstrate the potential of the dataset and the GAT model by achieving high accuracy rates on the test set. They also show that the GAT model outperforms other baseline models, indicating the effectiveness of using GDL methods for space function and space element classification.Abstract
The analysis of building models for usable area, building safety, and energy use requires accurate classification data of spaces and space elements. To reduce input model preparation effort and errors, automated classification of spaces and space elements is desirable. A barrier hindering the utilization of Graph Deep Learning (GDL) methods to space function and space element classification is a lack of suitable datasets. To bridge this gap, we introduce a dataset, SAGC-A68, which comprises access graphs automatically generated from 68 digital 3D models of space layouts of apartment buildings. This graph-based dataset is well-suited for developing GDL models for space function and space element classification. To demonstrate the potential of the dataset, we employ it to train and evaluate a graph attention network (GAT) that predicts 22 space function and 6 space element classes. The dataset and code used in the experiment are available online. https://doi.org/10.5281/zenodo.7805872, https://github.com/A2Amir/SAGC-A68.
摘要
分析建筑模型的可用面积、建筑安全和能耗需要准确的空间及空间构件分类数据。为了减少输入模型的准备工作量和错误,自动化的空间与空间构件分类十分可取。阻碍图深度学习(GDL)方法应用于空间功能和空间构件分类的一个障碍是缺乏合适的数据集。为弥补这一空白,我们提出了SAGC-A68数据集,它包含从68个公寓建筑空间布局的数字3D模型中自动生成的可达图。这一基于图的数据集非常适合用于开发面向空间功能和空间构件分类的GDL模型。为展示该数据集的潜力,我们用它训练并评估了一个图注意力网络(GAT),该网络预测22个空间功能类别和6个空间构件类别。实验所用的数据集和代码已在线公开:https://doi.org/10.5281/zenodo.7805872, https://github.com/A2Amir/SAGC-A68。
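For readers who want a concrete starting point, a two-layer graph attention network for node classification on such access graphs might look like the following PyTorch Geometric sketch; the hidden width, head count, and input feature dimension are assumptions, with the 22 space-function classes taken from the text.

```python
# A two-layer GAT for per-node classification on access graphs.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class AccessGraphGAT(torch.nn.Module):
    def __init__(self, in_dim, hidden=64, heads=4, n_classes=22):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden, heads=heads)      # heads are concatenated
        self.gat2 = GATConv(hidden * heads, n_classes, heads=1)

    def forward(self, x, edge_index):
        # x: (num_nodes, in_dim) node features; edge_index: (2, num_edges)
        x = F.elu(self.gat1(x, edge_index))
        return self.gat2(x, edge_index)       # per-node class logits
```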
Improving Heterogeneous Graph Learning with Weighted Mixed-Curvature Product Manifold
results: 该论文通过大量实验表明,WEIGHTED-PM 方法能够从输入数据中学习几何失真更低、质量更高的图表示,并在词相似度学习、top-$k$ 推荐和知识图谱嵌入等多个下游任务中表现更好。Abstract
In graph representation learning, it is important that the complex geometric structure of the input graph, e.g. hidden relations among nodes, is well captured in embedding space. However, standard Euclidean embedding spaces have a limited capacity in representing graphs of varying structures. A promising candidate for the faithful embedding of data with varying structure is product manifolds of component spaces of different geometries (spherical, hyperbolic, or euclidean). In this paper, we take a closer look at the structure of product manifold embedding spaces and argue that each component space in a product contributes differently to expressing structures in the input graph, hence should be weighted accordingly. This is different from previous works which consider the roles of different components equally. We then propose WEIGHTED-PM, a data-driven method for learning embedding of heterogeneous graphs in weighted product manifolds. Our method utilizes the topological information of the input graph to automatically determine the weight of each component in product spaces. Extensive experiments on synthetic and real-world graph datasets demonstrate that WEIGHTED-PM is capable of learning better graph representations with lower geometric distortion from input data, and performs better on multiple downstream tasks, such as word similarity learning, top-$k$ recommendation, and knowledge graph embedding.
摘要
在图表示学习中,让嵌入空间充分刻画输入图的复杂几何结构(例如节点之间的隐含关系)非常重要。然而,标准的欧氏嵌入空间在表示结构多样的图时能力有限。一个有前景的方案是使用由不同几何(球面、双曲或欧氏)的分量空间构成的乘积流形。在本文中,我们深入分析了乘积流形嵌入空间的结构,并指出乘积中的每个分量空间对表达输入图结构的贡献不同,因此应当被赋予相应的权重,这与以往平等看待各分量作用的工作不同。我们进而提出WEIGHTED-PM,一种数据驱动的方法,用于在加权乘积流形中学习异构图的嵌入。该方法利用输入图的拓扑信息自动确定乘积空间中每个分量的权重。在合成和真实图数据集上的大量实验表明,WEIGHTED-PM能够以更低的几何失真学习更好的图表示,并在词相似度学习、top-$k$推荐和知识图谱嵌入等多个下游任务中表现更好。
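The core idea, weighting each component geometry's contribution, can be illustrated with a small numpy sketch of a weighted product-manifold distance. The component geodesic distances below are standard formulas; the fixed weights stand in for the topology-derived weights that WEIGHTED-PM learns.

```python
# Weighted product-manifold distance: each factor space contributes its own
# geodesic distance, combined as sqrt(sum_i w_i * d_i(x_i, y_i)^2).
import numpy as np

def euclidean_dist(u, v):
    return np.linalg.norm(u - v)

def spherical_dist(u, v):
    """Geodesic distance on the unit sphere; u, v assumed unit-norm."""
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def poincare_dist(u, v):
    """Hyperbolic distance in the Poincare ball; u, v inside the unit ball."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / denom)

def weighted_product_dist(xs, ys, dists, weights):
    return np.sqrt(sum(w * d(x, y) ** 2
                       for w, d, x, y in zip(weights, dists, xs, ys)))
```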
An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization
results: 本研究的结果表明,我们的算法可以在几乎所有情况下达到最优的 convergence rate,并且在满足certain condition下,我们的算法可以在随机扰动下达到最优的 convergence rate。此外,我们的分析还证明了在非 convex 随机零次设定下,nonsmooth 优化与 smooth 优化之间存在等价关系。Abstract
We study the complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only noisy function evaluations. Recent works proposed several stochastic zero-order algorithms that solve this task, all of which suffer from a dimension-dependence of $\Omega(d^{3/2})$ where $d$ is the dimension of the problem, which was conjectured to be optimal. We refute this conjecture by providing a faster algorithm that has complexity $O(d\delta^{-1}\epsilon^{-3})$, which is optimal (up to numerical constants) with respect to $d$ and also optimal with respect to the accuracy parameters $\delta,\epsilon$, thus solving an open question due to Lin et al. (NeurIPS'22). Moreover, the convergence rate achieved by our algorithm is also optimal for smooth objectives, proving that in the nonconvex stochastic zero-order setting, nonsmooth optimization is as easy as smooth optimization. We provide algorithms that achieve the aforementioned convergence rate in expectation as well as with high probability. Our analysis is based on a simple yet powerful geometric lemma regarding the Goldstein-subdifferential set, which allows utilizing recent advancements in first-order nonsmooth nonconvex optimization.
摘要
我们研究仅利用带噪函数评估,为既可能不光滑也可能非凸的Lipschitz目标生成$(\delta,\epsilon)$-稳定点的复杂度。近期工作提出了若干解决该任务的随机零阶算法,但它们都带有$\Omega(d^{3/2})$的维度依赖($d$为问题维度),且这一依赖曾被猜想为最优。我们推翻了这一猜想,给出一个更快的算法,其复杂度为$O(d\delta^{-1}\epsilon^{-3})$,在$d$以及精度参数$\delta,\epsilon$方面均为最优(至多相差数值常数),从而解决了Lin等人(NeurIPS'22)提出的公开问题。此外,我们算法达到的收敛速率对光滑目标同样最优,这证明在非凸随机零阶设定下,非光滑优化与光滑优化同样容易。我们给出了在期望意义和高概率意义下达到上述收敛速率的算法。我们的分析基于一个关于Goldstein次微分集的简单而有力的几何引理,使我们得以利用一阶非光滑非凸优化的最新进展。
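Algorithms in this setting are built on gradient estimates computed from function values alone. The following sketch shows the standard two-point randomized estimator such methods rely on; the paper's contribution is the algorithm and analysis wrapped around estimates of this kind, not this primitive itself.

```python
# Two-point zero-order gradient estimator: estimates the gradient of the
# delta-smoothed surrogate of f using only two (possibly noisy) evaluations.
import numpy as np

def two_point_grad_estimate(f, x, delta, rng):
    """Randomized finite-difference estimate of the smoothed gradient at x."""
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)                 # uniform random direction on the sphere
    return x.size * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

# Usage sketch:
# rng = np.random.default_rng(0)
# g = two_point_grad_estimate(lambda z: np.abs(z).sum(), np.ones(10), 0.1, rng)
```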
Geometric Constraints in Probabilistic Manifolds: A Bridge from Molecular Dynamics to Structured Diffusion Processes
results: 该方法有助于在基于深度学习的药物设计中保持特定的分子特征相互作用,从而实现预期的治疗效果并保障安全性。Abstract
Understanding the macroscopic characteristics of biological complexes demands precision and specificity in statistical ensemble modeling. One of the primary challenges in this domain lies in sampling from particular subsets of the state-space, driven either by existing structural knowledge or specific areas of interest within the state-space. We propose a method that enables sampling from distributions that rigorously adhere to arbitrary sets of geometric constraints in Euclidean spaces. This is achieved by integrating a constraint projection operator within the well-regarded architecture of Denoising Diffusion Probabilistic Models, a framework founded in generative modeling and probabilistic inference. The significance of this work becomes apparent, for instance, in the context of deep learning-based drug design, where it is imperative to maintain specific molecular profile interactions to realize the desired therapeutic outcomes and guarantee safety.
摘要
理解生物复合体的宏观特征需要统计系综建模具备精确性与特异性。该领域的主要挑战之一在于从状态空间的特定子集中采样,其动机或是已有的结构知识,或是状态空间中特定的感兴趣区域。我们提出一种方法,能够在欧氏空间中从严格满足任意几何约束集合的分布中采样。这是通过将约束投影算子整合进广受认可的去噪扩散概率模型(Denoising Diffusion Probabilistic Models)架构来实现的,该框架植根于生成建模与概率推断。这项工作的意义在基于深度学习的药物设计等场景中尤为明显:在这些场景中,必须保持特定的分子特征相互作用,才能实现预期的治疗效果并保证安全性。
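Structurally, the proposal amounts to interleaving a projection with the learned reverse diffusion. A minimal sketch follows, with `denoise_step` and `project` as placeholders for the trained DDPM reverse kernel and the geometric constraint projection:

```python
# DDPM sampling with a constraint projection inserted after every reverse
# step, so the final sample lies in the feasible set by construction.
import torch

def constrained_ddpm_sample(denoise_step, project, shape, n_steps):
    x = torch.randn(shape)                 # start from pure noise
    for t in reversed(range(n_steps)):
        x = denoise_step(x, t)             # one learned reverse (denoising) step
        x = project(x)                     # enforce the geometric constraints
    return x

# e.g. `project` could clamp bond lengths and angles of a molecular
# conformation onto their allowed ranges after each step.
```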
Invertible Low-Dimensional Modelling of X-ray Absorption Spectra for Potential Applications in Spectral X-ray Imaging
methods: 该论文提出了一种新模型,将深度神经网络自编码器与基于奇异值分解(SVD)的最优线性模型相结合。
results: 作者将这种新方法与其他线性和非线性方法进行了比较,包括一种稀疏模型和另一种深度学习模型。结果显示,在建模感兴趣能量范围内含K边(K-edge)的X射线吸收谱时,新方法表现更优。Abstract
X-ray interaction with matter is an energy-dependent process that is contingent on the atomic structure of the constituent material elements. The most advanced models to capture this relationship currently rely on Monte Carlo (MC) simulations. Whilst these models are very accurate, in many problems in spectral X-ray imaging, such as data compression, noise removal, spectral estimation, and the quantitative measurement of material compositions, they are of limited use, as these applications typically require the efficient inversion of the model, that is, they require the estimation of the best model parameters for a given spectral measurement. Current models that can be easily inverted, however, typically only work when modelling spectra in regions away from their K-edges, so they have limited utility when modelling a wider range of materials. In this paper, we thus propose a novel, non-linear model that combines a deep neural network autoencoder with an optimal linear model based on the Singular Value Decomposition (SVD). We compare our new method to other alternative linear and non-linear approaches, a sparse model and an alternative deep learning model. We demonstrate the advantages of our method over traditional models, especially when modelling X-ray absorption spectra that contain K-edges in the energy range of interest.
摘要
在本文中,我们据此提出了一种新的非线性模型,它结合了深度神经网络自编码器与基于奇异值分解(SVD)的最优线性模型。我们将该方法与其他线性和非线性方法进行了比较,包括一种稀疏模型和另一种深度学习模型。结果表明,相比传统模型,我们的方法更具优势,尤其是在建模感兴趣能量范围内含K边的X射线吸收谱时。
Badgers: generating data quality deficits with Python
results: 这篇论文提出的"badgers"库可以为不同模态的数据生成多种类型的数据质量缺陷,以便对数据驱动应用的数据质量进行实验性评估。文档见 https://fraunhofer-iese.github.io/badgers/ ,源代码见 https://github.com/Fraunhofer-IESE/badgers 。Abstract
Generating context specific data quality deficits is necessary to experimentally assess data quality of data-driven (artificial intelligence (AI) or machine learning (ML)) applications. In this paper we present badgers, an extensible open-source Python library to generate data quality deficits (outliers, imbalanced data, drift, etc.) for different modalities (tabular data, time-series, text, etc.). The documentation is accessible at https://fraunhofer-iese.github.io/badgers/ and the source code at https://github.com/Fraunhofer-IESE/badgers
摘要
生成特定场景下的数据质量缺陷,是对数据驱动(人工智能(AI)或机器学习(ML))应用的数据质量进行实验评估的必要条件。本文介绍badgers,一个可扩展的开源Python库,用于为不同模态的数据(表格数据、时间序列、文本等)生成各类数据质量缺陷(异常值、类别不均衡、漂移等)。文档见 https://fraunhofer-iese.github.io/badgers/ ,源代码见 https://github.com/Fraunhofer-IESE/badgers 。
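badgers' own API is not reproduced here; the numpy sketch below merely illustrates the kind of transforms such a generator applies, injecting outliers and a gradual drift into tabular data so that downstream pipelines can be stress-tested.

```python
# Illustrative data-quality-deficit generators (not the badgers API):
# inject extreme outliers into a fraction of rows, and add a slow mean
# drift over the row order (treated as "time").
import numpy as np

def inject_outliers(X, frac=0.05, scale=10.0, rng=None):
    rng = rng or np.random.default_rng(0)
    X = X.copy()
    idx = rng.choice(len(X), int(frac * len(X)), replace=False)
    X[idx] += scale * X.std(axis=0)          # push the selected rows far out
    return X

def inject_drift(X, strength=2.0):
    ramp = np.linspace(0.0, strength, len(X))[:, None]
    return X + ramp * X.std(axis=0)          # per-feature mean shifts over time
```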
Multi-modal Graph Learning over UMLS Knowledge Graphs
results: 在表示多模态医学概念方面优于现有架构,并证明了融入先验医学知识对多模态医学概念表示的重要意义。
for: Predicting the progression of patient illnesses across multiple hospital visits
methods: Using graph neural networks to learn meaningful representations of medical concepts, and combining them to represent entire patient visits.
results: Outperforming existing architectures in representing multiple modalities of medical concepts, and demonstrating the significance of incorporating prior medical knowledge.Abstract
Clinicians are increasingly looking towards machine learning to gain insights about patient evolutions. We propose a novel approach named Multi-Modal UMLS Graph Learning (MMUGL) for learning meaningful representations of medical concepts using graph neural networks over knowledge graphs based on the unified medical language system. These representations are aggregated to represent entire patient visits and then fed into a sequence model to perform predictions at the granularity of multiple hospital visits of a patient. We improve performance by incorporating prior medical knowledge and considering multiple modalities. We compare our method to existing architectures proposed to learn representations at different granularities on the MIMIC-III dataset and show that our approach outperforms these methods. The results demonstrate the significance of multi-modal medical concept representations based on prior medical knowledge.
摘要
临床医生越来越希望借助机器学习洞察患者的病情演变。我们提出了一种名为多模态UMLS图学习(MMUGL)的新方法,它在基于统一医学语言系统(UMLS)的知识图谱上,利用图神经网络学习有意义的医学概念表示。这些表示被聚合以表征整次就诊,随后输入序列模型,在患者多次住院就诊的粒度上进行预测。我们通过融入先验医学知识并考虑多种模态来提升性能。我们在MIMIC-III数据集上与现有的在不同粒度上学习表示的架构进行了比较,结果显示我们的方法优于这些方法。这一结果表明基于先验医学知识的多模态医学概念表示具有重要意义。
Invex Programs: First Order Algorithms and Their Convergence
paper_authors: Adarsh Barik, Suvrit Sra, Jean Honorio
for: 求解invex(不变凸)问题,即所有驻点都是全局最小值的一类非凸问题。
methods: 提出了新的一阶算法,并给出了其收敛的充分条件与收敛速率。
results: 提出了一种新的投影梯度法,可在具备收敛速率保证的前提下求解带约束的invex问题。Abstract
Invex programs are a special kind of non-convex problems which attain global minima at every stationary point. While classical first-order gradient descent methods can solve them, they converge very slowly. In this paper, we propose new first-order algorithms to solve the general class of invex problems. We identify sufficient conditions for convergence of our algorithms and provide rates of convergence. Furthermore, we go beyond unconstrained problems and provide a novel projected gradient method for constrained invex programs with convergence rate guarantees. We compare and contrast our results with existing first-order algorithms for a variety of unconstrained and constrained invex problems. To the best of our knowledge, our proposed algorithm is the first algorithm to solve constrained invex programs.
摘要
“invex问题是一类特殊的非凸问题,其每个驻点都是全局最小值。经典的一阶梯度下降法虽然能够求解这类问题,但收敛非常缓慢。在本文中,我们提出了若干新的一阶算法来求解一般的invex问题。我们给出了算法收敛的充分条件,并提供了收敛速率。此外,我们不仅处理无约束问题,还提出了一种新的投影梯度法,用于求解带约束的invex问题并给出收敛速率保证。我们将所得结果与现有的一阶算法在多种无约束和带约束invex问题上进行了比较和对照。据我们所知,我们提出的算法是首个能够求解带约束invex问题的算法。”
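The projected-gradient template the paper analyzes is the classical loop below; the novelty lies in the convergence guarantees for invex objectives, not in the loop itself. `grad` and `project` (Euclidean projection onto the feasible set) are supplied by the user.

```python
# Classical projected gradient descent: take a gradient step, then project
# back onto the constraint set.
import numpy as np

def projected_gradient(grad, project, x0, lr=0.1, n_iters=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = project(x - lr * grad(x))      # gradient step followed by projection
    return x

# e.g. project = lambda x: np.clip(x, 0, None) enforces nonnegativity.
```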
Graph Convolutional Networks for Simulating Multi-phase Flow and Transport in Porous Media
for: numerical simulation of multi-phase fluid dynamics in porous media
methods: data-driven surrogate modeling using Graph Convolutional Networks (GCNs)
results: high accuracy in predicting pressure and saturation states, and generalization to irregular domain geometries and unstructured meshes.
results: 高精度地预测压力和饱和度状态的演变,并能泛化到训练集中未出现的不规则域几何与非结构化网格。Abstract
Numerical simulation of multi-phase fluid dynamics in porous media is critical for many subsurface applications. Data-driven surrogate modeling provides computationally inexpensive alternatives to high-fidelity numerical simulators. While the commonly used convolutional neural networks (CNNs) are powerful in approximating partial differential equation solutions, it remains challenging for CNNs to handle irregular and unstructured simulation meshes. However, subsurface simulation models often involve unstructured meshes with complex mesh geometries, which limits the application of CNNs. To address this challenge, here we construct surrogate models based on Graph Convolutional Networks (GCNs) to approximate the spatial-temporal solutions of multi-phase flow and transport processes. We propose a new GCN architecture suited to the hyperbolic character of the coupled PDE system, to better capture the saturation dynamics. Results of 2D heterogeneous test cases show that our surrogates predict the evolutions of the pressure and saturation states with high accuracy, and the predicted rollouts remain stable for multiple timesteps. Moreover, the GCN-based models generalize well to irregular domain geometries and unstructured meshes that are unseen in the training dataset.
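A hedged sketch of what such a GCN surrogate can look like: the network maps the current per-node state (e.g., pressure and saturation) on the mesh graph to the next state and is applied autoregressively for a rollout. The two-layer residual architecture and dimensions are illustrative assumptions, not the paper's hyperbolic-PDE-tailored design.

```python
# Autoregressive GCN surrogate for multi-phase flow on an unstructured mesh,
# represented as a graph with per-node state features.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class FlowSurrogate(torch.nn.Module):
    def __init__(self, n_state=2, hidden=128):   # e.g. (pressure, saturation)
        super().__init__()
        self.conv1 = GCNConv(n_state, hidden)
        self.conv2 = GCNConv(hidden, n_state)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return x + self.conv2(h, edge_index)      # predict the state increment

@torch.no_grad()
def rollout(model, x0, edge_index, n_steps):
    """Apply the one-step surrogate repeatedly to produce a trajectory."""
    states = [x0]
    for _ in range(n_steps):
        states.append(model(states[-1], edge_index))
    return torch.stack(states)
```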
Learning Behavioral Representations of Routines From Large-scale Unlabeled Wearable Time-series Data Streams using Hawkes Point Process
results: 研究人员通过使用该方法,成功地探索了100多名参与者的日常活动模式,并发现了每天的活动转换关系。此外,该方法还能够捕捉到各个人的个性特征和情绪变化。Abstract
Continuously-worn wearable sensors enable researchers to collect copious amounts of rich bio-behavioral time series recordings of real-life activities of daily living, offering unprecedented opportunities to infer novel human behavior patterns during daily routines. Existing approaches to routine discovery through bio-behavioral data rely either on pre-defined notions of activities or use additional non-behavioral measurements as contexts, such as GPS location or localization within the home, presenting risks to user privacy. In this work, we propose a novel wearable time-series mining framework, Hawkes point process On Time series clusters for ROutine Discovery (HOT-ROD), for uncovering behavioral routines from completely unlabeled wearable recordings. We utilize a covariance-based method to generate time-series clusters and discover routines via the Hawkes point process learning algorithm. We empirically validate our approach for extracting routine behaviors using a completely unlabeled time-series collected continuously from over 100 individuals both in and outside of the workplace during a period of ten weeks. Furthermore, we demonstrate this approach intuitively captures daily transitional relationships between physical activity states without using prior knowledge. We also show that the learned behavioral patterns can assist in illuminating an individual's personality and affect.
摘要
持续佩戴的可穿戴传感器使研究人员能够采集大量内容丰富的日常生活行为生理时间序列记录,为发现日常作息中的新型人类行为模式提供了前所未有的机会。现有通过生理行为数据发现作息规律的方法,要么依赖预先定义的活动概念,要么借助GPS位置或室内定位等额外的非行为测量作为上下文,存在用户隐私风险。在这项工作中,我们提出了一种新的可穿戴时间序列挖掘框架HOT-ROD(Hawkes point process On Time series clusters for ROutine Discovery),用于从完全无标注的可穿戴记录中挖掘行为作息。我们使用基于协方差的方法生成时间序列簇,并通过Hawkes点过程学习算法发现作息规律。我们利用100余名参与者在工作场所内外连续采集十周、完全无标注的时间序列,实证验证了该方法提取日常行为规律的能力。此外,我们还展示了该方法能在不借助先验知识的情况下,直观地捕捉身体活动状态之间的每日转换关系,且所学到的行为模式有助于揭示个体的个性与情绪。
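The modeling backbone here is the Hawkes point process, whose defining feature is self-excitation: past events raise the instantaneous rate of future ones. Below is a minimal univariate, exponential-kernel version (one common parameterization, not HOT-ROD's exact likelihood code).

```python
# Hawkes conditional intensity with an exponential kernel:
# lambda(t) = mu + alpha * sum_{t_i < t} beta * exp(-beta * (t - t_i))
import numpy as np

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """`event_times` is a numpy array of past event timestamps."""
    past = event_times[event_times < t]
    return mu + alpha * np.sum(beta * np.exp(-beta * (t - past)))

# Fitting (mu, alpha, beta) per transition between activity clusters
# recovers which routines tend to trigger which.
```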
results: 实验表明,DCA-NAS在模型规模相近的情况下优于人工设计的架构,并在CIFAR-10、CIFAR-100、Imagenet-1k等多个图像分类数据集上与流行的移动端架构性能相当。此外,DCA-NAS还可以在Hardware-NAS-Bench上进行面向特定硬件的架构搜索,找到推理延迟低且性能达到最先进水平的架构。Abstract
Edge computing aims to enable edge devices, such as IoT devices, to process data locally instead of relying on the cloud. However, deep learning techniques like computer vision and natural language processing can be computationally expensive and memory-intensive. Creating manual architectures specialized for each device is infeasible due to their varying memory and computational constraints. To address these concerns, we automate the construction of task-specific deep learning architectures optimized for device constraints through Neural Architecture Search (NAS). We present DCA-NAS, a principled method of fast neural network architecture search that incorporates edge-device constraints such as model size and floating-point operations. It incorporates weight sharing and channel bottleneck techniques to speed up the search time. Based on our experiments, we see that DCA-NAS outperforms manual architectures for similar sized models and is comparable to popular mobile architectures on various image classification datasets like CIFAR-10, CIFAR-100, and Imagenet-1k. Experiments with search spaces -- DARTS and NAS-Bench-201 show the generalization capabilities of DCA-NAS. On further evaluating our approach on Hardware-NAS-Bench, device-specific architectures with low inference latency and state-of-the-art performance were discovered.
摘要
边缘计算旨在让物联网设备等边缘设备在本地处理数据,而不依赖云端。然而,计算机视觉和自然语言处理等深度学习技术可能计算量大、内存占用高;而各设备的内存与算力约束不尽相同,为每种设备手工设计专用架构并不可行。为了解决这些问题,我们通过神经架构搜索(NAS)自动构建针对设备约束优化的任务专用深度学习架构。我们提出DCA-NAS,一种有原则的快速神经架构搜索方法,将模型大小和浮点运算次数等边缘设备约束纳入其中,并利用权重共享和通道瓶颈技术加快搜索速度。实验显示,DCA-NAS在模型规模相近时优于人工设计的架构,并在CIFAR-10、CIFAR-100、Imagenet-1k等多个图像分类数据集上与流行的移动端架构性能相当。在DARTS和NAS-Bench-201搜索空间上的实验证明了DCA-NAS的泛化能力。在Hardware-NAS-Bench上的进一步评估中,我们发现了推理延迟低且性能达到最先进水平的设备专用架构。
results: 实验表明,所提出的方法能够找到比现有模型表现更好的认知诊断模型,同时保持与人工设计模型相当的可解释性。Abstract
Cognitive diagnosis plays a vital role in modern intelligent education platforms to reveal students' proficiency in knowledge concepts for subsequent adaptive tasks. However, due to the requirement of high model interpretability, existing manually designed cognitive diagnosis models hold too simple architectures to meet the demand of current intelligent education systems, where the bias of human design also limits the emergence of effective cognitive diagnosis models. In this paper, we propose to automatically design novel cognitive diagnosis models by evolutionary multi-objective neural architecture search (NAS). Specifically, we observe existing models can be represented by a general model handling three given types of inputs and thus first design an expressive search space for the NAS task in cognitive diagnosis. Then, we propose multi-objective genetic programming (MOGP) to explore the NAS task's search space by maximizing model performance and interpretability. In the MOGP design, each architecture is transformed into a tree architecture and encoded by a tree for easy optimization, and a tailored genetic operation based on four sub-genetic operations is devised to generate offspring effectively. Besides, an initialization strategy is also suggested to accelerate the convergence by evolving half of the population from existing models' variants. Experiments on two real-world datasets demonstrate that the cognitive diagnosis models searched by the proposed approach exhibit significantly better performance than existing models and also hold as good interpretability as human-designed models.
摘要
现代智能教育平台中,认知诊断发挥着关键作用,用于揭示学生对知识概念的掌握程度,以便开展后续的自适应任务。然而,由于对模型可解释性的高要求,现有人工设计的认知诊断模型结构过于简单,难以满足当前智能教育系统的需求,人工设计中的偏差也限制了高效认知诊断模型的出现。在本文中,我们提出通过演化多目标神经架构搜索(NAS)来自动设计新的认知诊断模型。具体而言,我们观察到现有模型可以表示为一个处理三类给定输入的通用模型,并据此首先为认知诊断中的NAS任务设计了一个表达能力强的搜索空间;随后提出多目标遗传编程(MOGP),通过同时最大化模型性能与可解释性来探索该搜索空间。在MOGP设计中,每个架构被转化为树形结构并以树进行编码以便优化,并基于四种子遗传操作设计了定制的遗传操作以高效产生后代;此外还提出一种初始化策略,通过从现有模型的变体演化出一半种群来加速收敛。在两个真实数据集上的实验表明,该方法搜索得到的认知诊断模型性能显著优于现有模型,且可解释性与人工设计的模型相当。
Observation of high-energy neutrinos from the Galactic plane
paper_authors: R. Abbasi, M. Ackermann, J. Adams, J. A. Aguilar, M. Ahlers, M. Ahrens, J. M. Alameddine, A. A. Alves Jr., N. M. Amin, K. Andeen, T. Anderson, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. Axani, X. Bai, A. Balagopal V., S. W. Barwick, V. Basu, S. Baur, R. Bay, J. J. Beatty, K. -H. Becker, J. Becker Tjus, J. Beise, C. Bellenghi, S. Benda, S. BenZvi, D. Berley, E. Bernardini, D. Z. Besson, G. Binder, D. Bindig, E. Blaufuss, S. Blot, M. Boddenberg, F. Bontempo, J. Y. Book, J. Borowka, S. Böser, O. Botner, J. Böttcher, E. Bourbeau, F. Bradascio, J. Braun, B. Brinson, S. Bron, J. Brostean-Kaiser, R. T. Burley, R. S. Busse, M. A. Campana, E. G. Carnie-Bronca, C. Chen, Z. Chen, D. Chirkin, K. Choi, B. A. Clark, K. Clark, L. Classen, A. Coleman, G. H. Collin, A. Connolly, J. M. Conrad, P. Coppin, P. Correa, D. F. Cowen, R. Cross, C. Dappen, P. Dave, C. De Clercq, J. J. DeLaunay, D. Delgado López, H. Dembinski, K. Deoskar, A. Desai, P. Desiati, K. D. de Vries, G. de Wasseige, T. DeYoung, A. Diaz, J. C. Díaz-Vélez, M. Dittmer, H. Dujmovic, M. Dunkman, M. A. DuVernois, T. Ehrhardt, P. Eller, R. Engel, H. Erpenbeck, J. Evans, P. A. Evenson, K. L. Fan, A. R. Fazely, A. Fedynitch, N. Feigl, S. Fiedlschuster, A. T. Fienberg, C. Finley, L. Fischer, D. Fox, A. Franckowiak, E. Friedman, A. Fritz, P. Fürst, T. K. Gaisser, J. Gallagher, E. Ganster, A. Garcia, S. Garrappa, L. Gerhardt, A. Ghadimi, C. Glaser, T. Glauch, T. Glüsenkamp, N. Goehlke, A. Goldschmidt, J. G. Gonzalez, S. Goswami, D. Grant, T. Grégoire, S. Griswold, C. Günther, P. Gutjahr, C. Haack, A. Hallgren, R. Halliday, L. Halve, F. Halzen, M. Ha Minh, K. Hanson, J. Hardin, A. A. Harnisch, A. Haungs, K. Helbing, F. Henningsen, E. C. Hettinger, S. Hickford, J. Hignight, C. Hill, G. C. Hill, K. D. Hoffman, K. Hoshina, W. Hou, F. Huang, M. Huber, T. Huber, K. Hultqvist, M. Hünnefeld, R. Hussain, K. Hymon, S. In, N. Iovine, A. Ishihara, M. Jansson, G. S. Japaridze, M. Jeong, M. Jin, B. J. P. Jones, D. Kang, W. Kang, X. Kang, A. Kappes, D. Kappesser, L. Kardum, T. Karg, M. Karl, A. Karle, U. Katz, M. Kauer, M. Kellermann, J. L. Kelley, A. Kheirandish, K. Kin, J. Kiryluk, S. R. Klein, A. Kochocki, R. Koirala, H. Kolanoski, T. Kontrimas, L. Köpke, C. Kopper, S. Kopper, D. J. Koskinen, P. Koundal, M. Kovacevich, M. Kowalski, T. Kozynets, E. Krupczak, E. Kun, N. Kurahashi, N. Lad, C. Lagunas Gualda, J. L. Lanfranchi, M. J. Larson, F. Lauber, J. P. Lazar, J. W. Lee, K. Leonard, A. Leszczyńska, Y. Li, M. Lincetto, Q. R. Liu, M. Liubarska, E. Lohfink, C. J. Lozano Mariscal, L. Lu, F. Lucarelli, A. Ludwig, W. Luszczak, Y. Lyu, W. Y. Ma, J. Madsen, K. B. M. Mahn, Y. Makino, S. Mancina, I. C. Mariş, I. Martinez-Soler, R. Maruyama, S. McCarthy, T. McElroy, F. McNally, J. V. Mead, K. Meagher, S. Mechbal, A. Medina, M. Meier, S. Meighen-Berger, Y. Merckx, J. Micallef, D. Mockler, T. Montaruli, R. W. Moore, K. Morik, R. Morse, M. Moulai, T. Mukherjee, R. Naab, R. Nagai, R. Nahnhauer, U. Naumann, J. Necker, L. V. Nguyen, H. Niederhausen, M. U. Nisa, S. C. Nowicki, D. Nygren, A. Obertacke Pollmann, M. Oehler, B. Oeyen, A. Olivas, E. O’Sullivan, H. Pandya, D. V. Pankova, N. Park, G. K. Parker, E. N. Paudel, L. Paul, C. Pérez de los Heros, L. Peters, J. Peterson, S. Philippen, S. Pieper, A. Pizzuto, M. Plum, Y. Popovych, A. Porcelli, M. Prado Rodriguez, B. Pries, G. T. Przybylski, C. Raab, J. Rack-Helleis, A. Raissi, M. Rameez, K. Rawlins, I. C. Rea, Z. Rechav, A. Rehman, P. Reichherzer, R. Reimann, G. Renzi, E. Resconi, S. Reusch, W. Rhode, M. 
Richman, B. Riedel, E. J. Roberts, S. Robertson, G. Roellinghoff, M. Rongen, C. Rott, T. Ruhe, D. Ryckbosch, D. Rysewyk Cantu, I. Safa, J. Saffer, D. Salazar-Gallegos, P. Sampathkumar, S. E. Sanchez Herrera, A. Sandrock, M. Santander, S. Sarkar, S. Sarkar, K. Satalecka, M. Schaufel, H. Schieler, S. Schindler, T. Schmidt, A. Schneider, J. Schneider, F. G. Schröder, L. Schumacher, G. Schwefer, S. Sclafani, D. Seckel, S. Seunarine, A. Sharma, S. Shefali, N. Shimizu, M. Silva, B. Skrzypek, B. Smithers, R. Snihur, J. Soedingrekso, A. Sogaard, D. Soldin, C. Spannfellner, G. M. Spiczak, C. Spiering, M. Stamatikos, T. Stanev, R. Stein, J. Stettner, T. Stezelberger, B. Stokstad, T. Stürwald, T. Stuttard, G. W. Sullivan, I. Taboada, S. Ter-Antonyan, J. Thwaites, S. Tilav, F. Tischbein, K. Tollefson, C. Tönnis, S. Toscano, D. Tosi, A. Trettin, M. Tselengidou, C. F. Tung, A. Turcati, R. Turcotte, C. F. Turley, J. P. Twagirayezu, B. Ty, M. A. Unland Elorrieta, N. Valtonen-Mattila, J. Vandenbroucke, N. van Eijndhoven, D. Vannerom, J. van Santen, J. Veitch-Michaelis, S. Verpoest, C. Walck, W. Wang, T. B. Watson, C. Weaver, P. Weigel, A. Weindl, M. J. Weiss, J. Weldert, C. Wendt, J. Werthebach, M. Weyrauch, N. Whitehorn, C. H. Wiebusch, N. Willey, D. R. Williams, M. Wolf, G. Wrede, J. Wulff, X. W. Xu, J. P. Yanez, E. Yildizci, S. Yoshida, S. Yu, T. Yuan, Z. Zhang, P. Zhelnin
results: 研究在相对于纯背景假设4.5$\sigma$的显著性水平上发现了来自银河系盘面的中微子发射;该信号与银河系盘面的弥散发射模型一致,但也可能来自一群未被分辨的点源。Abstract
The origin of high-energy cosmic rays, atomic nuclei that continuously impact Earth's atmosphere, has been a mystery for over a century. Due to deflection in interstellar magnetic fields, cosmic rays from the Milky Way arrive at Earth from random directions. However, near their sources and during propagation, cosmic rays interact with matter and produce high-energy neutrinos. We search for neutrino emission using machine learning techniques applied to ten years of data from the IceCube Neutrino Observatory. We identify neutrino emission from the Galactic plane at the 4.5$\sigma$ level of significance, by comparing diffuse emission models to a background-only hypothesis. The signal is consistent with modeled diffuse emission from the Galactic plane, but could also arise from a population of unresolved point sources.
摘要
高能宇宙线(持续撞击地球大气层的原子核)的起源一个多世纪以来一直是个谜。由于星际磁场的偏转,来自银河系的宇宙线以随机方向到达地球。然而,宇宙线在其源头附近及传播过程中会与物质相互作用并产生高能中微子。我们将机器学习技术应用于IceCube中微子天文台十年的数据来搜寻中微子发射,通过将弥散发射模型与纯背景假设相比较,在4.5$\sigma$显著性水平上认证了来自银河系盘面的中微子发射。该信号与银河系盘面弥散发射的模型预期一致,但也可能来自一群未被分辨的点源。
Handling Group Fairness in Federated Learning Using Augmented Lagrangian Approach
results: 在CelebA和ImSitu数据集上的实验表明,在统计异质性和不同客户端数量的条件下,所提方法能够在定量和定性两方面改善群体公平性,且精度损失极小。Abstract
Federated learning (FL) has garnered considerable attention due to its privacy-preserving feature. Nonetheless, the lack of freedom in managing user data can lead to group fairness issues, where models might be biased towards sensitive factors such as race or gender, even if they are trained using a legally compliant process. To redress this concern, this paper proposes a novel FL algorithm designed explicitly to address group fairness issues. We show empirically on CelebA and ImSitu datasets that the proposed method can improve fairness both quantitatively and qualitatively with minimal loss in accuracy in the presence of statistical heterogeneity and with different numbers of clients. Besides improving fairness, the proposed FL algorithm is compatible with local differential privacy (LDP), has negligible communication costs, and results in minimal overhead when migrating existing FL systems from the common FL protocol such as FederatedAveraging (FedAvg). We also provide the theoretical convergence rate guarantee for the proposed algorithm and the required noise level of the Gaussian mechanism to achieve desired LDP. This innovative approach holds significant potential to enhance the fairness and effectiveness of FL systems, particularly in sensitive applications such as healthcare or criminal justice.
摘要
联邦学习(FL)因其隐私保护特性而备受关注。然而,由于无法自由管理用户数据,即便训练流程合法合规,模型仍可能对种族、性别等敏感因素产生偏差,引发群体公平问题。为了解决这一问题,本文提出了一种专门针对群体公平问题设计的新型联邦学习算法。我们在CelebA和ImSitu数据集上的实验表明,在统计异质性和不同客户端数量的条件下,所提方法能够在定量和定性两方面改善公平性,且精度损失极小。除提升公平性外,该算法还兼容本地差分隐私(LDP),通信开销可忽略不计,并且从FederatedAveraging(FedAvg)等常见联邦学习协议迁移现有系统时开销极小。我们还给出了算法的理论收敛速率保证,以及高斯机制为达到期望LDP所需的噪声水平。这一创新方法有望提升联邦学习系统的公平性与有效性,尤其是在医疗或刑事司法等敏感应用中。
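The augmented Lagrangian machinery in the title can be sketched generically for a fairness constraint c(theta) <= 0: alternate minimization over the model with dual ascent on the multiplier. The centralized loop below illustrates the idea only; the paper embeds it in the federated setting, and `loss_fn`/`fairness_gap` are placeholders.

```python
# Augmented Lagrangian training under a fairness constraint c(theta) <= 0,
# e.g. a bound on the demographic disparity of the model's predictions.
import torch

def augmented_lagrangian_train(model, loss_fn, fairness_gap, data,
                               rho=10.0, outer=20, inner=100, lr=1e-3):
    lam = torch.tensor(0.0)                         # Lagrange multiplier
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(outer):
        for _ in range(inner):                      # primal: minimize over theta
            opt.zero_grad()
            c = fairness_gap(model, data)           # constraint value c(theta)
            obj = (loss_fn(model, data)
                   + lam * c
                   + 0.5 * rho * c.clamp(min=0) ** 2)
            obj.backward()
            opt.step()
        with torch.no_grad():                       # dual ascent on lambda
            lam = (lam + rho * fairness_gap(model, data)).clamp(min=0)
    return model
```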
Episodic Gaussian Process-Based Learning Control with Vanishing Tracking Errors
methods: 这篇论文使用高斯过程回归,它具有高数据效率和显式的不确定性表示,从而可以推导预测误差界。
results: 论文提出了一个随数据密度增长而衰减的贝叶斯预测误差界,并据此证明了随时间变化的跟踪精度保证:随着数据密度的增加,跟踪误差趋于消失。Abstract
Due to the increasing complexity of technical systems, accurate first principle models can often not be obtained. Supervised machine learning can mitigate this issue by inferring models from measurement data. Gaussian process regression is particularly well suited for this purpose due to its high data-efficiency and its explicit uncertainty representation, which allows the derivation of prediction error bounds. These error bounds have been exploited to show tracking accuracy guarantees for a variety of control approaches, but their direct dependency on the training data is generally unclear. We address this issue by deriving a Bayesian prediction error bound for GP regression, which we show to decay with the growth of a novel, kernel-based measure of data density. Based on the prediction error bound, we prove time-varying tracking accuracy guarantees for learned GP models used as feedback compensation of unknown nonlinearities, and show to achieve vanishing tracking error with increasing data density. This enables us to develop an episodic approach for learning Gaussian process models, such that an arbitrary tracking accuracy can be guaranteed. The effectiveness of the derived theory is demonstrated in several simulations.
Comparison of Point Cloud and Image-based Models for Calorimeter Fast Simulation
results: 研究发现,使用点云来表示calorimeter shower比使用3D voxels更加自然地处理稀疏数据集,并可以使用更加紧凑的模型和数据文件。Abstract
Score based generative models are a new class of generative models that have been shown to accurately generate high dimensional calorimeter datasets. Recent advances in generative models have used images with 3D voxels to represent and model complex calorimeter showers. Point clouds, however, are likely a more natural representation of calorimeter showers, particularly in calorimeters with high granularity. Point clouds preserve all of the information of the original simulation, more naturally deal with sparse datasets, and can be implemented with more compact models and data files. In this work, two state-of-the-art score based models are trained on the same set of calorimeter simulation and directly compared.
摘要
基于分数的生成模型是一类新型生成模型,已被证明能够准确生成高维量能器数据集。近期的生成模型使用带三维体素的图像来表示和建模复杂的量能器簇射。然而,点云可能是量能器簇射更自然的表示,尤其是在高粒度量能器中:点云保留了原始模拟的全部信息,能更自然地处理稀疏数据集,并且可以用更紧凑的模型和数据文件实现。在这项工作中,两种最先进的基于分数的模型在同一组量能器模拟数据上训练并直接比较。
CT-based Subchondral Bone Microstructural Analysis in Knee Osteoarthritis via MR-Guided Distillation Learning
for: The paper aims to develop a novel method for subchondral bone microstructural analysis using easily-acquired CT images, which can help diagnose knee osteoarthritis.
methods: The proposed method, named SRRD, uses a distillation-learning-based approach to transfer MR structural information to a CT-based model, and leverages paired MR images to enhance the CT-based analysis model during training.
results: The proposed method achieved high reliability and validity in MR-CT registration, regression, and knee osteoarthritis classification, with an AUC score of 0.767 (95% CI, 0.681-0.853) compared to 0.658 (95% CI, 0.574-0.742) using the CNN approach.Abstract
Background: MR-based subchondral bone effectively predicts knee osteoarthritis. However, its clinical application is limited by the cost and time of MR. Purpose: We aim to develop a novel distillation-learning-based method named SRRD for subchondral bone microstructural analysis using easily-acquired CT images, which leverages paired MR images to enhance the CT-based analysis model during training. Materials and Methods: Knee joint images of both CT and MR modalities were collected from October 2020 to May 2021. Firstly, we developed a GAN-based generative model to transform MR images into CT images, which was used to establish the anatomical correspondence between the two modalities. Next, we obtained numerous patches of subchondral bone regions of MR images, together with their trabecular parameters (BV / TV, Tb. Th, Tb. Sp, Tb. N) from the corresponding CT image patches via regression. The distillation-learning technique was used to train the regression model and transfer MR structural information to the CT-based model. The regressed trabecular parameters were further used for knee osteoarthritis classification. Results: A total of 80 participants were evaluated. CT-based regression results of trabecular parameters achieved intra-class correlation coefficients (ICCs) of 0.804, 0.773, 0.711, and 0.622 for BV / TV, Tb. Th, Tb. Sp, and Tb. N, respectively. The use of distillation learning significantly improved the performance of the CT-based knee osteoarthritis classification method using the CNN approach, yielding an AUC score of 0.767 (95% CI, 0.681-0.853) instead of 0.658 (95% CI, 0.574-0.742) (p<.001). Conclusions: The proposed SRRD method showed high reliability and validity in MR-CT registration, regression, and knee osteoarthritis classification, indicating the feasibility of subchondral bone microstructural analysis based on CT images.
摘要
背景:基于MR的软骨下骨分析能够有效预测膝关节骨关节炎,但MR的成本和耗时限制了其临床应用。目的:我们旨在开发一种名为SRRD的新型蒸馏学习方法,利用易于获取的CT图像进行软骨下骨微结构分析,并在训练期间借助配对的MR图像增强基于CT的分析模型。材料与方法:2020年10月至2021年5月期间采集了CT与MR两种模态的膝关节图像。首先,我们开发了一个基于GAN的生成模型,将MR图像转换为CT图像,以建立两种模态之间的解剖对应关系。随后,我们从MR图像中提取大量软骨下骨区域图像块,并通过回归从相应的CT图像块获得其骨小梁参数(BV/TV、Tb.Th、Tb.Sp、Tb.N)。我们采用蒸馏学习技术训练回归模型,将MR的结构信息迁移到基于CT的模型中,回归得到的骨小梁参数进一步用于膝关节骨关节炎分类。结果:共评估了80名受试者。基于CT回归得到的骨小梁参数BV/TV、Tb.Th、Tb.Sp和Tb.N的组内相关系数(ICC)分别为0.804、0.773、0.711和0.622。蒸馏学习显著提升了基于CNN的CT膝关节骨关节炎分类方法的性能,AUC从0.658(95% CI, 0.574-0.742)提高到0.767(95% CI, 0.681-0.853)(p<.001)。结论:所提出的SRRD方法在MR-CT配准、回归和膝关节骨关节炎分类中均表现出较高的可靠性和有效性,表明基于CT图像的软骨下骨微结构分析是可行的。
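The distillation step itself can be sketched as response-based distillation: the CT student is pulled toward both the ground-truth trabecular parameters and the MR teacher's predictions on the paired patch. The networks, loss weighting, and data pairing below are illustrative assumptions.

```python
# One distillation training step: the CT-based student regresses trabecular
# parameters toward the labels and toward the frozen MR-based teacher.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, ct_patch, mr_patch, target, opt, w=0.5):
    with torch.no_grad():
        soft = teacher(mr_patch)          # MR-informed "soft" target
    pred = student(ct_patch)
    loss = F.mse_loss(pred, target) + w * F.mse_loss(pred, soft)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```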
Learning to Identify Graphs from Node Trajectories in Multi-Robot Networks
paper_authors: Eduardo Sebastian, Thai Duong, Nikolay Atanasov, Eduardo Montijano, Carlos Sagues
for: Identifying the interactions among nodes in a network given their state/feature trajectories.
methods: Combines a strongly convex program with a self-attention encoder to learn the graph topology and appropriate regularizers for optimization.
results: Can identify the graph topology of unseen networks with new configurations in terms of number of nodes, connectivity, or state trajectories, and demonstrates effectiveness in multi-robot formation and flocking tasks.
for: 在给定节点状态/特征轨迹的条件下,识别网络中节点之间的交互关系。
methods: 将强凸优化程序与自注意力编码器相结合,学习图拓扑并为优化程序预测合适的正则化项。
results: 能够识别在节点数量、连通性或状态轨迹等配置上前所未见的网络的图拓扑,并在多机器人编队和集群任务中验证了有效性。Abstract
The graph identification problem consists of discovering the interactions among nodes in a network given their state/feature trajectories. This problem is challenging because the behavior of a node is coupled to all the other nodes by the unknown interaction model. Besides, high-dimensional and nonlinear state trajectories make difficult to identify if two nodes are connected. Current solutions rely on prior knowledge of the graph topology and the dynamic behavior of the nodes, and hence, have poor generalization to other network configurations. To address these issues, we propose a novel learning-based approach that combines (i) a strongly convex program that efficiently uncovers graph topologies with global convergence guarantees and (ii) a self-attention encoder that learns to embed the original state trajectories into a feature space and predicts appropriate regularizers for the optimization program. In contrast to other works, our approach can identify the graph topology of unseen networks with new configurations in terms of number of nodes, connectivity or state trajectories. We demonstrate the effectiveness of our approach in identifying graphs in multi-robot formation and flocking tasks.
摘要
图识别问题是指在给定节点状态/特征轨迹的条件下,发现网络中节点之间的交互关系。这一问题颇具挑战性,因为每个节点的行为都通过未知的交互模型与所有其他节点耦合;此外,高维且非线性的状态轨迹也使得判断两个节点是否相连变得困难。现有解决方案依赖于图拓扑和节点动态行为的先验知识,因此对其他网络配置的泛化能力较差。为了解决这些问题,我们提出了一种新的基于学习的方法,它结合了:(i)一个能够高效揭示图拓扑、具有全局收敛保证的强凸优化程序;(ii)一个自注意力编码器,它学习将原始状态轨迹嵌入特征空间,并为优化程序预测合适的正则化项。与其他工作不同,我们的方法能够识别在节点数量、连通性或状态轨迹等配置上前所未见的网络的图拓扑。我们在多机器人编队和集群任务中的图识别实验中验证了该方法的有效性。
Recent Advancements in End-to-End Autonomous Driving using Deep Learning: A Survey
methods: 本研究综述了利用深度学习以端到端方式实现自动驾驶栈(从感知到控制的全流程)的方法,并讨论了实际应用中的关键挑战,例如可解释性与安全性。
results: 本研究对自动驾驶领域的最新发展进行了分类和评估,并提供了一个包含最新开源实现的GitHub仓库。Abstract
End-to-End driving is a promising paradigm as it circumvents the drawbacks associated with modular systems, such as their overwhelming complexity and propensity for error propagation. Autonomous driving transcends conventional traffic patterns by proactively recognizing critical events in advance, ensuring passengers' safety and providing them with comfortable transportation, particularly in highly stochastic and variable traffic settings. This paper presents a comprehensive review of the End-to-End autonomous driving stack. It provides a taxonomy of automated driving tasks wherein neural networks have been employed in an End-to-End manner, encompassing the entire driving process from perception to control, while addressing key challenges encountered in real-world applications. Recent developments in End-to-End autonomous driving are analyzed, and research is categorized based on underlying principles, methodologies, and core functionality. These categories encompass sensorial input, main and auxiliary output, learning approaches ranging from imitation to reinforcement learning, and model evaluation techniques. The survey incorporates a detailed discussion of the explainability and safety aspects. Furthermore, it assesses the state-of-the-art, identifies challenges, and explores future possibilities. We maintained the latest advancements and their corresponding open-source implementations at https://github.com/Pranav-chib/Recent-Advancements-in-End-to-End-Autonomous-Driving-using-Deep-Learning.
摘要
端到端自动驾驶是一种前景广阔的范式,因为它规避了模块化系统的缺点,例如系统的高度复杂性和错误传播的倾向。自动驾驶通过提前主动识别关键事件,超越了传统的交通模式,从而保障乘客安全并提供舒适的出行体验,尤其是在高度随机多变的交通环境中。本文对端到端自动驾驶栈进行了全面综述,给出了以端到端方式应用神经网络的自动驾驶任务分类,涵盖从感知到控制的整个驾驶流程,并讨论了实际应用中遇到的关键挑战。文章分析了端到端自动驾驶的最新进展,并按照基本原理、方法论和核心功能对研究进行分类,包括感知输入、主要与辅助输出、从模仿学习到强化学习的各种学习方法,以及模型评估技术。综述还详细讨论了可解释性和安全性问题,评估了当前技术水平,指出了面临的挑战,并展望了未来的可能性。我们在 https://github.com/Pranav-chib/Recent-Advancements-in-End-to-End-Autonomous-Driving-using-Deep-Learning 上持续维护最新进展及其对应的开源实现。
ECS – an Interactive Tool for Data Quality Assurance
results: 该方法能够检测出对安全关键系统可能有害的数据点,从而提高机器学习系统的可靠性和安全性。Abstract
With the increasing capabilities of machine learning systems and their potential use in safety-critical systems, ensuring high-quality data is becoming increasingly important. In this paper we present a novel approach for the assurance of data quality. For this purpose, the mathematical basics are first discussed and the approach is presented using multiple examples. This results in the detection of data points with potentially harmful properties for the use in safety-critical systems.
摘要
随着机器学习系统能力的不断提升及其在安全关键系统中的潜在应用,确保高质量数据变得愈发重要。本文提出了一种保障数据质量的新方法:首先讨论其数学基础,然后通过多个示例介绍该方法,从而检测出具有潜在有害特性、不宜用于安全关键系统的数据点。
One-Shot Pruning for Fast-adapting Pre-trained Models on Devices
results: 实验分析显示,在具有不同内存约束的多种数据集和下游任务上,所提方法在精度和效率方面均优于常用的剪枝基线方法。Abstract
Large-scale pre-trained models have been remarkably successful in resolving downstream tasks. Nonetheless, deploying these models on low-capability devices still requires an effective approach, such as model pruning. However, pruning the model from scratch can pose a practical challenge given the limited resources of each downstream task or device. To tackle this issue, we present a scalable one-shot pruning method that leverages pruned knowledge of similar tasks to extract a sub-network from the pre-trained model for a new task. Specifically, we create a score mask using the pruned models of similar tasks to identify task-specific filters/nodes in the pre-trained model for the new task. Based on this mask, we conduct a single round of pruning to extract a suitably-sized sub-network that can quickly adapt to the new task with only a few training iterations. Our experimental analysis demonstrates the effectiveness of the proposed method on the convolutional neural networks (CNNs) and vision transformers (ViT) with various datasets. The proposed method consistently outperforms popular pruning baseline methods in terms of accuracy and efficiency when dealing with diverse downstream tasks with different memory constraints.
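The score-mask idea can be sketched as follows: binary masks from models pruned on similar tasks are averaged into a per-weight survival score, and a single thresholding pass extracts the sub-network. This illustrates the mechanism under simple assumptions (dict-of-tensors weights, top-k by score), not the authors' exact scoring rule.

```python
# One-shot pruning driven by masks transferred from similar tasks: weights
# that survived pruning across related tasks get high scores and are kept.
import torch

def one_shot_prune(pretrained_weights, similar_task_masks, keep_ratio=0.5):
    """`pretrained_weights` and each mask: dicts mapping layer name -> tensor."""
    pruned = {}
    for name, w in pretrained_weights.items():
        # Fraction of similar tasks in which each entry survived pruning.
        score = torch.stack([m[name].float() for m in similar_task_masks]).mean(0)
        k = int(keep_ratio * score.numel())
        # Threshold at the (numel - k + 1)-th smallest score to keep the top k.
        thresh = score.flatten().kthvalue(score.numel() - k + 1).values
        pruned[name] = w * (score >= thresh).float()
    return pruned
```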
False Sense of Security: Leveraging XAI to Analyze the Reasoning and True Performance of Context-less DGA Classifiers
results: 消除这些偏差后,DGA分类器的性能显著下降;但我们提出了一种上下文感知的检测系统,既不含已识别出的偏差,又能保持最先进深度学习分类器的检测率。此外,我们还提出了一个可视化分析系统,帮助理解分类器的推理过程,从而提高检测方法的可信度与透明度。Abstract
The problem of revealing botnet activity through Domain Generation Algorithm (DGA) detection seems to be solved, considering that available deep learning classifiers achieve accuracies of over 99.9%. However, these classifiers provide a false sense of security as they are heavily biased and allow for trivial detection bypass. In this work, we leverage explainable artificial intelligence (XAI) methods to analyze the reasoning of deep learning classifiers and to systematically reveal such biases. We show that eliminating these biases from DGA classifiers considerably deteriorates their performance. Nevertheless we are able to design a context-aware detection system that is free of the identified biases and maintains the detection rate of state-of-the art deep learning classifiers. In this context, we propose a visual analysis system that helps to better understand a classifier's reasoning, thereby increasing trust in and transparency of detection methods and facilitating decision-making.
摘要
“通过域名生成算法(DGA)检测来揭示僵尸网络活动的问题看似已经解决,因为现有的深度学习分类器能达到99.9%以上的准确率。然而,这些分类器带有严重的偏差,容易被轻易绕过检测,给人一种虚假的安全感。在这项工作中,我们利用可解释人工智能(XAI)方法分析深度学习分类器的推理过程,并系统性地揭示这类偏差。我们发现,将这些偏差从DGA分类器中消除会使其性能显著下降。不过,我们能够设计一个不含上述偏差的上下文感知检测系统,并保持最先进深度学习分类器的检测率。在此背景下,我们提出了一个可视化分析系统,帮助更好地理解分类器的推理过程,从而提高检测方法的可信度与透明度,并为决策提供便利。”
Formulating A Strategic Plan Based On Statistical Analyses And Applications For Financial Companies Through A Real-World Use Case
results: 研究发现,贷款金额对借款人发生贷款冲销(无法偿还)的数量有显著影响,而所提出的战略计划能够帮助LendingClub在提高收入的同时降低放贷风险。Abstract
Business statistics play a crucial role in implementing a data-driven strategic plan at the enterprise level to employ various analytics where the outcomes of such a plan enable an enterprise to enhance the decision-making process or to mitigate risks to the organization. In this work, a strategic plan informed by the statistical analysis is introduced for a financial company called LendingClub, where the plan is comprised of exploring the possibility of onboarding a big data platform along with advanced feature selection capacities. The main objectives of such a plan are to increase the company's revenue while reducing the risks of granting loans to borrowers who cannot return their loans. In this study, different hypotheses formulated to address the company's concerns are studied, where the results reveal that the amount of loans profoundly impacts the number of borrowers charging off their loans. Also, the proposed strategic plan includes onboarding advanced analytics such as machine learning technologies that allow the company to build better generalized data-driven predictive models.
摘要
企业统计在企业层面实施数据驱动的战略规划中发挥着关键作用:借助各类分析手段,其成果能够帮助企业改进决策过程或降低组织风险。在这项工作中,我们为一家名为LendingClub的金融公司提出了一份以统计分析为依据的战略计划,内容包括探索引入大数据平台以及先进的特征选择能力。该计划的主要目标是在提高公司收入的同时,降低向无力偿还的借款人放贷的风险。本研究考察了为回应公司关切而提出的多个假设,结果表明贷款金额对借款人冲销贷款的数量有深远影响。此外,所提出的战略计划还包括引入机器学习等先进分析技术,使公司能够构建泛化能力更强的数据驱动预测模型。
Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
results: 研究表明,该算法能够提供可证明的保证:最终策略的质量可表示为离线数据集局部覆盖率与额外采集数据量的函数。Abstract
In some applications of reinforcement learning, a dataset of pre-collected experience is already available but it is also possible to acquire some additional online data to help improve the quality of the policy. However, it may be preferable to gather additional data with a single, non-reactive exploration policy and avoid the engineering costs associated with switching policies. In this paper we propose an algorithm with provable guarantees that can leverage an offline dataset to design a single non-reactive policy for exploration. We theoretically analyze the algorithm and measure the quality of the final policy as a function of the local coverage of the original dataset and the amount of additional data collected.
for: This paper is written for researchers and practitioners who work with high-dimensional data and are interested in evaluating conditional independence in a nonparametric manner.
methods: The paper uses recently developed nonlinear sufficient dimension reduction techniques to introduce a sufficient graphical model for evaluating conditional independence. The model is nonparametric and does not make distributional assumptions, but it is based on conditional independence given a set of sufficient predictors with a reduced dimension.
results: The paper demonstrates that the proposed method outperforms existing methods when the Gaussian or copula Gaussian assumptions are violated, and its performance remains excellent in high-dimensional settings. The method is also shown to be consistent in variable selection.
Abstract
We introduce a sufficient graphical model by applying the recently developed nonlinear sufficient dimension reduction techniques to the evaluation of conditional independence. The graphical model is nonparametric in nature, as it does not make distributional assumptions such as the Gaussian or copula Gaussian assumptions. However, unlike a fully nonparametric graphical model, which relies on the high-dimensional kernel to characterize conditional independence, our graphical model is based on conditional independence given a set of sufficient predictors with a substantially reduced dimension. In this way we avoid the curse of dimensionality that comes with a high-dimensional kernel. We develop the population-level properties, convergence rate, and variable selection consistency of our estimate. By simulation comparisons and an analysis of the DREAM 4 Challenge data set, we demonstrate that our method outperforms the existing methods when the Gaussian or copula Gaussian assumptions are violated, and its performance remains excellent in the high-dimensional setting.
MD-HIT: Machine learning for materials property prediction with dataset redundancy control
paper_authors: Qin Li, Nihang Fu, Sadman Sadeed Omee, Jianjun Hu
for: This study aims to address the problem of redundant samples in materials datasets, so that reported prediction performance better reflects true model capability.
methods: The paper proposes a materials dataset redundancy reduction algorithm (MD-HIT) and evaluates it.
results: The study shows that reducing sample redundancy with MD-HIT yields performance estimates that better reflect true predictive capability.
Abstract
Materials datasets are usually characterized by the existence of many redundant (highly similar) materials due to the tinkering material design practice over the history of materials research. For example, the Materials Project database has many perovskite cubic structure materials similar to SrTiO$_3$. This sample redundancy within the dataset causes the random splitting used in machine learning model evaluation to fail, so that ML models tend to achieve over-estimated predictive performance, which is misleading for the materials science community. This issue is well known in the field of bioinformatics for protein function prediction, in which a redundancy reduction procedure (CD-Hit) is always applied to reduce the sample redundancy by ensuring no pair of samples has a sequence similarity greater than a given threshold. This paper surveys the overestimated ML performance in the literature for both composition based and structure based material property prediction. We then propose a material dataset redundancy reduction algorithm called MD-HIT and evaluate it with several composition- and structure-based distance thresholds for reducing dataset sample redundancy. We show that with this control, the predicted performance tends to better reflect the true prediction capability. Our MD-HIT code can be freely accessed at https://github.com/usccolumbia/MD-HIT
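A minimal sketch of the CD-HIT-style greedy selection that underlies redundancy control of this kind is shown below. The random feature matrix and Euclidean threshold are illustrative assumptions; MD-HIT itself uses composition- and structure-based distance metrics.

```python
# CD-HIT-style greedy redundancy reduction: keep a sample only if it is not
# too similar to any sample already kept. Illustrative features and metric.
import numpy as np

def reduce_redundancy(features, min_dist):
    """Greedily select indices whose pairwise distance stays above min_dist."""
    kept = []
    for i, x in enumerate(features):
        if all(np.linalg.norm(x - features[j]) >= min_dist for j in kept):
            kept.append(i)
    return kept

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))          # stand-in for material descriptors
selected = reduce_redundancy(X, min_dist=6.0)
print(f"kept {len(selected)} of {len(X)} samples")
```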
RLTF: Reinforcement Learning from Unit Test Feedback
results: Achieves state-of-the-art performance on the APPS and MBPP benchmarks.
Abstract
The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning (RL) to improve the performance of large language models (LLMs) for code. However, these RL methods have only used offline frameworks, limiting their exploration of new sample spaces. Additionally, current approaches that utilize unit test signals are rather simple, not accounting for specific error locations within the code. To address these issues, we propose RLTF, i.e., Reinforcement Learning from Unit Test Feedback, a novel online RL framework with unit test feedback of multi-granularity for refining code LLMs. Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code. Extensive experiments show that RLTF achieves state-of-the-art performance on the APPS and the MBPP benchmarks. Our code can be found at: https://github.com/Zyq-scut/RLTF.
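As a rough illustration of turning unit-test outcomes into a multi-level reward signal, the sketch below distinguishes a full pass, a failing assertion, and a crash. It is a simplified stand-in, not RLTF's actual reward shaping, which also localizes errors within the code.

```python
# Unit-test feedback as an RL reward (illustrative, not RLTF's exact scheme):
# full pass earns +1, a failing assertion a small penalty, a crash the most.
def unit_test_reward(generated_code, tests):
    namespace = {}
    try:
        exec(generated_code, namespace)   # define the candidate solution
        exec(tests, namespace)            # run the unit tests against it
        return 1.0                        # all tests passed
    except AssertionError:
        return -0.3                       # code runs but a test fails
    except Exception:
        return -1.0                       # code does not even execute

code = "def add(a, b):\n    return a - b"        # buggy candidate
tests = "assert add(2, 3) == 5"
print(unit_test_reward(code, tests))             # -> -0.3
```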
Injecting Logical Constraints into Neural Networks via Straight-Through Estimators
results: By leveraging GPUs and batch training, the method scales significantly better than existing neuro-symbolic methods, and it applies to different types of neural networks, enabling learning with no or fewer labeled data.
Abstract
Injecting discrete logical constraints into neural network learning is one of the main challenges in neuro-symbolic AI. We find that a straight-through-estimator, a method introduced to train binary neural networks, could effectively be applied to incorporate logical constraints into neural network learning. More specifically, we design a systematic way to represent discrete logical constraints as a loss function; minimizing this loss using gradient descent via a straight-through-estimator updates the neural network's weights in the direction that the binarized outputs satisfy the logical constraints. The experimental results show that by leveraging GPUs and batch training, this method scales significantly better than existing neuro-symbolic methods that require heavy symbolic computation for computing gradients. Also, we demonstrate that our method applies to different types of neural networks, such as MLP, CNN, and GNN, making them learn with no or fewer labeled data by learning directly from known constraints.
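The following sketch shows the core mechanism in PyTorch: the forward pass binarizes the outputs, gradients pass through unchanged, and a loss penalizes violations of an example constraint ("exactly one output is true"). The constraint and network sizes are illustrative assumptions, not the paper's experimental setup.

```python
# Straight-through estimator (STE) for a logical constraint: forward pass is
# hard binarization, backward pass treats binarization as the identity.
import torch

def ste_binarize(p):
    hard = (p > 0.5).float()
    return p + (hard - p).detach()   # forward: hard; backward: d(hard)/dp = 1

net = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
x = torch.randn(32, 8)

for _ in range(200):
    b = ste_binarize(net(x))
    # Loss is zero iff each row satisfies "exactly one output equals 1".
    loss = ((b.sum(dim=1) - 1.0) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("constraint violation:", loss.item())
```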
Continual Learning as Computationally Constrained Reinforcement Learning
paper_authors: Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon, Yueyang Liu, Benjamin Van Roy
for: This work targets the design of an agent that efficiently accumulates knowledge and develops increasingly sophisticated skills over a long lifetime, advancing artificial intelligence capabilities.
methods: The monograph introduces a framework and a set of tools for studying the continual learning problem for such agents.
results: The work provides clear and formal definitions and a framework to stimulate further research in continual learning.
Abstract
An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills over a long lifetime could advance the frontier of artificial intelligence capabilities. The design of such agents, which remains a long-standing challenge of artificial intelligence, is addressed by the subject of continual learning. This monograph clarifies and formalizes concepts of continual learning, introducing a framework and set of tools to stimulate further research.
Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey
results: The paper reviews existing privacy-preserving techniques and software tools, discusses current challenges and future research opportunities, and envisions a unified and comprehensive secure graph machine learning system.
Abstract
In graph machine learning, data collection, sharing, and analysis often involve multiple parties, each of which may require varying levels of data security and privacy. To this end, preserving privacy is of great importance in protecting sensitive information. In the era of big data, the relationships among data entities have become unprecedentedly complex, and more applications utilize advanced data structures (i.e., graphs) that can support network structures and relevant attribute information. To date, many graph-based AI models have been proposed (e.g., graph neural networks) for various domain tasks, like computer vision and natural language processing. In this paper, we focus on reviewing privacy-preserving techniques of graph machine learning. We systematically review related works from the data to the computational aspects. We first review methods for generating privacy-preserving graph data. Then we describe methods for transmitting privacy-preserved information (e.g., graph model parameters) to realize the optimization-based computation when data sharing among multiple parties is risky or impossible. In addition to discussing relevant theoretical methodology and software tools, we also discuss current challenges and highlight several possible future research opportunities for privacy-preserving graph machine learning. Finally, we envision a unified and comprehensive secure graph machine learning system.
Source-Aware Embedding Training on Heterogeneous Information Networks
results: Experiments on real-world datasets across a variety of downstream tasks show that SUMSHINE outperforms state-of-the-art heterogeneous information network embedding methods while offering better scalability and flexibility.
Abstract
Heterogeneous information networks (HINs) have been extensively applied to real-world tasks, such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features in the network, little attention has been paid to the distribution discrepancy of subgraphs within a single HIN. However, we find that ignoring such distribution discrepancy among subgraphs from multiple sources would hinder the effectiveness of graph embedding learning algorithms. This motivates us to propose SUMSHINE (Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding) -- a scalable unsupervised framework to align the embedding distributions among multiple sources of an HIN. Experimental results on real-world datasets in a variety of downstream tasks validate the performance of our method over the state-of-the-art heterogeneous information network embedding algorithms.
Enhancing Adversarial Robustness via Score-Based Optimization
results: Experiments on multiple datasets, including CIFAR10, CIFAR100, and ImageNet, show that ScoreOpt outperforms existing adversarial defense methods in both robustness performance and inference speed.
Abstract
Adversarial attacks have the potential to mislead deep neural network classifiers by introducing slight perturbations. Developing algorithms that can mitigate the effects of these attacks is crucial for ensuring the safe use of artificial intelligence. Recent studies have suggested that score-based diffusion models are effective in adversarial defenses. However, existing diffusion-based defenses rely on the sequential simulation of the reversed stochastic differential equations of diffusion models, which are computationally inefficient and yield suboptimal results. In this paper, we introduce a novel adversarial defense scheme named ScoreOpt, which optimizes adversarial samples at test-time, towards original clean data in the direction guided by score-based priors. We conduct comprehensive experiments on multiple datasets, including CIFAR10, CIFAR100 and ImageNet. Our experimental results demonstrate that our approach outperforms existing adversarial defenses in terms of both robustness performance and inference speed.
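A conceptual sketch of score-guided test-time optimization appears below: the input is moved along the score direction, i.e., toward higher density under a clean-data prior. The toy Gaussian score stands in for a pre-trained score network, and the plain ascent loop is a simplification of ScoreOpt's actual objective.

```python
# Score-guided test-time purification: nudge a (possibly adversarial) input
# toward higher density under the clean-data prior. `score_model` is a
# placeholder for a pre-trained score network.
import torch

@torch.no_grad()
def purify(x_adv, score_model, steps=50, lr=0.01):
    x = x_adv.clone()
    for _ in range(steps):
        x = x + lr * score_model(x)   # gradient ascent on log p(x)
    return x

# Toy score of a standard Gaussian prior: grad log N(0, I)(x) = -x.
toy_score = lambda x: -x
x_adv = torch.randn(4, 3, 32, 32) * 3.0
x_clean = purify(x_adv, toy_score)
print(x_adv.abs().mean().item(), "->", x_clean.abs().mean().item())
```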
Leveraging Multiple Descriptive Features for Robust Few-shot Image Learning
results: The method outperforms standard approaches such as linear probing in the few-shot learning setting, and when combined with fine-tuning, it also outperforms existing state-of-the-art fine-tuning approaches on both in-distribution and out-of-distribution performance.
Abstract
Modern image classification is based upon directly predicting model classes via large discriminative networks, making it difficult to assess the intuitive visual ``features'' that may constitute a classification decision. At the same time, recent works in joint visual language models such as CLIP provide ways to specify natural language descriptions of image classes but typically focus on providing single descriptions for each class. In this work, we demonstrate that an alternative approach, arguably more akin to our understanding of multiple ``visual features'' per class, can also provide compelling performance in the robust few-shot learning setting. In particular, we automatically enumerate multiple visual descriptions of each class -- via a large language model (LLM) -- then use a vision-image model to translate these descriptions to a set of multiple visual features of each image; we finally use sparse logistic regression to select a relevant subset of these features to classify each image. This both provides an ``intuitive'' set of relevant features for each class, and in the few-shot learning setting, outperforms standard approaches such as linear probing. When combined with finetuning, we also show that the method is able to outperform existing state-of-the-art finetuning approaches on both in-distribution and out-of-distribution performance.
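The pipeline can be sketched as follows: score each image against several per-class descriptions and let L1-regularized logistic regression select a sparse subset of those similarity features. The random embeddings stand in for CLIP-style encoders and LLM-generated descriptions; nothing here is the authors' exact implementation.

```python
# Multiple descriptions per class -> similarity features -> sparse selection.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_images, n_classes, descs_per_class, dim = 200, 5, 8, 64

text_emb = rng.normal(size=(n_classes * descs_per_class, dim))  # embedded descriptions
labels = rng.integers(0, n_classes, size=n_images)
# Simulate image embeddings correlated with one description of their class.
img_emb = text_emb[labels * descs_per_class] + 0.5 * rng.normal(size=(n_images, dim))

features = img_emb @ text_emb.T            # similarity to every description
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(features, labels)
print("accuracy:", clf.score(features, labels))
print("nonzero feature weights:", np.count_nonzero(clf.coef_))
```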
Data-driven Nonlinear Parametric Model Order Reduction Framework using Deep Hierarchical Variational Autoencoder
results: Built on LSH-VAE, a parametric MOR framework based on spherically linear interpolation of the latent manifold is proposed. The framework is validated on three nonlinear multiphysics dynamic systems and compared against conventional nonlinear MOR methods, showing significantly improved accuracy together with a large speed-up.
Abstract
A data-driven parametric model order reduction (MOR) method using a deep artificial neural network is proposed. The present network, which is the least-squares hierarchical variational autoencoder (LSH-VAE), is capable of performing nonlinear MOR for the parametric interpolation of a nonlinear dynamic system with a significant number of degrees of freedom. LSH-VAE exploits two major changes to the existing networks: a hierarchical deep structure and a hybrid weighted, probabilistic loss function. The enhancements result in a significantly improved accuracy and stability compared against the conventional nonlinear MOR methods, autoencoder, and variational autoencoder. Upon LSH-VAE, a parametric MOR framework is presented based on the spherically linear interpolation of the latent manifold. The present framework is validated and evaluated on three nonlinear and multiphysics dynamic systems. First, the present framework is evaluated on the fluid-structure interaction benchmark problem to assess its efficiency and accuracy. Then, a highly nonlinear aeroelastic phenomenon, limit cycle oscillation, is analyzed. Finally, the present framework is applied to a three-dimensional fluid flow to demonstrate its capability of efficiently analyzing a significantly large number of degrees of freedom. The performance of LSH-VAE is emphasized by comparing its results against that of the widely used nonlinear MOR methods, convolutional autoencoder, and $\beta$-VAE. The present framework exhibits a significantly enhanced accuracy to the conventional methods while still exhibiting a large speed-up factor.
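Below is a minimal sketch of the spherical linear interpolation (slerp) applied on the latent manifold; the helper and test vectors are illustrative.

```python
# Spherical linear interpolation between two latent vectors.
import numpy as np

def slerp(z0, z1, t):
    omega = np.arccos(
        np.clip(np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1)), -1.0, 1.0)
    )
    if np.isclose(omega, 0.0):
        return (1.0 - t) * z0 + t * z1      # vectors are (anti)parallel
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

z0, z1 = np.random.randn(16), np.random.randn(16)
midpoint = slerp(z0, z1, 0.5)               # latent code for a new parameter
print(midpoint.shape)
```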
CT-BERT: Learning Better Tabular Representations Through Cross-Table Pre-training
paper_authors: Chao Ye, Guoshan Lu, Haobo Wang, Liyao Li, Sai Wu, Gang Chen, Junbo Zhao
for: The paper aims to pre-train tabular models on large-scale table data in order to obtain generalizable representations for tabular data.
methods: It proposes a new framework, CT-BERT, that enables cross-table pre-training. CT-BERT is compatible with both supervised and self-supervised schemes, and a contrastive-learning-based masked table modeling (MTM) objective is proposed.
results: CT-BERT achieves state-of-the-art performance on 15 datasets, significantly outperforming prior approaches in both supervised and self-supervised settings.
Abstract
Tabular data -- also known as structured data -- is one of the most common data forms in existence, thanks to the stable development and scaled deployment of database systems in the last few decades. At present, however, despite the breakthroughs brought by large pre-trained models in other domains such as ChatGPT or SAM, how to extract common knowledge across tables at a scale that may eventually lead to generalizable representations for tabular data remains an open problem. Indeed, there have been a few works around this topic, but most (if not all) of them are limited in scope to a single table or a fixed form of schema. In this work, we first identify the crucial research challenges behind tabular data pre-training, particularly towards the cross-table scenario. We position the contribution of this work in two folds: (i) we collect and curate nearly 2k high-quality tabular datasets, each of which is guaranteed to possess clear semantics, clean labels, and other necessary meta information; (ii) we propose a novel framework that allows cross-table pre-training, dubbed CT-BERT. Notably, as a pioneering effort in scaled cross-table training, CT-BERT is fully compatible with both supervised and self-supervised schemes, where the specific instantiation of CT-BERT depends largely on the downstream tasks. We further propose and implement a contrastive-learning-based and masked table modeling (MTM) objective in CT-BERT, inspired by the computer vision and natural language processing communities but carefully tailored to tables. The extensive empirical results on 15 datasets demonstrate CT-BERT's state-of-the-art performance, where both its supervised and self-supervised setups significantly outperform the prior approaches.
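As a rough illustration of the masked table modeling (MTM) idea, the sketch below hides a fraction of cells and exposes the mask the model would be trained to reconstruct; tokenization and the contrastive term are omitted, and none of these names reflect CT-BERT's actual API.

```python
# Masked table modeling, in miniature: corrupt random cells and keep the
# mask so a model can be trained to predict the hidden values.
import numpy as np

def mask_cells(table, mask_ratio=0.15, mask_token=0.0):
    rng = np.random.default_rng(0)
    mask = rng.random(table.shape) < mask_ratio      # True where a cell is hidden
    corrupted = np.where(mask, mask_token, table)
    return corrupted, mask                            # model predicts table[mask]

table = np.arange(20, dtype=float).reshape(4, 5)      # a tiny numeric table
corrupted, mask = mask_cells(table)
print(corrupted)
print("cells to reconstruct:", table[mask])
```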
Automatic Piano Transcription with Hierarchical Frequency-Time Transformer
for: automatic piano transcription, especially for determining the precise onset and offset of each note in polyphonic piano content.
methods: hFT-Transformer, a two-level hierarchical frequency-time Transformer architecture that captures long-term dependencies in the frequency and time axes using self-attention mechanism.
results: State-of-the-art performance on all F1-scores of the Frame, Note, Note with Offset, and Note with Offset and Velocity metrics, as demonstrated on the widely used MAPS and MAESTRO v3.0.0 datasets.
Abstract
Taking long-term spectral and temporal dependencies into account is essential for automatic piano transcription. This is especially helpful when determining the precise onset and offset for each note in the polyphonic piano content. In this case, we may rely on the capability of self-attention mechanism in Transformers to capture these long-term dependencies in the frequency and time axes. In this work, we propose hFT-Transformer, which is an automatic music transcription method that uses a two-level hierarchical frequency-time Transformer architecture. The first hierarchy includes a convolutional block in the time axis, a Transformer encoder in the frequency axis, and a Transformer decoder that converts the dimension in the frequency axis. The output is then fed into the second hierarchy which consists of another Transformer encoder in the time axis. We evaluated our method with the widely used MAPS and MAESTRO v3.0.0 datasets, and it demonstrated state-of-the-art performance on all the F1-scores of the metrics among Frame, Note, Note with Offset, and Note with Offset and Velocity estimations.
Edge Storage Management Recipe with Zero-Shot Data Compression for Road Anomaly Detection
results: Comparative experiments show that the method preserves anomaly detection performance while improving storage and transmission efficiency.
Abstract
Recent studies have shown edge computing-based road anomaly detection systems that can also conduct data collection simultaneously. However, edge computers have small data storage, while the collected audio samples must be stored for a long time in order to update existing models or develop novel methods. Therefore, we should consider efficient storage management methods that preserve high-fidelity audio. A hardware-perspective approach, such as using a low-resolution microphone, is an intuitive way to reduce file size but is not recommended because it fundamentally cuts off high-frequency components. On the other hand, a computational file compression approach that encodes collected high-resolution audio into a compact code should be recommended because it also provides a corresponding decoding method. Motivated by this, we propose a simple yet effective pre-trained autoencoder-based data compression method. The pre-trained autoencoder is trained for audio super-resolution, so it can be utilized to encode or decode audio at any arbitrary sampling rate. Moreover, it reduces the communication cost of data transmission from the edge to the central server. Via comparative experiments, we confirm that the zero-shot audio compression and decompression highly preserve anomaly detection performance while enhancing storage and transmission efficiency.
Online Ad Procurement in Non-stationary Autobidding Worlds
results: The analysis shows that the algorithm achieves low regret in many worlds without knowing which procedure generates the procurement outcomes, providing effective lever decisions across different outcome-generating processes.
Abstract
Today's online advertisers procure digital ad impressions through interacting with autobidding platforms: advertisers convey high-level procurement goals via setting levers such as budget, target return-on-investment, max cost per click, etc. Then ads platforms subsequently procure impressions on advertisers' behalf, and report final procurement conversions (e.g. click) to advertisers. In practice, advertisers may receive minimal information on platforms' procurement details, and procurement outcomes are subject to non-stationary factors like seasonal patterns, occasional system corruptions, and market trends, which make it difficult for advertisers to optimize lever decisions effectively. Motivated by this, we present an online learning framework that helps advertisers dynamically optimize ad platform lever decisions while subject to general long-term constraints in a realistic bandit feedback environment with non-stationary procurement outcomes. In particular, we introduce a primal-dual algorithm for online decision making with multi-dimension decision variables, bandit feedback and long-term uncertain constraints. We show that our algorithm achieves low regret in many worlds when procurement outcomes are generated through procedures that are stochastic, adversarial, adversarially corrupted, periodic, and ergodic, respectively, without having to know which procedure is the ground truth. Finally, we emphasize that our proposed algorithm and theoretical results extend beyond the applications of online advertising.
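A toy primal-dual pacing loop illustrates the flavor of such an algorithm: a dual variable prices budget violations and shades the lever, under bandit feedback from an unknown spend process. This is a simplified sketch, not the paper's algorithm or its guarantees.

```python
# Toy primal-dual budget pacing under bandit feedback: the dual variable
# rises when spend overshoots the per-round budget and decays otherwise.
import random

budget_per_round, eta, T = 10.0, 0.05, 1000
lam, total_spend = 0.0, 0.0
random.seed(0)

for t in range(T):
    bid_lever = 1.0 / (1.0 + lam)                  # primal step: shade bid by dual price
    spend = bid_lever * random.uniform(5.0, 20.0)  # unknown platform response (toy)
    total_spend += spend
    lam = max(0.0, lam + eta * (spend - budget_per_round))  # projected dual ascent

print(f"avg spend/round = {total_spend / T:.2f} vs budget {budget_per_round}")
```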
Generalizing Graph ODE for Learning Complex System Dynamics across Environments
results: Experiments over various physical simulations show that the model accurately predicts system dynamics, especially over long horizons, and generalizes well to new systems with few observations.
Abstract
Learning multi-agent system dynamics has been extensively studied for various real-world applications, such as molecular dynamics in biology. Most of the existing models are built to learn single system dynamics from observed historical data and predict the future trajectory. In practice, however, we might observe multiple systems that are generated across different environments, which differ in latent exogenous factors such as temperature and gravity. One simple solution is to learn multiple environment-specific models, but it fails to exploit the potential commonalities among the dynamics across environments and offers poor prediction results where per-environment data is sparse or limited. Here, we present GG-ODE (Generalized Graph Ordinary Differential Equations), a machine learning framework for learning continuous multi-agent system dynamics across environments. Our model learns system dynamics using neural ordinary differential equations (ODE) parameterized by Graph Neural Networks (GNNs) to capture the continuous interaction among agents. We achieve the model generalization by assuming the dynamics across different environments are governed by common physics laws that can be captured via learning a shared ODE function. The distinct latent exogenous factors learned for each environment are incorporated into the ODE function to account for their differences. To improve model performance, we additionally design two regularization losses to (1) enforce the orthogonality between the learned initial states and exogenous factors via mutual information minimization; and (2) reduce the temporal variance of learned exogenous factors within the same system via contrastive learning. Experiments over various physical simulations show that our model can accurately predict system dynamics, especially in the long range, and can generalize well to new systems with few observations.
Assessing the efficacy of large language models in generating accurate teacher responses
results: Experimental results show that GPT-4 outperforms the fine-tuned models as measured by BERTScore and DialogRPT, and suggest that dataset characteristics such as sampling, representativeness, and dialog completeness significantly affect fine-tuning.
Abstract
(Tack et al., 2023) organized the shared task hosted by the 18th Workshop on Innovative Use of NLP for Building Educational Applications on generation of teacher language in educational dialogues. Following the structure of the shared task, in this study, we attempt to assess the generative abilities of large language models in providing informative and helpful insights to students, thereby simulating the role of a knowledgeable teacher. To this end, we present an extensive evaluation of several benchmarking generative models, including GPT-4 (few-shot, in-context learning), fine-tuned GPT-2, and fine-tuned DialoGPT. Additionally, to optimize for pedagogical quality, we fine-tuned the Flan-T5 model using reinforcement learning. Our experimental findings on the Teacher-Student Chatroom Corpus subset indicate the efficacy of GPT-4 over other fine-tuned models, measured using BERTScore and DialogRPT. We hypothesize that several dataset characteristics, including sampling, representativeness, and dialog completeness, pose significant challenges to fine-tuning, thus contributing to the poor generalizability of the fine-tuned models. Finally, we note the need for these generative models to be evaluated with a metric that relies not only on dialog coherence and matched language modeling distribution but also on the model's ability to showcase pedagogical skills.
MentalHealthAI: Utilizing Personal Health Device Data to Optimize Psychiatry Treatment
results: Evaluation on a popular mental health dataset yields promising results, indicating that the approach enables effective mental health tracking and mood prediction.
Abstract
Mental health disorders remain a significant challenge in modern healthcare, with diagnosis and treatment often relying on subjective patient descriptions and past medical history. To address this issue, we propose a personalized mental health tracking and mood prediction system that utilizes patient physiological data collected through personal health devices. Our system leverages a decentralized learning mechanism that combines transfer and federated machine learning concepts using smart contracts, allowing data to remain on users' devices and enabling effective tracking of mental health conditions for psychiatric treatment and management in a privacy-aware and accountable manner. We evaluate our model using a popular mental health dataset that demonstrates promising results. By utilizing connected health systems and machine learning models, our approach offers a novel solution to the challenge of providing psychiatrists with further insight into their patients' mental health outside of traditional office visits.
RidgeBase: A Cross-Sensor Multi-Finger Contactless Fingerprint Dataset
for: The paper introduces a large-scale real-world dataset to promote further advances in contactless fingerprint matching.
methods: Contactless and contact-based fingerprint images were collected using two smartphone cameras and one flatbed contact sensor, and a set-based matching protocol inspired by facial recognition datasets is proposed to handle intra-sample variance.
results: Baseline results using a COTS fingerprint matcher (Verifinger) and a deep CNN approach demonstrate accurate fingerprint matching on the RidgeBase dataset.
Abstract
Contactless fingerprint matching using smartphone cameras can alleviate major challenges of traditional fingerprint systems including hygienic acquisition, portability and presentation attacks. However, development of practical and robust contactless fingerprint matching techniques is constrained by the limited availability of large scale real-world datasets. To motivate further advances in contactless fingerprint matching across sensors, we introduce the RidgeBase benchmark dataset. RidgeBase consists of more than 15,000 contactless and contact-based fingerprint image pairs acquired from 88 individuals under different background and lighting conditions using two smartphone cameras and one flatbed contact sensor. Unlike existing datasets, RidgeBase is designed to promote research under different matching scenarios that include Single Finger Matching and Multi-Finger Matching for both contactless- to-contactless (CL2CL) and contact-to-contactless (C2CL) verification and identification. Furthermore, due to the high intra-sample variance in contactless fingerprints belonging to the same finger, we propose a set-based matching protocol inspired by the advances in facial recognition datasets. This protocol is specifically designed for pragmatic contactless fingerprint matching that can account for variances in focus, polarity and finger-angles. We report qualitative and quantitative baseline results for different protocols using a COTS fingerprint matcher (Verifinger) and a Deep CNN based approach on the RidgeBase dataset. The dataset can be downloaded here: https://www.buffalo.edu/cubs/research/datasets/ridgebase-benchmark-dataset.html
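A minimal sketch of set-based matching is given below: each acquisition is treated as a set of embeddings, and the match score fuses pairwise cosine similarities (here, by taking the maximum). The embeddings and fusion rule are illustrative assumptions, not the paper's exact protocol.

```python
# Set-based matching: compare two acquisitions as sets of embeddings and
# fuse pairwise similarities, absorbing per-capture variation in focus,
# polarity, and finger angle.
import numpy as np

def set_match_score(set_a, set_b):
    a = set_a / np.linalg.norm(set_a, axis=1, keepdims=True)
    b = set_b / np.linalg.norm(set_b, axis=1, keepdims=True)
    sims = a @ b.T                     # cosine similarity for every pair
    return float(sims.max())           # fuse: best pair decides the score

probe = np.random.randn(3, 128)       # 3 contactless captures of one finger
gallery = np.random.randn(5, 128)     # 5 enrolled captures
print(set_match_score(probe, gallery))
```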
The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence
paper_authors: Hector Zenil, Jesper Tegnér, Felipe S. Abrahão, Alexander Lavin, Vipin Kumar, Jeremy G. Frey, Adrian Weller, Larisa Soldatova, Alan R. Bundy, Nicholas R. Jennings, Koichi Takahashi, Lawrence Hunter, Saso Dzeroski, Andrew Briggs, Frederick D. Gregory, Carla P. Gomes, Christopher K. I. Williams, Jon Rowe, James Evans, Hiroaki Kitano, Joshua B. Tenenbaum, Ross King
for: The paper explores the potential of an AI-driven, automated, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space.
methods: The paper discusses the use of Generative AI and Large Language Models to augment and accelerate the scientific discovery of fundamental deep science with quantitative models.
results: The paper suggests that integrating AI-driven automation into the practice of science could mitigate current problems, including the replication of findings and the systematic production of data, and ultimately democratise the scientific process.
Abstract
Recent advances in machine learning and AI, including Generative AI and LLMs, are disrupting technological innovation, product development, and society as a whole. AI's contribution to technology can come from multiple approaches that require access to large training data sets and clear performance evaluation criteria, ranging from pattern recognition and classification to generative models. Yet, AI has contributed less to fundamental science in part because large data sets of high-quality data for scientific practice and model discovery are more difficult to access. Generative AI, in general, and Large Language Models in particular, may represent an opportunity to augment and accelerate the scientific discovery of fundamental deep science with quantitative models. Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space. Integrating AI-driven automation into the practice of science would mitigate current problems, including the replication of findings, systematic production of data, and ultimately democratisation of the scientific process. Realising these possibilities requires a vision for augmented AI coupled with a diversity of AI approaches able to deal with fundamental aspects of causality analysis and model discovery while enabling unbiased search across the space of putative explanations. These advances hold the promise to unleash AI's potential for searching and discovering the fundamental structure of our world beyond what human scientists have been able to achieve. Such a vision would push the boundaries of new fundamental science rather than automatize current workflows and instead open doors for technological innovation to tackle some of the greatest challenges facing humanity today.
ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey
paper_authors: Salman Mohamadi, Ghulam Mujtaba, Ngan Le, Gianfranco Doretto, Donald A. Adjeroh
for: The paper's primary goal is to provide a concise survey of current lines of research on ChatGPT and its evolution.
methods: ChatGPT is examined from two perspectives: a glass box view that focuses on understanding the inner workings of the technology, and a black box view that treats it as a complex system and studies its inputs, outputs, and effects.
results: The survey offers a comprehensive exploration of the technology and a road map for further research, lays out foundational literature on LLMs and GAI and their connection with ChatGPT, sheds light on existing and missing research lines, and discusses applications and key concerns in fields such as education, research, healthcare, and finance.
Abstract
ChatGPT is a large language model (LLM) created by OpenAI that has been carefully trained on a large amount of data. It has revolutionized the field of natural language processing (NLP) and has pushed the boundaries of LLM capabilities. ChatGPT has played a pivotal role in enabling widespread public interaction with generative artificial intelligence (GAI) on a large scale. It has also sparked research interest in developing similar technologies and investigating their applications and implications. In this paper, our primary goal is to provide a concise survey on the current lines of research on ChatGPT and its evolution. We considered both the glass box and black box views of ChatGPT, encompassing the components and foundational elements of the technology, as well as its applications, impacts, and implications. The glass box approach focuses on understanding the inner workings of the technology, and the black box approach embraces it as a complex system, and thus examines its inputs, outputs, and effects. This paves the way for a comprehensive exploration of the technology and provides a road map for further research and experimentation. We also lay out essential foundational literature on LLMs and GAI in general and their connection with ChatGPT. This overview sheds light on existing and missing research lines in the emerging field of LLMs, benefiting both public users and developers. Furthermore, the paper delves into the broad spectrum of applications and significant concerns in fields such as education, research, healthcare, finance, etc.
Ensemble learning for blending gridded satellite and gauge-measured precipitation data
results: An extensive comparison over the entire contiguous United States and a 15-year period shows that the ensemble learners improve the accuracy of satellite precipitation products.
Abstract
Regression algorithms are regularly used for improving the accuracy of satellite precipitation products. In this context, ground-based measurements are the dependent variable and the satellite data are the predictor variables, together with topography factors. Alongside this, it is increasingly recognised in many fields that combinations of algorithms through ensemble learning can lead to substantial predictive performance improvements. Still, a sufficient number of ensemble learners for improving the accuracy of satellite precipitation products and their large-scale comparison are currently missing from the literature. In this work, we fill this specific gap by proposing 11 new ensemble learners in the field and by extensively comparing them for the entire contiguous United States and for a 15-year period. We use monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets. We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The ensemble learners combine the predictions by six regression algorithms (base learners), namely the multivariate adaptive regression splines (MARS), multivariate adaptive polynomial splines (poly-MARS), random forests (RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and Bayesian regularized neural networks (BRNN), and each of them is based on a different combiner. The combiners include the equal-weight combiner, the median combiner, two best learners and seven variants of a sophisticated stacking method. The latter stacks a regression algorithm on the top of the base learners to combine their independent predictions...
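For concreteness, a small stacking combiner of the kind described can be sketched with scikit-learn as below; the two base learners and toy predictors (standing in for PERSIANN, IMERG, and elevation) are illustrative, not the paper's full setup of six base learners and eleven combiners.

```python
# Stacking: a regressor on top of heterogeneous base learners combines
# their independent predictions of gauge-measured precipitation.
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))                  # toy PERSIANN, IMERG, elevation
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * rng.normal(size=300)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
                ("gbm", GradientBoostingRegressor(random_state=0))],
    final_estimator=LinearRegression(),        # the combiner stacked on top
    cv=5,
)
stack.fit(X, y)
print("R^2:", stack.score(X, y))
```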
Efficient Bayesian travel-time tomography with geologically-complex priors using sensitivity-informed polynomial chaos expansion and deep generative networks
for: This paper focuses on developing a strategy for Bayesian travel-time tomography using Monte Carlo Markov Chain (MCMC) methods, which can accurately characterize the prior distribution and efficiently evaluate the likelihood.
methods: The paper combines the use of principal component analysis (PCA) and polynomial chaos expansion (PCE) to develop a surrogate model for the forward problem, and leverages variational autoencoders (VAEs) to represent the prior distribution.
results: The proposed method enables accurate reconstruction of the true travel times and provides a viable alternative to traditional MCMC methods, which can be computationally expensive and challenging to implement.
Abstract
Monte Carlo Markov Chain (MCMC) methods commonly confront two fundamental challenges: the accurate characterization of the prior distribution and the efficient evaluation of the likelihood. In the context of Bayesian studies on tomography, principal component analysis (PCA) can in some cases facilitate the straightforward definition of the prior distribution, while simultaneously enabling the implementation of accurate surrogate models based on polynomial chaos expansion (PCE) to replace computationally intensive full-physics forward solvers. When faced with scenarios where PCA does not offer a direct means of easily defining the prior distribution, alternative methods like deep generative models (e.g., variational autoencoders (VAEs)) can be employed as viable options. However, accurately producing a surrogate capable of capturing the intricate non-linear relationship between the latent parameters of a VAE and the outputs of forward modeling presents a notable challenge. Indeed, while PCE models provide high accuracy when the input-output relationship can be effectively approximated by relatively low-degree multivariate polynomials, this condition is typically unmet when utilizing latent variables derived from deep generative models. In this contribution, we present a strategy that combines the excellent reconstruction performance of the VAE in terms of prior representation with the accuracy of PCA-PCE surrogate modeling in the context of Bayesian ground penetrating radar (GPR) travel-time tomography. Within the MCMC process, the parametrization of the VAE is leveraged for prior exploration and sample proposal. Concurrently, modeling is conducted using PCE, which operates on either globally or locally defined principal components of the VAE samples under examination.
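A rough sketch of the PCA-plus-polynomial surrogate idea follows: project forward-model snapshots onto principal components, then fit a low-degree polynomial map from parameters to those components. Polynomial ridge regression stands in for a true PCE with orthogonal bases, and the toy forward solver is an assumption.

```python
# PCA + low-degree polynomial surrogate of an expensive forward model.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
theta = rng.uniform(-1, 1, size=(400, 4))              # model parameters
snapshots = np.sin(theta @ rng.normal(size=(4, 50)))   # toy forward solver outputs

pca = PCA(n_components=5).fit(snapshots)
coeffs = pca.transform(snapshots)                      # targets for the surrogate

surrogate = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=1e-3))
surrogate.fit(theta, coeffs)                           # multi-output regression

theta_new = rng.uniform(-1, 1, size=(1, 4))
predicted_snapshot = pca.inverse_transform(surrogate.predict(theta_new))
print(predicted_snapshot.shape)                        # (1, 50)
```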
Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data
paper_authors: Hieu Le, Hernan Santos, Jian Tao
for: Compressing large-scale scientific data while maintaining high reconstruction quality.
methods: A neural network-based approach with an Autoencoder architecture.
results: Achieves a compression ratio of 140 on several benchmark datasets, and 200 on simulation data from the High-Resolution Community Earth System Model (CESM) Version 1.3, with negligible reconstruction error.
Abstract
Lossy compression has become an important technique to reduce data size in many domains. This type of compression is especially valuable for large-scale scientific data, whose size ranges up to several petabytes. Although Autoencoder-based models have been successfully leveraged to compress images and videos, such neural networks have not widely gained attention in the scientific data domain. Our work presents a neural network that not only significantly compresses large-scale scientific data but also maintains high reconstruction quality. The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set. Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality. Simulation data from the High-Resolution Community Earth System Model (CESM) Version 1.3 over 500 years are also being compressed with a compression ratio of 200 while the reconstruction error is negligible for scientific analysis.
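The storage arithmetic can be sketched as follows: only the latent code is written to disk, so the compression ratio is the input size over the latent size. The tiny untrained MLP below is purely illustrative; the paper's model is a hierarchical design trained for low reconstruction error.

```python
# Autoencoder-based lossy compression: store the latent code, decode on demand.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(4096, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 4096))

block = torch.randn(1, 4096)          # one flattened block of simulation data
latent = encoder(block)               # this is what gets written to disk
reconstruction = decoder(latent)

ratio = block.numel() / latent.numel()
mse = torch.mean((block - reconstruction) ** 2).item()
print(f"compression ratio ~{ratio:.0f}x, reconstruction MSE {mse:.4f} (untrained)")
```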
Generalized Action-based Ball Recovery Model using 360$^\circ$ data
results: The study finds that, across different defensive styles, changes of possession are driven by several distinct actions whose influence correlates with the defensive approach, and that a team's positioning affects ball recovery.
Abstract
Even though having more possession does not necessarily lead to winning, teams like Manchester City, Liverpool, and Leeds United notably have tried to recover the ball quickly after they lost it over the past few years. Nowadays, some of the top managers in the world apply high-pressing styles, and concepts such as the five-second rule, usually credited to Guardiola, have been spreading out [9][10], becoming a fundamental part of how lots of teams have played over the recent years. Expressions like "don't let them breathe" and "get the ball back as soon as possible" are often heard in the media [4][5][6], but what are the actions that most lead to a change in possession? What is the influence of a team's positioning on the ball recovery? Which are the players that more often collapse when under pressure? Can we evaluate the defensive dynamics of teams that do not necessarily press the player in possession as intensely as those mentioned above? We try to answer those and other questions in this paper by creating a Generalized Action based Ball Recovery model (GABR) using Statsbomb 360$^\circ$ data.
methods: DWA uses the Discrete Wavelet Transformation (DWT), an approach that has recently received less attention, to obtain an efficient image representation and reduce the spatial size of the input.
results: Experiments show that DWA improves wavelet-based SR models, such as DWSR and MWCNN, on classical SR tasks. Moreover, DWA enables a direct application to the input image space, reducing the channel-wise DWT representation since it omits the traditional DWT.
Abstract
This work introduces Differential Wavelet Amplifier (DWA), a drop-in module for wavelet-based image Super-Resolution (SR). DWA invigorates an approach recently receiving less attention, namely Discrete Wavelet Transformation (DWT). DWT enables an efficient image representation for SR and reduces the spatial area of its input by a factor of 4, the overall model size, and computation cost, framing it as an attractive approach for sustainable ML. Our proposed DWA model improves wavelet-based SR models by leveraging the difference between two convolutional filters to refine relevant feature extraction in the wavelet domain, emphasizing local contrasts and suppressing common noise in the input signals. We show its effectiveness by integrating it into existing SR models, e.g., DWSR and MWCNN, and demonstrate a clear improvement in classical SR tasks. Moreover, DWA enables a direct application of DWSR and MWCNN to input image space, reducing the DWT representation channel-wise since it omits traditional DWT.
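A minimal sketch of the stated mechanism, two parallel convolutions whose difference refines features in the wavelet domain, might look like this in PyTorch; channel counts and kernel sizes are assumptions, not the paper's configuration.

```python
# Difference of two learned filters over wavelet sub-bands: emphasizes local
# contrast and cancels components both filters respond to equally.
import torch
import torch.nn as nn

class DifferentialWaveletAmplifier(nn.Module):
    def __init__(self, channels=4):              # e.g. LL, LH, HL, HH sub-bands
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv_a(x) - self.conv_b(x)   # differential amplification

dwt_subbands = torch.randn(1, 4, 64, 64)         # stand-in for DWT coefficients
print(DifferentialWaveletAmplifier()(dwt_subbands).shape)
```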
TFR: Texture Defect Detection with Fourier Transform using Normal Reconstructed Template of Simple Autoencoder
results: Effective and accurate defect detection, demonstrated through experimental results.
Abstract
Texture is essential information in image representation, capturing patterns and structures. As a result, texture plays a crucial role in the manufacturing industry and is extensively studied in the fields of computer vision and pattern recognition. However, real-world textures are susceptible to defects, which can degrade image quality and cause various issues. Therefore, there is a need for accurate and effective methods to detect texture defects. In this study, a simple autoencoder and the Fourier transform are employed for texture defect detection. The proposed method combines Fourier transform analysis with the reconstructed template obtained from the simple autoencoder. The Fourier transform is a powerful tool for analyzing the frequency domain of images and signals. Moreover, since texture defects often exhibit characteristic changes in specific frequency ranges, analyzing the frequency domain enables effective defect detection. The proposed method demonstrates effectiveness and accuracy in detecting texture defects. Experimental results are presented to evaluate its performance and compare it with existing approaches.
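The pipeline can be sketched as follows: reconstruct a defect-free template, then measure frequency-domain energy in the residual. The identity "autoencoder" and mean-spectrum score below are illustrative stand-ins for the trained model and the paper's actual decision rule.

```python
# Fourier analysis of the residual between an image and its normal-texture
# reconstruction: defects show up as off-pattern frequency energy.
import numpy as np

def defect_score(image, autoencoder):
    template = autoencoder(image)                  # normal-texture reconstruction
    residual = image - template
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(residual)))
    return float(spectrum.mean())                  # high energy => likely defect

texture = np.random.rand(128, 128)
defective = texture.copy()
defective[40:60, 40:60] += 1.0                     # synthetic defect patch

identity_ae = lambda img: texture                  # pretend reconstruction
print("normal :", defect_score(texture, identity_ae))     # ~0
print("defect :", defect_score(defective, identity_ae))   # clearly larger
```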
CoactSeg: Learning from Heterogeneous Data for New Multiple Sclerosis Lesion Segmentation
results: Exploiting samples from different time points together with the proposed relation regularization significantly improves accuracy on both new-lesion and all-lesion segmentation tasks. An in-house MS-23v1 dataset is also released, containing 38 Oceania single-time-point samples with all-lesion labels.
Abstract
New lesion segmentation is essential to estimate the disease progression and therapeutic effects during multiple sclerosis (MS) clinical treatments. However, the expensive data acquisition and expert annotation restrict the feasibility of applying large-scale deep learning models. Since single-time-point samples with all-lesion labels are relatively easy to collect, exploiting them to train deep models is highly desirable to improve new lesion segmentation. Therefore, we propose a coaction segmentation (CoactSeg) framework to exploit the heterogeneous data (i.e., new-lesion annotated two-time-point data and all-lesion annotated single-time-point data) for new MS lesion segmentation. The CoactSeg model is designed as a unified model, with the same three inputs (the baseline, follow-up, and their longitudinal brain differences) and the same three outputs (the corresponding all-lesion and new-lesion predictions), no matter which type of heterogeneous data is being used. Moreover, a simple and effective relation regularization is proposed to ensure the longitudinal relations among the three outputs to improve the model learning. Extensive experiments demonstrate that utilizing the heterogeneous data and the proposed longitudinal relation constraint can significantly improve the performance for both new-lesion and all-lesion segmentation tasks. Meanwhile, we also introduce an in-house MS-23v1 dataset, including 38 Oceania single-time-point samples with all-lesion labels. Codes and the dataset are released at https://github.com/ycwu1997/CoactSeg.
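The longitudinal relation among the three outputs can be written down directly; the sketch below shows one plausible form of such a regularizer (new lesions should coincide with follow-up lesions absent at baseline). The exact loss used in CoactSeg may differ.

```python
# A hedged sketch of a longitudinal relation regularizer in the spirit
# of CoactSeg: the new-lesion prediction should match the part of the
# follow-up all-lesion prediction that is absent at baseline.
import torch

def relation_regularization(p_base, p_follow, p_new):
    """All tensors: (B, 1, D, H, W) probabilities in [0, 1]."""
    # Regions that are lesion at follow-up but not at baseline.
    expected_new = torch.clamp(p_follow - p_base, min=0.0)
    return torch.mean((p_new - expected_new) ** 2)

p_base, p_follow, p_new = (torch.rand(2, 1, 8, 32, 32) for _ in range(3))
print(relation_regularization(p_base, p_follow, p_new))
```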
SAM-IQA: Can Segment Anything Boost Image Quality Assessment?
results: Extensive experiments on four representative datasets show that our method outperforms the state of the art, both qualitatively and quantitatively.
Abstract
Image Quality Assessment (IQA) is a challenging task that requires training on massive datasets to achieve accurate predictions. However, due to the lack of IQA data, deep learning-based IQA methods typically rely on pre-trained networks trained on massive datasets as feature extractors to enhance their generalization ability, such as the ResNet network trained on ImageNet. In this paper, we utilize the encoder of Segment Anything, a recently proposed segmentation model trained on a massive dataset, for high-level semantic feature extraction. Most IQA methods are limited to extracting spatial-domain features, while frequency-domain features have been shown to better represent noise and blur. Therefore, we leverage both spatial-domain and frequency-domain features by applying Fourier and standard convolutions on the extracted features, respectively. Extensive experiments are conducted to demonstrate the effectiveness of all the proposed components, and results show that our approach outperforms the state-of-the-art (SOTA) in four representative datasets, both qualitatively and quantitatively. Our experiments confirm the powerful feature extraction capabilities of Segment Anything and highlight the value of combining spatial-domain and frequency-domain features in IQA tasks. Code: https://github.com/Hedlen/SAM-IQA
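The dual-branch design is easy to sketch: a standard convolution for spatial-domain features and a convolution over the real and imaginary parts of the 2D FFT for frequency-domain features, applied to frozen encoder features. Channel sizes and the regression head below are assumptions.

```python
# A minimal sketch of a dual spatial/frequency head over frozen encoder
# features, in the spirit of SAM-IQA. Dimensions are assumptions.
import torch
import torch.nn as nn

class SpatialFreqHead(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.spatial = nn.Conv2d(ch, ch, 3, padding=1)
        self.freq = nn.Conv2d(2 * ch, ch, 1)  # acts on real/imag parts
        self.score = nn.Linear(2 * ch, 1)
    def forward(self, feats):
        s = self.spatial(feats)                       # spatial branch
        F = torch.fft.fft2(feats, norm="ortho")       # frequency branch
        f = self.freq(torch.cat([F.real, F.imag], dim=1))
        pooled = torch.cat([s.mean(dim=(2, 3)), f.mean(dim=(2, 3))], dim=1)
        return self.score(pooled)                     # predicted quality

feats = torch.randn(4, 256, 16, 16)  # stand-in for SAM encoder features
print(SpatialFreqHead()(feats).shape)  # (4, 1)
```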
Identification of Hemorrhage and Infarct Lesions on Brain CT Images using Deep Learning
results: The study demonstrates the potential and limitations of DL-based automated identification of hemorrhage and infarct lesions on head NCCT scans.
Abstract
Head non-contrast computed tomography (NCCT) scans remain the preferred primary imaging modality due to their widespread availability and speed. However, the current standard of manual annotation of abnormal brain tissue on head NCCT scans involves significant disadvantages, such as a lack of cutoff standardization and degeneration identification. The recent advancement of deep learning-based computer-aided diagnostic (CAD) models in the multidisciplinary domain has created vast opportunities in neurological medical imaging. Significant literature has been published earlier on the automated identification of brain tissue on different imaging modalities. However, determining intracranial hemorrhage (ICH) and infarct can be challenging due to variability in image texture, volume size, and scan quality. This retrospective validation study evaluated a DL-based algorithm identifying ICH and infarct from head-NCCT scans. The head-NCCT scans dataset was collected consecutively from multiple diagnostic imaging centers across India. The study exhibits the potential and limitations of such DL-based software for introduction in routine workflow in extensive healthcare facilities.
Towards Enabling Cardiac Digital Twins of Myocardial Infarction Using Deep Computational Models for Inverse Inference
results: In silico experiments show that the model effectively captures the complex relationships between QRS signals and the corresponding infarct regions, with promising potential for future clinical application.
Abstract
Myocardial infarction (MI) demands precise and swift diagnosis. Cardiac digital twins (CDTs) have the potential to offer individualized evaluation of cardiac function in a non-invasive manner, making them a promising approach for personalized diagnosis and treatment planning of MI. The inference of accurate myocardial tissue properties is crucial in creating a reliable CDT platform, and particularly in the context of studying MI. In this work, we investigate the feasibility of inferring myocardial tissue properties from the electrocardiogram (ECG), focusing on the development of a comprehensive CDT platform specifically designed for MI. The platform integrates multi-modal data, such as cardiac MRI and ECG, to enhance the accuracy and reliability of the inferred tissue properties. We perform a sensitivity analysis based on computer simulations, systematically exploring the effects of infarct location, size, degree of transmurality, and electrical activity alteration on the simulated QRS complex of ECG, to establish the limits of the approach. We subsequently propose a deep computational model to infer infarct location and distribution from the simulated QRS. The in silico experimental results show that our model can effectively capture the complex relationships between the QRS signals and the corresponding infarct regions, with promising potential for clinical application in the future. The code will be released publicly once the manuscript is accepted for publication.
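As an illustration of the inverse-inference step, here is a minimal 1D-CNN sketch mapping multi-lead QRS signals to per-segment infarct probabilities (e.g., over the 17 AHA segments). The architecture and output parameterization are assumptions, not the paper's model.

```python
# A hedged sketch of inverse inference from QRS to infarct location:
# a small 1D CNN producing per-segment infarct probabilities.
import torch
import torch.nn as nn

class QRSToInfarct(nn.Module):
    def __init__(self, leads=12, segments=17):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(leads, 32, 7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, segments),
        )
    def forward(self, qrs):                    # qrs: (B, leads, T)
        return torch.sigmoid(self.net(qrs))    # per-segment probability

qrs = torch.randn(8, 12, 120)  # stand-in for simulated QRS complexes
print(QRSToInfarct()(qrs).shape)  # (8, 17)
```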
K-Space-Aware Cross-Modality Score for Synthesized Neuroimage Quality Assessment
for: This paper aims to address the problem of assessing cross-modality medical image synthesis, which has been largely unexplored and neglected by existing measures such as PSNR and SSIM.
methods: The proposed method, called K-CROSS, uses a pre-trained multi-modality segmentation network to predict lesion locations, together with a tumor encoder to represent features such as texture details and brightness intensities. Both k-space features and vision features are obtained and employed in comprehensive encoders with a frequency reconstruction penalty. The structure-shared encoders are designed and constrained with a similarity loss to capture the intrinsic common structural information for both modalities.
results: The proposed method outperforms other metrics, especially when compared with radiologists on a large-scale cross-modality neuroimaging perceptual similarity (NIRPS) dataset with 6,000 radiologist judgments.
Abstract
The problem of how to assess cross-modality medical image synthesis has been largely unexplored. The most used measures like PSNR and SSIM focus on analyzing the structural features but neglect the crucial lesion location and fundamental k-space speciality of medical images. To overcome this problem, we propose a new metric K-CROSS to spur progress on this challenging problem. Specifically, K-CROSS uses a pre-trained multi-modality segmentation network to predict the lesion location, together with a tumor encoder for representing features, such as texture details and brightness intensities. To further reflect the frequency-specific information from the magnetic resonance imaging principles, both k-space features and vision features are obtained and employed in our comprehensive encoders with a frequency reconstruction penalty. The structure-shared encoders are designed and constrained with a similarity loss to capture the intrinsic common structural information for both modalities. As a consequence, the features learned from lesion regions, k-space, and anatomical structures are all captured, which serve as our quality evaluators. We evaluate the performance by constructing a large-scale cross-modality neuroimaging perceptual similarity (NIRPS) dataset with 6,000 radiologist judgments. Extensive experiments demonstrate that the proposed method outperforms other metrics, especially in comparison with the radiologists on NIRPS.
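Two of the training signals described above are straightforward to sketch: a frequency-reconstruction penalty computed in k-space via the 2D FFT, and a similarity loss tying the structure-shared encoders of the two modalities together. The specific forms below are illustrative assumptions.

```python
# A hedged sketch of two K-CROSS-style training signals: a k-space
# reconstruction penalty and a structure-similarity loss. The exact
# losses in the paper may differ.
import torch
import torch.nn.functional as F

def frequency_reconstruction_penalty(image, reconstructed):
    """Penalize k-space (2D FFT) mismatch between image and reconstruction."""
    k_true = torch.fft.fft2(image, norm="ortho")
    k_pred = torch.fft.fft2(reconstructed, norm="ortho")
    return F.l1_loss(torch.abs(k_pred), torch.abs(k_true))

def structure_similarity_loss(z_src, z_tgt):
    """Pull the structure-shared embeddings of both modalities together."""
    return 1.0 - F.cosine_similarity(z_src, z_tgt, dim=1).mean()

img, rec = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
z1, z2 = torch.randn(2, 128), torch.randn(2, 128)
loss = frequency_reconstruction_penalty(img, rec) + structure_similarity_loss(z1, z2)
print(loss.item())
```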
paper_authors: Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill
for: To improve the performance of ASR systems, making them more accurate and effective.
methods: Use the ChatGPT large language model in zero-shot or one-shot settings to perform error correction on ASR N-best lists.
results: Applying error correction to a Conformer-Transducer model and the pre-trained Whisper model substantially improves ASR system performance.
Abstract
ASR error correction continues to serve as an important part of post-processing for speech recognition systems. Traditionally, these models are trained with supervised training using the decoding results of the underlying ASR system and the reference text. This approach is computationally intensive and the model needs to be re-trained when switching the underlying ASR model. Recent years have seen the development of large language models and their ability to perform natural language processing tasks in a zero-shot manner. In this paper, we take ChatGPT as an example to examine its ability to perform ASR error correction in the zero-shot or 1-shot settings. We use the ASR N-best list as model input and propose unconstrained error correction and N-best constrained error correction methods. Results on a Conformer-Transducer model and the pre-trained Whisper model show that we can largely improve the ASR system performance with error correction using the powerful ChatGPT model.
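The N-best constrained setting is essentially a prompting exercise: the LLM is shown the recognizer's hypotheses and asked to choose among them. A minimal sketch follows; `ask_llm` is a placeholder for whatever chat-completion client is used, and the prompt wording is an assumption, not the paper's exact prompt.

```python
# A hedged sketch of N-best constrained ASR error correction with an
# LLM: the model must pick the most likely transcription from the
# recognizer's hypotheses. `ask_llm` is a hypothetical placeholder.
def build_prompt(nbest: list[str]) -> str:
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    return (
        "The following are N-best hypotheses from a speech recognizer "
        "for one utterance. Output the single most likely correct "
        "transcription, choosing from the listed hypotheses.\n" + hyps
    )

def ask_llm(prompt: str) -> str:   # placeholder for a real chat API call
    raise NotImplementedError

nbest = [
    "the whether is nice today",
    "the weather is nice today",
    "the weather is mice today",
]
print(build_prompt(nbest))
```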
Emotion-Guided Music Accompaniment Generation Based on Variational Autoencoder
results: Our method strengthens the AI's capacity for emotion-aware composition during the music creation process and generates more pleasing accompaniments.
Abstract
Music accompaniment generation is a crucial aspect of the composition process. Deep neural networks have made significant strides in this field, but it remains a challenge for AI to effectively incorporate human emotions to create beautiful accompaniments. Existing models struggle to effectively characterize human emotions within neural network models while composing music. To address this issue, we propose the use of an easy-to-represent emotion flow model, the Valence/Arousal Curve, which allows for the compatibility of emotional information within the model through data transformation and enhances interpretability of emotional factors by utilizing a Variational Autoencoder as the model structure. Further, we used relative self-attention to maintain the structure of the music at the music-phrase level and to generate a richer accompaniment when combined with the rules of music theory.
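A conditional VAE conditioned on per-step Valence/Arousal values captures the gist of the approach; the sketch below uses illustrative dimensions and omits the relative self-attention decoder and music-theory rules.

```python
# A minimal conditional-VAE sketch: the encoder and decoder are
# conditioned on (valence, arousal) so an emotion curve can steer the
# generated accompaniment. Dimensions are assumptions.
import torch
import torch.nn as nn

class EmotionCVAE(nn.Module):
    def __init__(self, feat=128, latent=32, emo=2):  # emo = (valence, arousal)
        super().__init__()
        self.enc = nn.Linear(feat + emo, 2 * latent)
        self.dec = nn.Linear(latent + emo, feat)
    def forward(self, x, emotion):
        mu, logvar = self.enc(torch.cat([x, emotion], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        recon = self.dec(torch.cat([z, emotion], dim=-1))
        return recon, mu, logvar

x = torch.randn(16, 128)   # bar-level music features (stand-in)
va = torch.rand(16, 2)     # sampled points on a Valence/Arousal curve
recon, mu, logvar = EmotionCVAE()(x, va)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
print(recon.shape, kl.item())
```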
results: Experimental results show that IANS can produce intelligibility-enhanced signals using a small dual-microphone array, comparable to the results of null-steering beamformers with known DOAs.
Abstract
Beamforming techniques are popular in speech-related applications due to their effective spatial filtering capabilities. Nonetheless, conventional beamforming techniques generally depend heavily on either the target's direction-of-arrival (DOA), relative transfer function (RTF) or covariance matrix. This paper presents a new approach, the intelligibility-aware null-steering (IANS) beamforming framework, which uses the STOI-Net intelligibility prediction model to improve speech intelligibility without prior knowledge of the speech signal parameters mentioned earlier. The IANS framework combines a null-steering beamformer (NSBF) to generate a set of beamformed outputs, and STOI-Net, to determine the optimal result. Experimental results indicate that IANS can produce intelligibility-enhanced signals using a small dual-microphone array. The results are comparable to those obtained by null-steering beamformers with given knowledge of DOAs.
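The IANS loop can be sketched as: sweep candidate null directions with a null-steering beamformer, score each output with an intelligibility predictor, keep the best. Below, the beamformer is a simple frequency-domain delay-and-subtract and `predict_stoi` is a stand-in for STOI-Net, so both are assumptions rather than the paper's exact components.

```python
# A hedged sketch of the IANS selection loop over a two-microphone
# null-steering beamformer. `predict_stoi` is a placeholder scorer.
import numpy as np

def null_steer(x_left, x_right, fs, mic_dist, theta_deg):
    """Place a spatial null at angle theta by delay-and-subtract."""
    tau = mic_dist * np.sin(np.deg2rad(theta_deg)) / 343.0  # delay (s)
    n = len(x_left)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    X_r = np.fft.rfft(x_right) * np.exp(-2j * np.pi * freqs * tau)
    return np.fft.irfft(np.fft.rfft(x_left) - X_r, n=n)

def predict_stoi(signal):            # placeholder for a STOI-Net pass
    return float(np.var(signal))     # stand-in score, NOT intelligibility

fs, d = 16000, 0.02
xl, xr = np.random.randn(fs), np.random.randn(fs)
candidates = [null_steer(xl, xr, fs, d, th) for th in range(-90, 91, 15)]
best = max(candidates, key=predict_stoi)  # keep highest-scoring output
print(len(candidates), best.shape)
```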