cs.LG - 2023-08-25

NeuralClothSim: Neural Deformation Fields Meet the Kirchhoff-Love Thin Shell Theory

paper_url: http://arxiv.org/abs/2308.12970
repo_url: None
paper_authors: Navami Kairanda, Marc Habermann, Christian Theobalt, Vladislav Golyanik
for:这篇论文的目的是提出一种新的物理可能的布料模拟方法，使用薄shell理论来描述布料的表征和动态变化。methods:这篇论文使用了神经网络来学习布料的表征和动态变化，并使用了约束梯度下降来训练神经网络。results:实验结果表明，这种新的布料模拟方法可以具有高效的存储使用和可微的性，同时可以快速地实现布料的材质描述和模拟编辑。

Abstract
Cloth simulation is an extensively studied problem, with a plethora of solutions available in computer graphics literature. Existing cloth simulators produce realistic cloth deformations that obey different types of boundary conditions. Nevertheless, their operational principle remains limited in several ways: They operate on explicit surface representations with a fixed spatial resolution, perform a series of discretised updates (which bounds their temporal resolution), and require comparably large amounts of storage. Moreover, back-propagating gradients through the existing solvers is often not straightforward, which poses additional challenges when integrating them into modern neural architectures. In response to the limitations mentioned above, this paper takes a fundamentally different perspective on physically-plausible cloth simulation and re-thinks this long-standing problem: We propose NeuralClothSim, i.e., a new cloth simulation approach using thin shells, in which surface evolution is encoded in neural network weights. Our memory-efficient and differentiable solver operates on a new continuous coordinate-based representation of dynamic surfaces, i.e., neural deformation fields (NDFs); it supervises NDF evolution with the rules of the non-linear Kirchhoff-Love shell theory. NDFs are adaptive in the sense that they 1) allocate their capacity to the deformation details as the latter arise during the cloth evolution and 2) allow surface state queries at arbitrary spatial and temporal resolutions without retraining. We show how to train our NeuralClothSim solver while imposing hard boundary conditions and demonstrate multiple applications, such as material interpolation and simulation editing. The experimental results highlight the effectiveness of our formulation and its potential impact.

摘要
cloth 模拟是一个广泛研究的问题，计算机图形文献中有很多解决方案。现有的布料模拟器可以生成真实的布料变形，但它们的运作原理受到一些限制：它们在固定的空间分辨率上进行显式表面表示，执行一系列精炼的更新（这限制了它们的时间分辨率），并需要相对较大的存储量。此外，通过现有的解决方案来归档梯度的操作也不直观，这会增加将它们集成到现代神经网络架构时的挑战。面对这些限制，这篇论文采用了一种新的思路来解决布料模拟问题：我们提出了基于薄shell的新布料模拟方法，即NeuralClothSim。我们的方法使用神经网络权重来编码表面的进化，并且使用约束 Kirchhoff-Love 薄shell理论来监督神经变换场（NDF）的演化。NDF是可适应的，即它们会根据布料演化中的变化分配其容量，并且允许在任何空间和时间分辨率上进行表面状态的查询无需重新训练。我们示出了如何在强制边界条件下训练我们的NeuralClothSim解决方案，并展示了多种应用，如材料 interpolate 和模拟编辑。实验结果表明了我们的方法的有效性和潜在影响。

NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes

paper_url: http://arxiv.org/abs/2308.12967
repo_url: https://github.com/zubair-irshad/NeO-360
paper_authors: Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Vitor Guizilini, Thomas Kollar, Adrien Gaidon, Zsolt Kira, Rares Ambrus
for: 这个论文旨在解决现有的视点合成方法需要费力的每个景象优化问题，以便应用于实际的无限无限的城市环境中， где对象或背景只有几个视角可见。
methods: 我们介绍了一种新的方法 called NeO 360，它使用神经场来实现 sparse view synthesis of outdoor scenes。该方法可以从单个或几个颜色图像中重建360度场景。
results: 我们的实验表明，NeO 360 可以在 NeRDS 360 提出的挑战性 datasets 上表现出色，并且在新的视角和原始场景中都能够得到高质量的结果。此外，NeO 360 还提供了编辑和组合功能。

Abstract
Recent implicit neural representations have shown great results for novel view synthesis. However, existing methods require expensive per-scene optimization from many views hence limiting their application to real-world unbounded urban settings where the objects of interest or backgrounds are observed from very few views. To mitigate this challenge, we introduce a new approach called NeO 360, Neural fields for sparse view synthesis of outdoor scenes. NeO 360 is a generalizable method that reconstructs 360{\deg} scenes from a single or a few posed RGB images. The essence of our approach is in capturing the distribution of complex real-world outdoor 3D scenes and using a hybrid image-conditional triplanar representation that can be queried from any world point. Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations and is more effective and expressive than each. NeO 360's representation allows us to learn from a large collection of unbounded 3D scenes while offering generalizability to new views and novel scenes from as few as a single image during inference. We demonstrate our approach on the proposed challenging 360{\deg} unbounded dataset, called NeRDS 360, and show that NeO 360 outperforms state-of-the-art generalizable methods for novel view synthesis while also offering editing and composition capabilities. Project page: https://zubair-irshad.github.io/projects/neo360.html

摘要
最近的隐式神经表示法已经达到了对novel view synthesis的出色成绩。然而，现有的方法需要费时且费力地从多个视角优化，从而限制了它们在实际世界无限大的城市设置中的应用。为解决这个挑战，我们提出了一种新的方法called NeO 360，即神经场 для缺省视图Synthesis of outdoor scenes。NeO 360是一种通用的方法，可以从单个或几个RGB图像中重construct 360度场景。我们的方法的核心思想是捕捉复杂的实际户外3D场景的分布，并使用一种混合图像 conditioned triplanar表示，可以从任何世界点进行查询。我们的表示结合了 voxel-based和bird's-eye-view（BEV）表示的优点，并且在表示效果和表达力方面比每一种更高。NeO 360的表示允许我们从大量的无限大3D场景中学习，并在推理时对新视图和新场景进行普适化。我们在提出的challenging 360度无限大数据集，即NeRDS 360上进行了证明，并表明NeO 360在对novel view synthesis的推理中超越了当前最佳的通用方法，同时也提供了编辑和组合功能。项目页面：https://zubair-irshad.github.io/projects/neo360.html

Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation

paper_url: http://arxiv.org/abs/2308.12968
repo_url: https://github.com/yuxinn-j/scenimefy
paper_authors: Yuxin Jiang, Liming Jiang, Shuai Yang, Chen Change Loy
for: 高质量动漫场景自实际图像渲染
methods: 使用结构保持 pseudo 对应数据引导学习，利用 CLIP 富有模型先验，并应用 segmentation-guided 数据选择，以提高准确性和细节。
results: 比前一代基eline表现出较高的both perceptual quality和量化性能。

Abstract
Automatic high-quality rendering of anime scenes from complex real-world images is of significant practical value. The challenges of this task lie in the complexity of the scenes, the unique features of anime style, and the lack of high-quality datasets to bridge the domain gap. Despite promising attempts, previous efforts are still incompetent in achieving satisfactory results with consistent semantic preservation, evident stylization, and fine details. In this study, we propose Scenimefy, a novel semi-supervised image-to-image translation framework that addresses these challenges. Our approach guides the learning with structure-consistent pseudo paired data, simplifying the pure unsupervised setting. The pseudo data are derived uniquely from a semantic-constrained StyleGAN leveraging rich model priors like CLIP. We further apply segmentation-guided data selection to obtain high-quality pseudo supervision. A patch-wise contrastive style loss is introduced to improve stylization and fine details. Besides, we contribute a high-resolution anime scene dataset to facilitate future research. Our extensive experiments demonstrate the superiority of our method over state-of-the-art baselines in terms of both perceptual quality and quantitative performance.

摘要
自动高质量渲染动漫场景从复杂实际图像是实际上具有重要的实践价值。这些挑战包括场景复杂度、动漫风格独特性和领域域之间的数据域隔。尽管有承诺的尝试，过去的尝试仍然无法达到满意的结果，包括持续性的semantic preserve，证明性的风格化和细节。在本研究中，我们提出了Sceneimefy，一种新的半指导性图像-图像翻译框架。我们的方法利用结构一致的 pseudo paired数据来引导学习，从而简化了纯无监督的设定。 pseudo数据通过基于CLIP的semantic-constrained StyleGAN得到，并应用了 segmentation-guided data selection来获得高质量的pseudo超级vision。此外，我们还提供了一个高分辨率动漫场景集，以便未来的研究。我们的广泛的实验表明，我们的方法在比较现有基eline上方面具有superiority， both perceived quality和量化性能。

Dense Text-to-Image Generation with Attention Modulation

paper_url: http://arxiv.org/abs/2308.12964
repo_url: https://github.com/naver-ai/densediffusion
paper_authors: Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, Jun-Yan Zhu
for: 处理细致的文本描述，生成真实的图像。
methods: 利用预训练的文本到图像模型，通过 Layout 指导对象在具体的区域出现。
results: 不需要再训练或数据集，可以根据文本描述提高图像生成效果，并且与具体的 Layout 条件下的模型效果相似。

Abstract
Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout. We first analyze the relationship between generated images' layouts and the pre-trained model's intermediate attention maps. Next, we develop an attention modulation method that guides objects to appear in specific regions according to layout guidance. Without requiring additional fine-tuning or datasets, we improve image generation performance given dense captions regarding both automatic and human evaluation scores. In addition, we achieve similar-quality visual results with models specifically trained with layout conditions.

摘要
现有的文本到图像扩散模型很难以生成具有细致描述的图像，每个文本提示都提供了特定图像区域的详细描述。为解决这个问题，我们提议了DenseDiffusion，一种不需要训练的方法，可以使用预训练的文本到图像模型来处理这些细致的文本提示，同时提供场景布局控制。我们首先分析生成图像的布局和预训练模型的中间注意力地图之间的关系。然后，我们开发了一种注意力调节方法，可以根据布局指导对象出现在特定区域中。不需要额外的训练或数据集，我们在给出细致文本提示时改进了图像生成性能， Regarding both automatic and human evaluation scores.此外，我们可以通过 specifically 将模型训练在场景条件下，实现类似的视觉效果。

DLIP: Distilling Language-Image Pre-training

paper_url: http://arxiv.org/abs/2308.12956
repo_url: None
paper_authors: Huafeng Kuang, Jie Wu, Xiawu Zheng, Ming Li, Xuefeng Xiao, Rui Wang, Min Zheng, Rongrong Ji
for:本研究旨在提出一种简单 yet efficient的语言图像预训练框架（DLIP），以实现快速、高效地压缩VLP模型。methods:本研究采用了多维度的模型压缩方法，包括不同模块的建筑特征和不同模式之间的信息传递。results:实验结果显示，DLIP可以在多种跨模态任务中实现最佳的精度/效率质量比，如图文检索、图文captioning和视觉问答。例如，DLIP可以压缩BLIP模型1.9倍，从213M参数压缩到108M参数，而且与教师模型的性能相似或更好。此外，DLIP可以保留95%以上的性能，使用22.4%的参数和24.8%的FLOPs，并提高执行速度2.7倍。

Abstract
Vision-Language Pre-training (VLP) shows remarkable progress with the assistance of extremely heavy parameters, which challenges deployment in real applications. Knowledge distillation is well recognized as the essential procedure in model compression. However, existing knowledge distillation techniques lack an in-depth investigation and analysis of VLP, and practical guidelines for VLP-oriented distillation are still not yet explored. In this paper, we present DLIP, a simple yet efficient Distilling Language-Image Pre-training framework, through which we investigate how to distill a light VLP model. Specifically, we dissect the model distillation from multiple dimensions, such as the architecture characteristics of different modules and the information transfer of different modalities. We conduct comprehensive experiments and provide insights on distilling a light but performant VLP model. Experimental results reveal that DLIP can achieve a state-of-the-art accuracy/efficiency trade-off across diverse cross-modal tasks, e.g., image-text retrieval, image captioning and visual question answering. For example, DLIP compresses BLIP by 1.9x, from 213M to 108M parameters, while achieving comparable or better performance. Furthermore, DLIP succeeds in retaining more than 95% of the performance with 22.4% parameters and 24.8% FLOPs compared to the teacher model and accelerates inference speed by 2.7x.

摘要
美化语言预训练（VLP）显示了惊人的进步，却面临实际应用中的部署挑战。知识储存是识别为模型压缩的关键过程。然而，现有的知识储存技术没有对VLP进行深入的研究和分析，也没有提供VLP-oriented储存的实用指南。在这篇论文中，我们提出了一个简单 yet efficient的Distilling Language-Image Pre-training框架（DLIP），以 investigate如何压缩一个轻量级VLP模型。我们从多个维度进行模型压缩，包括不同模块的建筑特征和不同Modalities之间的信息传递。我们进行了全面的实验，并提供了压缩轻量级VLP模型的深入分析。实验结果表明，DLIP可以在多个跨模态任务中实现状态机器人的精度/效率质量比，如图文检索、图文描述和视觉问答。例如，DLIP可以将BLIP压缩为1.9倍，从213M Parameters减少到108M Parameters，同时保持与教师模型的相同或更好的性能。此外，DLIP可以保留95%以上的性能，使用22.4% Parameters和24.8% FLOPs，相比教师模型快速加速执行速度2.7倍。

BridgeData V2: A Dataset for Robot Learning at Scale

paper_url: http://arxiv.org/abs/2308.12952
repo_url: https://github.com/rail-berkeley/BridgeData-V2
paper_authors: Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine
For: The paper is written for researchers in the field of robotic manipulation, particularly those interested in scalable robot learning.* Methods: The paper uses a large and diverse dataset of robotic manipulation behaviors, called BridgeData V2, to facilitate research on scalable robot learning. The dataset contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot, and is compatible with a wide variety of open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions.* Results: The paper reports on the results of training six state-of-the-art imitation learning and offline reinforcement learning methods on the BridgeData V2 dataset, and finds that these methods succeed on a suite of tasks requiring varying amounts of generalization. The paper also demonstrates that the performance of these methods improves with more data and higher capacity models, and that training on a greater variety of skills leads to improved generalization.

Abstract
We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains, and institutions, making the dataset a useful resource for a broad range of researchers. Additionally, the dataset is compatible with a wide variety of open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. In our experiments, we train 6 state-of-the-art imitation learning and offline reinforcement learning methods on our dataset, and find that they succeed on a suite of tasks requiring varying amounts of generalization. We also demonstrate that the performance of these methods improves with more data and higher capacity models, and that training on a greater variety of skills leads to improved generalization. By publicly sharing BridgeData V2 and our pre-trained models, we aim to accelerate research in scalable robot learning methods. Project page at https://rail-berkeley.github.io/bridgedata

摘要
我们介绍 BridgeData V2，一个大型和多样化的机器人 manipulate 行为数据集，用于促进机器人学习扩展。 BridgeData V2 包含 60,096 条路径，在 24 个环境中收集到，这个数据集提供了广泛的任务和环境多样性，从而实现了在不同环境、领域和机构中实现数据可重用性。此外，这个数据集适用于广泛的开放词汇、多任务学习方法，以图像目标或自然语言指令为条件。在我们的实验中，我们将 6 种现代机器人模仿学习和离线强化学习方法训练在我们的数据集上，发现这些方法在一系列需要不同量的数据和模型容量的任务上成功。我们还证明了这些方法在更多的数据和更高的模型容量下表现更好，以及训练更多的技能将导致更好的数据可重用性。我们通过公开 BridgeData V2 和我们的预训模型，希望能够推动机器人学习方法的扩展。更多信息请参考https://rail-berkeley.github.io/bridgedata。

Label Budget Allocation in Multi-Task Learning

paper_url: http://arxiv.org/abs/2308.12949
repo_url: None
paper_authors: Ximeng Sun, Kihyuk Sohn, Kate Saenko, Clayton Mellina, Xiao Bian
for: 提高机器学习系统的性能，因为标签数据的成本限制了系统的性能。
methods: 提出了一种名为多任务学习中的标签预算分配问题，并正式定义了这个问题。并通过实验表明，不同的预算分配策略对多任务学习的性能有很大的影响。我们提出了一种适应任务的预算分配算法，可以在不同的多任务学习设置下生成最佳的预算分配策略。
results: 我们的方法可以在PASCAL VOC和Taskonomy上实现优于其他常用的标签分配策略的性能。

Abstract
The cost of labeling data often limits the performance of machine learning systems. In multi-task learning, related tasks provide information to each other and improve overall performance, but the label cost can vary among tasks. How should the label budget (i.e. the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance? We are the first to propose and formally define the label budget allocation problem in multi-task learning and to empirically show that different budget allocation strategies make a big difference to its performance. We propose a Task-Adaptive Budget Allocation algorithm to robustly generate the optimal budget allocation adaptive to different multi-task learning settings. Specifically, we estimate and then maximize the extent of new information obtained from the allocated budget as a proxy for multi-task learning performance. Experiments on PASCAL VOC and Taskonomy demonstrate the efficacy of our approach over other widely used heuristic labeling strategies.

摘要
Machine learning系统的标签成本 oftentimeslimits its performance. In multi-task learning, related tasks can provide information to each other and improve overall performance, but the label cost can vary among tasks. How should the label budget (i.e. the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance? We are the first to propose and formally define the label budget allocation problem in multi-task learning and to empirically show that different budget allocation strategies make a big difference to its performance. We propose a Task-Adaptive Budget Allocation algorithm to robustly generate the optimal budget allocation adaptive to different multi-task learning settings. Specifically, we estimate and then maximize the extent of new information obtained from the allocated budget as a proxy for multi-task learning performance. Experiments on PASCAL VOC and Taskonomy demonstrate the efficacy of our approach over other widely used heuristic labeling strategies.Here's the breakdown of the translation:* Machine learning系统 (机器学习系统) - This is the Chinese term for "machine learning system".* 标签成本 (标签成本) - This is the Chinese term for "label cost".* multi-task learning (多任务学习) - This is the Chinese term for "multi-task learning".* 不同任务 (不同任务) - This is the Chinese term for "different tasks".* 如何分配标签预算 (如何分配标签预算) - This is the Chinese term for "how to allocate the label budget".* 达到最佳多任务性能 (达到最佳多任务性能) - This is the Chinese term for "achieve optimal multi-task performance".* 我们是第一个 (我们是第一个) - This is the Chinese term for "we are the first".* 提出和正式定义标签预算分配问题 (提出和正式定义标签预算分配问题) - This is the Chinese term for "propose and formally define the label budget allocation problem".* 其实际效果 (其实际效果) - This is the Chinese term for "its practical effect".* 多任务学习设置 (多任务学习设置) - This is the Chinese term for "multi-task learning settings".* 适应任务 (适应任务) - This is the Chinese term for "adaptive to different tasks".* 新信息量 (新信息量) - This is the Chinese term for "new information quantity".* 作为多任务学习性能的代理 (作为多任务学习性能的代理) - This is the Chinese term for "as a proxy for multi-task learning performance".* 我们提议的Task-Adaptive Budget Allocation算法 (我们提议的Task-Adaptive Budget Allocation算法) - This is the Chinese term for "our proposed Task-Adaptive Budget Allocation algorithm".* 可以Robustly生成优化的标签预算分配 (可以Robustly生成优化的标签预算分配) - This is the Chinese term for "can robustly generate optimized label budget allocation".* PASCAL VOC和Taskonomy (PASCAL VOC和Taskonomy) - These are the Chinese terms for "PASCAL VOC and Taskonomy".* 实验证明 (实验证明) - This is the Chinese term for "experiments demonstrate".* 其他常用的标签分配策略 (其他常用的标签分配策略) - This is the Chinese term for "other commonly used labeling strategies".

Learning Only On Boundaries: a Physics-Informed Neural operator for Solving Parametric Partial Differential Equations in Complex Geometries

paper_url: http://arxiv.org/abs/2308.12939
repo_url: None
paper_authors: Zhiwei Fang, Sifan Wang, Paris Perdikaris
for: 解决 Parametrized boundary value problems without labeled data.
methods: 使用 Physics-informed neural operator 方法，通过将 PDE 转化为 boundary integral equations (BIEs)，可以在边界上训练操作网络，而不需要大量标注数据。
results: 可以处理复杂的参数化几何和无穷大问题，并且比现有的 PINNs 和 neural operators 更快速。

Abstract
Recently deep learning surrogates and neural operators have shown promise in solving partial differential equations (PDEs). However, they often require a large amount of training data and are limited to bounded domains. In this work, we present a novel physics-informed neural operator method to solve parametrized boundary value problems without labeled data. By reformulating the PDEs into boundary integral equations (BIEs), we can train the operator network solely on the boundary of the domain. This approach reduces the number of required sample points from $O(N^d)$ to $O(N^{d-1})$, where $d$ is the domain's dimension, leading to a significant acceleration of the training process. Additionally, our method can handle unbounded problems, which are unattainable for existing physics-informed neural networks (PINNs) and neural operators. Our numerical experiments show the effectiveness of parametrized complex geometries and unbounded problems.

摘要
最近，深度学习代理和神经操作已经在解偏微分方程（PDEs）中表现出了承诺。然而，它们经常需要大量的训练数据，并且受到固定域的限制。在这个工作中，我们提出了一种新的物理学习神经操作方法，用于解偏微分方程的参数化边值问题。我们将PDEs转化为边 интеграル方程（BIEs），因此我们可以在域的边上训练操作网络，不需要大量的标注数据。这种方法可以减少训练过程中需要的样本点数量从$O(N^d)$减少到$O(N^{d-1})$，其中$d$是域的维度，这导致训练过程的加速。此外，我们的方法还可以处理无界问题，这些问题对现有的物理学习神经网络（PINNs）和神经操作都是不可能的。我们的数学实验表明，参数化复杂的几何和无界问题的效果。

Low-count Time Series Anomaly Detection

paper_url: http://arxiv.org/abs/2308.12925
repo_url: None
paper_authors: Philipp Renz, Kurt Cutajar, Niall Twomey, Gavin K. C. Cheung, Hanting Xie
For: 这篇论文是为了解决低频时间序列中的异常检测问题，特别是在大规模在线平台上监测和记录多种数据类型时，遇到的几个独特挑战。* Methods: 该论文引入了一种新的生成过程，用于创建含有异常段的低频时间序列的基本数据集。通过理论和实验分析，论文解释了现有算法在这些设置下的缺陷，以及如何使用异常分数平滑化来改进性能。* Results: 该论文通过使用实际数据 validate了其分析和建议的实用性，并在一个实际的零售店销售数据集上证明了异常分数平滑化的作用。

Abstract
Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types. Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios (when anomaly signatures are provably undetectable), and non-uniform performance (when average metrics are not representative of local behaviour). The time series anomaly detection community currently lacks explicit tooling and processes to model and reliably detect anomalies in these settings. We address this gap by introducing a novel generative procedure for creating benchmark datasets comprising of low-count time series with anomalous segments. Via a mixture of theoretical and empirical analysis, our work explains how widely-used algorithms struggle with the distribution overlap between normal and anomalous segments. In order to mitigate this shortcoming, we then leverage our findings to demonstrate how anomaly score smoothing consistently improves performance. The practical utility of our analysis and recommendation is validated on a real-world dataset containing sales data for retail stores.

摘要
低频时序描述稀疏或间歇性事件，这些事件在大规模在线平台上采集和监测多种数据类型中很普遍。模型低频时序时，有几个明显的挑战，特别是低信号噪声比（畸变签识不可避免）和非均匀性（平均指标不是本地行为的代表）。时序异常检测社区目前缺乏专门的工具和过程来模型和可靠地检测异常情况。我们填补这个空白，引入了一种新的生成过程，用于创建包含低频时序异常段的 Referenz datasets。通过理论和实验分析，我们解释了广泛使用的算法在正常和畸变段之间的分布重叠问题。为了解决这个缺陷，我们then 利用我们的发现，示出了如何使用异常得分平滑来提高性能。我们的分析和建议在实际的零售店销售数据集上进行验证，证明了我们的方法的实用性。

An Efficient Distributed Multi-Agent Reinforcement Learning for EV Charging Network Control

paper_url: http://arxiv.org/abs/2308.12921
repo_url: None
paper_authors: Amin Shojaeighadikolaei, Morteza Hashemi
For: The paper aims to develop an effective EV charging controller to mitigate the risk of transformer overload in the distribution grid, with a focus on preserving privacy for EV owners.* Methods: The authors propose a decentralized Multi-agent Reinforcement Learning (MARL) charging framework, employing the Centralized Training Decentralized Execution-Deep Deterministic Policy Gradient (CTDE-DDPG) scheme to provide valuable information to users during training while maintaining privacy during execution.* Results: The CTDE framework improves the performance of the charging network by reducing network costs, and reduces the Peak-to-Average Ratio (PAR) of the total demand, which in turn reduces the risk of transformer overload during peak hours.Here’s the Chinese translation of the three points:* For: 这篇论文目标是为了减少分布网络中变压器过载的风险，同时保持电动车所有者的隐私。* Methods: 作者提出了一种分布式多代理学习（MARL）充电控制器，使用中央训练分布执行-深度束缚策略 Gradient（CTDE-DDPG）算法，以提供训练过程中有价值信息，而执行过程中保持隐私。* Results: CTDE framwork可以提高充电网络的性能，降低网络成本，并降低总需求的峰值强度（PAR），从而减少变压器过载的风险。

Abstract
The increasing trend in adopting electric vehicles (EVs) will significantly impact the residential electricity demand, which results in an increased risk of transformer overload in the distribution grid. To mitigate such risks, there are urgent needs to develop effective EV charging controllers. Currently, the majority of the EV charge controllers are based on a centralized approach for managing individual EVs or a group of EVs. In this paper, we introduce a decentralized Multi-agent Reinforcement Learning (MARL) charging framework that prioritizes the preservation of privacy for EV owners. We employ the Centralized Training Decentralized Execution-Deep Deterministic Policy Gradient (CTDE-DDPG) scheme, which provides valuable information to users during training while maintaining privacy during execution. Our results demonstrate that the CTDE framework improves the performance of the charging network by reducing the network costs. Moreover, we show that the Peak-to-Average Ratio (PAR) of the total demand is reduced, which, in turn, reduces the risk of transformer overload during the peak hours.

摘要
随着电动车（EV）的普及趋势，它将对分布网络的住宅电力需求产生重要影响，从而增加分布网络的变压器负荷风险。为了缓解这些风险，有紧迫需要开发有效的EV充电控制器。目前，大多数EV充电控制器采用中央化的方法来管理个体EV或一组EV。在这篇论文中，我们介绍了一种分布式多智能体学习（MARL）充电框架，这种框架强调保护电动车所有者的隐私。我们采用了中央训练、分布执行-深度决定策函数（CTDE-DDPG）方案，这种方案在训练过程中提供了有价值的信息，同时在执行过程中保持隐私。我们的结果表明，CTDE框架可以改善充电网络的性能，同时还可以降低总需求的峰值至平均值比（PAR），从而降低变压器负荷风险 durante las horas pico。

Towards Realistic Unsupervised Fine-tuning with CLIP

paper_url: http://arxiv.org/abs/2308.12919
repo_url: None
paper_authors: Jian Liang, Lijun Sheng, Zhengbo Wang, Ran He, Tieniu Tan
for: 这个研究旨在应用CLIPvision-language模型（VLM）进行下游超vised学习任务中。
methods: 这篇文章提出了一个实用的、有效的 Fine-tuning方法，即Universal Entropy Optimization（UEO），它利用样本水平的信任度来紧急降低具有信任度的instances的conditional entropy，并将less confident instances的marginal entropy提高。
results: 这篇文章透过15个领域和4种不同的专业知识进行了广泛的实验，结果显示，UEO方法在泛化和out-of-distribution检测方面都大大超过了基eline方法。

Abstract
The emergence of vision-language models (VLMs), such as CLIP, has spurred a significant research effort towards their application for downstream supervised learning tasks. Although some previous studies have explored the unsupervised fine-tuning of CLIP, they often rely on prior knowledge in the form of class names associated with ground truth labels. In this paper, we delve into a realistic unsupervised fine-tuning scenario by assuming that the unlabeled data might contain out-of-distribution samples from unknown classes. Furthermore, we emphasize the importance of simultaneously enhancing out-of-distribution detection capabilities alongside the recognition of instances associated with predefined class labels. To tackle this problem, we present a simple, efficient, and effective fine-tuning approach called Universal Entropy Optimization (UEO). UEO leverages sample-level confidence to approximately minimize the conditional entropy of confident instances and maximize the marginal entropy of less confident instances. Apart from optimizing the textual prompts, UEO also incorporates optimization of channel-wise affine transformations within the visual branch of CLIP. Through extensive experiments conducted across 15 domains and 4 different types of prior knowledge, we demonstrate that UEO surpasses baseline methods in terms of both generalization and out-of-distribution detection.

摘要
随着视力语言模型（VLM）的出现，如CLIP，研究人员努力将其应用于下游指导学习任务。 Although some previous studies have explored the unsupervised fine-tuning of CLIP, they often rely on prior knowledge in the form of class names associated with ground truth labels. In this paper, we explore a realistic unsupervised fine-tuning scenario by assuming that the unlabeled data may contain out-of-distribution samples from unknown classes. Furthermore, we emphasize the importance of simultaneously enhancing out-of-distribution detection capabilities alongside the recognition of instances associated with predefined class labels.To tackle this problem, we present a simple, efficient, and effective fine-tuning approach called Universal Entropy Optimization (UEO). UEO leverages sample-level confidence to approximately minimize the conditional entropy of confident instances and maximize the marginal entropy of less confident instances. Apart from optimizing the textual prompts, UEO also incorporates optimization of channel-wise affine transformations within the visual branch of CLIP. Through extensive experiments conducted across 15 domains and 4 different types of prior knowledge, we demonstrate that UEO surpasses baseline methods in terms of both generalization and out-of-distribution detection.

Evaluating the Vulnerabilities in ML systems in terms of adversarial attacks

paper_url: http://arxiv.org/abs/2308.12918
repo_url: None
paper_authors: John Harshith, Mantej Singh Gill, Madhan Jothimani
for: 本研究探讨了最新的 adversarial 攻击方法，以及它们对当前深度学习网络防御系统的挑战。
methods: 本研究使用了various methods to explore the consequences of vulnerabilities in AI systems, including discussing how they might arise, differences between randomized and adversarial examples, and potential ethical implications.
results: 本研究发现了一些新的 adversarial 攻击方法，以及它们对当前防御系统的影响。同时，研究还提出了一些建议，以帮助在测试阶段 обу练 AI 系统，以便在更广泛的应用中使用。

Abstract
There have been recent adversarial attacks that are difficult to find. These new adversarial attacks methods may pose challenges to current deep learning cyber defense systems and could influence the future defense of cyberattacks. The authors focus on this domain in this research paper. They explore the consequences of vulnerabilities in AI systems. This includes discussing how they might arise, differences between randomized and adversarial examples and also potential ethical implications of vulnerabilities. Moreover, it is important to train the AI systems appropriately when they are in testing phase and getting them ready for broader use.

摘要
近些时候出现了困难发现的对抗攻击。这些新的对抗攻击方法可能会对当前的深度学习网络防御系统 pose 挑战，并可能影响未来网络攻击防御。作者在这篇研究论文中关注这个领域。他们探讨了人工智能系统的漏洞的后果，包括对随机化和对抗示例的区别，以及漏洞的可能性的伦理问题。此外，在测试阶段，我们需要适当地训练AI系统，以便在更广泛的应用中准备它们。Note: Please note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.

POLCA: Power Oversubscription in LLM Cloud Providers

paper_url: http://arxiv.org/abs/2308.12908
repo_url: None
paper_authors: Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, Ricardo Bianchini
for: 这个论文主要是针对大型自然语言模型（LLM）的创新和其多种应用场景带来的数据中心GPU的 compute capacity 需求，以及这些新工作负载对于数据中心的电力资源带来的挑战。
methods: 论文使用了对多种 LLM 的描述和配置的电力消耗模式进行了广泛的测量和分析，并识别了推导和训练过程中电力消耗的区别。
results: 论文发现，在推导和训练过程中， LLM 集群的平均和峰值电力使用率不高，并且可以通过实现更好的电力调度来提高数据中心的电力效率，增加可部署的服务器数量，并缩短部署时间。论文还提出了一个名为 POLCA 的可靠和可重复的电力调度机制，可以在 GPU 集群中实现更高的部署效率和可靠性。

Abstract
Recent innovation in large language models (LLMs), and their myriad use-cases have rapidly driven up the compute capacity demand for datacenter GPUs. Several cloud providers and other enterprises have made substantial plans of growth in their datacenters to support these new workloads. One of the key bottleneck resources in datacenters is power, and given the increasing model sizes of LLMs, they are becoming increasingly power intensive. In this paper, we show that there is a significant opportunity to oversubscribe power in LLM clusters. Power oversubscription improves the power efficiency of these datacenters, allowing more deployable servers per datacenter, and reduces the deployment time, since building new datacenters is slow. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations. We identify the differences between the inference and training power consumption patterns. Based on our analysis of these LLMs, we claim that the average and peak power utilization in LLM clusters for inference should not be very high. Our deductions align with the data from production LLM clusters, revealing that inference workloads offer substantial headroom for power oversubscription. However, the stringent set of telemetry and controls that GPUs offer in a virtualized environment, makes it challenging to have a reliable and robust power oversubscription mechanism. We propose POLCA, our framework for power oversubscription that is robust, reliable, and readily deployable for GPU clusters. Using open-source models to replicate the power patterns observed in production, we simulate POLCA and demonstrate that we can deploy 30% more servers in the same GPU cluster for inference, with minimal performance loss

摘要
最近的大语言模型（LLM）的创新和多种应用场景，快速提高了数据中心GPU的计算能力需求。许多云提供商和企业在数据中心进行了大规模的扩展计划以支持这些新型应用。数据中心内部的一个关键瓶颈资源是电力，而随着LLM的模型Size不断增大，它们变得越来越占用电力。在这篇论文中，我们表明了数据中心LLM团群中的电力投入可以进行副作用。副作用提高了数据中心的电力效率，allowing more deployable servers per datacenter, and reduces the deployment time, since building new datacenters is slow。我们对各种LLM的多种配置进行了广泛的电力消耗特征分析。我们发现了推理和训练两种不同的电力消耗模式。根据我们对LLM的分析，我们认为LLM团群的平均和峰值电力利用率在推理任务上不应该很高。我们的结论与生产环境中LLM团群的数据相一致，表明推理任务提供了大量的副作用空间。然而，GPU在虚拟环境中提供的严格的测量和控制机制，使得实现可靠和可Robust的副作用机制变得具有挑战。我们提出了POLCA，我们的可靠可Robust的副作用框架。使用开源模型来复制生产环境中的电力模式，我们在POLCA中进行了模拟，并证明可以在同一GPU团群中部署30%更多的服务器，并且减少性能损失。

CDAN: Convolutional Dense Attention-guided Network for Low-light Image Enhancement

paper_url: http://arxiv.org/abs/2308.12902
repo_url: None
paper_authors: Hossein Shakibania, Sina Raoufi, Hassan Khotanlou
for: 增强低光照图像，解决低光照图像的降低清晰度、降低颜色强度和减少细节等问题，以便进行准确的分析和解释。
methods: 该文献提出了一种基于自编码器的Convolutional Dense Attention-guided Network（CDAN），包括卷积和密集块，以及注意力机制和跳过连接。这种架构确保了信息的有效传播和特征学习。
results: 该文献的方法在低光照图像增强 tasks中表现出了remarkable进步，与当前最佳结果相比，在多种复杂的低光照场景中表现出了robustness。

Abstract
Low-light images, characterized by inadequate illumination, pose challenges of diminished clarity, muted colors, and reduced details. Low-light image enhancement, an essential task in computer vision, aims to rectify these issues by improving brightness, contrast, and overall perceptual quality, thereby facilitating accurate analysis and interpretation. This paper introduces the Convolutional Dense Attention-guided Network (CDAN), a novel solution for enhancing low-light images. CDAN integrates an autoencoder-based architecture with convolutional and dense blocks, complemented by an attention mechanism and skip connections. This architecture ensures efficient information propagation and feature learning. Furthermore, a dedicated post-processing phase refines color balance and contrast. Our approach demonstrates notable progress compared to state-of-the-art results in low-light image enhancement, showcasing its robustness across a wide range of challenging scenarios. Our model performs remarkably on benchmark datasets, effectively mitigating under-exposure and proficiently restoring textures and colors in diverse low-light scenarios. This achievement underscores CDAN's potential for diverse computer vision tasks, notably enabling robust object detection and recognition in challenging low-light conditions.

摘要
低光照图像，受到不足照明的影响，会呈现出降低清晰度、抑制颜色、减少细节等问题。低光照图像改善是计算机视觉中的一项重要任务，旨在通过提高亮度、对比度和总体观察质量来使图像更加清晰，以便更加准确地分析和解释。本文介绍了一种新的低光照图像提升方法——卷积束注意力导航网络（CDAN）。CDAN通过综合了自适应网络、卷积块和束注意力机制的架构，确保了信息传递的高效性和特征学习。此外，特定的后处理阶段可以进一步调整颜色均衡和对比度。我们的方法在低光照图像改善任务中实现了显著的进步，在多种复杂的低光照场景中表现出了扎实的稳定性。这一成就表明CDAN在计算机视觉任务中具有广泛的应用前景，特别是在低光照条件下进行稳定的物体检测和识别。

Unified Data Management and Comprehensive Performance Evaluation for Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark]

paper_url: http://arxiv.org/abs/2308.12899
repo_url: https://github.com/libcity/bigscity-libcity
paper_authors: Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, Jingyuan Wang
for: 这个论文主要是为了解决城市空间时间预测领域中的数据访问和利用问题，以及深度学习模型的选择和结构设计问题。methods: 该论文提出了“原子文件”的统一存储格式，用于城市空间时间大数据的管理，并对40个多样化的数据集进行验证。此外，论文还提供了城市空间时间预测模型的技术进步概述，以及使用多种模型和数据集进行广泛的实验，建立了性能排名和研究方向。results: 该论文通过提出“原子文件”和对多种模型和数据集的实验，得出了有效地管理城市空间时间数据，指导未来的研究发展，并且可能在长期内对城市生活标准产生持续的贡献。

Abstract
The field of urban spatial-temporal prediction is advancing rapidly with the development of deep learning techniques and the availability of large-scale datasets. However, challenges persist in accessing and utilizing diverse urban spatial-temporal datasets from different sources and stored in different formats, as well as determining effective model structures and components with the proliferation of deep learning models. This work addresses these challenges and provides three significant contributions. Firstly, we introduce "atomic files", a unified storage format designed for urban spatial-temporal big data, and validate its effectiveness on 40 diverse datasets, simplifying data management. Secondly, we present a comprehensive overview of technological advances in urban spatial-temporal prediction models, guiding the development of robust models. Thirdly, we conduct extensive experiments using diverse models and datasets, establishing a performance leaderboard and identifying promising research directions. Overall, this work effectively manages urban spatial-temporal data, guides future efforts, and facilitates the development of accurate and efficient urban spatial-temporal prediction models. It can potentially make long-term contributions to urban spatial-temporal data management and prediction, ultimately leading to improved urban living standards.

摘要
难 accessible 和利用不同来源和格式的城市空间时间数据的挑战 persistently in the field of urban spatial-temporal prediction. To address these challenges, this work makes three significant contributions. Firstly, we introduce "atomic files", a unified storage format designed for urban spatial-temporal big data, and validate its effectiveness on 40 diverse datasets, simplifying data management. Secondly, we present a comprehensive overview of technological advances in urban spatial-temporal prediction models, guiding the development of robust models. Thirdly, we conduct extensive experiments using diverse models and datasets, establishing a performance leaderboard and identifying promising research directions. Overall, this work effectively manages urban spatial-temporal data, guides future efforts, and facilitates the development of accurate and efficient urban spatial-temporal prediction models. It can potentially make long-term contributions to urban spatial-temporal data management and prediction, ultimately leading to improved urban living standards.Note: The translation is in Simplified Chinese, which is the standard written form of Chinese used in mainland China. If you prefer Traditional Chinese, please let me know and I can provide the translation in that format as well.

Beyond Document Page Classification: Design, Datasets, and Challenges

paper_url: http://arxiv.org/abs/2308.12896
repo_url: None
paper_authors: Jordy Van Landeghem, Sanket Biswas, Matthew B. Blaschko, Marie-Francine Moens
for: 本文提出了将文档分类测试更加接近实际应用的需求，包括测试数据的性质（多通道、多页、多业务）和分类任务的类型（多页文档、页流和文档套件分类等）。
methods: 本文认为现有的公共多页文档分类数据集缺乏，并正式化了应用场景中的多种分类任务，以及需要更好的文档表示。
results: 实验表明，现有的分类指标已经过时，需要更新以评估实际上occurs的完整文档。这也强调了评估方法的成熔和时间复杂度的重要性，以及面对实际分布的各种扩展。

Abstract
This paper highlights the need to bring document classification benchmarking closer to real-world applications, both in the nature of data tested ($X$: multi-channel, multi-paged, multi-industry; $Y$: class distributions and label set variety) and in classification tasks considered ($f$: multi-page document, page stream, and document bundle classification, ...). We identify the lack of public multi-page document classification datasets, formalize different classification tasks arising in application scenarios, and motivate the value of targeting efficient multi-page document representations. An experimental study on proposed multi-page document classification datasets demonstrates that current benchmarks have become irrelevant and need to be updated to evaluate complete documents, as they naturally occur in practice. This reality check also calls for more mature evaluation methodologies, covering calibration evaluation, inference complexity (time-memory), and a range of realistic distribution shifts (e.g., born-digital vs. scanning noise, shifting page order). Our study ends on a hopeful note by recommending concrete avenues for future improvements.}

摘要
An experimental study on proposed multi-page document classification datasets demonstrates that current benchmarks have become irrelevant and need to be updated to evaluate complete documents, as they naturally occur in practice. The study also calls for more mature evaluation methodologies, including calibration evaluation, inference complexity (time-memory), and a range of realistic distribution shifts (e.g., born-digital vs. scanning noise, shifting page order).The paper concludes on a hopeful note, recommending concrete avenues for future improvements.Translated into Simplified Chinese:这篇论文强调将文档分类 benchmarking 更近于实际应用场景，包括数据 ($X$) 的性质（多通道、多页、多产业）和分类任务 ($f$) 的多样性。文章指出了多页文档分类数据集的缺乏，并正式化了在应用场景中出现的不同分类任务。它还鼓励了 targets 精准的多页文档表示。一个实验研究表明，现有的分类标准数据集已经失去了现实意义，需要更新以评估完整的文档。研究还强调了需要更成熟的评估方法，包括准确评估、推理复杂度（时间内存）和多种现实的分布转移（例如，生成vs扫描噪声、页面顺序变化）。文章结束于一个希望的注意事项，建议将来的改进方向。Translated into Traditional Chinese:这篇论文强调将文档分类 benchmarking 更近于实际应用场景，包括数据 ($X$) 的性质（多通道、多页、多产业）和分类任务 ($f$) 的多样性。论文指出了多页文档分类数据集的缺乏，并正式化了在应用场景中出现的不同分类任务。它还鼓励了targets精准的多页文档表示。一个实验研究表明，现有的分类标准数据集已经失去了现实意义，需要更新以评估完整的文档。研究还强调了需要更成熟的评估方法，包括准确评估、推理复杂度（时间内存）和多种现实的分布转移（例如，生成vs扫描噪音、页面顺序变化）。论文结束于一个希望的注意事项，建议将来的改进方向。