
[论文解析] Cones: Concept Neurons in Diffusion Models for Customized Generation

Paper link: https://readpaper.com/pdf-annotate/note?pdfId=4731757617890738177&noteId=1715361536274443520
Source code: https://github.com/Johanan528/Cones

Contents

  • Overview
    • What problem is addressed in the paper?
    • Is it a new problem? If so, why does it matter? If not, why does it still matter?
    • What is the key to the solution?
    • What is the result?
  • Method
    • 3.1. Concept Neurons for a Given Subject
    • 3.2. Interpretability of Concept Neurons
    • 3.3. Collaboratively Capturing Multiple Concepts
  • Experiments
  • Conclusion

Overview

What problem is addressed in the paper?

The paper addresses customized, subject-driven generation: concatenating multiple clusters of concept neurons representing different persons, objects, and backgrounds can flexibly render all of the related concepts in a single image (i.e., composing several specified subjects into one scene).

Is it a new problem? If so, why does it matter? If not, why does it still matter?

No. Subject-driven generation has been studied before, but it still matters: this is the first method that manages to generate four diverse subjects in a single image.

What is the key to the solution?

We propose to find a small cluster of neurons (parameters in the attention layers of a pretrained text-to-image diffusion model) such that changing the values of those neurons can generate the corresponding subject in different contexts, based on the semantics of the input text prompt.

This paper proposes a novel gradient-based method, termed Cones, to analyze and identify these concept neurons. They are characterized as the parameters whose absolute values, when scaled down, better reconstruct the given subject while preserving prior information.
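Concretely, that characterization can be turned into a criterion via a first-order Taylor expansion: shrinking the absolute value of a parameter theta by a small step epsilon changes the loss by roughly -epsilon * sign(theta) * dL/dtheta, so the loss drops exactly when theta * dL/dtheta > 0. Below is a minimal PyTorch sketch of that scoring rule (not the official Cones implementation); the "attn" name filter assumes a diffusers-style UNet and is only an illustrative way to restrict the search to attention-layer parameters.

```python
import torch

def score_neurons(unet, loss, attn_keyword="attn"):
    """Score attention-layer parameters by whether shrinking their
    absolute value is predicted, to first order, to reduce `loss`.

    A positive score theta * dL/dtheta means that moving theta toward
    zero lowers the loss, which is the property used to characterize
    concept neurons. `attn_keyword` assumes diffusers-style parameter
    names; adjust it for other backbones.
    """
    loss.backward()
    scores = {}
    with torch.no_grad():
        for name, p in unet.named_parameters():
            if attn_keyword in name and p.grad is not None:
                # elementwise scores; entries > 0 are concept-neuron candidates
                scores[name] = p * p.grad
    return scores
```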

What is the result?

Extensive qualitative and quantitative studies on diverse scenarios show the superiority of our method in interpreting and manipulating diffusion models.

Method

3.1. Concept Neurons for a Given Subject

Concept-implanting loss:

[Equation figures from the paper: the concept-implanting loss and the definitions of its terms.]

Algorithm:
[Algorithm figure from the paper: the procedure for identifying the concept neurons of a given subject.]
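Since the algorithm itself appears only as a figure, the following is a hypothetical outline of what the identification loop could look like, building on the scoring rule above: accumulate the criterion over several training steps on the subject's images and keep the attention-layer entries for which it holds most of the time. `concept_loss_fn`, the "attn" name filter, and the `keep_ratio` threshold are assumptions for illustration; the exact loss and selection rule are the paper's.

```python
import itertools
import torch

def find_concept_neurons(unet, concept_loss_fn, data_loader,
                         n_steps=100, attn_keyword="attn", keep_ratio=0.9):
    """Hypothetical sketch of the gradient-based identification loop.

    concept_loss_fn(batch) is assumed to return the concept-implanting
    loss for one batch of the subject's images (noise added at a random
    timestep, as in standard diffusion training). Returns, per attention
    parameter, the flat indices of entries whose criterion
    sign(theta) * grad > 0 held in at least `keep_ratio` of the steps.
    """
    hits = {name: torch.zeros_like(p) for name, p in unet.named_parameters()
            if attn_keyword in name}
    batches = itertools.cycle(data_loader)
    for _ in range(n_steps):
        unet.zero_grad()
        loss = concept_loss_fn(next(batches))
        loss.backward()
        with torch.no_grad():
            for name, p in unet.named_parameters():
                if name in hits and p.grad is not None:
                    # count the steps in which shrinking |theta| is predicted to lower the loss
                    hits[name] += (p * p.grad > 0).float()
    concept_neurons = {}
    for name, h in hits.items():
        mask = (h / n_steps) >= keep_ratio
        concept_neurons[name] = mask.view(-1).nonzero(as_tuple=True)[0]
    return concept_neurons
```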

3.2. Interpretability of Concept Neurons

Shutting the concept neurons immediately draws the outline of the given subject in the attention map corresponding to the text identifier, and the subject subsequently appears in the final output. This shows the strong connection between the concept neurons and the given subject in the network's representations.
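Reproducing this "shutting" experiment only requires zeroing the identified entries during inference and restoring them afterwards. The context manager below is a minimal sketch assuming the concept neurons are given as flat indices per parameter name (the hypothetical format returned by the identification sketch in Section 3.1); it is not the API of the released repository.

```python
import contextlib
import torch

@contextlib.contextmanager
def shut_concept_neurons(unet, concept_neurons):
    """Temporarily set the selected concept neurons to zero ("shut" them)
    and restore the original values afterwards.

    `concept_neurons` maps parameter names to 1-D tensors of flat indices.
    """
    params = dict(unet.named_parameters())
    backup = {}
    with torch.no_grad():
        for name, idx in concept_neurons.items():
            flat = params[name].data.view(-1)
            backup[name] = flat[idx].clone()
            flat[idx] = 0.0          # shutting = scaling the value down to zero
    try:
        yield unet                   # run the diffusion sampler inside this block
    finally:
        with torch.no_grad():
            for name, idx in concept_neurons.items():
                params[name].data.view(-1)[idx] = backup[name]
```

Wrapping the usual sampling call in `with shut_concept_neurons(unet, neurons): ...` then renders the subject described by the text prompt, matching the behaviour described above.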

3.3. Collaboratively Capturing Multiple Concepts

[Figure from the paper.] As summarized in the Overview, the clusters of concept neurons learned separately for different subjects are simply concatenated, which lets a single generation contain all of the subjects; an optional further fine-tuning step strengthens this multi-subject capability. A sketch of the concatenation is given below.
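Under the same assumed index format, the training-free combination ("we concatenate concept neurons of multiple subjects directly", cf. Figure 8) can be sketched as taking, per attention parameter, the union of each subject's index set and shutting the combined cluster.

```python
import torch

def concatenate_clusters(*clusters):
    """Combine the concept-neuron clusters of several subjects by taking,
    for each attention parameter, the union of their flat index sets.
    Shutting the combined cluster places all subjects in one image;
    further fine-tuning can strengthen the result.
    """
    combined = {}
    for cluster in clusters:
        for name, idx in cluster.items():
            if name in combined:
                combined[name] = torch.unique(torch.cat([combined[name], idx]))
            else:
                combined[name] = idx.clone()
    return combined
```

For example, `with shut_concept_neurons(unet, concatenate_clusters(dog_neurons, chair_neurons)): ...` would correspond to the tuning-free multi-subject setting of Figure 8, with `dog_neurons` and `chair_neurons` as placeholder cluster names.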

Experiments

Figure 7. Comparison of multi-subject generation ability. First row: compared with other methods, ours better generates the "sweater" in the prompt. Second row: our method better reflects the semantics of "playing", while DreamBooth loses the details of the wooden pot. Third row: our generated images have higher visual similarity to the target subject and better semantic alignment with "sitting" and "wearing"; DreamBooth fails to generate the "chair". Fourth row: Cones (ours) maintains high visual similarity for all subjects.

Figure 8. Comparison of tuning-free subject-generation methods. For Cones, we concatenate the concept neurons of multiple subjects directly. For Custom Diffusion, we use its "constrained optimization" method to compose multiple subjects.

Table 1. Quantitative comparisons. Cones performs best except for image alignment in the single-subject case. This could be because the image-alignment metric is easy to overfit, as pointed out in Custom Diffusion (Kumari et al., 2022): DreamBooth and Textual Inversion update a large number of parameters during learning, while Cones only involves deactivating a few parameters.

Table 2. Storage cost and sparsity of concept neurons. As the number of target subjects increases, more concept-neuron indices need to be stored; even so, Cones saves more than 90% of the storage space compared with Custom Diffusion.

Conclusion

This paper reveals concept neurons in the parameter space of diffusion models. For a given subject, there is a small cluster of concept neurons that dominates the generation of that subject. Shutting them yields renditions of the given subject in different contexts according to the text prompts. Concatenating the clusters of different subjects generates all of those subjects in a single image. Further fine-tuning can enhance the multi-subject generation capability, making this the first method able to generate up to four different subjects in one image. Comparisons with state-of-the-art competitors demonstrate the superiority of concept neurons in visual quality, semantic alignment, multi-subject generation capability, and storage consumption.
