<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>DIFu: Depth-Guided Implicit Function for Clothed Human Reconstruction</title>
<!-- Bootstrap -->
<link href="css/bootstrap-4.4.1.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
</head>
<body>
<!-- cover -->
<section>
<div class="jumbotron text-center mt-0">
<div class="container">
<div class="row">
<div class="col-12">
<h2>DIFu: Depth-Guided Implicit Function <br>
for Clothed Human Reconstruction</h2>
<h4 style="color:#5a6268;">CVPR 2023</h4>
<br>
<hr>
<h6>
<a href="https://eadcat.github.io" target="_blank">Dae-Young Song</a><sup>1,2</sup>,
HeeKyung Lee<sup>1</sup>,
Jeongil Seo<sup>1</sup>, and
<a href="https://sites.google.com/view/cnu-cvip" target="_blank">Donghyeon Cho</a><sup>2</sup>
<br><br>
<p><sup>1</sup>Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea <br>
<sup>2</sup>Computer Vision and Image Processing (CVIP) Lab., Chungnam National University, Daejeon, South Korea</p>
<div class="row justify-content-center">
<div class="column">
<p class="mb-5"><a class="btn btn-large btn-light" href="https://openaccess.thecvf.com/content/CVPR2023/papers/Song_DIFu_Depth-Guided_Implicit_Function_for_Clothed_Human_Reconstruction_CVPR_2023_paper.pdf" role="button" target="_blank">
<i class="fa fa-file"></i> Paper</a> </p>
</div>
<div class="column">
<p class="mb-5"><a class="btn btn-large btn-light" href="https://openaccess.thecvf.com/content/CVPR2023/supplemental/Song_DIFu_Depth-Guided_Implicit_CVPR_2023_supplemental.pdf" role="button" target="_blank">
<i class="fa fa-file"></i> Supplementary</a> </p>
</div>
<div class="column">
<p class="mb-5"><a class="btn btn-large btn-light" href="https://youtu.be/uNMnCeBVWak" role="button">
<i class="fa fa-youtube-play" aria-hidden="true"></i> Video</a> </p>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<!-- abstract -->
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<h3>Abstract</h3>
<hr style="margin-top:10px">
<h6 style="color:#8899a5"> Reconstruct human mesh from a monocular image. </h6>
<hr style="margin-top:0px">
<div style = "padding: 0px 0px 0px 100px;">
<img style="float:left" src="assets/Thumbnail.png", height="300p" width="50%" alt="The image cannot be displayed!">
</div>
<video height="300p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/projection.mp4" type="video/mp4">
</video>
<!-- <hr style="margin-bottom:0px"> -->
<!-- <br><br> -->
<!-- Abstract below -->
<br><br>
<p class="text-justify">Recently, implicit function (IF)-based methods for clothed human reconstruction using a single image have received a lot of attention. Most existing methods rely on a 3D embedding branch using volume such as the skinned multi-person linear (SMPL) model, to compensate for the lack of information in a single image. Beyond the SMPL, which provides skinned parametric human 3D information, in this paper, we propose a new IF-based method, DIFu, that utilizes a projected depth prior containing textured and non-parametric human 3D information. In particular, DIFu consists of a generator, an occupancy prediction network, and a texture prediction network. The generator takes an RGB image of the human front-side as input, and hallucinates the human back-side image. After that, depth maps for front/back images are estimated and projected into 3D volume space. Finally, the occupancy prediction network extracts a pixel-aligned feature and a voxel-aligned feature through a 2D encoder and a 3D encoder, respectively, and estimates occupancy using these features. Note that voxel-aligned features are obtained from the projected depth maps, thus it can contain detailed 3D information such as hair and cloths. Also, colors of each query point are also estimated with the texture inference branch. The effectiveness of DIFu is demonstrated by comparing to recent IF-based models quantitatively and qualitatively. </p>
</div>
</div>
</div>
</section>
<br><br>
<!-- Model Architecture -->
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<h3>DIFu Pipeline</h3>
<hr style="margin-top:0px">
<img src="assets/Pipeline.png", alt="The image cannot be displayed!", width=80%>
<br><br><br>
<p class="text-left">
(1) Back-side image generation (<i>I<sup>B</sup></i>) with the hallucinator (mirrored-form, PIFuHD Setting). <br>
(2) Using front-/back-side images and the parametric mesh, the depth estimator infers front-/back-side depth maps (<i>D<sup>F</sup></i>, <i>D<sup>B</sup></i>). <br>
(3) <i>D<sup>F</sup></i> and <i>D<sup>B</sup></i> are projected into the volume <i>V</i>. <br>
(4) If required (texture estimation), <i>I<sup>F</sup></i> and <i>I<sup>B</sup></i> also can be projected. <br>
(5) <i>I<sup>F</sup></i>, <i>I<sup>B</sup></i>, <i>D<sup>F</sup></i>, <i>D<sup>B</sup></i>, and <i>V</i> are encoded. <br>
(6) 2D and 3D features are aligned and concatenated channel-wisely. <br>
(7) The MLPs estimates an occupancy vector. <br>
(8) the occupancy vector is converted into a mesh by the marching cubes algorithm.
</p>
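<p class="text-justify">For illustration, a minimal PyTorch sketch of this forward pass follows. The module names (<code>hallucinator</code>, <code>depth_estimator</code>, <code>project_depth</code>, <code>encoder_2d</code>, <code>encoder_3d</code>, <code>mlp</code>) are hypothetical placeholders, not the released API; only the overall data flow mirrors the steps above.</p>
<pre style="background-color: #e9eeef;padding: 1.25em 1.5em"><code>
# A minimal sketch of the DIFu forward pass; the submodule names on
# "model" are hypothetical placeholders, not the released API.
import torch
import torch.nn.functional as F
from skimage.measure import marching_cubes

@torch.no_grad()
def reconstruct(I_F, smpl_mesh, model, res=128, thresh=0.5):
    I_B = model.hallucinator(I_F)                          # step (1)
    D_F, D_B = model.depth_estimator(I_F, I_B, smpl_mesh)  # step (2)
    V = model.project_depth(D_F, D_B, res)                 # step (3)
    feat2d = model.encoder_2d(torch.cat([I_F, I_B, D_F, D_B], dim=1))
    feat3d = model.encoder_3d(V)                           # step (5)
    # Dense query grid in normalized coordinates [-1, 1]^3.
    lin = torch.linspace(-1.0, 1.0, res)
    pts = torch.stack(torch.meshgrid(lin, lin, lin, indexing="ij"), dim=-1).view(-1, 3)
    # Pixel-aligned (x, y) and voxel-aligned (x, y, z) sampling, step (6).
    f2d = F.grid_sample(feat2d, pts[None, :, None, :2], align_corners=True)
    f3d = F.grid_sample(feat3d, pts[None, :, None, None, :], align_corners=True)
    fused = torch.cat([f2d.flatten(2).squeeze(0), f3d.flatten(2).squeeze(0)], dim=0)
    occ = model.mlp(fused.t()).view(res, res, res)         # step (7)
    # Step (8): extract the iso-surface with marching cubes.
    verts, faces, _, _ = marching_cubes(occ.cpu().numpy(), level=thresh)
    return verts, faces
</code></pre>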
</div>
</div>
</div>
</section>
<br><br><br>
<!-- Reconstruction Outputs -->
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<h3>Reconstruction Outputs</h3>
<hr style="margin-top:0px">
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0063-090.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0068-000.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0070-090.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0073-000.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0074-090.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0089-000.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0105-090.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0146-000.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0223-180.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0229-270.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0236-180.mp4" type="video/mp4">
</video>
<video height="256p" width="%" playsinline="" autoplay="autoplay" loop="loop" preload="" muted="">
<source src="assets/meshes/DIFu-0521-270.mp4" type="video/mp4">
</video>
</div>
</div>
</div>
</section>
<br><br>
<!-- Ablations -->
<!-- <section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<h3>More Ablation Studies</h3>
<hr style="margin-top:0px">
<img src="image/Ours.png", alt="The image cannot be displayed!", width=85%> <br><br>
<img src="image/Ours-wo-post.png", alt="The image cannot be displayed!", width=85%> <br><br>
<img src="image/Ours-wo-pre.png", alt="The image cannot be displayed!", width=85%> <br><br>
<img src="image/Ours-wo-both.png", alt="The image cannot be displayed!", width=85%> <br><br>
<img src="image/Ours-L1.png", alt="The image cannot be displayed!", width=85%> <br><br>
<img src="image/Ours-wo-local.png", alt="The image cannot be displayed!", width=85%> <br><br>
<img src="image/Ours-wo-color-preprocessing.png", alt="The image cannot be displayed!", width=85%> <br><br>
</div>
</div>
</div>
</section>
<br> -->
<!-- Ablations -->
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<hr style="margin-top:0px">
<h3>Discussions</h3>
<p class="text-abstract">
<i>More disscussions can be updated if needed.</i>
<hr style="margin-top:0px">
</p>
<p class="text-left">
<h5>Design Motive</h5>
<p class="text-justify">
Although we were inspired by PaMIR, which demonstrates strong performance with a simple implementation, we found that existing implicit function-based digital human reconstruction methods struggle to benefit from spatial assumptions within the occupancy vector estimation mechanism.
We focused on addressing the issue of oversmoothing, particularly the over-reliance on learned human patterns for unseen regions, which arises because the loss function compares 1D tensors using MSE.
By placing modules with the inductive bias of convolutional operations at the forefront of the pipeline, we devised a method that allows the implicit function to convert the explicit 3D-shaped input into a mesh output without excessive reliance on human patterns.
However, the implicit function does not simply serve as a converter.
As the 3D prior can be somewhat incorrect, the implicit function can compensate by relying on learned patterns.
To enhance this ability, we introduced an augmentation offset during training.
</p>
<br>
<hr style="margin-top:0px">
<h5>Training Generative Model</h5>
<img src="assets/Tab2.png", alt="The image cannot be displayed!", width=100%>
<br>
<p class="text-justify">
Due to the limited availability of the dataset, we reimplemented and retrained the comparative algorithms under the same conditions.
The dataset we used covered limited variation in clothing, poses, and ethnicities, making it challenging to handle web images that deviate significantly from the dataset distribution.
DIFu is sensitive to the performance of its two front-end modules.
Particularly when data are scarce, the performance of the hallucinator can change dramatically depending on the training method.
We investigated the hallucinator in the ablation study and Table 2 of the main paper.
The model trained with an adversarial loss demonstrates robustness on unseen datasets compared to the model without it.
However, when training the implicit function, the predicted back-side image can then differ from the actual back view in the training dataset, which can undermine confidence in the explicit guidance.
Preventing mode collapse in GANs can ironically cause the implicit function to lose confidence in the generated inputs, leading to oversmoothing on the back side (see the sketch below).
<br><br>
</p>
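<p class="text-justify">To make the trade-off concrete, the following is a hedged sketch of a hallucinator objective combining a pixel-wise reconstruction term with an adversarial term. The generator <code>G</code>, discriminator <code>D</code>, and weight <code>lambda_adv</code> are illustrative assumptions, not the exact training code.</p>
<pre style="background-color: #e9eeef;padding: 1.25em 1.5em"><code>
# Hypothetical hallucinator objective: L1 reconstruction + adversarial term.
# G, D, and lambda_adv are illustrative, not the exact training code.
import torch
import torch.nn.functional as F

def hallucinator_loss(G, D, I_F, I_B_gt, lambda_adv=0.01):
    I_B = G(I_F)                        # predicted back-side image
    loss_rec = F.l1_loss(I_B, I_B_gt)   # pixel-wise reconstruction
    logits = D(I_B)                     # discriminator score on the fake
    # Non-saturating GAN loss: push D toward classifying I_B as real.
    loss_adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return loss_rec + lambda_adv * loss_adv
</code></pre>
<p class="text-justify">Raising <code>lambda_adv</code> tends to sharpen the hallucinated back side, but it also widens the gap between the generated and ground-truth back views seen by the implicit function during its training, reproducing the trade-off described above.</p>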
<hr style="margin-top:0px">
<h5>Texture with Lower Resolution than PaMIR</h5>
In many implicit function-based methods, if there is no appropriate conditioning for the unseen parts, the implicit function tends to grey out those parts while minimizing the objective function.
Our approach significantly mitigates this drawback by embedding color information in the spatial domain.
However, during the blending of the aligned front-/back-side images and the estimated texture vector, we observed an undesired decrease in resolution with an architecture similar to PaMIR's (a sketch of such a blending step is given below).
We acknowledge that there is still room for improvement in this aspect, and it appears necessary to introduce additional modules or methods to facilitate better blending.
<br><br>
</p>
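<p class="text-justify">As a hedged illustration of such a blending step: the visibility-based weighting below is an assumption for exposition, not the paper's exact formulation.</p>
<pre style="background-color: #e9eeef;padding: 1.25em 1.5em"><code>
# Hypothetical per-point texture blending; the visibility weights are
# illustrative assumptions, not the paper's exact formulation.
import torch

def blend_color(c_front, c_back, c_mlp, vis_front, vis_back):
    """c_*: (N, 3) RGB per query point; vis_*: (N, 1) visibility in [0, 1].
    Falls back to the MLP-estimated color where neither view sees the point."""
    w_f = vis_front
    w_b = vis_back * (1.0 - w_f)   # back view fills front-occluded points
    w_m = 1.0 - w_f - w_b          # MLP estimate covers the rest
    return w_f * c_front + w_b * c_back + w_m * c_mlp
</code></pre>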
<hr style="margin-top:0px">
</div>
</div>
</div>
</section>
<br>
<!-- Contact -->
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<h3>Contact</h3>
<hr style="margin-top:0px">
<p class="text-center">
For further questions, please contact [email protected] or [email protected].
</p>
</div>
</div>
</div>
</section>
<br><br>
<!-- Acknowledgements -->
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<h3>Acknowledgments</h3>
</div>
<div class="col-12 text-justify">
<hr style="margin-top:0px">
The source code repositories of PIFu, PaMIR, and ICON were referenced for their reimplementation and for pre-processing of the dataset.
PIFuHD was also referenced for mesh rendering and evaluation. <br> <br>
<p class="text-center">
PIFu (ICCV 2019, Saito et al.): <a href="https://arxiv.org/pdf/1905.05172.pdf" target="_blank">Paper</a> | <a href="https://github.com/shunsukesaito/PIFu" target="_blank">Code</a> | <a href="https://www.youtube.com/watch?v=S1FpjwKqtPs" target="_blank">Video</a> <br>
PIFuHD (CVPR 2020, Saito et al.): <a href="https://arxiv.org/pdf/2004.00452.pdf" target="_blank">Paper</a> | <a href="https://github.com/facebookresearch/pifuhd" target="_blank">Code</a> | <a href="https://www.youtube.com/watch?v=uEDqCxvF5yc" target="_blank">Video</a> <br>
PaMIR (IEEE TPAMI 2021, Zheng et al.): <a href="https://arxiv.org/pdf/2007.03858.pdf" target="_blank">Paper</a> | <a href="https://github.com/ZhengZerong/PaMIR" target="_blank">Code</a> | <a href="http://www.liuyebin.com/pamir/pamir.html" target="_blank">Project Page</a> <br>
ICON (CVPR 2022, Xiu et al.): <a href="https://arxiv.org/pdf/2112.09127.pdf" target="_blank">Paper</a> | <a href="https://github.com/YuliangXiu/ICON" target="_blank">Code</a> | <a href="https://www.youtube.com/watch?v=hZd6AYin2DE" target="_blank">Video</a> <br>
</p>
We employed the <a href="https://github.com/ytrock/THuman2.0-Dataset" target="_blank">THuman2.0</a> and <a href="https://buff.is.tue.mpg.de/" target="_blank">BUFF</a> datasets for our experiments.
<br><br>
<p class="text-center">
THuman2.0 (CVPR 2021, Yu et al.): <a href="https://openaccess.thecvf.com/content/CVPR2021/papers/Yu_Function4D_Real-Time_Human_Volumetric_Capture_From_Very_Sparse_Consumer_RGBD_CVPR_2021_paper.pdf" target="_blank">Paper</a> <br>
BUFF (CVPR 2017, Zhang et al.): <a href="https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhang_Detailed_Accurate_Human_CVPR_2017_paper.pdf" target="_blank">Paper</a>
</p>
<br><br>
</div>
</div>
</div>
</section>
<!-- Citing -->
<div class="container">
<div class="row ">
<div class="col-12">
<div class="col-12 text-center">
<h3>Citation</h3>
</div>
<hr style="margin-top:0px">
<pre style="background-color: #e9eeef;padding: 1.25em 1.5em">
<code>
@InProceedings{Song2023difu,
author={Song, Dae-Young and Lee, HeeKyung and Seo, Jeongil and Cho, Donghyeon},
title={DIFu: Depth-Guided Implicit Function for Clothed Human Reconstruction},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2023},
}
</code></pre>
<hr>
</div>
</div>
</div>
<footer class="text-center" style="margin-bottom:10px">
Thanks to <a href="https://lioryariv.github.io/" target="_blank">Lior Yariv</a> for the website template.
</footer>
</body>
</html>