Comparing model types with examples (2025)

Since I started with Faceswap I've heard many questions over and over.
Which is the best model? How many iterations? Why does my swap look blurry?

The real answer is some combination of good data, lots of patience, and a bit of artistry and creativity.
I feel different models work best in different situations, and you will have to figure out which one fits yours.

Postprocessing in Adobe After Effects could make them look professional, but that is way beyond my capacity and not the point of these examples.
Thought I could help by showing y'all some models I made out of the exact same data.
Will update this post as I do more. Suggestions welcome.

Model A is a YouTuber.
She had lots of data available: close-ups, different expressions, makeup and lighting, enough to make reasonably good training data.
YouTube is NOT the best place to find good data, but this person had much to choose from, with clear views of her face.

Model B is Karen Gillan.
Used a few YouTube videos, several episodes of Doctor Who, and high-quality photos.

The video for converting was chosen to show different lighting, makeup and distance from the camera, so you can see how the models react to different situations.

These models have not been trained to perfection, also called convergence. In fact, I didn't try very hard to make them perfect.
Really just wanted to show the raw output. Every one could be trained better, with better data, and made to look better by tweaking (which I didn't do). I am by no means a professional at this, but I have time, I am curious, and I have 4 video cards.

All in all, I think the dataset is somewhere between mediocre and good... not amazing. About 4500 pics for each.
I manually adjusted the alignments and masks for about 40% of each set to near perfect, which was very time consuming and likely unnecessary. "Close" would have been fine for training.

Most of these were trained on an Nvidia 1070. Smaller models were trained on 1060s.

What I have learned:
Close-ups are really hard. Something at the distance of the Jennifer Lawrence-Buscemi video is very possible to make flawless.
You can see this in the example video I've used.
Screen-filling faces cause trouble.

Training higher-quality models really does add lots of time. About 160-172 pixels per side seems to be the sweet spot for my 8GB card, but it takes ages.
Currently I'm trying to train the best possible model at 192 with a low batch size of 6, and it is still getting better at 500K iterations (180 hours). On the plus side, I don't actually need a supercomputer to do what studios were doing in the early 2000s, just a single 4-year-old video card and patience. What I also know is that 128-160 would be fine for some pretty convincing swaps.
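To put the 192 run in perspective, the numbers above can be turned into an average throughput. This is only back-of-the-envelope arithmetic, assuming one iteration processes one batch and that the 180 hours were spent training the whole time. The function name is made up for this example and is not part of Faceswap.

```python
def examples_per_second(iterations, batch_size, hours):
    """Average training throughput in examples (faces) per second.

    Assumes every iteration processes exactly one batch.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    return iterations * batch_size / (hours * 3600)


# Figures from the 192px run: 500K iterations, batch size 6, 180 hours.
rate = examples_per_second(iterations=500_000, batch_size=6, hours=180)
print(f"{rate:.1f} EGs/sec")  # prints "4.6 EGs/sec"
```

That lands in the same ballpark as the 3.9 EGs/sec listed for DFL-SAE-Liae @192 in the stats, which is about as close as rounded hours and iteration counts will get.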

Thanks to the mods and supporters continuing to help me with this.
There will be more examples, edits, and advice added as time goes on.

Some requested stats (EGs/sec is examples per second, B is batch size):
Original: 140 EGs/sec, B:100
IAE: 115 EGs/sec, B:64
LightWeight: 50 EGs/sec, B:64
Dfaker: 47.5 EGs/sec, B:100
DFL-H128: 29.5 EGs/sec, B:80
DFL-SAE-DF @128: 9.2 EGs/sec
DFL-SAE-Liae @128: 8.8 EGs/sec, B:16
DFL-SAE-Liae @192: 3.9 EGs/sec, B:6
Dlight: 15.7 EGs/sec, B:14
Realface: 11.2 EGs/sec, B:8
villain: 7.8 EGs/sec, B:10
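
The stats can also be read as wall-clock cost. Here is a rough sketch, assuming EGs/sec means examples (faces) per second and that one iteration processes one batch of size B. The helper name is hypothetical and not part of Faceswap. DFL-SAE-DF is skipped because no batch size is listed for it.

```python
# (model, EGs/sec, batch size) copied from the stats list.
# DFL-SAE-DF @128 is left out: its batch size was not given.
STATS = [
    ("Original", 140.0, 100),
    ("IAE", 115.0, 64),
    ("LightWeight", 50.0, 64),
    ("Dfaker", 47.5, 100),
    ("DFL-H128", 29.5, 80),
    ("DFL-SAE-Liae @128", 8.8, 16),
    ("DFL-SAE-Liae @192", 3.9, 6),
    ("Dlight", 15.7, 14),
    ("Realface", 11.2, 8),
    ("villain", 7.8, 10),
]


def hours_for_iterations(egs_per_sec, batch_size, iterations):
    """Estimated hours to run `iterations` batches at a given throughput."""
    seconds = iterations * batch_size / egs_per_sec
    return seconds / 3600


for name, egs, batch in STATS:
    hours = hours_for_iterations(egs, batch, 100_000)
    print(f"{name:<20}{hours:6.1f} h per 100k iterations")
```

Iterations at different batch sizes are not the same amount of training, so this only compares time spent, not how good the result looks.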

