Stable diffusion thread

doogyhatts

Arch-Supremacy Member
Joined
Feb 13, 2018
Messages
13,741
Reaction score
3,949
@AikiBoy
This channel mainly uses AI voices and images only.
If a topic isn't long enough to fill the 2-hour runtime, the remaining content is pasted in from other, unrelated topics.
On the 15th of this month, YouTube will revise some of the YPP (YouTube Partner Program) rules regarding mass-produced and repetitive content.

 

windwaver

High Supremacy Member
Joined
Apr 28, 2000
Messages
34,225
Reaction score
2,737
The requirements look a bit scary to me.

How much would it cost to build a rig to run this?
 

focus1974

Greater Supremacy Member
Joined
May 12, 2007
Messages
91,190
Reaction score
32,797
Do realistic WoW animations; you won't go wrong.

Over the next 2 years, the games coming out will be increasingly realistic, with higher graphical fidelity and creativity, even for anime styles.

I've already noticed that the store graphics for game apps on the Google Play Store have suddenly all been updated to very visually appealing ones, even though the games are years old.
 

doogyhatts

For local video generation, I use an RTX 5080.
I mainly use it for image generation, image editing, speaking-avatar generation and video upscaling.
My entire rig, including the monitor, cost 3k SGD (excluding hard disks).

For image generation, I am also using the Seedream 3.0 model on the Dreamina platform.
For video generation, I am still using KlingAI, since I am partially sponsored by them.
I upscale my videos using Topaz Starlight-Mini (US$249, fixed cost).

If you feel that the images from Midjourney are better, then you have to factor in their subscription plan.
The video output from Midjourney is only 480p, but has very good physics and non-photorealistic rendering.
Some people don't use an expensive local rig; they generate their images and videos on Midjourney, then upscale them with Starlight-Mini on a cheaper rig (RTX 5060 Ti).

As for the Scarlet Monastery video, the creator replied that he used KlingAI and Veo2 to generate the video clips.
I think his images might have been generated with Imagen 4.
However, I don't think he used KlingAI for all of the lip-sync; he could have used either Dreamina or Veo3.
 

AikiBoy

Arch-Supremacy Member
Joined
Mar 31, 2005
Messages
22,896
Reaction score
1,306
Your setup is so pro. Do you earn a lot making AI videos?
 

doogyhatts

Not yet.
I am still building a new audience for WoW-related content.
I had to make a set of reusable sprites for the character, which took a lot of time.

The main problem is that KlingAI's lip-sync has still not been updated to the new OmniSync algorithm, even though they have completed the research on it.
Running the open-source lip-sync algorithms on a local machine is very slow.

Other people don't care about character consistency, so they just churn out many different characters and environments quickly.
Then they add the spoken audio, generated with ElevenLabs, making sure there are enough animations to cover the long audio.
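A rough way to plan how many clips a long narration track needs: divide the audio length by the clip length and round up. This is a minimal sketch assuming every generated clip has the same fixed length and none are reused; `clips_needed` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import math


def clips_needed(audio_seconds: float, clip_seconds: float) -> int:
    """Hypothetical helper: number of fixed-length clips needed to cover the audio."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    # Round up so the last partial stretch of audio still gets a clip.
    return math.ceil(audio_seconds / clip_seconds)


# A 2-hour narration covered with 5-second clips, then with 10-second clips.
print(clips_needed(2 * 60 * 60, 5))
print(clips_needed(2 * 60 * 60, 10))
```

For a 2-hour video this works out to 1440 five-second clips, or 720 ten-second clips, which is why longer clip lengths matter so much for this kind of channel.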
 

KaiserBreath

Senior Member
Joined
Feb 22, 2005
Messages
1,031
Reaction score
209
Got any tricks to get Fantasy Talking or Float past the 5s limit? Either the video gets distorted, I run out of VRAM, or the model is capped at 5s.
 

doogyhatts

I don't use Fantasy Talking or Float.
Use either Hunyuan Avatar, Omni Avatar or Multi Talk.

I am able to run Hunyuan Avatar and Omni Avatar with 16 GB of VRAM for 10 seconds of audio.
Omni Avatar is faster but has less body motion. I am using the command-line version.
Hunyuan Avatar is slower and has morphing hands. I used it inside Wan2GP.
MultiTalk currently runs out of memory (OOM) with 10 seconds of audio; wait for Wan2GP to integrate it and lower the requirements.
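Since these avatar models handle about 10 seconds of audio on 16 GB of VRAM, one workaround is to cut a longer narration into pieces of at most 10 seconds and generate each piece separately. A minimal sketch using Python's standard `wave` module, assuming uncompressed PCM WAV input; `split_wav` is a hypothetical helper. It cuts at fixed frame boundaries, so a cut can land mid-word (cutting at pauses sounds more natural), and separately generated clips may not join seamlessly.

```python
import wave


def split_wav(src_path: str, chunk_seconds: float, prefix: str) -> list[str]:
    """Hypothetical helper: split a PCM WAV file into consecutive chunks of
    at most chunk_seconds each, written as <prefix>_000.wav, <prefix>_001.wav, ..."""
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source audio
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels/sample width/rate as the source; the wave module
                # corrects the header's frame count after the frames are written.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

For example, `split_wav("narration.wav", 10, "narration_part")` on a 25-second file would produce three files of 10, 10 and 5 seconds.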
 

KaiserBreath

Btw, if we take the last frame of the previous clip to generate the next one, will it look like a continuation? Like, to extend the video.

Or is the input image not guaranteed to be the first frame?
 

doogyhatts

There will be a slight colour difference between the last frame of the first segment and the first frame of the second segment.
This is for non-speaking avatars.

For speaking avatars, the first frame of the second segment is not guaranteed to be the same as the last frame of the first segment.
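One common way to reduce that colour shift (a technique not mentioned in this thread) is a simple per-channel colour transfer: scale and offset each RGB channel of the second segment so its first frame has the same mean and standard deviation as the last frame of the first segment, then apply that same correction to every frame of the segment so it doesn't flicker. A minimal sketch assuming frames are lists of (R, G, B) tuples in the 0-255 range; `match_colour` is a hypothetical helper, and real footage would normally go through an image library instead. It does not fix the speaking-avatar case, where the frames themselves differ.

```python
from statistics import fmean, pstdev

Pixel = tuple[float, float, float]


def match_colour(frames: list[list[Pixel]], reference: list[Pixel]) -> list[list[Pixel]]:
    """Hypothetical helper: correct a segment so its FIRST frame matches the
    per-channel mean/std of `reference` (e.g. the previous segment's last frame).
    The same gain and offset are applied to every frame of the segment."""
    if not frames:
        return []
    first = frames[0]
    gains, offsets = [], []
    for c in range(3):  # R, G, B
        src = [p[c] for p in first]
        ref = [p[c] for p in reference]
        src_std = pstdev(src)
        gain = pstdev(ref) / src_std if src_std > 0 else 1.0
        gains.append(gain)
        offsets.append(fmean(ref) - gain * fmean(src))

    def fix(p: Pixel) -> Pixel:
        # Apply the linear correction and clamp to the valid 0-255 range.
        return tuple(min(255.0, max(0.0, gains[c] * p[c] + offsets[c])) for c in range(3))

    return [[fix(p) for p in frame] for frame in frames]
```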
 

KaiserBreath

I see. So far there is no real open-source option that can match the fidelity and duration of commercial tools like HeyGen, right?

5s is too short. 10s might be OK to at least complete a sentence lol.
 

doogyhatts

Someone told me on GitHub that he did a 16-second one using MultiTalk on a 3090.

I am not sure what the maximum audio length supported by HeyGen is.

I have not tried going above 10 seconds with Hunyuan Avatar or Omni Avatar.
 

KaiserBreath

HeyGen can do long takes, like 1 minute with no cuts. Not sure if there is some editing magic going on, but at least it looks seamless.
 

doogyhatts

I see. Open-source can't do this yet.

I am waiting for KlingAI to update their lip-sync algorithm to the new one.
Their lip-sync UI now supports multiple speakers and a total audio length of up to 1 minute.
 

doogyhatts

@KaiserBreath
That same person has now told me on GitHub that he tried a 1-minute audio clip in MultiTalk and it worked:
"I think there is no limit on how long a video you can generate. I just tried generating 1 minute of talking video and it's working just fine.
I think it generates in chunks, but the VRAM consumption is 17-18 GB, which is a bit over 16 GB, and that's why you get OOM.
I can't run it on my 5080 either, but I can run it just fine on a 3090 no matter how long the clip is."
 


AikiBoy

How did you get the Kling sponsorship? What perks do you get?

What do you have to do for them?
 