doogyhatts
Arch-Supremacy Member
- Joined: Feb 13, 2018
- Messages: 13,741
- Reaction score: 3,949
Do realistic WoW animations, won't go wrong.
I see the requirements a bit scary.
How much to build a rig to use this?
Ur setup so pro u earn a lot making AI vid?
For local video generation, I use an RTX 5080.
I mainly use it for image generation, image editing, generating speaking avatars and video upscaling.
My entire rig with monitor cost 3k SGD (excluding the hard disk).
For image generation, I am also using the Seedream 3.0 model on the Dreamina platform.
For video generation, I am still using KlingAI, since I am partially sponsored by them.
I upscale my videos using Topaz Starlight-Mini ($249 usd fixed cost).
If you feel that the images from Midjourney are better, then you have to factor in their subscription plan.
The video output from Midjourney is only 480p, but has very good physics and non-photorealistic rendering.
Some people don't use an expensive local rig. They generate their images and videos on Midjourney, then upscale using Starlight-Mini on a cheaper rig (RTX 5060 Ti).
As for the Scarlet Monastery video, the creator replied that he used KlingAI and Veo2 to generate the video clips.
I think his images might be generated from Imagen4.
But I don't think he used KlingAI entirely for the lip-sync; he could have used either Dreamina or Veo3.
Ur setup so pro u earn a lot making AI vid?
Not yet.
Lol i want to do something like this
Got any tricks to get Fantasy Talking or Float to bypass the 5s limit? Either the video gets distorted, not enough vram, or the model is pegged to 5s.
Btw, if we take the last frame of the previous clip to make the next one, will it work to look like it can continue? Like extend longer.
I don't use Fantasy Talking or Float.
Use either Hunyuan Avatar, Omni Avatar or Multi Talk.
I am able to run Hunyuan Avatar and Omni Avatar on 16 GB of VRAM for 10 seconds of audio.
Omni Avatar is faster but has less body motion. I am using the command line version.
Hunyuan Avatar is slower and has morphing hands. I used it inside Wan2GP.
Multi Talk will OOM on 10 seconds of audio right now; wait for Wan2GP to integrate it and lower the requirements.
Btw, if we take the last frame of the previous clip to make the next one, will it work to look like it can continue? Like extend longer.
Or the sample image is not guaranteed to be the 1st frame.
ic. So far there is no real open-source that can maintain the same fidelity and duration length like HeyGen this kind of commercial ones rite.
There will be a slight colour difference between the last frame of the first segment and the first frame of the second segment.
This is for non-speaking avatars.
For speaking avatars, the first frame of the second segment is not guaranteed to be the same as the last frame of the first segment.
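One rough way to hide that colour jump when chaining non-speaking segments is to shift the second segment's colours so its first frame averages out the same as the last frame of the first segment. The snippet below is only a sketch of that idea, not something any of the tools mentioned here are known to do. The function names are made up for illustration, and the frames are plain nested lists of (R, G, B) tuples rather than real video frames.

```python
# Hypothetical helpers, not part of KlingAI, Wan2GP or any tool in this thread.
# A frame is a list of rows, each row a list of (R, G, B) tuples in 0..255.

def channel_means(frame):
    """Average R, G and B over every pixel of one frame."""
    pixels = [px for row in frame for px in row]
    n = len(pixels)
    return tuple(sum(px[c] for px in pixels) / n for c in range(3))


def match_colour(frame, reference):
    """Shift each channel of `frame` so its mean equals the mean of `reference`."""
    src = channel_means(frame)
    ref = channel_means(reference)
    offsets = [r - s for r, s in zip(ref, src)]
    return [
        [
            # Clamp to the valid 0..255 range after applying the offset.
            tuple(min(255, max(0, round(px[c] + offsets[c]))) for c in range(3))
            for px in row
        ]
        for row in frame
    ]


# Tiny 1x2 "frames" standing in for real ones.
last_of_first = [[(100, 120, 140), (110, 130, 150)]]
first_of_second = [[(104, 118, 143), (114, 128, 153)]]

corrected = match_colour(first_of_second, last_of_first)
print(corrected)
```

In a real pipeline the same offsets would have to be applied to every frame of the second segment, and a plain mean shift will not fix differences in contrast or sharpness.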
ic. So far there is no real open-source that can maintain the same fidelity and duration length like HeyGen this kind of commercial ones rite.
5s is too short. 10s might be OK to at least complete a sentence lol.
HeyGen can do like long length 1min no cut kind. Not sure if there is some editing magic going on but at least seems seamless.
Someone told me on GitHub that he did a 16-second one using MultiTalk on a 3090.
I am not sure what maximum audio length HeyGen can support.
I have not tried going above 10 seconds with Hunyuan Avatar or Omni Avatar.
I see.
HeyGen can do like long length 1min no cut kind. Not sure if there is some editing magic going on but at least seems seamless.
I think it has no limit on how long a video you can generate. I just tried generating 1 minute of talking video and it's working just fine.
I think it generates in chunks, but the VRAM consumption is 17-18 GB, which is a bit over 16 GB, and that's why you get OOM.
I can't run it on my 5080 either, but I can run it just fine on a 3090 no matter how long the clip is.
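To make the "generates in chunks" guess and the VRAM numbers concrete, here is a small sketch. It is not how MultiTalk actually splits audio (that is not confirmed here). The function names and the 10-second window are assumptions for illustration, and the 24 GB figure for the 3090 is the card's spec rather than something stated in this thread.

```python
# Hypothetical sketch: how a long audio clip could be cut into fixed windows,
# and why peak VRAM, not clip length, decides whether a card can run it.

def plan_chunks(total_seconds, chunk_seconds):
    """Split an audio duration into consecutive (start, end) windows."""
    if total_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("durations must be positive")
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


def fits_in_vram(peak_gb, card_gb):
    """True if the peak usage of one chunk fits on the card."""
    return peak_gb <= card_gb


# 1 minute of audio in assumed 10-second windows gives 6 chunks.
print(plan_chunks(60, 10))

# Peak usage of about 18 GB (upper end of the 17-18 GB reported above).
print(fits_in_vram(18, 16))  # RTX 5080, 16 GB
print(fits_in_vram(18, 24))  # RTX 3090, 24 GB
```

If each chunk peaks at the same VRAM, a longer clip only adds more chunks, which would match the report that the 3090 handles any length while a 16 GB card fails regardless of length.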
Nice but this one cant based on image reference avatar right.
@KaiserBreath
That same person now tells me on GitHub that he tried a 1-minute audio clip in MultiTalk and it works.
It does have a requirement to have an input image in the json file.
Nice but this one cant based on image reference avatar right.
I think usually those are harder, and if comes with gestures, even shorter output.