HappyHorse may be open-weights soon (it beat Seedance 2.0 on Artificial Analysis!)

Posted by External_Mood4719@reddit | LocalLLaMA | 12 comments

HappyHorse (an open-source unified large model for text-to-video / image-to-video + audio) has recently been making waves on the international stage. After verification from multiple sources, the team behind it has been revealed: they are from the Taobao and Tmall Group (TTG) Future Life Lab, led by Zhang Di. (The lab was created by the ATH-AI Innovation Business Department and has since become an independent entity.)

Profile of Zhang Di: He holds both a Bachelor's and a Master's degree from Shanghai Jiao Tong University. He is the head of the TTG Future Life Lab (rank: P11) and reports to Zheng Bo, Chief Scientist of TTG and CTO of Alimama. He previously served as the lead (No. 1 position) for Kuaishou's Kling; prior to that, he was the head of Big Data and Machine Learning Engineering Architecture at Alimama.

P.S.

1. It is rumored that HappyHorse 1.0 will be officially released on the 10th of this month. (It has been undergoing intensive testing recently; in fact, information was leaked back in March, but Alibaba PR immediately deleted the relevant sources.) Word is that the team will also release several different types of models, so stay tuned.

2. Alimama is the algorithm platform within the Taobao and Tmall ecosystem and has produced many renowned algorithm experts (it is also the birthplace of the Wan model). After honing his skills at Kuaishou's Kling, Zhang Di's return is described as "a fish back in water." He is reportedly extremely excited lately; the team at Xixi District C works late every night and is even happily putting in overtime on Saturdays.

**[Basic Information]**

  1. Model Type: Open-source unified model for Text-to-Video / Image-to-Video + Audio.

  2. Inference Paradigm: single-Transformer Transfusion, CFG-less (no classifier-free guidance).

  3. Inference Steps: 8 steps.
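For readers unfamiliar with the CFG-less claim above: classic classifier-free guidance needs two forward passes per denoising step (conditional and unconditional), while a guidance-distilled, CFG-less model needs only one. A minimal sketch of the cost difference at 8 steps (the `model` function is a hypothetical stand-in for one transformer forward pass, not HappyHorse's actual API):

```python
# Sketch: forward passes per clip for CFG vs. CFG-less sampling.
calls = 0

def model(x, t, cond):
    """Hypothetical stand-in for one transformer forward pass."""
    global calls
    calls += 1
    return x  # dummy prediction

def cfg_step(x, t, cond, scale=5.0):
    # Classic classifier-free guidance: two passes per step
    # (conditional + unconditional), mixed by the guidance scale.
    eps_c = model(x, t, cond)
    eps_u = model(x, t, None)
    return eps_u + scale * (eps_c - eps_u)

def cfg_less_step(x, t, cond):
    # Guidance-distilled (CFG-less) model: guidance is baked into
    # the weights, so each step is a single pass.
    return model(x, t, cond)

STEPS = 8

calls = 0
for t in range(STEPS):
    cfg_step(1.0, t, "prompt")
cfg_calls = calls  # 16 forward passes

calls = 0
for t in range(STEPS):
    cfg_less_step(1.0, t, "prompt")
cfg_less_calls = calls  # 8 forward passes

print(cfg_calls, cfg_less_calls)  # 16 8
```

At 8 steps the distilled sampler does half the network evaluations of a CFG sampler at the same step count, which is the whole point of pairing few-step distillation with CFG-less inference.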

[Video Parameters]

* **Resolution:** 1280×720 (720p)

* **Frame Rate:** 24fps

* **Duration:** 5 seconds
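Taken together, those three parameters pin down the raw size of one generated clip. A back-of-the-envelope calculation (my own arithmetic from the listed specs, assuming uncompressed 8-bit RGB; not from the announcement):

```python
# Raw size of one clip from the listed specs: 1280x720, 24fps, 5s.
width, height = 1280, 720   # 720p
fps = 24
seconds = 5

frames = fps * seconds                     # 120 frames per clip
pixels_per_frame = width * height          # 921,600 pixels
raw_bytes = frames * pixels_per_frame * 3  # 3 bytes/pixel, 8-bit RGB

print(frames)           # 120
print(raw_bytes / 1e6)  # 331.776 (MB, uncompressed)
```

So each clip is 120 frames, about 332 MB uncompressed; the delivered file will of course be far smaller after video encoding.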

**[Audio Capabilities]**

* **Native Synchronous Generation:** Sound effects / Ambient sound / Voiceover

* **Supported Languages:** Chinese, English, Japanese, Korean, German, French

**[Open Source Status]**

* **Fully Open Source:** Base model + Distilled model + Super-resolution + Inference code

Source: https://mp.weixin.qq.com/s/n66lk5q_Mm10UYTnpEOf3w?poc_token=HKwe1mmjFX-RhveuVjk_MbRgFTcirVE2tKrRP_gS