Summary: The big AI events of October

Posted by nh_local@reddit | LocalLLaMA | View on Reddit | 26 comments

* **Flux 1.1 Pro** is released, showcasing advanced capabilities for image creation.
* Meta unveils **Movie Gen**, a new AI model that generates videos, images, and audio from text input.
* Pika introduces **Video Model 1.5** along with "Pika Effects".
* Adobe announces its video creation model, **Firefly Video**.
* Startup Rhymes AI releases **Aria**, an open-source, multimodal model exhibiting capabilities similar to comparably sized proprietary models.
* Meta releases an open-source speech-to-speech language model named **Meta Spirit LM**.
* Mistral AI introduces **Ministral**, a new model available in 3B and 8B parameter sizes.
* **Janus AI**, a multimodal language model capable of recognizing and generating both text and images, is released as open source by DeepSeek-AI.
* Google DeepMind and MIT unveil **Fluid**, a text-to-image generation model with industry-leading performance at a scale of 10.5B parameters.
* **Stable Diffusion 3.5** is released in three sizes as open source.
* Anthropic launches **Claude 3.5 Sonnet New**, demonstrating significant advancements in specific areas over its previous version, and announces **Claude 3.5 Haiku**.

26 Comments

nh_local@reddit (OP)

Update: The post has been updated with some important events that happened right at the end of the month!

sunshinecheung@reddit

If Flux 1.1 Pro were open source, it would be invincible.

nh_local@reddit (OP)

At some point it will be so good that there will be no difference between the models (In language models it is more complex, because the intelligence is infinite, but the quality of images has a certain limit)

Ok-Parsnip-4826@reddit

> In language models it is more complex, because the intelligence is infinite

The LLM won't magically grow more intelligent than the creators of the training data. The intelligence of language models only emerges as long as it helps to predict the next token. If the next token in the training data isn't always the "smartest" next token, then it won't be helpful for the network to be smarter than that. There is nothing in the model that would make it learn a correct concept, if it is mostly presented with an incorrect one. You could argue that the truth tends to be less complex than the alternatives, but that is just wishful thinking in my book and will probably be even less arguable when the models grow larger.
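
The point about being "mostly presented with an incorrect" concept can be illustrated with a minimal sketch, assuming a toy count-based bigram predictor rather than a real GPT. The corpus and the helper names `train_bigram` and `predict_next` are hypothetical and exist only for this illustration. A neural language model generalizes in ways this toy cannot, so this shows the mechanism being argued about, not a result about actual LLMs.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each token, how often each next token follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the continuation seen most often after `token` in training."""
    return counts[token].most_common(1)[0][0]


# Hypothetical training data: the incorrect statement outnumbers the correct one.
corpus = [
    "two plus two equals five",
    "two plus two equals five",
    "two plus two equals five",
    "two plus two equals four",
]

counts = train_bigram(corpus)
prediction = predict_next(counts, "equals")
print(prediction)  # prints: five
```

The predictor reproduces the majority continuation because nothing in its objective rewards the correct one. Whether larger models escape this is exactly what the thread disputes.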

nh_local@reddit (OP)

This is not accurate. Einstein had no guidance data to invent the theory of relativity. Language models mimic the activity of the human brain.

Ok-Parsnip-4826@reddit

> Einstein had no guidance data to invent the theory of relativity

Yes, because Einstein was a human with a human brain, unlike language models, which are just language models.

> Language models mimic the activity of the human brain

No, language models, specifically GPTs, predict the next token based on previous tokens, nothing more. The claim that it therefore "mimics the activity of the human brain" is, as I said before, wishful thinking.

nh_local@reddit (OP)

No. The human brain also predicts the next token during its thinking.

-p-e-w-@reddit

> In language models it is more complex, because the intelligence is infinite

Intelligence is not infinite, not even in theory. Its upper limit is the amount of information you can infer from a given input and your knowledge of the Universe. If I ask "What am I holding in my hand?", no amount of intelligence can reveal the answer. The information simply isn't there.

Xanjis@reddit

Eh. While accuracy wouldn't be great, something smarter than me could probably harvest your reddit history and fairly easily find every reference to you on the internet and in public databases. That would reveal your interests and personality enough to make a good guess. Might even be enough to diagnose a plethora of health issues as well.

-p-e-w-@reddit

That doesn't change anything about what I wrote. Intelligence is not an open-ended mechanism. There is *in principle* a maximum amount of information that is determined by a given input. If a model reaches that limit, it isn't going any further, ever.

Xanjis@reddit

That's assuming the other user subscribes to your definition of intelligence. I'm fairly certain that when the AI community says intelligence, they mean something closer to utility: some messy combination of how many problems it can find accurate solutions to, how fast it can find them, and how many problems it can solve in parallel. In principle this can keep scaling up by feeding more mass to a black hole and building a larger and larger matrioshka brain around it. In practice the amount of accessible mass to throw into it is finite without FTL.

-p-e-w-@reddit

There is no definition of intelligence where intelligence is unbounded. That doesn't make any sense. Whether it's about information, utility, or something else – there is invariably a limit to what you can extract from what you are given. And you cannot scale up intelligence like that. You can scale up how many problems an intelligence can process, but that doesn't make it *more intelligent.* Intelligence is bounded by what is possible with what you are told, not by how much electricity you put in.

Calandiel@reddit

Take your bounded definition of intelligence. Based on your other comments, I infer that you could presumably map it to a [0, 1] interval with some bijection. If that's the case, define a new intelligence measure as `cot(x * pi)`, a function of your original intelligence. This new intelligence is unbounded. This of course would be a very silly way to define intelligence, but I think it illustrates that there at least exist some definitions of intelligence that are unbounded. It's not obvious you can just rule them all out as silly without some serious (and impractical) investigation.
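
A minimal numeric sketch of the remapping, assuming the bounded score lies strictly inside (0, 1), since `cot(x * pi)` is undefined at the endpoints 0 and 1. The function name `unbounded_measure` is hypothetical. Note that the cotangent is decreasing on this interval, so the remapped value grows without bound as `x` approaches 0 and falls without bound as `x` approaches 1.

```python
import math


def unbounded_measure(x):
    """Map a bounded score x in the open interval (0, 1) to cot(pi * x)."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return math.cos(math.pi * x) / math.sin(math.pi * x)


# The closer x gets to 0, the larger the remapped value becomes.
for x in (0.5, 0.1, 0.01, 0.001):
    print(x, unbounded_measure(x))
```

The remapping is a bijection from (0, 1) onto the whole real line, so it preserves all the information in the original score while removing the bound, which is all the comment needs.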

tessellation@reddit

> "What am I holding in my hand?", no amount of intelligence can reveal the answer.

And yet we all know :)

dddimish@reddit

My precious-s-s-s-s ?

nh_local@reddit (OP)

Certainly, for some tasks there will be no difference between a small and simple open source model and the best SOTA model.

InvestigatorHefty799@reddit

Standalone image models likely won't be a thing for long; they will be part of a multimodal model. Even if we don't go directly to full multimodal like 4o was supposed to be, images will be derived from video models where an image is just a single frame of a video.

nh_local@reddit (OP)

The next step will be to consistently create an entire comic at the click of a button

-p-e-w-@reddit

Less than 2 years away, I reckon.

nh_local@reddit (OP)

source: [https://nhlocal.github.io/AiTimeline/](https://nhlocal.github.io/AiTimeline/)

nh_local@reddit (OP)

Oh. This would be an endless summary

Ok-Succotash-7945@reddit

good info

a_beautiful_rhind@reddit

SD 3.5 got released too. That's a local model and not a paid API we'll never touch.
