How might LLMs store facts

Ok-Obligation-281@reddit

Am I the only one who struggled to understand this episode?

miscellaneous_robot@reddit

The dot product visualization of all those nearly 90-degree vectors made something click for me.
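Here is a quick way to see the effect the visualization is getting at: in high dimensions, randomly chosen directions are already close to perpendicular. The sketch below assumes NumPy is available. The vector count, the dimensions, the seed, and the function name `angle_spread` are arbitrary illustrative choices, not values from the video. It samples random unit vectors and reports the range of their pairwise angles, which should narrow around 90 degrees as the dimension grows.

```python
import numpy as np


def angle_spread(n_vectors: int, dim: int, seed: int = 0) -> tuple[float, float]:
    """Return the (min, max) pairwise angle in degrees among random unit vectors.

    Gaussian samples are normalized, which gives directions spread uniformly
    over the unit sphere.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n_vectors, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    cos = v @ v.T                                  # all pairwise cosines
    iu = np.triu_indices(n_vectors, k=1)           # each pair once, skip the diagonal
    angles = np.degrees(np.arccos(np.clip(cos[iu], -1.0, 1.0)))
    return float(angles.min()), float(angles.max())


if __name__ == "__main__":
    for dim in (3, 10, 100, 1_000, 10_000):
        lo, hi = angle_spread(500, dim)
        print(f"dim={dim:>6}: pairwise angles span {lo:6.2f} to {hi:6.2f} degrees")
```

Purely random vectors in 100 dimensions still stray tens of degrees from 90. Packing many vectors much closer to perpendicular than that requires either more dimensions or deliberately optimizing their directions.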

AcanthocephalaNo8273@reddit

Yeah, it's amazing what higher dimensions can do, although the video got the example wrong: 100 dimensions was too small, and there was a bug in the code.

First_Understanding2@reddit

3Blue1Brown is awesome for math explanations.

miscellaneous_robot@reddit

He needs to dig deeper into the superposition stuff. That topic will reveal something linked to the covalence of facts across tokens, or something like that.
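For readers new to the term: superposition here means storing more features than there are dimensions by giving each feature its own nearly perpendicular direction. The toy sketch below assumes NumPy. The feature count, the dimension, the number of active features, and the function name `superposition_demo` are hypothetical choices for illustration. In it, 1,000 features share a 100-dimensional space, three are active at once, and each feature is read back with a dot product against its own direction.

```python
import numpy as np


def superposition_demo(n_features=1_000, dim=100, n_active=3, seed=0):
    """Encode a sparse set of features as a sum of random directions, then decode.

    Returns (active_ids, decoded_ids): the features that were switched on, and
    the n_active features with the largest dot-product readout.
    """
    rng = np.random.default_rng(seed)
    # One random unit direction per feature: ten times more features than dimensions.
    directions = rng.standard_normal((n_features, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Switch on a few features and add their directions into a single vector.
    active = rng.choice(n_features, size=n_active, replace=False)
    hidden = directions[active].sum(axis=0)

    # Read every feature back by projecting the vector onto that feature's direction.
    readout = directions @ hidden
    decoded = np.argsort(readout)[-n_active:]
    return sorted(active.tolist()), sorted(decoded.tolist())


if __name__ == "__main__":
    active, decoded = superposition_demo()
    print("active: ", active)
    print("decoded:", decoded)
```

Decoding usually succeeds because the crosstalk from unrelated directions is small, but it is not guaranteed. Activating more features at once makes the interference grow until the readout starts picking the wrong features.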

AcanthocephalaNo8273@reddit

He got the explanation right but the example wrong, unfortunately. It's impossible to fit 10,000 vectors that are all between 89 and 91 degrees of each other in 100 dimensions; they would need to be between 75 and 105 degrees or so. The accuracy improves as the number of dimensions increases.
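The impossibility claim can be checked against a standard result, the Welch bound. For n unit vectors in d dimensions with n > d, the largest pairwise |cos| is at least sqrt((n - d) / (d * (n - 1))). The plain-Python sketch below turns that into an angle window. The function names are illustrative. The bound is only a necessary condition: it can rule a window out, but passing it does not mean an optimizer will actually reach that window.

```python
import math


def welch_min_max_cos(n: int, d: int) -> float:
    """Welch lower bound on max |cos| over all pairs of n unit vectors in R^d.

    For n > d, any such vectors satisfy max_{i != j} <v_i, v_j>^2 >= (n - d) / (d * (n - 1)).
    For n <= d an orthonormal set exists, so the bound is 0.
    """
    if n <= d:
        return 0.0
    return math.sqrt((n - d) / (d * (n - 1)))


def narrowest_possible_window(n: int, d: int) -> tuple[float, float]:
    """Tightest window around 90 degrees (in degrees) that the bound does not rule out."""
    lo = math.degrees(math.acos(welch_min_max_cos(n, d)))
    return lo, 180.0 - lo


def min_dim_for_window(n: int, half_width_deg: float) -> int:
    """Smallest d for which the bound allows all n angles within 90 +/- half_width_deg.

    Necessary condition only: it does not prove such a configuration exists.
    """
    max_cos_sq = math.sin(math.radians(half_width_deg)) ** 2   # cos(90 - w) = sin(w)
    d = 1
    while d < n and (n - d) / (d * (n - 1)) > max_cos_sq:
        d += 1
    return d


if __name__ == "__main__":
    lo, hi = narrowest_possible_window(10_000, 100)
    print(f"10,000 vectors in 100 dims: some pair is at most {lo:.1f} or at least {hi:.1f} degrees apart")
    print("Smallest d the bound allows for 89 to 91 degrees:", min_dim_for_window(10_000, 1.0))
```

For n = 10,000 and d = 100 the bound already forces some pair outside roughly 84.3 to 95.7 degrees, so 89 to 91 is out of reach. By this bound alone, a window of plus or minus 1 degree is not excluded until d is around 2,500.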

GoodShape4279@reddit

So, if we rename the R_i as keys and the C_i as values, and replace ReLU with another nonlinearity (softmax), we get something like cross-attention over a KV cache that is learned during pre-training. Could we just unite the self-attention and MLP layers and use these 50k MLP vectors as a prefix to every input?
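The correspondence in this comment can be made concrete with a toy sketch. It assumes NumPy, random weights, and made-up small sizes. The names `mlp_block`, `attend`, and `attend_with_learned_prefix` are hypothetical. Real attention layers also apply learned query, key, and value projections, which are omitted here. The MLP follows the video's picture: neuron i computes ReLU(R_i · x + b_i) and adds that much of C_i. The attention version prepends the same R_i and C_i as extra keys and values, which is the "prefix" idea in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_neurons = 16, 64   # toy sizes; GPT-3's MLP blocks use 49,152 neurons

# Rows of the up-projection: the R_i, read here as "keys".
R = rng.standard_normal((n_neurons, d_model))
b = rng.standard_normal(n_neurons)   # up-projection bias, one per neuron
# Columns of the down-projection, stored one per row: the C_i, read as "values".
C = rng.standard_normal((n_neurons, d_model))


def mlp_block(x):
    """Video-style MLP: neuron i fires ReLU(R_i . x + b_i) and adds that much of C_i."""
    activations = np.maximum(R @ x + b, 0.0)
    return activations @ C


def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def attend(q, keys, values):
    """Single-query scaled dot-product attention (no learned projections)."""
    weights = softmax(keys @ q / np.sqrt(q.shape[0]))
    return weights @ values


def attend_with_learned_prefix(q, ctx_keys, ctx_values):
    """The comment's idea: prepend the MLP's R_i / C_i to the context's keys / values.

    The bias b has no counterpart here; it is simply dropped in this sketch.
    """
    keys = np.vstack([R, ctx_keys])
    values = np.vstack([C, ctx_values])
    return attend(q, keys, values)


if __name__ == "__main__":
    x = rng.standard_normal(d_model)
    ctx = rng.standard_normal((5, d_model))
    print("MLP output:      ", np.round(mlp_block(x)[:4], 3))
    print("Prefix attention:", np.round(attend_with_learned_prefix(x, ctx, ctx)[:4], 3))
```

One visible difference: ReLU lets each neuron fire independently and with unbounded strength, whereas softmax makes all keys share weights that sum to 1. Swapping one nonlinearity for the other therefore changes what the layer computes, and it would need retraining rather than acting as a drop-in replacement.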

More-Ad5919@reddit

But what is a fact?

Barry_Jumps@reddit

This guy's videos simultaneously make me want to keep learning / quit and become a farmer. Also, fun fact: he animated each video with a few thousand lines of Python code. Also... I'm a farmer now.

tabspaces@reddit

Why not both? A farmer with a deep learning PhD (or deep learning trauma, you choose).

RegularFerret3002@reddit

Trauma chooses you.

bettedavisbettedavis@reddit

Facts are stored in the balls.

KrazyA1pha@reddit

Great video. This same phenomenon explains why LLMs can so easily make abstract analogies.

> Describe Michael Jordan, his career, and legacy using a computer analogy.

**Claude 3.5 Sonnet:**

> Michael Jordan was like a revolutionary computer processor that transformed the NBA (think of it as the operating system of professional basketball). His career can be compared to the development and dominance of a groundbreaking CPU:
>
> 1. Boot-up phase (Early career): Jordan entered the league like a new processor architecture, showing immense potential but still adapting to the system. His early years were spent optimizing his performance and learning to work within the NBA’s environment.
> 2. Processing power (Prime years): As Jordan hit his stride, he became the most powerful processor in the league. His ability to execute complex plays (instructions) with incredible speed and efficiency was unmatched. He could handle multiple tasks simultaneously - scoring, defending, and leading his team - all while using minimal resources (energy).
> 3. Overclocking (Clutch performances): In critical moments, Jordan seemed to overclock himself, pushing beyond normal limits to achieve extraordinary results. This was evident in his game-winning shots and stellar playoff performances.
> 4. System compatibility (Teammates): Jordan’s greatness was amplified by his ability to interface effectively with various “peripherals” (teammates). He made those around him perform better, much like a powerful CPU enhancing the overall system performance.
> 5. Upgrades (Skill development): Throughout his career, Jordan continuously upgraded his “software,” adding new skills like an improved jump shot and post-up game. These enhancements made him increasingly difficult for opponents to defend against or “debug.”
> 6. Multithreading (Multisport athlete): Jordan’s brief foray into professional baseball demonstrated his versatility, like a processor capable of running different types of software efficiently.
> 7. Legacy (Lasting impact): Just as groundbreaking processors influence future chip designs, Jordan’s impact on basketball continues long after his retirement. His playing style, work ethic, and competitive drive became blueprints for future generations of players.
> 8. Benchmarking (Records and achievements): Jordan’s six championships, numerous scoring titles, and MVP awards serve as benchmarks against which other players’ performances are measured, similar to how new processors are compared to legendary models.
>
> In the end, Michael Jordan's career redefined the possibilities of basketball performance, much like how a revolutionary processor can redefine computing capabilities. His legacy continues to influence the game, setting a standard of excellence that others strive to match or surpass.

KrazyA1pha@reddit

Or, as a Python script:

```python
import random


class Player:
    def __init__(self, name):
        self.name = name
        self.skills = {"scoring": 0, "defense": 0, "leadership": 0}
        self.achievements = []
        self.championships = 0

    def train(self):
        # Each training block raises every skill by a random 1-5 points.
        for skill in self.skills:
            self.skills[skill] += random.randint(1, 5)

    def play_season(self):
        # Season outcome = total skill plus some luck.
        performance = sum(self.skills.values()) + random.randint(1, 20)
        if performance > 100:
            self.championships += 1
            self.achievements.append(f"{self.championships}x NBA Champion")
        if performance > 90:
            self.achievements.append("MVP")
        if self.skills["scoring"] > 30:
            self.achievements.append("Scoring Title")

    def retire(self):
        print(f"{self.name}'s Career Summary:")
        print(f"Final Skills: {self.skills}")
        print(f"Championships: {self.championships}")
        # dict.fromkeys drops duplicates but, unlike set(), keeps first-seen order.
        print(f"Achievements: {', '.join(dict.fromkeys(self.achievements))}")


def simulate_career(player, seasons=15):
    """Generic career: train and play every season, then retire."""
    for _ in range(seasons):
        player.train()
        player.play_season()
    player.retire()


def simulate_jordan_career():
    """Scripted Michael Jordan career, including both retirements and baseball."""
    jordan = Player("Michael Jordan")

    # College years and early pro seasons
    for _ in range(4):
        jordan.train()

    # Prime years
    for _ in range(11):
        jordan.train()
        jordan.play_season()

    # First retirement
    jordan.skills["scoring"] -= 5

    # Baseball attempt
    jordan.skills = {"scoring": 10, "defense": 15, "leadership": 25}

    # Return to basketball
    for _ in range(3):
        jordan.train()
        jordan.play_season()

    # Second retirement and comeback
    jordan.skills["scoring"] -= 3
    jordan.skills["defense"] -= 3
    for _ in range(2):
        jordan.train()
        jordan.play_season()

    # Final retirement: print the summary exactly once.
    jordan.retire()
    return jordan


if __name__ == "__main__":
    # Run the scripted career; simulate_career() is for a plain 15-season career.
    simulate_jordan_career()
```

SomeOddCodeGuy@reddit

The timing of this post is exceptionally perfect; I was just starting a deep dive into the relationship between parameters and knowledge a couple of hours ago to answer a question I had for a project I'm working on lol. This could not have possibly worked out better for me.
