TheaterFire

How might LLMs store facts

Posted by de4dee@reddit | LocalLLaMA | View on Reddit | 15 comments



Ok-Obligation-281@reddit

Am I the only one who struggled to understand this episode?

miscellaneous_robot@reddit

The dot product visualization against all the nearly 90-degree vectors made something click for me.

AcanthocephalaNo8273@reddit

Yeah, it's amazing what higher dimensions can do, although the video got the example wrong: 100 dimensions was too small, and there was a bug in the code.

First_Understanding2@reddit

3Blue1Brown is awesome for math explanations.

miscellaneous_robot@reddit

He needs to dig deeper into the Superposition stuff. That topic will reveal something that is linked to covalence of facts across tokens, or something like that

AcanthocephalaNo8273@reddit

He got the explanation right but the example wrong, unfortunately. It's impossible to fit 10,000 vectors that are all between 89 and 91 degrees of each other in 100 dimensions; the range needs to be between 75 and 105 degrees or something. The accuracy gets better the higher the number of dimensions.
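
A rough way to see the trend is the minimal sketch below. It assumes plain random unit vectors with no optimization step, so it is not the video's code, and the function name and sample sizes are made up for illustration. It measures how far the pairwise angles stray from 90 degrees as the dimension grows.

```python
import numpy as np

def pairwise_angle_range(n_vectors, dim, seed=0):
    """Return (min, max) pairwise angle in degrees among random unit vectors.

    Assumption: vectors are sampled at random, not optimized to be
    as perpendicular as possible.
    """
    rng = np.random.default_rng(seed)
    vecs = rng.standard_normal((n_vectors, dim))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize each row
    cos = np.clip(vecs @ vecs.T, -1.0, 1.0)              # cosines via dot products
    upper = np.triu_indices(n_vectors, k=1)              # each pair once, no self-pairs
    angles = np.degrees(np.arccos(cos[upper]))
    return angles.min(), angles.max()

for dim in (100, 1000, 10000):
    lo, hi = pairwise_angle_range(n_vectors=500, dim=dim)
    print(f"dim={dim:>5}: angles between {lo:.1f} and {hi:.1f} degrees")
```

With only 500 vectors this says nothing exact about the 10,000-vector case, but the spread around 90 degrees should shrink noticeably as the dimension goes up.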

GoodShape4279@reddit

So, if we rename R_i as keys and C_i as values, and replace ReLU with another nonlinearity, softmax, we get something like cross-attention with a KV cache that is learned during pre-training. Can we just unify the self-attention and MLP layers and have these 50k MLP vectors as a prefix of any input?
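
To make the comparison concrete, here is a toy sketch. It assumes R_i are the rows of the up-projection and C_i the columns of the down-projection, with biases and attention scaling left out; the function names and sizes are hypothetical. It shows the MLP written as a memory lookup next to the same lookup with softmax swapped in.

```python
import numpy as np

def mlp_as_memory(x, R, C):
    """MLP read as a memory: ReLU scores against rows R_i weight the columns C_i."""
    scores = np.maximum(R @ x, 0.0)  # one non-negative score per row R_i
    return C @ scores                # weighted sum of the columns C_i

def softmax_memory(x, R, C):
    """Same lookup with softmax instead of ReLU: attention over fixed keys/values."""
    logits = R @ x
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return C @ weights

rng = np.random.default_rng(0)
d_model, n_neurons = 8, 32                     # toy sizes, not a real model's
R = rng.standard_normal((n_neurons, d_model))  # rows R_i play the role of keys
C = rng.standard_normal((d_model, n_neurons))  # columns C_i play the role of values
x = rng.standard_normal(d_model)               # stand-in for one token's vector

print(mlp_as_memory(x, R, C).shape)
print(softmax_memory(x, R, C).shape)
```

One difference the sketch makes visible: ReLU lets any number of rows fire independently, while softmax forces the weights to sum to one, so the stored vectors compete with each other.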

More-Ad5919@reddit

But what is a fact?

Barry_Jumps@reddit

This guy's videos simultaneously make me want to continue learning / quit and become a farmer. Also, fun fact, he animated each video with a few thousand lines of Python code. Also... I'm a farmer now.

tabspaces@reddit

Why not both: a farmer with a deep learning PhD (or a deep learning trauma, you choose).

RegularFerret3002@reddit

Trauma chooses you

bettedavisbettedavis@reddit

facts are stored in the balls

KrazyA1pha@reddit

Great video. This same phenomenon explains why LLMs can so easily make abstract analogies.

> Describe Michael Jordan, his career, and legacy using a computer analogy.

**Claude 3.5 Sonnet:**

> Michael Jordan was like a revolutionary computer processor that transformed the NBA (think of it as the operating system of professional basketball). His career can be compared to the development and dominance of a groundbreaking CPU:
>
> 1. Boot-up phase (Early career): Jordan entered the league like a new processor architecture, showing immense potential but still adapting to the system. His early years were spent optimizing his performance and learning to work within the NBA’s environment.
> 2. Processing power (Prime years): As Jordan hit his stride, he became the most powerful processor in the league. His ability to execute complex plays (instructions) with incredible speed and efficiency was unmatched. He could handle multiple tasks simultaneously - scoring, defending, and leading his team - all while using minimal resources (energy).
> 3. Overclocking (Clutch performances): In critical moments, Jordan seemed to overclock himself, pushing beyond normal limits to achieve extraordinary results. This was evident in his game-winning shots and stellar playoff performances.
> 4. System compatibility (Teammates): Jordan’s greatness was amplified by his ability to interface effectively with various “peripherals” (teammates). He made those around him perform better, much like a powerful CPU enhancing the overall system performance.
> 5. Upgrades (Skill development): Throughout his career, Jordan continuously upgraded his “software,” adding new skills like an improved jump shot and post-up game. These enhancements made him increasingly difficult for opponents to defend against or “debug.”
> 6. Multithreading (Multisport athlete): Jordan’s brief foray into professional baseball demonstrated his versatility, like a processor capable of running different types of software efficiently.
> 7. Legacy (Lasting impact): Just as groundbreaking processors influence future chip designs, Jordan’s impact on basketball continues long after his retirement. His playing style, work ethic, and competitive drive became blueprints for future generations of players.
> 8. Benchmarking (Records and achievements): Jordan’s six championships, numerous scoring titles, and MVP awards serve as benchmarks against which other players’ performances are measured, similar to how new processors are compared to legendary models.
>
> In the end, Michael Jordan's career redefined the possibilities of basketball performance, much like how a revolutionary processor can redefine computing capabilities. His legacy continues to influence the game, setting a standard of excellence that others strive to match or surpass.

KrazyA1pha@reddit

Or, as a Python script:

    import random

    class Player:
        def __init__(self, name):
            self.name = name
            self.skills = {"scoring": 0, "defense": 0, "leadership": 0}
            self.achievements = []
            self.championships = 0

        def train(self):
            for skill in self.skills:
                self.skills[skill] += random.randint(1, 5)

        def play_season(self):
            performance = sum(self.skills.values()) + random.randint(1, 20)
            if performance > 100:
                self.championships += 1
                self.achievements.append(f"{self.championships}x NBA Champion")
            if performance > 90:
                self.achievements.append("MVP")
            if self.skills["scoring"] > 30:
                self.achievements.append("Scoring Title")

        def retire(self):
            print(f"{self.name}'s Career Summary:")
            print(f"Final Skills: {self.skills}")
            print(f"Championships: {self.championships}")
            print(f"Achievements: {', '.join(set(self.achievements))}")

    def simulate_career(player):
        for _ in range(15):  # 15 seasons
            player.train()
            player.play_season()
        player.retire()

    # Create and simulate Michael Jordan's career
    jordan = Player("Michael Jordan")

    # College years and early pro seasons
    for _ in range(4):
        jordan.train()

    # Prime years
    for _ in range(11):
        jordan.train()
        jordan.play_season()

    # First retirement
    jordan.skills["scoring"] -= 5

    # Baseball attempt
    jordan.skills = {"scoring": 10, "defense": 15, "leadership": 25}

    # Return to basketball
    for _ in range(3):
        jordan.train()
        jordan.play_season()

    # Second retirement and comeback
    jordan.skills["scoring"] -= 3
    jordan.skills["defense"] -= 3
    for _ in range(2):
        jordan.train()
        jordan.play_season()

    # Final retirement
    jordan.retire()

    # Run the simulation
    simulate_career(jordan)

SomeOddCodeGuy@reddit

The timing of this post is exceptionally perfect; I was just starting a deep dive into the relationship between parameters and knowledge a couple of hours ago to answer a question I had for a project I'm working on lol. This could not have possibly worked out better for me.