Yeah, it's amazing what higher dimensions can do, although the video got the example wrong: 100 dimensions was too small, and there was a bug in the code.
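To see why 100 dimensions is a tight fit, here is a minimal sketch that samples random unit vectors and reports the smallest and largest pairwise angle. It assumes random directions as a stand-in for whatever arrangement the video's code produced, so it shows the typical spread rather than the best achievable one. The function names are hypothetical.

```python
import math
import random


def random_unit_vector(dim, rng):
    # Gaussian coordinates, then normalize: a uniformly random direction in R^dim
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pairwise_angle_range(num_vectors, dim, seed=0):
    """Smallest and largest pairwise angle (degrees) among random unit vectors."""
    rng = random.Random(seed)
    vecs = [random_unit_vector(dim, rng) for _ in range(num_vectors)]
    lo, hi = 180.0, 0.0
    for i in range(num_vectors):
        for j in range(i + 1, num_vectors):
            cos = sum(a * b for a, b in zip(vecs[i], vecs[j]))
            cos = max(-1.0, min(1.0, cos))  # clamp rounding error before acos
            angle = math.degrees(math.acos(cos))
            lo, hi = min(lo, angle), max(hi, angle)
    return lo, hi


if __name__ == "__main__":
    for dim in (25, 100, 400):
        lo, hi = pairwise_angle_range(100, dim)
        print(f"dim={dim:4d}: pairwise angles span {lo:.1f} to {hi:.1f} degrees")
```

For random vectors the pairwise cosines cluster around 0 with a standard deviation of about 1/√d, so the window around 90 degrees narrows as the dimension grows, roughly halving each time d quadruples.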
He needs to dig deeper into the superposition stuff. That topic will reveal something linked to the covalence of facts across tokens, or something like that.
He got the explanation right but, unfortunately, the example wrong. It's impossible to fit 10,000 vectors that are all between 89 and 91 degrees of each other in 100 dimensions; the range needs to be something like 75 to 105 degrees. The accuracy gets better the higher the dimension.
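The impossibility claim can be checked with the Welch bound, a standard result: for N unit vectors in d dimensions with N > d, some pair must satisfy |cos θ| ≥ sqrt((N - d) / (d(N - 1))). The sketch below turns that into the narrowest window around 90 degrees that could possibly contain every pairwise angle. The helper names are hypothetical.

```python
import math


def welch_bound(num_vectors, dim):
    """Lower bound on max |cos| over all pairs of num_vectors unit vectors in dim dims.

    Welch bound (k = 1): max_{i != j} <x_i, x_j>^2 >= (N - d) / (d * (N - 1)), for N > d.
    """
    if num_vectors <= dim:
        return 0.0  # N <= d unit vectors can be exactly orthogonal
    return math.sqrt((num_vectors - dim) / (dim * (num_vectors - 1)))


def best_possible_angle_window(num_vectors, dim):
    """Narrowest window around 90 degrees that could hold every pairwise angle."""
    c = welch_bound(num_vectors, dim)
    # An angle with |cos| = c sits asin(c) away from 90 degrees
    delta = math.degrees(math.asin(c))
    return 90.0 - delta, 90.0 + delta


if __name__ == "__main__":
    print(best_possible_angle_window(10_000, 100))  # roughly (84.29, 95.71)
    print(math.cos(math.radians(89)))               # what 89-91 degrees would need
```

For N = 10,000 and d = 100 the bound works out to |cos θ| ≥ 1/√101 ≈ 0.0995, so even an ideal arrangement has some pair outside roughly 84.3 to 95.7 degrees, much wider than 89 to 91 (cos 89° ≈ 0.0175). The bound is only a floor and is not necessarily achievable, so the true best window is at least this wide.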
So, if we rename R\_i as keys and C\_i as values, and replace the ReLU nonlinearity with a softmax, we get something like cross-attention over a KV cache that is learned during pre-training. Could we just unify the self-attention and MLP layers and use these 50k MLP vectors as a prefix to every input?
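A toy sketch of that rename: the same keys `R_i` and values `C_i` read out once MLP-style (ReLU weights) and once attention-style (softmax weights), plus the proposed variant where the learned slots are prepended to the context's own keys and values. The function names and tiny 2-D numbers are hypothetical, and biases and the real model's exact nonlinearity are left out.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, values):
    # sum_i weights[i] * values[i], computed coordinate by coordinate
    return [sum(w * val[k] for w, val in zip(weights, values)) for k in range(len(values[0]))]


def mlp_as_memory(x, keys, values):
    # MLP reading: value C_i is scaled by ReLU(R_i . x); biases omitted
    return weighted_sum([max(0.0, dot(r, x)) for r in keys], values)


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_over_memory(x, keys, values):
    # Attention reading: same keys/values, but weights are softmax(R . x)
    return weighted_sum(softmax([dot(r, x) for r in keys]), values)


def attention_with_memory_prefix(x, mem_keys, mem_values, ctx_keys, ctx_values):
    # Hypothetical variant: learned memory slots prepended to the context keys/values
    return attention_over_memory(x, mem_keys + ctx_keys, mem_values + ctx_values)


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    x = [2.0, 0.0]
    print(mlp_as_memory(x, keys, values))          # [20.0, 0.0]
    print(attention_over_memory(x, keys, values))  # about [8.81, 1.19]
```

One difference the sketch makes visible: softmax normalizes the weights across all slots, so they compete and every slot gets some nonzero weight, while ReLU scores each slot independently and zeros out the negative ones.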
This guy's videos simultaneously make me want to keep learning and to quit and become a farmer.
Also, fun fact, he animated each video with a few thousand lines of Python code.
Also... I'm a farmer now.
Great video. This same phenomenon explains why LLMs can so easily make abstract analogies.
> Describe Michael Jordan, his career, and legacy using a computer analogy.
**Claude 3.5 Sonnet:**
> Michael Jordan was like a revolutionary computer processor that transformed the NBA (think of it as the operating system of professional basketball). His career can be compared to the development and dominance of a groundbreaking CPU:
>
> 1. Boot-up phase (Early career): Jordan entered the league like a new processor architecture, showing immense potential but still adapting to the system. His early years were spent optimizing his performance and learning to work within the NBA’s environment.
> 2. Processing power (Prime years): As Jordan hit his stride, he became the most powerful processor in the league. His ability to execute complex plays (instructions) with incredible speed and efficiency was unmatched. He could handle multiple tasks simultaneously - scoring, defending, and leading his team - all while using minimal resources (energy).
> 3. Overclocking (Clutch performances): In critical moments, Jordan seemed to overclock himself, pushing beyond normal limits to achieve extraordinary results. This was evident in his game-winning shots and stellar playoff performances.
> 4. System compatibility (Teammates): Jordan’s greatness was amplified by his ability to interface effectively with various “peripherals” (teammates). He made those around him perform better, much like a powerful CPU enhancing the overall system performance.
> 5. Upgrades (Skill development): Throughout his career, Jordan continuously upgraded his “software,” adding new skills like an improved jump shot and post-up game. These enhancements made him increasingly difficult for opponents to defend against or “debug.”
> 6. Multithreading (Multisport athlete): Jordan’s brief foray into professional baseball demonstrated his versatility, like a processor capable of running different types of software efficiently.
> 7. Legacy (Lasting impact): Just as groundbreaking processors influence future chip designs, Jordan’s impact on basketball continues long after his retirement. His playing style, work ethic, and competitive drive became blueprints for future generations of players.
> 8. Benchmarking (Records and achievements): Jordan’s six championships, numerous scoring titles, and MVP awards serve as benchmarks against which other players’ performances are measured, similar to how new processors are compared to legendary models.
>
> In the end, Michael Jordan's career redefined the possibilities of basketball performance, much like how a revolutionary processor can redefine computing capabilities. His legacy continues to influence the game, setting a standard of excellence that others strive to match or surpass.
Or, as a Python script:
```python
import random


class Player:
    def __init__(self, name):
        self.name = name
        self.skills = {"scoring": 0, "defense": 0, "leadership": 0}
        self.achievements = []
        self.championships = 0

    def train(self):
        # Each training block raises every skill by a random 1-5 points
        for skill in self.skills:
            self.skills[skill] += random.randint(1, 5)

    def play_season(self):
        # Season performance = total skill plus some random luck
        performance = sum(self.skills.values()) + random.randint(1, 20)
        if performance > 100:
            self.championships += 1
            self.achievements.append(f"{self.championships}x NBA Champion")
        if performance > 90:
            self.achievements.append("MVP")
        if self.skills["scoring"] > 30:
            self.achievements.append("Scoring Title")

    def retire(self):
        print(f"{self.name}'s Career Summary:")
        print(f"Final Skills: {self.skills}")
        print(f"Championships: {self.championships}")
        # dict.fromkeys removes duplicates while keeping first-seen order
        print(f"Achievements: {', '.join(dict.fromkeys(self.achievements))}")


def simulate_career(player):
    # Generic 15-season career for any player
    for _ in range(15):
        player.train()
        player.play_season()
    player.retire()


# Create and simulate Michael Jordan's career
jordan = Player("Michael Jordan")

# College years and early pro seasons
for _ in range(4):
    jordan.train()

# Prime years
for _ in range(11):
    jordan.train()
    jordan.play_season()

# First retirement
jordan.skills["scoring"] -= 5

# Baseball attempt
jordan.skills = {"scoring": 10, "defense": 15, "leadership": 25}

# Return to basketball
for _ in range(3):
    jordan.train()
    jordan.play_season()

# Second retirement and comeback
jordan.skills["scoring"] -= 3
jordan.skills["defense"] -= 3
for _ in range(2):
    jordan.train()
    jordan.play_season()

# Final retirement: every season has already been played above,
# so just print the career summary
jordan.retire()
```
The timing of this post is perfect: a couple of hours ago I had just started a deep dive into the relationship between parameters and knowledge to answer a question for a project I'm working on, lol. This could not have worked out better for me.