Elven77AI

Can 4chan data REALLY improve a model? TURNS OUT IT CAN!

Posted by Sicarius_The_First@reddit | LocalLLaMA | View on Reddit | 157 comments

Elven77AI@reddit

Also, the identities are anonymous: training on Reddit will model a "fictional identity bank" spread over various names (associative identity), while 4chan forces a more coherent single vector, with the same "Anonymous" poster responsible for all replies. Perhaps it appears more coherent during training and skips identity modeling?

Can 4chan data REALLY improve a model? TURNS OUT IT CAN!

Posted by Sicarius_The_First@reddit | LocalLLaMA | View on Reddit | 157 comments

Elven77AI@reddit

> The finetune was literally on an extremely noisy 4chan dataset, it should have eaten glue.

Hmm, perhaps the post->reply structure in flat threads provides a better dialogue model than a threaded dialogue tree (Reddit), since the clue to which post X replies to (>>post number) is a direct pointer that an LLM digests better than an external "post X appears below Y". I.e. the advantage would be that the context of a thread, as an interlocking tree of posts explicitly referencing each other by link number, outperforms a threaded/quotable nesting structure in training.
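To make the "direct pointer" idea concrete, here is a minimal sketch of recovering the reply graph of a flat thread from `>>post number` references. The `(number, text)` post format and the `reply_edges` helper are assumptions made for illustration, not the format of any actual 4chan dataset or of the finetune discussed in the thread.

```python
import re
from collections import defaultdict

# Matches explicit reply pointers such as ">>101" inside a post body.
REPLY_REF = re.compile(r">>(\d+)")

def reply_edges(posts):
    """Map each post number to the post numbers it explicitly references.

    `posts` is a list of (number, text) tuples in thread order
    (a hypothetical format chosen for this sketch).
    """
    known = {number for number, _ in posts}
    edges = defaultdict(list)
    for number, text in posts:
        for ref in REPLY_REF.findall(text):
            target = int(ref)
            # Ignore pointers to posts outside this thread and self-references.
            if target in known and target != number:
                edges[number].append(target)
    return dict(edges)

thread = [
    (101, "Original post."),
    (102, ">>101 First reply."),
    (103, ">>101 >>102 Replying to both."),
]
edges = reply_edges(thread)
```

Each pointer sits in the text of the reply itself, so the structure can be read without knowing where the post was rendered on the page.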

DeepSeek-R1’s paper was updated 2 days ago, expanding from 22 pages to 86 pages and adding a substantial amount of detail.

Posted by Nunki08@reddit | LocalLLaMA | View on Reddit | 55 comments

Elven77AI@reddit

This seems like it: dumping dozens of pages means it's no longer relevant to their current research and they have moved on to something far more effective (i.e. no competitor advantage), likely a new reasoning architecture built from https://huggingface.co/papers/2512.24880

Physical documentation for LLMs in Shenzhen bookstore selling guides for DeepSeek, Doubao, Kimi, and ChatGPT.

Posted by abdouhlili@reddit | LocalLLaMA | View on Reddit | 55 comments

Elven77AI@reddit

What is the use case for this? Is this prompt engineering DeepSeek to be more focused? Then it's a one-page cheat sheet. There isn't enough material for a book.

When will the free ride be over?

Posted by DeltaSqueezer@reddit | LocalLLaMA | View on Reddit | 75 comments

Elven77AI@reddit

They are trying to get market share; once a stable userbase forms, they move to a freemium-type service and make it limited/low-quality for free users. It's not a charity. The forces they compete with on cost, however, can easily sway consumers towards something cheaper, so they have to maintain some basic competitive offer that prevents users from migrating or using local models. The free ride is the "loss leader" cost of making their models famous and their API dependencies entrenched in the market, plus the benefit of free training material from user prompts.

[2510.05688] vAttention: Verified Sparse Attention

Posted by Elven77AI@reddit | LocalLLaMA | View on Reddit | 3 comments

Elven77AI@reddit (OP)

tl;dr: The new sparse attention scheme "matches full model quality with up to 20x sparsity". Repo for their modified models: https://github.com/xAlg-ai/sparse-attention-hub

More love for GLM4.6 (evaluation vs. Claude 4.5 for NLP tasks)

Posted by LoveMind_AI@reddit | LocalLLaMA | View on Reddit | 61 comments

Elven77AI@reddit

With presence/frequency/repetition penalties, the model "lapsing into Chinese" is more likely when it's trying to repeat something but has to rephrase it due to the penalties: since the majority of its training corpus is Chinese and Chinese tokens are the default, the rephrase shifts to another language.
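A toy sketch of the mechanism hypothesized above, assuming the additive form of presence and frequency penalties commonly documented for OpenAI-style samplers (subtract `frequency_penalty * count`, plus `presence_penalty` if the token has already appeared). The logits and token counts are invented for illustration and do not come from any real model; multiplicative repetition penalties are not modeled here.

```python
def apply_penalties(logits, counts, presence_penalty, frequency_penalty):
    """Return logits after additive presence/frequency penalties.

    `logits` maps candidate tokens to raw scores; `counts` maps tokens
    to how often they already appeared in the generated text.
    """
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        penalty = frequency_penalty * count
        if count > 0:
            penalty += presence_penalty
        adjusted[token] = logit - penalty
    return adjusted

# Hypothetical numbers: two English candidates were already used,
# the Chinese near-synonym was not, so it escapes the penalty.
logits = {"important": 5.0, "重要": 4.0, "crucial": 3.5}
counts = {"important": 3, "crucial": 1}

adjusted = apply_penalties(logits, counts, presence_penalty=0.5, frequency_penalty=0.5)
best = max(adjusted, key=adjusted.get)
```

With these made-up values the unpenalized Chinese token ends up with the highest adjusted score, which is the kind of shift the comment describes.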

Biggest Provider for the community for at moment thanks to them

Posted by dead-supernova@reddit | LocalLLaMA | View on Reddit | 281 comments

Elven77AI@reddit

It just shows the scale of software innovation. Anyone reading arXiv preprints can see for themselves: the vast majority of papers on AI come from China. That is despite the GPU embargoes and much less financing per project.

inclusionAI/Ring-flash-2.0

Posted by nullmove@reddit | LocalLLaMA | View on Reddit | 13 comments

Elven77AI@reddit

This thing is a monster of creative writing and surprisingly robust for its size.

Clever code is probably the worst code you could write

Posted by Rtzon@reddit | programming | View on Reddit | 341 comments

Elven77AI@reddit

Disagree. Either the code is clever or I'll use AI to generate it. I don't like to waste time reimplementing wheels and boilerplate; it's soul-draining to write "dumb code" to add functionality, like 'small talk' but it lasts for hours. Without AI writing dumb code by the kilobyte, I'd be spending most of my time debugging dumb code doing something even dumber (e.g. corner cases in C/C++).

PubChem is down, DNS record gone

Posted by geaibleu@reddit | PrepperIntel | View on Reddit | 39 comments

Elven77AI@reddit

The website loads here; the DNS certificate shows the domain is managed via GoDaddy.com, Inc., under the name *.ncbi.nlm.nih.gov

The US Government's open data is currently being scrubbed

Posted by InvisibleBobby@reddit | PrepperIntel | View on Reddit | 175 comments

Elven77AI@reddit

Has anyone analyzed what data is being scrubbed? What exactly is hidden? Exposing this might be important, since e.g. EPA datasets are considered authoritative (like https://www.epa.gov/chemical-data-reporting ).

Shock poll: 41 percent of young voters find killing of UnitedHealthcare CEO acceptable

Posted by 1DarkStarryNight@reddit | anime_titties | View on Reddit | 418 comments

Elven77AI@reddit

Oh, you're Russian, so you miss a lot of context: you have to understand that the US insurance industry is not something normal. It's more like a mafia extortion scheme with extra steps, one of which is killing people by forcing them to pay for "protection" (insurance) and not providing it, so the patient dies or gets crippled, and if they sue, the insurers have the best lawyers extortion money can get.

I just had the scariest dream/nightmare in my life

Posted by Midnight-blue1513@reddit | HighStrangeness | View on Reddit | 57 comments

Elven77AI@reddit

Have you tried changing the direction in which you sleep on the bed (e.g. head-to-South)?

Interesting real life story on a man encountering a space-time distortion

Posted by Projectcultureshock@reddit | HighStrangeness | View on Reddit | 70 comments

Elven77AI@reddit

Source is “The Trap of the Devil” in: “Planet X” monthly newspaper, Kiev, Ukraine July 2005 https://www.fern-flower.org/en/articles/devils-trap

Alternative to Reddit/forum where knowledgeable people are

Posted by Nicoleism101@reddit | RedditAlternatives | View on Reddit | 26 comments

Elven77AI@reddit

Smart people (I'd assume smarter than myself) seem to have a penchant for forming academic circles, similar to gamers' Discord groups/clans, so you have e.g. a "Quantum Gravity group" operating in some space that excludes non-members and is not subject to public opinion. If you could read that, it would defeat the point of that group and make it "too public", like e.g. the Linux kernel mailing list dealing with constant drama.

How much do you value information density in Reddit-like UI?

Posted by kinghuang@reddit | RedditAlternatives | View on Reddit | 19 comments

Elven77AI@reddit

I use tons of CSS hacks to remove margins, because scrolling tiny websites is a waste of time.

Why is Lemmy.world so toxic?

Posted by Character-Storage661@reddit | RedditAlternatives | View on Reddit | 87 comments

Elven77AI@reddit

Based on my extensive browsing of Lemmy servers:

1. They seem to be copying Reddit's structures at scale without having as many users, so the communities are empty. There is a lack of niche topics/hobbies. Nobody seems to want to build anything unpopular as a long-term investment.
2. Mods are very pedantic and enforce their arbitrary rules very effectively: it's far more moderation per user than on Reddit. Ironically, where the moderation is strongest is in enforcing low posting rates in communities to avoid spam: only a few users are actually posting vs hundreds of viewers, and these rules limit them.
3. Suffocating ideological conformity: each Lemmy instance seems to have an insular ideological platform of "us vs them", and they don't like opinions outside of the consensus.

Have any of you peeps looked at "classic forums" (like Xenforo / Invision) as viable Reddit Alternatives? What would be your reasoning to use those community frameworks? If not, why not?

Posted by prankster999@reddit | RedditAlternatives | View on Reddit | 15 comments

Elven77AI@reddit

Google is unfortunately not capable of finding much due to changes in its algorithm; you're better off with other search engines (Bing/Yandex/DuckDuckGo). There is much more spam and fake pages to sort through, but it's possible; unless your niche is unpopular, it should be indexed and linked from somewhere. You can check the Majestic Million ( https://majestic.com/reports/majestic-million ) to see how much exposure it has on the web, including rare websites that would fail to show up in most searches.

Have any of you peeps looked at "classic forums" (like Xenforo / Invision) as viable Reddit Alternatives? What would be your reasoning to use those community frameworks? If not, why not?

Posted by prankster999@reddit | RedditAlternatives | View on Reddit | 15 comments

Elven77AI@reddit

The problem with forums is exposure at scale: subreddits share 'exposure space' with all of Reddit. Try finding a forum for any niche with search engines and compare it with finding a subreddit. It's the same with blogs and personal pages; they can't compete with centralized services on exposure: the 'discovery' of forums needs some central directory to compete with Reddit.

Now, imagine you do get exposure: your users will have to register to post on each of the forums they read (unlike one registration for all subreddits). This adds friction, and they're not that interested in 'just a forum'. Without a huge amount of content to read, there is no point in creating an empty forum (same with Reddit clones). Maintaining the forum and dealing with forum hosters/companies for just a tiny community with low growth potential is not appealing to most people.

Reddit clones allow content to be concentrated in self-moderated sub-forums; that's the genius of this dynamic scheme: topics (tags) -> communities -> self-moderation. Forums are just this with manual moderation and intervention: you can't just randomly grant anyone mod powers or allow subforums to be created organically, but at Reddit scale it self-organizes into successful subforums that compete for attention with the rest.

Building Reddit right

Posted by lumpyvasdeferens@reddit | RedditAlternatives | View on Reddit | 11 comments

Elven77AI@reddit

Remove karma; replace it with log2(#replies)*log10(total_reply_text_length) to sort threads/subthreads, with cutoffs for top 1h/1d/1w/1m. Bootstrap with something like subSimulatorGPT2 and stealthily delete the initial bot posts that get no replies. Don't allow embedding any media, only links: text is much cheaper to host (if you don't rely on dynamically constructed pages, make it 100% Cloudflare compatible). Monetization: sponsored posts that stay "at top" of a subreddit for N hours. Avoid captchas; instead use algorithmic triggers to mark bots. Basically, don't anger potential power users with hostile design. Reddit changing its design/API at whim is a prime example of alienating decisions; people rely on things staying as they are for years.
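A minimal sketch of the proposed sort key, assuming each thread is represented simply as a list of reply lengths in characters (the `thread_score` name and that representation are made up for this example). The comment does not say what to do with zero replies, so returning 0 in that case is an assumption; note that by the formula a thread with exactly one reply also scores 0, since log2(1) = 0.

```python
import math

def thread_score(reply_lengths):
    """Score = log2(number of replies) * log10(total reply text length)."""
    reply_count = len(reply_lengths)
    total_length = sum(reply_lengths)
    # Assumption: threads with no replies (or only empty replies) score 0,
    # since the logarithm is undefined there.
    if reply_count == 0 or total_length == 0:
        return 0.0
    return math.log2(reply_count) * math.log10(total_length)

# Hypothetical threads: name -> lengths of their replies in characters.
threads = {
    "many short replies": [20] * 16,
    "few long replies": [500, 500],
    "no replies": [],
}
ranked = sorted(threads, key=lambda name: thread_score(threads[name]), reverse=True)
```

The time cutoffs (1h/1d/1w/1m) would be applied by filtering which threads are scored, before the sort.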

Social websites with nested comments v6

Posted by 1billionthuser@reddit | RedditAlternatives | View on Reddit | 20 comments

Elven77AI@reddit

They allow sorting replies by rating and are easier to follow than a nest of chronological posts referring to/quoting multiple posts (often nested). Threaded discussions also allow hiding irrelevant subthreads, while flat threads force you to skip posts in the middle of relevant content.

Strange lights and hums in the sky appear all over the world, a harbinger of an alien invasion?

Posted by JuliaJune96@reddit | HighStrangeness | View on Reddit | 6 comments

Elven77AI@reddit

What are the dates on these reports? The impression is that it's all recent, but it could be cherry-picked from 10, 5 or 3 years ago.

[Hype Train] Your friendly reminder that benevolent canine aliens are supposed to be revealing themselves TOMORROW (12/23)!

Posted by ResplendentShade@reddit | HighStrangeness | View on Reddit | 413 comments

Elven77AI@reddit

Coincidence? https://old.reddit.com/r/HighStrangeness/comments/18p637a/strange_lights_and_hums_in_the_sky_appear_all/

Why do programmers need private offices with doors? (Do Not Disturb)

Posted by Mariambarouma@reddit | programming | View on Reddit | 382 comments

Elven77AI@reddit

A better analogy is interrupting an online game without save states vs a single-player game you can reload from a save state, recovering the progress/levels/items back to the point where you left.

Fossil: A Git alternative with batteries included

Posted by ketralnis@reddit | programming | View on Reddit | 90 comments

Elven77AI@reddit

Alright, suppose I self-host my Fossil: how do I collaborate with others using their own self-hosted Fossil, without a central hub or the ability to locate the code I want to fix/fork/review?

Fossil: A Git alternative with batteries included

Posted by ketralnis@reddit | programming | View on Reddit | 90 comments

Elven77AI@reddit

What is the "FossilHub" equivalent of GitHub? GitHub has a nice web interface and lots of features that don't exist in git itself.

The NSA advises move to memory-safe languages

Posted by ketralnis@reddit | programming | View on Reddit | 554 comments

Elven77AI@reddit

Why not reform the C/C++ standards to mandate specific memory-safe features by default? Migrating away from C/C++ codebases is a non-starter for most companies. The overhead of buffer overflow checking can be eliminated by proving at compile time that all writes are limited to the buffer length, so if the buffer can be written to outside the limit, it would cause an "ambiguous write error" instead of compiling. Runtime allocations would of course need to be checked against their limits at runtime, but since most of these exploits target fixed buffers, the priority should be to make this "compile-time check for buffer operations outside of range" a mandatory step (with disabling it requiring something like -funsafe-buffers).
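A toy model of the proposed check, written in Python rather than as a real compiler pass: each write is summarized by the lowest and highest index it could touch, and anything not provably inside a fixed-size buffer is rejected before "compilation". The `Write` record, `check_writes` function and `AmbiguousWriteError` name are all hypothetical, and deriving the index ranges from real C/C++ code (the hard part) is assumed to have happened already.

```python
from dataclasses import dataclass

class AmbiguousWriteError(Exception):
    """Raised when a write cannot be proven to stay inside its buffer."""

@dataclass(frozen=True)
class Write:
    buffer: str         # name of the fixed-size buffer being written
    lowest_index: int   # smallest index the write may touch
    highest_index: int  # largest index the write may touch (inclusive)

def check_writes(buffer_sizes, writes):
    """Reject any write whose possible index range leaves its buffer."""
    for write in writes:
        size = buffer_sizes[write.buffer]
        if write.lowest_index < 0 or write.highest_index >= size:
            raise AmbiguousWriteError(
                f"write to {write.buffer}[{write.lowest_index}..{write.highest_index}] "
                f"may exceed buffer of length {size}"
            )

sizes = {"name": 16}

# Provably in range: indices 0..15 of a 16-element buffer.
check_writes(sizes, [Write("name", 0, 15)])

# One past the end: rejected instead of being "compiled".
try:
    check_writes(sizes, [Write("name", 0, 16)])
    rejected = False
except AmbiguousWriteError:
    rejected = True
```

Since every range is known before the program runs, no per-write check remains at runtime, which is the overhead argument made in the comment.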