After court order, OpenAI is now preserving all ChatGPT and API logs
Posted by iGermanProd@reddit | LocalLLaMA | View on Reddit | 280 comments
OpenAI could have taken steps to anonymize the chat logs but chose not to, only making an argument for why it "would not" be able to segregate data, rather than explaining why it "can’t."
Surprising absolutely nobody, except maybe ChatGPT users, OpenAI and the United States own your data and can do whatever they want with it. ClosedAI have the audacity to pretend they're the good guys, despite not doing anything tech-wise to prevent this from being possible. My personal opinion is that Gemini, Claude, et al. are next. Yet another win for open weights. Own your tech, own your data.
AaronFeng47@reddit
How is keeping all user chat logs gonna help the NYT in this case? Do they think OpenAI will just shove every chat history into GPT's brain?
Littlehouse75@reddit
If there has been a violation of copyright, the chat logs would serve as evidence.
llmentry@reddit
I mean, you'd think NY Times would have just asked chatgpt itself, wouldn't you? If it's that easy to do this, then they'd have saved everyone a whole lot of trouble by just producing the evidence.
This is a lot of trouble for what sounds awfully like a fishing expedition ...
ginger_and_egg@reddit
"Chat GPT, are you violating copyright?"
"No 😇 of course not"
"Alright, that solves that"
llmentry@reddit
:) Asked ChatGPT to reproduce copies of their articles, of course, not asked it if it was copying them ...
llmentry@reddit
I just tried with 4o-mini via duck.ai ... and I couldn't even get it to give me a 1-2 sentence fair-use quote. The guardrails against this are seemingly very strict, and I suspect you'd have to jailbreak it to get anywhere.
If anyone has better luck getting any OpenAI model to reproduce an NY Times article, it'd be interesting to know?
TentacledKangaroo@reddit
The suit started in 2023, when there were demonstrably far fewer guardrails in place. It's one of the reasons preserving that data is important in this instance.
llmentry@reddit
But those data are long gone (or should be) - if you're a paying customer, and haven't opted in to data retention, then prompts and outputs should be deleted within 30 days (supposedly ... we'll see, I guess!)
This order appears to be more about retaining current prompts and outputs. So the current guardrails are relevant for that, I think.
TentacledKangaroo@reddit
What I'm saying is that just because there are guardrails in place now to prevent verbatim replication of NYT articles, it doesn't mean those were in place when the lawsuit was initially filed, or for that matter, even just a few months ago.
As I understand it, the allegation is that OpenAI hasn't actually been deleting data as it claims, and has only been doing so recently in an effort to destroy evidence for this case (and OpenAI is crying about it not because they're actually concerned with user/data privacy, but to make the public think they are). The court is then going the ham-fisted route and saying "fine, then you can't delete any data until we deal with this case."
llmentry@reddit
Ah, I see. And yeah, ok, if that's the point then ... well, sure. But looking over the ruling that led to this, it seems as though the judge was asking about *new* data, moving forwards, not old data previous to this. Although it's a bit hard to tell, because I'm not sure the judge really understands the situation -- and they seem, if anything, most annoyed by OpenAI not proposing a means to segregate and anonymise some users' data, even though the judge seemed initially sympathetic with potential privacy issues. (The response appears to have basically been, "if you're not going to engage with the court and propose ways forwards, then fine, just save everything and see if I care!" Well done there, OpenAI ...)
Anyway, I guess more will come to light about OpenAI's data retention practices after this ... probably.
But seriously -- if we can acknowledge that right now it's impossible to get OpenAI's models to cough up even a sentence of copyrighted material, surely this ruling could have explicitly referred to historic, not current, outputs?
From what I can see, all of the NY Times evidence of infringement is about the early use of RAG (stupid, dumb, pointless, counterproductive RAG!) with ChatGPT, back in 2023, under prompts that expressly requested the reproduction of their own content. (Ironically, they also claim that most of the time they *couldn't* get ChatGPT to correctly reproduce their content, and then get upset because it was falsely attributing non-infringing text to the NY Times ...) Anyway, they have something of a point here, and OpenAI should just acknowledge this, pay up and move on -- the damages for the partial reproduction of a few NY Times articles back in 2023 should not be much.
But none of the above is relevant now, and I'm not sure why the court can't require the NY Times to demonstrate evidence of *current* infringements before requiring *current* outputs to be saved. That would seem only logical to my mind. But, IANAL ...
Traditional-Gap-3313@reddit
is the judge dumb enough to believe that people capable of jailbreaking openai models are doing it to read a shitty nyt article?
visarga@reddit
LLMs are the worst copyright infringement tool. They are slow, expensive and give approximate results. Who generated bootleg Harry Potter instead of just pirating it? Copying is free, instant, and has perfect fidelity.
llmentry@reddit
Well, LLMs are pretty good at verbatim reproduction (I've had fun getting models to reproduce entire chapters from books in the public domain, and they'll tell you they're not capable of doing this while happily copying out the text verbatim). It's possible this relates to the order in which this information entered the network (there was a post the other day on this), so it's possibly not universal. But still, they can theoretically do this, with no hallucinations. (I was surprised.)
But, for current copyright works you really have to jailbreak, which is going to get you flagged if using a closed model and doing this repeatedly ... so, why would you bother? The number of people going to such trouble to read a poor text-only copy of the NY Times must be countable on the fingers of one hand.
(Maybe this lawsuit will prove me wrong, but I seriously doubt it.)
visarga@reddit
They probably use n-gram filtering. So they are guaranteed to never have more than n consecutive tokens in common with the source corpus.
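(If that is the mechanism, a minimal sketch of such a filter might look like the following. This is purely hypothetical, since OpenAI hasn't published how its filtering actually works; the window size N and token-level matching are assumptions.)

```python
# Hypothetical n-gram overlap filter; OpenAI's real mechanism is unpublished.
# Flags any generation sharing N or more consecutive tokens with a protected
# corpus. N = 8 is an assumed threshold.

N = 8

def build_ngram_index(corpus_tokens, n=N):
    """Every n-token window that appears in the protected corpus."""
    return {tuple(corpus_tokens[i:i + n]) for i in range(len(corpus_tokens) - n + 1)}

def overlaps(generated_tokens, index, n=N):
    """True if the generation reproduces any n consecutive corpus tokens."""
    return any(tuple(generated_tokens[i:i + n]) in index
               for i in range(len(generated_tokens) - n + 1))

corpus = "the quick brown fox jumps over the lazy dog and runs away".split()
index = build_ngram_index(corpus)
print(overlaps("he saw the quick brown fox jumps over the lazy dog flee".split(), index))  # True
```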
ginger_and_egg@reddit
My mistake :)
dinerburgeryum@reddit
It’s why you go local, friends.
the_ai_wizard@reddit
yes, lets just drop $10,000 on a rig to run something similar locally
pitchblackfriday@reddit
Why so extreme?
ChatGPT and other commercial LLM services are running at a massive loss. If they charged you the "real" cost, a $10,000 rig would not look that expensive.
Consumer-grade hardware can run small and mid-sized LLMs locally that cover ordinary people's casual usage. It's never going to be an expert assistant, but it can be a generally knowledgeable friend.
The knowledge density of LLMs is increasing at breakneck speed. In a few years you will probably only need a $2,000 rig to run the same level of LLM, which is not too bad.
_thispageleftblank@reddit
Development costs are pretty high, but inference is cheap. Look at how much inference providers charge for R1-full on OpenRouter. It's dirt-cheap SOTA.
pitchblackfriday@reddit
Do you really think inference for 400 million weekly active users is cheap?
They constantly throttle their services, even for paid users, don't you know?
https://www.cnbc.com/2025/02/20/openai-tops-400-million-users-despite-deepseeks-emergence.html
https://www.cnbc.com/2025/03/27/chatgpts-viral-image-generation-ai-is-melting-openais-gpus.html
_thispageleftblank@reddit
It doesn't matter what the aggregate cost is, only what the profit per token is. You can buy R1 tokens from a bunch of third-party providers, who surely won't be operating at a loss, and it's still extremely cheap. Or you can become an inference provider yourself.
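For reference, here's roughly what that looks like in practice. A sketch assuming OpenRouter's OpenAI-compatible endpoint and the deepseek/deepseek-r1 model slug (both are assumptions; check their current listings):

```python
# Sketch: buying R1 tokens from an aggregator of third-party providers
# instead of self-hosting. Endpoint and model slug assumed from OpenRouter's
# OpenAI-compatible API; pricing is whatever the marketplace lists.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)
resp = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Explain batched inference briefly."}],
)
print(resp.choices[0].message.content)
```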
TentacledKangaroo@reddit
So here's the thing... OpenAI operates at a 225% loss. No, I'm not missing a decimal point in that. Every single query, including from paid users, loses them money. Every token loses them money. The revenue they do get barely covers the operating expenses, let alone the training and everything else.
And sure, you could purchase from a third party provider, and they may be making a profit...that is, until OpenAI inevitably jacks up their prices to three or four or five times what they are now, forcing those third parties to either start operating at a loss or to also jack up their prices.
Consumer prices are cheap right now, because the whole thing is a house of cards, and all it'll take to make it come crashing down is for Microsoft to stop funneling money into OpenAI.
the_ai_wizard@reddit
if true, holy shit
_thispageleftblank@reddit
It’s not unusual for startups to lose money during the first years of their existence (and OpenAI has effectively existed since 2022), in an attempt to capture market share. The total loss also doesn’t tell us about the structure, like whether API inference is profitable or not, or whether specific models are profitable.
I’m not talking about third-party providers of OpenAI’s models. I don’t think they even exist. I’m talking about other models, including open-source ones, that anyone can self-host. R1 is close to SOTA performance and is offered by self-hosters for a very low price on OpenRouter. OpenAI’s prices have nothing to do with that, their models are not even within the top 5 by token usage.
TentacledKangaroo@reddit
Genuine question - Is that $3 per hour before Microsoft's 80% or so discount to OpenAI, or after?
Megatron_McLargeHuge@reddit
The real cost that covers all R&D expenses or the operating cost of the model in production? It's the engineers and training that are expensive but home users don't need to replicate that as long as open models are competitive.
the_ai_wizard@reddit
Also by this logic, I assume the electricity cost is equal or more for home users...
Captain_D_Buggy@reddit
How long till these companies stop with the open models? Will we ever see a Gemini-sized model getting released?
pier4r@reddit
when a company deploys something in production, it also has to recoup the money spent to produce it. It is not just pure operating cost.
That is my interpretation of the parent comment.
ginger_and_egg@reddit
A company would like to do that. But ultimately it makes decisions based not on recouping a sunk cost, but on making the most profit (or least loss) given the marginal cost of more inference.
If they can't charge enough for inference to get the cost of training back, what that means is they stop training new models and just milk their existing models as long as they can.
rorykoehler@reddit
Scaling economics run the other way: the more inference you do, the cheaper each token gets, due to batching efficiency gains.
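A toy amortization model shows the shape of it (all numbers invented): the fixed cost of a forward pass, i.e. streaming the weights, is shared across every sequence in the batch, so per-sequence cost falls as the batch grows.

```python
# Toy model of batching economics; all numbers are invented for illustration.
fixed_cost_per_step = 1.0      # e.g. streaming the weights through the GPU once
variable_cost_per_seq = 0.02   # per-sequence work (KV cache, attention, etc.)

for batch_size in (1, 8, 64, 256):
    per_seq = fixed_cost_per_step / batch_size + variable_cost_per_seq
    print(f"batch {batch_size:>3}: cost/seq = {per_seq:.4f}")
# batch   1: cost/seq = 1.0200
# batch 256: cost/seq = 0.0239  (~43x cheaper per sequence)
```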
This-Complex-669@reddit
You are so right.
But look around you. These are all brokies who think 10k for a permanent helper is too expensive 🤣
stoppableDissolution@reddit
Not everyone is living in the US, where it's spare change. Even in the EU, in a lot of places it's close to a year's salary, let alone other parts of the world.
TentacledKangaroo@reddit
It's not even spare change in the US, save for a very small portion of people. For about 5% of the population, $10k is basically their entire take-home pay for a year, and for another 5% it's half their entire take-home pay.
pitchblackfriday@reddit
What are you talking about?
A $10K rig plus the electricity bill for hundreds of watts of draw is fucking expensive as a matter of fact. Are you living in Switzerland or what? You are labelling 99% of the global population as 'brokies'.
I was just making a counterpoint.
This-Complex-669@reddit
Yes. Y'all are brokies if 10k is a problem. We are talking about a SOTA helper that is available 24/7 and can tackle complex stuff. I can put it to work at a second job in another one or two years when we are nearing AGI.
kmouratidis@reddit
In some countries, you can work two full-time minimum-wage jobs for 2+ years and still not be able to afford it. Just because you happen to live in a richer country shouldn't make you blind to the rest.
power97992@reddit
Two years from now, agents will automate a lot of tasks; you won't think about using a two-year-old model. Using a two-year-old model then will be like using GPT-4 or Llama 1 now.
Captain_D_Buggy@reddit
What hardware do we need to run 500 billion parameter model?
AvidCyclist250@reddit
If there's one thing I've learned since SD and LLMs, it's that development is always faster than you think. The surprises have never ended.
kmouratidis@reddit
Are you including model development? If not, inference isn't going to be *that* expensive.
AppearanceHeavy6724@reddit
2x3060 is like $450, and they'll be worth $250 in 3 years when you are done with them, so you lose only $200 over 3 years. What are you talking about?
pitchblackfriday@reddit
To be fair, we are talking about commercial-scale SOTA here, so 24GB of VRAM is not enough. A 70B model would be the minimum to partially replace ChatGPT in service.
AppearanceHeavy6724@reddit
OK, then 2x3060 plus a 5060 Ti is less than $1000. You would probably get only 8 tok/sec with a 70B model, slow but still usable.
llmentry@reddit
Sadly, a 70B model will not provide you with GPT-4.1-equivalent output. (I love local models, and wish it were otherwise ... but it's so far from equivalent it's not even funny.)
You've really got to get DeepSeek-v3 to get close - and achieving that, even at Q6, will cost you so much more than $1000. Again, I wish it was otherwise :(
AppearanceHeavy6724@reddit
It depends on the task. For writing fiction there's no obvious correlation between model size and output quality. I often like stories written by 12b models more than ones written by SOTA models. For coding it might make a noticeable difference, but the way I use LLMs, 14b assistants are good enough for me.
Ravenpest@reddit
Yes. Absolutely. How about not being a slave
nO0b@reddit
there are LLMs that will run on your phone, let alone a 10k rig. All kinds of options in between. and no, they don't suck.
Smile_Clown@reddit
I love when people say this... on their 3060 with 3 tokens a second on anything even remotely smarter than a three year old.
Maybe in a year or so something will be serviceable at home for an average user.
You guys... you quant the fuck out of models and post how amazing it all is and the quality is just ass for anything more than a recipe or an email.
dinerburgeryum@reddit
I’m not willing to pay for the difference in quality with my privacy or NDA work data. You are. That’s fine, but there is a cost beyond money for these services.
PrototypePineapple@reddit
You're seeing cost but not value.
Local models: cost 0, value 1
Big models: cost incalculable, value incalculable.
People are upset about apples, and you tell them to eat oranges.
to be clear, I am a huge proponent of local models, but I am also realistic, which my idealism really dislikes.
spacenglish@reddit
Nothing as good as o3 or what Gemini 2.5 used to be, right?
dinerburgeryum@reddit
"Used to be" is pulling a lot of weight there. Will you get the same quality out of the box? No. Will you set up a consistent, private and trustworthy system, where companies fluffed up on VC cash can't just rug pull your workflow? Yes.
det1rac@reddit
Effective immediately?
dhlu@reddit
I'm short of 30k USD to run my own DeepSeek instance, if someone could help...
power97992@reddit
Run deepseek distilled 8b
dhlu@reddit
I said DeepSeek, not LLaMA/Qwen/...
power97992@reddit
You can run distilled 8b on a mac mini...
dhlu@reddit
So a machine exclusively for LLM, for said price
Considering I have no use of an Apple ecosystem
my_byte@reddit
Interesting. I'd have assumed that privacy laws like GDPR etc. would take precedence over a private lawsuit like that.
InitialAd3323@reddit
In the EU yeah, in the US... Nope.
my_byte@reddit
Curious what happens to data of EU citizens like myself then. I guess the US is gonna be US and say their company, their rules. 🥴
TentacledKangaroo@reddit
As I understand it, part of GDPR is the requirement that EU data be on EU servers and stay within the EU. I've seen companies go so far as to have a separate infrastructure team over there to manage their European computers/servers and have the two continents basically air-gapped from one another, so that there was effectively no chance of their EU data ending up in the US, even accidentally.
my_byte@reddit
Not really. The requirement is just to tell customers where data is located and to comply with GDPR wherever you store it. Which of course is moot if the US government overrules that and forbids compliance with GDPR. And knowing our retarded EU bureaucrats, they'll probably fine OpenAI for it rather than addressing the issue with the US government.
InitialAd3323@reddit
I believe since the data must be stored because of a court order, and since OpenAI operates only with the US company instead of a European subsidiary, we are basically screwed
Guess I'll just use Mistral, since it's also faster and less limited in chat usage
my_byte@reddit
Sadly the OpenAI offering is way ahead of others in terms of value for money. I guess if you mostly use it for chat and especially Q&A, Perplexity is a better investment of money. But I'm also using image generation quite a bit. And despite having them in Cursor, the reasoning models in ChatGPT work better for me sometimes. For day-to-day chat stuff, their models aren't even that good. I feel like they peaked with the initial GPT-4 release and - at least for my use cases, like helping me write lyrics - have been getting progressively worse since. I've had to come up with increasingly longer prompts to keep it working, and it still feels worse than 2 years ago to me. But hey, I'm starting to approach a quantity of conversations where I can probably fine-tune my own model. If you're GPU poor, you can also consider hosting your own chat but hitting fireworks.ai for inference. I don't think they store anything. And it's fast AF too.
AppearanceHeavy6724@reddit
Try gemma3 27b for poetry
my_byte@reddit
Yeah. I tried running it, but llama-server refuses to run the model. I'll have to investigate, cause I've been looking to build my own app to help with the editing in my workflow (things like "suggest a four-syllable phrase for this selection"). I'll definitely look into it, especially since I'm interested in fine-tuning a model to work for my workflow and output style. I mostly write the lyrics myself but use LLMs for ideation. So I've been wondering if fine-tuning could get an LLM to produce output I deem good enough.
AppearanceHeavy6724@reddit
Try updating llama-server to the very latest version. They've been doing lots of fixes for Gemma 3 models lately.
my_byte@reddit
Yeah. It's been a week since my last compile
The_IT_Dude_@reddit
I figured internally they never truly deleted anything, and the delete button was more like a placebo...
I mean, yeah, it might even be illegal and all that, but it's all training data for them to sit on. Perhaps they'd never share it with anyone at least intentionally, but I also figured the folks at the NSA had a search feature on it still and were likely using more AI on it to search for what they wanted.
Bottom line: If you don't control both the hardware and the software you run your stuff on, just assume games are being played in the background.
Efficient_Ad_4162@reddit
The NSA aren't going to do anything as pedestrian as trust a third party when they can just capture every packet that goes in. Just take a second to consider the scale of PRISM.
Don't get me wrong, I agree with everything you said but its never a bad time to hear 'actually its way worse than that'.
Red_Redditor_Reddit@reddit
Unless they have some magic quantum machine, those encrypted packets don't mean anything.
AppearanceHeavy6724@reddit
Assuming AES is not cracked/backdoored.
cd1995Cargo@reddit
That’s a pretty good assumption to make though.
AES was created by a completely public process, and the underlying algorithm, Rijndael, was designed by two Belgian cryptographers and selected through an open competition judged by cryptography experts from around the world. Many of them had created and submitted their own algorithms for consideration. Rijndael won because it was considered the best (for reasons that would be way too long to explain here; it is a very elegant algorithm). There's no possibility that the NSA put some sort of backdoor in the algorithm. It would be immediately obvious to anyone looking.
As for cracking it, it's technically not impossible, but again, crypto experts have been analyzing it for decades and it's known to resist all forms of differential cryptanalysis. It has technically been "broken" in theory, in that a couple of papers have shown that it's possible to break it a bit faster than brute force, but the computational power and amount of data you'd need would still be prohibitive. The NSA can't change the rules of math.
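The back-of-envelope arithmetic makes the point, even under a deliberately generous assumption about attacker hardware:

```python
# Why brute-forcing AES-128 is out of reach even for a state actor.
# 1e18 keys/second is a deliberately generous, made-up rate.
keyspace = 2 ** 128
rate = 1e18                            # keys per second (assumed)
seconds_per_year = 60 * 60 * 24 * 365
print(f"{keyspace / rate / seconds_per_year:.2e} years")
# ~1.08e+13 years, roughly 800x the age of the universe
```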
Now, the real thing to worry about is the NSA backdooring key generation algorithms, which they have done in the past, though it was immediately obvious that they did. https://en.m.wikipedia.org/wiki/Dual_EC_DRBG
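A toy illustration of why key generation is the soft spot (Dual_EC_DRBG's actual flaw was far subtler; the seed search below is just the simplest version of the idea):

```python
# A key drawn from a seedable, non-cryptographic RNG is recoverable by anyone
# who can guess the seed; a key from the OS CSPRNG is not.
import os
import random

def weak_key(seed: int, nbytes: int = 16) -> bytes:
    rng = random.Random(seed)          # Mersenne Twister: fully determined by its seed
    return bytes(rng.randrange(256) for _ in range(nbytes))

strong_key = os.urandom(16)            # OS CSPRNG: no recoverable seed

# Victim seeds with a Unix timestamp; attacker replays candidate seeds.
target = weak_key(seed=1717286400)
recovered = next(s for s in range(1717286000, 1717287000) if weak_key(s) == target)
assert weak_key(recovered) == target   # key recovered without touching the cipher
```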
AppearanceHeavy6724@reddit
The fact that AES was invented by Belgians means zero, as it might have been chosen precisely because the NSA knew it has a desirable weakness. I mean, do you really trust these dudes?
The NSA cannot change the rules of math, but they have a massive budget dedicated solely to solving exactly these kinds of problems; you cannot change the laws of economics. There is a good reason they were the first to discover differential cryptanalysis.
cd1995Cargo@reddit
As I said in my comment, Rijndael was chosen by a vote by a committee made from cryptography experts around the world. They convened for a conference and each group presented their own algorithms for consideration. Rijndael was extensively analyzed by all of these independent groups and was selected as the winner because it was simple, elegant, and secure.
I’d encourage you to look more into how the algorithm actually works. I studied cryptography in graduate school and I understand the math behind it and why it is secure.
AppearanceHeavy6724@reddit
"Understand why it is secure " is a nonsens; sha and MD5 are fubar these days yet they were analyzed by likes of you in 1990s and were deemed to be good.
scswift@reddit
Oh my god. Are you a zoomer? Cause you sound like one, clearly having no understanding of how slow and how limited computers of that time period were. Something like SHA or MD5 would have been the best that was reasonable to implement at the time without slowing everything to a crawl. And when I say slowing everything to a crawl, I don't mean by today's standards, because everything was already running at a crawl by today's standards. I mean even WORSE than that!
These were the days when you had to wait 60 seconds for a .gif to download. And it was a 640x480 gif not a high res animated one.
So don't go shitting on all the computer scientists of that era like they just didn't know what the hell they were doing, because they did the best they could with what they had.
AppearanceHeavy6724@reddit
I am from the ex-USSR, so we do not have the same generational structure the West has, but by American standards I am late Gen X/early Gen Y.
What you've said is sad, incoherent, unrelated blabbering.
scswift@reddit
Oh then let me be more clear:
Performing a complex mathematical operation requiring a 128-bit key, as AES does, on every 32-bit integer transferred, on a 486-class CPU, would be insanity. And that was the top-of-the-line PC back in 1991 when MD5 was introduced.
AppearanceHeavy6724@reddit
It is not important information for the discussion.
scswift@reddit
It absolutely is.
You implied that the people who chose MD5 did so because they were stupid.
I am suggesting they did so because that was the best they could do at the time without horribly impacting the user experience, due to processing power.
Therefore it is entirely relevant how powerful PC's were at the time, and how many calculations are needed to implement something better like AES.
AppearanceHeavy6724@reddit
I cannot figure out if you are trolling or simply not getting it. The vulnerability in MD5 is non-obvious, and I am not implying its designers were stupid; my point is exactly the opposite: smart people cannot foresee developments in cryptanalysis. RIPEMD of the same era, or MD2, are still good. Lack of public knowledge of weaknesses in AES doesn't mean much, as MD5 was considered secure for more than 10 years.
scswift@reddit
Well then you should have made that more clear, because from this:
It certainly sounds like you're calling them stupid. You generally don't use the phrase "by the likes of you" unless you're trying to INSULT someone.
AppearanceHeavy6724@reddit
Yes, you are correct I tried to insult you after you were disrespectful to me.
scswift@reddit
So then you admit that I was correct to assume you were suggesting the researchers are stupid.
Also, that wasn't a reply to me, dummy. You were replying to another guy when you said that.
AppearanceHeavy6724@reddit
I am tired talking to you. Bye.
cd1995Cargo@reddit
SHA-1 and MD5 were broken partly due to their small digest size, and those are both hash algorithms which are a different category than block ciphers.
There has been no analysis of Rijndael that suggests any lack of security. Just the opposite, it has been analyzed for decades and is known to be secure. If you’re interested, here’s a site that visually shows how it works: https://legacy.cryptool.org/en/cto/aes-animation
If you don’t even understand how the algorithm works (which it appears you don’t), you don’t have any standing to claim that it has been broken. Just screeching about how “it COULD BE broken bro, it like, totally COULD BE” and offering up nothing but that isn’t an argument. If you’re going to make a claim that the NSA has somehow cracked or backdoored a publicly designed algorithm that has been analyzed more than any other in history you’re gonna need to provide something to support your claim.
I can say there’s a statue of me in orbit around Neptune right now, and when you tell me that’s ridiculous I can just say “YEAH, but can you prove there isn’t? HUH? There could be!!”
You’re baselessly speculating on something you don’t understand with nothing to back up your claim and when someone who knows what they’re talking about tries to explain it to you you’re putting your fingers in your ears and saying “NUH UH! I don’t believe you”.
AppearanceHeavy6724@reddit
I understand how AES works; I've implemented it multiple times. It is you who is falling into the fallacy that if researchers have not found holes in AES, there are none, keeping in mind the asymmetry in budget between the NSA and the independent, competing researchers.
Neither SHA-1 nor MD5 was broken due to digest size; they were broken due to defects (then unknown) in their round mixing functions. RIPEMD-160, similarly sized and equally old, is free of these defects. Pardon me, but you sound like an overconfident dilettante.
Besides, I never said that AES has been broken; all I said is that it could have been, and you'd never know. I would rather stack, say, Twofish onto AES if I want unbreakable security, or run 3AES; I do not always need ultimate speed, especially dealing with LLMs. Only idiots believe in the "wisdom of using a well-proven algorithm", as there is massive incentive for billions of different adversaries to break the standard and keep it hush.
Historical-Camera972@reddit
I believe the usage of AES doesn't dissuade them from getting what they are after. No, I don't believe they can read the contents of those packets. I do, however, believe in hardware backdoors that make that point irrelevant. Odin's Eye is very real. If the NSA wants to sniff through your data after strange packet routing, they're probably waiting until you're asleep and using Odin's Eye, the way only big government can.
Ikinoki@reddit
As far as I know the algorithms are superb; the issue is WHO makes the RNG and WHAT provides the RNG, because if you control the RNG you can supply predictable keys, seeds and everything else.
There's no way around this.
Right now your RNG is supplied by an internal Intel algorithm; back in the day it was, most of the time, a weak open-source implementation. There are ways to create an invisible pattern in RNG output which weakens all crypto your system can make.
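A toy version of such an "invisible pattern" (purely illustrative; real kleptographic designs like Dual_EC_DRBG are subtler): an RNG whose output is a keyed hash stream looks statistically random to everyone except whoever holds the key.

```python
# Toy kleptographic RNG: statistically indistinguishable from random to an
# outside observer, fully predictable to whoever holds ESCROW_KEY.
import hashlib
import itertools

ESCROW_KEY = b"known-only-to-the-vendor"   # hypothetical escrow secret

_counter = itertools.count()

def backdoored_random_bytes(n: int) -> bytes:
    out = b""
    while len(out) < n:
        out += hashlib.sha256(ESCROW_KEY + str(next(_counter)).encode()).digest()
    return out[:n]

session_key = backdoored_random_bytes(16)
# Anyone holding ESCROW_KEY can replay the counter and regenerate every
# "random" key this machine ever produced.
```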
BlipOnNobodysRadar@reddit
They store all of it. When it becomes feasible to decrypt it, it will be there.
Also all of our hardware is backdoored anyways. It wouldn't surprise me if they have some obfuscated method of making the packet encryption a moot point by having every router, PC, and server compromised anyways.
-p-e-w-@reddit
Sorry, but that’s tinfoil hat nonsense. There are thousands of independent security researchers around the world, who in decades haven’t found anything remotely suggesting that this wild claim is true. Every standard hardware component has been dissected at every level, down to CPU microcode and firmware signature checks. Do you think those people are all idiots?
Your comment is the typical talk of someone who thinks computers are magic, intelligence agencies are wizards with access to alien technology, and that they are capable of hiding something like that in plain sight of hundreds of thousands of extremely smart people.
Here’s the reality: When the NSA did try to insert a backdoor in a PRNG 20 years ago (Dual_EC_DRBG), it was immediately caught by multiple independent researchers, long before it got anywhere near production systems. The world where the government employs super hackers that are much better than everyone else exists only in movies.
TheTerrasque@reddit
While I generally agree with you, you have Intel Management Engine and AMD PSP that can arguably be considered a backdoor.
a_beautiful_rhind@reddit
If you can edit your bios, you can generally disable it. There are other problems than a remote access backdoor though. "encrypted" ssds had manufacturer passwords. Who even knows what's in windows. Pushing microsoft accounts, sending data, and now they want to screenshot your desktop every 5s.
Even if it's not remote, your computer can generate evidence against you that you didn't plan to keep. Agency follows the crumbs from provider logs and billing data then gets physical access.
whinis@reddit
You literally cannot, at least on Intel, and I am uncertain about AMD. There are lots of articles and background on this, but earlier Intel processors had it as a separate, nullable module; since at least the 7th gen it's deeply embedded in the startup routine, and all attempts so far to disable it have been unsuccessful.
a_beautiful_rhind@reddit
Just set the hap bit. I think it works up to ME 14 even. On this thinkpad I have the stupid wson bios chip with glue on it and a backup normal one so I didn't put in the effort.
itsjustawindmill@reddit
That doesn’t completely disable the ME though; it’s still critical for booting the computer (this is a familiar playbook: make critical functionality pointlessly dependent on an arbitrarily intrusive component, then claim the component itself is critical functionality). HAP just puts the ME into an abnormal/non-functional state after the critical startup stuff is done. But that’s enough to leave you exposed to multiple known, major vulnerabilities requiring firmware updates from hardware vendors to address.
Given a choice between setting the HAP bit or not, obviously setting it is better. But it doesn’t make the problems go away completely
a_beautiful_rhind@reddit
If its a desktop you can at least replace the onboard nic. The OOB stuff is tied to it usually. Tons of computers that can run coreboot out there too.
FWIW, it does seem to be nonfunctional with the HAP bit. Intel gives you the tools to try to use it like a BMC and I played with it.
What's more concerning is my lenovo bios had some kind of remote support in it and many mini PCs I encountered can do recovery from the internet as well. All those vendors with their own implementation vs IME. Don't see it discussed much by anyone.
-p-e-w-@reddit
Nonsense. Just because you can’t disable a component of a system doesn’t make it a “backdoor”. Do you even know what that word means?
There is zero evidence that the IME allows for unauthorized remote access. And people have looked into this question very, very carefully. Claiming otherwise is conspiracy theory territory.
BlipOnNobodysRadar@reddit
There was a sophisticated hardware backdoor on apple silicon, both phones and macs, running for years undetected. When Kaspersky Labs identified it and reverse engineered it enough to publish, no western media company made a peep. iPhones were sending data, obfuscated and undetected for years, and no mainstream media source thought this was newsworthy apparently.
https://securelist.com/operation-triangulation-the-last-hardware-mystery/111669/
A year after Kaspersky Labs made this reveal they were banned in the US.
-p-e-w-@reddit
That’s not a “backdoor”. You clearly don’t understand what that word means.
BlipOnNobodysRadar@reddit
The exploit relied on a hardware design "flaw" that most likely was left in as an obfuscated backdoor to exploit with plausible deniability.
AppearanceHeavy6724@reddit
No one knows whether AES is compromised by the NSA or not. They are far ahead of all the other educational and research institutions in cryptanalysis. The NSA are literally superhackers hired by the US government. They are absolutely the top-notch, best cryptographers on this planet. All you've said is a naive "debunker/skeptic" mindset.
-p-e-w-@reddit
AES wasn’t even designed in the US. And if it had significant flaws, they would absolutely have been found by independent researchers in 25 years, which is what happened with DES and other algorithms.
No they aren’t. They are what’s left over after FAANG and hedge funds have taken all the best people, because those institutions can pay 5-10 times more than the NSA, plus you won’t have to spend the rest of your life with someone watching you. The idea that the NSA has anywhere near the best hackers is ridiculous. They can’t offer them even a fraction of what they get elsewhere.
AppearanceHeavy6724@reddit
The statement "if there were flaws they would have long been discovered" does not hold water: in far lower barrier-to-entry areas such as buffer overflow discovery, people still find ancient 15-year-old bugs. Cryptanalysis is such a narrow, specialized area, and the NSA spends so much more money on that research, that there is a very small probability someone outside the NSA will discover flaws.
FAANG DGAF about cryptography, buddy. Cryptography is an ultranarrow specialization, and if you are into it you will want to work at the NSA, if you do not have ethical reasons not to.
-p-e-w-@reddit
Google maintains multiple cryptography libraries, including BoringSSL and the Go cryptography standard library, and has renowned cryptographers among their staff. Microsoft has done cutting edge cryptography research, such as on Fully Homomorphic Encryption and post-quantum cryptography. Apple is a pioneer in applying privacy-preserving cryptography techniques to user data, and has by far the most advanced hardware-assisted security in the business, which has famously resisted multiple efforts by three-letter agencies to crack it.
You clearly have no idea what you are talking about.
AppearanceHeavy6724@reddit
The number of cryptography positions in FAANG is laughably small, and those positions are either extremely academic useless crap like homomorphic encryption or equally boring shit like maintaining BoringSSL. The true meat is in the NSA or universities.
doodlinghearsay@reddit
Inserting a backdoor into an open protocol is far more difficult than inserting it into a piece of software that only goes through black-box testing. I don't think it's crazy to assume that a lot of networking/firewall vendors have been pressured into putting backdoors in for US intelligence. Actually, any of the thousands of security vulnerabilities found every year could have been put there deliberately. It's very hard to distinguish incompetence from malice and it's even more difficult to prove it.
But the whole discussion is moot. I doubt these organizations are looking for a magic bullet. They would much rather use something simple, like compromise the endpoint itself. Specifically, with OpenAI they will just have someone on the inside that transfers all the data, while the internal security team pretends not to notice.
-p-e-w-@reddit
It’s not “crazy”, it’s simply a conspiracy theory. Assuming that the US government orchestrated 9/11 isn’t automatically crazy either, there just isn’t any hard evidence for it, so Occam’s razor applies. And considering that many if not most routers are made in China, Occam’s razor says that they weren’t, in fact, pressured by the US government.
Also, there are thousands of people who take these things apart and look very deep into what they contain. It’s incredibly difficult to hide anything in such systems.
doodlinghearsay@reddit
As I said, serious vulnerabilities are found all the time, including in products that have been in use for some time.
I don't care for the argument that we should assume these are honest mistakes until proven otherwise. Some of them are, others aren't. It's not jury duty, where you only have two options, guilty or not guilty. "Probably guilty, but I can't prove it" is a perfectly reasonable verdict.
-p-e-w-@reddit
There’s a huge difference between “products have vulnerabilities (some of which may have been deliberately inserted)” and the above claim of “all our hardware is backdoored”. The latter is Hollywood-level nonsense, roughly as reasonable as the movie trope that shooting a monitor will disable the computer.
doodlinghearsay@reddit
There's no functional difference between a software vulnerability and "backdoored hardware". If you're buying a firewall you're using the whole package. It makes no difference whether the backdoor is encoded in the placement of the logic gates, ASIC microcode, or the software implementation of the SSL inspection module. Either way, the confidentiality of any communication that goes through the appliance is potentially compromised.
Of course it's impossible to say that all devices are compromised. But from a user's point of view, unless you can prove that the particular set of devices involved in a secure communication is not compromised, you need to treat the channel as unsafe. At least vis-à-vis US intelligence. Of course you should still follow good security practices to protect yourself from less capable attackers.
There are some subtleties when we're talking about devices running fully open source software. But I'm not sure this is relevant in 99.9% of communication. Almost all secure conversations rely on some proprietary software at some point in the chain in a way that would make them insecure, if the software happens to be incorrect (by mistake or by design).
townofsalemfangay@reddit
Daily reminder that Cloudflare is just an NSA reverse proxy and encryption is basically cosplay.
Red_Redditor_Reddit@reddit
Maybe, but whoever has that power can only use it a few times.
Now consumer shit, yeah I agree 100%. It's out of control. Everything is spying so bad that having something that doesn't send telemetry is almost worse because then you stand out.
photonenwerk-com@reddit
Every officially signed certificate has a "backdoor". Only self-signed certs are safe.
BumbleSlob@reddit
ITT: “I don’t know what asymmetric encryption is and I don’t care to know, but lemme tell you all about the NSA”
Efficient_Ad_4162@reddit
https://en.wikipedia.org/wiki/PRISM
Dumbass.
BumbleSlob@reddit
Would you mind pointing me to the part of that page which relates to packet capture?
Oh right, it doesn’t exist, and you continue to have no idea what you are talking about.
Protip: if you don’t know how an SSL/TLS handshake works, maybe you shouldn’t be opining about network security
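For the record, the whole point of the handshake is that asymmetric key exchange establishes a symmetric session key without ever putting it on the wire. A toy sketch with the third-party cryptography package (real TLS layers certificates and authentication on top of this):

```python
# Toy Diffie-Hellman-style handshake plus symmetric session, mirroring what
# TLS does. Omits certificates and authentication entirely; illustration only.
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Asymmetric phase: both sides derive the same secret from public values.
client_priv, server_priv = X25519PrivateKey.generate(), X25519PrivateKey.generate()
secret = client_priv.exchange(server_priv.public_key())
assert secret == server_priv.exchange(client_priv.public_key())

# Symmetric phase: bulk traffic is encrypted under a key derived from that secret.
key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"toy-tls").derive(secret)
nonce = os.urandom(12)
wire = AESGCM(key).encrypt(nonce, b"GET /article HTTP/1.1", None)
# A passive tap (PRISM-style capture) sees public keys, the nonce, and `wire`;
# none of those yields the session key.
```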
Efficient_Ad_4162@reddit
Yeah sure.
https://www.zdnet.com/article/prism-heres-how-the-nsa-wiretapped-the-internet/
Here's an example. I wouldn't read the article though, I'd go looking for the leaked slides.
Pro-tip: If you don't know anything about a two decade old NSA program you shouldn't have opinions on a two decade old NSA program.
BumbleSlob@reddit
Are you stupid or do you just not understand what packet capture is lmao.
Congrats on being non-technical, kiddo 🎈
Efficient_Ad_4162@reddit
How old were you when the existence of PRISM got leaked? Were you playing tag at the time or just high? Because you clearly don't know the first thing about it about and every post you make just makes that more apparent.
BumbleSlob@reddit
Can you even explain what the difference is between symmetric and asymmetric encryption and how that relates to SSL/TLS and packet capture?
Of course you can’t, because you are a child lmao
The_IT_Dude_@reddit
I do not think that program works that way. PRISM required that companies comply. Not even the NSA has a way of breaking RSA en masse. So they could be hacking them and collecting the data that way, but my guess is that companies like OpenAI will simply need to comply with secret orders. Exposing those orders will itself be illegal, other laws and privacy be damned. They won't even be able to speak out against it.
If this is where people are going to interface with the internet, ask questions, and consult, they're going to figure out some way to get at it. It won't be breaking RSA though.
cyberdork@reddit
Oh they want to risk a multi billion dollar fine by the EU?
The_IT_Dude_@reddit
They'll be protected. Do you figure the NSA respects those laws? I'm sure they do not.
cyberdork@reddit
That's a completely different thing. The NSA funnels all internet traffic into their Utah datacenter anyway; they don't give a shit about OpenAI or the EU.
But this was about the company lying about deleting logs and intentionally breaking EU law while doing business in the EU.
SkyFeistyLlama8@reddit
Be very afraid. Anything you say to ChatGPT can be used against you in a kangaroo court in the near future.
We're getting to a point where the NSA could be used to target domestic political opponents and the executive branch simply ignores the judiciary, even a stacked right-leaning SCOTUS.
BusRevolutionary9893@reddit
We're far past that point.
a_beautiful_rhind@reddit
IRS was already used to target political opponents in multiple administrations, as were other agencies. That cat is long out of the bag.
llmentry@reddit
Given OpenAI's current legal fight on this, obviously you figured wrong. (Assuming you're referring to paid customers, that is - the only thing OpenAI has ever been open about is that if you don't pay, you're always the product.)
Yes, one should always be sceptical, and careful with the data you share. But their policies are there for a reason. And are very clear that they will not store prompts or data if you pay (unless you opt in) - which is emphasized by the fact that they're fighting this.
(Also, this requirement to store all outputs from everyone just to see if their models can reproduce NY Times articles ... is simply nuts.)
Familiar_Gas_1487@reddit
Every_Prior7165@reddit
well clearly people know which is true, double the upvotes in half the time
koeless-dev@reddit
The people of Sargus 4 agree with you!
nigl_@reddit
They really tried with the framing there.
ksera23@reddit
Are people here illiterate, astroturfing bots or just refusing to read the article? That's literally the title of the article, this post is the one doing the framing.
StyMaar@reddit
fact: a court order mandates OpenAI to keep all logs, OpenAI is trying to fight against it.
framing: OpenAI are the good guys defending privacy
llmentry@reddit
You think they're just worried about the storage costs?
I agree that framing OpenAI as any form of "good guys" is inappropriate. But, hey, at least they're trying to protect their right to not store user prompts and outputs, which is ... well, actually, better than I'd expected of them.
I'm not sure many people have realised this, but they aren't even breaking their own privacy policy by storing user prompts and outputs when required under a court order. Their privacy policy has always been clear that they will retain data when required to meet legal obligations. They're going above and beyond to fight against this.
Also ... I'm quite pleased to learn that they weren't storing everything, regardless of their privacy policy.
The problem is: I was so tempted to start asking GPT 4.1 to reproduce NY Times articles for me, just to see if it would or not. And I'm sure there have been many who just tried it out anyway. So OpenAI is going to have so many prompts now, all looking as though users were desperately trying to read the NY times via ChatGPT ...
Loud_Ad3666@reddit
If they were worried about it they wouldn't have designed it the way they did.
It's a bullshit copout from OpenAI. They wanted to secretly keep it for their own vile purposes, and now the courts have forced them to do so publicly rather than secretly.
StyMaar@reddit
They are worried about the bad press implications, that's it.
They were most likely already storing everything they could, and would still continue to do so if the order was reversed, but they don't want people to think too much about that.
llmentry@reddit
Well, if that was their plan, they've achieved the exact opposite.
This has clearly made a lot of people very aware of just how unprotected the data they send to OpenAI is.
Anyway, who knows? But quiet compliance would have been a far easier route here, you would think, with the valid defence that they were legally obliged to do this if it ever came to light.
half_a_pony@reddit
"slams" in news headlines usually means "someone whined about something on twitter"
SamSausages@reddit
lol, as if they weren't already saving everything... In their world data is essential for training, and they need more data. 1000000% they were already doing it, and this just gives them a scapegoat.
Diabetous@reddit
We're not saving what you wrote*!
but...
(in fine print) We're saving tokenized/hashed metadata that technically has a loss rate and isn't the same.
SamSausages@reddit
Pretty much. That's what all of the end-user agreements say (at least the ones I have read). They just say that they will "anonymize" the data.
But fingerprinting exists.
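A toy example of why hashing isn't anonymizing: a stable hash of an identifier is a pseudonym, and low-entropy identifiers (emails, phone numbers) fall to a simple dictionary attack.

```python
# "Anonymized" via hashing, re-identified via guessing. Stdlib only.
import hashlib

def pseudonym(email: str) -> str:
    return hashlib.sha256(email.encode()).hexdigest()

logged = pseudonym("alice@example.com")   # what the "anonymized" log keeps

guesses = ["bob@example.com", "alice@example.com", "carol@example.com"]
match = next(e for e in guesses if pseudonym(e) == logged)
print(match)                              # alice@example.com - re-identified
```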
lostnuclues@reddit
They can still remove or minimize the logging.
True-Surprise1222@reddit
They have the ex head of the nsa on their board. If you thought your data was private you were delusional.
Super_Sierra@reddit
The NSA collected so much data that even with ten thousand employees, it would have taken 30 years to go through it all.
And the Bush administration lied and lied about what they were doing, with actual secret courts to get warrantless searches of your data.
True-Surprise1222@reddit
thank god they have AI now so they can parse through it all while they await encryption to be broken for their next fun trick.
pitchblackfriday@reddit
Finally the real OpenAI.
All your data are belong to us.
Far-Heron-319@reddit
How does this work if you're using something like OpenRouter?
Nekasus@reddit
OpenRouter may need to include details of who is sending the API call to OpenAI.
Far-Heron-319@reddit
Interesting. I haven't had to do that (yet)
blurredphotos@reddit
Openrouter already implemented the picture ID for certain models.
AkmalAlif@reddit
my mindset when using any type of online service is: THEY ALWAYS TRACK AND SAVE YOUR DATA, no amount of bullshit lies or TOS will tell me otherwise. That's the tradeoff between going local and convenience. I mean shit, OpenAI has a government contract for project Stargate 🤣 you know damn well the CIA, FBI or whatever the fuck are already using their users' behavioural data... hmmmm data, data, data, give me all your data 💦💦💦💦 : sam altman probably
llmentry@reddit
Soooo ... they're fighting this ... why, exactly? Sounds like storing data is all part of their cunning plan ...
CheatCodesOfLife@reddit
Good publicity. What did you think the reason was?
llmentry@reddit
The publicity has been almost entirely negative, so if that was their plan, it backfired badly. They didn't need to Streisand-effect this thing, they could have simply, quietly, and legally complied.
I can only think that they realised the value to their current customers in going above and beyond. I'm not entirely discounting the possibility that it could still be a sham, and they've been keeping everyone's data all along - this is OpenAI after all. But, it's a promising sign.
TentacledKangaroo@reddit
It's for the illusion of objecting to having to keep data. Even if the publicity right now is net negative, they'll be able to point to it in the future and go "see?! We objected to it! (Pay no mind to the fact that we were already saving all that data anyway and only deleted what we did to obstruct court proceedings.)"
vikarti_anatra@reddit
What next?
Police and divorce lawyers start asking OpenAI for logs of everything sent by specific users?
TentacledKangaroo@reddit
Prospective employers asking for your OpenAI history.
Shockbum@reddit
"You cheated on your wife with a role-play LLM! That's why she had an affair with the neighbor." - Federal Judge
davew111@reddit
This sounds like a big problem for anyone using GitHub Copilot. When you ask a question it will upload the source code you are working on as part of the context. This source code can contain database connection strings, API keys etc.
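One partial mitigation is to scrub obvious secrets from the context before it ever leaves the machine. A sketch with illustrative (far from exhaustive) patterns:

```python
# Redact common secret shapes before sending code as LLM context.
# Patterns are illustrative only; real scanners use much larger rule sets.
import re

SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*\S+"),
    re.compile(r"(?i)password=[^;\"'\s]+"),   # e.g. inside connection strings
]

def scrub(source: str) -> str:
    for pattern in SECRET_PATTERNS:
        source = pattern.sub("[REDACTED]", source)
    return source

print(scrub('conn = "Server=db;User=sa;Password=hunter2;"'))
# conn = "Server=db;User=sa;[REDACTED];"
```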
engnadeau@reddit
Data privacy and sovereignty have never been more important
Ok-Cucumber-7217@reddit
This is fucked up on every level
This-Complex-669@reddit
🤣 what a way to say you are doing illegal stuff with chatbots 🤣
pitchblackfriday@reddit
Define illegal.
https://www.newscientist.com/article/2479045-us-government-is-using-ai-for-unprecedented-social-media-surveillance/
This-Complex-669@reddit
Go ahead. I have nothing to hide
my_name_isnt_clever@reddit
Oh, so you're not a woman or queer or black or trans then? All of these groups do nothing wrong but this admin calls them terrorists so they can justify spying on them. If that's not a problem for you, you're privileged.
Eisenstein@reddit
Do you dance in public? Would you wear a clown nose on the street? Are you brave enough to ask people questions you want to know the answer to even if it would be really embarrassing if the answer wasn't good? If you kept a diary of your thoughts would you let anyone read it? Will you post your browser history on pastebin and link it here without editing it?
This-Complex-669@reddit
How ? I will do it immediately if you give me a guide
threeLetterMeyhem@reddit
Pretty much any LLM should be able to give you working instructions for that lol
SuperS06@reddit
And also tell you why you shouldn't!
satireplusplus@reddit
First they came for the robots
And I did not speak out
Because I was not a robot...
BCBenji1@reddit
Cool give me your debit card details please and while you're at it a nude photo. Thanks
moonnlitmuse@reddit
You are delusional if you think a basic concern for privacy implies any sort of malicious use in any way.
This will be what ultimately leads to 1984-levels of authoritarianism in 20-30 years. Not the politicians, but fellow citizens falling for this train of thought. “If you have nothing to hide you wouldn’t say that. Therefore you must be doing something wrong.”
Truly sad to see.
This-Complex-669@reddit
Then why am I pasting my Fortune 500 company’s private and confidential information in AI Studio? 😹
Basic privacy is not even a thing. Who cares my man
moonnlitmuse@reddit
Because you don’t care about privacy unlike a good chunk of people.
What a stupid question.
This-Complex-669@reddit
🤣 You will lose the battle against Google. Google will reach 300 by end of the year and make privacy extinct in the name of AGI. Don’t let some stupid paranoia get in the way of the universe’s most important event.
moonnlitmuse@reddit
I’m sorry you’ve got such an absurd, unrealistic view on the future.
RickyRickC137@reddit
I don't understand. What the hell can you do illegally using Chatgpt?
JorgitoEstrella@reddit
Idk, but imagine something like Tiananmen Square in DeepSeek, just that in the future it becomes illegal to even ask.
Worldly_Science1670@reddit
creepy roleplaying and gory violent stuff with underage 'characters'
stoppableDissolution@reddit
Even if that was the case (it's not) - so what?
TechnoByte_@reddit
Just because 90% of your roleplays are like that, doesn't mean ours are.
cultish_alibi@reddit
Thought crimes
OkProMoe@reddit
Yes, let’s normalise speech being illegal /s
Loud_Ad3666@reddit
I feel bad for all those folks who used chatgpt as a stand in for therapy services that they couldn't afford.
Now their personal issues are for sale to the highest bidder. Just like the DNA data of folks who tried the genealogical services fad.
Never. Trust. Corporations. With. Sensitive. Data.
Competitive-Yam-1384@reddit
This is the fault of the judge and the various news agencies suing OpenAI for copyright infringement, the NYT being the organization that suggested OpenAI must be deleting evidence. I'm personally more pissed at the news agencies.
joninco@reddit
OpenAI like -- 'yo, we already do that.. to train more chatgpt'
IUpvoteGME@reddit
Fill the API with garbage. I'm doing my part. 100 million tokens of lorem ipsum per day, every day.
madaradess007@reddit
i've thought about it:
the real value is being generated locally, while normies chat with chatgpt about their grocery lists and workout plans
so openai is gathering average normie prompts and normie outputs - that is why openai models get dumber and dumber :)
go local boys!
blurredphotos@reddit
(bingo)
Murph-Dog@reddit
Ouch on storage costs. If they have to save output, that will hurt even more.
blurredphotos@reddit
Sam will pass every cost down the line. Guaranteed.
Eisenstein@reddit
I don't think it is a problem at all. A 60-drive 4U storage server can fit 1.2PB with 20TB drives. The entirety of Don Quixote runs over 1,000 pages and only takes up 2.2MB, which means at 10 servers to a rack that is roughly 5.45 billion copies of Don Quixote per storage rack. At 1,023 pages each, a storage rack can fit around 5.6 trillion pages of text.
That's a lot of chats.
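(The arithmetic, for anyone who wants to check the assumptions; decimal units, 2.2MB per plain-text copy:)

```python
# Back-of-envelope rack capacity, decimal units.
drive_tb, drives_per_server, servers_per_rack = 20, 60, 10
rack_bytes = drive_tb * 1e12 * drives_per_server * servers_per_rack   # 12 PB

quixote_bytes, quixote_pages = 2.2e6, 1023
copies = rack_bytes / quixote_bytes
print(f"{copies:.2e} copies")                  # ~5.45e+09 per rack
print(f"{copies * quixote_pages:.2e} pages")   # ~5.58e+12 pages of text
```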
FastDecode1@reddit
And that's not even considering the use of a compressed filesystem.
Text compresses famously well, so that'll at least double the storage capacity, if not quadruple it.
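Easy to sanity-check with the stdlib, assuming you have any large plain-text file handy; roughly 3x is typical for English prose with a general-purpose codec, and dedicated text compressors do better:

```python
# Measure how well plain English text compresses with a generic codec.
import zlib

text = open("don_quixote.txt", "rb").read()       # any large plain-text file
packed = zlib.compress(text, level=9)
print(f"ratio: {len(text) / len(packed):.1f}x")   # typically ~3x for prose
```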
ForsookComparison@reddit
Oh wow. So even API usage probably won't ever create a physical storage issue. I'm always amazed at how small text is.
Yorn2@reddit
It all depends on how it is stored. Sadly, there are still some government API frameworks that generate text files for every request instead of using databases or blobs.
GlowingPulsar@reddit
Duckduckgo's duck.ai claims to provide models anonymously, but they offer o3-mini and GPT-4o mini. Their privacy policy says they "have agreements in place with all model providers that further limit how they can use data from these anonymous requests, including not using Prompts and Outputs to develop or improve their models, as well as deleting all information received once it is no longer necessary to provide Outputs (at most within 30 days, with limited exceptions for safety and legal compliance)." I wonder how this will affect them, and if they'll stop offering OpenAI models given this change.
BangkokPadang@reddit
At some point people will have to just accept semi-anonymized data at best.
Ie if you’re using a proxy service that doesn’t log, then it passes your prompt on to openAI and they log it as being from OpenRouter, then it’s logged, but not connected to you unless you include personally identifying information in the prompt, so… don’t do that.
Also, if you have the resources, download weights now for DeepSeek-R1, and keep downloading the best OSS SOTA models, because one day in a decade or so, inexpensive hardware will likely be able to run them no problem, and you'll always be able to feed them current data with search and RAG. So even if OpenAI somehow gets local AI banned (worst case), you'll already have the weights and the know-how to run them.
blurredphotos@reddit
Underrated comment.
Not to mention that the guardrails are improved every release. Nothing will ever be as "free" (or as malleable) as what we have now.
rorykoehler@reddit
There is no such thing. The AI can easily figure out who you are, or at least who you probably are, based on your prompt patterns.
BangkokPadang@reddit
In this scenario you've imagined, given the stateless models we have and use today, how is it gathering and cross-referencing these prompt patterns?
rorykoehler@reddit
You're posting here right?
ForsookComparison@reddit
Everything that leaves prem with the intention of being decrypted somewhere that's a black box to you is, at best, a pinky promise.
sgent@reddit
This won't affect them at all as it seems their TOS covers a court order (which this is).
GlowingPulsar@reddit
If they continue to serve OpenAI models while user chat logs are indefinitely barred from deletion, that's against the spirit of what duck.ai claims to be providing. All I was curious about is whether they'll stop providing access to OpenAI models.
llmentry@reddit
So many people seem to have missed this point ...
SeriousBuiznuss@reddit
Duck.ai might use GPT-4o mini from Azure as opposed to from OpenAI?
CouscousKazoo@reddit
In the meantime, should we consider those models to now be indefinitely logged? This is all breaking, so it’s understandable DDG has yet to push an update.
blurredphotos@reddit
Just lost 1/2 of your userbase.
spawncampinitiated@reddit
GDPR doesn't sound so bad now eh?
acec@reddit
If OpenAI preserves the information against the user's will, it is not GDPR compliant, so it should not be able to offer services in Europe.
spawncampinitiated@reddit
GDPR comes first. Until you are GDPR compliant you cannot operate in the EU. That's why Anthropic/Meta took longer to launch here.
latestagecapitalist@reddit
They have deeper problems in the EU because of the personal data in the pre-training set and in the model, etc.
But GDPR is not an issue for prompt retention so long as they 1. say they are keeping it, and 2. give a mechanism for users to see what they retain... They should allow deletion, but GDPR allows data to be retained for law enforcement, fraud prevention, etc.
MerePotato@reddit
Unless you're gooning to the thing or trying to build a bomb, I really don't see that this is a big issue. I don't much care that the US government can see my algebra, and given that I'm running on Windows anyway, being paranoid about them is a bit silly. The same holds true for DeepSeek.
redditscraperbot2@reddit
People should be allowed to goon without fear of their goon logs being forever on a database with their personal details.
Sudden-Lingonberry-8@reddit
do you goon announcing your name? MY NAME IS MIKE AND I AM GONNA COOM, is that how you communicate with your LLM?
redditscraperbot2@reddit
What do you think the requests look like from OpenAI's end?
Sudden-Lingonberry-8@reddit
probably lots of XML and HTTP logs?
redditscraperbot2@reddit
If you're using their API directly, that key is sent as well, and the key is attached to an account (unless you're using a layer of obfuscation like OpenRouter).
Idk about you, but I don't really think it's a good precedent to have that information stored in perpetuity.
Sudden-Lingonberry-8@reddit
well of course the API key is sent, they are the ones that provide the API key in the first place... now will they store the data?
yes
should you send information you consider private to an LLM?
no
Will other companies collect data from you and leak it to OpenAI?
yes.
Can you do something about it?
no. Well, you can indeed send noise: include fake data, fake numbers, and fake stories along with the truth. If you send 99% fake stuff and 1% truth, they still have nothing.
So send fake addresses, fake names, fake invoices; if they store all the data, they've got a bunch of nothing. It's the only way to "fight back".
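A toy sketch of that noise idea (every name, field, and value here is invented):

```python
import random

# Toy "chaff" generator: bury one real record among N fakes so any
# retained logs are mostly noise. Purely illustrative.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin"]
STREETS = ["Oak St", "Pine Ave", "Maple Dr"]

def fake_record():
    return {
        "name": random.choice(FIRST_NAMES),
        "address": f"{random.randint(1, 999)} {random.choice(STREETS)}",
        "invoice": f"INV-{random.randint(10000, 99999)}",
    }

def with_chaff(real_record, noise=99):
    records = [fake_record() for _ in range(noise)] + [real_record]
    random.shuffle(records)  # the real record is not positionally obvious
    return records
```

Whether this actually defeats profiling is another question; it degrades the data rather than protecting it.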
redditscraperbot2@reddit
I don't really know what we're disagreeing on. I just think gooners should be able to goon without the threat of their data being exposed.
Sudden-Lingonberry-8@reddit
that or gooners shouldn't send goon data to third parties
redditscraperbot2@reddit
I actually agree. I'm just saying that in an ideal world, that wouldn't be the case.
CheatCodesOfLife@reddit
A user's chats will have a common UUID or API key associated with them. Even if Mike didn't announce his name like that (though no kink shaming here), he might have uploaded his tax docs or other PII in another chat. These separate chats will likely have something like an API key or UUID tying them together 🤭
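A toy illustration of how trivially those chats link up on the provider's side (the log records and field names are invented):

```python
from collections import defaultdict

# Separate "anonymous" chats collapse into one profile as soon as
# they share an identifier like an API key or account UUID.
logs = [
    {"api_key": "sk-123", "chat": "roleplay session #42"},
    {"api_key": "sk-123", "chat": "summarize my 2023 tax return"},
    {"api_key": "sk-999", "chat": "grocery list"},
]

profiles = defaultdict(list)
for record in logs:
    profiles[record["api_key"]].append(record["chat"])

print(profiles["sk-123"])  # the tax docs now sit next to everything else
```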
Sudden-Lingonberry-8@reddit
well, Mike should upload fake tax docs
Cultured_Alien@reddit
You're paying, and you'd want your logs to be public? Even if you aren't using it for illegal stuff, it's going to affect you in the background, detailing your life without you ever knowing.
Ulterior-Motive_@reddit
Local models don't have this issue.
Ardalok@reddit
The right to private correspondence?
Pfft, the interlocutor is a robot, so this is just a smart notebook, not correspondence, and the government can do whatever it wants.
And if you aren't a citizen, they can do way more!
pier4r@reddit
Are personal notebooks not protected by law? Really asking here.
Like, I bring my notebook to the park, I leave it beside me for a minute, someone comes along and says "hey! you know I can read that? It is not protected! Let me have a glance at it!" Can this happen?
threeLetterMeyhem@reddit
It's worth doing some research into how the 4th amendment has been applied to data stored "in the cloud" (on other people's / company's servers).
The short-ish version is that, in the US, your data in the cloud may or may not be protected by the 4th amendment depending on specifics around what kind of data it is, the age of data, who is searching that data (and why), and what the current court case rulings are.
If privacy is of critical importance to you, run local LLMs.
psychicsword@reddit
If your personal notebook works by mailing letters to a 3rd party bank's safety deposit boxes and asking them to open the envelopes and store your letters for you then it isn't your personal notebook.
BoJackHorseMan53@reddit
Laws must be updated. The robot must be treated as a human under the law.
satireplusplus@reddit
No, humans should be treated as humans and given appropriate privacy online.
BoJackHorseMan53@reddit
Laws protect correspondence between two humans but not correspondence between a human and a machine.
ForsookComparison@reddit
This is reaching.
Tool-use needs to be considered sensitive data just like private correspondence.
this_is_a_long_nickn@reddit
More likely the human will be treated as a robot
calamitymic@reddit
I always treat everything online (or better yet, anything connected to the internet) as a soft delete.
Jotschi@reddit
Does this also apply to ChatGPT that is hosted by Azure in Europe?
AriyaSavaka@reddit
It's just standard surveillance-state stuff.
TheArisenRoyals@reddit
Welp, never using that shit again, already didn't for the most part, but damn.
xXG0DLessXx@reddit
The only thing I’m using it for is the 4 free images. Like clockwork when the timer resets for the day it’s time for new images!
smallfried@reddit
It's fine for stuff you don't mind being in the open.
Same goes for what you comment on Reddit by the way.
FullOf_Bad_Ideas@reddit
That would signal readiness to introduce this and to adjust to the new requirement of preserving all outputs. They are disputing it, so that wouldn't be a good look.
It's not exactly their fault here, though users are indeed at risk and maybe would have been better off with more secure inference.
skyblue_Mr@reddit
We need a better local LLM, one that outperforms o3.
s_arme@reddit
It starts with the fact that OAI added search functionality to expand and compete with Perplexity, but they've now ended up jeopardizing all users, including ones who don't use the search functionality.
klam997@reddit
I don't really go local due to hardware issues and the need for really good SOTA models.
But I'd rather give my data to our brothers DeepSeek and Qwen at this point. At least I know they will make us better models based on our data.
stoppableDissolution@reddit
Also, Chinese LEAs and other entities don't care about you at all.
SpareIntroduction721@reddit
10 years without regulation my friends.
rorykoehler@reddit
This sounds like regulation to me
PandaParaBellum@reddit
Weekend project:
Make ChatGPT generate a thousand pieces of slash fiction about the judge who gave that order.
No worries, I'll press Delete on all of them.
marrow_monkey@reddit
Maybe they should consider moving their operations to EU?
yopla@reddit
Can't wait for them to be slammed by the EU for not respecting the GDPR. 😆
pitchblackfriday@reddit
USA: "I can do whatever the fuck I want."
shatters GDPR and throws it out of a window
latestagecapitalist@reddit
RIP anyone who was just testing safety.
I've posted here previously that they'd already said in their terms that anything that triggered safety was an automatic 7-year retention.
mecatman@reddit
Please use my data, which I use to code my chatbot, and maybe we'll get a better distilled model in the future.
Sudden-Lingonberry-8@reddit
I always knew; I do not mind. Please use that data so you can get better models and China can distill them. However, do realize that OpenAI is far behind in the race.
SkyFeistyLlama8@reddit
How about Azure OpenAI models? Microsoft requires warrants before releasing any logged data, but I wonder how much data is logged in Office Copilot, corporate Copilot subscriptions, and Azure OpenAI API endpoints.
If Microsoft receives a National Security Letter to provide a dump of all Azure OpenAI usage for a certain tenant, they would have to provide that to the government without notifying the customer.
FateOfMuffins@reddit
I know it's cool to hate OpenAI, but the comments in this thread blaming them are just...
Reddit used to be the place where you read the headline, skipped the article, and jumped straight into posting your opinion in the comments, but you all can't be bothered to read even the headline anymore, huh?
mister2d@reddit
Some people just aren't understanding that their personal data is going to be fully exposed (not anonymized) in legal discovery for all parties to see.
ForsookComparison@reddit
I doubt it'll happen to the masses, but it's insane that there's technically no legal stopper yet for the courts demanding this.
I'm sure some lawyer type can step in and figure something out, but these logs aren't peer-to-peer chats protected by any correspondence laws, and technically ALL of them could be using data from NY Times articles. If a judge were stupid enough, or wanted a career moment, they could probably start pushing for exactly that kind of event.
mister2d@reddit
You don't think what will happen to the masses?
ForsookComparison@reddit
I don't think that all of our chats will end up exposed as a part of discovery. It can, but I doubt that it will.
mister2d@reddit
You have to have the expectation that your data is compromised at this point. This will effectively be a data breach.
The court order as it stands freezes any modifications, so whatever full-take searches get performed will incidentally expose users' data.
TheRealMasonMac@reddit
Yeah, I support a certain level of responsibility on the part of these creators for taking unlicensed intellectual property, but this is an outrageous court order. For a bunch of news companies who probably made up a minority of the training data? Keep in mind that OpenAI probably has contracts in sectors like healthcare, meaning YOU are affected regardless of YOUR choices if you happen to be in the line of fire. Absolutely atrocious.
Basileolus@reddit
expected!
stuffitystuff@reddit
Good thing the only risible thing in there is my extreme laziness as a developer
ForsookComparison@reddit
Anyone self-host from home and expose it to the open web?
I'm kind of thinking it's time. I can serve Qwen 30B quickly enough and wrap SmolAgents web search around it or something, I figure.
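Something in that direction, as a hedged sketch: it assumes a local OpenAI-compatible server (llama.cpp's llama-server, vLLM, etc.), and the class names follow smolagents' documented quickstart, so verify against your installed version:

```python
# Local Qwen 30B behind an OpenAI-compatible endpoint, wrapped with
# a smolagents web-search agent. Endpoint and model id are assumptions.
from smolagents import CodeAgent, DuckDuckGoSearchTool, OpenAIServerModel

model = OpenAIServerModel(
    model_id="qwen-30b",                  # whatever name your server registers
    api_base="http://localhost:8080/v1",  # local endpoint; inference stays on prem
    api_key="not-needed-locally",
)

agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
print(agent.run("Summarize the OpenAI log-retention court order."))
```

The web-search tool calls still leave your network, of course; only the model inference stays local. And exposing this to the open web means putting auth and rate limiting in front of it.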
Littlehouse75@reddit
These things are complicated. Doesn't sound like anyone is attempting to "own your data". Sounds like the court is trying to make sure OpenAI doesn't destroy any potential evidence. This is fairly standard, as awful as it is for ChatGPT's users.
mister2d@reddit
At the same time exposing unredacted "private" data to those that aren't fit to handle said data responsibly.
Ssjultrainstnict@reddit
The only way to guarantee privacy is using local AI. This just gives me more motivation to make my app even better, so it can be a local ChatGPT replacement with complete privacy.
o5mfiHTNsH748KVq@reddit
Twice in one day I'm surprised that people don't understand that the data they put into websites becomes the website's.
Like, why wouldn't this be the case? If I owned a SaaS, I'd preserve all logs just to cover my own ass when people act a fool.
I mean, follow GDPR and allow people to delete themselves, but while they're users, do what you want with that data as long as it's legal. I put my data in, you give me a product that helps me; that's the deal. If I want to be anonymous, I'll run my own LLM locally for that task.
Igoory@reddit
I never used it (or any other cloud LLM) assuming privacy, so for me this doesn't change anything. Anyone who did should have been using local models instead.
1eyedsnak3@reddit
Same boat. 0 cloud and all local. Best thing I ever did. Once you understand the freedom, there is no going back.
No-Break-7922@reddit
Closed/cloud providers only have 12-24 months before they either join open source or go out of business anyway, but this will surely compound the effects that put them in this position. Google, Meta, and Microsoft will likely survive in the language model field, but OpenAI and Anthropic are so about to be gone.
dmter@reddit
I think this is logical: this data is priceless for training their new models, and their whole business model is based on being able to use it. If you don't like it, just use offline models.