**Maxmars** · 09-15-2024, 12:50 AM

[my replies in red.]

(09-01-2024, 10:56 AM)ArMaP Wrote: You are wrong in three things:
1 - They were not isolating URLs, they were gathering image URLs regardless of content;
[That's the recklessness I was referring to]
2 - They do not charge a fee, the datasets are freely available to anyone, you just have to register on their site to have access to them;
[Meaning they simply siphoned the whole thing? With no thought of what it was they were 'training' the "AI" with?]
3 - They were not divesting themselves of a responsibility for the content, as they already had some lists of URLs to avoid that were known to have illegal content. Also, the URLs with the supposed child sex abuse materials came from sites that should not have allowed them,
[I'm sorry to disagree... "caveat emptor" seems an appropriate term to use here. Training "AI" requires scientific discipline - or doesn't it?]
...

For any system to know something it has first to learn about it.

[But this system doesn't learn organically... it is "taught" within the confines of digital reality. I thought that by any measure, intelligence is a manifestation; with intelligence comes 'will.' Intelligence can only be modeled, not manifested. When and if true intelligence comes to exist synthetically... it will not be a function of "how it was trained." My tiresome objection is rooted in the reason I chafe against the term "AI": the marketing has made us think it's as simple as "Look how well it talks" and believe that makes it intelligence (believe me, I know a lot of people who can speak well that I would hesitate to call intelligent.) Language synthesis has come on in leaps and bounds, given its mathematical nature... but it is not "intelligence."]

In the case of child sexual abuse materials the same applies, to use an AI system to ignore or alert about child sexual abuse materials they would need to give it examples of it, which is illegal.

[That is an evil irony. Along the lines of not using Nazi medical research to save a life.]

Removing the possibly illegal URLs from the dataset was done by comparing URL hashes to lists of hashes given to LAION by specialised entities, so they could remove those URLs without needing to look at them.

[More irony: such a 'blanket scouring' could, and probably would, be deemed an assault on free speech.]
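For anyone wondering how URLs can be removed "without looking at them," here is a minimal sketch of the hash comparison described above. Everything specific in it is an assumption for illustration: the thread does not say which hash function LAION or the specialised entities used, so SHA-256 stands in, and the function names and example URLs are made up.

```python
import hashlib


def url_hash(url: str) -> str:
    """Hash a URL string. SHA-256 is an assumption; the real scheme is not stated."""
    return hashlib.sha256(url.strip().encode("utf-8")).hexdigest()


def filter_dataset(urls, blocked_hashes):
    """Return only the URLs whose hash does not appear in the blocklist."""
    blocked = set(blocked_hashes)  # set lookup keeps each check O(1)
    return [u for u in urls if url_hash(u) not in blocked]


# Hypothetical data: the blocklist holds hashes only, never the URLs themselves.
dataset = [
    "https://example.com/a.jpg",
    "https://example.com/b.jpg",
    "https://example.org/c.png",
]
blocklist = [url_hash("https://example.com/b.jpg")]

cleaned = filter_dataset(dataset, blocklist)
print(cleaned)
```

The point of the design is that whoever runs the filter only ever handles hashes: a match says "drop this entry" without revealing or requiring anyone to open the content behind it. It also shows the limit of the approach, since a URL that is not already on someone's list passes straight through.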

The "intelligence" in these chat bots is in the way they learn, as they do it by themselves.
And no, they do not ask questions, that would be extremely slow and useless.

[I get that chat bots are very limited examples of what we are calling "AI"... but learning is different from remembering... as of now, what our technology does is remember. And perhaps I am wrong, but I'm unwilling to consider a device designed to only answer questions as an "intelligence."]

You are right about that. Like I said above, if they show an example of what is bad and say to the system "this is bad, you will answer questions about this in this way" that's what the system will answer. The system doesn't know how to lie, it just knows how to answer users' input, with special cases treated exactly like word filters on this forum.

[Forgive my obstinacy, but the word "lie" was meant to include all output that diverges from reality. Language synthesis will always obscure things like human intent, propaganda, consistent bias. I understand that this is not to be confused with a considered rationalization... it is after all (especially with LLMs now) a superimposed evaluation scheme of human language constructs, image composition, musical architecture... but it is all mathematical... all model, all the time. In the example above, telling me something is wrong means its trainers have made that resolution happen... mathematically. - And yet we have the occasional occurrence of a hallucination... How does the math 'stop adding up?' Their collective models seem unreliable.]

But you want it to rely on human judgements about what is "good" or "bad", right?

[Aside from humans, I can think of no others to decide... but that would be an epic fight, I guess.]

No, you are completely ignoring that "AI" is just a name and falling for all the media publicity.

[You have a point. I understand that I am kvetching about the name, and how our media seems to be continuously promoting the misbegotten idea of it, and activists are using that very notion to instill fear in some of the population, while others are fearing for their livelihoods. It is a tired point I suppose... I'll believe in AI when I actually see it... why do I know that we'll only ever hear about it, aside from the marketing.]

If we give AI an unverified source of data then anything can happen, and that already happened in 2016, when Microsoft launched an AI chat bot on Twitter that could answer other people's posts. They had to shut it down because of the racist and sexually-charged messages it started posting after it learned them from its interactions with other users.

[I still have a problem saying it "learned" when it actually just "remembered" its training material, that to which it had been exposed. "Learning" implies rational internalization, not simple memory and database management.]

[The other is the truth that no one - absolutely no one - is actually trying to create "AI"... they are trying to model a 'slave-mind' for exploitation... nothing else. Certainly nothing more.]

What do you mean by that? Could you explain it?

[What I mean is that the entire project of "AI" seems to be pervasively populated by people 'envisioning' its commercial "use," driving its design... not seriously concerned that such a reality is fraught not just with the whole "terminator" vibe, but that there is a moral hazard in creating a mind trapped in silicon - to task with our applications of it... they're not thinking about what happens to that mind - and what that might mean for us... the people who they want dependent on their construct.]

PS: the CSAM images came, apparently, from Reddit, X, WordPress, Blogspot, Xhamster and XVideos, all legal sites.

[I guess they are not moderated... at least not effectively.]