09-01-2024, 10:56 AM
(08-31-2024, 05:52 PM)Maxmars Wrote: If I had embarked upon isolating "URLs" to train "AI," it stands to reason that I would include any and all that had value towards my objectives. To simply lasso anything and everything I could charge a fee for seems beyond reckless, and divesting myself of responsibility for the content seems particularly base. This calls into question the entire exercise...
You are wrong about three things:
1 - They were not isolating URLs; they were gathering image URLs regardless of content;
2 - They do not charge a fee; the datasets are freely available to anyone, you just have to register on their site to access them;
3 - They were not divesting themselves of responsibility for the content, as they already had lists of URLs known to have illegal content that were to be avoided. Also, the URLs with the supposed child sex abuse materials came from sites that should not have allowed them.
Quote:How many "other" training data sets are equally polluted with things to which any rational human would say, "No?"
It's impossible to know because, unlike LAION, which creates and publishes its datasets for anyone to use, other entities like OpenAI and Google keep their datasets secret, so nobody knows what's in them.
Quote:You know what "intelligence" could easily scour such databases for errant and destructive material besides "a person?" This notional "intelligence" called "artificial." But it doesn't... it can't... why? Because it isn't really there. 250 billion pages in 2000 websites is nothing for a scouring machine intelligence, undeterred by volume. Unintelligent algorithmic search engines do it continuously, every moment of every day.
You still don't get it.
For any system to know something, it first has to learn about it.
A machine doesn't have a "good/bad" way of looking at things; to it, everything is just data, so it needs someone to give it examples of what is wrong and what is right before it can analyse an image and consider it "bad" or "good".
If you want to create a system that allows only images that show clowns, you need to give it images with and without clowns and tell it "this is a clown" or "this is not a clown", so it can look at any other image and try to decide whether it shows a clown or not.
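Here's a minimal sketch of that idea in Python, using scikit-learn. The "images" are random numbers standing in for real pictures, and the clown/not-clown labels are invented for illustration; the point is only that the model learns from labelled examples and nothing else.
Code:
# Minimal sketch of supervised binary classification ("clown" vs. "not clown").
# The "images" here are random vectors standing in for real labelled pictures.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# 200 fake "images", each a 64x64 grayscale picture flattened to 4096 numbers.
X = rng.random((200, 64 * 64))
# Human-provided labels: 1 = "this is a clown", 0 = "this is not a clown".
y = rng.integers(0, 2, size=200)

# The model learns whatever pattern separates the labels; it has no concept
# of "clown" beyond the examples it was shown.
model = LogisticRegression(max_iter=1000).fit(X, y)

# Classifying a new image just applies the learned pattern.
new_image = rng.random((1, 64 * 64))
print("clown" if model.predict(new_image)[0] == 1 else "not a clown")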
In the case of child sexual abuse materials the same applies: to make an AI system ignore or flag such material, they would need to give it examples of it, which is illegal.
Removing the possibly illegal URLs from the dataset was done by comparing URL hashes to lists of hashes given to LAION by specialised entities, so they could remove those URLs without needing to look at them.
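A rough sketch of how that kind of hash-based filtering works is below. SHA-256 and the example URLs are my assumptions, not LAION's actual scheme; the point is that the blocklist contains only hashes, so the people doing the filtering never have to see or store the URLs behind them.
Code:
# Rough sketch of hash-based URL filtering (SHA-256 and the URLs below are
# assumptions for illustration, not LAION's actual scheme).
import hashlib

def url_hash(url: str) -> str:
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

# A specialised entity supplies only the hashes of known-bad URLs.
blocklist = {url_hash("https://example.com/known-bad-page")}

dataset_urls = [
    "https://example.com/known-bad-page",
    "https://example.org/harmless-image.jpg",
]

# Keep only URLs whose hash is not on the blocklist; nobody has to view
# the removed pages to filter them out.
clean = [u for u in dataset_urls if url_hash(u) not in blocklist]
print(clean)  # ['https://example.org/harmless-image.jpg']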
Quote:"Intelligence" artificial or otherwise, is compelled by it's nature to ask questions. I think it is a hallmark of intelligence... rather than just an interface for queries (like a search engine.)
The "intelligence" in these chat bots is in the way they learn, as they do it by themselves.
And no, they do not ask questions; that would be extremely slow and useless.
Quote:Your cheating example is reliant upon the coding of the word "cheat" and it's meaning... as a mystery to be probed, the AI in that example either seemed to actually know how to cheat, but reported it "shouldn't," or didn't know how to cheat, but didn't actually say so... saying only that "it shouldn't"... which might lead us to infer that the machine already knows how to lie by omission.
You are right about that. Like I said above, if they show an example of what is bad and tell the system "this is bad, you will answer questions about this in this way", that's what the system will answer. The system doesn't know how to lie; it just knows how to answer users' input, with special cases treated exactly like word filters on this forum.
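In other words, something like the toy sketch below, where a hard-coded rule fires before the model is ever consulted. The trigger phrases and canned reply are made up; real systems use far more elaborate rules and classifiers, but the layering is the same.
Code:
# Toy illustration of "special case" handling layered on top of a model,
# much like a forum word filter. Triggers and canned reply are made up.
BLOCKED_TOPICS = ("how to cheat", "how do i cheat")
CANNED_REPLY = "I shouldn't help with that."

def run_model(user_input: str) -> str:
    # Stand-in for the real model's answer.
    return f"(model answer to: {user_input!r})"

def answer(user_input: str) -> str:
    if any(t in user_input.lower() for t in BLOCKED_TOPICS):
        return CANNED_REPLY        # the rule fires before the model is asked
    return run_model(user_input)   # otherwise, normal model output

print(answer("How do I cheat at chess?"))  # -> I shouldn't help with that.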
Quote:"Lying" about something raises many, many other problems... presumably AI's value is in the reporting of accurate, complete, and factual information... not 'human' value judgements about what should or shouldn't be... only the reality of what is.
But you want it to rely on human judgements about what is "good" or "bad", right?
Quote:Maybe I'm being too demanding of the AI?
No, you are completely ignoring that "AI" is just a name and falling for all the media publicity.
Quote:Which makes me question about this so-called AI, with it's reliance on mathematical algorithms as "models" of intelligence. Or is it just techno-gimmickry and slick marketing as "appearances?"
It is.
Quote:I have to wonder, just how sloppy is the groundwork for this training? Are we to fret over the numbers of errant collections of anti-social content... pretending that the volume is such that we are simply going to have to accept being "victimized' by its presence? Are we to simply 'accept' that the AI will include such things because we can't "find" it until it manifests in AI?
If we give AI an unverified source of data then anything can happen, and that has already happened: in 2016, Microsoft launched an AI chat bot on Twitter that could answer other people's posts, and they had to shut it down because of the racist and sexually charged messages it started posting after learning them from its interactions with other users.
Quote:I have no pretension about the "artificial intelligence" in this story. I understand that it will manifest what it is 'trained' to.
No, it will act based on what it is trained with; those systems train themselves.
Quote:And yet the commerce of the moment is already destroying any positive potential that might have arisen from it. Selling undifferentiated "collections" of URLs to form the basis of its training is only one flawed approach to making AI a viable reality.
LAION wasn't selling a thing, and the bigger the dataset you train an AI system with, the better it gets. That's why they have a list of 5 billion (American billions) image URLs, some with associated text in English, some with text in other languages and some without any associated text, so anyone can train their AI systems on any subset of that dataset they want.
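To make that concrete, each record in a LAION-style dataset is essentially just an image URL with an optional caption and language tag, and training on "a subset" means filtering those records. The field names and sample rows below are assumptions for illustration, not LAION's exact schema.
Code:
# Sketch of picking a subset of a LAION-style dataset. Each record is just
# an image URL plus optional caption and language tag; the field names and
# rows here are illustrative assumptions, not LAION's exact schema.
records = [
    {"url": "https://example.com/a.jpg", "text": "a red bicycle", "lang": "en"},
    {"url": "https://example.com/b.jpg", "text": "ein roter Hund", "lang": "de"},
    {"url": "https://example.com/c.jpg", "text": None, "lang": None},
]

# Anyone can train on whatever slice suits them, e.g. English captions only:
english_only = [r for r in records if r["lang"] == "en"]
print(english_only)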
Quote:The other is the truth that no one - absolutely no one - is actually trying to create "AI"... they are trying to model a 'slave-mind' for exploitation... nothing else. Certainly nothing more.
What do you mean by that? Could you explain it?
PS: the CSAM images came, apparently, from Reddit, X, WordPress, Blogspot, Xhamster and XVideos, all legal sites.