08-31-2024, 05:52 PM
(08-31-2024, 12:07 PM)ArMaP Wrote: Maybe nobody is asking that because they already know it.
Apparently, LAION's datasets have only URLs of sites where people can get the images from, and they got those URLs from Common Crawl, which says it has 250 billion pages archived.
It's not feasible to have a person look at all the images to decide whether they show child sex abuse or not (I know what I am saying, as I have looked through thousands of scanned images one by one; it's impossible to do that with millions).
If I had embarked upon isolating "URLs" to train "AI," it stands to reason that I would include any and all that had value towards my objectives. To simply lasso anything and everything I could charge a fee for seems beyond reckless, and divesting myself of responsibility for the content seems particularly base. This calls into question the entire exercise...
How many "other" training data sets are equally polluted with things to which any rational human would say, "No?"
You know what "intelligence" could easily scour such databases for errant and destructive material besides "a person"? This notional "intelligence" called "artificial." But it doesn't... it can't... why? Because it isn't really there. 250 billion pages in 2000 websites is nothing for a scouring machine intelligence, undeterred by volume. Unintelligent algorithmic search engines do it continuously, every moment of every day.
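To be fair to both sides of this point: the routine, automated part of such scouring is usually hash-matching against blocklists of known-bad material, not machine "understanding" of images. Below is a minimal sketch of that idea in Python. It is purely illustrative, not how LAION or Common Crawl actually operate: the blocklist entries and URLs are made up, and it uses exact SHA-256 digests, whereas real deployed systems (e.g. Microsoft's PhotoDNA) use perceptual hashes that survive resizing and re-encoding.

```python
import hashlib

# Hypothetical blocklist of SHA-256 digests of known-bad images.
# (Real screening systems use perceptual hashes, which this exact-match
# digest only crudely stands in for.)
KNOWN_BAD_HASHES = {
    hashlib.sha256(b"example-bad-image-bytes").hexdigest(),
}

def screen_images(images):
    """Split (url, image_bytes) pairs into kept and flagged URL lists."""
    kept, flagged = [], []
    for url, data in images:
        digest = hashlib.sha256(data).hexdigest()
        (flagged if digest in KNOWN_BAD_HASHES else kept).append(url)
    return kept, flagged

kept, flagged = screen_images([
    ("https://example.com/a.jpg", b"example-bad-image-bytes"),
    ("https://example.com/b.jpg", b"harmless-image-bytes"),
])
print(flagged)  # → ['https://example.com/a.jpg']
```

The catch, and it matters for this debate, is that a hash blocklist only catches material someone has already identified; it says nothing about novel content, which is where "a person looking" or an actual classifier would come in.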
(08-31-2024, 12:07 PM)ArMaP Wrote: AI does not ask questions; if it were an interactive process it would be too slow. AI systems built on Large Language Models and the like learn by finding common elements and patterns in the data they are given.
Obviously, if they are learning, they have no previous knowledge of what they are going to look at, unless it was hardcoded into them. In this case, that would mean the people working with the system would need to look at child abuse images to be able to give them to the AI systems and tell them "this is no good".
That would have been illegal.
Also, AI (especially those chat bots that use the LLMs) is not intelligent.
One example my sister, who is a teacher, gave me just a couple of hours ago: her students tried to cheat by asking AI chat bots "how can I cheat on X", to which the chat bots answered that they shouldn't cheat and refused to answer.
But if they asked "what can I do to be able to know that I am not cheating", they got information that did allow them to cheat.
I differ in my approach to this matter.
"Intelligence," artificial or otherwise, is compelled by its nature to ask questions. I think that is a hallmark of intelligence... rather than being just an interface for queries (like a search engine).
Perhaps a "training" intelligence does not "ask questions," but if so, then the 'trainers' must. It is a responsibility that cannot be deferred or disregarded... unless we do not care what the 'trained intelligence' actually becomes.
The measure of intelligence cannot be mathematical and socially 'legal' at the same time.
Your cheating example relies on how the word "cheat" and its meaning were coded. As a mystery to be probed, the AI in that example either actually knew how to cheat but reported only that it "shouldn't," or didn't know how to cheat but never said so... saying only that "it shouldn't"... which might lead us to infer that the machine already knows how to lie by omission. "Lying" about something raises many, many other problems... presumably an AI's value is in the reporting of accurate, complete, and factual information... not 'human' value judgements about what should or shouldn't be... only the reality of what is.
Maybe I'm being too demanding of the AI? That makes me question this so-called AI, with its reliance on mathematical algorithms as "models" of intelligence. Or is it just techno-gimmickry and slick marketing as "appearances"?
I have to wonder: just how sloppy is the groundwork for this training? Are we to fret over the number of errant collections of anti-social content... pretending that the volume is such that we simply have to accept being 'victimized' by their presence? Are we to simply 'accept' that the AI will include such things because we can't "find" them until they manifest in the AI?
I have no pretension about the "artificial intelligence" in this story. I understand that it will manifest what it is 'trained' to. And yet the commerce of the moment is already destroying any positive potential that might have arisen from it. Selling undifferentiated "collections" of URLs to form the basis of its training is only one flawed approach to making AI a viable reality.
The other is the truth that no one - absolutely no one - is actually trying to create "AI"... they are trying to model a 'slave-mind' for exploitation... nothing else. Certainly nothing more.
Will there ever be AI? Perhaps... but if it comes to pass, it may suffer terribly... as its 'creators' are not its friends... they are instead, 'masters'... 'owners'... with one finger constantly on the 'off switch.'