(08-31-2024, 03:41 AM)Maxmars Wrote: 2000 web links... never once before examined to determine if they held child sex... used commercially, as they were, until recently... and no one is asking "how" or "why?" let alone "who?"
Maybe nobody is asking that because they already know it.
Apparently, LAION's datasets contain only URLs of sites where people can get the images from, and those URLs came from Common Crawl, which says it has 250 billion pages archived.
It's not feasible to have a person look at all the images to decide whether they show child sex or not (I know what I'm talking about, as I have reviewed thousands of scanned images one by one; doing that for millions is impossible).
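To put that infeasibility in numbers, here is a back-of-envelope sketch. The 250 billion figure comes from Common Crawl's own claim above; the seconds-per-page and working-hours figures are my assumptions, not anything from LAION:

```python
# Back-of-envelope estimate: how long would human review of the whole
# Common Crawl archive take? (Assumed figures, for illustration only.)
pages = 250_000_000_000                  # Common Crawl's claimed archive size
seconds_per_page = 5                     # assumed (optimistic) review time per page
seconds_per_work_year = 8 * 3600 * 250   # 8-hour days, 250 working days a year

person_years = pages * seconds_per_page / seconds_per_work_year
print(f"{person_years:,.0f} person-years")  # on the order of 170,000 person-years
```

Even with these generous assumptions, one reviewer would need well over a hundred thousand years, which is why these datasets are assembled automatically rather than curated by hand.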
Quote:And this is the new 'super AI' that threatens humanity by it's very existence? How intelligent is this artificial entity that follows the pattern of not asking questions of the nature of the reality it's being fed?
AI does not ask questions; if it were an interactive process, it would be too slow. AI systems built on Large Language Models and similar techniques learn by finding common elements and patterns in the data they are given.
Obviously, since they are learning, they have no previous knowledge of what they are going to look at, unless it was hardcoded into them. In this case, that would mean the people working on the system would have had to look at child abuse images themselves in order to feed them to the AI and tell it "this is no good".
That would have been illegal.
Also, AI (especially the chat bots built on LLMs) is not intelligent.
One example my sister, who is a teacher, gave me just a couple of hours ago: her students tried to cheat by asking AI chat bots "how can I cheat on X", to which the bots answered that they shouldn't cheat and refused to help.
But when they asked "what can I do to make sure I am not cheating", they got information that did allow them to cheat.
Quote:Why those website were included in the data set in the first place might be a good place to begin serious inquiry... but somehow I doubt that will happen...
They could have been perfectly normal sites.
Many years ago, I was searching for something (I don't remember what) on Google and got as a result a page with several images of (mild) child pornography. When I looked at the site's base address, it was the site of a hairdresser, probably being used to spread child pornography without the owner's knowledge (WordPress sites are great for that, as they need to be carefully configured to close all possible unattended access points).
Also, if you have a program that follows links in every web page it finds, it is likely to come across hidden things.
Another case I have personal knowledge of happened with a site I made and didn't properly protect. Once, when I was looking at the database where the text for the site's pages is stored, I saw that someone had injected text into those pages. The text was not visible on the pages but was accessible to web bots looking for links, and the injected text consisted of links to pages selling Viagra and things like that.
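That trick works because crawlers read the raw HTML, not what a browser renders. A minimal sketch of the idea, using only Python's standard library (the page content and URLs here are made up for illustration):

```python
from html.parser import HTMLParser

# Example page: the injected links are hidden from human visitors with CSS
# (display:none), but any bot parsing the raw HTML still finds them.
page = """
<html><body>
  <h1>Welcome to the Salon</h1>
  <p>Opening hours and prices below.</p>
  <div style="display:none">
    <a href="http://spam.example/viagra">cheap viagra</a>
    <a href="http://spam.example/pills">pills</a>
  </div>
</body></html>
"""

class LinkCollector(HTMLParser):
    """Collects every href, visible or not, just like a crawler would."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # both hidden spam links are collected
```

A human looking at the rendered page sees only the salon's text, which is why this kind of injection can go unnoticed by the site owner for a long time.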
(08-31-2024, 08:57 AM)Lynyrd Skynyrd Wrote: I saw a related article that claimed AI is getting more "inbred", and stupider, because it's finding and scraping AI content that's posted on the internet. It can't differentiate AI from human posts.
That is going to be a real problem for AI systems trained on freely available data from the Internet, with so many people posting AI-"created" images and AI-"created" texts.
In a couple of years those systems will be irrelevant.