Re: MIT's Technology Review Article


On 27 Sep 2004, at 7:00, cklester wrote:

>
>
> posted by: cklester <cklester at yahoo.com>
>
> Interesting technology article about Tim Berners-Lee. Hopefully it doesn't
> require a subscription... :)
>
> in particular, kat, I'd be interested in your viewpoint on what the
> article refers to as the Semantic Web and its use in AI.
>
> http://www.technologyreview.com/articles/04/10/frauenfelder1004.asp?trk=nl

It's no more than what some web domains have been using for years, and
nothing is, or can be, tightly standardised. It's basically the XML spec for
web design. Some sites which don't use XML will use <!-- tags --> for blocks
of "data", or they use classes linked to / defined in the CSS file. These
not-quite-semantic tags are the way to go, imho, if there were general
agreement on using <!-- words with only one meaning -->. Right now, I am
parsing out data on some domains using these tags.
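
That kind of comment-tag scraping can be sketched in a few lines. This is a
minimal illustration in Python (not the Euphoria code I actually use), and
the "data:" marker names are invented for the example:

```python
import re

# A hypothetical page that marks a block of "data" with HTML comments,
# the way some non-XML sites do. The marker convention is invented here.
page = """
<html><body>
<!-- data:temperature -->
21.5 C
<!-- /data:temperature -->
</body></html>
"""

# Pull out every block delimited by <!-- data:NAME --> ... <!-- /data:NAME -->
pattern = re.compile(r"<!--\s*data:(\w+)\s*-->(.*?)<!--\s*/data:\1\s*-->", re.S)
fields = {name: value.strip() for name, value in pattern.findall(page)}
print(fields)  # {'temperature': '21.5 C'}
```

The whole approach only works, of course, if every site agrees on what the
marker words mean, which is the "words with only one meaning" problem above.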

The article has this line: "It still doesn't solve the semantic one, though.
For that, the Semantic Web first gives names to the basic concepts involved
in the data: date and time, an event, a check, a transaction, temperature and
pressure, and location. These are all defined just to mean whatever they mean
in the system which produces the data", which leaves out the system which you
just bought to understand that data. It also leaves out other human
languages, which is important because the XML isn't machine-generated now,
it's hand-coded, in English. I've seen none in other human languages, so only
the USA-Brit-Aussie point of view is semantically tagged? That makes for a
pretty biased AI, CK. The same paragraph goes on to discuss ontologies from
on high, which leaves out the human tagger, I suppose? Then we need
supercomputers that don't exist yet; just ask Lenat, who has been working
on and off on Cyc and its predecessors since the early 1980s. It's got
millions of (wo)man-years of semantic data and is still not off the ground.
I've got some of Cyc's files; they are horrendous. I cannot imagine that mess
on everyone's PDA. (Not to say it's a useless mess, but the presentation,
with the repetitive nature of the tags, disregards the computer's strong suit
of being able to reconstruct the repetitive data itself. Hint: "class
inheritance")
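
The "class inheritance" hint, in a general-purpose language, looks like this
(a sketch; the concept names are mine, not Cyc's):

```python
# Instead of restating the same properties on every concept entry,
# let subclasses inherit them -- the machine reconstructs the repetition.
class Event:
    has_date = True
    has_location = True

class Transaction(Event):   # inherits date/location; adds only what's new
    has_amount = True

class Check(Transaction):   # inherits everything above
    has_payee = True

# Nothing about dates or locations was restated, yet:
print(Check.has_date, Check.has_amount, Check.has_payee)  # True True True
```

The files I have restate those shared properties on every single entry, which
is exactly the bloat the computer could have reconstructed for itself.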

It goes on to discuss interoperability, which I am going to assume isn't
solely standard meanings for each and every word, but is meant to be
understood as "all the computers that have anything valuable must be left on
24-7 on a broadband connection, otherwise all the rest of the internet will
crash waiting on CK to turn his laptop back on". Distributed data works only
as well as it is valid data, and as it's available when needed. I am finding
many commercial hosts drop off the internet at least once an hour, due to
split-second breaks here and there, no matter how reliable the host is. And I
did a random spot check of some data, and found 2/3 missing, 2/3 wrong data
(and no human interested in fixing it), and 1/6 correct. (I forget the other
1/6, but it wasn't good data.)

He discusses "FOAF" files, which brings up a huge wall to climb over: the
data must be available and free from artificial constraints. Much data isn't
online or isn't indexed at all, but it would suffice to merely point to it.
Some data is in such a form on webpages that not even Google indexes it, such
as valid data buried in javascript code or linked framesets. Creating a whole
new XML file, and having an XML tag on each word in every existing file,
would bloat the internet to a crawl. I've seen 5K XML semantic files that had
nothing to say. Literally. But even files which do appear online often
disappear after a month, a year, sometimes a few hours. If there were an
automagic tagger built, so as to not use a human to tag a file which has a
lifetime of mere hours, then why not dispense with tagging, move that tagger
to the recipient, and not spew XML/semantics all over the internet? Frankly,
I have decided NOT to mine some semantically XML-tagged sites, because of the
page bloat.
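
For anyone who hasn't seen one: a FOAF file is just RDF/XML using the FOAF
vocabulary. A minimal hand-trimmed example of the form (the person and mbox
here are only illustrative):

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>CK Lester</foaf:name>
    <foaf:mbox rdf:resource="mailto:cklester@yahoo.com"/>
  </foaf:Person>
</rdf:RDF>
```

Even this tiny record spends most of its bytes on namespaces and tags rather
than data, which is the page-bloat complaint in miniature.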

And, if we disregard the code to process and "understand" the data, I have
occasionally presented semantically/syntactically tagged data here, and the
means to retrieve it was/is in strtok.e. I am looking forward to seeing if Eu
v2.5 spawns a fast string execution function or procedure, because
"understanding" in mIRC is just abysmally slow.

Kat
