Re: Clustering (was: Neural Network Customer)


noah smith wrote:

> --Noah's comment:
> If I remember correctly, what he's talking about here is separating cells
> into 2 groups. There are eight "factors" which can describe a given cell,
> some of which determine if the cell is abnormal. The problem is, there are
> 3 types of cells: "normal" run-of-the-mill cells, "stem" or really good
> cells, and "cancer" or really bad cells. They want stem cells, and don't
> want cancer cells, but the same data which defines a stem cell appears to
> also define a cancer cell. They have to find a "line" (the "line" is a
> seven-dimensional structure) which separates the eight-dimensional data
> set into 2 groups.
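The "line" Noah describes is what is usually called a separating hyperplane: a weight vector w and an offset b, with each cell classified by which side of w.x + b = 0 it falls on. A minimal sketch (the weights, offset, and sample cell below are made up for illustration, not fitted to any real cell data):

```python
# Classify 8-dimensional points by which side of a hyperplane they fall on.
# The hyperplane w . x + b = 0 is the seven-dimensional "line" described
# in the post. All numbers here are hypothetical.

def side_of_hyperplane(w, b, x):
    """Return 1 if point x lies on the positive side of w.x + b = 0, else -1."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else -1

w = [0.5, -1.0, 0.0, 2.0, 0.0, 0.0, 1.0, -0.5]  # one weight per "factor"
b = -1.0

cell = [1, 0, 0, 1, 0, 0, 1, 0]  # eight factors describing one cell
group = side_of_hyperplane(w, b, cell)
```

Finding weights that actually separate the two groups is the hard part (that is what a perceptron or a linear SVM does); the sketch only shows what the finished "line" is.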

That reminds me of a similar problem in my field (linguistic
decipherment).

A word can belong to more than one cluster. E.g. "plant" belongs to at
least two semantic clusters: botany and manufacturing.

The stumbling block I had encountered was that clustering algorithms
were designed to identify points as members of one and only one cluster,
or to assign them "in between" adjacent clusters. When in reality, in my
field, a point can very well belong to two or more clusters which are
far apart (I called them "disjunct clusters", by analogy with "disjunct"
morphemes). That was quite some time ago. I fell into the usual traps:
unwittingly overfitting the data, using neural nets (with such data,
they never reach a stable state, but keep "changing their minds" -- it's
quite funny, you feel you are dealing with a hopeless human being).
Much later, I think it was early last year or late in 1997, during an
exchange on the Voynich interest group (it's about an undeciphered
Medieval manuscript), I hit upon the seed of a clustering algorithm
that would allow points to belong to more than one cluster. I tested
it by hand and it seemed to work. I left it at that, because at the
time I was routinely using Borland Pascal, and the memory allocation
and deallocation was just too much of a nightmare for me to contemplate.
(To get interesting results on the Voynich language, I would have
needed 2000x2000 matrices.) Whereas now that I have become rather
fluent in Euphoria...
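One way the idea can be sketched (my reading of the description, not the original algorithm) is threshold-based membership: instead of assigning each point to its single nearest centroid, assign it to every centroid within some radius, so one point may land in several far-apart clusters at once. The centroids, points, and radius below are hypothetical:

```python
# Assign each point to EVERY cluster whose centroid is within `radius`,
# rather than to the single nearest one -- so a word like "plant" can
# belong to two far-apart ("disjunct") clusters at once.
# Centroids, coordinates, and radius are made up for illustration.

def distance(p, q):
    """Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def memberships(point, centroids, radius):
    """Return the names of all clusters whose centroid is within radius."""
    return [name for name, c in centroids.items()
            if distance(point, c) <= radius]

centroids = {
    "botany":        (0.0, 0.0),
    "manufacturing": (10.0, 0.0),
    "astronomy":     (5.0, 20.0),
}

# "plant" sits near both botany and manufacturing, far from astronomy.
plant = (5.0, 0.0)
clusters = memberships(plant, centroids, radius=6.0)
```

A hard single-assignment clusterer would force "plant" into just one of the two nearby clusters; the radius test lets it keep both while still excluding the distant one.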

The other thing, perhaps worth repeating, is that I did try neural
nets on that sort of data, and I fell flat on my face. They didn't
work for me.

