1. Re: Clustering (was: Neural Network Customer)

For those interested in the neural network, I have more info fer ya:

> **--------- Original Message follows...

>Okay, here's how the data comes:

The data will be in a ".txt" file, constructed as shown below. In real
life it will arrive at more than 100,000 cells/sec, and the cells must be
sorted correctly.
The run is done twice: first to give the computer a test set, and second
to actually SORT the cells.
So here's how the data comes....

I v1 v2 v3 v4 v5 v6 v7 v8 v9 .... vn

The "I" is the index, i.e. the cell number.
The "vk" is the kth voltage measured off the cell. The voltage range is
currently 0-1023 or 1-1023; it will have to be raised to 0-4095 or 1-4095.
Note that for any voltage channel of 2^k, the 2^k channel is the "junk"
channel and should be thrown away (this means throw ANY cell with a 2^k).
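
A minimal sketch of reading one record and dropping junk cells, in Python. The names `parse_cell`, `is_junk`, `read_cells` and the `junk_value` parameter are made up for illustration; the message above does not pin down which channel number is the junk one, so the caller has to supply it.

```python
def parse_cell(line):
    """Split one record 'I v1 v2 ... vn' into (index, voltages)."""
    fields = line.split()
    index = int(fields[0])
    voltages = tuple(int(v) for v in fields[1:])
    return index, voltages


def is_junk(voltages, junk_value):
    """True if any measurement of the cell sits in the junk channel.

    junk_value is an assumption supplied by the caller; the post does
    not say exactly which channel number it is.
    """
    return junk_value in voltages


def read_cells(lines, junk_value):
    """Yield (index, voltages) for every non-blank, non-junk record."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        index, voltages = parse_cell(line)
        if not is_junk(voltages, junk_value):
            yield index, voltages
```

For a real run, `lines` could simply be the open ".txt" file, since iterating over a file yields its lines.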

This data is then sorted into frequency of n-vectors.

Thus, for example ...

20345 1020 235 185 743 264 ...
12034 1020 235 185 743 264 ...

would both go into a vector...

[1020, 235, 185, 743, 264 ...] which would now have a value of "2"

So, the data the computer analyzes is a list of n-vectors, each paired
with an integer count, the list's length being less than or equal to "I."
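
The frequency step could be sketched like this in Python, assuming each record is a whitespace-separated line "I v1 v2 ... vn" as described above. The function name `count_vectors` is hypothetical, and the sample records use only the five voltages shown in the example, which is truncated in the original.

```python
from collections import Counter


def count_vectors(lines):
    """Map each voltage n-vector to the number of cells that produced it.

    The leading cell index "I" is dropped, so two cells with identical
    voltages end up in the same bucket.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or an index with no voltages
        vector = tuple(int(v) for v in fields[1:])
        counts[vector] += 1
    return counts


records = [
    "20345 1020 235 185 743 264",
    "12034 1020 235 185 743 264",
]
frequencies = count_vectors(records)
```

Tuples are used for the vectors because they are hashable and so can serve as dictionary keys. The number of distinct vectors can never exceed the number of cells read.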

So now we think of an n+1 space, where there are n "voltage" dimensions
and one "frequency" dimension.

Some of the cells are sampled at random, and a PCR technique determines
what sort of cell each one is (e.g. stem, blood, white, cancer, etc.). The
computer then analyzes the given data and determines "where" in the space
you have the best chance of finding a stem cell, a cancer cell or a trash
cell.

Then, the computer determines where the n-1 dimensional line should be
drawn so as to:
(1) keep a certain minimum number, s, of stem cells
(2) throw away the greatest number of cancer cells (like 99.999 percent
or so; note the five nines).
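
To make the two constraints concrete, here is a rough Python sketch under some loud assumptions: the direction of the dividing line is already fixed, every labeled cell has been reduced to a single score along that direction, and cells scoring at or above the threshold are kept. None of this is Dr. Hokanson's actual method; `choose_threshold` and its arguments are invented for illustration.

```python
def choose_threshold(stem_scores, cancer_scores, s):
    """Pick the cut that keeps at least s stem cells and the fewest cancer cells.

    Assumption: a cell is kept when its score >= threshold. Raising the
    threshold can only drop cells, so the highest threshold that still
    keeps s stem cells is the one that discards the most cancer cells.
    Returns (threshold, stem_kept, cancer_kept).
    """
    if s < 1 or s > len(stem_scores):
        raise ValueError("s must be between 1 and the number of stem cells")
    ranked = sorted(stem_scores, reverse=True)
    threshold = ranked[s - 1]  # the s-th highest stem score
    stem_kept = sum(1 for x in stem_scores if x >= threshold)
    cancer_kept = sum(1 for x in cancer_scores if x >= threshold)
    return threshold, stem_kept, cancer_kept


def discarded_fraction(cancer_scores, cancer_kept):
    """Fraction of the labeled cancer cells that the cut throws away."""
    return 1.0 - cancer_kept / len(cancer_scores)
```

If the discarded fraction falls short of the target (the five nines above), this direction does not separate the cells well enough and a different line is needed.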

As an aside, here's why: with too few stem cells the patient DIES. If
there are any cancer cells (this could be as few as one ... but who
REALLY knows) the patient will DIE of cancer eventually.
The point is that this removes the malignant cancer completely, rather
than merely increasing life expectancy.

Note: there is a technique using some rather hairy algebra (of the matrix
variety) called "linear regression"; this is the technique to "find the
line." There is also a probability technique (rather hairy, too) which
determines the "badness" of a choice of line found by the linear
regression. The math (the theory itself) is ... er ... most likely done.
I'd have to talk with Dr. Hokanson to see if Dr. Rosenblatt has finished it
or whatnot.

I reiterate that these guys MUST talk with Dr. Hokanson ... this is HIS
project and not mine.


-j.

Noah
