1. Euphoria web usage analysis
Robert Craig wrote:
> Ray Smith writes:
> > With the processing power of PCs these days, is speed a real issue?
> > Except for specialized applications like action games, 3D modelling,
> > simulations, etc., speed is almost a non-issue.
>
> I just finished crunching 17 Mb of log data from RapidEuphoria.com.
> The log records each page hit, each .zip downloaded, etc., along with
> the referring URL, IP address, etc.
>
> My Euphoria program took 1 minute to give me lots of interesting
> highly customized information. Would I want to wait half an hour
> for Python or Perl? (My ISP has a free log analyzer. It provides lots of
> data, but little useful information that I need to evaluate sources of
> advertising.)
>
> Speed will always be valuable.
>
Rob, are you using an Apache web server? My web host provides some
statistics, but people want more. There was some discussion on the
support forum about downloading the log file and doing analysis on their
PC. I haven't noticed anyone mentioning that they have tools to do this
on the PC. If you have Euphoria code which can analyze an Apache web
server log file then I would be happy both to use it and to promote it
on my web host as an efficient and effective way to do this analysis.
It might generate some more interest in Euphoria.
Do you have such a program that you would be willing to put in the
archive? (A quick search of the archive for "apache log" did not find one.)
-J. Kenneth Riviere
2. Re: Euphoria web usage analysis
J. Kenneth Riviere writes:
> Do you have such a program that you would be willing to put in the archive?
It's really very specific to my needs.
I doubt that anyone else could make much use of it,
but it's a good example of the kind of thing
Euphoria is good at, since it requires speed,
but it's also something that I wanted to develop quickly
and play around with a lot (without having to compile, link, and
resolve machine crashes).
I'm using it right now to evaluate various "pay-per-clickthrough"
advertising sites. It tells me how many people came from
various search keywords that I bid on, and how "interested" they were
when they arrived, based on the number of extra pages that they
viewed after seeing the main page. I've found significant
differences in the "quality" of the visitors that various
places send me, and of course differences depending on
what the keyword is. This will influence which places I continue
with, and how much I bid for various words.
A typical line in my log file looks like (wrapped onto 5 lines here):
195.92.168.171 - - [03/feb/2002:09:59:55 -0800]
"get /spellchk.zip http/1.1"
200 32768
"http://www.programmersheaven.com/search/download.asp?fileid=14415"
"mozilla/4.6 [en-gb]c-cck-mcd netscapeonline.co.uk (win98; i)"
It shows the IP address of the visitor, the date, the file that was accessed,
info on the success of the access (status code and bytes sent), the URL the
visitor was referred from, what kind of browser they are using, their OS, etc.
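
For readers who want to pull those fields out of a line themselves, here is a minimal sketch in Euphoria. It is not taken from Rob's program: the routine name quoted_field and the variable names are made up for illustration, and it assumes the fields appear in the same order as in the sample line above.

```euphoria
-- Sketch: split one log line (format as in the sample above) into fields.
-- quoted_field() and all variable names are hypothetical, for illustration.

function quoted_field(sequence s, integer n)
-- return the text inside the n-th pair of double quotes, or "" if absent
    integer q

    for i = 1 to n do
        q = find('"', s)            -- opening quote
        if q = 0 then
            return ""
        end if
        s = s[q+1..length(s)]
        q = find('"', s)            -- closing quote
        if q = 0 then
            return ""
        end if
        if i = n then
            return s[1..q-1]
        end if
        s = s[q+1..length(s)]       -- skip this field, look for the next
    end for
    return ""
end function

sequence line, ip, request, referrer
integer sp

line = "195.92.168.171 - - [03/feb/2002:09:59:55 -0800] " &
       "\"get /spellchk.zip http/1.1\" 200 32768 " &
       "\"http://www.programmersheaven.com/search/download.asp?fileid=14415\" " &
       "\"mozilla/4.6 [en-gb]c-cck-mcd netscapeonline.co.uk (win98; i)\""

sp = find(' ', line)                -- the IP address ends at the first blank
ip = line[1..sp-1]
request = quoted_field(line, 1)     -- method, file and protocol
referrer = quoted_field(line, 2)    -- referring URL

printf(1, "ip:       %s\n", {ip})
printf(1, "request:  %s\n", {request})
printf(1, "referrer: %s\n", {referrer})
```

The program in this post does the same job inline with find() and character-by-character loops; a helper like this only makes the field boundaries easier to see.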
By the way, there were 28,533 visits to the RapidEuphoria Web site
in January, smashing the previous record.
Here's the code, for what it's worth.
Sorry about the indentation and lack of comments.
-- extract stats from RapidEuphoria.com access_log
without type_check
include sort.e
constant TO_LOWER = 'a' - 'A'
function fast_lower(sequence s)
-- Faster than the standard lower().
-- Speed of lower() is very important for "any-case" search.
integer c
for i = 1 to length(s) do
c = s[i]
if c <= 'Z' then
if c >= 'A' then
s[i] = c + TO_LOWER
end if
end if
end for
return s
end function
sequence target_list, target_count, referrer_list, referrer_count
integer line_count, gif_count
integer total_referrers, unknown_referrers
sequence referrer
sequence ip_address
sequence cl
cl = command_line()
if length(cl) < 3 then
puts(2, "Usage: ex stats access_log\n")
abort(1)
end if
sequence special_referrer, special_target, special_words
special_referrer = {
"freshmeat",
"linkexchange",
"directhit.com",
"google.com",
"altavista.com"
}
special_target = {
"?sp981", -- all Sprinks
"?bayf", -- Bay9 freeware
"?bayc", -- Bay9 C
"?baysh", -- Bay9 Shareware
"?bayso", -- Bay9 Software
"?bayfs", -- Bay9 Free Software
"?bayd", -- Bay9 DOS
"?gc981", -- all goCLick
"?fw981", -- Overture freeware
"?pl981", -- Overture programming language
"?f981", -- all FindWhat
"?7se" -- all 7Search
}
constant S_WORD = 1,
S_LIST = 2,
S_DUPS = 3
constant L_EXTRA = 1,
L_IP = 2,
L_LINE = 3
special_words = special_referrer & special_target
for i = 1 to length(special_words) do
special_words[i] = {special_words[i], {}, 0}
end for
procedure visitor(sequence word)
-- a person has entered with a special target or referrer
integer dups
-- ignore visualbasic from sprinks
-- if equal(word, "?sp981") then
-- if not match("basic", referrer) and not match("visual", referrer) then
-- if not match("cplus", referrer) then
-- return
-- end if
-- end if
for i = 1 to length(special_words) do
if equal(word, special_words[i][S_WORD]) then
dups = special_words[i][S_DUPS]
for j = 1 to length(special_words[i][S_LIST]) do
if equal(ip_address, special_words[i][S_LIST][j][L_IP]) then
dups += 1
exit
end if
end for
special_words[i][S_LIST] = prepend(special_words[i][S_LIST],
{0, ip_address, line_count})
special_words[i][S_DUPS] = dups
return
end if
end for
puts(2, "Couldn't find " & word & '\n')
end procedure
procedure credit(sequence ip_address)
-- give credit for this ip_address to special target or referrer
sequence list, temp
for i = 1 to length(special_words) do
list = special_words[i][S_LIST]
for j = 1 to length(list) do
if line_count > list[j][L_LINE]+3000 then
exit
end if
if equal(ip_address, list[j][L_IP]) then
if line_count < list[j][L_LINE]+3000 then
special_words[i][S_LIST][j][L_EXTRA] += 1
special_words[i][S_LIST][j][L_LINE] = line_count
-- move it to first position
temp = special_words[i][S_LIST][j]
special_words[i][S_LIST][j] = special_words[i][S_LIST][1]
special_words[i][S_LIST][1] = temp
exit -- allow double credit for two or more words,
-- but not for the same word
end if
end if
end for
end for
end procedure
procedure gather_stats()
-- one pass through the access log
integer q, s, p, special
object line
sequence target
integer log_file
log_file = open(cl[3], "r")
if log_file = -1 then
puts(2, "Couldn't open " & cl[3] & '\n')
abort(1) -- can't continue without the log file
end if
target_list = {}
target_count = {}
referrer_list = {}
referrer_count = {}
line_count = 0
gif_count = 0
total_referrers = 0
unknown_referrers = 0
while 1 do
line = gets(log_file)
if atom(line) then
exit
end if
line_count += 1
line = fast_lower(line)
if match(".gif ", line) or match(".jpg ", line) then
gif_count += 1
else
q = find(' ', line)
if q then
ip_address = line[1..q-1]
else
ip_address = ""
end if
credit(ip_address)
q = find('"', line)
if q then
-- target address
line = line[q+1..length(line)]
s = find('/', line)
if s then
target = "/"
while 1 do
s += 1
if s > length(line) or line[s] = ' ' then
exit
end if
target &= line[s]
end while
line = line[s+1..length(line)]
p = find(target, target_list)
if p then
target_count[p] += 1
else
target_list = append(target_list, target)
target_count = append(target_count, 1)
end if
end if
-- referrer address
q = find('"', line)
if q then
line = line[q+1..length(line)]
q = find('"', line)
if q then
referrer = ""
while 1 do
q += 1
if q > length(line) or line[q] = '"' then
exit
end if
referrer &= line[q]
end while
if not match("rapideuphoria", referrer) and
not match("addr.com", referrer) then
-- coming in from outside world
special = 0
for i = 1 to length(special_target) do
if match(special_target[i], target) then
visitor(special_target[i])
special = 1 -- so the referrer check below is skipped
exit
end if
end for
if not special then
for i = 1 to length(special_referrer) do
if match(special_referrer[i], referrer) then
visitor(special_referrer[i])
exit
end if
end for
end if
total_referrers += 1
if length(referrer) < 3 then
unknown_referrers += 1
end if
p = find(referrer, referrer_list)
if p then
referrer_count[p] += 1
else
referrer_list = append(referrer_list, referrer)
referrer_count = append(referrer_count, 1)
end if
end if
end if
end if
end if
end if
end while
for i = 1 to length(target_list) do
target_list[i] = {target_count[i], target_list[i]}
end for
for i = 1 to length(referrer_list) do
referrer_list[i] = {referrer_count[i], referrer_list[i]}
end for
close(log_file)
end procedure
atom t
t = time()
gather_stats()
puts(1, "Targets:\n")
target_list = sort(target_list)
printf(1, "%d total .gifs\n", gif_count)
for i = length(target_list) to 1 by -1 do
printf(1, "%d %s\n", target_list[i])
end for
puts(1, "\nReferrers:\n")
referrer_list = sort(referrer_list)
for i = length(referrer_list) to 1 by -1 do
printf(1, "%d %s\n", referrer_list[i])
end for
printf(1, "\n\nTotal Lines: %d\n", line_count)
printf(1, "Total External Referrers: %d\n", total_referrers)
printf(1, "Total Unknown Referrers: %d\n\n", unknown_referrers)
integer total, v, max, extra
sequence max_ip
for i = 1 to length(special_words) do
max = -1
max_ip = ""
printf(1, "Special word: %s\n", {special_words[i][S_WORD]})
v = length(special_words[i][S_LIST])
printf(1, "Total: %d\n", v)
if v > 0 then
printf(1, "Total Dups: %d (%.0f%%)\n", {special_words[i][S_DUPS],
100 * special_words[i][S_DUPS] / v})
end if
total = 0
for j = 1 to length(special_words[i][S_LIST]) do
extra = special_words[i][S_LIST][j][L_EXTRA]
if extra > max then
max = extra
max_ip = special_words[i][S_LIST][j][L_IP]
end if
if extra > 25 then
extra = 25 -- avoid huge excesses
end if
total += extra
end for
printf(1, "Total extra pages: %d\n", total)
printf(1, "Max extra pages for one visitor: %d by %s\n", {max, max_ip})
if v > 0 then
printf(1, "Average extra pages: %.2f\n", total / v)
end if
puts(1, '\n')
end for
puts(2, '\n')
print(2, time()-t)
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
3. Re: Euphoria web usage analysis
On Fri, 15 Feb 2002 18:17:47 -0500, you wrote:
>function fast_lower(sequence s)
>-- Faster than the standard lower().
>-- Speed of lower() is very important for "any-case" search.
...
> c=s[i]
...
> s[i] = c + TO_LOWER
Hmmm, so is that (last line) faster than s[i]+=TO_LOWER then?
Pete
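
One way to settle a question like this is simply to time both forms. The following is a rough sketch of such a test, not something posted in the thread: the names lower_copy, lower_inplace and REPS are made up, the test string is arbitrary, and the repetition count will probably need adjusting for your machine. No result is claimed here; run it and see.

```euphoria
-- Sketch: time the two ways of storing the lower-cased character.
-- lower_copy, lower_inplace and REPS are hypothetical names.

constant TO_LOWER = 'a' - 'A'
constant REPS = 200000      -- assumed repetition count, adjust to taste

function lower_copy(sequence s)
-- form used in the posted program: s[i] = c + TO_LOWER
    integer c

    for i = 1 to length(s) do
        c = s[i]
        if c <= 'Z' then
            if c >= 'A' then
                s[i] = c + TO_LOWER
            end if
        end if
    end for
    return s
end function

function lower_inplace(sequence s)
-- form asked about: s[i] += TO_LOWER
    integer c

    for i = 1 to length(s) do
        c = s[i]
        if c <= 'Z' then
            if c >= 'A' then
                s[i] += TO_LOWER
            end if
        end if
    end for
    return s
end function

sequence text, r
atom t

text = "GET /SpellChk.ZIP HTTP/1.1 Mozilla/4.6 [en-gb] (Win98; I)"

-- make sure both variants agree before timing them
if not equal(lower_copy(text), lower_inplace(text)) then
    puts(2, "variants disagree\n")
    abort(1)
end if

t = time()
for n = 1 to REPS do
    r = lower_copy(text)
end for
printf(1, "s[i] = c + TO_LOWER: %.2f seconds\n", time() - t)

t = time()
for n = 1 to REPS do
    r = lower_inplace(text)
end for
printf(1, "s[i] += TO_LOWER:    %.2f seconds\n", time() - t)
```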
4. Re: Euphoria web usage analysis
Ray Smith writes:
> Just out of curiosity, how many people download
> Euphoria from the RDS web site each month?
> * over the last few months with the new release, and
> * over a normal month with no releases?
Unfortunately, I have no good stats on that because most
people download Euphoria from the .zips that I've stored
on CompuServe and AOL. I had to move the interpreter .zip
off site to avoid blowing my 4 Gb/month bandwidth limit.
I only recently added a 3rd link on my own site.
I'm currently running at almost 6 Gb/month and I'm waiting for
my hosting service to blow the whistle and start charging me more.
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
5. Re: Euphoria web usage analysis
- Posted by rforno at tutopia.com
Feb 16, 2002
Rob:
Do you know why indentation of programs written with 'ed' is screwed up when
you post the file to the list?
6. Re: Euphoria web usage analysis
rforno writes:
> Do you know why indentation of programs written with 'ed'
> is screwed up when you post the file to the list?
ed saves Euphoria files with tabs.
Outlook Express seems to have difficulties with tabs.
Next time I'll write a tiny filter program to replace tabs with blanks.
The tabs save a bit of space, and save the interpreter
a minuscule amount of time when scanning/parsing, but
to avoid glitches like this, maybe ed should expand to all blanks
when saving a Euphoria file.
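
A filter of the kind mentioned above could look roughly like the sketch below. It is only an illustration, not the program Rob has in mind: the file name detab.ex, the routine name expand_tabs and the tab width of 8 are all assumptions (check the tab setting your editor actually uses).

```euphoria
-- detab.ex (hypothetical name): copy standard input to standard output,
-- replacing each tab with blanks up to the next tab stop.

constant TAB_WIDTH = 8      -- assumed tab width
constant STDIN = 0, STDOUT = 1

function expand_tabs(sequence s)
-- return s with every tab expanded to the next multiple of TAB_WIDTH
    sequence result
    integer c

    result = ""
    for i = 1 to length(s) do
        c = s[i]
        if c = '\t' then
            result &= repeat(' ', TAB_WIDTH - remainder(length(result), TAB_WIDTH))
        else
            result &= c
        end if
    end for
    return result
end function

object line

while 1 do
    line = gets(STDIN)
    if atom(line) then
        exit                -- end of input
    end if
    puts(STDOUT, expand_tabs(line))
end while
```

It would be run with redirection, for example: ex detab < stats.ex > stats.txt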
Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com