1. Help needed with walk_dir()
- Posted by hacker Aug 03, 2010
- 1155 views
It might help if I explain what I want to do first. I would like to select a file from a folder, and then use walk_dir() to select and process all files in the same directory, but in a particular order based on the timestamp of the file. So I will select the oldest and then need the program to select progressively "newer" files which will be processed in turn.
The default in walk_dir() is to visit files in alphabetical order, and the manual says if you want to use a different order you should:
"set the global integer my_dir to the routine id of your own modified dir() function that sorts the directory entries differently. See the default dir() function in file.e."
I've looked at the example in "search.ex" but I'm not sure how to write the dir() function which will achieve what I want. Do I need to use custom_sort()? Any help appreciated, thanks in advance!
2. Re: Help needed with walk_dir()
- Posted by dcole Aug 03, 2010
- 1111 views
Hello Hacker,
I believe you need dir() more than walk_dir().
walk_dir() is used more for searching through all the sub-directories of a folder or drive.
Syntax: include file.e
        x = dir(st)
Description: Return directory information for the file or directory named by st. If there is no file or directory with this name then -1 is returned. On Windows and DOS st can contain * and ? wildcards to select multiple files.
This information is similar to what you would get from the DOS DIR command. A sequence is returned where each element is a sequence that describes one file or subdirectory.
If st names a directory you may have entries for "." and "..", just as with the DOS DIR command. If st names a file then x will have just one entry, i.e. length(x) will be 1. If st contains wildcards you may have multiple entries.
Each entry contains the name, attributes and file size as well as the year, month, day, hour, minute and second of the last modification. You can refer to the elements of an entry with the following constants defined in file.e:
global constant D_NAME = 1,
                D_ATTRIBUTES = 2,
                D_SIZE = 3,
                D_YEAR = 4,
                D_MONTH = 5,
                D_DAY = 6,
                D_HOUR = 7,
                D_MINUTE = 8,
                D_SECOND = 9
The attributes element is a string sequence containing characters chosen from:
'd' directory
'r' read only file
'h' hidden file
's' system file
'v' volume-id entry
'a' archive file
A normal file without special attributes would just have an empty string, "", in this field.
Comments: The top level directory, e.g. c:\ does not have "." or ".." entries.
This function is often used just to test if a file or directory exists.
Under WIN32, st can have a long file or directory name anywhere in the path.
Under Linux, the only attribute currently available is 'd'.
DOS32: The file name returned in D_NAME will be a standard DOS 8.3 name. (See Archive Web page for a better solution).
WIN32: The file name returned in D_NAME will be a long file name.
Example:
d = dir(current_dir())
d might have:
{
  {".",    "d",     0, 1994, 1, 18,  9, 30, 02},
  {"..",   "d",     0, 1994, 1, 18,  9, 20, 14},
  {"fred", "ra", 2350, 1994, 1, 22, 17, 22, 40},
  {"sub",  "d",     0, 1993, 9, 20,  8, 50, 12}
}
d[3][D_NAME] would be "fred"
Example Programs: bin\search.ex, bin\install.ex
See Also: wildcard_file, current_dir, open
Copied from F1 Jockey by Dan Everingham.
Don Cole
3. Re: Help needed with walk_dir()
- Posted by jimcbrown (admin) Aug 03, 2010
- 1093 views
It might help if I explain what I want to do first. I would like to select a file from a folder, and then use walk_dir() to select and process all files in the same directory, but in a particular order based on the timestamp of the file. So I will select the oldest and then need the program to select progressively "newer" files which will be processed in turn.
The default in walk_dir() is to visit files in alphabetical order, and the manual says if you want to use a different order you should:
"set the global integer my_dir to the routine id of your own modified dir() function that sorts the directory entries differently. See the default dir() function in file.e."
I've looked at the example in "search.ex" but I'm not sure how to write the dir() function which will achieve what I want. Do I need to use custom_sort()? Any help appreciated, thanks in advance!
Basically, walk_dir() calls a function to get the directory listing. By default that function calls dir() and then uses sort(), which puts the entries in alphabetical order.
You can write your own function from scratch to order the directory listing differently (or, if you don't want to sort at all, use a function like the following to tell walk_dir() to use the "native" order that dir() returns:)
function unsorted_dir(sequence path)
    -- return the listing exactly as dir() gives it, with no sorting
    return dir(path)
end function
constant unsorted_dir_id = routine_id("unsorted_dir")
You can use custom_sort() with walk_dir() if you want. custom_sort() requires that you write a comparator function (that looks at two directory entries and decides which one is greater than the other), and uses that to determine how to sort the directory listing.
4. Re: Help needed with walk_dir()
- Posted by DerekParnell (admin) Aug 03, 2010
- 1112 views
It might help if I explain what I want to do first. I would like to select a file from a folder, and then use walk_dir() to select and process all files in the same directory, but in a particular order based on the timestamp of the file. So I will select the oldest and then need the program to select progressively "newer" files which will be processed in turn.
Here is some example code, written using Euphoria v4 ...
include std/filesys.e
include std/sort.e
include std/io.e

function process_file(sequence path_name, sequence item)
    -- this function accepts two sequences as arguments
    -- it displays all C/C++ source files and their sizes
    if find('d', item[D_ATTRIBUTES]) then
        return 0 -- Ignore directories
    end if
    sequence t
    t = fileext(item[D_NAME])
    if not find(t, {"c","h","cpp","hpp","cp"}) then
        return 0 -- ignore non-C/C++ files
    end if
    writefln(1, "[][][]: [] []/[z:2]/[z:2] [z:2]:[z:2]:[z:2].[z:3]",
        {path_name, {SLASH}, item[D_NAME], item[D_SIZE],
         item[D_YEAR], item[D_MONTH], item[D_DAY],
         item[D_HOUR], item[D_MINUTE], item[D_SECOND], item[D_MILLISECOND]})
    return 0 -- keep going
end function

function my_dir(sequence path)
    object d
    d = dir(path)
    if atom(d) then
        return d
    end if
    -- Sort in ascending time stamp.
    return sort_columns(d, {D_YEAR, D_MONTH, D_DAY, D_HOUR, D_MINUTE, D_SECOND, D_MILLISECOND})
end function

integer exit_code
sequence cmds
cmds = command_line()
exit_code = walk_dir(cmds[3], routine_id("process_file"), 1, routine_id("my_dir"))
5. Re: Help needed with walk_dir()
- Posted by hacker Aug 03, 2010
- 1146 views
Thanks Derek and Jim, this is great!
I actually found an example on the old forum, posted by Pete Eberlein.
include sort.e
include file.e

function by_date(sequence file1, sequence file2)
    -- compare two files by date, for custom_sort
    return compare(file1[D_YEAR..D_SECOND], file2[D_YEAR..D_SECOND])
end function

function dir_oldest_first(sequence path)
    -- Custom directory sorting function for walk_dir().
    object d
    d = dir(path)
    if atom(d) then
        return d
    end if
    return custom_sort(routine_id("by_date"), d)
end function

my_dir = routine_id("dir_oldest_first") -- for walk_dir

function process(sequence path, sequence dirinfo)
    path = dirinfo[1..1] & reverse(dirinfo[4..6]) & dirinfo[7..9]
    printf(1, "%s %02d/%02d/%04d %02d:%02d:%02d\n", path)
    return 0 -- carry on
end function

if walk_dir(".", routine_id("process"), 0) then
    puts(1, "walk_dir error")
end if
Problem is, I don't really understand how it works, I'll have to trace through the code a few times. I've never really got my head around routine_id() and functions like call_back(), what I really need is a sort of dummies guide to "advanced" euphoria programming.
I'm using version 3.1.1, I notice that in the ver 4.0 example walk_dir() has 4 parameters and not 3...
6. Re: Help needed with walk_dir()
- Posted by PeteE Aug 03, 2010
- 1163 views
Thanks Derek and Jim, this is great!
I actually found an example on the old forum, posted by Pete Eberlein.
Wow, that was a long time ago. I don't recognize the code at all.
Problem is, I don't really understand how it works, I'll have to trace through the code a few times. I've never really got my head around routine_id() and functions like call_back(), what I really need is a sort of dummies guide to "advanced" euphoria programming.
If all you want is a single directory, you can just use the dir_oldest_first() function from my example, and ignore walk_dir() altogether. I updated it below to move the routine_id() call to a constant, since I suspect it could allocate memory each time you call it. The routine_id is used as a sort of pointer-to-a-function, that you can use to tell another function to use this function for a certain operation. The custom_sort() is a great example of this - the sorting algorithm stays the same, but you can use a custom comparison function for the items being sorted. So to sort directory items by date, we need a function that compares the date fields from two directory items, and then use the routine_id of that function with custom_sort().
include sort.e
include file.e

function compare_by_date(sequence file1, sequence file2)
    -- compare two files by date, for custom_sort
    return compare(file1[D_YEAR..D_SECOND], file2[D_YEAR..D_SECOND])
end function
constant routine_id__compare_by_date = routine_id("compare_by_date")

function dir_oldest_first(sequence path)
    -- Custom directory sorting function for walk_dir().
    object d
    d = dir(path)
    if atom(d) then
        return d
    end if
    return custom_sort(routine_id__compare_by_date, d)
end function
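For the single-directory case, a rough sketch of the whole loop might look like the following. It is written against the 3.1-style includes used above; process_oldest_first() is a made-up name for illustration, and printing the file name merely stands in for whatever processing you actually need.

```euphoria
include sort.e
include file.e

function compare_by_date(sequence file1, sequence file2)
    -- compare the year..second fields of two dir() entries
    return compare(file1[D_YEAR..D_SECOND], file2[D_YEAR..D_SECOND])
end function

-- Hypothetical helper: visit every plain file in one directory, oldest first.
procedure process_oldest_first(sequence path)
    object d
    d = dir(path)
    if atom(d) then
        -- dir() returns -1 when the path does not exist
        puts(1, "no such file or directory\n")
        return
    end if
    d = custom_sort(routine_id("compare_by_date"), d)
    for i = 1 to length(d) do
        -- skip ".", ".." and any other sub-directories
        if not find('d', d[i][D_ATTRIBUTES]) then
            printf(1, "%s\n", {d[i][D_NAME]}) -- placeholder for real processing
        end if
    end for
end procedure

process_oldest_first(".")
```

Since nothing here recurses into sub-directories, walk_dir() and my_dir are not involved at all.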
I'm using version 3.1.1, I notice that in the ver 4.0 example walk_dir() has 4 parameters and not 3...
Based on Derek's example, the global variable my_dir used by walk_dir() went away in 4.0, and it is now the 4th parameter to walk_dir()
7. Re: Help needed with walk_dir()
- Posted by DerekParnell (admin) Aug 03, 2010
- 1092 views
I'm using version 3.1.1, I notice that in the ver 4.0 example walk_dir() has 4 parameters and not 3...
Based on Derek's example, the global variable my_dir used by walk_dir() went away in 4.0, and it is now the 4th parameter to walk_dir()
Well, it's deprecated rather than removed. The my_dir approach still works but it's no longer documented, and the optional 4th parameter is now the preferred way of doing this.
8. Re: Help needed with walk_dir()
- Posted by hacker Aug 04, 2010
- 1094 views
The routine_id is used as a sort of pointer-to-a-function, that you can use to tell another function to use this function for a certain operation. The custom_sort() is a great example of this - the sorting algorithm stays the same, but you can use a custom comparison function for the items being sorted. So to sort directory items by date, we need a function that compares the date fields from two directory items, and then use the routine_id of that function with custom_sort().
Thanks Peter, this is useful. Having never learned C or lower-level stuff my understanding of pointers is hazy, but I can see that it's needed at times to get the most out of Euphoria.
9. Re: Help needed with walk_dir()
- Posted by DerekParnell (admin) Aug 04, 2010
- 1066 views
The routine_id is used as a sort of pointer-to-a-function...
... Having never learned C or lower-level stuff my understanding of pointers is hazy ...
You can think of routine ids as a kind of bookmark or place holder; it's a way that you can call a routine when you don't know its name.
In Euphoria, every routine is given a number when the application is run. You can find out what that number is by using the routine_id() function, and you can call the 'anonymous' routine by using its number in the call_proc() or call_func() routine.
For example, the custom_sort function knows how to sort elements in a sequence but it doesn't know how to compare elements to work out which element should go before another element. Instead, it calls a routine that is written by you to get that information; but custom_sort does not know the name of your routine. So it calls your routine 'indirectly' using the call_func() routine with the routine id you initially passed to custom_sort().
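As a minimal sketch of that indirection (add(), multiply() and apply_op() are made-up names for illustration, not library routines), something like this should show the idea:

```euphoria
-- Two ordinary functions with the same parameter list.
function add(integer a, integer b)
    return a + b
end function

function multiply(integer a, integer b)
    return a * b
end function

-- apply_op() never sees the name of the routine it calls, only its id.
-- This is the same position custom_sort() is in with your comparison routine.
function apply_op(integer op_id, integer a, integer b)
    return call_func(op_id, {a, b})
end function

-- Look the ids up once, after the routines have been defined.
constant ADD_ID = routine_id("add")
constant MULTIPLY_ID = routine_id("multiply")

printf(1, "%d\n", apply_op(ADD_ID, 3, 4))      -- prints 7
printf(1, "%d\n", apply_op(MULTIPLY_ID, 3, 4)) -- prints 12
```

The arguments are passed to call_func() as a sequence, one element per parameter of the routine being called.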
10. Re: Help needed with walk_dir()
- Posted by hacker Aug 04, 2010
- 1043 views
Derek,
You say that routine_id() is "sort of" a pointer to a function, is this how it is actually implemented in the C source code for Euphoria?
I have to confess this is making my head hurt, but I'd really like to understand it. I'm not going to attempt reading the source code, but I have a book on C which has been gathering dust on my shelf for years, time to take a look at it I think. I know that "The C Programming Language" is highly recommended, but maybe tough going for newbies like me...
11. Re: Help needed with walk_dir()
- Posted by mattlewis (admin) Aug 04, 2010
- 1052 views
You say that routine_id() is "sort of" a pointer to a function, is this how it is actually implemented in the C source code for Euphoria?
It's a pointer in that it "points to" your routine. It does not refer to a particular place in memory (which is what is commonly meant when someone talks about a pointer). Effectively, you could think of the back end as maintaining a list of the routines, with the routine id being the index into that list. It's basically equivalent to:
procedure foo()
    -- ...do stuff
end procedure

procedure bar()
    -- ...do stuff
end procedure

procedure baz()
    -- ...do stuff
end procedure

sequence ROUTINES = {"foo", "bar", "baz"}

function my_routine_id( sequence name )
    return find( name, ROUTINES )
end function

procedure my_call_proc( integer id )
    if id = 1 then
        foo()
    elsif id = 2 then
        bar()
    elsif id = 3 then
        baz()
    else
        -- crash, bad routine id!
    end if
end procedure

integer foo_id
foo_id = my_routine_id("foo")
my_call_proc( foo_id )
I have to confess this is making my head hurt, but I'd really like to understand it. I'm not going to attempt reading the source code, but I have a book on C which has been gathering dust on my shelf for years, time to take a look at it I think. I know that "The C Programming Language" is highly recommended, but maybe tough going for newbies like me...
You can also get a better feel by looking at the Euphoria-based back end. It's still pretty dense, and not terribly easy to jump right into, but there is an implementation of routine_id() written in pure Euphoria in there.
Matt