Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines
- Posted by petelomax Feb 08, 2024
I uploaded it there.
Ah! I found it now, as file.txt.gz rather than file.txt.zip - so I guess you are running on Linux?
I've just tried it on Mint Cinnamon and it is crashing now (unlike my previous tests on Windows).
Update: I can now tell you it seems to be crashing when trying to say you have run out of memory....
Maybe a "paltry" 192MB should not really trouble it (and it doesn't on Windows, but see the final note).
First thing though: I was testing mmap() for a null return, but on failure it returns MAP_FAILED, i.e. (void*)-1, so let's fix that:
(Actually, all this probably won't really help you very much...)
call "libc.so.6","mmap"
so let's just add right after that:
    --9/2/24:
    test rax,rax
    jg @f
        xor rax,rax
  @@:
Which ran me straight into some missing error handling (the five lines above the int3 in the next snippet; no doubt other similar instances exist):
    call :%pGetPool                 -- allocate rcx bytes, rounded up
    test rax,rax
--  jz :memoryallocationfailure
    jnz @f
        --9/2/24:
        mov rdx,[rsp+48]
        mov al,33                   -- e33maf
        sub rdx,1
        jmp :!iDiag
        int3
  @@:
A quick "./p -c p" later...
It now goes a bit mental with "Your program has run out of memory, one moment please", but you can kill that with Ctrl C.
If I remove object txt = read_lines(t_file): It doesn't segfault.
Well that's certainly going to put this on the back burner. If you are going to load a really big file you really should process
it one line at a time and throw them away once dealt with. It takes Phix (running on a VM, so not fast) about 40 seconds
to plough through that file. In contrast, I gave up trying to load it in gedit after 10 minutes and it was not even 1/4
the way through, so to fully load the same file would take it (and gedit is not noted for being slow) at least 45 minutes.
[Of course gedit manages to load the first screenful very quickly, that's not what I'm talking about, try scrolling.]
Presumably what you want is to scrape some info and store it in a much faster database or similar for later use.
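
For what it's worth, here is a rough sketch of the "one line at a time" approach, rather than read_lines() on the whole file. The routine name count_matches, the file name and the pattern are all made up for illustration, and I am only counting hits where you would presumably store something more useful.

```phix
-- Minimal sketch: count the lines of a text file that match a wildcard
-- pattern, holding only one line in memory at a time.
-- count_matches, "file.txt" and "*error*" are illustrative names only.
function count_matches(string filename, string pattern)
    integer fn = open(filename,"r"),
            hits = 0
    object line
    if fn=-1 then
        return -1                       -- could not open the file
    end if
    while true do
        line = gets(fn)                 -- next line, including any trailing '\n'
        if atom(line) then exit end if  -- gets() returns -1 at end of file
        if wildcard_match(pattern,line) then
            hits += 1
        end if
        -- line is simply overwritten on the next iteration, so nothing piles up
    end while
    close(fn)
    return hits
end function

-- shows -1 if "file.txt" cannot be opened
printf(1,"%d matching lines\n",{count_matches("file.txt","*error*")})
```

Since gets() leaves the newline on the end of each line, a pattern without a trailing * would need the line trimmed first.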