1. Phix segfault when wildcard_match file between 1.5 and 2 million lines
- Posted by lib9 in February
- 966 views
Hi,
I set up a program to test the execution speed of Phix.
The program should read a text file line by line and discard all lines that match two patterns, as well as the previous line.
All other lines should be kept.
The program deliberately does not save the kept lines, because I didn't want to include the time needed for saving.
When the text file file.txt has up to roughly 1.5 million lines, the program finishes successfully after a few seconds.
But if the text file is over 2 million lines, there's a segmentation fault.
The 2-million-line file is just a repetition of segments of the 1.5-million-line file, so there's no new symbol, character or pattern in the extra 500,000 lines.
The text file has some Unicode characters, like ★★ Some word ★★
integer file_in
constant ERROR = 2
string t_file = "file.txt"

file_in = open(t_file, "r")
if file_in = -1 then
    puts(ERROR, "Could not open " & t_file)
    abort(2)
end if

object txt = read_lines(t_file)
integer match0, match1
sequence buffer = {}
object line
integer skip_next = 0, i = 0

while 1 do
    i += 1
    line = gets(file_in)
    if atom(line) then
        exit
    end if
    if skip_next = 1 then
        skip_next = 0
        continue
    end if
    match0 = wildcard_match("*some string*", line)
    match1 = wildcard_match("*other*", line)
    if match0 + match1 = 0 then
        buffer = append(buffer, line)
    else
        skip_next = 1
    end if
end while
close(file_in)
Pete, I can send you the text file if you want to try it yourself.
2. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines
- Posted by petelomax in February
- 964 views
Pete, I can send you the text file if you want to try it yourself.
Yeah, I cannot reproduce that here, so I'm going to need that file.
If you want/have nowhere else to put it, go to http://phix.x10.mx/pmwiki/pmwiki.php?n=Profiles.Pete and click on the file.txt.zip link, which should allow you to upload it.
Since you are probably new to PCAN, you might want to read http://phix.x10.mx/pmwiki/pmwiki.php?n=Main.Introduction first.
3. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines
- Posted by lib9 in February
- 904 views
Hi Pete,
I've already uploaded the file, compressed. It makes no sense now as I stripped it, but it's still segfaulting.
Thank you.
4. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines
- Posted by petelomax in February
- 896 views
I've already uploaded the file, compressed.
That's nice. Reminds me of a brilliant phone prank I once heard: guy gets a call from a satellite dish salesman, so pretends to be a detective at a murder scene, with a proper deep southern drawl:
(after warning him not to hang up, asking him how he knows the victim, and generally scaring the pants off him and not letting him get a word in edgeways)
Detective: Where're you now?
Terrified Salesman: At work.
Detective: You tryin to be funny? What would I need to put on the OUTSIDE of an envelope to get a letter t'yr sorry ass?
It makes no sense now as I stripped it, but it's still segfaulting.
Let's try a different tack. Does it still crash if you remove object txt = read_lines(t_file)?
Does it work if you replace wildcard_match("*some string*",line) with match("some string",line) (ditto for "other")?
How about (undoing that then, and) if you remove buffer = append(buffer, line)?
If you then add ?i after incrementing it, does it always crash at the same place?
Assuming it does, on line NNNNNN, what does if i>=NNNNNN then ?line end if show?
And is the ex.err that generates small enough to be pasted here?
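The checks listed above could be combined into a single instrumented test run, roughly as sketched below. This is only an illustration, not code from the thread: it drops the read_lines() call, uses match() instead of wildcard_match(), prints the line counter, and dumps each line once a threshold is reached. The constant WATCH_FROM is a hypothetical placeholder for NNNNNN and would need to be set to wherever the crash is actually observed.

```phix
-- Diagnostic sketch (assumption: same input file and patterns as the original post).
constant t_file = "file.txt"
constant WATCH_FROM = 1500000   -- hypothetical threshold, stands in for NNNNNN

integer file_in = open(t_file, "r")
if file_in = -1 then
    puts(2, "Could not open " & t_file & "\n")
    abort(2)
end if

sequence buffer = {}
object line
integer skip_next = 0, i = 0

while 1 do
    i += 1
    ?i                                  -- show progress, to see where it stops
    line = gets(file_in)
    if atom(line) then exit end if      -- gets() returns -1 at end of file
    if i >= WATCH_FROM then ?line end if
    if skip_next = 1 then
        skip_next = 0
        continue
    end if
    -- match() returns 0 when the substring is not found
    if match("some string", line) = 0
    and match("other", line) = 0 then
        buffer = append(buffer, line)
    else
        skip_next = 1
    end if
end while
close(file_in)
?length(buffer)
```

Commenting out the append() line, or putting wildcard_match() back, then tests the remaining questions one change at a time.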
5. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines
- Posted by jimcbrown (admin) in February
- 850 views
I've already uploaded the file, compressed.
That's nice. Reminds me of a brilliant phone prank I once heard: Detective: You tryin to be funny? What would I need to put on the OUTSIDE of an envelope to get a letter t'yr sorry ass?
Huh, maybe I make too many assumptions. I read lib9's quote as meaning that the file was uploaded to the spot you asked for, i.e. through the http://phix.x10.mx/pmwiki/pmwiki.php?n=Profiles.Pete link that you mentioned earlier in this thread (though I haven't verified if that's the case or not).
6. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines
- Posted by lib9 in February
- 847 views
That's nice. Reminds me of a brilliant phone prank I once heard: guy gets a call from a satellite dish salesman, so pretends to be a detective at a murder scene, with a proper deep southern drawl:
(after warning him not to hang up, asking him how he knows the victim, and generally scaring the pants off him and not letting him get a word in edgeways)
Detective: Where're you now?
Terrified Salesman: At work.
Detective: You tryin to be funny? What would I need to put on the OUTSIDE of an envelope to get a letter t'yr sorry ass?
I'm not a native English speaker and I'm missing the message you're trying to convey to me. I'd prefer direct wording.
Let's try a different tack. Does it still crash if you remove object txt = read_lines(t_file)?
Does it work if you replace wildcard_match("*some string*",line) with match("some string",line) (ditto for "other")?
How about (undoing that then, and) if you remove buffer = append(buffer, line)?
If you then add ?i after incrementing it, does it always crash at the same place?
Assuming it does, on line NNNNNN, what does if i>=NNNNNN then ?line end if show?
And is the ex.err that generates small enough to be pasted here?
1. If I remove object txt = read_lines(t_file):
It doesn't segfault.
I tried two methods of reading the input file and seem to have forgotten to remove that line.
2. match("some string",line) (without asterisks), but keeping object txt = read_lines(t_file):
It continues to segfault.
3. Like 2., but without object txt = read_lines(t_file)
Runs fine.
4. Remove buffer = append(buffer, line), keep the rest as in my code, so with object txt = read_lines(t_file).
Runs fine.
5. ?i
No, it stops at different lines, like line number 1993109, 1992407, 1992925.
6. It didn't generate any ex.err file.
Thanks.
7. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines
- Posted by lib9 in February
- 836 views
I've already uploaded the file, compressed.
That's nice. Reminds me of a brilliant phone prank I once heard: Detective: You tryin to be funny? What would I need to put on the OUTSIDE of an envelope to get a letter t'yr sorry ass?
Huh, maybe I make too many assumptions. I read lib9's quote as meaning that the file was uploaded to the spot you asked for, i.e. through the http://phix.x10.mx/pmwiki/pmwiki.php?n=Profiles.Pete link that you mentioned earlier in this thread (though I haven't verified if that's the case or not).
Correct jimcbrown. I uploaded it there.
8. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines
- Posted by petelomax in February
- 829 views
I uploaded it there.
Ah! I found it now, as file.txt.gz rather than file.txt.zip - so I guess you are running on Linux?
I've just tried it on Mint Cinnamon and it is crashing now (unlike my previous tests on Windows).
Update: I can now tell you it seems to be crashing when trying to say you have run out of memory....
Maybe a "paltry" 192MB should not really trouble it (and it doesn't on Windows, but see the final note).
First thing though: I'm testing mmap() for null returns, but on failure it returns (void*)-1, so let's fix that:
(Actually, all this probably won't really help you very much...)
call "libc.so.6","mmap"
so let's just add right after that:
--9/2/24:
test rax,rax
jg @f
    xor rax,rax
@@:
Which ran me straight into some missing error handling (the five lines above int3 in the next snippet; no doubt other similar instances exist):
call :%pGetPool -- allocate rcx bytes, rounded up
test rax,rax
-- jz :memoryallocationfailure
jnz @f
    --9/2/24:
    mov rdx,[rsp+48]
    mov al,33 -- e33maf
    sub rdx,1
    jmp :!iDiag
    int3
@@:
A quick "./p -c p" later...
It now goes a bit mental with "Your program has run out of memory, one moment please", but you can kill that with Ctrl C.
If I remove object txt = read_lines(t_file): It doesn't segfault.
Well that's certainly going to put this on the back burner. If you are going to load a really big file you really should process
it one line at a time and throw them away once dealt with. It takes Phix (running on a VM, so not fast) about 40 seconds
to plough through that file. In contrast, I gave up trying to load it in gedit after 10 minutes and it was not even 1/4
the way through, so to fully load the same file would take it (and gedit is not noted for being slow) at least 45 minutes.
[Of course gedit manages to load the first screenful very quickly, that's not what I'm talking about, try scrolling.]
Presumably what you want is to scrape some info and store it in a much faster database or similar for later use.
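The one-line-at-a-time approach described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the output file name kept.txt and the choice to write kept lines straight to disk are hypothetical, and match() is used in place of wildcard_match(). The point is that no line is held in memory after it has been dealt with, so neither read_lines() nor a growing buffer is needed.

```phix
-- Streaming sketch: read, decide, write or discard, and move on.
constant in_name  = "file.txt",
         out_name = "kept.txt"      -- hypothetical output file name

integer fin = open(in_name, "r")
if fin = -1 then
    puts(2, "Could not open " & in_name & "\n")
    abort(2)
end if
integer fout = open(out_name, "w")
if fout = -1 then
    puts(2, "Could not open " & out_name & "\n")
    abort(2)
end if

integer kept = 0
integer skip_next = 0
object line

while 1 do
    line = gets(fin)
    if atom(line) then exit end if      -- -1 signals end of file
    if skip_next = 1 then
        skip_next = 0                   -- drop the line after a match
    elsif match("some string", line) != 0
       or match("other", line) != 0 then
        skip_next = 1                   -- drop the matching line itself
    else
        puts(fout, line)                -- gets() keeps the trailing "\n"
        kept += 1
    end if
end while
close(fin)
close(fout)
printf(1, "%d lines kept\n", kept)
```

Only the current line and a counter are kept in memory, so usage stays flat however long the input is.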
9. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines
- Posted by jimcbrown (admin) in February
- 813 views
That's nice. Reminds me of a brilliant phone prank I once heard: guy gets a call from a satellite dish salesman, so pretends to be a detective at a murder scene, with a proper deep southern drawl:
(after warning him not to hang up, asking him how he knows the victim, and generally scaring the pants off him and not letting him get a word in edgeways)
Detective: Where're you now?
Terrified Salesman: At work.
Detective: You tryin to be funny? What would I need to put on the OUTSIDE of an envelope to get a letter t'yr sorry ass?
I'm not a native English speaker and I'm missing the message you're trying to convey to me. I'd prefer direct wording.
Ah, I think Pete was just asking where you uploaded the files to, and possibly under what file name, etc. The small details to help him locate it.