1. Phix segfault when wildcard_match file between 1.5 and 2 million lines

Hi,

I set up a program to test the execution speed of Phix.
The program should read a text file line by line and discard all lines that match two patterns, as well as the previous line.
All other lines should be kept.
It does not write the kept lines to disk, because I didn't want to include the time needed for saving.

When the text file file.txt has up to roughly 1.5 million lines, the program finishes successfully after a few seconds.
But if the text file has over 2 million lines, there's a segmentation fault.
The 2-million-line file is just a repetition of segments of the 1.5-million-line file, so there is no new symbol, character or pattern in the extra 500,000 lines.
The text file contains some Unicode characters, such as ★★ Some word ★★

integer file_in 
constant ERROR = 2 
string t_file = "file.txt" 
file_in = open(t_file, "r") 
if file_in = -1 then  
	puts(ERROR, "Could not open " & t_file) 
	abort(2) 
end if 
 
object txt = read_lines(t_file) 
integer match0, match1 
sequence buffer = {} 
object line 
integer skip_next = 0, i = 0 
while 1 do 
	i += 1 
	line = gets(file_in) 
	if atom(line) then 
		exit 
	end if 
	if skip_next = 1 then 
		skip_next = 0 
		continue 
	end if 
	match0 = wildcard_match("*some string*", line) 
	match1 = wildcard_match("*other*", line) 
	if match0 + match1 = 0 then 
		buffer = append(buffer, line) 
	else 
		skip_next = 1 
	end if 
end while 
close(file_in) 

Pete, I can send you the text file if you want to try it yourself.


2. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines

lib9 said...

Pete, I can send you the text file if you want to try it yourself.

Yeah, I cannot reproduce that here, so I'm going to need that file.
If you have nowhere else to put it, go to http://phix.x10.mx/pmwiki/pmwiki.php?n=Profiles.Pete and click on the file.txt.zip link, which should allow you to upload it.
Since you are probably new to PCAN, you might want to read http://phix.x10.mx/pmwiki/pmwiki.php?n=Main.Introduction first.


3. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines

Hi Pete,

I've already uploaded the file, compressed. It makes no sense now as I stripped it, but it's still segfaulting.

Thank you.


4. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines

lib9 said...

I've already uploaded the file, compressed.

That's nice. Reminds me of a brilliant phone prank I once heard: guy gets a call from a satellite dish salesman, so pretends to be a detective at a murder scene, with a proper deep southern drawl:
(after warning him not to hang up, asking him how he knows the victim, and generally scaring the pants off him and not letting him get a word in edgeways)
Detective: Where're you now?
Terrified Salesman: At work.
Detective: You tryin to be funny? What would I need to put on the OUTSIDE of an envelope to get a letter t'yr sorry ass?

lib9 said...

It makes no sense now as I stripped it, but it's still segfaulting.

Let's try a different tack. Does it still crash if you remove object txt = read_lines(t_file)?
Does it work if you replace wildcard_match("*some string*",line) with match("some string",line) (ditto for "other")?
How about (undoing that then, and) if you remove buffer = append(buffer, line)?
If you then add ?i after incrementing it, does it always crash at the same place?
Assuming it does, on line NNNNNN, what does if i>=NNNNNN then ?line end if show?
And is the ex.err that generates small enough to be pasted here?
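
As a rough sketch only, the questions above could be applied to the original loop roughly like this. PROBE_FROM is a made-up stand-in for NNNNNN (not a value from this thread), and the "question N" numbering in the comments is just for reference; each change is meant to be tried on its own.

```phix
-- Sketch: the original loop with the suggested probes marked.
-- PROBE_FROM is hypothetical; substitute the line number where it crashes.
constant PROBE_FROM = 1990000
string t_file = "file.txt"
integer file_in = open(t_file, "r")
if file_in = -1 then
    puts(2, "Could not open " & t_file & "\n")
    abort(2)
end if
object txt = read_lines(t_file)     -- question 1: comment out and re-run
sequence buffer = {}
object line
integer skip_next = 0, i = 0
while 1 do
    i += 1
    ?i                              -- question 4: last number printed before the crash
    line = gets(file_in)
    if atom(line) then exit end if  -- gets() returns -1 at end of file
    if i >= PROBE_FROM then ?line end if    -- question 5: show the suspect lines
    if skip_next then
        skip_next = 0
        continue
    end if
    -- question 2: try match("some string", line) / match("other", line) here instead
    if wildcard_match("*some string*", line)
    or wildcard_match("*other*", line) then
        skip_next = 1
    else
        buffer = append(buffer, line)   -- question 3: comment out and re-run
    end if
end while
close(file_in)
```

Note that ?i prints every line number, so redirecting the output to a file and looking at its tail is probably the practical way to read it.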


5. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines

lib9 said...

I've already uploaded the file, compressed.

petelomax said...

That's nice. Reminds me of a brilliant phone prank I once heard: Detective: You tryin to be funny? What would I need to put on the OUTSIDE of an envelope to get a letter t'yr sorry ass?

Huh, maybe I make too many assumptions. I read lib9's quote as meaning that the file was uploaded to the spot you asked for, i.e. through the http://phix.x10.mx/pmwiki/pmwiki.php?n=Profiles.Pete link that you mentioned earlier in this thread (though I haven't verified if that's the case or not).


6. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines

petelomax said...

That's nice. Reminds me of a brilliant phone prank I once heard: guy gets a call from a satellite dish salesman, so pretends to be a detective at a murder scene, with a proper deep southern drawl:
(after warning him not to hang up, asking him how he knows the victim, and generally scaring the pants off him and not letting him get a word in edgeways)
Detective: Where're you now?
Terrified Salesman: At work.
Detective: You tryin to be funny? What would I need to put on the OUTSIDE of an envelope to get a letter t'yr sorry ass?

I'm not a native English speaker and I'm missing the message you're trying to convey. A direct explanation is preferable for me.

petelomax said...

Let's try a different tack. Does it still crash if you remove object txt = read_lines(t_file)?
Does it work if you replace wildcard_match("*some string*",line) with match("some string",line) (ditto for "other")?
How about (undoing that then, and) if you remove buffer = append(buffer, line)?
If you then add ?i after incrementing it, does it always crash at the same place?
Assuming it does, on line NNNNNN, what does if i>=NNNNNN then ?line end if show?
And is the ex.err that generates small enough to be pasted here?


1. If I remove object txt = read_lines(t_file):
It doesn't segfault.
I tried two methods of reading the input file and seem to have forgotten about that line.

2. Using match("some string",line) (no asterisks), but keeping object txt = read_lines(t_file):
It still segfaults.

3. Like 2., but without object txt = read_lines(t_file)
Runs fine.

4. Remove buffer = append(buffer, line), keep the rest as in my code, so with object txt = read_lines(t_file).
Runs fine.

5. ?i
No, it stops at different lines, like line number 1993109, 1992407, 1992925.

6. It didn't generate any ex.err file.

Thanks.


7. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines

jimcbrown said...
lib9 said...

I've already uploaded the file, compressed.

petelomax said...

That's nice. Reminds me of a brilliant phone prank I once heard: Detective: You tryin to be funny? What would I need to put on the OUTSIDE of an envelope to get a letter t'yr sorry ass?

Huh, maybe I make too many assumptions. I read lib9's quote as meaning that the file was uploaded to the spot you asked for, i.e. through the http://phix.x10.mx/pmwiki/pmwiki.php?n=Profiles.Pete link that you mentioned earlier in this thread (though I haven't verified if that's the case or not).

Correct, jimcbrown. I uploaded it there.


8. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines

lib9 said...

I uploaded it there.

Ah! I found it now, as file.txt.gz rather than file.txt.zip - so I guess you are running on Linux?
I've just tried it on Mint Cinnamon and it is crashing now (unlike my previous tests on Windows).

Update: I can now tell you it seems to be crashing when trying to say you have run out of memory....
Maybe a "paltry" 192MB should not really trouble it (and it doesn't on Windows, but see the final note).
First thing though: I'm testing mmap() for null returns, whereas on failure it actually returns (void*)-1 (MAP_FAILED), so let's fix that:
(Actually, all this probably won't really help you very much...)

builtins\VM\pHeap.e line 1074 said...

call "libc.so.6","mmap"

so let's just add right after that:

--9/2/24: 
            test rax,rax 
            jg @f 
                xor rax,rax 
         @@: 

Which ran me straight into some missing error handling (the five lines above int3 next, no doubt other similar instances exist):

builtins\VM\pHeap.e line 3691 said...

        call :%pGetPool                 -- allocate rcx bytes, rounded up 
        test rax,rax 
--      jz :memoryallocationfailure 
        jnz @f 
--9/2/24: 
            mov rdx,[rsp+48] 
            mov al,33   -- e33maf 
            sub rdx,1 
            jmp :!iDiag 
            int3 
      @@: 

A quick "./p -c p" later...
It now goes a bit mental with "Your program has run out of memory, one moment please", but you can kill that with Ctrl C.

lib9 said...

If I remove object txt = read_lines(t_file): It doesn't segfault.

Well that's certainly going to put this on the back burner. If you are going to load a really big file you really should process
it one line at a time and throw them away once dealt with. It takes Phix (running on a VM, so not fast) about 40 seconds
to plough through that file. In contrast, I gave up trying to load it in gedit after 10 minutes and it was not even 1/4
the way through, so to fully load the same file would take it (and gedit is not noted for being slow) at least 45 minutes.
[Of course gedit manages to load the first screenful very quickly, that's not what I'm talking about, try scrolling.]
Presumably what you want is to scrape some info and store it in a much faster database or similar for later use.
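
A minimal sketch of that one-line-at-a-time approach, under a few assumptions: the same file.txt and patterns as the original post, match() swapped in for wildcard_match() (equivalent for these simple "*text*" patterns), and counters standing in for whatever you would really do with each kept line.

```phix
-- Sketch: stream the file and keep only counts, so memory use stays flat.
-- No read_lines() and no growing buffer; names here are illustrative only.
constant t_file = "file.txt"
integer fn = open(t_file, "r")
if fn = -1 then
    puts(2, "Could not open " & t_file & "\n")
    abort(2)
end if
atom t0 = time()
object line
integer kept = 0, dropped = 0, skip_next = 0
while 1 do
    line = gets(fn)
    if atom(line) then exit end if      -- gets() returns -1 at end of file
    if skip_next then
        skip_next = 0                   -- line following a match: discard it
        dropped += 1
    elsif match("some string", line) or match("other", line) then
        skip_next = 1                   -- matching line: discard it and the next
        dropped += 1
    else
        kept += 1                       -- handle the line here, then let it go
    end if
end while
close(fn)
printf(1, "kept %d, dropped %d in %.2fs\n", {kept, dropped, time()-t0})
```

Because nothing is accumulated, the memory needed should not depend on the number of lines in the file.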


9. Re: Phix segfault when wildcard_match file between 1.5 and 2 million lines

petelomax said...

That's nice. Reminds me of a brilliant phone prank I once heard: guy gets a call from a satellite dish salesman, so pretends to be a detective at a murder scene, with a proper deep southern drawl:
(after warning him not to hang up, asking him how he knows the victim, and generally scaring the pants off him and not letting him get a word in edgeways)
Detective: Where're you now?
Terrified Salesman: At work.
Detective: You tryin to be funny? What would I need to put on the OUTSIDE of an envelope to get a letter t'yr sorry ass?

lib9 said...

I'm not a native English speaker and I'm missing the message you're trying to convey. A direct explanation is preferable for me.

Ah, I think Pete was just asking where you uploaded the files to, and possibly under what file name, etc. The small details to help him locate it.
