1. Stopping phix (bug?)

When I end a EuGTK program, the window(s) close and if I'm running from a terminal, the terminal prompt returns.

When I end a PhixGTK program in the same way, the window(s) remain on the screen, non-responsive, and I have to press Ctrl-C in the terminal, which closes the non-responsive windows and gets me back to a prompt.

Watching the process monitor, it shows p using 10.9 megs of memory and 320 megs of virtual memory.

Clicking on the window close button leaves p still running, but its memory and virtual memory now show as N/A. Only Ctrl-C in the terminal or killing p in the system monitor will destroy the window and return to the prompt.

Doing the same with Euphoria, Eu disappears from the process list as soon as you close the main GTK window.

Calling gtk_main_quit() before abort makes no difference.

This is especially a problem if you compile a program and run it by clicking on the file icon. There's no way to stop it short of waiting for the "Non-responding" dialog, which only appears after a long delay, and clicking on "force quit".

Why the difference between Eu and Phix?

2. Re: Stopping phix

Hi

I have also run into this problem occasionally. This may not help, and I can't remember specifics, but it may be that memory allocated by one of the GTK DLLs is not being explicitly released, so you may have to tell GTK to release the memory it's using. I usually find that these hangs are associated with DLLs.

Cheers

Chris

3. Re: Stopping phix

I'm stumped.

It is exiting gtk_main, but then both a normal exit and abort() just hang, and hang the debugger too.

ps -a shows the process as <defunct> which leads me to https://en.wikipedia.org/wiki/Zombie_process only there's no child/parent relation here at all.

The only other thing I could try is that opAbort is implemented in pStack as

    [ELF64] 
        mov rdi,rax                     -- error_code (p1) 
        mov rax,60                      -- sys_exit(rdi=int error_code) 
        syscall 

whereas exit_thread ended up as

        [ELF64] 
--          xor rbx,rbx  
--          mov rax,60                          -- sys_exit 
--          syscall 
            xor rdi,rdi 
            call "libpthread.so.0","pthread_exit" 

So I'm thinking along those lines for trying glibc/exit(), but I can't remember how/where I ever figured the last line out...
(mind you, even if/when I figure that out, I'm betting it won't change a thing, though https://stackoverflow.com/questions/46903180/syscall-implementation-of-exit suggests it might.)

4. Re: Stopping phix

Calling system("ps -U irv axjf",0) as the last thing before ending the program (after calling gtk_main_quit) shows a difference which might explain why the window doesn't go away when the program ends. Notice there are two invocations of p test1, but only one with Eu.

 7784  7791  7791  7791 pts/1     8162 Ss    1000   0:00      \_ bash  
 7791  8162  8162  7791 pts/1     8162 Sl+   1000   0:00          \_ p test1 
 8162  8165  8162  7791 pts/1     8162 S+    1000   0:00              \_ p test1 
 8165  8166  8162  7791 pts/1     8162 S+    1000   0:00                  \_ sh -c ps -U irv axjf 
 8166  8167  8162  7791 pts/1     8162 R+    1000   0:00                      \_ ps -U irv axjf 
 
 7784  7791  7791  7791 pts/1     8265 Ss    1000   0:00      \_ bash 
 7791  8265  8265  7791 pts/1     8265 Sl+   1000   0:00          \_ eui test1 
 8265  8271  8265  7791 pts/1     8265 S+    1000   0:00              \_ sh -c ps -U irv axjf 
 8271  8272  8265  7791 pts/1     8265 R+    1000   0:00                  \_ ps -U irv axjf 

If you run the same program compiled by Phix (which exhibits the same misbehaviour as the interpreted version):

    1  8618  1277  1277 ?           -1 Sl    1000   0:00 /home/irv/pgtk/test1 
 8618  8622  1277  1277 ?           -1 Z     1000   0:00  \_ [test1] <defunct> 
 8618  8625  1277  1277 ?           -1 S     1000   0:00  \_ /home/irv/pgtk/test1 
 8625  8626  1277  1277 ?           -1 S     1000   0:00      \_ sh -c ps -U irv axjf > PuOut.txt 
 8626  8627  1277  1277 ?           -1 R     1000   0:00          \_ ps -U irv axjf 
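
For anyone puzzled by the Z state and <defunct> tag in the listing above: that is how Linux reports a process that has exited but has not yet been waited on by its parent. A minimal C sketch follows. It only demonstrates the state itself and makes no claim about which part of Phix produces it. The names proc_state and unreaped_child_state are made up for illustration, and the sketch assumes /proc is mounted.

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read the one-letter state field from /proc/<pid>/stat, or '?' on error. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *p = strrchr(buf, ')');       /* the state follows "(comm) " */
    return (p && p[1] == ' ' && p[2]) ? p[2] : '?';
}

/*
 * Fork a child that exits at once, and inspect it before waiting on it.
 * Returns the state that was seen, then reaps the child.
 */
char unreaped_child_state(void)
{
    signal(SIGCHLD, SIG_DFL);          /* make sure children are not auto-reaped */
    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(0);                      /* child: finish immediately */

    char st = '?';
    for (int i = 0; i < 100 && st != 'Z'; i++) {
        st = proc_state(pid);          /* poll for up to about one second */
        if (st != 'Z')
            usleep(10 * 1000);
    }
    waitpid(pid, NULL, 0);             /* reaping removes the zombie entry */
    return st;
}
```

Calling unreaped_child_state() should return 'Z': the child stays in the process table as a zombie until the waitpid() call collects it.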

5. Re: Stopping phix

Here's a work-around for what appears to be two copies of phix launched when I run one GTK program. Create a tiny shared library: libpid.so

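#include <unistd.h>   /* getpid(), getppid() */

/* Return the ID of the calling process. */
int get_pid(void)
{
  return (int)getpid();
}

/* Return the ID of the calling process's parent. */
int get_ppid(void)
{
  return (int)getppid();
}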

Then, on startup of your program, obtain the pid and ppid (process_id and parent_process_id) by calling these two functions e.g.

constant PID = open_dll("libpid.so") 
integer pid = c_func(define_c_func(PID,"get_pid",{},C_INT),{}) 
integer ppid = c_func(define_c_func(PID,"get_ppid",{},C_INT),{}) 

Then, when your program ends, call

system(sprintf("kill %d %d",{pid,ppid}))

Without this, your graphics remain on screen but unresponsive. Just a guess, but this may have something to do with GTK using threads. Euphoria doesn't "do" threads, so it doesn't have this problem. Maybe that, or something else entirely?

At least, with this patch, your program behaves properly, and cleans up after itself. Now, if I could learn to do that in my workshop...

6. Re: Stopping phix

irv said...
system(sprintf("kill %d %d",{pid,ppid}))

On winders, I used taskinfo and taskkill. On Win7, taskinfo was changed to tasklist. There are other task* tools too, found in /system32/.

Kat

7. Re: Stopping phix

I think I finally found it. Simply change pStack.e line 1771:

--      mov rax,60                      -- sys_exit(rdi=int error_code) 
        mov rax,231                     -- sys_exit_group(rdi=int error_code)  

Obviously you then need to run ./p -cp to rebuild; after that, as long as you're calling gtk_main_quit, it should shut down properly.

Hmm, I tried all these things last week, turns out I was testing against the wrong pigging file...
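
Why that one-number change matters, as a minimal C sketch rather than Phix's actual code: on Linux, syscall 60 (exit) ends only the calling thread, whereas 231 (exit_group) ends every thread in the process. The worker thread here is an assumption standing in for whatever threads GTK may have started, and the names idle_forever and child_status_after are made up for illustration. Build with -pthread.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker that never finishes (hypothetical stand-in for a GTK helper thread). */
static void *idle_forever(void *arg)
{
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/*
 * Fork a child that starts one extra thread and then leaves through the
 * raw syscall number in exit_nr (SYS_exit or SYS_exit_group).
 * Returns the child's exit status, or -1 if the child was still alive
 * after about one second and had to be killed.
 */
int child_status_after(long exit_nr, int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -2;
    if (pid == 0) {
        pthread_t t;
        pthread_create(&t, NULL, idle_forever, NULL);
        syscall(exit_nr, code);        /* does not return */
        _exit(99);                     /* not reached */
    }
    for (int i = 0; i < 100; i++) {    /* poll for up to about one second */
        int status;
        if (waitpid(pid, &status, WNOHANG) == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -3;
        usleep(10 * 1000);
    }
    kill(pid, SIGKILL);                /* the worker kept the process alive */
    waitpid(pid, NULL, 0);
    return -1;
}
```

child_status_after(SYS_exit_group, 7) returns 7 straight away. child_status_after(SYS_exit, 7) returns -1, because the main thread has gone but the worker keeps the process alive, which matches the "window stays up until you kill it" symptom.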

8. Re: Stopping phix

This is why I like Phix. Bugs can be fixed without having to wait forever for someone to re-compile the works.

Thanks!

9. Re: Stopping phix

For completeness (on this thread), the equivalent change for 32-bit is a couple of lines above:

--      mov eax,1                       -- sys_exit(ebx=int error_code) 
        mov eax,252                     -- sys_exit_group(ebx=int error_code) 

10. Re: Stopping phix

This works for interpreted code, but compiled still hangs.

Watching the System Monitor shows something interesting:

When you execute "test1" as a compiled program (either from a command line or by clicking on the compiled program icon), the System Monitor shows one instance of test1 running. Things work as they should.

When you click on a close button to shut down test1, a SECOND, "zombie" instance of test1 pops up on the sys monitor.

A fraction of a second after that, the first instance is also converted to a zombie. It's just like the TV show!

Then you get the "Not responding" window after a few seconds delay.

Another clue might be: when you run the program in the interpreter, from a terminal, you are returned to the terminal prompt after the program ends. e.g.
irv@irv-Mint19:/pgtk$

When you run a compiled version, you are not returned to the prompt until you do a "force quit" from the "Not responding" dialog.

11. Re: Stopping phix

irv said...

This works for interpreted code, but compiled still hangs.

That's a shame, it works flawlessly for me here.

12. Re: Stopping phix

petelomax said...
irv said...

This works for interpreted code, but compiled still hangs.

That's a shame, it works flawlessly for me here.

You both need to be working on the same problem.

Download a Mint ISO and write it to a USB stick. You then both have an identical starting point for a phix installation. Yes, phix will only exist until you turn the power off, which is an advantage since you get a fresh install for every test.

be well
_tom

13. Re: Stopping phix

To eliminate the possibility that GTK or my code is causing the problem, we eliminate both:

puts(1,"Hello\n") 
system("ps -U irv axjf | grep hello",0) 

irv@irv-Mint19:~/pgtk$ eui hello 
Hello 
 7209  7234  7234  7209 pts/1     7234 S+    1000   0:00          \_ eui hello 
 7234  7235  7234  7209 pts/1     7234 S+    1000   0:00              \_ sh -c ps -U irv axjf | grep hello 
 7235  7237  7234  7209 pts/1     7234 S+    1000   0:00                  \_ grep hello 
irv@irv-Mint19:~/pgtk$  

irv@irv-Mint19:~/pgtk$ p hello 
Hello 
 7209  7239  7239  7209 pts/1     7239 S+    1000   0:00          \_ p hello 
 7239  7240  7239  7209 pts/1     7239 S+    1000   0:00              \_ p hello 
 7240  7241  7239  7209 pts/1     7239 S+    1000   0:00                  \_ sh -c ps -U irv axjf | grep hello 
 7241  7243  7239  7209 pts/1     7239 S+    1000   0:00                      \_ grep hello 
irv@irv-Mint19:~/pgtk$ 

Note the difference.

I think it is safe to say that the problem is caused when one of those 2 instances of 'p' dies but just won't go away.

14. Re: Stopping phix

In builtins/syswait.ew line 430 we have:

                    [ELF32] 
                        mov eax,[child] 
--                      shl eax,2 
                        push eax 
                        call "libc.so.6","system" 
                        add esp,4 
                        mov ebx,eax  
                        mov eax,1                   -- sys_exit 
                        int 0x80  
                    [ELF64] 
                        mov rdi,[child] 
--                      shl rdi,2 
                        call "libc.so.6","system" 
                        mov rdi,rax 
                        mov rax,60                  -- sys_exit 
                        syscall 
                    [] 

Maybe (untried) if you replace those sys_exit with sys_exit_group as above and in the following recap from pStack.e:

    [ELF32] 
        mov ebx,eax                     -- error_code (p1) 
--      mov eax,1                       -- sys_exit(ebx=int error_code) 
        mov eax,252                     -- sys_exit_group(ebx=int error_code) 
        int 0x80 
--      xor ebx,ebx                     -- (common requirement after int 0x80) 
    [ELF64] 
        mov rdi,rax                     -- error_code (p1) 
--      mov rax,60                      -- sys_exit(rdi=int error_code) 
        mov rax,231                     -- sys_exit_group(rdi=int error_code)  
        syscall 
    [] 

it may help?...

15. Re: Stopping phix

Unfortunately, no success. No harm, either, as far as I can see.

What puzzles me is why are there two 'p's running in the first place? Threads, perhaps?

And do you get the same results, or only one instance when running this test?

16. Re: Stopping phix

If I invoke the clib system('gedit'), it will not return until gedit closes.
So, in the bit of code you just edited, which implements the Phix system(), I do a fork and let the child take that wait penalty.
It is in fact the system(ps) that is creating the interim child (I couldn't have told you that when I woke up this morning).

I also get two p hello, but no hang and no zombies.

So, is it "still going wrong" as in "two p hello" or as in "compiled gtkwin hangs"?

17. Re: Stopping phix

Still get the two instances of p hello, whether running it interpreted or compiled.

It doesn't "hang", it comes back to the prompt.

If you add a line to the test, and open a process monitor, you'll see two instances of p, one a zombie. All I can figure is that phix (or the OS, somehow) is invoking p again on startup.

puts(1,"Hello\n") 
system("ps -U irv axjf | grep hello",0) 
?wait_key() -- add this 

BUT WAIT! I may have solved it:

?wait_key() -- add this 
puts(1,"Hello\n") 
system("ps -U irv axjf | grep hello",0) 
?wait_key() 

Now, watch the system monitor. ONE instance is running. Hit any key. Now TWO instances are running! The second is brain dead. That right there ought to be a clue as to what is happening.

I suspect the reason the GtkWindows "hang around" when you use them is that the zombie process doesn't have the ability to tell GTK to clean up and shut down. Zombies aren't very good communicators, according to the TV show...

18. Re: Stopping phix

system(ps) is doing a fork.

19. Re: Stopping phix

petelomax said...

system(ps) is doing a fork.

Yeah, that makes sense. oops sorry for the wild goose chase.

But it still doesn't explain the results in message 13 above. I'm still curious about why the exact same program calls eui once, but p twice.

Anyway, the previous changes mean that interpreted programs close properly. I'll worry about compiled ones later. Working around the problem is easy enough.

20. Re: Stopping phix

irv said...

But it still doesn't explain the results in message 13 above. I'm still curious about why the exact same program calls eui once, but p twice.

It makes perfect sense to me. Does this help:

parent  pid   
 7209  7239    \_ p hello  -- original/parent-post-fork paused at wait()   [having exited builtins\syswait.ew] 
 7239  7240       \_ p hello  --        child-post-fork paused at system()  [still inside builtins\syswait.ew] 
 7240  7241           \_ sh -c ps -U irv axjf | grep hello  
 7241  7243               \_ grep hello  
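
The shape of that tree can be reproduced in C. This is only a sketch of the fork-then-system() arrangement described in this thread, not the code in builtins\syswait.ew, and system_via_fork is a made-up name. While the command runs, ps shows the program twice (the parent in waitpid(), the child inside system()), followed by sh and the command itself.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a system() built on fork: the child makes
 * the blocking libc system() call, the parent only waits for the child.
 * Returns cmd's exit status, or -1 on failure.
 */
int system_via_fork(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int st = system(cmd);              /* blocks until cmd ends */
        _exit(st != -1 && WIFEXITED(st) ? WEXITSTATUS(st) : 127);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)   /* waiting also reaps the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, system_via_fork("exit 3") returns 3. A parent that skipped the waitpid() call would instead leave the child behind as a <defunct> entry once the command finished.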

21. Re: Stopping phix

OK. Which of the first two steps isn't necessary for Euphoria to do? There was no wait_key() involved in the test program used in message 13:

puts(1,"Hello\n")  
system("ps -U irv axjf | grep hello",0)  

22. Re: Stopping phix

irv said...

OK. Which of the first two steps isn't necessary for Euphoria to do?

Afaict eui invokes [C's] system() directly and somehow it returns immediately, which is point blank dead against everything I have ever read about it and my direct experience in a low-level debugger.
I am now utterly convinced this is a complete red herring and absolutely nothing to do with whatever else is going wrong.

23. Re: Stopping phix

petelomax said...

Afaict eui invokes [C's] system() directly and somehow it returns immediately, which is point blank dead against everything I have ever read about it and my direct experience in a low-level debugger.
I am now utterly convinced this is a complete red herring and absolutely nothing to do with whatever else is going wrong.

I agree.
