[DGD] DGD Segmentation Fault After Dump Restore

bart at wotf.org bart at wotf.org
Sat Sep 21 13:58:57 CEST 2013


Looks like I managed to trigger the same or a related issue.

I can't reliably reproduce it, though it seems pretty much certain that
recently upgraded LWOs are involved.

Yesterday I changed a bit of code to use LWOs instead of clones as data
objects for keeping track of some user data. As soon as I made that change, I
started noticing intermittent crashes, usually after multiple recompiles of
the master object for those LWOs and, from what I can tell, without the LWOs
actually being referenced in between. I haven't been able to narrow down the
conditions more than that yet.

Changing back to using clones makes the issue go away. That doesn't prove it
is the LWOs, but in my mind it makes it rather likely.

Is there anything more I can do to help debug this issue? 

A backtrace follows. Note that I gathered 4 different cores, showing 4
different backtraces. I can post the backtraces from the other 3 as well if
that would help.

Core was generated by `driver'.                                                
Program terminated with signal 11, Segmentation fault.                         
Reading symbols from /lib/libc.so.7...done.                                    
Loaded symbols for /lib/libc.so.7                                              
Reading symbols from /libexec/ld-elf.so.1...done.                              
Loaded symbols for /libexec/ld-elf.so.1                                        
#0  0x08051004 in o_control (obj=0x0) at object.c:966                          
966         if (!(o->flags & O_MASTER)) {                                      
(gdb) bt                                                                       
#0  0x08051004 in o_control (obj=0x0) at object.c:966                          
#1  0x080641fb in i_call (f=0xbfbfb078, obj=0x0, lwobj=0x2849fc84,
func=0x28408fc0 "query_priv", len=10, call_static=0, nargs=0) at interpret.c:3086
#2  0x08095ebc in kf_call_other (f=0xbfbfb078, nargs=2) at std.c:189
#3  0x080656a4 in i_interpret1 (f=0xbfbfb078, pc=0x28579b5f "\n\001?@") at
interpret.c:2592
#4  0x08063df8 in i_funcall (prev_f=0xbfbfb3a8, obj=Variable "obj" is not
available.
) at interpret.c:2973
#5  0x08065af3 in i_interpret1 (f=0xbfbfb3a8, pc=0x28579bee "\n\001") at
interpret.c:2698
#6  0x08063df8 in i_funcall (prev_f=0xbfbfb768, obj=Variable "obj" is not
available.
) at interpret.c:2973
#7  0x080642fe in i_call (f=0xbfbfb768, obj=0x28301278, lwobj=0x0,
func=0x28407470 "query_wiz", len=9, call_static=0, nargs=1) at interpret.c:3106
#8  0x08095ebc in kf_call_other (f=0xbfbfb768, nargs=3) at std.c:189
#9  0x080656a4 in i_interpret1 (f=0xbfbfb768, pc=0x28554e8a "\n\001?��") at
interpret.c:2592
#10 0x08063df8 in i_funcall (prev_f=0xbfbfba88, obj=Variable "obj" is not
available.
) at interpret.c:2973
#11 0x08065af3 in i_interpret1 (f=0xbfbfba88, pc=0x28595a6f "\n\001\030") at
interpret.c:2698
#12 0x08063df8 in i_funcall (prev_f=0xbfbfbe38, obj=Variable "obj" is not
available.
) at interpret.c:2973
#13 0x08065af3 in i_interpret1 (f=0xbfbfbe38, pc=0x28595b9b "\n\001\030") at
interpret.c:2698
#14 0x08063df8 in i_funcall (prev_f=0xbfbfc308, obj=Variable "obj" is not
available.
) at interpret.c:2973
#15 0x080642fe in i_call (f=0xbfbfc308, obj=0x28302284, lwobj=0x0,
func=0x2858eae8 "expand_alias", len=12, call_static=0, nargs=1) at
interpret.c:3106
#16 0x08095ebc in kf_call_other (f=0xbfbfc308, nargs=3) at std.c:189
#17 0x080656a4 in i_interpret1 (f=0xbfbfc308, pc=0x28584ddd "\n\0031�e�\030")
at interpret.c:2592
#18 0x08063df8 in i_funcall (prev_f=0xbfbfc748, obj=Variable "obj" is not
available.
) at interpret.c:2973
#19 0x080642fe in i_call (f=0xbfbfc748, obj=0x28302180, lwobj=0x0,
func=0x28418fd4 "receive_message", len=15, call_static=0, nargs=1)
    at interpret.c:3106
#20 0x08095ebc in kf_call_other (f=0xbfbfc748, nargs=3) at std.c:189
#21 0x080656a4 in i_interpret1 (f=0xbfbfc748, pc=0x285901c2 "?\232:?@�� \001")
at interpret.c:2592
#22 0x08065c22 in i_interpret1 (f=Variable "f" is not available.
) at interpret.c:2736
#23 0x08063df8 in i_funcall (prev_f=0xbfbfca68, obj=Variable "obj" is not
available.
) at interpret.c:2973
#24 0x08065af3 in i_interpret1 (f=0xbfbfca68, pc=0x285901e5 "Z:?@�") at
interpret.c:2698
#25 0x08063df8 in i_funcall (prev_f=0x80b8f60, obj=Variable "obj" is not
available.
) at interpret.c:2973
#26 0x080642fe in i_call (f=0x80b8f60, obj=0x28301d70, lwobj=0x0,
func=0x80a6cd0 "receive_message", len=15, call_static=1, nargs=1)
    at interpret.c:3106
#27 0x0805d3b0 in comm_receive (f=0x80b8f60, timeout=0, mtime=281) at comm.c:1578
#28 0x0806a239 in dgd_main (argc=Variable "argc" is not available.
) at dgd.c:223
#29 0x0809c15e in main (argc=Cannot access memory at address 0x0
) at local.c:46
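
For anyone reading the trace: frame #0 shows o_control entered with obj=0x0,
and the faulting line 966 reads o->flags, which suggests a NULL pointer
dereference. Below is a minimal standalone C sketch of that pattern. The
struct, the flag value and both function names are made up for illustration;
none of this is taken from the DGD source, and it is not a proposed fix.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the DGD definitions. */
#define O_MASTER 0x01   /* assumed flag value, for illustration only */

typedef struct {
    unsigned flags;
} object;

/* Unchecked form: same shape as the faulting line.
 * Calling this with o == NULL reads through address 0, which
 * typically ends in a segmentation fault. */
int is_master_unchecked(const object *o)
{
    return (o->flags & O_MASTER) != 0;
}

/* Checked form: reports a missing object instead of crashing.
 * Returns 1 for a master, 0 for a non-master, -1 when o is NULL. */
int is_master_checked(const object *o)
{
    if (o == NULL) {
        return -1;
    }
    return is_master_unchecked(o);
}
```

A guard like this would only hide the symptom. The open question is still
why a NULL object reaches this point after the LWO's master has been
recompiled.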


On Mon, 09 Sep 2013 19:19:57 +1000, Neil McBride wrote
> Not sure if I'm on Felix's whitelist so posting this to the list too.
> 
> On 9/09/2013 5:50 PM, Felix A. Croes wrote:
> >> The statedump was likely corrupted.  Was it by the same version of DGD,
> >> or an earlier version?  Also, could you email me the result of the commands
> >>
> >>      frame 1
> >>      print *tmpl
> >>
> >> with this coredump?
> 
> The state dump was from the same version of DGD. I restarted from a 
> fresh restart a few days ago when this originally happened thinking 
> perhaps it was me.
> 
> Output from the above commands follow: -
> 
> Core was generated by `./bin/driver H7/H7.dgd dump'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f4dc834a217 in o_control (obj=0x7f4dc9adb340) at object.c:966
> 966         if (!(o->flags & O_MASTER)) {
> (gdb) bt
> #0  0x00007f4dc834a217 in o_control (obj=0x7f4dc9adb340) at object.c:966
> #1  0x00007f4dc83592f9 in d_get_varmap (obj=0x7ffff35180b0, update=0,
>      nvariables=0x7ffff35180f4) at data.c:1521
> #2  0x00007f4dc83597cf in d_upgrade_lwobj (lwobj=0x7f4dc8264d48,
>      obj=0x7f4dc96dc280) at data.c:1649
> #3  0x00007f4dc8350672 in d_count (save=0x7ffff3518200, 
> v=0x7f4dc829a4f0, n=1)
>      at sdata.c:1830
> #4  0x00007f4dc83503ed in d_arrcount (save=0x7ffff3518200, 
> arr=0x7f4dc827b0f0)
>      at sdata.c:1776
> #5  0x00007f4dc8350599 in d_count (save=0x7ffff3518200, 
> v=0x7f4dc82ada88, n=3)
>      at sdata.c:1813
> #6  0x00007f4dc83515a5 in d_save_dataspace (data=0x7f4dc8279378, swap=1 
> '\001')
>      at sdata.c:2210
> #7  0x00007f4dc835215a in d_swapout (frag=32) at sdata.c:2437
> #8  0x00007f4dc837415e in endthread () at dgd.c:89
> #9  0x00007f4dc835e870 in comm_receive (f=0x7f4dc85f9b00 <topframe>,
>      timeout=26, mtime=827) at comm.c:1580
> #10 0x00007f4dc837458a in dgd_main (argc=2, argv=0x7ffff351a5b0) at 
> dgd.c:223
> #11 0x00007f4dc83c86dd in main (argc=3, argv=0x7ffff351a5a8) at local.c:46
> (gdb) frame 1
> #1  0x00007f4dc83592f9 in d_get_varmap (obj=0x7ffff35180b0, update=0,
>      nvariables=0x7ffff35180f4) at data.c:1521
> 1521        vmap = o_control(tmpl)->vmap;
> (gdb) print *tmpl
> Cannot access memory at address 0x7f4dc9adb340
> 
> Thanks,
> 
> Neil.


--
http://www.flickr.com/photos/mrobjective/
http://www.om-d.org/



