A typical valgrind log for such a leak looks like this:
    ==20520== 20 bytes in 1 blocks are still reachable in loss record 3 of 6
    ==20520==    at 0x402B0D5: calloc (vg_replace_malloc.c:623)
    ==20520==    by 0x80E6EF0: yylex (parse_l.l:177)
    ==20520==    by 0x80E0D6C: yyparse (parse_y.tab.c:1696)
    ==20520==    by 0x80E85ED: Parse (parse_l.l:292)
    ==20520==    by 0x80E876B: ParsePCB (parse_l.l:347)
    ==20520==    by 0x8078591: real_load_pcb (file.c:390)
    ==20520==    by 0x80787E9: LoadPCB (file.c:459)
    ==20520==    by 0x8097719: main (main.c:1781)
The code at parse_l.l:177 is just a calloc() and some string operation: this is where the string is created. The STRING token is referenced about 58 times in the grammar. After reading through the whole file 4..5 times, I still didn't see any obvious place for the leak.
The leak was also a rare one: it happened for only one string per file. This suggested it was in the header, unless there's a one-instance object somewhere in the .pcb, or a cache where the same pointer is free()'d and overwritten on each occurrence and simply no one free()'s the last one.
Assuming it's a header, cheap ways to find which header field leaked:
At this point I figured my tests would have to depend on the reported size of the leak. I didn't want to do multiple runs, and I didn't want to modify the input, because that risked making the whole parser run differently. Instead I figured there's a simple yet generic way to track this sort of leak.
I estimated that no string in the file is longer than 1000 characters. Right above the calloc() in the lexer I introduced a new static integer variable starting at 1000, incremented before each allocation. This counter serves as a unique ID for each allocation. Then I modified the calloc() to ignore the originally calculated string length and use this ID as the allocation size. I also printed the ID-string pairs. The original code looked like this (simplified):
    /* ... predict string size ... */
    yylval.str = calloc(predicted_size, 1);
    /* ... build the string here ... */
    return STRING;
The resulting code looked like this (simplified):
    /* ... predict string size ... */
    static int alloc_id = 1000;
    alloc_id++;
    yylval.str = calloc(alloc_id, 1);
    /* ... build the string here ... */
    fprintf(stderr, "STRING: %d '%s'\n", alloc_id, yylval.str);
    return STRING;
I saved the list printed on stderr and checked valgrind's log to find that the two strings in question were ID 1002 and ID 1007; both looked something like this:
    1,3,4,c:2,5,6,s:7:8
The only thing that looks like this is the layer group description ("Groups()"). From this point it was trivial to find the bug in the grammar.
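The size-as-ID trick described above can be isolated into a small stand-alone sketch outside the lexer. The names tagged_strdup(), last_alloc_id() and ALLOC_ID_BASE are hypothetical and not taken from the pcb sources. The sketch assumes, as in the text, that no string needs more than 1000 bytes:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: no string in the input needs more than this many bytes. */
#define ALLOC_ID_BASE 1000

/* Each allocation gets the next ID; the ID doubles as the block size. */
static int alloc_id = ALLOC_ID_BASE;

/* Hypothetical helper: copy src into a block of alloc_id bytes, so a leak
 * checker reports the ID as the size of the leaked block. */
char *tagged_strdup(const char *src)
{
    size_t len = strlen(src);
    char *buf;

    alloc_id++;                          /* unique ID for this allocation */
    assert(len + 1 <= (size_t)alloc_id); /* the string must fit in the block */

    buf = calloc((size_t)alloc_id, 1);
    if (buf == NULL)
        return NULL;

    /* calloc zero-filled the block, so the copy stays NUL-terminated. */
    memcpy(buf, src, len);

    /* Log the ID-string pair for later lookup by reported leak size. */
    fprintf(stderr, "STRING: %d '%s'\n", alloc_id, buf);
    return buf;
}

/* Returns the ID handed out by the most recent allocation. */
int last_alloc_id(void)
{
    return alloc_id;
}
```

Calling tagged_strdup() twice logs IDs 1001 and 1002 on stderr. If the second block were never freed, the leak report would list a 1002-byte block, which maps straight back to the logged string. The price is wasted memory on every allocation, so this is only suitable for a temporary debugging build.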