A typical valgrind log for such a leak looks like this:

==20520== 20 bytes in 1 blocks are still reachable in loss record 3 of 6
==20520==    at 0x402B0D5: calloc (vg_replace_malloc.c:623)
==20520==    by 0x80E6EF0: yylex (parse_l.l:177)
==20520==    by 0x80E0D6C: yyparse (parse_y.tab.c:1696)
==20520==    by 0x80E85ED: Parse (parse_l.l:292)
==20520==    by 0x80E876B: ParsePCB (parse_l.l:347)
==20520==    by 0x8078591: real_load_pcb (file.c:390)
==20520==    by 0x80787E9: LoadPCB (file.c:459)
==20520==    by 0x8097719: main (main.c:1781)

The code at parse_l.l:177 is just a calloc() and some string operations: this is where the string is created. The STRING token is referenced about 58 times in the grammar. After reading through the whole file 4 or 5 times, I still couldn't see any obvious place for the leak.

The leak was also a rare one: it happened for only one string per file. This suggested it was in the header, unless there's a one-instance object somewhere in the .pcb, or a cache where the same pointer is free()'d and overwritten on each occurrence and simply no one free()'s the last one.
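To illustrate the cache scenario mentioned above, here is a minimal sketch of a one-slot cache that leaks exactly one string per run. The names cache_set_name() and cache_get_name() are invented for this example and do not come from the pcb sources. Since the static pointer still refers to the last copy at exit, I would expect valgrind to list it as "still reachable", as in the log above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-slot cache; all names are invented for illustration. */
static char *cached_name = NULL;

/* Store a private copy of s, releasing the previously cached copy. */
void cache_set_name(const char *s)
{
	char *copy;

	assert(s != NULL);
	copy = malloc(strlen(s) + 1);
	if (copy == NULL)
		return; /* keep the old value on allocation failure */
	strcpy(copy, s);
	free(cached_name);  /* every value except the last one is freed here */
	cached_name = copy; /* the last value stored is never freed */
}

/* Return the currently cached string, or NULL if nothing was stored yet. */
const char *cache_get_name(void)
{
	return cached_name;
}
```

However many times cache_set_name() is called, only the final copy stays allocated, which would match the "one string per file" symptom.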

Assuming it's in the header, here is a cheap way to find which header field leaked:

At this point I figured I'd rely on the leak size reported by valgrind in my tests. I didn't want to do multiple runs, and I didn't want to modify the input and risk the whole parser running differently. Instead I figured there's a simple, yet generic way to track this sort of leak.

I estimated that no string in the file is longer than 1000 characters. Right above the calloc() in the lexer I introduced a new static integer variable starting at 1000, incremented before each allocation. This counter is a sort of ID for each allocation. Then I modified the calloc() to ignore the originally calculated string length and use this ID as the allocation size. I also printed the ID-string pairs. The original code looked like this (simplified):

	/* ... predict string size ... */
	yylval.str = calloc(predicted_size, 1);
	/* ... build the string here ... */
	return STRING;

The resulting code looked like this (simplified):

	/* ... predict string size ... */
	static int alloc_id = 1000;
	alloc_id++;
	yylval.str = calloc(alloc_id, 1);
	/* ... build the string here ... */
	fprintf(stderr, "STRING: %d '%s'\n", alloc_id, yylval.str);
	return STRING;
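The same trick can be sketched outside the lexer as a standalone helper. This is only an illustration under assumptions: tagged_strdup(), last_alloc_id() and ALLOC_ID_BASE are names made up for this example, and the assert() guards the estimate that no string reaches 1000 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the original lexer. */
#define ALLOC_ID_BASE 1000 /* assumed to exceed the longest string length */

static int alloc_id = ALLOC_ID_BASE;

/* Copy src into a block whose size doubles as a unique allocation ID. */
char *tagged_strdup(const char *src)
{
	size_t len = strlen(src);
	char *s;

	/* The trick only works while every string fits in ALLOC_ID_BASE bytes. */
	assert(len < ALLOC_ID_BASE);

	alloc_id++;
	s = calloc(alloc_id, 1); /* the size is the ID, not the string length */
	if (s == NULL)
		return NULL;
	memcpy(s, src, len); /* calloc() already zeroed the terminator */
	fprintf(stderr, "STRING: %d '%s'\n", alloc_id, s);
	return s;
}

/* Return the ID (and block size) of the most recent allocation. */
int last_alloc_id(void)
{
	return alloc_id;
}
```

Because every block has a different size, the size of a leaked block in a leak report identifies one line of the stderr list. The cost is wasted memory: each string takes at least 1001 bytes, so this is for debugging runs only.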

I saved the list printed on stderr and checked valgrind's log, where the reported size of each leaked block is now its ID. The two strings in question were ID 1002 and ID 1007; both looked something like this:

1,3,4,c:2,5,6,s:7:8

The only thing that looks like this is the layer group description ("Groups()"). From this point it was trivial to find the bug in the grammar.