Identifying dead code in a large code repository

I have a large C code base with >100 binaries, >3000 files and >30 libraries. A lot of dead code has accumulated, and I'm looking for ways to identify and remove it. The code is simple: no complex macros and very little automatically generated code (lex/bison/...).

To identify "static" dead code (and variables), gcc does a good job: the -Wunused-* options identify all unused static variables, static functions, and so on. My challenge is with non-static global functions and variables (and the code base has a lot of them!).

I've gotten a lot of mileage out of running 'nm' across all the object files to build a list of all defined global symbols (types 'T', 'D' and 'B' for code, data and uninitialized data). I then removed every symbol that also appears as 'U' (undefined, i.e. referenced) somewhere. That process identified all unreferenced globals. At this point, I have to manually make each such symbol static, compile with gcc -Werror -Wunused, and see whether that raises any error.

# Omitting some details for brevity.
nm --undefined-only lib1.a lib2.a ... obj1.o obj2.o obj3.o | sort > refs.txt
nm --extern-only --defined-only lib1.a lib2.a ... obj1.o obj2.o obj3.o | sort > defs.txt
join -1 2 -2 3 -v 2 refs.txt defs.txt

My question: is it possible to use "nm" (or another object-analysis tool such as objdump) to identify which global symbols in an object file are also used inside that same object? This would speed up dead-code elimination by separating truly dead global functions from global functions that are actually used (but could become static).

Alternatively, is there an existing tool that will do the job?


Asked by dash-o; edited by mkrieger1.
  • Make all the functions static and see what breaks :) – ikegami
  • Can you run your code under a source coverage analysis tool? It cannot tell you which code is really dead, but it can at least tell you which code is not dead, so you no longer need to change those symbols to static and test. Depending on your use case, it may or may not save a lot of time. – Weijun Zhou
  • You should also consider looking for duplicate code. String functions in particular tend to accumulate on big projects. – stark
  • As a lateral solution, consider using a test coverage tool (most professional environments have one available to their programmers/testers/QA anyway). Instead of looking for unused code, you might just identify untested code. If you find something that is not tested and there is no way to test it, then it is either dead or a design problem that needs fixing. (Depending on your context you might consider this an answer... let me know and I will make it one.) – Yunnosch
  • A perl/python script may be able to find most cases: a regexp to find definitions and another to find uses. Note: assume reasonably human-written code (following some coding style), do not try to build a tool that can handle all C cases (e.g. function definitions on the same line as other statements), and forget the preprocessor. Otherwise you need a parser, and it's no longer a quick script. – Giacomo Catenazzi

1 Answer


I suggest using GNU ld's dead-section removal functionality for this.

For this you need to compile your code with -fdata-sections -ffunction-sections (so each function and variable gets its own section) and then link with the -Wl,--gc-sections -Wl,--print-gc-sections flags. The linker will then print information about the sections, and hence the functions, it has removed.

Here is an example of its output for a sample program:

/usr/bin/ld: removing unused section '.text.foo' in file '/tmp/ccXZWJ2X.o'

(.text.foo is the section generated for the unused function foo.)

As a side note, if you use these options there may be no need to manually sanitize your codebase (apart from making it cleaner to read), because the toolchain will remove the dead code automatically.
