How to use static analysis tools within pkgsrc

From NetBSD Wiki

Introduction

In this article, static analyzers like devel/splint or devel/cqual are integrated into the pkgsrc infrastructure, making it an environment to test software packages for common programming errors.

The idea is to write a generic wrapper script that gets called whenever a source file is compiled or when source files are linked together. This wrapper then calls the various static analysis tools to do the main work.

To get a first impression, I will first write a very simple »analyzer« that just writes the command line of the compiler to a log file. Then I will integrate this analyzer into the pkgsrc infrastructure and run some basic tests on it, to get a feeling for how it all works.

Basic system configuration

I have pkgsrc installed in /home/roland/proj/pkgsrc, and the static analysis files in /home/roland/proj/static-analysis. The static analysis directory has two subdirectories, bin and logs.

Writing a demo analyzer

The first »analyzer« is called demo.py, and gets saved in the /home/roland/proj/static-analysis/bin directory.

#! /usr/bin/env python

import sys

logfile = open("/home/roland/proj/static-analysis/logs/command_lines", "a")
logfile.write(" ".join(sys.argv[1:]) + "\n")
logfile.close()

I am using a hard-coded absolute path here because I don't trust the HOME environment variable to be properly set. Since some packages write files to the home directory of the user that builds the package, and this is not generally wanted, some other packaging systems already set HOME to a temporary directory while building packages. I expect that pkgsrc will follow soon.
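If the hard-coded path becomes a nuisance, one alternative that also avoids HOME is to derive the log directory from the location of the script itself. The following is only a sketch: the helper name log_path is made up for this example, and it assumes that the bin and logs directories stay next to each other as described above.

```python
import os

def log_path(script_file, name):
    """Return the path of the log file `name` in the logs directory
    that sits next to the bin directory containing script_file."""
    bin_dir = os.path.dirname(os.path.abspath(script_file))
    base_dir = os.path.dirname(bin_dir)
    return os.path.join(base_dir, "logs", name)

# In demo.py this could be used as:
#   logfile = open(log_path(__file__, "command_lines"), "a")
```

Note that os.path.abspath does not resolve symbolic links, so this only works when the script is called by its real path.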

Integration into pkgsrc

Calling the analyzer whenever a file is compiled or linked is pretty easy when you know what to do. In the pkgsrc infrastructure, there is the file mk/wrapper/wrapper.sh, which is the template for the pkgsrc compiler wrappers. It gets called whenever a package runs the compiler, the linker or libtool for building. The command line arguments that are given by the package are analyzed and slightly modified in various ways. Then the real compiler is run with these modified arguments, and this is the place where we step in. Look for the following line:

eval "$cmd" || wrapper_result="$?"

and place a similar line before it, so that it looks like this:

eval "/home/roland/proj/static-analysis/bin/demo.py $cmd" || wrapper_result="$?"
eval "$cmd" || wrapper_result="$?"

That's it.
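One thing to keep in mind: as the inserted line is written, a non-zero exit status of the analyzer is stored in wrapper_result, the same variable that records a compiler failure. To keep a buggy analyzer from making a build look failed, the analyzer can catch its own errors and always exit successfully. A minimal sketch, where run_quietly and the log format are my own invention and not part of pkgsrc:

```python
import sys
import traceback

def run_quietly(analyze, argv, log):
    """Call analyze(argv). On any error, write the traceback to the
    file-like object log. Always return 0 so the build goes on."""
    try:
        analyze(argv)
    except Exception:
        log.write("E: analyzer failed for %r\n" % (argv,))
        log.write(traceback.format_exc())
    return 0

# At the end of an analyzer script, assuming it defines main(argv)
# and has an open logfile:
#   sys.exit(run_quietly(main, sys.argv[1:], logfile))
```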

Testing the analyzer

The /home/roland/proj/static-analysis directory now looks like this:

drwxr-xr-x   bin
-rwxr-xr-x   bin/demo.py
drwxr-xr-x   logs

Now let's pick a simple package from pkgsrc and try to build it. I usually take sysutils/same for this purpose.

$ cd $HOME/proj/pkgsrc/sysutils/same
$ bmake
...
=> Unwrapping files-to-be-installed.

That's great, building a simple package still works. Let's see whether our »analyzer« had anything to report.

$ cd $HOME/proj/static-analysis/logs
$ ls -l
-rw-r--r--  1 roland  users  646 Jan 17 07:00 command_lines

Indeed, there is something.

$ cat command_lines
/tmp/roland/pkgsrc/sysutils/same/work.bacc/.gcc/bin/gcc \
  -O2 -Werror -c same.c \
  -I/tmp/roland/pkgsrc/sysutils/same/work.bacc/.buildlink/include \
  -L/tmp/roland/pkgsrc/sysutils/same/work.bacc/.buildlink/lib
/tmp/roland/pkgsrc/sysutils/same/work.bacc/.gcc/bin/gcc \
  -Wl,-R/home/roland/pkg/lib -o same same.o \
  -I/tmp/roland/pkgsrc/sysutils/same/work.bacc/.buildlink/include \
  -L/tmp/roland/pkgsrc/sysutils/same/work.bacc/.buildlink/lib \
  -lz

(The above output has been manually reindented to look better.)
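Joining the arguments with spaces, as demo.py does, loses the argument boundaries as soon as an argument contains a space or a quote character. If the log is meant to be replayed later, each argument can be shell-quoted before it is written. A small sketch, assuming Python 3 (where shlex.quote is available); format_command_line is a name chosen for this example:

```python
import shlex

def format_command_line(argv):
    """Join argv into one line that a POSIX shell would split back
    into exactly the same arguments."""
    return " ".join(shlex.quote(arg) for arg in argv)

# In demo.py, the write call would then become:
#   logfile.write(format_command_line(sys.argv[1:]) + "\n")
```

For ordinary command lines like the ones above, the log looks the same as before; only unusual arguments get quoted.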

A wrapper for the platform's native lint(1)

Now it's time to get a real program to work. Let's first take the platform's native lint. To get it working, first have a look at its manual page to see what command line options it understands. Many of the compiler's options can just be passed to lint. Some others need to be discarded, and still others need to be added. So let's have a try, creating bin/lint.py:

#! /usr/bin/env python

# Takes a command line for a compiler call and runs lint(1) on the
# source files.

import os, re, sys

logfile = open("/home/roland/proj/static-analysis/logs/lint_log", "a")
lint = "/usr/bin/lint"
lintflags = "-aabcFhpxz"

def main():
	compiler = sys.argv[1]
	Dflags = []
	Iflags = []
	Lflags = []
	lflags = []
	Uflags = []
	c_sources = []
	compiling = False

	for a in sys.argv[2:]:
		opt = a[0:2]
		if   opt == "-D": Dflags += [a]
		elif opt == "-I": Iflags += [a]
		elif opt == "-L": Lflags += [a]
		elif opt == "-l": lflags += [a]
		elif opt == "-U": Uflags += [a]
		elif opt == "-c": compiling = True
		elif re.match("-W.*,", a):
			pass # ignore flags to the subprocesses
		elif opt == "-W":
			pass # ignore what are most likely GCC warnings
		elif a[0:1] == "-":
			logfile.write("W: unknown option %s\n" % a)
		elif re.match(".*\\.c$", a):
			c_sources += [a]
		else:
			logfile.write("W: unknown argument %s\n" % a)

	args = tuple([ lint, lintflags ] + Dflags + Uflags + Iflags + c_sources)
	do_run = (compiling and len(c_sources) != 0)
	logfile.write("D: in %s: %s %s\n" % (
		os.getcwd(),
		["skipping", "running"][do_run],
		" ".join(args)))

	if do_run:
		os.spawnv(os.P_WAIT, lint, args)

if __name__ == "__main__":
	main()

Testing the analyzer

$ cd $HOME/proj/static-analysis
$ cat > hello.c
main()
{
        printf("hello, world\n");
}
^D

This is particularly bad programming style. Let's see what lint(1) has to say about it.

$ ./bin/lint.py gcc hello.c
$ ./bin/lint.py gcc -c hello.c
hello.c:
Lint pass2:
printf returns value which is always ignored

$ cat logs/lint_log
D: in /home/roland/proj/static-analysis: skipping /usr/bin/lint -aabcFhpxz hello.c
D: in /home/roland/proj/static-analysis: running /usr/bin/lint -aabcFhpxz hello.c

In the first try, the -c option was not given, so no check took place. In the second call, the -c option was given, and lint was run. I had expected a few more warnings, but hey, it's working.
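In this test the lint messages went straight to the terminal; during a package build they would be mixed into the build output. If they should end up in the log file instead, the os.spawnv call could be replaced by the subprocess module with redirected output. This is a sketch only, assuming Python 3.5 or later for subprocess.run; run_logged is a hypothetical helper, not something lint.py contains:

```python
import subprocess

def run_logged(args, log):
    """Run the command args, append its stdout and stderr to the
    text file object log, and return the exit status."""
    result = subprocess.run(args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    log.write(result.stdout)
    return result.returncode

# In lint.py, instead of os.spawnv(os.P_WAIT, lint, args):
#   run_logged(list(args), logfile)
```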

Integrating lint into pkgsrc

Now let's edit the pkgsrc compiler wrapper again, which is in mk/wrapper/wrapper.sh. Just replace demo.py with lint.py.

Testing everything

$ cd $HOME/proj/pkgsrc/sysutils/same
$ make
...
cc -O2 -Dpkgsrc_same_1_8_CFLAGS   -Werror  -Dpkgsrc_same_1_8_CPPFLAGS     -c    same.c
same.c:
/usr/include/sys/endian.h(193): warning: conversion from 'unsigned int' to 'unsigned char' may lose accuracy [132]
/usr/include/sys/endian.h(194): warning: conversion from 'int' to 'unsigned char' may lose accuracy [132]
/usr/include/sys/endian.h(202): warning: conversion from 'int' to 'unsigned char' may lose accuracy [132]
/usr/include/sys/endian.h(203): warning: conversion from 'unsigned int' to 'unsigned char' may lose accuracy [132]
...
=> Unwrapping files-to-be-installed.

It seems to have worked. There are many new warnings that were not there before. Let's have a look at the log file:

$ cd $HOME/proj/static-analysis
$ cat logs/lint_log
W: unknown option -O2
W: unknown option -fmessage-length=0
W: unknown option -ggdb
D: in /tmp/roland/pkgsrc/sysutils/same/work.bacc/same-1.8: running /usr/bin/lint -aabcFhpxz -Dpkgsrc_same_1_8_CFLAGS -Dpkgsrc_same_1_8_CPPFLAGS -I/tmp/roland/pkgsrc/sysutils/same/work.bacc/.buildlink/include same.c
W: unknown option -o
W: unknown argument same
W: unknown argument same.o
W: unknown option -fmessage-length=0
W: unknown option -ggdb
D: in /tmp/roland/pkgsrc/sysutils/same/work.bacc/same-1.8: skipping /usr/bin/lint -aabcFhpxz -I/tmp/roland/pkgsrc/sysutils/same/work.bacc/.buildlink/include

There are plenty of things that can still be improved, but it is basically working. When compiling a file, lint is run on that file, and when linking objects together, lint is not run.
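One of those improvements is visible in the log above: the output file name following -o is reported as an unknown argument, because lint.py looks at each argument in isolation. A possible fix is to skip the argument that belongs to such an option. The sketch below is not the lint.py from this article but a reduced, hypothetical variant of its loop; the set of options that take a separate argument is an assumption and certainly incomplete.

```python
# Options whose value is given in the following argument (assumed list).
OPTIONS_WITH_ARGUMENT = {"-o", "-include", "-isystem"}

def split_arguments(args):
    """Split compiler arguments into
    (lint_flags, c_sources, compiling, unknown)."""
    lint_flags, c_sources, unknown = [], [], []
    compiling = False
    skip_next = False
    for a in args:
        if skip_next:
            skip_next = False          # value of the previous option
        elif a in OPTIONS_WITH_ARGUMENT:
            skip_next = True
        elif a == "-c":
            compiling = True
        elif a[0:2] in ("-D", "-U", "-I"):
            lint_flags.append(a)       # lint understands these as well
        elif a[0:2] in ("-W", "-L", "-l"):
            pass                       # of no use to lint
        elif a.startswith("-"):
            unknown.append(a)
        elif a.endswith(".c"):
            c_sources.append(a)
        else:
            unknown.append(a)
    return lint_flags, c_sources, compiling, unknown
```

With this, the link command from the log would no longer report »same« as an unknown argument, while same.o would still show up until object files get a case of their own.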

TODO

  • devel/splint
  • devel/cqual
  • a generic wrapper for many tools at once
  • better error checking in the wrapper
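
For the third item, a generic wrapper could be little more than a loop over the individual analyzer scripts. The following sketch assumes that every analyzer accepts the compiler command line as its arguments, like demo.py and lint.py do; run_analyzers and its log format are invented here and only show the general shape.

```python
import subprocess

def run_analyzers(analyzers, compiler_args, log):
    """Run every analyzer with the compiler command line appended.
    An analyzer that cannot be started is logged and does not keep
    the remaining ones from running. Returns the exit statuses,
    with None for analyzers that could not be started."""
    statuses = []
    for analyzer in analyzers:
        try:
            status = subprocess.call([analyzer] + list(compiler_args))
        except OSError as e:
            log.write("E: cannot run %s: %s\n" % (analyzer, e))
            status = None
        else:
            log.write("D: %s exited with status %s\n" % (analyzer, status))
        statuses.append(status)
    return statuses
```

The wrapper script would then call this with the list of analyzers, for example the full paths of bin/demo.py and bin/lint.py, and sys.argv[1:].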