Coverity & iComment
- Finding bugs requires specifications
- Specifications can be obtained from automated tools
- They can be inferred from source code, execution traces, and comments
Finding errors without knowing the truth
Contradiction - cross-examine
- Any contradiction is an error
Deviance - infer correct behaviour from what the code usually does
- e.g. if the code does X 1000 times and does Y on one occasion, the Y is probably an error
Cross-checking program belief systems
Must beliefs
- Inferred from programming decisions that imply the programmer must hold the belief
- Must be an error if the belief is contradicted
x = *p / z; // MUST: p not null, z != 0
unlock(l); // MUST: l acquired
x++; // MUST: x not protected by l
May beliefs
- Inferred from programming decisions that suggest the programmer may hold the belief
- May be coincidental
scope1() {
A();
B();
}
scope2() {
A();
B();
}
scope3() {
A();
B();
}
// MAY: A() and B() must be paired
- Check them as if they were MUST beliefs
- Rank errors by belief confidence (the probability, inferred from existing code, that a reported error is a true positive)
Trivial consistency: NULL pointers
*p implies the MUST belief that p is not null
A check (p == NULL) implies two MUST beliefs:
- POST - p is null on the true path, not null on the false path
- PRE - p was unknown before the check
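As a rough sketch of how these beliefs can be cross-checked, the Python below walks one straight-line path of events for a pointer and reports contradictions. The event encoding and the name check_path are hypothetical, invented for illustration; a real checker would work on the compiler's control-flow graph rather than a hand-written list.

```python
def check_path(events):
    """Cross-check MUST beliefs about pointers along one path.

    events is a list of tuples (hypothetical encoding):
      ("deref", name)               -> *name, so name is believed not null
      ("check_null", name, is_null) -> the path took the branch where
                                       (name == NULL) evaluated to is_null
    Returns a list of error strings.
    """
    belief = {}   # name -> "null" or "notnull"; absent means unknown
    errors = []
    for i, ev in enumerate(events):
        if ev[0] == "deref":
            name = ev[1]
            if belief.get(name) == "null":
                errors.append(f"event {i}: {name} dereferenced but believed null")
            belief[name] = "notnull"
        elif ev[0] == "check_null":
            _, name, is_null = ev
            # PRE belief: the pointer's state was unknown before the check,
            # so an already-known state contradicts the check.
            if name in belief:
                errors.append(
                    f"event {i}: {name} checked but already believed {belief[name]}")
            # POST belief: null on the true path, not null on the false path.
            belief[name] = "null" if is_null else "notnull"
    return errors


# Dereference on the path where the check said p is null.
print(check_path([("check_null", "p", True), ("deref", "p")]))
# Check after a dereference: the two beliefs cannot both hold.
print(check_path([("deref", "p"), ("check_null", "p", False)]))
```

Neither report needs to know whether p is really null; each flags only that two beliefs in the same code disagree.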
Redundancy checking
- Assume code is supposed to be useful
- Useless actions (low-level redundancies) may signal high-level bugs
- e.g. x = x, 1*y, x&x, x|x
- Assignments that are never used in subsequent code
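A minimal sketch of a redundancy checker for the patterns listed above. It is an assumption-laden illustration: it scans Python source with the standard ast module (the notes concern C code, where a compiler front end would play this role), and the function name find_redundancies is hypothetical. It does not cover the unused-assignment case, which needs data-flow analysis.

```python
import ast


def find_redundancies(source):
    """Return sorted (line, description) pairs for a few low-level redundancies."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Self-assignment: x = x
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.Name)
                and node.targets[0].id == node.value.id):
            found.append((node.lineno, "self-assignment"))
        elif isinstance(node, ast.BinOp):
            same = ast.dump(node.left) == ast.dump(node.right)
            # Idempotent operators: x & x, x | x
            if same and isinstance(node.op, (ast.BitAnd, ast.BitOr)):
                found.append((node.lineno, "idempotent operation"))
            # Multiplication by the integer constant 1: 1 * y or y * 1
            elif isinstance(node.op, ast.Mult) and any(
                    isinstance(s, ast.Constant) and type(s.value) is int
                    and s.value == 1
                    for s in (node.left, node.right)):
                found.append((node.lineno, "multiplication by 1"))
    return sorted(found)


sample = "x = x\na = 1 * y\nb = x & x\nc = x | y\n"
for line, what in find_redundancies(sample):
    print(line, what)
```

The last line of the sample, x | y, is not reported because its operands differ.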
Handling MAY beliefs
- MUST beliefs only need a single contradiction
- MAY beliefs need many examples to separate fact from coincidence
- Assume MAY beliefs are MUST beliefs
- Record every successful check with a "check" message
- Record every unsuccessful check with an "error" message
- Rank errors based on the ratio of checks (n) to errors (err)
- Those where n is large and err is small are the most likely to be real errors
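One possible way to turn "n large, err small" into a single score is a z-statistic, sketched below. This is an assumption, not something the notes specify: here n is taken to be the total number of times a belief was checked, err the number of times the check failed, and p0 = 0.9 is an arbitrary baseline success rate. The names z_score and rank_beliefs are hypothetical.

```python
import math


def z_score(n, err, p0=0.9):
    """z-statistic for (n - err) successes out of n checks against an
    assumed baseline success rate p0 (0.9 is an illustrative choice)."""
    return ((n - err) / n - p0) / math.sqrt(p0 * (1 - p0) / n)


def rank_beliefs(counts):
    """counts maps belief -> (n, err); returns beliefs, most credible first."""
    return sorted(counts, key=lambda b: z_score(*counts[b]), reverse=True)


counts = {
    "lock/unlock": (1000, 1),   # many examples, one violation
    "a/b": (10, 0),             # never violated, but few examples
    "foo/bar": (4, 2),          # violated half the time
}
print(rank_beliefs(counts))
```

A plain ratio would put "a/b" first because it is never violated; the z-statistic puts "lock/unlock" first because 999 agreeing examples are much stronger evidence than 10, so its single violation is the error to inspect first.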
Deriving deallocation routines
Infer free functions
- If pointer p is not used after calling foo(p), it implies a MAY belief that foo() is a free function
- Conceptually, assume all functions free all their arguments
- Emit a "check" message at every call site
- Emit an "error" message at every use of the argument after the call
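The counting step can be sketched as follows, under stated assumptions: each path is a hand-written list of events, the event encoding is hypothetical, and release_buf and log_ptr are made-up function names.

```python
from collections import defaultdict


def infer_free_functions(paths):
    """Count "check" and "error" messages per function.

    Each path is a list of events (hypothetical encoding):
      ("call", func, ptr) -> func(ptr)
      ("use", ptr)        -> ptr is read or dereferenced
    Returns {func: [checks, errors]}.
    """
    stats = defaultdict(lambda: [0, 0])
    for path in paths:
        passed_to = {}                    # ptr -> function it was last passed to
        for ev in path:
            if ev[0] == "call":
                _, func, ptr = ev
                stats[func][0] += 1       # "check" at every call site
                passed_to[ptr] = func
            elif ev[0] == "use" and ev[1] in passed_to:
                stats[passed_to[ev[1]]][1] += 1   # "error" at every later use
    return dict(stats)


paths = [
    [("call", "release_buf", "p")],
    [("call", "release_buf", "q")],
    [("call", "release_buf", "r"), ("use", "r")],
    [("call", "log_ptr", "s"), ("use", "s")],
    [("call", "log_ptr", "t"), ("use", "t")],
]
print(infer_free_functions(paths))
```

Here release_buf gets 3 checks and 1 error, so it plausibly frees its argument and the single later use is worth inspecting; log_ptr gets an error at every call site, so the belief that it frees its argument is probably wrong.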
Deriving routines that can fail
Rank errors based on the ratio of checks to non-checks
Assume all functions can return NULL
- If the pointer is checked before use, emit a "check" message
- If the pointer is used before a check, emit an "error" message
- Sort errors based on the ratio of checks to errors
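A small sketch of the same bookkeeping for return values. The event encoding and the function names alloc_node and get_config are hypothetical; the first thing done with a returned pointer decides whether the call site counts as a "check" or an "error".

```python
from collections import defaultdict


def infer_may_fail(paths):
    """Count checked and unchecked uses of each function's return value.

    Each path is a list of events (hypothetical encoding):
      ("assign", ptr, func) -> ptr = func(...)
      ("check", ptr)        -> ptr is compared against NULL
      ("use", ptr)          -> ptr is dereferenced
    Returns {func: [checks, errors]}.
    """
    stats = defaultdict(lambda: [0, 0])
    for path in paths:
        pending = {}   # ptr -> func whose result is not yet checked or used
        for ev in path:
            if ev[0] == "assign":
                pending[ev[1]] = ev[2]
            elif ev[1] in pending:
                func = pending.pop(ev[1])
                # Checked before use -> "check"; used before check -> "error".
                stats[func][0 if ev[0] == "check" else 1] += 1
    return dict(stats)


checked = [("assign", "p", "alloc_node"), ("check", "p"), ("use", "p")]
unchecked = [("assign", "q", "alloc_node"), ("use", "q")]
never = [("assign", "r", "get_config"), ("use", "r")]
print(infer_may_fail([checked, checked, checked, unchecked, never, never]))
```

With 3 checks against 1 error, alloc_node looks like a routine that can fail and the unchecked use is a likely bug; get_config is never checked, so its "errors" rank low.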
Deriving "A() must be followed by B()"
a(); ... b(); implies a MAY belief that a() must be followed by b()
- May be a coincidence
Assume every a-b is a valid pair
- Emit "check" for each path that has a() then b()
- Emit "error" for each path that has a() and no b()
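The per-path counting for one candidate pair might look like this. It is a sketch only: paths are plain lists of called function names, and pair_stats and the example names are hypothetical.

```python
from collections import Counter


def pair_stats(paths, a, b):
    """Return (checks, errors) for the belief "a must be followed by b".

    A path containing a call to `a` counts as a "check" if `b` is called
    later on the same path, and as an "error" otherwise.
    """
    stats = Counter()
    for path in paths:                       # each path: list of function names
        if a in path:
            rest = path[path.index(a) + 1:]  # calls after the first a()
            stats["check" if b in rest else "error"] += 1
    return stats["check"], stats["error"]


paths = [
    ["lock", "work", "unlock"],
    ["lock", "unlock"],
    ["lock", "work"],          # lock() with no unlock()
    ["init"],
]
print(pair_stats(paths, "lock", "unlock"))
print(pair_stats(paths, "work", "unlock"))
```

Running this over every candidate (a, b) pair and ranking the results separates pairs that hold almost everywhere from coincidental ones.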
Comments for Reliability
- Comments state some specifications/rules
- Calling context
- Calling order
- Unit
- Help ensure correct software evolution
- It is feasible to automatically extract specs from comments to detect comment-code mismatches, using:
- Program analysis
- NLP
- Machine learning
- Statistics
- Use comment-code redundancy to detect comment-code mismatches
- A mismatch could indicate
- Bugs
- Bad comments
NLP
Analyze sentence structures
- POS tagging
- Chunking
- Semantic Role Labeling
It is impossible to automatically analyze arbitrary comments