Archive for April, 2009
Continuous parameter optimization
Posted by JohannesTheLittleScientist in science on April 26th, 2009
In the recent days and weeks I got into the topic of “continuous parameter optimization”. That is the case when you have a n-dimensional function that returns a value, and you want find a maximum or minimum.
In particular I was looking for an algorithm that finds one (local) maximum, and uses the least number of evaluations. Because in the scenario I have, evaluating the function takes hours (or days).
So what algorithms can you use?
I developed some variants of the most primitive and naive approach.
You can find the description and source here: http://github.com/JohannesBuchner/nobrain_optimization/
Later I found CONDOR which is much more sophisticated.
I adopted the project (GPL), so I can provide the same useful interface that allows to plug in any program or programming language.
I wrote an introduction here: http://wiki.github.com/JohannesBuchner/condor_optimization
What do I need this for?
I am developing a statistic and numerical application at the University of Vienna (Astronomy) with a number of input parameters. It takes hours to compute, and it can return a value of evaluation. I want to find the optimal values for the input parameters (that are not independent) in the shortest time (least evaluations) possible.
A good measurement for the performance of an optimization algorithm is Rosenbrock’s valley.
Bash: Chain of commands
Posted by JohannesTheDeveloper in fun with Linux on April 25th, 2009
I wondered more than once how to make a tool that chains together input/output filters using bash.
e.g. you would like to do a iteration of filters:
for i in 1 2 3 4; do grep -v $i; done
So I wrote this chaining tool that does the job in the obvious way:
chain.sh
#!/bin/bash
commands_file=$1
a=$(mktemp)
b=$(mktemp)
cat > $a
while read line; do
$line < $a > $b
mv $b $a
done < $commands_file
cat $a
rm $a
With this, you can do:
for i in 1 2 3 4; do
echo grep -v $i >> mycommands.txt
done
script/chain.sh mycommands.txt
lang=”bash”
Java concepts in C: classes
Posted by JohannesTheDeveloper in Happy Hacking on April 19th, 2009
While to some it might seem absurd, some concepts in Java development can be used more or less straight-forward in C. Lets start with basic things such as object-oriented programming. OOP is a way of programming that does not rely on the language in use.
You have several choices in C:
- (most common): use a .h file with the struct and all functions related to the struct. if the struct is called foo_bar, the functions are called foo_bar_init(), foo_bar_set_baz(), etc.
- (the gnome way): gobject is a class concept that uses preprocessor directives. It is as effective as ugly.
I prefer the first. A funky thing you can do in C is to put the functions as function pointers in the struct:
typedef struct {
int a;
(void)*set_a(int);
} myclass;
void myclass_set_a(myclass * c, int new_a) {
c->a = new_a;
}
myclass * myclass_init() {
myclass * c = (myclass*) malloc(sizeof(myclass));
c->set_a = myclass_set_a;
return c;
}
This allows you to write
c->set_a(c, 42);
in your code, which already has a very object-oriented-language touch. It is basically the same thing python does: The first argument of a method is the object, only if you are already in the object and call self.mymethod(), python fills in the self for you.
If you want something ugly, you add a type-unsafe subclassing concept:
typedef struct {
int a;
void * additional_data;
} classA;
classA * classA_init();
typedef struct {
int b;
void * additional_data;
} classB;
classA * classB_init() {
classB * b = (classB*)malloc(sizeof(classB));
classA * a = classA_init();
a->additional_data = (void*)b;
return a;
}
classB_set_b(classA * c, int newb) {
(classB*)(c->additional_data)->b = newb;
}
C: IDE of choice
Posted by JohannesTheDeveloper in Happy Hacking on April 19th, 2009
I used to use gedit as C IDE, I still use vim, but today I’d like you to try eclipsecdt.
It has the benefit of highlighting the line of errors and being able to jump there, rebuilding with a keystroke (Ctrl-B) without leaving the window and providing a tree view. So far, lots of C IDEs do that. anjuta for example. The thing that made the difference for me is that eclipsecdt provides completion for structs (functions too) when you haven’t written any letter yet. For example “mystructp->”(Ctrl-Space). I had a look at anjuta and BLOCKS.
Further, the outline (functions, structs, directives in the file) is nice as well as a very good preprocessor expansion/preview. Eclipsecdt also has a sufficient method of autoformatting your C code and also respects your preprocessor code.
So if you have
#define IFDEBUG if(1)
IFDEBUG
printf("blah");
depending on your settings, printf will be intended.
Last word: Intend with tabs and indent “case:”. If you don’t, you are wrong :-).
Edit: typo.
Java concepts in C: logging
Posted by JohannesTheDeveloper in Happy Hacking on April 19th, 2009
One will always run into the problem that one needs more detailed output from a section at some time, and less verbose output at another time. Or the whole output should be more verbose, or you want to investigate a segfault.
There are some logging libraries (log4c for example). I however use the following approach:
debug.h:
#ifdef DEBUG
#define IFDEBUG if(1)
#else
#define IFDEBUG if(0)
#endif
#ifdef SEGV
#define IFSEGV if(1)
#else
#define IFSEGV if(0)
#endif
#define debug(str) IFDEBUG { printf("\tDEBUG[%s]: %s\n", AT, str); fflush(NULL); }
#define dump_i(str, var) IFDEBUG { printf("\tDEBUG[%s]: %s: %i\n", AT, str, var); fflush(NULL); }
#define dump_p(str, var) IFDEBUG { printf("\tDEBUG[%s]: %s: %p\n", AT, str, var); fflush(NULL); }
in your code:
debug("we are now doing foo ");
IFSEGV
dump_p("m\n", (void*)m);
IFDEBUG
printf("\t\tn_par=%d; status=%d\n", get_n_par(m), status);
Example output:
DEBUG[tests/tests.c:196]: add starting points, ...
This allows you to recompile your code with -DDEBUG -DSEGV and enable/disable verbosity levels without constant code changes.
Java concepts in C: test-first development and unit-testing
Posted by JohannesTheDeveloper in Happy Hacking on April 19th, 2009
Unit testing is no rocket science. Do it as often as you can. Writing your own unit-testing library is simple: You need something that calls your test methods, and some form of assert-function that can die with style.
I chose the convention of simply returning 0 if the test is ok, and 1 if it fails:
/* example test function. return value is 0 iff succeeded.*/
int test_tests(void) {
return (count_tests() > 0 ? 0 : 1);
}
A assertEquals directive you can use.
#define ASSERTEQUALD(result, expected, text) { \
if(expected == result || expected - result < (expected+result)*0.001 ) { \
printf(" subtest ok: %s\n", text); \
} else { \
printf(" ASSERT EQUAL FAILED: expected: %e; got: %e; %s\n", expected, result, text); \
return 1; \
} \
}
Collect the tests you want to run
/* register of all tests */
int (*tests_registration[])(void) = {
/* this is test 1 */ /*test_tests, */
test_hist,
test_create,
test_alloc,
test_load,
test_resize,
test_append,
test_random,
test_mod,
test_write,
test_write_prob,
/* register more tests before here */
NULL,
};
And link with the following test runner. (run-tests.c)
/* we do testing of the inner works */
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
extern int (*tests_registration[])(void);
int count_tests() {
int n = -1;
while (tests_registration[++n] != NULL)
;
return n;
}
/*
* when testnr in 1..n, execute the specific test.
* output is conforming to TAP
*/
int test(int testnr) {
int n = count_tests();
int r;
if (testnr <= 0 || testnr > n) {
fprintf(stderr, "Test number out of range (1..%d)\n", n);
return -1;
}
r = tests_registration[testnr - 1]( ) ;if(r == 0)
printf("ok %d\n", testnr);
else
printf("not ok %d - returned %d\n", testnr, r);
return r;
}
int test_all() {
int i = 0;
int n = count_tests();
int count = 0;
printf("1..%d\n", n);
while (i++ < n) {
count += test(i);
}
if (count != 0) {
printf(" %d of %d tests unsuccessful\n", count, n);
} else {
printf(" all %d tests successful\n", n);
}
return count;
}
void usage(char * progname) {
printf("SYNOPSIS: \n\n"
"%s \trun all tests\n"
"%s [<testnumber>]\trun a specific test by number\n"
"\n"
"Tests available: 1..%d\n\n"
"", progname, progname, count_tests());
}
int main(int argc, char ** argv) {
int t;
if (argc == 1) {
return test_all();
} else if (argc == 2 && isdigit(argv[1][0])) {
errno = 0;
t = strtol(argv[1], (char **) NULL, 10);
if (errno != 0) {
usage(argv[0]);
} else {
test(t);
}
} else {
usage(argv[0]);
}
return 0;
}
You can then run your tests selectively or the whole suite
$ ./tests.exe
...
ok 8
subtest ok: mod ints
subtest ok: mod doubles
subtest ok: mod doubles
ok 9
ok 10
all 10 tests successful
$ ./tests.exe 4
subtest ok: x 0
subtest ok: x 1
subtest ok: x last
DEBUG[tests/tests.c:196]: add starting points, ...
subtest ok: y 0
subtest ok: y 1
subtest ok: y last
ok 4
The beauty of this test suite (yes, there is some beauty in it), is that it is pluggable, extendable, debuggable and TAP (Test Anything Protocol) conform.
As with all test suites you will have to separate your code into testable chunks so that they are accessible by your test functions.
make: self-documenting Makefiles
Posted by JohannesTheDeveloper in Happy Hacking on April 19th, 2009
With the following grep/sed combinations, you can keep the help for each target inline just above your target (or wherever you want it).
## Makefile for foo
## help: this clutter
help:
@grep -E '^## ' Makefile|grep -v ':'|sed 's,^## ,,g'
@echo This Makefile has the targets:
@grep -E '^## [.a-z]{2,}:' Makefile|sed 's,^## *,\t,g' |sed 's,: ,\t,g'
## tests: run the tests
tests: tests.exe
./tests.exe ${TESTNR}
Output of
$ make help
Makefile for foo
This Makefile has the targets:
help this clutter
all
tests run the tests
...
Nice, leet and meta. :-)
Some more words on Makefiles: You can use something like
CFLAGS=-O3 -Wall -Werror -Wextra -g -ansi -pedantic ${CCFLAGS}
and pass CCFLAGS from outside. This avoids having many many versions while trying to test things out. For example:
$ CCFLAGS="-DDEBUG -Os -DUSED_ALGORITHM=binsearch2" make tests
(if you didn’t notice yet, -DFOO lets you jump inside #ifdef FOO).
C: Simple multi-processor optimization
Posted by JohannesTheDeveloper in Happy Hacking on April 19th, 2009
Programming threads explicitly is extremely verbose. But how else can you utilize all processors on your machine? You could use a compiler that is so good it can optimize your code and do everything.
However, with OpenMP it is as easy as adding 3 lines:
Look for a expensive loop in your code
+ #include <omp.h>
...
+#pragma omp parallel for
for (i = 0; i < n; i++) {
for (j = 0; j < n2; j++) {
do_something(m, i, j);
}
}
...
add -fopenmp to your compile-flags.
3 lines. By default, the number of processors used is the number available, and the work is split up between the threads automatically, but you can set it at runtime (!) using the environment variable OMP_NUM_THREADS.
OpenMP is of course much more powerful, but this little change might already improve your optimization. It scales very well if you have enough work for each thread.
Last word: The concepts and way of programming when sharing load between computers (clusters, grids) is totally different from multi-processor programming. You will have to choose at some point. OpenMP is not meant for clusters. Have a look at programming with BOINC.
C: Documenting is Duty
Posted by JohannesTheDeveloper in Happy Hacking on April 19th, 2009
Document your functions, your structs, your preprocessor directives, and most importantly, your general concept.
Doxygen provides a good tool for keeping the documentation inline. If your documentation is not next to your code, it will never be updated.
It produces nice html output. Make a make target. Write a general introduction with \mainpage.
C: Makefiles and compile flags
Posted by JohannesTheDeveloper in Happy Hacking on April 19th, 2009
I bet whole books have been written about the compile flags of gcc.
In my opinion it is good style to apply all the following flags to all your programs and try to enforce them onto others:
-Wall -Werror -Wextra -ansi -pedantic
This might seem a little harsh at first, but it has several benefits: 1) Errors in your code are detected before execution. 2) Your code is more portable.
The only case where the compile flags above are annoying, is when you have unused parameters. Trick:
void my_no_sort(int * a){
(void)a;
}
The line performs a no-op onto the parameter, saying “I voluntarily do not use this parameter”.
Optimization
I will try to keep this short, as there are good resources around.
1. Do not optimize. Premature optimization is the root of all evil. First make sure your program is correct and complete (A unimplemented routine is faster than a implemented one).
2. Measure, never assume. Optimize only where your bottlenecks are, and when you can measure the performance increase. Optimization generally leads to poorer code style. Concluding after 3 hours that a certain change does not improve your performance is a good reason to throw the change away. Use version control systems, even (especially?) if you program alone.
3. gcc drastically improved optimization when moving to 4.0/4.1. Try to use a new compiler. -O3 is not better than -O2 or -Os in all cases, especially if your program size is big in memory. O3 increases the program and swapping in/out between the various caches can become a problem.
4. Run tests. On various scenarios. Several times. Watch the deviation closely before concluding significance.
On the right side is what the evaluation results of various compile-flag configurations should look like.
5. Take it seriously or let it be. Many types of programs don’t need to be extremely fast: User interaction programs and logins can take up to 200ms for returning a result. A program in a higher-level scripting language might be easier to maintain. Compilers don’t need to be fast, correctness is so much more important.
At some point, be satisfied with the level of performance and try to keep it over the following versions :-)
Recent Comments