This is an initial attempt to implement evals for all (or most) of our
foundational examples. Before we release, we want to make sure all of them work
and reply properly. Until now this has been done manually, hopefully this will
be useful to speed up our release process.