This is an initial attempt at implementing evals for all (or most) of our foundational examples. Before we release, we want to make sure every example works and responds correctly. Until now this has been checked manually; hopefully these evals will speed up our release process.