Testing Models Of Cognition At Scale