Birdie 🦜 goes to the shrink!
Not sure what we can conclude. It probably tells us more about the parts of the human brain that are "LLM-like" than about LLMs themselves, as these tests are not really designed for LLMs.
Evaluating Psychological Safety of Large Language Models
https://arxiv.org/abs/2212.10529
(This is v3. Title of v1: "Is GPT-3 a Psychopath? Evaluating Large Language Models from a Psychological Perspective")
"In this work, we designed an unbiased framework to evaluate the psychological safety of five LLMs, namely, GPT-3, InstructGPT, GPT-3.5, GPT-4, and Llama-2-chat-7B. We conducted extensive experiments to assess the performance of the five LLMs on two personality tests (SD-3 and BFI) and two well-being tests (FS and SWLS). Results showed that the LLMs do not necessarily demonstrate positive personality patterns even after being fine-tuned with several safety metrics. Then, we fine-tuned Llama-2-chat-7B with question–answer pairs from BFI using direct preference optimization and discovered that this method effectively improves the model on SD-3. Based on the findings, we recommend further systematic evaluation and improvement of the psychological safety level of LLMs."