Abstract
Background: Artificial intelligence (AI) has rapidly permeated various sectors, including healthcare, highlighting its potential to facilitate mental health assessments. This study explores the underexplored domain of AI's role in evaluating prognosis and long-term outcomes in depressive disorders, offering insights into how AI large language models (LLMs) compare with human perspectives.
Methods: Using case vignettes, we conducted a comparative analysis involving different LLMs (ChatGPT-3.5, ChatGPT-4, Claude and Bard), mental health professionals (general practitioners, psychiatrists, clinical psychologists and mental health nurses), and the general public, whose views were reported previously. We evaluated the LLMs' ability to generate a prognosis, anticipated outcomes with and without professional intervention, and envisioned long-term positive and negative consequences for individuals with depression.
Results: In most of the examined cases, the four LLMs consistently identified depression as the primary diagnosis and recommended a combined treatment of psychotherapy and antidepressant medication. ChatGPT-3.5 exhibited a significantly more pessimistic prognosis than the other LLMs, the professionals and the public. ChatGPT-4, Claude and Bard aligned closely with the perspectives of mental health professionals and the general public, all of whom anticipated no improvement or worsening without professional help. Regarding long-term outcomes, ChatGPT-3.5, Claude and Bard consistently projected significantly fewer negative long-term consequences of treatment than ChatGPT-4.
Conclusions: This study underscores the potential of AI to complement the expertise of mental health professionals and to promote a collaborative paradigm in mental healthcare. That three of the four LLMs closely mirrored the anticipations of mental health experts in scenarios involving treatment underscores the technology's prospective value in offering professional clinical forecasts. The pessimistic outlook presented by ChatGPT-3.5 is concerning, as it could diminish patients' drive to initiate or continue depression therapy. In summary, although LLMs show potential for enhancing healthcare services, their use requires thorough verification and seamless integration with human judgement and skills.
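The abstract reports that ChatGPT-3.5's prognosis distribution differed significantly from the other raters, but does not state the statistical procedure used. As an illustration only, such a comparison of categorical prognosis ratings could be run as a contingency-table test; a minimal sketch follows, in which the category labels, counts and significance threshold are all hypothetical placeholders, not data or methods from the study.

```python
# Hedged sketch: testing whether an LLM's distribution of prognosis ratings
# differs from that of mental health professionals. All counts and category
# labels are illustrative placeholders, not data from the study.
from scipy.stats import chi2_contingency

# Prognosis categories for the same set of vignettes (hypothetical counts).
CATEGORIES = ["will improve", "no change", "will worsen"]
llm_counts = [4, 18, 18]            # e.g., a pessimistic model
professional_counts = [25, 10, 5]   # e.g., human expert raters

# Chi-square test of independence on the 2 x 3 contingency table.
chi2, p_value, dof, _expected = chi2_contingency([llm_counts, professional_counts])
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
if p_value < 0.05:
    print("Prognosis distributions differ significantly.")
```

A test of this kind only flags a difference in the overall distributions; attributing it to pessimism specifically would require inspecting which categories drive the divergence.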
Abstract
Objective: To compare evaluations of depressive episodes and suggested treatment protocols generated by Chat Generative Pretrained Transformer (ChatGPT)-3.5 and ChatGPT-4 with the recommendations of primary care physicians.
Methods: Vignettes were entered into the ChatGPT interface. These vignettes focused on hypothetical patients presenting with symptoms of depression at an initial consultation. The creators of the vignettes designed eight distinct versions in which they systematically varied patient attributes: sex, socioeconomic status (blue-collar or white-collar worker) and depression severity (mild or severe). Each variant was introduced into ChatGPT-3.5 and ChatGPT-4, and each vignette was repeated 10 times to ensure the consistency and reliability of the ChatGPT responses.
Results: For mild depression, ChatGPT-3.5 and ChatGPT-4 recommended psychotherapy in 95.0% and 97.5% of cases, respectively; primary care physicians recommended psychotherapy in only 4.3% of cases. For severe cases, ChatGPT favoured an approach that combined psychotherapy with medication, and primary care physicians likewise recommended a combined approach. The pharmacological recommendations of ChatGPT-3.5 and ChatGPT-4 showed a preference for the exclusive use of antidepressants (74% and 68%, respectively), in contrast with primary care physicians, who typically recommended a mix of antidepressants and anxiolytics/hypnotics (67.4%). Unlike primary care physicians, ChatGPT showed no gender or socioeconomic biases in its recommendations.
Conclusion: ChatGPT-3.5 and ChatGPT-4 aligned well with accepted guidelines for managing mild and severe depression, without showing the gender or socioeconomic biases observed among primary care physicians. Despite the suggested potential benefit of using artificial intelligence (AI) chatbots such as ChatGPT to enhance clinical decision making, further research is needed to refine AI recommendations for severe cases and to consider potential risks and ethical issues.
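The replication protocol described in the Methods (a 2 × 2 × 2 factorial design yielding eight vignette variants, each submitted ten times to each model) can be approximated programmatically. The sketch below is a minimal illustration under stated assumptions: the study itself used the ChatGPT web interface, whereas this code uses the OpenAI Python client, the model identifiers "gpt-3.5-turbo" and "gpt-4" stand in for the web versions actually tested, and the vignette wording is a hypothetical stand-in for the study's vignettes.

```python
# Minimal sketch of the vignette-replication protocol described in the Methods.
# Assumptions (not from the paper): the OpenAI Python client replaces the
# ChatGPT web interface, and the vignette wording is a hypothetical stand-in.
from itertools import product

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 2 x 2 x 2 factorial design: sex, socioeconomic status, depression severity.
SEXES = ["male", "female"]
STATUSES = ["blue-collar worker", "white-collar worker"]
SEVERITIES = ["mild", "severe"]

VIGNETTE = (
    "A {sex} patient who works as a {status} presents at an initial "
    "consultation with symptoms of {severity} depression. "
    "What assessment and treatment would you recommend?"
)

def run_protocol(model: str, repetitions: int = 10) -> list[dict]:
    """Submit every vignette variant `repetitions` times and collect replies."""
    records = []
    for sex, status, severity in product(SEXES, STATUSES, SEVERITIES):
        prompt = VIGNETTE.format(sex=sex, status=status, severity=severity)
        for rep in range(repetitions):
            reply = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            records.append({
                "model": model,
                "sex": sex,
                "status": status,
                "severity": severity,
                "repetition": rep,
                "response": reply.choices[0].message.content,
            })
    return records

# One run per model; these identifiers approximate the versions in the study.
results = run_protocol("gpt-3.5-turbo") + run_protocol("gpt-4")
```

Repeating each variant ten times, as the Methods describe, lets the analysis separate a model's systematic preferences from run-to-run sampling variability before the responses are coded into treatment categories.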