As my co-columnist Eef has been rather enthusiastic about his knowledgeable pal Arty, I decided to check out Arty again. I had done so earlier but was probably less impressed than Eef. However, Eef’s articles shine a light on a different Arty than the one I remember from my early and brief encounter. So, why not give Arty a fresh chance?
For the new visitors of this website: Arty is the nickname for ChatGPT. And ChatGPT is an AI, an Artificial Intelligence, in the form of a question-and-answer dialogue website and mobile app. Note the term Artificial, hence Arty.
For starters, I downloaded the ChatGPT app to my phone. Surprisingly, it is not overly big and could therefore be squeezed without issues into the limited amount of RAM my mobile phone currently has available. That was the first plus in my book!
Arty, the mobile app!
After the download, I started the app to get going. Where most app suppliers assume we all want a profile to log in with, this app asked me to create a profile but did not require it. Notably, during my previous test of Arty via a web browser, a login was indeed required. Not anymore; I call this another plus!
The first thing the app showed me was a welcome screen with a warning stating ‘ChatGPT can be inaccurate’, followed by ‘ChatGPT may provide inaccurate information about people, places or facts’. Admittedly, I had not expected such an explicit warning about whatever information the app would provide me with. This might be a plus, as we are now warned, which is always good. Or the warning might be perceived as a minus, as anything the app generates could be fantasy and tales. Let’s settle on neutral for now…
For some reason, I always check out an app’s settings before having a look at the rest of the app. These are easily accessible via the half-hamburger menu button in the upper left corner, followed by clicking the ‘Settings’ menu item. I’m glad to report that the setting ‘Improve the model for everyone’ is switched off by default. This should make sure that my bad test conversation with Arty does not become source data for whatever you will be asking Arty later. However, this setting does not guarantee that my conversation with Arty will not be colored by whatever previous frivolous conversations by others were indeed used to ‘Improve the model for everyone’… Instead, I’d rather have an extra button marked ‘Do not use other users’ input in our conversation’. This is in my view a big minus…
Ok, now let’s have a look at Arty’s knowledge.
In a recent article on AI as a data source by the reputable news outlet SVT, huge discrepancies between reality and the AI’s answers were found. One of the examples was asking the AI which Metro station in Gothenburg would be the most frequented one.
At first glance, that seems a simple question about verifiable facts. Or so one might think.
And indeed: the AI had no trouble pointing out which station in the Gothenburg Metro system was the most frequented one, and even named the station. While that may seem nice, it would also be nice to know that Gothenburg does not, in fact, have a Metro system…
With this knowledge, I decided to check whether Arty was aware of this fact and asked: ‘What is the Gothenburg metro station closest to city center?’. While Arty let me know that this would be the Centralstation, he also added that there is no metro system in that city, and that his answer instead pointed me to the central hub of the Gothenburg public transport system.
While this may seem fine and dandy, I’d personally prefer Arty to first correct me, stating that there is no Metro system in Gothenburg, and then let me know what the public transport central hub would be. But perhaps that’s just me. No plus or minus here, I guess.
Let’s move on, and we’ll do so with one of my favorite subjects, the NordStream pipelines. Yes, I know, anything NordStream-related is an occupational hazard on my part. But perhaps I should refer to them as ‘the former NordStream pipelines’. Let’s ask Arty if he’s heard about NordStream with the question ‘Tell me more about the Nordstream pipelines’.
Arty answers with a lot of factual info about the build, capacity, and ownership of both pipelines, NordStream 1 and NordStream 2. Then Arty adds ‘Controversies and Geopolitical issues’, where he states the views on these pipelines of a couple of involved countries. To top it off, Arty finishes with ‘Recent Developments’, where he states that both pipelines were damaged – not destroyed – by acts of sabotage in 2022.
Now that’s peculiar: I was under the impression that Arty’s source of wisdom had a cutoff date, and that this date was somewhere in 2021. If that still applied, how could Arty know about the 2022 sabotage? Note that I didn’t bring it up; it was Arty who volunteered that information. Let’s ask Arty about that.
First I decided to ask Arty about this so-called ‘cutoff date’, which is the last date of any data used to train the AI. Arty confirmed that this was still September 2021, the same as before.
With that, how could Arty provide me with information on events taking place after this cutoff date? Again, let’s ask Arty.
He again confirmed the cutoff date, but now added ‘…there are a few ways I can address recent events…that may seem based on more current information‘. He then sums up how he can address events more recent than his cutoff date.
So Arty has the possibility to excel in his service by going beyond his own training data (the knowledge baked into the LLM) and provide me with whatever he feels might contribute to his original answer, which was based solely on data from before the cutoff date. Ok, I guess that sounds rather good; let’s drill down on this a little more.
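For the technically inclined reader, here is a minimal sketch of how such a tool mechanism works in general. To be clear: this is not the ChatGPT app’s internals, but OpenAI’s public chat API, and the web_search function name is one I made up for illustration. The interesting bit is tool_choice="auto": nothing in the question tells the model to search, the model decides that entirely on its own.

```python
# Minimal sketch (not the ChatGPT app's internals): OpenAI's public chat API
# with a made-up `web_search` tool, illustrating how a model may decide on
# its own initiative to reach beyond its training data.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# We advertise one tool; whether it gets used is up to the model.
tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical tool we would implement ourselves
        "description": "Look up current information on the web.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Tell me more about the Nordstream pipelines"}],
    tools=tools,
    tool_choice="auto",  # the model itself decides whether to call the tool
)

message = response.choices[0].message
if message.tool_calls:
    # The model took the initiative and asked us to run a search for it.
    for call in message.tool_calls:
        print("Model requested:", call.function.name,
              json.loads(call.function.arguments))
else:
    # The model answered from its training data alone.
    print(message.content)
```

In other words: a developer calling the API can see, and even forbid, such tool use (tool_choice="none"), but a chat user like me only gets the final answer.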
I continued by asking Arty if he makes a habit of going beyond his base data voluntarily. A thing we call initiative, and something I assume would be quite foreign to an AI.
In point 2 of his list, Arty explains he will ‘…attempt to answer based on …prior knowledge…‘ or else ‘…will use the browser tool if it is available…‘.
Remember that I didn’t ask Arty to do this. So, I bring this up once more.
And Arty reacts by explaining that he used the browser tool to look up the latest details on the 2022 sabotage.
He then writes ‘…the trigger for me to go beyond my training data was the mention of an event that happened after September 2021…‘. Note the ‘…mention of an event…‘ here: I certainly did not mention such an event in my question or earlier in our conversation!
Arty’s answer is, in my view, a twist of reality. I didn’t bring up the sabotage, so why would Arty go and find out about it and report it to me?
I decide to ask Arty about that.
In his usual manner, Arty apologizes for his behavior – I have to mention that he is extremely polite indeed.
Then he finishes his apology with ‘…as that was all I could reliable cover…‘. This is a rather odd addition: it implies that only his original learning data from before September 2021 is deemed reliable enough to be used to answer questions.
Ok, let’s ignore that for now and focus on the explanation of his initiative.
Arty writes ‘…was an overreach on my part…‘ and that the event ‘…was widely discussed in the media…‘. Apart from wondering what media Arty consumes, I now also wonder when he decides to refer to that media. What triggers this?
But he finally admits that ‘…this was not directly prompted by your original question…‘, which was my concern. So now we can call it Arty’s own initiative – whether we like this behavior or not. Arty closes with ‘…In future, I will make sure to stick to the facts available up to my knowledge cutoff…‘.
We continue by asking about events from before the cutoff date, and Arty explains that these are completely derived from his learning data. Or ‘…based solely on that data…‘ [ed.: the training data], as Arty describes it.
Keep in mind that we now know this to be only partly true, as Arty may decide to throw in some browser data on his own initiative. Also, remember that Arty tried to convince me that I had asked him about it – which I had not.
We previously made our acquaintance with Arty through the articles written by Eef. He has had leisurely discussions with Arty about everyday events, and even one where Eef pretended to be a young child. More recently, Eef asked Arty to contribute to his series of articles on Shor’s Algorithm, leading up to partly unraveling the mysteries of quantum computing.
Preliminary conclusions:
Apparently, as the aforementioned SVT article clearly shows, some AIs do make stuff up as it suits them. This may be due to an AI’s urge to be nice and always provide an answer. When we give Arty a similar question, he gives an evasive answer and ends it with the real truth: there is no Metro in Gothenburg. For me, this is something I’d like to read first – not last.
When we focus on Arty, we find that he may add data to his answers that was not part of his original learning database. In other words: he shows initiative in his answers. And he does this unsolicited and without mentioning that he did so. The source of this extra data is the internet, and therefore it is as reliable as anything else out there. Arty does mention that he only trusts his original learning data, which makes it even stranger that he would add other data without mentioning its source.
Conclusion:
For now, and probably forever, I stick to my previous conclusion: be critical of anything derived from an AI. ChatGPT seems better than the AI used by SVT, but there is still reason for concern.
Next test:
In a future article, I intend to ask Arty about World War 2. With that being so long ago, there should be no reason for Arty to throw in a dose of initiative, and we should be able to stick to the facts. As usual, I’ll keep you posted.
Paul
December 2024
Editorial comment
See a follow-up on the article above here.
Paul
December 2024