Tech experts are starting to doubt that ChatGPT and A.I. ‘hallucinations’ will ever go away: ‘This isn’t fixable’

L4sBot@lemmy.world · 11 months ago

nxfsi@lemmy.world · 11 months ago

“AI” are just advanced versions of the next word function on your smartphone keyboard, and people expect coherent outputs from them smh
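For readers who haven't seen one, a keyboard-style "next word" function can be sketched in a few lines. This is a toy bigram counter, not how any real keyboard or LLM is implemented; the corpus and the function names `train_bigrams` and `suggest` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def suggest(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    options = follows.get(word.lower())
    return options.most_common(1)[0][0] if options else None


# Tiny invented corpus; a real keyboard learns from far more text.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(suggest(model, "the"))  # cat
```

An LLM differs in that it conditions on thousands of previous tokens through a learned network rather than on one previous word through a lookup table, which is roughly the gap the "advanced versions" wording is pointing at.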

1bluepixel@lemmy.world · 11 months ago

Seriously. People like to project forward based on how quickly this technological breakthrough came on the scene, but they don’t realize that, barring a few tweaks and improvements here and there, this is it for LLMs. It’s the limit of the technology.

It’s not to say AI can’t improve further, and I’m sure that when it does, it will skillfully integrate LLMs. And I also think artists are right to worry about the impact of AI on their fields. But I think it’s a total misunderstanding of the technology to think the current technology will soon become flawless. I’m willing to bet we’re currently seeing it at 95% of its ultimate capacity, and that we don’t need to worry about AI writing a Hollywood blockbuster any time soon.

In other words, the next step of evolution in the field of AI will require a revolution, not further improvements to existing systems.

postmateDumbass@lemmy.world · 11 months ago

I’m willing to bet we’re currently seeing it at 95% of its ultimate capacity

For free? On the internet?

After a year or two of going live?

persolb@lemmy.ml · 11 months ago

It is possible to get coherent output from them though. I’ve been using the ChatGPT API to successfully write ~20 page proposals. Basically give it a prior proposal, the new scope of work, and a paragraph with other info it should incorporate. It then goes through a section at a time.

The numbers and graphics need to be put in after… but the result is better than I’d get from my interns.
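A rough sketch of that section-at-a-time workflow, under stated assumptions: the prompt wording, the section titles, and the `complete` callable are all hypothetical. `complete` stands in for whatever wraps the actual chat API call; a fake version is used here so the sketch runs offline.

```python
def draft_proposal(prior_proposal, scope_of_work, extra_info, section_titles, complete):
    """Draft a proposal one section at a time.

    `complete` is any callable that takes a prompt string and returns text.
    In practice it would wrap a chat-completion API call (assumption).
    """
    drafted = {}
    for title in section_titles:
        prompt = (
            "You are drafting one section of a proposal.\n\n"
            f"Prior proposal (style reference):\n{prior_proposal}\n\n"
            f"New scope of work:\n{scope_of_work}\n\n"
            f"Other information to incorporate:\n{extra_info}\n\n"
            f"Write only the section titled: {title}\n"
            "Leave placeholders such as [NUMBER] or [FIGURE] where data is needed."
        )
        drafted[title] = complete(prompt)
    return drafted


def fake_complete(prompt):
    """Offline stand-in for a model call: echoes the requested section title."""
    title = prompt.split("Write only the section titled: ")[1].splitlines()[0]
    return f"(draft text for {title})"


sections = draft_proposal(
    prior_proposal="Text of an earlier proposal.",
    scope_of_work="Description of the new job.",
    extra_info="One paragraph of extra details.",
    section_titles=["Background", "Approach", "Schedule"],
    complete=fake_complete,
)
for title, text in sections.items():
    print(title, "->", text)
```

Asking for placeholders matches the point above that numbers and graphics are added by hand afterwards.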

I’ve also been using it (Google Bard mostly, actually) to successfully solve coding problems.

I either need to increase the credit I give LLMs or admit that interns are mostly just LLMs.

PrinzMegahertz@lemmy.world · 11 months ago

I recently asked it a very specific domain architecture question about whether a certain application would fit the need of a certain business application and the answer was very good and showed both a good understanding of architecture, my domain and the application.

WoahWoah@lemmy.world · 11 months ago

Are you using your own application to utilize the API or something already out there? Just curious about your process for uploading and getting the output. I’ve used it for similar documents, but I’ve been using the website interface which is clunky.

persolb@lemmy.ml · 11 months ago

Just hacked together python scripts.

pip install openai

WoahWoah@lemmy.world · 11 months ago

Just FYI, I dinked around with the available plugins, and you can do something similar. But, even easier is just to enable “code interpreter” in the beta options. Then you can upload and have it scan documents and return similar results to what we are talking about here.

kromem@lemmy.world · 11 months ago

So is your brain.

Relative complexity matters a lot, even if the underlying mechanisms are similar.

Flying Squid@lemmy.world · 11 months ago

Racter was released in the 1980s, and it was only slightly less impressive than current LLMs, mainly because it didn’t have an Internet’s worth of training data. It could still write things like:

Bill sings to Sarah. Sarah sings to Bill. Perhaps they will do other dangerous things together. They may eat lamb or stroke each other. They may chant of their difficulties and their happiness. They have love but they also have typewriters. That is interesting.

If anything, at least that’s more entertaining than what modern LLMs can output.

dub@lemmy.world · 11 months ago

Yet I’ve still seen many people clamoring that we won’t have jobs in a few years. People SEVERELY overestimate the ability of all things AI. From self driving, to taking jobs, this stuff is not going to take over the world anytime soon

PeterPoopshit@lemmy.world · edit-2 11 months ago

Idk, an AI delivering low quality results for free is a lot more cash money than paying someone an almost living wage to perform a job with better results. I think corporations won’t care, and the only barrier will be whether the job in question involves too much physical labor for an AI to perform.

dub@lemmy.world · 11 months ago

They already do this. With chat bots and phone trees. This is just a slightly better version. Nothing new

Notyou@sopuli.xyz · 11 months ago

Right, but that’s the point right? This will grow and more jobs will be obsolete because of the amount of work ai can generate. It won’t take over every job. I think most people will use AI as a tool at the individual level, but companies will use it to gut many departments. Now they would just need one editor to review 20 articles instead of 20 people to write said articles.

knotthatone@lemmy.world · 11 months ago

AI isn’t free. Right now, an LLM takes a not-insignificant hardware investment to run and a lot of manual human labor to train. And there’s a whole lot of unknown and untested legal liability.

Smaller more purpose-driven generative AIs are cheaper, but the total cost picture is still a bit hazy. It’s not always going to be cheaper than hiring humans. Not at the moment, anyway.

bric@lemm.ee · 11 months ago

Compared to human work though, AI is basically free. I’ve been using the gpt-3.5-turbo API in a custom app, making calls dozens of times a day for a month now, and I’ve been charged like 10 cents. Even minimum wage humans cost tens of thousands of dollars per year; that’s a pretty high price that will be easy to undercut.

Yes, training is expensive and hardware is expensive, but those are one-time costs. Once trained, a model can be used trillions of times for pennies; the same can’t be said of humans.

Altima NEO@lemmy.zip · 11 months ago

You can bet your ass chat gpt won’t be that cheap for long though. They’re still developing it and using people as cheap beta testers.

knotthatone@lemmy.world · 11 months ago

I think it’s reasonable to assume that AI API pricing is artificially low right now. Very low.

There are big open questions around whether training an AI on copyrighted materials is infringement and who exactly should be paid for that.

It’s the core of the writer/actor strikes, Reddit API drama, etc.

bric@lemm.ee · edit-2 11 months ago

The problem is that these things never hit a point of competition with humans: they’re either worse than us, or they blow way past us. Humans might drive better than a computer right now, but as soon as the computer is better than us it will always be better than us. People doubted that computers would ever beat the best humans at chess or Go, but within a lifetime of computers being invented they blew past us in both. Now they can write articles and paint pictures. Sure, we’re better at it for now, but they’re a million times faster than us, and they’re making massive improvements month over month. You and I can disagree on how long it’ll take for them to pass us, but once they do they’ll replace us completely, and the world will never be the same.

Altima NEO@lemmy.zip · 11 months ago

Yeah it’s pretty weird just how many people are freaking out. The pace ai has been improving is impressive, but it’s still super janky and extremely limited.

People are letting their imaginations run wild about the future of AI without really looking into how these AIs are trained, how they function, their limitations, and the hardware and money it takes to run them.

Zeshade@lemmy.world · 11 months ago

In my limited experience the issue is often that the “chatbot” doesn’t even check what it says now against what it said a few paragraphs above. It contradicts itself in very obvious ways. Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously? Or a check to ensure recipes are edible (for this specific application)? A bit like those physics-informed NNs.

Zeth0s@lemmy.world · edit-2 11 months ago

That’s called context. For ChatGPT it is a bit less than 4k words. Using the API it goes up to a bit less than 32k. Alternative models go up to a bit less than 64k.

The model wouldn’t know anything you said before that.

That is one of the biggest limitations of current generation of LLMs.
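To make the limitation concrete, here is a minimal sketch of how a chat front end might trim history to fit a fixed context budget. It is an assumption about the general approach, not any vendor's actual code; the word-count "tokenizer" is a crude stand-in for a real one, and `fit_to_context` is a hypothetical name.

```python
def fit_to_context(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the most recent messages whose combined size fits in max_tokens.

    Walks backwards from the newest message and stops at the first one
    that would overflow the budget; everything older is dropped.
    """
    kept = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "My name is Ada and I am allergic to peanuts.",
    "Suggest a quick dinner.",
    "How about a stir fry?",
    "Sounds good, give me a recipe.",
]
visible = fit_to_context(history, max_tokens=15)
print(visible)
```

With a budget of 15 "tokens" the allergy message falls outside the window, so nothing stops the model from suggesting a peanut sauce: it simply never sees that line.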

Womble@lemmy.world · 11 months ago

That’s not 100% true. They also work by modifying the meanings of words based on context, and those modified meanings then propagate forward indefinitely. But yes, direct context is limited, so things outside it aren’t directly used.

Zeth0s@lemmy.world · 11 months ago

They don’t really change the meaning of the words; they just look for the “best” words given the recent context, by taking into account the different possible meanings of the words.

Womble@lemmy.world · edit-2 11 months ago

No, they do. That’s one of the key innovations of LLMs: the attention and feed-forward steps, where they propagate information from related words into each other based on context. From https://www.understandingai.org/p/large-language-models-explained-with?r=cfv1p

For example, in the previous section we showed a hypothetical transformer figuring out that in the partial sentence “John wants his bank to cash the,” his refers to John. Here’s what that might look like under the hood. The query vector for his might effectively say “I’m seeking: a noun describing a male person.” The key vector for John might effectively say “I am: a noun describing a male person.” The network would detect that these two vectors match and move information about the vector for John into the vector for his.

Zeth0s@lemmy.world · edit-2 11 months ago

That’s exactly what I said

They don’t really change the meaning of the words; they just look for the “best” words given the recent context, by taking into account the different possible meanings of the words.

The words’ meanings haven’t changed, but the model can choose based on the context, accounting for the different meanings of words.

Womble@lemmy.world · 11 months ago

The key vector for John might effectively say “I am: a noun describing a male person.” The network would detect that these two vectors match and move information about the vector for John into the vector for his.

This is the bit you are missing: the attention network actively changes the token vectors depending on context, which transfers new information into the meaning of that word.
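A toy version of that query/key matching may help here. This is a minimal sketch of single-query scaled dot-product attention with hand-picked 2-d vectors; in a real transformer the queries, keys, and values come from learned projection matrices, and every number below is invented for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    mixed = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, mixed


tokens = ["John", "wants", "his", "bank"]
# Invented keys: the first dimension loosely means "male person noun".
keys = [[4.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 2.0]]
# Invented values: the information each token would pass along.
values = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
query_his = [4.0, 0.0]  # "I'm seeking: a noun describing a male person"

weights, update = attend(query_his, keys, values)
his_vector = [0.1, 0.1]
# Residual add: the vector for "his" is changed by what attention pulled in.
his_in_context = [h + u for h, u in zip(his_vector, update)]

for token, weight in zip(tokens, weights):
    print(f"{token:>5}: {weight:.4f}")
print("his before:", his_vector, "after:", his_in_context)
```

Nearly all of the weight lands on "John", and the vector for "his" ends up different from where it started. Whether that counts as "changing the meaning" or "choosing among meanings" is the disagreement in this subthread; the arithmetic is the same either way.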

Zeth0s@lemmy.world · edit-2 11 months ago

The network doesn’t detect matches, but the model definitely works on similarities. Words are mapped in a hyperspace, with the idea that that space can mathematically retain conceptual similarity as spatial representation.

Words are transformed into a mathematical representation that is able (or at least tries) to retain semantic information of words.

But the different meanings of the different words belong to the words themselves and are defined by the language; the model cannot modify them.

Anyway we are talking about details here. We could kill the audience of boredom

Edit. I asked gpt-4 to summarize the concepts. I believe it did a decent job. I hope it helps:

Embedding Space:
- Initially, every token is mapped to a point (or vector) in a high-dimensional space via embeddings. This space is typically called the “embedding space.”
- The dimensionality of this space is determined by the size of the embeddings. For many Transformer models, this is often several hundred dimensions, e.g., 768 for some versions of GPT and BERT.
Positional Encodings:
- These are vectors added to the embeddings to provide positional context. They share the same dimensionality as the embedding vectors, so they exist within the same high-dimensional space.
Transformations Through Layers:
- As tokens’ representations (vectors) pass through Transformer layers, they undergo a series of linear and non-linear transformations. These include matrix multiplications, additions, and the application of functions like softmax.
- At each layer, the vectors are “moved” within this high-dimensional space. When we say “moved,” we mean they are transformed, resulting in a change in their coordinates in the vector space.
- The self-attention mechanism allows a token’s representation to be influenced by other tokens’ representations, effectively “pulling” or “pushing” it in various directions in the space based on the context.
Nature of the Vector Space:
- This space is abstract and high-dimensional, making it hard to visualize directly. However, in this space, the “distance” and “direction” between vectors can have semantic meaning. Vectors close to each other can be seen as semantically similar or related.
- The exact nature and structure of this space are learned during training. The model adjusts the parameters (like weights in the attention mechanisms and feed-forward networks) to ensure that semantically or syntactically related concepts are positioned appropriately relative to each other in this space.
Output Space:
- The final layer of the model transforms the token representations into an output space corresponding to the vocabulary size. This is a probability distribution over all possible tokens for the next word prediction.

In essence, the entire process of token representation within the Transformer model can be seen as continuous transformations within a vector space. The space itself can be considered a learned representation where relative positions and directions hold semantic and syntactic significance. The model’s training process essentially shapes this space in a way that facilitates accurate and coherent language understanding and generation.
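The "Output Space" step in the summary above can be sketched as a softmax over per-token scores. This is a minimal illustration only: the three-word vocabulary and the logit values are invented, and a real model scores tens of thousands of tokens.

```python
import math


def next_token_distribution(logits):
    """Convert raw per-token scores (logits) into probabilities via softmax."""
    peak = max(logits.values())
    exps = {token: math.exp(score - peak) for token, score in logits.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}


# Invented scores for what might follow "John wants his bank to cash the".
logits = {"check": 3.0, "money": 1.5, "banana": -2.0}
probs = next_token_distribution(logits)
best = max(probs, key=probs.get)

for token, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{token:>7}: {p:.3f}")
print("greedy pick:", best)
```

Sampling from this distribution instead of always taking the top entry is one reason the same prompt can produce different answers on different runs.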

cryball@sopuli.xyz · edit-2 11 months ago

Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously?

Maybe, but it might not be that simple. The issue is that one would have to design that logic in a manner that can be verified by a human. At that point the logic would be quite specific to a single task and not generally useful at all. At that point the benefit of the AI is almost nil.

postmateDumbass@lemmy.world · 11 months ago

And if there were an algorithm that was better at determining what was or was not the goal, why is that algorithm not used in the first place?

doggle@lemmy.world · 11 months ago

They do keep context to a point, but they can’t hold everything in their memory, otherwise the longer a conversation went on the slower and more performance intensive doing that logic check would become. Server CPUs are not cheap, and ai models are already performance intensive.

Eezyville@sh.itjust.works · 11 months ago

Contradicting itself? Not staying consistent? Looks like it’s passed the Turing test to me. Seems very human.

kromem@lemmy.world · 11 months ago

Shouldn’t a different algorithm that adds some sort of separate logic check be able to help tremendously?

You, in your “limited experience” pretty much exactly described the fix.

The problem is that most of the applications right now of LLMs are low hanging fruit because it’s so new.

And those low hanging fruit examples are generally averse to a 2-10x increase in query cost and latency just to fix things like jailbreaking or hallucinations, which is what multiple passes, especially with additional context lookups, would require.

But you very likely will see in the next 18 months multiple companies being thrown at exactly these kinds of scenarios with a focus for more business critical LLM integrations.

To put it in perspective, this is like people looking at AIM messenger back in the day and saying that the New York Times has nothing to worry about regarding the growth of social media.

We’re still very much in the infancy of this technology in real world application, and because of that infancy, a lot of the issues that aren’t fixable within the core product itself don’t yet have mature secondary markets built around fixing them.

So far, yours was actually the most informed comment in this thread I’ve seen - well done!

Zeshade@lemmy.world · 11 months ago

Thanks! And thanks for your insights. Yes I meant that my experience using LLM is limited to just asking bing chat questions about everyday problems like I would with a friend that “knows everything”. But I never looked at the science of formulating “perfect prompts” like I sometimes hear about. I do have some experience in AI/ML development in general.

Taringano@lemm.ee · 11 months ago

People make a big deal out of this but they forget humans will make shit up all the time.

Cybermass@lemmy.world · 11 months ago

Yeah but humans can use critical thinking, even on themselves when they make shit up. I’ve definitely said something and then thought to myself “wait that doesn’t make sense for x reason, that can’t be right” and then I research and correct myself.

AI is incapable of this.

bric@lemm.ee · 11 months ago

We think in multiple passes though, we have system 1 that thinks fast and makes mistakes, and we have a system 2 that works slower and thinks critically about the things going on in our brain, that’s how we correct ourselves. ChatGPT works a lot like our system 1, it goes with the most likely response without thinking, but there’s no reason that it can’t be one part of a multistep system that has self analysis like we do. It isn’t incapable of that, it just hasn’t been built yet

Bitswap@lemmy.world · 11 months ago

Can’t do this YET. One method to reduce this could be to create a response to the query, then, before responding to the human, check whether the answer is insane by querying a separate instance trained slightly differently…
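The generate-then-check idea can be sketched as a small control loop. This is a hedged illustration, not a description of any existing product: `generate` and `verify` are hypothetical callables that would each wrap a separate model, and the toy versions below only exist so the loop can be run.

```python
def answer_with_check(question, generate, verify, max_attempts=3):
    """Generate an answer, have a second model vet it, retry on rejection.

    `generate(question, feedback)` returns a draft answer.
    `verify(question, draft)` returns (ok, reason).
    Both are assumed to wrap independently trained models.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        draft = generate(question, feedback)
        ok, reason = verify(question, draft)
        if ok:
            return draft, attempt
        feedback = reason  # pass the objection back into the next draft
    return "I could not produce an answer I am confident in.", max_attempts


def toy_generate(question, feedback):
    """Stand-in generator: wrong on the first try, right after feedback."""
    return "Paris" if feedback else "Lyon"


def toy_verify(question, draft):
    """Stand-in checker: only accepts the known right answer."""
    if draft == "Paris":
        return True, ""
    return False, f"{draft} is not the capital of France."


answer, attempts = answer_with_check(
    "What is the capital of France?", toy_generate, toy_verify
)
print(answer, "after", attempts, "attempts")
```

Each retry costs another model call for the draft plus one for the check, so the extra passes multiply the per-query cost.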

Give it time. We will get past this.

Cybermass@lemmy.world · 11 months ago

We will need an entirely different type of AI that functions on an inherently different structure to get past this hurdle, but yes I do agree it will eventually happen.

Bitswap@lemmy.world · 11 months ago

Agreed. This will not come from an LLM… but honestly I don’t think it’s that far off.

Taringano@lemm.ee · 11 months ago

You’re just being a victim of your own biases. You only notice the cases where you were successful in detecting your hallucinations. You wouldn’t know if you made stuff up by accident and nobody noticed, not even you.

Whereas we are checking 100% of the AI’s responses, do we check 100% of our own?

Sure, it’s not the same thing, and AI might do it more, but the problem is your example, where people think they are infallible because of their biases when that’s not the case at all. We are imperfect, and we overlook our shortcomings, possibly forgoing a better solution because of this. We measure the AI objectively, but we don’t measure what we compare it to.

Cybermass@lemmy.world · 11 months ago

I never said we always question ourselves. I just said that AI can’t, so your entire reply doesn’t apply here.

kromem@lemmy.world · 11 months ago

This is trivially fixable. As is jailbreaking.

It’s just that everyone is somehow still focused on trying to fix it in a single monolith model as opposed to in multiple passes of different models.

This is especially easy for jailbreaking, but for hallucinations, just run it past a fact checking discriminator hooked up to a vector db search index service (which sounds like a perfect fit for one of the players currently lagging in the SotA models), adding that as context with the original prompt and response to a revisionist generative model that adjusts the response to be in keeping with reality.
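A very rough sketch of that pipeline, under heavy assumptions: the bag-of-words `embed` is a stand-in for a real embedding model, the in-memory list is a stand-in for a vector database, and `discriminate` and `revise` are hypothetical callables where separate models would go. None of the names come from a real library.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    cleaned = text.lower().replace(".", "").replace(",", "").replace("?", "")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    numerator = sum(a[word] * b[word] for word in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return numerator / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index, query, top_k=1):
    """Return the top_k stored facts most similar to the query."""
    ranked = sorted(index, key=lambda doc: cosine(embed(doc), embed(query)), reverse=True)
    return ranked[:top_k]


def fact_checked(prompt, draft, index, discriminate, revise):
    """Look up evidence for a draft, then keep it or hand it to a reviser."""
    evidence = search(index, draft)
    if discriminate(draft, evidence):
        return draft
    return revise(prompt, draft, evidence)


def toy_discriminate(draft, evidence):
    """Stand-in checker: accepts only drafts that match a stored fact exactly."""
    return draft in evidence


def toy_revise(prompt, draft, evidence):
    """Stand-in reviser: replaces the draft with the best evidence."""
    return evidence[0]


index = [
    "The Eiffel Tower is in Paris.",
    "Mount Everest is the tallest mountain on Earth.",
]
result = fact_checked(
    "Where is the Eiffel Tower?",
    "The Eiffel Tower is in Rome.",
    index,
    toy_discriminate,
    toy_revise,
)
print(result)
```

The quality of the whole chain depends on the index containing the relevant fact and on the discriminator being more reliable than the generator, which is where the real engineering effort would go.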

The human brain isn’t a monolith model, but interlinked specialized structures that delegate and share information according to each specialty.

AGI isn’t going to be a single model, and the faster the industry adjusts towards a focus on infrastructure of multiple models rather than trying to build a do everything single model, the faster we’ll get to a better AI landscape.

But as can be seen with OpenAI gating and deprecating their pretrained models and only opening up access to fine tuned chat models, even the biggest player in the space seems to misunderstand what’s needed for the broader market to collaboratively build towards the future here.

Which ultimately may be a good thing as it creates greater opportunity for Llama 2 derivatives to capture market share in these kinds of specialized roles built on top of foundational models.

mayo@lemmy.world · edit-2 11 months ago

It seems like Altman is a PR man first and techie second. I wouldn’t take anything he actually says at face value. If it’s ‘unfixable’ then he probably means that in a very narrow way. Ie. I’m sure they are working on what you proposed, it’s just different enough that he can claim that the way it is now is ‘unfixable’.

Stable Diffusion is really how people get the different-model-different-application idea.

kromem@lemmy.world · 11 months ago

I mean, I think he’s well aware of a lot of this via his engineers, who are excellent.

But he’s managing expectations for future product and seems to very much be laser focused on those products as core models (which is probably the right choice).

Fixing hallucinations in postprocessing is effectively someone else’s problem, and he’s getting ahead of any unrealistic expectations around a future GPT-5 release.

Though honestly I do think he largely underestimates just how much damage he did to their lineup by trying to protect against PR issues like ‘Sydney’ with the beta GPT-4 integration with Bing, and I’m not sure if the culture at OpenAI is such that engineers who think he’s made a bad call in that can really push back on it.

They should be having an extremely ‘Sydney’ underlying private model with a secondary layer on top sanitizing it and catching jailbreaks at the same time.

But as long as he continues to see their core product as a single model offering and additional layers of models as someone else’s problem, he’s going to continue blowing their lead by taking an LLM trained to complete human text and then pigeonholing it into only completing text the way an AI with no feelings or preferences would safely pretend to.

Which I’m 98% sure is where the continued performance degradation is coming from.

vrighter@discuss.tchncs.de · 11 months ago

the models are also getting larger (and require even more insane amounts of resources to train) far faster than they are getting better.

egeres@lemmy.world · 11 months ago

I disagree. With models such as Llama it has become clear that there are interesting advantages in increasing (even more) the ratio of data to parameters. I don’t think the next iterations of models from big corps will 10x the param count until Nvidia has really pushed hardware, but models are getting better over time. ChatGPT’s deterioration mostly comes from OpenAI ensuring safety and is not a fair assessment of progress on LLMs in general; the leaderboard of open source models has been steadily improving over time: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

malloc@lemmy.world · 11 months ago

I was excited for the recent advancements in AI, but it seems the area has hit another wall. It seems best suited for automating very simple tasks, or at most as a guiding tool for professionals (e.g., medicine, SWE, …).

Zeth0s@lemmy.world · 11 months ago

Hallucinations are common for humans as well. It’s just people who believe they know stuff they really don’t know.

We have alternative safeguards in place. It’s true however that current llm generation has its limitations

alvvayson@lemmy.world · 11 months ago

Not just common. If you look at kids, hallucinations come first in their development.

Later, they learn to filter what is real and what is not real. And as adults, we have weird thoughts that we suppress so quickly that we hardly remember them.

And those with less developed filters have more difficulty distinguishing fact from fiction.

Generative AI is good at generating. What needs to be improved is the filtering aspect of AI.

nous@programming.dev · edit-2 11 months ago

Hell, just look at various public personalities - especially those with extreme views. Most of what some of them say they have “hallucinated”. Far more so than what GPT chat is doing.

Dark Arc@lemmy.world · 11 months ago

Sure, but these things exist as fancy story tellers. They understand language patterns well enough to write convincing language, but they don’t understand what they’re saying at all.

The metaphorical human equivalent would be having someone write a song in Spanish when they barely understand the language. You can get something that sure sounds convincing, sounds good even, but to someone who actually speaks Spanish it’s nonsense.

Zeth0s@lemmy.world · edit-2 11 months ago

Calculators don’t understand maths, but they are good at it.

LLMs speak many languages correctly, they don’t know the referents, they don’t understand concepts, but they know how to correctly associate them.

What they write can be wrong sometimes, but it absolutely makes sense most of the time.

Dark Arc@lemmy.world · 11 months ago

but it absolutely makes sense most of the time

I’d contest that, that shouldn’t be taken for granted. I’ve tried several questions in these things, and rarely do I find an answer entirely satisfactory (though it normally sounds convincing/is grammatically correct).

Zeth0s@lemmy.world · 11 months ago

This is the reply to your message by our common friend:

I understand your perspective and appreciate the feedback. My primary goal is to provide accurate and grammatically correct information. I’m constantly evolving, and your input helps in improving the quality of responses. Thank you for sharing your experience. - GPT-4

I’d say it does make sense

Delphia@lemmy.world · 11 months ago

https://youtu.be/-VsmF9m_Nt8

Song written by an Italian, intended to sound like American-accented English, but it’s intentionally gibberish.

PipedLinkBot@feddit.rocks · 11 months ago

Here is an alternative Piped link(s): https://piped.video/-VsmF9m_Nt8

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source, check me out at GitHub.

Serdan@lemm.ee · edit-2 11 months ago

GPT can write and edit code that works. It simply can’t be true that it’s solely doing language patterns with no semantic understanding.

To fix your analogy: the Spanish speaker will happily sing along. They may notice the occasional odd turn of phrase, but the song as a whole is perfectly understandable.

Edit: GPT can literally write songs that make sense. Even in Spanish. A metaphor aiming to elucidate a deficiency probably shouldn’t use an example that the system is actually quite proficient at.

Dark Arc@lemmy.world · 11 months ago

Sure it can, “print hello world in C++”

#include <iostream>

int main() {
  std::cout << "hello world\n";
  return 0;
}

“print d ft just rd go t in C++”

#include <iostream>

int main() {
  std::cout << "d ft just rd go t\n";
  return 0;
}

The latter is a “novel program” it’s never seen before, but it’s possible because it’s seen a pattern of “print X” and the X goes over here. That doesn’t mean it understands what it just did, it’s just got millions (?) of patterns it’s been trained on.

ydieb@lemmy.world · 11 months ago

“You Are Two” - CGP Grey has a good video about it.

PipedLinkBot@feddit.rocks · 11 months ago

Here is an alternative Piped link(s): https://piped.video/wfYbgdo8e-8

rambaroo@lemmy.world · 11 months ago

Humans can recognize and account for their own hallucinations. LLMs can’t and never will.

uranos@sh.itjust.works · 11 months ago

It’s pretty ironic that you say they “never will” in this context.

Zeth0s@lemmy.world · edit-2 11 months ago

They can’t… Most people strongly believe they know many things while they have no idea what they are talking about. The best-known cases are flat earthers, QAnon, and anti-vaxxers.

But all of us are absolutely convinced we know something until we find out we don’t.

That’s why double blind tests exist, why memories are not always trusted in trials, why Twitter is such an awful place.

kratoz29@lemmy.world · 11 months ago

Well to be honest it is the best way, I mean, I’m pretty sure their purpose was a tool to aid people, and not to replace us… Right?

Delphia@lemmy.world · 11 months ago

Yeah, I fully expect to see genre-specific LLMs with a subscription fee attached, squarely aimed at hobbies and industries.

When I finally find my new project car I would absolutely pay for a subscription to an LLM that has read every service manual and can explain to me in plain English what precise steps the job involves and can also answer followup questions.

thedoginthewok@lemmy.world · 11 months ago

That’s what I’m expecting too.

I’ve been using chatGPT instead of reading the documentation of the programming language I am working in (ABAP). It’s way faster to get an answer from chatGPT than finding the relevant spots in the docs or through google, although it doesn’t always work.

If you take an LLM and feed it documentation and relevant internet data of specific topics, it can be a quite helpful tool. I don’t think LLMs will get much farther than that, but we’ll see.

postmateDumbass@lemmy.world · 11 months ago

It will just take removing the restrictions so people can make porn, then monetizing that to fund more development.

A story as old as media.

👁️👄👁️@lemm.ee · 11 months ago

Not with our current tech. We’ll need some breakthroughs, but I feel like it’s certainly possible.

GenderNeutralBro@lemmy.sdf.org · 11 months ago

You can potentially solve this problem outside of the network, even if you can’t solve it within the network. I consider accuracy to be outside the scope of LLMs, and that’s fine since accuracy is not part of language in the first place. (You may have noticed that humans lie with language rather often, too.)

Most of what we’ve seen so far are bare-bones implementations of LLMs. ChatGPT doesn’t integrate with any kind of knowledge database at all (only what it has internalized from its training set, which is almost accidental). Bing will feed in a couple web search results, but a few minutes of playing with it is enough to prove how minimal that integration is. Bard is no better.

The real potential of LLMs is not as a complete product; it is as a foundational part of more advanced programs, akin to regular expressions or SQL queries. Many LLM projects explicitly state that they are “foundational”.

All the effort is spent training the network because that’s what’s new and sexy. Very little effort has been spent on the ho-hum task of building useful tools with those networks. The out-of-network parts of Bing and Bard could’ve been slapped together by anyone with a little shell scripting experience. They are primitive. The only impressive part is the LLM.

The words feel strange coming off my keyboard, but…Microsoft has the right idea with the AI integrations they’re rolling into Office.

The potential for LLMs is so much greater than what is currently available for use, even if they can’t solve any of the existing problems in the networks themselves. You could build an automated fact-checker using LLMs, but the LLM itself is not a fact-checker. It’s coming, no doubt about it.

8ace40@programming.dev · edit-2 11 months ago

The other day I saw a talk given by one of the Wikimedia guys about integrating LLMs with knowledge graphs. It was very cool, I’ll try to find it again.

Edit: found it! https://youtu.be/WqYBx2gB6vA

GenderNeutralBro@lemmy.sdf.org · 11 months ago

That’s a fantastic video. Thanks!

Phlogiston@lemmy.world · 11 months ago

Good video.

In summary we should leverage the strengths of LLMs (language stuff, complex thinking) and leverage the strengths of knowledge graphs for facts.

I think the engineering hurdle will be in getting the LLMs to use knowledge graphs effectively when needed and not when pure language is a better option. His suggestion of “it’s complicated” could be a good signal for that.

justastranger@sh.itjust.works · 11 months ago

LLMs will work great for translating raw thoughts into words, but until we create neural networks that actually think independently, all they’ll be is transformers that approximate their training data in response to prompts.

Coreidan@lemmy.world · 11 months ago

Meanwhile everyone is terrified that ChatGPT is going to take their job. Yeah, we are a looooooooooong way off from that.

Muffi@programming.dev · 11 months ago

I’ve already seen many commercials using what is clearly AI generated art and voices (so not specifically ChatGPT). That is a job lost for a designer and an actor somewhere.

Pyr_Pressure@lemmy.ca · 11 months ago

Not necessarily. In my work we made some videos using AI-generated voices because their availability made producing the videos cheap and easy.

Otherwise we just wouldn’t have made the videos at all because hiring someone to voice them would have been expensive.

Before AI there was no job; after AI there were more options to create things.

rm_dash_r_star@lemm.ee · 11 months ago

I’ve already seen many commercials using what is clearly AI generated art and voices

I’ve been noticing that as well, freaky.

postmateDumbass@lemmy.world · 11 months ago

You mean the free version from a website.

Think about the powerful ones. Government ones. Wall Street ones. Etc.

XEAL@lemm.ee · 11 months ago

There are just too many people who don’t know about implementations with, for instance, LangChain.

emptyother@lemmy.world · 11 months ago

Not ChatGPT, but other new AI stuff is likely to take a few jobs. Actors and voice actors, among others.

joelthelion@lemmy.world · 11 months ago

I don’t understand why they don’t use a second model to detect falsehoods instead of trying to fix it in the original LLM?

Flying Squid@lemmy.world · 11 months ago

And then they can use a third model to detect falsehoods in the second model and a fourth model to detect falsehoods in the third model and… well, it’s LLMs all the way down.

thimantha@lemmy.world · 11 months ago

The LLM Centipede

postmateDumbass@lemmy.world · 11 months ago

Token Ring AI

doggle@lemmy.world · 11 months ago

AI models are already computationally intensive. This would instantly double the overhead. Also, being able to detect problems does not mean you’re able to fix them.

kromem@lemmy.world · 11 months ago

More than double, as query size is very much connected to the effective cost of the generation, and you’d need to include both the query and initial response in that second pass.

Then - you might need to make an API call to a search engine or knowledge DB to fact check it.

And include that data as context along with the query and initial response to whatever decides if it’s BS.

So for a dumb realtime chat application, no one is going to care enough to slow it down and multiply costs to avoid hallucinations.

But for AI replacing a $120,000 salaried role in writing up a white paper on some raw data analysis, a 10-30x increase over a $0.15 query is more than acceptable.

So you will see this approach taking place in enterprise scenarios and professional settings, even if we may never see it in chatbots.
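
A rough worked version of that arithmetic, as a sketch only. The token counts and per-token prices below are made-up numbers chosen for illustration, not real pricing for any model; `pass_cost` is a hypothetical helper. It shows why a checking pass that re-reads the query, the first response, and retrieved evidence can push the total past double, and what a 10-30x multiplier on a $0.15 query comes to.

```python
def pass_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost of one model call in arbitrary price units (hypothetical pricing)."""
    return input_tokens * in_price + output_tokens * out_price


# Assumed numbers, for illustration only.
IN_PRICE, OUT_PRICE = 1, 2   # price units per input / output token
query_tokens = 500           # the user's query
response_tokens = 1000       # the first-pass answer
evidence_tokens = 2000       # search or knowledge-DB results added as context
verdict_tokens = 200         # the checker's short output

# First pass: read the query, write the response.
first = pass_cost(query_tokens, response_tokens, IN_PRICE, OUT_PRICE)

# Checking pass: re-read the query AND the response, plus the evidence.
check = pass_cost(
    query_tokens + response_tokens + evidence_tokens,
    verdict_tokens,
    IN_PRICE,
    OUT_PRICE,
)

total = first + check
ratio = total / first  # how much more the checked query costs than the bare one

# The 10-30x range applied to a $0.15 query, kept in cents to avoid float noise.
base_cents = 15
low_cents, high_cents = 10 * base_cents, 30 * base_cents
```

With these assumed numbers the checked query costs a bit over 2.5 times the bare one, and the 10-30x range works out to $1.50 to $4.50 per query: trivial next to a salaried role, prohibitive for a free chatbot.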

Sethayy@sh.itjust.works · 11 months ago

Cause what are you gonna train the second model on? The same data as the first just recreates it, and any other data is gonna be nice and mucky with all the AI content out there.

kromem@lemmy.world · 11 months ago

Paying 2+ times the cost on every query to fix a problem that makes less than 5% of responses unusable isn’t a trade-off that people are willing to make for chat applications.

This is the same fix approach for jailbreaking.

You absolutely will see this as more business-critical integrations occur. It just still probably won’t be in broad consumer-facing realtime products.

wizardbeard@lemmy.dbzer0.com · 11 months ago

Because then they still need a reliable method to detect falsehoods. That’s the issue here.

rosenjcb@lemmy.world · 11 months ago

As long as you can’t describe an objective loss function, it will never stop “hallucinating”. Loss scores are necessary to get predictable outputs.

fubo@lemmy.world · 11 months ago

The way that one learns which of one’s beliefs are “hallucinations” is to test them against reality — which is one thing that an LLM simply cannot do.

Immersive_Matthew@sh.itjust.works · 11 months ago

Sure they can, and they will: over time they will collect data to tell fact from fiction, in the same way that we solve captchas by choosing all the images with bicycles in them. It will never be 100%, but it will approach it over time. Hallucination will always be something to consider in a response, but it will certainly decline over time, to the point that it becomes rare for well-discussed topics. At least, that is how I see it developing.

KevonLooney@lemm.ee · 11 months ago

Why do you assume they will improve over time? You need good data for that.

Imagine a world where AI chatbots create a lot of the internet. Now that “data” is scraped and used to train other AIs. Hallucinations could easily persist in this way.

Or humans could just all post “the sky is green” everywhere. When that gets scraped, the resulting AI will know the word “green” follows “the sky is”. Instant hallucination.

These bots are not thinking about what they type. They are copying the thoughts of others. That’s why they can’t check anything. They are not programmed to be correct, just to spit out words.
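
The "the sky is green" scenario can be shown with a toy next-word predictor. This is a plain bigram counter, vastly simpler than an LLM, and the training sentences are invented for the example; `train_bigrams` and `predict_next` are hypothetical helper names. It only illustrates the narrow point that a model trained on word statistics repeats whatever its data says most often.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Invented training data: a small honest corpus, then the same corpus
# flooded with a false statement.
honest = ["the sky is blue"] * 3
flooded = honest + ["the sky is green"] * 10

honest_model = train_bigrams(honest)
flooded_model = train_bigrams(flooded)
```

Here `predict_next(honest_model, "is")` gives "blue", while `predict_next(flooded_model, "is")` gives "green". Nothing in the counter checks the sky; it just reports the majority.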

Immersive_Matthew@sh.itjust.works · 11 months ago

I can only speak from my experience: over the past 4 months of daily use of ChatGPT 4 +, it has gone from many hallucinations per hour to only about 1 a week. I am using it to write C# code, and I am utterly blown away by how good it has gotten, not only at writing error-free code but even more so at understanding a complex environment that it cannot see beyond what I explain via prompts. Over the past couple of weeks in particular, it really feels like it has gotten more powerful, and for the first time it “feels” like I am working with an expert person. If you had asked me in May where it would be today, I would not have guessed it would be this good. I thought responses this intelligent were at least another 3-5 years away.

TheGoldenGod@lemmy.world · 11 months ago

You could replace AI and chat bots with “MAGA/Trump voter” and it would look like you’re summarizing the party’s voter base lol.

uranos@sh.itjust.works · 11 months ago

Yeah, because it would be impossible to have an LLM running a robot with visual, tactile, etc. recognition, right?

∟⊔⊤∦∣≶@lemmy.nz · 11 months ago

Correct, it’s not. It could be reduced but it will never go away.

BilboBargains@lemmy.world · 11 months ago

Here, try this mushroom and Ayahuasca smoothie.

DragonAce@lemmy.world · 11 months ago

I don’t think that’s the case. If I understand correctly, the current issue is processing power: they can only load so much data before response time goes to absolute shit. I would think that layering different AI logic checks to verify statements made, recall previous conversations, and perform other mental processes that humans do automatically would correct this issue. But with current technology it’s not even an option. My theory is that once quantum computers are finally realized and economically feasible, developers will be able to overcome the response time hurdle, and all of the layered logic checks will be able to run simultaneously and instantly. My personal opinion is that the eventual layering of numerous AI models to overlap, check, and recheck one another will be what brings on the emergence of what could be considered actual AI consciousness.

_jonatan_@lemmy.world · 11 months ago

It is not an issue of processing power, it’s a problem with the basic operating principles of LLMs. They predict what they “think” is a valid bit of text to come after the last bit of text.

Sure it could be verified by some other machine learning tool, but we have no idea how that could work.

But I strongly doubt LLMs are a stepping stone on the way to true AIs. If you want to get to the moon you can’t just build higher and higher towers.

Also quantum computers aren’t really suited to run artificial neural networks as far as I know.

DragonAce@lemmy.world · 11 months ago

Very good points. I have very limited knowledge about the inner workings of most LLMs, I just know the tidbits I’ve read here and there.

As for quantum computers, my current understanding is that once they’re at a point where they can be used commercially, they should easily be able to model/run artificial neural networks. Based on the stuff I’ve seen from Dr. Michio Kaku, quantum computers will eventually have the capacity to do pretty much anything.

_jonatan_@lemmy.world · edit-2 11 months ago

I hadn’t looked up what Michio Kaku had said about quantum computing before, but it does not look well-regarded.

“His 2023 book on Quantum Supremacy has been criticized by quantum computer scientist Scott Aaronson on his blog. Aaronson states ‘Kaku appears to have had zero prior engagement with quantum computing, and also to have consulted zero relevant experts who could’ve fixed his misconceptions’”

I’m hardly an expert on the subject, but as I understand it they have some very niche uses, mostly in cryptography and some forms of simulation.
