• dan1101@lemmy.world · 1 year ago

    As expected, they can’t be trusted. And the more AI evolves, the less likely AI content will be detectable IMO.

    • jocanib@lemmy.world (OP) · 1 year ago

      It will almost always be detectable if you just read what is written, especially for academic work. It doesn’t know what a citation is, only what one looks like and where one appears. It can’t summarise a paper accurately. It’s easy to force laughably bad output just by asking the right sort of question.

      The simplest approach for setting homework is to give them the LLM output and get them to check it for errors and omissions. LLMs can’t critique their own work and students probably learn more from chasing down errors than filling a blank sheet of paper for the sake of it.

      • weew@lemmy.ca · 1 year ago

        Given how much AI has advanced in the past year alone, saying it will “always” be easy to spot is extremely short-sighted.

        • Terrasque@infosec.pub · 1 year ago

          Some things are inherent in the way current LLMs work. An LLM doesn’t reason and doesn’t understand; it just predicts the next word out of likely candidates based on the previous words. It can’t look ahead to know if it’s got an answer, and it can’t backtrack to change previous words if it later finds out it’s written itself into a corner. It won’t even know it’s written itself into a corner; it will just continue predicting in the pattern it’s seen, even if the result makes little or no sense to a human.

          It just mimics the source data it’s been trained on, following the patterns it’s learned there. At no point does it have any sort of understanding of what it’s saying. In some ways it’s similar to this, where a man memorised the spellings of enough French words to win the national Scrabble competition without any clue what the words actually mean.

          And until we get a new approach to LLMs, we can only improve them by adding more training data and more layers, allowing the model to pick out more subtle patterns in larger amounts of data. But with the current approach you can’t guarantee that what it writes will be correct, or even make sense.
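          The one-way generation loop described above can be sketched as a toy. This is only a sketch of the control flow, not a real implementation; the `toy` callable is a hypothetical stand-in for an actual network:

```python
def generate(model, prompt_tokens, max_new):
    """Schematic autoregressive loop: each step conditions only on the
    tokens produced so far, and an appended token is never revised --
    there is no backtracking step anywhere in the process."""
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        next_token = model(tokens)   # predict from everything so far
        if next_token is None:       # model signalled end-of-sequence
            break
        tokens.append(next_token)    # committed for good
    return tokens

# Hypothetical stand-in "model": counts upward and stops at 5.
toy = lambda ts: ts[-1] + 1 if ts[-1] < 5 else None
print(generate(toy, [0], 10))   # [0, 1, 2, 3, 4, 5]
```

          Note that nothing in the loop ever inspects or rewrites an earlier token, which is the “can’t backtrack” property the comment describes.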

      • Zeth0s@lemmy.world · 1 year ago

        This is not entirely correct, in my experience. With the current version of GPT-4 you might be right, but the initial versions were extremely good. Clearly you have to work with it; you cannot ask it for the whole work at once.

          • Zeth0s@lemmy.world · 1 year ago

            I meant initial versions of ChatGPT 4. ChatGPT isn’t lying, simply because lying implies a malevolent intent. GPT-4 has no intent; it just provides an output given an input, and that output can be either wrong or correct. A model able to provide more correct answers is a more accurate model. Computing accuracy for an LLM is not trivial, but GPT-4 is still a good model. The user has to know how to use it, what to expect and how to evaluate the result. If they are unable to do so, it’s completely their fault.

            Why are you so pissed off at a good NLP model?

      • Asifall@lemmy.world · 1 year ago

        I think there’s a big difference between being able to identify an AI by talking to it and being able to identify something written by an AI, especially if a human has looked over it for obvious errors.

  • Candelestine@lemmy.world · 1 year ago

    Clearly the Founding Fathers were not advanced enough to have crafted the US Constitution unaided. It’s only reasonable to imagine that ancient aliens could have landed, given them an AI to assist them, and then departed with nobody the wiser.

    I am certain we can find evidence of this if we dig hard enough.

    • Addition@sh.itjust.works · 1 year ago

      James Madison was a fraud! He used ChatGPT instead of writing out his homework like he was supposed to. Smh my head.

    • Zeth0s@lemmy.world · 1 year ago

      This is a simple example of a constitution. For a real constitution you might want to consider a more detailed list of amendments and civil rights. Please refer to a professional constitutional writer if you want to create a constitution for productive purposes

    • Gimly@lemmy.world · 1 year ago

      GPT-4 is quite willing to help you create the constitution for your own country, well I guess I have to find a name now:

      Certainly! Crafting a constitution for a new country is an exciting and monumental task. The constitution is the core document that dictates the legal obligations and rights of your country’s government and its citizens. Here’s a very basic outline to get us started. Please remember, this is a rudimentary draft and it will need to be adjusted based on the specific needs, culture, history, and goals of your country:


      PREAMBLE

      We, the people of [Your Country’s Name], in order to form a more perfect union, establish justice, ensure domestic tranquility, provide for the common defense, promote the general welfare, and secure the blessings of liberty to ourselves and our posterity, do ordain and establish this Constitution for [Your Country’s Name].

      ARTICLE I: THE LEGISLATURE

      Section 1: All legislative powers herein granted shall be vested in a [name of your legislative body].

      Section 2: [Details on how members are elected, their terms, qualifications, powers and responsibilities]

      Section 3: [Details on how legislation is passed, veto powers, etc.]

      ARTICLE II: THE EXECUTIVE

      Section 1: The executive power shall be vested in a [title of your head of state].

      Section 2: [Details on how the head of state is elected, their term, qualifications, powers and responsibilities]

      Section 3: [Details on how the executive branch operates, and other positions within the executive branch]

      ARTICLE III: THE JUDICIARY

      Section 1: The judicial power of [Your Country’s Name] shall be vested in one Supreme Court, and in such inferior courts as the [Legislative body] may from time to time ordain and establish.

      Section 2: [Details on how judges are appointed, their terms, qualifications, powers and responsibilities]

      ARTICLE IV: THE RIGHTS OF THE PEOPLE

      Section 1: All persons born or naturalized in [Your Country’s Name] are citizens of [Your Country’s Name] and of the state wherein they reside. No state shall make or enforce any law which shall abridge the privileges or immunities of citizens.

      Section 2: [Further details on the rights, freedoms, and responsibilities of the citizens, these could include things like freedom of speech, freedom of religion, right to a fair trial, etc.]

      ARTICLE V: AMENDMENTS

      Section 1: The [Legislative body] whenever two thirds of both Houses deem it necessary, shall propose Amendments to this Constitution, or, on the Application of the Legislatures of two thirds of the several states, shall call a Convention for proposing Amendments…

      ARTICLE VI: GENERAL PROVISIONS

      Section 1: [Any other items that need to be addressed, such as the official language, the national religion (or lack thereof), the national emblem, the process for international treaties, etc.]

      ARTICLE VII: RATIFICATION

      This Constitution shall be considered ratified when agreed upon by a majority of citizens through a free and fair vote.


      Please remember that a constitution is a living document that will grow and change over time as your country evolves. This outline is just a starting point and may not include all the specific sections you need. For example, you might want to add sections addressing human rights, environmental protection, financial regulations, and so on.

      This process should involve broad input from many different stakeholders, including legal experts, and should be reflective of the country’s culture, values, history, and aspirations for the future. Also, please note that the effective implementation of a constitution requires a strong rule of law and mechanisms for holding those in power accountable.

  • dethb0y@lemmy.world · 1 year ago

    Because AI detectors suck and are the modern day equivalent of dowsing rods?

    • jocanib@lemmy.world (OP) · 1 year ago

      They’re circular. If the text is too predictable, it was written by an LLM*; but LLMs are designed to regurgitate the next word most commonly used by humans in any given context.

      *AI is a complete misnomer for the hi-tech magic 8ball

      • Zeth0s@lemmy.world · 1 year ago

        Always taking the next most commonly used word would result in a loop of common words. LLMs do not work like that
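        A toy bigram table makes the point concrete: always taking the single most common next word collapses into a repeating loop almost immediately. This uses a hypothetical mini-corpus and is nothing like a real LLM, just an illustration of pure greedy word-frequency lookup:

```python
import collections

# Toy "training data": for each word, count which word most often follows it.
corpus = "the cat sat on the mat and the cat sat on the rug".split()
follows = collections.defaultdict(collections.Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def greedy(start, steps):
    """Always pick the single most common next word -- no sampling at all."""
    out = [start]
    for _ in range(steps):
        counter = follows.get(out[-1])
        if not counter:
            break
        out.append(counter.most_common(1)[0][0])
    return out

print(" ".join(greedy("the", 8)))
# the cat sat on the cat sat on the
```

        The output cycles through “the cat sat on” forever, which is exactly why real LLMs sample from a probability distribution instead of always taking the top word.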

        • jocanib@lemmy.world (OP) · 1 year ago

          In context. And that is exactly how they work. It’s just a statistical prediction model with billions of parameters.

          • Zeth0s@lemmy.world · 1 year ago

            regurgitate the next word most commonly used by humans in any given context.

            is not what it does. That would create nonsensical text (you can try it yourself).

            This is a summary of the method, as summarized by GPT-4 itself:


            Sure, here is a detailed description of how text is generated with ChatGPT, which is based on the GPT architecture:

            1. Initial Prompt: The process begins with an input prompt. This could be something like “Tell me about the weather today” or any other string of text.
            2. Tokenization: The input text is broken down into smaller parts, called tokens, which can represent words, parts of words, or punctuation. GPT uses a byte pair encoding (BPE) tokenization, which essentially breaks down text into commonly occurring chunks.
            3. Embedding: Each token is then turned into a vector via an embedding. This vector captures semantic information about the token and serves as the input for the model.
            4. Processing the Input: The GPT model processes the input vectors sequentially with a stack of transformer layers. Each layer applies self-attention and feeds its output into the next layer.
            5. Self-Attention Mechanism: The self-attention mechanism in the Transformer model allows it to weigh the importance of different words when predicting the next word. For example, when trying to predict the last word in the sentence “The cat sat on the ____,” the words “cat” and “on” are likely to have more influence on the prediction than “The”. This weighing is learned during training and allows the model to generate more coherent and contextually appropriate responses.
            6. Output Layer: The output from the final transformer layer for the last input token goes through a linear layer followed by a softmax function, which turns it into a probability distribution over the possible next tokens in the vocabulary. Each possible next token is assigned a probability.
            7. Sampling with Temperature: The next token is chosen based on these probabilities. One common method is to sample from this distribution, which introduces some randomness into the process. The temperature parameter controls the amount of randomness: a higher temperature makes the distribution more uniform and the output more random, while a lower temperature makes the model more likely to choose the highest-probability token.
            8. Decoding: The chosen token is then decoded back into text and appended to the output.
            9. Next Iteration: The process then repeats for the next token: the model takes the output so far (including the newly-generated token), processes it, and generates probabilities for the next token. This continues until a maximum length is reached, or an end-of-sequence token is produced.
            10. Post-Processing: Any necessary post-processing is applied, such as cleaning up tokenization artifacts.

            In this way, the model generates a sequence of tokens, one at a time, based on the input prompt and the tokens it has generated so far. Please note that while this process typically uses sampling with a temperature parameter, other methods like beam search or top-k sampling can also be used to choose the next token. These methods have different trade-offs in terms of computational efficiency, diversity, and quality of output.
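            The softmax-plus-temperature selection described in the sampling step can be sketched as follows. The logits here are made-up scores for four hypothetical candidate tokens; this shows only the selection math, not the model that produces the scores:

```python
import math
import random

def sample_next(logits, temperature=1.0):
    """Softmax over raw scores, then sample one token index.
    Low temperature concentrates probability on the top-scoring token;
    high temperature flattens the distribution toward uniform."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical scores for four candidate tokens.
logits = [2.0, 1.0, 0.5, 0.1]
cold = [sample_next(logits, temperature=0.1) for _ in range(1000)]
hot = [sample_next(logits, temperature=10.0) for _ in range(1000)]
print(cold.count(0))     # almost always ~1000: near-greedy behaviour
print(sorted(set(hot)))  # typically [0, 1, 2, 3]: much more varied output
```

            This randomness is also why the same prompt can produce different completions on different runs.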


            You are missing the key part where the text is transformed into a vector space of “concepts” in which semantic relationships are represented; that is where the inference happens. The inference does not operate on words to fetch the next most commonly used one, otherwise it wouldn’t work. You also missed the final sampling step that introduces randomness into the word selection.

            I don’t understand why you are so upset about a chain of complex mathematical functions that completes an input sentence. Why are you angry?

            • jocanib@lemmy.world (OP) · 1 year ago

              You’re agreeing with me but using more words.

              I’m more annoyed than upset. This technology is eating resources which are badly needed elsewhere and all we get in return is absolute junk which will infest the literature for decades to come.

              • Zeth0s@lemmy.world · 1 year ago

                I am not agreeing with you, because “regurgitate the next word most commonly used by humans” is not what it does.

                That said, the technology is not doing anything wrong; the people using it are. The technology is a great achievement of humankind, possibly one of the greatest. If people decide to use it to print sh*t, that is people’s fault. Quantum mechanics is one of the greatest achievements of humankind; if people decided to use it to kill people, that is the fault of people. Many humans are simply shitty; don’t blame a clever mathematical function and its clever implementation.

  • Dohnakun@lemmy.fmhy.mlB · 1 year ago

    This article was written to keep people on the page as long as possible. It didn’t get to the point before I left. Does someone have a tl;dr?

    • Postcard64@lemmy.world · 1 year ago

      The Constitution is a text that appears many times on the internet, so ChatGPT’s training set probably contains multiple copies of it. That makes it likely that ChatGPT would generate very similar text, and therefore the detectors flag it as AI-generated. That’s what I got from it, but I also found it difficult to parse. Maybe someone can correct me on this.

  • busturn@lemmy.world · 1 year ago

    I’ve recently checked my years-old essay using one of these AI plagiarism detectors, and it said the essay was 90% AI-written. So either it’s all BS or I’m a time-travelling AI.

    • 98codes@lemm.ee · 1 year ago

      I’m convinced that it’s been trained on top of the essays of middle and high school students that have gone their whole lives without proper education on vocabulary, grammar, and the like. So when asked to evaluate something written properly, it’s flagged as AI.

      Garbage in, garbage out. Same as it ever was.

    • paddirn@lemmy.world · 1 year ago

      Obviously the US Constitution was written by AI, we’re living in a simulation. Wake up sheeple, the Matrix is real!

  • Captain_Patchy@lemmy.world · 1 year ago

    They only know what they have been fed.

    What more likely first/base feeding than the US Constitution’s declarations and its amendments?