Is it illegal to download things that aren't meant to be downloaded?

antonim@lemmy.dbzer0.com · 4 days ago

And that’s more or less what I was aiming for, so we’re back at square one. What you wrote is in line with my first comment:

it is a weak compliment for AI, and more of a criticism of the current web search engines

The point is that there isn’t something that makes AI inherently superior to ordinary search engines. (Personally I haven’t found AI to be superior at all, but that’s a different topic.) The difference in quality is mainly a consequence of some corporate fuckery to wring out more money from the investors and/or advertisers and/or users at the given moment. AI is good (according to you) just because search engines suck.

antonim@lemmy.dbzer0.com · 5 days ago

AI LLMs simply are better at surfacing it

Ok, but how exactly? Is there some magical emergent property of LLMs that guides them to filter out the garbage from the quality content?

antonim@lemmy.dbzer0.com · 7 days ago

If you don’t feel like discussing this and won’t do anything more than deliberately miss the point, you don’t have to reply to me at all.

antonim@lemmy.dbzer0.com · 7 days ago

they’re a great use in surfacing information that is discussed and available, but might be buried with no SEO behind it to surface it

This is what I’ve seen many people claim. But it is a weak compliment for AI, and more of a criticism of the current web search engines. Why is that information unavailable to search engines, but is available to LLMs? If someone has put in the work to find and feed the quality content to LLMs, why couldn’t that same effort have been invested in Google Search?

antonim@lemmy.dbzer0.com · 19 days ago

deleted by creator

antonim@lemmy.dbzer0.com · 21 days ago

Here in my southeast European shithole I’m not worrying about my tax money, the upgrade is going to be pretty cheap, they’re just going to switch from unlicensed XP to unlicensed Win7.

antonim@lemmy.dbzer0.com · 21 days ago

Yep, but I didn’t mention that because it’s not a part of the “Wayback Machine”, it’s just the general “Internet Archive” business of archiving media, which is for now still completely unavailable. (I’ve uploaded dozens of public-domain books there myself, and I’m really missing it…)

antonim@lemmy.dbzer0.com · 22 days ago

You can (well, could) put in any live URL there and IA would take a snapshot of the current page on your request. They also actively crawl the web and take new snapshots on their own. All of that counts as ‘writing’ to the database.

antonim@lemmy.dbzer0.com · 26 days ago

I don’t get the impression you’ve ever made any substantial contributions to Wikipedia, and thus have misguided ideas about what would be actually helpful to the editors and conductive to producing better articles. Your proposal about translations is especially telling, because the machine-assisted translations (i.e. with built-in tools) have already existed on WP long before the recent explosion of LLMs.

In short, your proposals either: 1. already exist, 2. would still risk distorsion, oversimplification, made-up bullshit and feedback loops, 3. are likely very complex and expensive to build, or 4. are straight up impossible.

Good WP articles are written by people who have actually read some scholarly articles on the subject, including those that aren’t easily available online (so LLMs are massively stunted by default). Having an LLM re-write a “poorly worded” article would at best be like polishing a turd (poorly worded articles are usually written by people who don’t know much about the subject in the first place, so there’s not much material for the LLM to actually improve), and more likely it would introduce a ton of biases on its own (as well as the usual asinine writing style).

Thankfully, as far as I’ve seen the WP community is generally skeptical of AI tools, so I don’t expect such nonsense to have much of an influence on the site.

antonim@lemmy.dbzer0.com · 27 days ago

As far as Wikipedia is concerned, there is pretty much no way to use LLMs correctly, because probably each major model includes Wikipedia in its training dataset, and using WP to improve WP is… not a good idea. It probably doesn’t require an essay to explain why it’s bad to create and mechanise a loop of bias in an encyclopedia.

antonim@lemmy.dbzer0.com · 27 days ago

Are you ok?

antonim@lemmy.dbzer0.com · edit-2 1 month ago

It has custom user-made themes that are dark mode, so it probably has dozens of dark modes.

antonim@lemmy.dbzer0.com · 1 month ago

That might depend on where you live, but generally no, I think.

antonim@lemmy.dbzer0.com · 2 months ago

(Sorry for the late response.) Well it depends a lot on the site. Since I focus on books and scholarly articles, the ideal way is to find the URL of the original PDF. The website might show you just individual pages as images, but it might hide the link to the PDF somewhere in the code. Alternatively, you might just obtain all the URLs of the individual page images, put them all into a download manager, and later bundle them all into a new PDF. (When you open the “inspect element” window, you just have to figure out which part of the code is meant to display the pages/images to you.) Sometimes the PDFs and page images can be found in your browser cache, as I mention in the OP. There’s quite some variety among the different sites, but with even the most rudimentary knowledge of web design you should be able to figure out most of them.

If need help with ripping something in particular, DM me and I’ll give it a try.

antonim@lemmy.dbzer0.com · 2 months ago

I never said I follow the law, I’m just wondering what the law says ;)

antonim@lemmy.dbzer0.com · 2 months ago

Honestly much of your reply is confusing me and doesn’t seem to be relevant to my questions. This is what I think is crucial:

Just because a file is cached on your device does not mean you are the legal owner of that content forever.

What does being “the legal owner forever” actually entail, either with regards to a physical book or its scan? And what does that mean regarding what I can legally do with the cached file on my computer?

antonim@lemmy.dbzer0.com · edit-2 2 months ago

Is it illegal to download things that aren't meant to be downloaded?

antonim@lemmy.dbzer0.com · 2 months ago

https://github.com/elementdavv/internet_archive_downloader

This one? I’ll definitely give it a try.

antonim@lemmy.dbzer0.com · edit-2 2 months ago

FYI, there are multiple methods to download “digitally loaned” books off IA, the guides exist on reddit. The public domain stuff is safe, but the stuff that is still under copyright yet unavailable by other means (Libgen/Anna’s Archive, or even normal physical copies) should definitely be ripped and uploaded to LG.

The method I use, which results in best images, is to “loan” the book, zoom in to load the highest resolution, and then leaf through the book. Periodically extract the full images from your browser cache (with e.g. MZCacheView). This should probably be automatised, but I’m yet to find a method, other than making e.g. an Autohotkey script. When you have everything downloaded, the images can be easily modified (if the book doesn’t have coloured illustrations IMO it is ideal to convert all images to black-and-white 2-bit PNG), and bundled up into a PDF with a PDF editor (I use X-Change Editor; I also like doing OCR, adding the bookmarks/outline, and adding special page numbering if needed - but that stuff can take a while and just makes the file easier to handle, it’s not necessary). Then the book can be uploaded to proper pirate sites and hopefully live on freely forever. Also there are some other methods you can find online, on reddit, etc.