I have several thousand books, movies, anime, manga, games, etc. that I need to organize into spreadsheets. How in the everloving fuck do I automate this, even partially?

I have a shitload of .txt files with lists in them and links to miscellaneous database websites and need to scrape a few specific things from each one (like genres, who made it, when it was made, etc.) and map that to a spreadsheet. Some things need data from multiple websites because a lot of databases suck for anything that isn't mainstream. I don't mind cleaning it up after, I just need a way to get the information in a readable format with as little editing as I can get away with.

If there's any retard-friendly software, programming languages, etc. I can learn and use to make this even a little easier, please direct me to them. I have no clue how to code, but I'm assuming learning how and automating this stuff would be much faster than doing everything manually at this point. There's gotta be a better way.
>>1510821
>I have a shitload of .txt files with lists in them and links to miscellaneous database websites
What format are they in: plain text, CSV, HTML tables, or something else? Regardless, there are ways to extract data from them; a lot of languages have purpose-built modules for that kind of work.
https://metacpan.org/pod/HTML::TableExtract
https://pypi.org/project/beautifulsoup4/
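To give a rough idea of what "extracting data from an HTML table" looks like, here is a minimal sketch. It uses Python's built-in `html.parser` as a stand-in for the modules linked above, so nothing needs installing. The `TableExtractor` class, the `extract_rows` helper, and the sample page fragment are all made up for illustration; real database pages will be messier.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment, not taken from any real site.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Year</th><th>Genre</th></tr>
  <tr><td>Suspiria</td><td>1977</td><td>Horror</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
print(rows)
```

With BeautifulSoup the same job is usually shorter, but the idea is identical: find the rows, pull the text out of each cell.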
>>1510832
It's unfortunate, but you'll probably need one specific extractor for each data source, since they'll all use different formats (XML, JSON, etc.) and different schemas as well.
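A sketch of what "one extractor per source" could look like in Python: each source gets its own small function, and every function returns the same dictionary shape so the rest of the script doesn't care where the data came from. The site names, field names, and sample payloads below are invented for illustration; every real site will need its own mapping.

```python
import json
import xml.etree.ElementTree as ET


def from_json_source(text):
    """Extractor for a hypothetical site that returns JSON."""
    data = json.loads(text)
    return {
        "title": data["name"],
        "date": str(data["released"]),
        "genres": list(data["tags"]),
    }


def from_xml_source(text):
    """Extractor for a hypothetical site that returns XML."""
    root = ET.fromstring(text)
    return {
        "title": root.findtext("title"),
        "date": root.findtext("year"),
        "genres": [g.text for g in root.findall("genres/genre")],
    }


# One entry per data source; the keys are placeholder names, not real sites.
EXTRACTORS = {
    "site-a.example": from_json_source,
    "site-b.example": from_xml_source,
}


def extract(source, text):
    """Pick the extractor for `source` and return the common record shape."""
    return EXTRACTORS[source](text)


# Made-up payloads describing the same entry in two different schemas.
json_payload = '{"name": "Suspiria", "released": 1977, "tags": ["Horror"]}'
xml_payload = (
    "<entry><title>Suspiria</title><year>1977</year>"
    "<genres><genre>Horror</genre></genres></entry>"
)

record_a = extract("site-a.example", json_payload)
record_b = extract("site-b.example", xml_payload)
print(record_a)
print(record_b)
```

Adding a new database then means writing one more function and adding one line to `EXTRACTORS`.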
>>1510821
Won't be too difficult to script.
>>1510832
They're all in plain text, one per line. Thanks for the links, this gives me a good place to start. HTML and Python.
>>1510833
I figured as much. Sad news, but that's fine by me. Any speedup I can get is valuable.
>>1511093
Thanks, that's good to know. Do you have any pointers on what I should learn or focus on?
>>1511149
One per line? The names? The database websites?
>>1511155
Sorry, I meant the titles. They're either like
Title
Title
Title
or they're links to specific entries on the database website like
https://www.url.com/etc/title
https://www.url.com/etc/title
https://www.url.com/etc/title
where title is either insert-title-here or some numeric ID code like on IMDb. They aren't mixed in any of the files afaict though; it's either all plain titles or all plain links. None are hyperlinked or wrapped in any kind of code.
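For files like these, a first step could be telling links apart from plain titles and pulling the slug or ID off the end of each link. The sketch below uses only the standard library; `classify_line` and `read_entries` are hypothetical helper names, and it assumes the identifier is always the last path segment, which may not hold for every site.

```python
from urllib.parse import urlparse


def classify_line(line):
    """Return ("link", site, identifier) or ("title", None, text) for one line."""
    text = line.strip()
    if text.startswith(("http://", "https://")):
        parsed = urlparse(text)
        # Assumption: the last non-empty path segment is the slug or ID.
        segments = [s for s in parsed.path.split("/") if s]
        identifier = segments[-1] if segments else ""
        return ("link", parsed.netloc, identifier)
    return ("title", None, text)


def read_entries(lines):
    """Classify every non-blank line."""
    return [classify_line(line) for line in lines if line.strip()]


sample_lines = [
    "Title",
    "",
    "https://www.url.com/etc/insert-title-here",
]
entries = read_entries(sample_lines)
print(entries)
```

To run it on a real file, something like `read_entries(open("list.txt", encoding="utf-8"))` should work, with `list.txt` being a placeholder filename.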
>>1511160
Can you just post an example?
>>1511209
I don't really understand what you mean. Do you mean a list with specific titles? One of the .txt files? Sorry, I'm a little retarded and not familiar with this stuff. I figured that since it's all either plaintext or plain links with nothing else it would be fairly straightforward.
>>1511649
People are still confused about the specifics of the files and the format of the Excel document you want. >>1511160 is kinda confusing, and for the spreadsheet, do you want the manga, books, etc. to be in one spreadsheet or split into separate pages? That kinda thing.
>>1510821
>If there's any retard-friendly [...] programming languages
Python
https://docs.python.org/3/tutorial/index.html
>>1511805
I see, sorry about that. To (hopefully) explain a little better:

I have various .txt files, each grouped into folders. Each folder covers one type of media; e.g., a folder named "Movies" contains only lists of movies. These files are in plaintext and list one title per line. For example, one file has the following list:
La Gloire de Mon Père
Manon des Sources
Night Train to Lisbon
It contains exclusively plaintext. A different file in the same folder has this:
https://www.imdb.com/title/tt0050083/
https://www.imdb.com/title/tt0053125/
https://www.imdb.com/title/tt0080455/
It contains exclusively IMDb links. These are plaintext too and are not hyperlinked. The only exception to the former list is when either dates (e.g., Suspiria (1977)) or names (e.g., Italo Calvino - Marcovaldo) are included.

My goal is to take these lists and extract the data I want from relevant database sites into LibreOffice Calc spreadsheets. For some things such as manga, I need to be able to extract from multiple sites per entry, like https://mangaupdates.com for basic info and https://ja.wikipedia.org or https://manba.co.jp/ for the publication period/any missing info.

Currently, I have one spreadsheet per medium (books, movies, etc.) with one sheet each. The data I want differs by medium, but in general it's as follows: Type, Title, Director (or Author for books/manga/etc.), Studio (or Publisher), Date, Genre, whether it's original or an adaptation, and whether it has any sequels/prequels/etc. I'm not picky about this though and figured I would need to alter some things to work better, like concatenating all my files together and then converting them to either all links or all titles.

If this still doesn't make much sense or I'm missing something, please let me know and I'll try to explain more.
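As a starting point for the title lists described above, here is a minimal sketch that splits off a trailing "(year)" and a leading "Name - " and writes the result as CSV, which LibreOffice Calc can open directly. The column names are a guess based on the fields listed in the post, `parse_title_line` and `write_rows` are hypothetical helpers, and the " - " split is an assumption that will misfire on titles that contain " - " themselves, so expect some manual cleanup.

```python
import csv
import io
import re

# Assumed column layout, loosely following the fields listed in the post.
COLUMNS = [
    "Type", "Title", "Creator", "Studio/Publisher",
    "Date", "Genre", "Original/Adaptation", "Related works",
]

# Matches lines such as "Suspiria (1977)".
YEAR_SUFFIX = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)$")


def parse_title_line(line, media_type):
    """Turn one plain-title line into a row with every column present."""
    text = line.strip()
    row = dict.fromkeys(COLUMNS, "")
    row["Type"] = media_type
    match = YEAR_SUFFIX.match(text)
    if match:
        text, row["Date"] = match.group("title"), match.group("year")
    # Assumption: "Name - Title" means creator first, title second.
    if " - " in text:
        creator, text = text.split(" - ", 1)
        row["Creator"] = creator.strip()
    row["Title"] = text.strip()
    return row


def write_rows(rows, handle):
    """Write rows as CSV to any open text file or file-like object."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    parse_title_line("La Gloire de Mon Père", "Movie"),
    parse_title_line("Suspiria (1977)", "Movie"),
    parse_title_line("Italo Calvino - Marcovaldo", "Book"),
]

buffer = io.StringIO()  # stands in for a real file in this example
write_rows(rows, buffer)
print(buffer.getvalue())
```

For a real file, replace the `StringIO` buffer with something like `open("movies.csv", "w", newline="", encoding="utf-8")` (the filename is a placeholder). The empty columns are where data scraped from the database sites would go.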
>>1511908
Thanks, will read through this.