johnny.barracuda

Requirements:
Python 3.10+

Please, Log in or Register to see links and images

An asynchronous file archival program

The intention of Ripandtear is to make it easy to save and update content uploaded by content creators online. What makes Ripandtear unique is that it condenses everything about a content creator into a single .rat file (don't worry, a .rat is just a .json file with a different extension name). From the usernames they use across different websites, to already downloaded URLs, to file names and their MD5 hashes for removing duplicates, it's all stored in the .rat so it can be added to and updated conveniently from the command line. You can store all the information about a content creator, archive their content, remove duplicates and sort the files all with one command!


By using the .rat file you eliminate the need to re-download the same content and create duplicate files. All previously downloaded URLs are tracked in the .rat file. If a file has already been downloaded it is skipped, saving you time/data/bandwidth and making fewer requests to servers. This also makes it convenient if you share a folder that has a .rat file with someone else: they can pick up where you left off without having to download the same content all over again!

Anyways, that is the elevator pitch from the README. I tried to write pretty extensive instructions on how to use the program, with lots of examples, so feel free to
Please, Log in or Register to see links and images
if you want to learn more.

Motivation

Years ago the internet was a much simpler place. You had Reddit girls that thrived off attention, cam girls trying to make as much money as possible, tumblerinas with daddy issues trying to get revenge on fathers that were never around by posting nudes online. It was a perfect era to fulfill my addiction. Since I am posting on this site I am sure you are all thinking I am talking about a porn addiction, but that is not the case (ok, maybe a little). My primary addiction is hoarding data and organizing said data. Content creators segregating themselves by site made my life very easy and I was happy. However, in recent years this has all changed. With the rise of Onlyfans and the vast amount of money girls realize they can now make, their inner capitalist has come out and they have embraced diversification. No longer does a Reddit user stick to Reddit. Her Reddit exists to drive users to her Onlyfans, she uses Redgifs to post videos and advertise her PPV content, updates her Twitter to stay in contact with fans and maybe even has a Pornhub. Even worse, she might use different usernames for each account, or be forced to if one account gets banned. If you only have 100-200 users saved on your computer you might be able to remember each of her usernames and still categorize them in main website directories (Reddit, Tumblr, Onlyfans, etc).

However, when your collection surpasses 1,500 Reddit users alone this becomes harder to do. This was one of the driving factors behind Ripandtear. Instead of organizing content by site, I have switched to collecting content per user. To track all the information about a user I created this program, Ripandtear. It uses the .rat file (it's just a .json file with a different extension name) to store all the information I (currently) want to track. Within the .rat I can track the name(s) a creator uses on specific websites, simpcity links to quickly see if people have posted new content, URLs of media I have already downloaded to prevent downloading it again and, most importantly, download the content itself. Since I primarily use my terminal when interacting with my computer I wanted to be able to create and update users via the command line, and do it quickly. I feel that I have done a good job accomplishing these goals with this first version. With a few flags I can quickly log information and download all the content a user has by entering 2 commands into my terminal: one to make the folder and a second to tell Ripandtear what I want it to do.

One of the biggest features I think Ripandtear has is the use of the .rat file to track URLs and downloaded files. Normally you have to download a folder of content, copy its contents into the specific content creator's folder, run a separate program to remove duplicates, run another program to sort the files, then accidentally download the same content again because another user reposted the same link without you realizing it, forcing you to go through the whole process all over again. Ripandtear deals with all of this for you. It tracks the URLs you have downloaded, so if you try to download the same content twice it skips it. It can sort files into their respective categories (pics, vids, audio, text), it keeps track of file hashes to remove duplicate content, and it currently supports the majority of websites that people post on here (and I have intentions to expand the list). Since all that information is stored in the .rat file it can easily act like a checkpoint. If people include their .rat file in uploads, whoever downloads and uses it can simply pick up where the uploader left off. They no longer have to download the same content over again. This also means going the extra mile to clean up duplicates that slip by, content that people aren't interested in, or just bad content in general won't just benefit you, but everyone you share your .rat file with.

Anyways, I feel I am rambling a bit. I mainly made this project for myself, but I have found it to be a huge boon for me and my collection. It has helped me catalog and speed up working on my collection so much that I didn't want to keep it to myself. I will admit that it might feel rigid to some since it is built around how I work with and save my content, and it could seem a little intimidating with all the different commands and it still being a work in progress, but I am continually looking to improve it. However, if you are a power user that shares a similar philosophy on organizing content and really wants to take managing that content to the next level, I hope you find some use from this project.

Here is the current template of the .rat:

JSON:
Please, Log in or Register to view codes content!
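
Since the template itself is hidden behind the login above, here is a purely hypothetical sketch of the kind of fields a .rat keeps, pieced together only from what is described in this thread (usernames per site, saved links, downloaded URLs, file hashes, failed downloads, tags). Every field name below is a guess, so treat it as illustration rather than the real template:

Bash:
# Hypothetical illustration only: field names are guesses, not the actual .rat template
cat <<'EOF' > example.rat
{
  "names": { "reddit": [], "redgifs": [], "onlyfans": [], "generic": [] },
  "links": { "coomer": [], "simpcity": [], "manyvids": [] },
  "urls_downloaded": [],
  "file_hashes": {},
  "errors": [],
  "tags": []
}
EOF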
 

johnny.barracuda

Just pushed a new update.

- Changed the output from ugly print statements to a beautiful colored display (all credit goes to Jules and cyberdrop-dl for the inspiration)

- Added a -mk flag for a huge quality of life improvement. You can now use -mk <name> to make a directory called <name>; ripandtear will automatically move into the new directory and run all further flags, then return to the original directory -mk was run from once it is done. Now you don't have to take the time to enter a separate command to create and move into the new directory (see the example after this list)

- Fixed lots of bugs to provide better consistency when downloading

- Limited Bunkr video downloads to two at a time to prevent 429 status code errors. Downloads will be slower, but much more consistent
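
As a quick hedged example of the new flag (the directory name and URL are placeholders):

Bash:
# makes ./model1, moves into it, runs the remaining flags there, then returns
ripandtear -mk 'model1' -d '<url to download>'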

froppalo I am going to start on the coomer.party extractor now. It will be the next thing uploaded. I should have it done in the next few days.
 

johnny.barracuda

froppalo Just finished writing the coomer extractor and uploading it. If you update ripandtear you will have access to it.

-d - will download a coomer link (profiles, content pages and direct links to pictures/videos)
-sc - syncs all coomer links that you have stored in a .rat (they need to be full urls)

Coomer seems to be a little finicky with how many downloads you can do at the same time. To prevent a bunch of 429 error codes (Too Many Requests) I limited coomer downloads to 4 at a time. From the bit of testing I have been doing, it seems like some downloads will still get blocked with a 429 code. Ripandtear stores those failed downloads under errors within the .rat. If you download a profile and see that a bunch of files failed, wait a few seconds after the download completes and run ripandtear -se to reattempt just the failed downloads. For me I am able to immediately download them with no problem. This is just something I might need to tweak going forward to get it to 100%. I have been getting an ~85% success rate on the first pass. After syncing the errors it hits 100%.
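
As a rough sketch of that workflow (the coomer URL is just a placeholder):

Bash:
ripandtear -d '<coomer profile url>'   # bulk download; a few files may still fail with 429
sleep 10                               # give the server a moment
ripandtear -se                         # retry only the failed downloads recorded in the .rat
# or, if the full coomer URLs are already saved in the .rat:
ripandtear -sc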

I feel I have coverage for about 85-95% of use cases for coomer, and I didn't run into too many edge cases that would cause problems. If you run into any, let me know and I will try to make tweaks to get better coverage.
 

johnny.barracuda

Please, Log in or Register to view quotes
Just implemented it and uploaded the new version. You can download ripandtear with pip. Re-read the install instructions if you have already read them once before.

Make sure you run

playwright install

after downloading ripandtear. Playwright is a web development tool that launches a web browser in the background. Ripandtear uses playwright to load javascript from redgifs to be able to scroll down the page to find all the available videos.
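
So the whole install boils down to something like this (assuming the package name on pip matches the project name; the install instructions linked above are the authoritative version):

Bash:
pip install ripandtear    # or: pip install --upgrade ripandtear
playwright install        # downloads the browser ripandtear uses for redgifs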
 

johnny.barracuda

Thanks for the kind words. I appreciate it

Please, Log in or Register to view quotes

1) I am sort of a Linux power user so I just do for loops over all the directories that have a .rat, and it has been working for me. That is a suggestion I will take seriously though. Could you give me an example of your workflow where that would be useful? I just want to better understand your situation so I can implement it as effectively as possible.

2) LOL it's funny you mentioned this because I just noticed it today as well. It should be a simple fix. I'll try and get it fixed by tomorrow.

Thanks for taking the time to give feedback. I appreciate it.
 

johnny.barracuda

Please, Log in or Register to view quotes

For me, I have a directory that holds all the content creators that have a .rat file. I just call it "users". Whenever I want to do a mass update I gather all the absolute directory paths into a text file, then I iterate over that text file of paths where I 1) cd into the folder and 2) run the ripandtear command I want to run. I use Fish shell, which has a different syntax from something like bash, but it is easier to read so it might give you ideas.

1) cd /path/to/users (go to the main directory where the sub directories contain .rat files)
2) ls -d $PWD/* > names.txt (get the absolute file path of all those sub directories and save them in a text file called "names.txt")
3) loop over all the directories that are in names.txt: cd into the directory, then run the ripandtear command. In this case download everything new, then hash to remove duplicates, then sort.

for d in (cat names.txt)
    cd $d
    ripandtear -sa -H -S
end


If you have a lot of directories you can speed up the process by breaking names.txt up into smaller .txt files. For example, I currently have 2742 directories that have .rat files. I will take names.txt and break it up into four text files with roughly 685 entries each, named 1.txt, 2.txt, 3.txt and 4.txt. Then I can open up four terminal instances and run the same for loop, just cat-ting out a different text file in each one. So in one terminal I would do the for loop but with for d in (cat 1.txt), in the next terminal for d in (cat 2.txt), etc.
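
For anyone on bash instead of fish, a rough equivalent of the same workflow might look like this (paths are placeholders):

Bash:
cd /path/to/users
ls -d "$PWD"/* > names.txt              # absolute paths of every sub directory
# split -l 686 names.txt part_          # optional: chunk the list for parallel terminals
while read -r d; do
    (cd "$d" && ripandtear -sa -H -S)   # download new content, hash out duplicates, sort
done < names.txt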

That is just my current work flow. Maybe it will give you some ideas.
 

johnny.barracuda

Please, Log in or Register to view quotes
The -H flag only looks in the working directory. However, the -H flag will always clean up files no matter what directory it is run in; it doesn't have to be run in the same directory as a .rat. To get the most use out of it, though, I would recommend running it in the same directory as the .rat file. The reason for this is that when files are hashed in the same directory as the .rat, the hashes of the files are recorded in the .rat.

Here is a scenario. You go to a thread on simpcity and download a link from bunkr. You download it in the directory that has the .rat. When you do that, the URL the file is hosted at is saved in the .rat to avoid downloading it again. When you use -H you hash the file, and that hash is stored in the .rat along with the filename. A month goes by and another user uploads that same file, but now it is under a different URL. Ripandtear doesn't know you already have it because the URLs are different. However, after downloading it and running -H again, the same hash will appear. Because you have already downloaded and saved the hash, ripandtear matches it against the saved hash in the .rat and knows it is a duplicate. ripandtear will then delete the file that has the shorter filename and keep the file with the longer filename. If the original file has the shorter name, ripandtear will find where it is located and delete it, even if it is in a sub directory.

So you can run -H wherever you want and it will remove duplicates it finds, but if you don't run it in the same directory as the .rat you lose out on better tracking and file management (in my opinion). Even without the .rat it will remove duplicates it finds, so feel free to run it wherever you want. The way I think about it is that the directory with the .rat file works like border patrol. If you download all of your files in the directory with the .rat, then the .rat keeps track of who is coming in, whether they should be there, and that there is only one of each unique file.
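
The scenario above as commands (the URLs are placeholders):

Bash:
cd /path/to/creator                        # the directory that holds the .rat
ripandtear -d '<bunkr url>'                # the URL gets recorded in the .rat
ripandtear -H                              # hashes and filenames get recorded too
# ...a month later the same file shows up behind a different URL...
ripandtear -d '<different url, same file>'
ripandtear -H                              # hash matches one in the .rat, so the duplicate gets deleted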
 

johnny.barracuda

glandalf

I don't know how you store and name your directories, but I mainly download a lot of reddit users. I had like 1,200 users ripped before I made ripandtear. What I did was create a for loop that went over all my reddit user directories, moved inside each directory, saved the basename of the current directory (if the path was /path/to/username the basename would be "username") in a variable, then used that variable as the input for ripandtear to set the reddit name. Then you can find every file in that directory, move it to where the .rat file has been created (where you currently are), delete all the now-empty directories, then run -SH to hash the files and sort them.

For me I had to let it run for like 24 hours straight because I had terabytes of content to hash, but that is a close approximation of the one-liner I used. If you collect more onlyfans content you could just use the basename as the input for onlyfans instead of reddit.

Rough pseudo code example. Do lots of tests before running:

Bash:
Please, Log in or Register to view codes content!
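
A rough sketch of the same idea (not the exact code above; the flag that records the reddit name is a guess here, so double check it against the docs, and test on a copy first):

Bash:
cd /path/to/reddit_users
for d in */; do
    (
        cd "$d"
        name=$(basename "$PWD")                         # directory name is the reddit username
        ripandtear -r "$name"                           # assumed flag for the reddit name; creates the .rat if missing
        find . -mindepth 2 -type f -exec mv -n {} . \;  # pull every file up next to the .rat
        find . -mindepth 1 -type d -empty -delete       # remove the now-empty sub directories
        ripandtear -H -S                                # hash out duplicates and sort (the -SH step)
    )
done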
 

johnny.barracuda

Please, Log in or Register to view quotes
I used to do this, which was one of the motivating factors behind making the .rat file. That way I could have everything in one place.

Please, Log in or Register to view quotes
If you want, maybe hold off for a couple of days before doing this. I have been planning on adding a big update to ripandtear (RAT). What I am going to do is add a rudimentary search function. Pretty much, I am going to make a global json file that keeps track of the names and locations of all your .rat files. That way you can give RAT a name, it will check the json file to see if the name exists, and if it does it will tell you the path to the directory that has it. Thinking about it, it would probably be pretty easy to add tags not only to the individual .rat files, but to this global rat file as well. I could also add a generic names category for sites that I haven't covered yet.

If you would be willing to wait a bit while I implement it, you could work on making a simple parser function for your directories. It might be a little too complicated for fish or zsh, but if you know some python maybe you could write up a simple function that parses the current directory to extract the names/tags and stores them in lists. Then you can use the os module to add them to RAT. Just an idea so you don't have to lose all the information you have collected.

Please, Log in or Register to view quotes
You don't need to copy a .rat template. If you add a name, url or link via RAT and a .rat file does not exist, RAT will create one for you. The new .rat file will be named after the directory you are in.
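
For example (using the -g flag from the update below; the names are placeholders):

Bash:
mkdir model1 && cd model1
ripandtear -g 'some_name'   # no .rat exists here yet, so RAT creates model1.rat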
 

johnny.barracuda

glandalf

Pushed a new update. Also from now on I will refer to ripandtear as RAT

Changes:

- Due to a bug on Windows, RAT now removes colons ( ":" ) and parentheses from filenames (waiting for confirmation, but I am pretty sure these characters are the cause)

- Added the ability to add generic names. If RAT doesn't cover a specific website, you can still record the name for posterity.
-g - to add a generic name. Can add multiple at once if you split them with a pipe ( | ). Example: ripandtear -g 'name|name1'
-pg - prints out the generic names that are saved in the .rat​

- Added the ability to add tags to the .rat file. These will be useful for a big update I am planning on doing soon.
-tags - Just like the generic names they can be separated with a pipe ( | ). Example ripandtear -tags 'thanks|for|using|ripandtear'
-ptags - prints out the tags that are saved in the .rat​

- Removed the cooldown for Reddit.
- RAT used to try and be nice by waiting for cooldowns when using Reddit, like they ask you to. However, they throttled the shit out of me while I was trying to update my collection before the Imgur purge and it pissed me off, so I removed it. Like WTF, I am just trying to update a few terabytes, what's the big deal?

- Removed the ability to print links
Previously -g was reserved for printing the links that were found instead of downloading the files. I never used it, so I removed it to free up the -g flag for generic names.
 

johnny.barracuda

Please, Log in or Register to view quotes
Congrats!!! That is awesome! I remember the first python script I wrote was to rename files based off the directory they were in so they would be uniform. I did the same thing and commented almost every line to remind me what it did, just in case I had to come back later to use it as a reference. It took me all day as well, but it was 100 lines of code (lots of comments and spaces though). Looking at your code, it is simple enough but has a few complexities, so you should feel accomplished that you got it done in a day.

Please, Log in or Register to view quotes
The edge cases always get you. That is the one thing I have learned while making RAT. You get it working perfectly, then all of a sudden there is a wrench thrown into your plans. For me it has happened a lot with the bunkr extractor. I get everything right, take a breath, run it on a new link, and all of a sudden there is an oddball name, or a change in the site layout, or in how they configure their backend that fucks everything up and you have to go fix it. I understand how spaghetti code gets written now, because you just want shit to work.

Please, Log in or Register to view quotes
Ya, that should be fine. That is how UTF-8 characters are stored; it's just that your terminal and browser convert the bytes into the emoji/character they represent when they print them to the screen. When you look at the raw json it is showing you everything literally, so that is why you see the bytes. If you print them out to the screen they will be converted.

Please, Log in or Register to see links and images

Please, Log in or Register to view quotes
The first organic project is always the hardest. At least it was for me. I spent way too long in tutorial hell before I worked up the courage to actually try. It also took me a day to write that first file renamer script. After I did it though, something clicked and I realized the power of programming.

Thanks for the update. I am honored that I got to play a small part in your first script.
 

johnny.barracuda

Please, Log in or Register to view quotes
I am an archivist at heart. The original idea was that if you end up with identical files in the folder, but they are separated (the newly downloaded one being in the root folder and the older one being in a sub directory), RAT would keep the older one and delete the new one. The thought was that keeping the older file would be more important for tracking the first instance of when a file was uploaded. It seems like this might just be too complicated and overly anal, so I changed it to always keep the new file and delete the old one. This should fix the problem described and I think it will be better overall. Plus the fix was as easy as changing one variable.

Please, Log in or Register to view quotes
Thanks for the callout. I added it to the setup.cfg so when you update it should download the module. I don't think too many people are using RAT so sometimes I get stuck in the mentality of "it works on my machine" and forget about others.

Please, Log in or Register to view quotes
Done. If you ever want to get the feel of working with git and developing a project, feel free to make a pull request, especially for something as simple as adding "ts" to the end of a regex. I won't be offended.

Please, Log in or Register to view quotes
Added. Use -mv to add a manyvids link and -pmv to print them out. Ideally you should save them as the full url. The reason I store the names for other sites as just names is that those websites have a simple url where you can plug the name into one spot and go to the destination. For manyvids you need both a number and the model's name, so I feel it would be easier and cleaner to just store the full url.
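
Something along these lines (the URL is a made-up placeholder, only its general shape matters):

Bash:
ripandtear -mv 'https://www.manyvids.com/Profile/0000000/ModelName'   # store the full url
ripandtear -pmv                                                       # print the manyvids links saved in the .rat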

So that's it for now. Update has been pushed so feel free to download it. Please keep letting me know what bugs and quirks you find. I appreciate it.
 

johnny.barracuda

Currently -c will only store the coomer.party url in the .rat file. You could print out the link with -pc. I haven't currently added an extractor to download coomer.party links, but it was one of the next sites that I was planning on writing an extractor for as it should be fairly simple from the poking around I have done. I have spent the past few days re-writing a lot of the program to make it look better (based off inspiration from cyberdrop-dl), making it more efficient and fixing bugs. I just finished all of that and should be releasing the new version tomorrow. After that I will get back to writing extractors for the common hosting sites that are used on simpcity.

If you want to know all of the sites that you can currently download from, check out the 'website quirks' section at the bottom of this
Please, Log in or Register to see links and images
. I knew that I wanted to add the ability to download coomer links in the future, but it was slightly lower on my priority list compared to other features, so I added a way to store that information for now. My goal is that for any .rat file that has a coomer link saved, you can go into the folder and run ripandtear -sc to sync the posts that coomer has. It will find all the coomer links stored in the .rat, then download all the content that you haven't gotten yet by checking the .rat file. If you don't have a .rat file (and/or you don't want one) then you can just run ripandtear -d <coomer_link> and it will download everything it finds, without saving the URLs to a .rat. I am hoping to have the coomer extractor done within 3-4 days.

My ideal situation that I am working towards is to either write a companion program, or integrate this feature directly into ripandtear, where you can tell it the root folders that hold all the subfolders with .rat files. When you tell it to, it will look for all the folders that have .rat files under those root folders, move into each folder with a .rat and then sync all the content, only downloading what you haven't downloaded yet based off the information stored in the .rat file. The goal (mainly for myself, but also others) is to be able to schedule a task every night for ripandtear (or the companion program) to run and update all the content that creators have posted that day. That way you only have to download a little bit each night from each creator and are always up to date with their uploads (and possibly get all content before they delete stuff). Because this is my end goal I added the ability to save coomer and simpcity links today, so you don't have to go back tomorrow to add them. They don't have extractors to download the content yet, but I am planning on writing them in the future.
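
Until that exists, a hedged stopgap is to schedule the loop from earlier in the thread with cron; something like this sketch (paths are placeholders, and it assumes the "users" layout described above):

Bash:
#!/usr/bin/env bash
# nightly_rat_sync.sh, run from cron, e.g.:  0 3 * * * /path/to/nightly_rat_sync.sh
for d in /path/to/users/*/; do
    (cd "$d" && ripandtear -sa -H -S)   # grab anything new, hash out duplicates, sort
done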
 

froppalo

Is it ok if I ask questions about ripandtear here? Do you prefer DMs? I would also understand if you'd rather not receive questions at all 😅

I'm wondering how you are actually using the program. Do you keep a single .rat file, or do you create multiple ones for each site or each model?

And looking at the documentation I can see how to set a username for reddit and redgifs. Is it possible to, somehow, create a username and then add all kinds of links and sources under that username?

Let's say I'm following model1: she has a reddit account, but then I find lots of content on bunkr and coomer. It could be interesting if I could store links like this:

Code:
Please, Log in or Register to view codes content!

I hope what I mean makes sense.

And once again, thanks for this!
 