johnny.barracuda

Bathwater Drinker
Sep 26, 2022
220
2,975
1,252
Thanks for the kind words. I appreciate it.


1) I am sort of a Linux power user so I just do for loops over all the directories that have a .rat and it has been working for me. That is a suggestion I will take seriously though. Could you give me an example of your work flow where that would be useful? I just want to better understand your situation so I could implement it as effectively as possible.

2) LOL it's funny you mentioned this because I just noticed it today as well. It should be a simple fix. I'll try and get it fixed by tomorrow.

Thanks for taking the time to give feedback. I appreciate it.
 

glandalf

Superfan
Mar 14, 2022
34
674
832

My workflow may very well be flawed, because I'm far from a Linux power user; I'm very much still learning the ropes when it comes to more advanced usage. I'm taking a far more manual approach to the tool, at least at first, because I'm going through an organization/clean-up task using the -H and -S flags on your tool, as well as running dedupes on video comparisons and image comparisons using other tools, and also re-encoding some stuff to H.265, all in the name of freeing up hard drive space. I was trying to script something a bit more automatic using Nemo action scripts, but I couldn't seem to get ripandtear to launch from it. So then I tried to create a script that would cd into the directory of my choosing and run the tool, but again I failed miserably lol. So basically my suggestion of being able to set the directory in the tool itself comes from failure/frustration haha. Thank you for considering it :)

At the moment I'm using nnn to navigate my library, where I can spawn a shell from inside the directory and then run ripandtear from an alias I set up.

If you have any suggestions of ways I could be using the tool better, I will happily take a look into them. Always looking to learn and try new things, even if I fail miserably at them haha.
 

johnny.barracuda


For me, I have a directory that holds all the content creators that have a .rat file. I just call it "users". Whenever I want to do a mass update I gather all the absolute directory paths into a text file. Then I iterate over the text file of paths where I 1) cd into the folder and 2) run the ripandtear command I want to run. I use fish shell, which has a different syntax from something like bash, but it is easier to read so it might give you ideas.

1) cd /path/to/users (go to the main directory where the sub directories contain .rat files)
2) ls -d $PWD/* > names.txt (get the absolute file path of all those sub directories and save them in a text file called "names.txt")
3) loop over all the directories that are in names.txt. cd into the directory, then run the ripandtear command. In this case download everything new, then hash to remove duplicates, then sort.

for d in (cat names.txt)
    cd $d
    ripandtear -sa -H -S
end
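For anyone not on fish, the same loop can be sketched in Python. This is only an illustration: it assumes names.txt holds one absolute path per line and that ripandtear is on your PATH. The helper names (read_paths, run_in_each) are made up for this example, and the flags are the ones from the fish loop. It defaults to a dry run so you can check the plan before anything is executed.

```python
import subprocess
from pathlib import Path

# Flags taken from the fish loop in this post.
RAT_COMMAND = ["ripandtear", "-sa", "-H", "-S"]


def read_paths(text):
    """Return the non-empty lines of names.txt as Path objects."""
    return [Path(line.strip()) for line in text.splitlines() if line.strip()]


def run_in_each(paths, command=RAT_COMMAND, dry_run=True):
    """Run `command` inside every directory; with dry_run only report the plan."""
    plan = []
    for directory in paths:
        plan.append((str(directory), list(command)))
        if not dry_run:
            # cwd= plays the role of `cd $d` in the fish loop.
            subprocess.run(command, cwd=directory, check=False)
    return plan


if __name__ == "__main__":
    names = Path("names.txt")
    if names.exists():
        for cwd, cmd in run_in_each(read_paths(names.read_text())):
            print(cwd, " ".join(cmd))
```

Pass dry_run=False to run_in_each once the printed plan looks right.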


If you have a lot of directories you can speed up the process by breaking up names.txt into smaller .txt files. For example, I currently have 2742 directories that have .rat files. I will take names.txt and break it up into four text files that have roughly 685 entries each. I will name them 1.txt, 2.txt, 3.txt, 4.txt. Then I can open up four terminal instances and run the same for loop, just cat-ting out the different text files. So in one terminal I would run the for loop with for d in (cat 1.txt), then in the next terminal with for d in (cat 2.txt), etc.
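Splitting the list by hand gets old quickly, so here is a small Python sketch of that step. The function names are hypothetical, and it assumes names.txt has one path per line. With 2742 entries and four parts it produces two files of 686 lines and two of 685.

```python
from pathlib import Path


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for index in range(parts):
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def write_chunks(names_file, parts, out_dir="."):
    """Write 1.txt, 2.txt, ... into out_dir, one chunk of paths per file."""
    lines = [l for l in Path(names_file).read_text().splitlines() if l.strip()]
    written = []
    for number, chunk in enumerate(split_evenly(lines, parts), start=1):
        target = Path(out_dir) / f"{number}.txt"
        target.write_text("\n".join(chunk) + "\n")
        written.append(target)
    return written
```

Each terminal can then run the same loop against its own numbered file.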

That is just my current work flow. Maybe it will give you some ideas.
 

glandalf


Thank you for this info, it's been really helpful. I've used zsh in the past, but hadn't set it up on the VM I use to download stuff, so I looked into fish and thought it was cool, so I've gone ahead and set it up. I have now got a for loop like yours running to de-dupe and sort, and it's working like a dream. Definitely something for me to learn and build upon.

Can I ask if your -H flag only looks in the working directory, or does it also look in the sub directories that have previously been -S sorted when comparing new downloads?
 

johnny.barracuda

The -H flag only looks in the working directory. However, the -H flag will always clean up files no matter what directory it is run in; it doesn't have to be run in the same directory as a .rat. To get the most use out of it, though, I would recommend running it in the same directory as the .rat file. The reason for this is that when files are hashed in the same directory as the .rat, the hashes of the files are recorded and stored in the .rat.

Here is a scenario. You go to a thread on simpcity and download a link from bunkr. You download it in the directory that has the .rat. When you do that, the URL the file is hosted at is saved in the .rat to avoid downloading it again. When you use -H you hash the file and that hash is stored in the .rat along with the filename. A month goes by and another user uploads that same file, but now it is under a different URL. Ripandtear doesn't know you already have it because the URLs are different. However, after downloading it, when you use -H again the same hash will appear. Because the hash from the first download is already saved in the .rat, ripandtear knows the new file is a duplicate. ripandtear will then delete the file that has the shorter filename and keep the file with the longer filename. If the original file has the shorter name, then ripandtear will find where it is located and delete it, even if it is in a sub directory.

So you can run -H wherever you want and it will remove duplicates it finds, but if you don't run it in the same directory as the .rat you lose out on better tracking and file management (in my opinion). Even without the .rat it will remove duplicates it finds, so feel free to run it wherever you want. The way I kind of think about it is that the directory with the .rat file works like border patrol. If you download all of your files in the directory with the .rat, then the .rat keeps track of who is coming in, where they should be, and that there is only one of each unique file.
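To make the scenario concrete, here is a rough Python sketch of hash-based de-duplication with a "keep the longer filename" rule. It is not ripandtear's actual code: SHA-256, the function names, and the digest-to-filename dictionary standing in for the hashes stored in the .rat are all assumptions for illustration. When two names have the same length, the sketch keeps the one already recorded, which is also an assumption.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(new_files, known):
    """Compare (name, digest) pairs against `known` (digest -> kept name).

    Returns the names to delete and updates `known` in place so that, for
    every digest, the longer filename is the one that survives.
    """
    to_delete = []
    for name, digest in new_files:
        kept = known.get(digest)
        if kept is None:
            known[digest] = name       # first time this content is seen
        elif kept == name:
            continue                   # same file hashed again
        elif len(name) > len(kept):
            to_delete.append(kept)     # the new name is longer, drop the old file
            known[digest] = name
        else:
            to_delete.append(name)     # the recorded name wins
    return to_delete
```

The month-later re-upload from the scenario shows up as a second name with a digest that is already in `known`, even though the URL is different.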
 

johnny.barracuda

glandalf

I don't know how you store and name your directories, but I mainly download a lot of reddit users. I had like 1,200 users ripped before I made ripandtear. What I did was create a for loop that looped over all my reddit user directories, went inside each directory, saved the basename of the current directory (if the path was /path/to/username the basename would be "username") in a variable, then used that variable as my input for ripandtear to set the reddit name. Then you could find every file in that directory, move it to where the .rat file has been created (where you currently are), delete all the now-empty directories, then run -SH to hash the files and sort them.

For me I had to let it run for like 24 hours straight because I had terabytes of content to hash, but that is a close approximation of the one liner I used. If you collect more onlyfans then you could just use the basename as the input for onlyfans instead of reddit.

Rough pseudo code example. Do lots of tests before running:

Bash:
Please, Log in or Register to view codes content!
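The original one-liner is not visible here, so the following is a separate Python sketch of the file-moving steps described in the post, not a reconstruction of it. The function names are hypothetical, and the name-clash handling (adding a numeric suffix) is an assumption. The ripandtear call that sets the reddit name is left out because its flag does not appear in this thread. Test it on a copy of a directory first.

```python
import shutil
from pathlib import Path


def username_for(user_dir):
    """The basename of the directory, e.g. /path/to/username -> 'username'."""
    return Path(user_dir).name


def flatten(user_dir):
    """Move every file below user_dir into user_dir, then drop empty subdirectories."""
    user_dir = Path(user_dir)
    moved = []
    # Collect first so the tree is not walked while it is being changed.
    nested = [p for p in user_dir.rglob("*") if p.is_file() and p.parent != user_dir]
    for source in nested:
        target = user_dir / source.name
        counter = 1
        while target.exists():
            # Assumption: on a name clash keep both files by adding a suffix.
            target = user_dir / f"{source.stem}_{counter}{source.suffix}"
            counter += 1
        shutil.move(str(source), str(target))
        moved.append(target.name)
    # Deepest directories first, so parents are empty by the time they are reached.
    folders = sorted((p for p in user_dir.rglob("*") if p.is_dir()),
                     key=lambda p: len(p.parts), reverse=True)
    for folder in folders:
        if not any(folder.iterdir()):
            folder.rmdir()
    return moved
```

After flattening, running ripandtear -SH inside the directory covers the hash and sort step from the post.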
 

glandalf

So my folder names aren't too great when it comes to compatibility with the tool, it seems. It's a combination of the model's different usernames that I've found, then different tags for different attributes, e.g. if they have tattoos. So my folder names look like this:
glandalf - gland_alf (Tattoo - Tall)

So what I'm going to do is something along the lines of this:

Bash:
Please, Log in or Register to view codes content!

While testing this out, though, I have found that ripandtear doesn't seem to like it when there is a period/full stop "." in the directory name.
So I might run a bulk rename to remove . from folder names.
 

johnny.barracuda

I used to do this, which was one of the motivating factors behind making the .rat file. That way I could have everything in one place.

If you want, maybe hold off for a couple of days before doing this. I have been planning on adding a big update to ripandtear (RAT). What I am going to do is add a rudimentary search function. Pretty much, I am going to make a global JSON file that will keep track of the names and locations of all your .rat files. That way you can give RAT a name, it will check the JSON file to see if the name exists, and if it does it will tell you the path to the directory that has that name. Thinking about it, it would probably be pretty easy to add tags not only to the individual .rat files, but to this global file as well. Also, I could add a generic names category for sites that I haven't covered yet.

If you would be willing to wait a bit while I implement it, you could work on making a simple parser function for your directories. It might be a little too complicated for fish or zsh, but if you know some Python maybe you could write up a simple function that will parse the current directory name to extract the names/tags and store them in lists. Then you can use the os module to add them to RAT. Just an idea so you don't have to lose all this information you collected.

You don't need to copy a .rat template. If you add a name, url or link via RAT and a .rat file does not exist, RAT will create one for you. The new .rat file will be named after the directory you are in.
 

glandalf


That sounds like a cool idea. I think it would be good to add a generic names category, because I've also been storing old usernames of defunct accounts the model used to use, for searching purposes. So it wouldn't be good to store the old names in the main site categories in the .rat file.


I'm starting to learn Python, but I don't think I'm quite there yet to develop something like that, though it might give me a reason to try to expedite my learning lol.


For now I'm just trying to clean up my library en masse to free up some drive space. So at present I don't want to go through and assign an account name or URL to each model manually; I just want to create a .rat file that can start logging the hashes. I will eventually take time to add usernames and URLs, but clearing space is my initial goal.
 

Jules--Winnfield

Cyberdrop-DL Creator
Mar 11, 2022
2,148
5,056
1,127
Nifty program you got going. Gonna have to give it a try later today. I'm mostly interested in the reddit portion personally.
 

johnny.barracuda

It should be pretty easy and it will be a good way to learn about working with paths. There are two ways you can work with the file system in Python: the os module (os.path) or the pathlib module (Path). pathlib is newer and what most people recommend using. All you need to do is put the function in a .py file, then if you are on Linux add it somewhere within your $PATH and make sure it is executable. Then you can just call it like any other program.

Just making assumptions about your file structure, but if all your folders are like this glandalf - gland_alf (Tattoo - Tall) then you could grab the folder name using the "stem" from pathlib and store it in a variable. Then split that variable on the "(". That will return a list with two entries ["glandalf - gland_alf ", "Tattoo - Tall)"]. Then you can do some trimming and isolate the names from the first entry until you have a list with clean names, and then do the same to create another list for the tags. You could end up with something like this.

Python:
Please, Log in or Register to view codes content!
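Since the code from the post is not visible here, this is a separate minimal sketch of the parsing idea, assuming every folder follows the "name - name2 (Tag - Tag2)" pattern from the example. The function name parse_folder_name is made up. It reads the folder name with .name rather than .stem so that names containing a period stay intact.

```python
from pathlib import Path


def parse_folder_name(folder):
    """Split 'name - name2 (Tag - Tag2)' into (names, tags)."""
    text = Path(folder).name
    # Everything before the first "(" holds the names, everything after it the tags.
    names_part, _, tags_part = text.partition("(")
    names = [n.strip() for n in names_part.split(" - ") if n.strip()]
    tags = [t.strip() for t in tags_part.rstrip(") ").split(" - ") if t.strip()]
    return names, tags


if __name__ == "__main__":
    print(parse_folder_name("glandalf - gland_alf (Tattoo - Tall)"))
```

Splitting on " - " with the surrounding spaces means a hyphen inside a username is left alone. A folder without parentheses simply yields an empty tag list.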

Once you get those two lists you can loop over them one at a time (names first, then tags) and use the subprocess module to call RAT and add them to your .rat file. The --generic-name and --tag flags are stand-ins. I will probably add those names or something similar.

Pseudocode:

Python:
Please, Log in or Register to view codes content!

Just an idea off the top of my head, take it or leave it. It might sound intimidating, but all you really need to do is get the folder name, then break apart the name into manageable pieces and add them to a list. I am doing updates to my home server, but once I am done I will try implementing what I was talking about. I will work on adding the generic names and tags first in case you end up wanting to do what I just mentioned.
 

glandalf


No need to prioritize anything on my behalf mate, you're already doing a lot by taking time to explain and teach me new stuff, so I'm happy to wait for whenever you get around to implementing stuff!

Thanks for this further info, it's a great help. I understand the premise of what you're suggesting I do, it's just applying it to an actual Python script that I need to get my head around. But you've given me a great deal to research and start trying to apply, so again thank you. I will get tinkering with it and see what I can come up with.
 

johnny.barracuda

Wow! Thank you for the compliment! It means a lot coming from you, seeing as I would take a peek at your source code whenever I got stuck and needed inspiration. I am just an amateur programmer so please don't judge me too much.

My reddit extractor is pretty ugly at the moment. I have been meaning to go back and clean it up, but haven't gotten around to it. I was planning on adding the ability to download text posts that users make so I was planning on doing it then.

Here is a link to the gitlab if you hadn't seen it -
Please, Log in or Register to see links and images


Just for quick orientation the philosophy I use is to create a dictionary that contains all information needed to download a file. It can be passed around between different extractors adding necessary information. The most common use case is a url_dictionary being created in the reddit extractor, adding the name of the text post, then passing it to the redgifs extractor to find the actual download link.

utils/content_finder is where most of the calls to find content are made

utils/tracker - this is a singleton that all the extractors connect to. Whenever a url_dictionary is completed it is added to the tracker. After all the extractors are done finding everything the tracker cleans up duplicate links. After everything is clean it passes it to the downloader

utils/custom_types - is where you can see all the fields that can be collected. Not all of them are used for every file though
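For readers who have not met the pattern, here is a toy Python sketch of a singleton tracker that collects url dictionaries and drops duplicate links before download. It is not the code from the repository: the class layout and the field names ("url_to_download", "name") are invented for illustration.

```python
class Tracker:
    """Process-wide collector: every Tracker() call returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.entries = []
        return cls._instance

    def add(self, url_dictionary):
        """Called by an extractor once a url dictionary is complete."""
        self.entries.append(url_dictionary)

    def unique(self):
        """Drop entries whose download URL was already seen, keeping the first."""
        seen, result = set(), []
        for entry in self.entries:
            url = entry["url_to_download"]
            if url not in seen:
                seen.add(url)
                result.append(entry)
        return result
```

Because every extractor gets the same instance, two extractors that find the same file end up with one entry in the list handed to the downloader.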
 

johnny.barracuda

glandalf

Pushed a new update. Also from now on I will refer to ripandtear as RAT

Changes:

Due to a bug in Windows, RAT now removes colons ( ":" ) and parentheses from filenames (waiting for confirmation, but I am pretty sure this is the cause)

- Added the ability to add generic names. If RAT doesn't cover a specific website, you can still record the name for posterity.
-g - to add a generic name. Can add multiple at once if you split them with a pipe ( | ). Example: ripandtear -g 'name|name1'
-pg - prints out the generic names that are saved in the .rat

- Added the ability to add tags to the .rat file. These will be useful for a big update I am planning on doing soon.
-tags - Just like the generic names, they can be separated with a pipe ( | ). Example: ripandtear -tags 'thanks|for|using|ripandtear'
-ptags - prints out the tags that are saved in the .rat

- Removed the cooldown for Reddit.
- RAT used to try and be nice by waiting for cooldowns when using Reddit, like they ask you to. However they throttled the shit out of me while I was trying to update my collection before the Imgur purge and it pissed me off so I removed it. Like WTF I am just trying to update a few terabytes, what's the big deal?

- Removed the ability to print links
Previously -g was reserved to print the links that were found instead of downloading the files. I never used it so I removed it to free up the -g flag for generic names.
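If you already have names and tags in Python lists, the new -g and -tags flags can be driven from a script. A minimal sketch, assuming ripandtear is on your PATH; the helper names are made up, and because the notes above do not say whether both flags can go in one call, the sketch issues one call per flag.

```python
import subprocess


def rat_commands(names, tags):
    """Build one ripandtear call for generic names and one for tags."""
    commands = []
    if names:
        commands.append(["ripandtear", "-g", "|".join(names)])
    if tags:
        commands.append(["ripandtear", "-tags", "|".join(tags)])
    return commands


def apply(commands, directory):
    """Run each command inside `directory` (the folder that holds the .rat)."""
    for command in commands:
        # A list argv bypasses the shell, so the pipe character is passed literally.
        subprocess.run(command, cwd=directory, check=True)
```

A name or tag that itself contains a pipe would be split by RAT, so strip or replace those before joining.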
 

glandalf

Many thanks for this!

I'm getting closer to finishing the function you suggested. At least I think I am lol.

One thing I wanted to call out for consideration: would you perhaps think of using name instead of stem in the rat_info.py util? stem interprets a period as the beginning of a suffix/extension and cuts it off.

[old imgur media embed was here once, but it's now gone]
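A quick way to see the difference in a Python shell, using a made-up directory path:

```python
from pathlib import PurePosixPath

# Hypothetical directory whose name contains a period.
folder = PurePosixPath("/library/some.user")

print(folder.stem)  # some
print(folder.name)  # some.user
```

pathlib treats everything after the last period as a suffix, so stem drops it even when the path is a directory.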
 

johnny.barracuda

Damn, this is embarrassing, but I didn't realize that name was an option. That is definitely way better than stem. So much better, in fact, that I already patched it and pushed an update. Do you want me to add your name as a contributor to the project? I am being serious.
 

glandalf

S'all good mate, no need for contributor, just happy that I could have been some help. A friend of mine in fact showed me name when I asked them a question earlier, so I thought I'd relay the info as it helped fix an issue I was having.

Just updated the tool, tried creating a .rat file in the directory I was having issues with, and BOOM, works like a dream. Thank you for patching that so quickly :)
 