Manually Removing Duplicate Files

Over the years, a lot of us build up massive music libraries. Sometimes those libraries look more like tumbleweeds: after years of faithfully collecting music files, you end up with a library so cluttered with duplicates that you’re tempted to buy software that claims to remove them. The problem is that, even with such software, the job takes forever. What’s a music fan to do?

Not to worry! If you’re a little familiar with Python 3 and the Linux command line, you can clean out that collection quickly and reliably!

Disclaimer: please read this all the way through before running the Python code. The bash command doesn’t delete anything, but the Python code does.

First, we need to find those pesky files. I found this little gem over at commandlinefu.com. It’s one big, fat, hairy command that searches recursively through the directory where you run it and prints the duplicates to standard output, which I’ve redirected to a file on the Desktop for later use:

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate > ~/Desktop/unique.txt

A quick note on this command: it first compares file sizes, and only files that share a size with some other file get their MD5 hashes computed. The final uniq -w32 step then groups lines whose first 32 characters (the hash) match. It takes a little while to run (on my 10 GB music library, about 45 seconds).
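If you would rather do the search in Python as well, here is a rough sketch of the same size-first, hash-second idea. It is my own illustration, not a translation of the command above: find_duplicates and md5_of are hypothetical names, symlinks and empty files are skipped to mirror -type f -not -empty, and the groups are returned as lists instead of being written to unique.txt.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root='.'):
    """Group identical non-empty files under root: compare sizes first,
    then hash only the files whose size is shared."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror find -type f -not -empty: regular files only, no symlinks
            if os.path.isfile(path) and not os.path.islink(path):
                size = os.path.getsize(path)
                if size > 0:
                    by_size[size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling find_duplicates('.') from the music directory returns a list of groups, each a sorted list of paths to identical files. Hashing only files with a shared size is what keeps this fast: most files have a unique size and are never read.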

Once we have a list of all our duplicates, we need to delete them. Or rather, we need to delete every file in each group except the first one. Conveniently, the --all-repeated=separate option puts an empty line between groups of duplicates. If we keep the first file in a group, delete the rest until we reach an empty line, and then repeat for the next group, we will have gotten rid of all of our duplicates. This can be done by hand, but it is much faster (and much more reliable) to do it with Python 3.
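Before deleting anything, it can help to see exactly which files the keep-the-first rule would remove. The sketch below is a dry run that deletes nothing; files_to_delete is a hypothetical helper name, and it assumes the "hash  ./path" line format that md5sum writes when run on relative paths.

```python
def files_to_delete(lines):
    """Return the paths that would be removed: every file in a
    blank-line-separated group except the first one."""
    doomed = []
    keep_next = True  # the first line of each group is kept
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            keep_next = True  # a blank line starts a new group
        elif keep_next:
            keep_next = False  # keep this file, mark the rest for removal
        else:
            # md5sum prints "<32 hex digits>  ./path"; drop the hash and "./"
            doomed.append(line.split('./', 1)[1])
    return doomed
```

Passing it the open unique.txt file and printing each returned path shows the full deletion list, which is worth skimming before running anything that calls os.remove.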

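Open a terminal and navigate to your music directory; it must be the same directory where you ran the find command, because the paths in the list are relative to it. Type “python3” to start the Python 3 interpreter, then copy and paste the following code at the prompt (in Ubuntu, Ctrl+Shift+V pastes into the terminal). Press Enter once more after pasting so the interpreter runs the block: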


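import os

# Path to the duplicate list written by the find/md5sum command above
listing = os.path.expanduser('~/Desktop/unique.txt')

remove = False  # becomes True once the first file of a group has been kept
with open(listing) as f:
    for line in f:
        line = line.rstrip('\n')
        if line:
            if remove:
                # md5sum lines look like "<32-char hash>  ./path"; keep the path
                fname = line.split('./', 1)[1]
                print(fname)
                os.remove(fname)
            else:
                remove = True  # keep the first file of each group
        else:
            remove = False  # a blank line starts a new group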

And voila! Your music library has been significantly reduced in size in mere seconds. Aren’t you glad you didn’t have to sit there for a few weeks, manually deleting files one at a time?

Disclaimer: I am not responsible for the inadvertent deletion of good files; it’s your responsibility to understand what both the bash command above and the Python script do. One thing this process cannot do is decide which folder a given file belongs in: in each group it keeps whichever copy sorted first (by path) and deletes the rest, so it may delete a track from one album’s folder just because a copy exists elsewhere. If the location of a file is important to you, look for a solution that lets you choose which copy to keep.

Good luck!

Posted in Uncategorized | 2 Comments

Rails 4.x.x and Twitter Bootstrap

I just spent over an hour trying to figure this out, so I’m posting what I found here in case others run into the same problem.

Rails Version: 4.0.2
(gem) bootstrap-sass Version: 3.0.3.0

After following the instructions here to install Bootstrap in my Rails project, I got an interesting error:

File to import not found or unreadable: bootstrap

I tried moving the Gemfile entries for bootstrap-sass and sass-rails into an assets group, but apparently that fix only works for Rails versions before 4.0, so doing it caused another problem with a very similar error message when I loaded the page. After running around the interwebs for a while, it occurred to me that I hadn’t restarted WEBrick. Duh! Once I took the Gemfile entries back out of the assets group and restarted the server, the styles loaded correctly. It took way too long to figure this out.

It turns out the underlying cause is that Rails 4 handles Gemfile groups a little differently: it no longer uses a special assets group, so gems like bootstrap-sass shouldn’t be put in one.

Moral of the story? If you add lines to your Gemfile, restart your server before freaking out about them not working. Otherwise you might do what I did and start debugging a problem that a restart would have fixed.

Posted in Uncategorized, Web Development

Complete Trees

The advent of object-oriented programming spawned the creation of abstract data types such as linked lists, binary search trees, and heaps. A large portion of these can be classified as trees, which in turn can be classified as graphs. While graphs and trees were used long before computers became prevalent, the serious study of these structures came as a direct result of object-oriented programming. As a consequence, much of graph and tree theory arose from peculiarities in how computers store data in memory. This presents us with a unique problem: as mathematical structures, trees and other graphs sometimes have properties that are difficult to study formally purely out of mathematical interest. The best example of this is the traditional definition of complete trees.

Complete Trees
A complete tree is a tree in which all levels are completely filled, with the possible exception of the last level, which must be filled from left to right.
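To make the traditional definition concrete, here is a sketch of the usual check for binary trees: walk the tree level by level, and once a missing child has been seen, no later node may have a child. The Node class is a hypothetical minimal representation used only for illustration.

```python
from collections import deque


class Node:
    """Minimal binary tree node (hypothetical, for illustration only)."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def is_complete_binary(root):
    """Traditional completeness test: every level is full except possibly
    the last, whose nodes sit as far left as possible."""
    if root is None:
        return True
    queue = deque([root])
    seen_gap = False  # True once a missing child appears in level order
    while queue:
        node = queue.popleft()
        for child in (node.left, node.right):
            if child is None:
                seen_gap = True
            elif seen_gap:
                return False  # a node after a gap: last level not left-filled
            else:
                queue.append(child)
    return True
```

Note how the test has to encode the “filled from left to right” exception procedurally, in terms of the order nodes are visited, which is exactly the kind of non-mathematical detail questioned below.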

The first question a mathematician should ask at this point is, “How do we tell whether a level of a tree is completely filled?” The second is, “Why is that funny exception there at the end?” In an attempt to resolve these problems, some texts (such as Rosen) restrict this definition to m-ary trees. This approach removes the ambiguity of the traditional definition by applying it only to trees in which the number of nodes on level L is $m^L$ (counting the root as level 0). While this does eliminate the ambiguity, it also excludes many other trees in which the number of nodes on level L is known, perhaps as a function of L other than $m^L$.
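With the m-ary restriction, “completely filled” reduces to counting: level L must hold $m^L$ nodes. Here is a sketch of that check, representing a tree as nested lists where each node is simply the list of its children (an assumption made for brevity). The explicit at-most-m check matters, because the counts alone can match even when one node has too many children and another too few.

```python
def level_sizes(tree):
    """Count the nodes on each level; a node is the list of its children."""
    sizes = []
    level = [tree]
    while level:
        sizes.append(len(level))
        level = [child for node in level for child in node]
    return sizes


def is_complete_mary(tree, m):
    """True if every node has at most m children and level L has m**L nodes."""
    def at_most_m(node):
        return len(node) <= m and all(at_most_m(c) for c in node)

    return at_most_m(tree) and all(
        count == m ** level for level, count in enumerate(level_sizes(tree)))
```

For example, level_sizes([[[], []], [[], []]]) gives [1, 2, 4], so that tree passes for m = 2, while a tree whose level counts are 1, 2, 3 does not.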

The real source of the ambiguity in this definition lies in the fact that we rarely treat the construction of a graph structure as an operation separate from the assignment of values to the elements of that structure. To fix this problem, we need to define two function families that can be seen as collections of rules governing each operation. I call these families $\delta$ and $\alpha$ functions.

$\delta$ and $\alpha$ functions
A $\delta$ function is any function that constructs a graph structure consisting of nodes and the edges that connect those nodes. An $\alpha$ function is any function that assigns values to some or all of the elements of a structure created by a $\delta$ function. In some cases, as with Huffman trees, the associated $\delta$ and $\alpha$ functions must have a feedback loop between them.

With these two function families, we can define many kinds of graphs and trees more formally by constraining $\delta$ and $\alpha$ function pairs. Once sufficiently constrained, these pairs can be used to define many of the kinds of trees used in computer science. The first such constraint I will mention is the basic requirement that a $\delta$ function construct a rooted tree by starting at the root and assigning children to it, then entering a loop in which it assigns children to all the nodes on one level before moving on to the next. Such a $\delta$ function can be constrained further by requiring it to assign exactly m children to every internal node in the tree.
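Here is a sketch of such a constrained $\delta$ function. It builds only the structure, with no values, numbering nodes in the order they are created; the name delta_mary and the dictionary-of-children representation are my own assumptions, not part of the definition.

```python
def delta_mary(m, height):
    """Build a tree structure level by level, starting at the root and
    giving every internal node exactly m children.
    Returns a dict mapping each node id to the list of its children."""
    children = {0: []}    # node 0 is the root
    level = [0]           # node ids on the current level
    next_id = 1
    for _ in range(height):
        new_level = []
        for parent in level:
            for _ in range(m):
                children[parent].append(next_id)
                children[next_id] = []
                new_level.append(next_id)
                next_id += 1
        level = new_level  # move on only after the whole level is done
    return children
```

For instance, delta_mary(2, 2) produces 7 nodes: the root has children 1 and 2, which in turn have children 3, 4 and 5, 6. Keeping structure and values separate like this is the point of the $\delta$/$\alpha$ split.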

If we let the number of children a node on level L must have be a function of L, written $f(L) = m_L$, we can generalize the m-ary tree $\delta$ function to construct trees in which the number of nodes on level L is $g(L) = \prod_{n=0}^{L-1} f(n)$, with the root on level 0 (so $g(0) = 1$, the empty product). The values of f for each level can be collected into a sequence $S = (m_0, m_1, \dots, m_{h-1})$, where h is the height of the tree, and passed as input to a $\delta$ function that constructs trees starting at the root as previously described. We write $\delta(S)$ to denote the $\delta$ function for what we call a sequence tree, in which all internal nodes on level L must have $m_L$ children.
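As a small worked example (the numbers are mine, chosen only for illustration), take a sequence tree of height 2 whose root has 2 children and whose nodes on the next level each have 3 children, counting the root as level 0:

```latex
\begin{aligned}
\text{level } 0 &: 1 \text{ node (the root)} \\
\text{level } 1 &: 2 \text{ nodes} \\
\text{level } 2 &: 2 \cdot 3 = 6 \text{ nodes}
\end{aligned}
```

Replacing every factor with m recovers the m-ary count of $m^L$ nodes on level L.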

At this point, I feel it is safe to offer the following mathematical definition of complete trees, which deliberately ignores the exception in the traditional definition because of that exception’s non-mathematical origins:

Complete Sequence Trees
A sequence tree is complete if all of the nodes defined by the corresponding $\delta(S)$ function have been assigned values by the corresponding $\alpha$ function.

Note that there are several ways for a $\delta$ and an $\alpha$ function to be incompatible. The first and most obvious is when the $\alpha$ function for a sequence tree fails to fill all the nodes with values. The second and third are less obvious. What if all nodes defined by the $\delta$ function have been assigned values, but not all the values the $\alpha$ function is constrained to assign have found a place? And what if the “shape” of the tree prevents the $\alpha$ function from assigning all the values it must assign, even though there are still empty nodes in the structure, because of the way the $\alpha$ function has been constrained? These cases have no formal definition in tree theory, so I will provide one:
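The third case is easiest to see with an example. Suppose, purely as an illustration, that the $\alpha$ function is binary-search-tree insertion and the $\delta$ function has built a binary shape of height 1 (a root and two children). Inserting 2, 1, 0 in that order fills the root and its left child, but 0 belongs to the left of 1, below the bottom of the shape, so it cannot be placed even though the right child is still empty. The heap-style position numbering and the name bst_alpha are assumptions of this sketch.

```python
def bst_alpha(height, values):
    """Insert values BST-style into a fixed binary shape of the given height.
    Positions use heap numbering: the children of i are 2i+1 and 2i+2.
    Returns (placed, leftover), where placed maps position -> value."""
    size = 2 ** (height + 1) - 1   # number of positions the shape provides
    placed = {}
    leftover = []
    for v in values:
        pos = 0
        while pos < size and pos in placed:
            # walk left for smaller values, right otherwise
            pos = 2 * pos + 1 if v < placed[pos] else 2 * pos + 2
        if pos < size:
            placed[pos] = v
        else:
            leftover.append(v)  # the path ran off the bottom of the shape
    return placed, leftover
```

With bst_alpha(1, [2, 1, 0]) the result is {0: 2, 1: 1} with 0 left over, while position 2 stays empty; inserting the same values in the order 1, 0, 2 fills all three positions.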

Perfect Complete Sequence Trees
A perfect complete sequence tree is any sequence tree in which all nodes have been assigned values, and all values which must be assigned have been assigned to nodes in the tree.
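These outcomes can be told apart mechanically once we know which nodes the $\delta$ function built, which of them the $\alpha$ function filled, and which values it was required to place. Here is a sketch with hypothetical names, using plain sets, dicts, and lists as stand-ins for the real structures. It cannot distinguish the first and third cases from each other, since both leave empty nodes; telling them apart requires knowing how the $\alpha$ function is constrained.

```python
def classify(nodes, assignment, required):
    """Classify the outcome of a delta/alpha pair.
    nodes: collection of node ids built by the delta function
    assignment: dict mapping node id -> value placed by the alpha function
    required: collection of values the alpha function must place"""
    all_filled = all(n in assignment for n in nodes)
    missing = list(required)
    for value in assignment.values():
        if value in missing:
            missing.remove(value)  # each placed value accounts for one copy
    if all_filled and not missing:
        return 'perfect complete'
    if all_filled:
        return 'complete, values left over'
    return 'incomplete'
```

For example, three nodes filled with 'a', 'b', 'c' when exactly those three values are required is perfect complete; the same filling with a fourth required value 'd' is complete but leaves a value without a place.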

These ideas and concepts will be published as part of a paper I am co-writing with my discrete mathematics professor at Brigham Young University-Idaho. I will be posting more on this subject after publication.

Posted in Discrete Mathematics, Uncategorized

Hello world!

Welcome to my home!

I am currently building this site to express my opinions on various topics in programming, science, math, and religion, and to educate the public about the mathematical discovery that has been the focus of my life for the past year: complete modulus trees.

You should be aware that I am not the authoritative voice on most of the subjects I will be posting about, nor do I aspire to be. My concern is with truth. If, therefore, I post anything that is not true, objective, realistic, and complete, I want you all to correct me. Please try to be respectful when doing so.

Stay tuned for more in the future.

Posted in Uncategorized