Analyze this

Almost unrelated to Shoes: I wanted to find out how many of the hundreds of daily downloads are what I would consider legitimate, meaning real people downloading Shoes or real people using the Shoes packaging. How many are bots, idiot leechers or just evil folks, and what are their IP addresses? Some of them need to go into .htaccess as deny entries.

So I created some Ruby scripts to process a collection of Apache (combined format) log files and stuff them into a local SQLite3 db. I’m not a fan of SQL, but it’s the right tool for this. After many syntax errors I’ve got 7 days of shoes.mvmanila.com log entries (3475), which is enough to pick out the evil-doers, the idiots, the clueless and the friends.
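For anyone curious what the parsing step might look like, here is a minimal sketch in plain Ruby, not the actual scripts. It assumes the standard Apache combined format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`). The method name `parse_log_line` and the hash keys are made up for illustration; only `browser` matches a column name used in this post. Loading the result into SQLite3 is left out.

```ruby
# One line of Apache "combined" log format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Simplification: assumes no escaped double quotes inside the quoted fields.
COMBINED_LINE = /\A(?<ip>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<request>[^"]*)" (?<status>\d{3}) (?<bytes>\S+) "(?<referer>[^"]*)" "(?<agent>[^"]*)"\s*\z/

# Returns a hash of fields, or nil if the line does not match the format.
# Key names are hypothetical, chosen for this example.
def parse_log_line(line)
  m = COMBINED_LINE.match(line)
  return nil unless m

  # The request field normally looks like: GET /some/path HTTP/1.1
  verb, url, _protocol = m[:request].split(' ', 3)
  {
    ip: m[:ip],
    time: m[:time],
    verb: verb,
    url: url,
    status: m[:status].to_i,
    bytes: m[:bytes] == '-' ? 0 : m[:bytes].to_i, # Apache logs "-" for no body
    referer: m[:referer],
    browser: m[:agent]
  }
end

# Made-up sample line (documentation IP address, invented path).
sample = '203.0.113.7 - - [10/Oct/2015:13:55:36 -0700] "GET /some/file.exe HTTP/1.1" 200 2326 "-" "Ruby"'
entry = parse_log_line(sample)
puts entry.inspect
```

Each hash could then become one row in the database; lines that return nil are worth a look by hand, since malformed requests are often the interesting ones.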

Did I mention I don’t like SQL? I barely know what I’m doing. If someone wants to help me analyze these logs, contact me at ccoupe@cableone.net and I’ll share the Ruby scripts and database. Because some of us just like to know things. I share your pain.

For example:
sqlite> select * from logentry where browser='Ruby';

shows me the Shoes (or other Ruby) requests, i.e. real people using the packager. If I knew SQL better I could probably figure out how to count them, or I could parse the results into a Ruby hash keyed on the url (what are people really packaging for?). Yes, it would be butt easy to bin them into a ‘friends’ table and to populate an ‘evil doers’ table (because they POST instead of GET, or they GET things like index.php which just don’t exist at the site). Create another table for script kiddies. Some might end up in .htaccess deny entries.
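The counting and binning described above could be sketched in Ruby roughly like this. It is an illustration under assumptions, not the real scripts: the entries are plain hashes with hypothetical keys (`:ip`, `:verb`, `:url`, `:browser`), the sample data is invented, and the rules are just the ones named in this post (POST requests and requests for index.php count as evil). The `Deny from` lines use the older Apache 2.2-style syntax; a newer server may want something different. The same count could probably also be done in SQL with `count(*)` and `group by`.

```ruby
# Count hits per url for a given user agent (default: 'Ruby').
def count_by_url(entries, browser: 'Ruby')
  counts = Hash.new(0)
  entries.each do |e|
    counts[e[:url]] += 1 if e[:browser] == browser
  end
  counts
end

# Bin one entry using the rules from the post. Evil checks run first,
# so a POST from a 'Ruby' agent still counts as an evil-doer.
def classify(entry)
  return :evil_doer if entry[:verb] == 'POST'
  return :evil_doer if entry[:url].to_s.include?('index.php')
  return :friend if entry[:browser] == 'Ruby'

  :unknown
end

# Turn the evil-doers' IP addresses into .htaccess deny entries
# (Apache 2.2-style syntax, assumed here).
def deny_lines(entries)
  entries.select { |e| classify(e) == :evil_doer }
         .map { |e| e[:ip] }
         .uniq
         .sort
         .map { |ip| "Deny from #{ip}" }
end

# Invented sample data: documentation IP ranges and made-up urls.
SAMPLE = [
  { ip: '203.0.113.5',  verb: 'GET',  url: '/package/a',  browser: 'Ruby' },
  { ip: '203.0.113.6',  verb: 'GET',  url: '/package/a',  browser: 'Ruby' },
  { ip: '203.0.113.6',  verb: 'GET',  url: '/package/b',  browser: 'Ruby' },
  { ip: '203.0.113.77', verb: 'POST', url: '/package/a',  browser: 'Mozilla/5.0' },
  { ip: '198.51.100.9', verb: 'GET',  url: '/index.php',  browser: 'Mozilla/5.0' },
  { ip: '198.51.100.9', verb: 'GET',  url: '/index.php',  browser: 'Mozilla/5.0' }
].freeze

puts count_by_url(SAMPLE).inspect
puts deny_lines(SAMPLE)
```

The deny list is only a starting point. Each address deserves a manual look before it goes into .htaccess, since a shared or reassigned IP can lock out friends too.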

Yes, I can track user behavior IF they use this site. I don’t care unless you abuse this site. There is a lot of abuse going on and has been ever since I went live. It always will occur but that doesn’t mean I have to accept it forever. I can clean my house and lock the doors when I want.
