Monday, 23 April 2012

Utilizing Multiple Linear Regression Analysis in Order to Determine the Effects of Spatial, Digestive, and Other Variables on Personal Rap Listening Habits from 2011



Abstract: Building off my writings on British culture and economics, my incredibly popular previous blogs, and my work as an abstract political cartoonist, I bring my considerable expertise towards tackling Crossfadermetrics, the application of rigorous statistical analysis to gain a better understanding of rap music and international finance. My findings support the notion that the Nichols-Zeckhauser transfer model is inherently flawed and that the Washington Redskins almost certainly overspent in free agency.

Introduction: A few weeks (make that months, this took me way too long to write) ago I finished my final project in my econometrics course on campaign spending and elections. Like all of my academic papers, it was pretty convoluted and I didn’t really comprehend anything I was writing and I still don’t understand any of the concepts I attempted to explain. I also used outdated data that was old enough to purchase alcoholic beverages in Canada and the European Union and a good portion of my data conflicted with the findings of economists who probably have a better grasp on this whole statistical analysis thing than students at western Pennsylvanian public colleges. And as you could have probably guessed from these first few sentences, it was plagued with awkwardly-worded run-on sentences. I can’t say I am too broken up about these facts, and writing this post has made it plainly obvious that I have forgotten most of the little information I learned in that class. Ho-hum. I have undoubtedly forgotten better and more useful academic concepts over the course of my college career. I truly lament the fact that I will never grasp the Ramsey Optimal Tax model as well as I did during the peak of the finals epoch at LSE. What does bother me, however, is the fact that it seems that my copy of Stata, the data analysis and statistical software that I used for my project and that is the bane of many an economist with a quantitative focus, is no longer necessary. While my current semester includes courses covering the arcane and abstruse topics of Japanese detective fiction and basic financial accounting (the latter of which is mercifully geared towards non-business majors) that will tax my mental faculties to their (admittedly meagre) capabilities, none of them are likely to require quantitative or statistical analysis using programs with maddeningly frustrating interfaces.
Having paid five dollars to Pitt for a Stata license, it appears that I have not reaped the full half a sawbuck of value out of the program. I realize this reasoning violates the economic concept of a sunk cost (which could be damaging to my credibility considering this is the introduction to an economic paper) but like any principled individual, I realize that ignoring sunk costs in practice is completely foolish. I may not have learned much from my economics coursework, but one lesson that has not been crowded out by Arrested Development minutia and De La Soul verses is that applying economic theory to real-world scenarios is a dicey proposition at best. I will thus stubbornly attempt to derive my full five dollars of enjoyment out of the statistical package, and I have until August 2012 to do so.

How will I go about this? Clearly, the only logical answer is to employ the software to analyze my own personal rap listening habits in order to determine the primary factors driving such behavior. My interest in rap music has always been somewhat confusing to me, and I believe that studying its basic causes will help me better understand my affinity for irrelevant early-nineties urban music and yield valuable conclusions for policymakers. I think this exercise and subsequent paper will earn me about five dollars’ worth of enjoyment and thus make my purchase a wash, which is all you can really ask for when it comes to purchasing mandatory academic software licenses.

Literature Review: Normally, such papers provide some basic overview of the current papers on the topic to situate their own paper within the context of the previously-published academic writings.  Figuring this would be a somewhat fertile field for econometricians, I used the databases generously offered by my two colleges of higher learning to seek out previously-published statistical analyses of rap music. Like all of my interactions with the two institutions, I was ultimately let down after several long instances of extreme frustration. Thanks for nothing Pitt. Gas face goes out to LSE as well.

As long as we are tenuously on the subject of rap literature, I think now is as good a time as ever to point out that Ego Trip’s Big Book of Rap Lists is by far the most superior book on the genre. However, I was unable to find anything in the volume to help me with my current task.

Methodology: In order to properly write a paper on rap listening habits (ignoring the fact that my ineptitude with Stata and poor understanding of econometrics already disqualify me for such a task) we need to establish a definition of what exactly constitutes rap/hip-hop music. Many definitions of the genre discuss the nebulous four/five elements of hip-hop, consisting of graffiti writing, MCing, belly rings (according to Das Racist), street culture (see previous parenthetical statement), and other components that are incredibly unhelpful when it comes to determining the musical aspects of a song that would classify it in the rap genre. Even by genre definition standards, this is clearly unacceptable for the purposes of my paper. To save you even more long-winded analysis deprived of any insight or actual knowledge, rap music will basically be determined on a call-them-as-I-see-them basis. Rest easy, as you are in somewhat capable hands, given that I have read the aforementioned Ego Trip’s Big Book of Rap Lists twice and own 80 percent of the tome’s ten best albums of 1992, though I would like to state for the record that Mecca and the Soul Brother is criminally under-ranked at #8.

In Re: rap definitions, this project does raise the question of whether a song or artist is inherently more hip-hop than another. Since I am measuring total rap plays, is it realistic and accurate to assume that each rap song is equally rap-py? Is rap best defined in terms of thresholds (where after a song reaches a certain level of melodic devolution or lyrical simplicity it becomes hip-hop) or in terms of a hip-hop spectrum (where songs are classified as being varying levels of hip-hop-yness based on their respective levels of melodic devolution and lyrical simplicity)? Bands these days are increasingly likely to employ hip-hop elements in their songs and should these works count as partial plays as a result? Of course not, simply because that would complicate things even further and I don’t have the patience for such trifling annoyances.
I will allow for two exceptions to this rule. Any plays of Queensbridge-based artist Cormega, denoted as Cormega Bonus Listens (CBLs), will count as 1.025 rap plays as opposed to the usual one play. Why is Cormega the only rapper to warrant above a listen? This. QED. Also, any plays of the Native Tongues-affiliated Jungle Brothers will only count as 0.99 rap plays, as punishment for their role in introducing the abhorrent genre of hip house to the musical community. And it goes without saying that Belle and Sebastian plays will count as -0.0025 rap plays compared to the customary 0 given to all other non-rap listens.

The Model:

Dependent Variable

The dependent variable will be the percentage of total plays that are in the rap genre, as defined previously in the paper. Listening statistics are derived from my incredibly unreliable last.fm account powered by the considerably wonky Audioscrobbler software. Interestingly enough, this is not the first study to utilize last.fm data, though it is almost certainly the worst. Again, I have already described what constitutes a rap song, and CBLs, Jungle Brothers, and Belle and Sebastian aside, each rap song listen will count as one rap play, with all non-rap plays counting as 0. This rap total will be divided by the total number of song plays for a given week to calculate rap percentage (Rappctg), measured in percentage points.
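The weekly calculation can be sketched in a few lines of Python. This is only an illustration and not the actual last.fm pipeline: the function names, the (artist, is_rap) play format, and the sample week are all hypothetical, and only the three special weights come from the methodology above.

```python
# Per-artist weights from the methodology; every other rap play counts as 1
# and every other non-rap play counts as 0.
SPECIAL_WEIGHTS = {
    "Cormega": 1.025,               # Cormega Bonus Listen (CBL)
    "Jungle Brothers": 0.99,        # hip house penalty
    "Belle and Sebastian": -0.0025,
}


def play_weight(artist, is_rap):
    """Return the rap-play weight of a single play."""
    if artist in SPECIAL_WEIGHTS:
        return SPECIAL_WEIGHTS[artist]
    return 1.0 if is_rap else 0.0


def rap_percentage(plays):
    """Weighted rap plays divided by total plays, in percentage points.

    `plays` is a list of (artist, is_rap) tuples for one week
    (a made-up format, not what Audioscrobbler exports).
    """
    if not plays:
        raise ValueError("no plays recorded this week")
    rap_total = sum(play_weight(artist, is_rap) for artist, is_rap in plays)
    return 100.0 * rap_total / len(plays)


# Hypothetical week of four plays, not real listening data.
week = [
    ("Cormega", True),
    ("De La Soul", True),
    ("The Kinks", False),
    ("Belle and Sebastian", False),
]
print(round(rap_percentage(week), 4))
```

For this invented week the weighted rap total is 1.025 + 1 + 0 - 0.0025 = 2.0225 out of 4 plays, or roughly 50.56 percentage points.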

Independent Variables

Miles Run: At first glance, it may seem like the distance traveled on my runs (measured in total weekly miles) has no bearing on my propensity to listen to rap music. I can’t really offer any reasonable hypothesis to explain any causal relationship between running and rap. Maybe thinking about running or actually engaging in such foot-powered exercise primes me toward other activities starting with that particular consonant? Perhaps longer runs leave me so physically and mentally exhausted that I am incapable of enjoying any musical works more complex than the lyrical stylings of Gucci Mane (quite a frightening scenario, though I don’t know of anything I could listen to under such restrictions (also no song hyperlink here because I value your eardrums)). I will be the first to admit (and it should be painfully obvious to the few who have slogged this far (stay strong, only a few more variables, some results tables and explanations, and a conclusion to go)) that I did not learn much of anything about econometrics in my class, but I did learn that including variables that are likely to be irrelevant in regression models is generally a poor idea. But I have been diligently tracking every single run on my expensive GPS watch for many months now and I need to justify the countless hours I spent slogging through the GarminConnect interface and syncing my watch with my computer. I don’t expect to completely validate these efforts by including them in an uninspired and confusing blog post, but I suppose it is a start, and thus MilesRun is included in the regression model.

Location and the Ecological Determinism Hypothesis: This variable aims to test whether hip-hop listening habits are determined by the listener’s environment. Is the genre particularly popular in urban areas based on the inherent demographics of those residing in such geographical zones or is it the result of some kind of ecological determinism? I initially raised this question while walking to class during my year as an undergraduate in London. In order to avoid any pedestrian interaction, I would always keep my eyes to the sidewalk and cover my ears with bulky headphones that didn’t have great fidelity and consistently messed up my hair, but effectively communicated that I did not have any spare pence or any knowledge concerning the direction of Piccadilly Circus from my current location. And while I usually carried cash wherever I went, I never did really get my geographical bearings set in Britain and I probably would point the wrong way if prompted to direct anyone to any landmark besides the Tate Modern. And that knowledge isn’t all that impressive given my dorm was right behind said gallery. Most of the time, these headphones were connected to my iPod and playing some kind of music, much of which could be classified as rap. My daily school commute took me past an endless strip of frou-frou sushi bars and boxed-sandwich peddlers, historical architecture, and other related ornate buildings, and after a few days I realized the jarring contrast between the grimy conditions being described by Cormega (who else? (a question which is obviously rhetorical given that it’s not like he is the only rapper specializing in gritty crime narratives)) playing on my unwieldy, hair-crushing headphones and my immediate wealthy, worldly, European surroundings, and quickly switched to listening to The Kinks.
I then began to wonder whether a significant portion of my rap listening results from my environmental conditions, and that the disparate relationship of rap listening habits across rural and urban spaces is a matter of context rather than actual individual musical tastes. It certainly would make sense given the neighbourhood effect posited by Kevin Cox in “The Voting Decision in a Spatial Context,” which basically states that an individual’s voting characteristics can be determined by their spatial location due to a variety of factors such as social relations among nearby contemporaries. While this study is unable to pinpoint the exact mechanisms of such a hypothesis, it will help determine whether such a phenomenon exists at all. Simply put, can someone’s listening environment affect their inclination to listen to hip-hop music and their enjoyment from said genre? It may explain why rap music is so popular in inner-cities and why country music is so big in the flyover states. 

In order to better illustrate the concept, consider a hypothetical pedestrian listening scenario. Imagine that someone is listening to "The Session" by the Roots, which is subtitled the longest posse cut in history and clocks in at a little less than thirteen minutes (the YouTube link cuts off after ten, but you can get a pretty good idea about the track), which is really long by rap standards (In the time it takes you to read this blog post/academic article you could have probably listened to an entire rap album (though this may be more of a condemnation of the superfluous nature of my writing (and its numerous and wholly unnecessary parenthetical statements) than a meaningful statement on the length of rap songs and albums vis-à-vis other genres)). Holding all other variables constant, enjoyment of this particular song can be expressed in the following utility graph:

Essentially, the MC quality is a little top-heavy (Black Thought and Malik B’s verses are too close together near the beginning), and the beat doesn’t really change much at all over the course of the track, thus demonstrating diminishing returns (though it isn’t a bad beat by any means), and these two factors ultimately cause track enjoyment levels in utils to decline as the track progresses, yielding a negatively-sloped function. As a result, if this hypothesis holds based on the regression analysis, we can further posit a rap-utility-smoothing hypothesis. Drawing from Milton Friedman’s permanent income theory, which stated that individuals normally desire stable consumption patterns throughout the course of their lifetime, rap listeners may optimize their utility by “smoothing” their enjoyment of a particular rap album or song by adjusting their surroundings over the course of their listening to keep utility from listening constant. For instance, listeners to a top-heavy album that stacks its best songs near the beginning (e.g. Dr. Dre’s The Chronic) would be advised to gradually walk into areas of lower socioeconomic status and more blight as the album progresses in order to derive maximum enjoyment from the disc. Meanwhile, listeners to albums with stronger second halves such as Buhloone Mindstate (I really like the entire album, but it gets particularly excellent once you get to “Ego Trippin’ Part 2”) can smooth their utility by walking closer to SIPB establishments (which I shall explain soon) and schools wholly devoted to economics and investment banking as the album progresses.

The above graphic explains what I am talking about with utility smoothing. Using “The Session,” it is most efficient from a welfare economics perspective to balance out the declining quality of the song by adjusting one’s surroundings. The utility function u1 represents a listener’s utility holding all other factors constant while listening to the song. Meanwhile, u2 shows a constant utility curve which results from gradually moving from the wealthy confines of Buckingham Palace to the gang-infested streets of Baltimore populated by Bubbles and friends as the track develops. The worsening socioeconomic surroundings counteract the effects of the diminishing track quality to yield constant utility. The triangle labeled "A" can be considered a “frou-frou quality discount” that decreases track enjoyment while "B" measures the “decrepitude bonus” which has the opposite effect. It kind of works like the Keynesian concept of automatic stabilizers when you think about it.
Basically, if this hypothesis holds, London should have a negative slope coefficient while Pittsburgh should have a positive slope coefficient. I will be using dummy variables to represent each city. London will have a value of 1 if I spent that week in 2011 in London while Pittsburgh will have a value of 1 if I spent the week in “Hell with the Lid Off.” I unfortunately spent January through May in London and was in Pittsburgh for most of the remainder of the year. There were like 2 weeks which I spent at my home in New Jersey, where both dummy variables will equal zero.
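The dummy coding described above can be written out explicitly. The helper below is a hypothetical sketch (the function name and the string labels are my own, not part of the original dataset); it simply shows that New Jersey, or any other location, ends up as the omitted baseline with both dummies at zero.

```python
def city_dummies(city):
    """Map a week's location to the (London, Pittsburgh) dummy pair.

    Any other location, such as New Jersey, is the omitted baseline (0, 0).
    """
    return (1 if city == "London" else 0,
            1 if city == "Pittsburgh" else 0)


for city in ("London", "Pittsburgh", "New Jersey"):
    london, pittsburgh = city_dummies(city)
    print(f"{city}: London={london}, Pittsburgh={pittsburgh}")
```

Leaving one location without its own dummy matters: with a constant in the model, a third dummy for New Jersey would make the three location columns sum to one in every week, which is perfect collinearity with the constant.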


Tesco Sandwiches Consumed: A few weeks (make that months) ago the government of India, widely regarded as the messiest democracy in the world, managed to overcome its byzantine political system to achieve true progress in improving the welfare of its citizens. Indian officials were able to renege on their decision to allow international retailers and supermarket chains such as Wal-Mart to enter their country. Now if anyone was still reading this post you may be asking yourself how Indian citizens can benefit from such a development, given that it seems that Indians will be doomed to patronize disorganized and corrupt small-scale retailers through the immediate future. That is true. But this restriction means that Indian society has also managed to temporarily stave off the invasion of the Tesco supermarket chain, whose sandwiches are a menace to British society and foreigners studying abroad. Because British citizens are apparently incapable of independently purchasing bread, meat, and other foodstuffs and assembling such materials in a fashion that satisfies hunger and avoids indigestion, the sandwiches-in-plastic boxes (SIPBs) industry is a huge one on the island. In general SIPBs are adequate at fulfilling sandwich criteria number one and poor at satisfying the second and perhaps most important condition. However, due to limited funds and an unfavorable exchange rate, I had to often consume Tesco sandwiches during my stay in London, much to my chagrin. I predict that this variable will be positively correlated with rap listens, as Tesco sandwich consumption is likely to be strongly associated with my personal anger levels, and I am probably much more likely to listen to rap in such an emotional state. For the purposes of convenience and scientific accuracy, the variable representing weekly Tesco sandwiches consumed will be denoted in the regression as ScourgesUponHumanity and will be expressed in individual sandwich units. 
Interestingly enough, Tesco’s corporate performance began to nosedive right around the time I finally decided to boycott their stores for the rest of my tenure in London. Some analysts claim that this is primarily due to stretching themselves too far abroad while neglecting the quality of their British stores (no kidding) but I still take complete responsibility for them not meeting their earnings forecasts.

Temperature: If Spike Lee’s 1989 film Do The Right Thing taught me anything, it was that summer heat levels are highly correlated with instances of property damage to pizzerias from aerial trash can assaults. Additionally, I learned that Public Enemy songs can become very grating when constantly played for two hours. Regardless, the film raises the question that perhaps the high temperatures experienced by Radio Raheem and his chums were a causal factor in not just their launching of rubbish bins but also their propensity to play “Fight the Power” by Public Enemy through (unfortunately) the whole movie. To test this hypothesis, I will add a Temperature variable, which will show whether there is any credence to the notion that rap listening habits are positively correlated with temperature. Temperature will be expressed in degrees Fahrenheit and be the average temperature over a week. From what I remember, London weather was consistently 45 degrees and cloudy from September through May but I know that there were actual temperature variations once I returned to America, meaning this variable might contribute something to the model.

Fun P.E. Fact: Flavor Flav actually tapped in the snares on the legendary cut “Rebel Without a Pause.” While this is an interesting fact by itself, it raises the possibility that Flav may not actually have had a negative net effect on the quality of It Takes a Nation of Millions to Hold Us Back, as this feat of drum programming may have been sufficient to counteract horrendous contributions such as "Cold Lampin' with Flavor." However, it is worth considering the hype man’s S.P.A.R. (snare-programming-above-replacement) score when examining his net contributions to the album. If Flav had been unable to perform the snare tapping then the task would have probably been performed by Hank or Keith Shocklee, who probably wouldn’t have done that bad a job. It is perhaps more likely that this factual tidbit basically qualifies Flav for a rarely-seen induction into rap’s equivalent of Club Trillion, for musical contributors who have no net effect on the quality of the album and would not affect any enjoyment of an album if they were to have never worked on the disc in the first place. To my knowledge, the only other artist qualifying for the rap Club Trillion is 5 Ft. Accelerator on Enta Da Stage. One future avenue for Crossfadermetrics research is surely to examine other potential rappers and producers who qualify for the hallowed Club Trillion pantheon.

Due to a mixture of laziness, lack of data, and ignorance, these are all the variables I will incorporate into the model. Thus, we can write the equation for the regression as follows:

Rappctg = β0 + β1(Pittsburgh) + β2(London) + β3(MilesRun) + β4(ScourgesUponHumanity) + β5(Temperature) + ε
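An equation of this form is typically estimated by ordinary least squares. The sketch below is a bare-bones, standard-library version that solves the normal equations directly; it is an assumption about the method rather than a reproduction of the Stata workflow, the function names are hypothetical, and the toy data are invented purely so the answer can be checked by hand.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def ols(rows, y):
    """Return [intercept, slope_1, ...] minimising the sum of squared errors."""
    design = [[1.0] + [float(v) for v in row] for row in rows]  # constant term first
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented toy data generated exactly from y = 2 + 3*x1 - 1*x2.
toy_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
toy_y = [2, 5, 1, 4, 7]
coefficients = ols(toy_rows, toy_y)
print([round(c, 6) for c in coefficients])
```

Because the toy outcome is an exact linear function of the two regressors, the fit recovers an intercept of 2 and slopes of 3 and -1 up to floating-point error. Real statistical packages use more numerically stable decompositions than the normal equations, so treat this as an illustration of the idea only.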

And here is a summary of all the variables I am using:

So now that we have designed this model and I have collected all of the data, all I have to do is run the actual regression with Stata. The results are below:

Results: Good gravy. I really forgot a lot about interpreting regression analysis over the course of writing this paper. After consulting with some old course materials I had lying around I will try to do my best.
We can write the final regression results as follows:

Rappctg = 54.79 + 7.09(Pittsburgh) - 13.76(London) + 0.25(MilesRun) - 2.39(ScourgesUponHumanity) - 0.24(Temperature)

The constant of 54.79 is pretty uninteresting. Basically if I ran 0 miles, resided somewhere besides London or Pittsburgh, ate 0 Tesco sandwiches, and the temperature was 0 degrees Fahrenheit, 54.79% of my songs listened to that week would have been of the rap variety. Given that I don’t really plan on moving to Siberia anytime soon this doesn’t mean all that much by itself, and actually Tesco may have already opened some establishments in Russia.
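To make the fitted equation concrete, here is a small sketch that plugs values into it. The coefficients are copied from the equation above; the function name and the example London week (10 miles, 3 sandwiches, 45 degrees) are hypothetical inputs chosen for illustration, not observations from the dataset.

```python
# Coefficients copied from the fitted equation in the Results section.
COEFFICIENTS = {
    "constant": 54.79,
    "Pittsburgh": 7.09,
    "London": -13.76,
    "MilesRun": 0.25,
    "ScourgesUponHumanity": -2.39,
    "Temperature": -0.24,
}


def predict_rappctg(pittsburgh, london, miles_run, sandwiches, temperature):
    """Plug one week's values into the fitted equation (percentage points)."""
    c = COEFFICIENTS
    return (c["constant"]
            + c["Pittsburgh"] * pittsburgh
            + c["London"] * london
            + c["MilesRun"] * miles_run
            + c["ScourgesUponHumanity"] * sandwiches
            + c["Temperature"] * temperature)


# All-zero week: only the constant survives.
baseline = predict_rappctg(0, 0, 0, 0, 0)

# Hypothetical London week: 10 miles, 3 Tesco sandwiches, 45 degrees Fahrenheit.
london_week = predict_rappctg(pittsburgh=0, london=1, miles_run=10,
                              sandwiches=3, temperature=45)

print(f"{baseline:.2f}")
print(f"{london_week:.2f}")
```

The all-zero week returns the constant, 54.79. The invented London week works out to 54.79 - 13.76 + 2.50 - 7.17 - 10.80 = 25.56 percentage points.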

According to the model, every additional mile I run during the week increases my weekly rap play percentage by 0.25 percentage points. I really don’t have any idea of how to interpret this, as again I only included the variable in the model to serve as some kind of justification for my diligent mile tally-age over 2011. I guess this supports my “things that begin with ‘R’” hypothesis. I guess we can't rule out the Gucci Mane theory either, but I think my knees would destroy themselves before I run long enough to find his raps too mentally rigorous.

The ScourgesUponHumanity variable suggests that for every additional Tesco sandwich I eat over a given week my rap play percentage decreases by 2.39 percentage points, which is kind of big I suppose. This conflicts with my prediction that Tesco-induced anger led me to listen to more rap than normal. The thought of consuming one extra Tesco sandwich a week is such an abhorrent thought that I will stop discussing this variable for the sake of my sanity and stomach.

The most surprising result is the sign of the slope coefficient for temperature. The coefficient of  -0.24 means that I actually listened to less rap music as the temperature increased, ceteris paribus. Given that my rubbish bin tosses stayed constant at zero through the entire year, I suppose we can say that Do The Right Thing features more Hollywood embellishments and fantasies than we may have originally thought.
The Pittsburgh dummy variable has a positive slope coefficient of 7.09, which basically means that holding all other factors in the model constant, I listen to 7.09 percentage points more rap music in Pittsburgh than in the baseline location of New Jersey, where both dummy variables equal zero. This offers support to my environmental determinism hypothesis. While London and Pittsburgh are both cities (a bit of a stretch for the latter if you include “decent public transportation” and “free from Kenny Chesney concerts” as criteria for city status) the environmental determinism hypothesis predicts that I will listen to more rap music in the rough-and-tough rust belt environs of Pittsburgh rather than the SIPB-infested streets of Central London.

The next step after running the regression is clearly to test the correlation between the variables to make sure there isn’t any multicollinearity. I have forgotten the exact mechanisms that drive its negative effects, but I do remember learning that multicollinearity screws up the model in some way and that it occurs when two variables are highly correlated. The correlation test results are as follows:

Nothing too glaring here. Pittsburgh and London are apparently pretty strongly negatively correlated, which I suppose is a rather strong vote of confidence for the laws of physics. But it looks like we are safe from the multicollinearity menace. 
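For anyone who wants to run this kind of pairwise check by hand, a minimal Pearson correlation sketch follows. The function name is hypothetical and the eight weeks of dummy values are invented for illustration; they are not the 2011 data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariation / sqrt(spread_x * spread_y)


# Invented example: 3 London weeks, 4 Pittsburgh weeks, 1 week elsewhere.
london = [1, 1, 1, 0, 0, 0, 0, 0]
pittsburgh = [0, 0, 0, 1, 1, 1, 1, 0]
print(round(pearson(london, pittsburgh), 4))
```

For this made-up sample the correlation is about -0.77. Two mutually exclusive dummies will always come out negatively correlated, since being in one city rules out being in the other.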

The R-squared value of my model is 0.23, which essentially states that my model explains a little less than a quarter of the observed variation in rap listening habits over 2011. Not terrible I guess. The R-squared value of my election model was around 0.52, but that included variables like campaign spending and party strength which were actually relevant to the dependent variable.
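As a reminder of what that number measures, R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses a hypothetical function name and made-up actual and predicted values, not the post's data.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if ss_total == 0:
        raise ValueError("R-squared is undefined when the outcome never varies")
    return 1.0 - ss_residual / ss_total


# Invented weekly rap percentages and model predictions.
actual = [40.0, 50.0, 60.0, 70.0]
predicted = [45.0, 50.0, 55.0, 70.0]
print(r_squared(actual, predicted))
```

Here the total sum of squares is 500 and the residual sum of squares is 50, giving an R-squared of 0.9. A value of 0.23 means the residual sum of squares is still 77 percent of the total.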

Looking over the report for my final project on voting outcomes, I evidently ran some kind of test for heteroskedasticity at the end. Not being someone who likes to upend tradition, here are the results from the Breusch-Pagan/Cook-Weisberg test for heteroskedasticity. I didn’t explain the test or my rationale for running it at all in my paper, which suggests that I didn’t really have any idea why I ran it in the first place or what it actually did. I can’t say that any of my classes this semester taught me about it either (Gas face goes out to you too, Japanese Detective Fiction professor) but maybe some reader will figure out what it means. 

Conclusion and Policy Prescriptions: So now you are probably saying to yourself, “Great article,” (well maybe not) “but what was the point of all this?” (that seems more realistic). Well, these results allow us to draw the following conclusions and predictions:

Evidently rap music is more appealing the colder it gets and the worse one’s socioeconomic surroundings become. Tesco sandwiches appear to decrease the propensity to listen to rap music, though it should be noted that these results may not hold for other, more palatable sandwich types, especially non-SIPBs. Running seems to have a slightly positive effect on rap listening habits.

Newt Gingrich and Ron Paul will both continue campaigning for the Republican nomination until the convention in August and Mitt Romney will fall short of the 1,144 delegates required to clinch the nomination. There will be a rancorous brokered convention in Tampa where Buddy Roemer will finally emerge as victorious. This development will not prevent Gail Collins from incessantly referencing Mitt Romney’s trip to Canada with his dog on the roof of the car. The Bank of Japan will pursue expansionary monetary policy in hopes of reinvigorating the yen and their economy.

Tony does indeed get shot by the guy in the Members Only jacket, but the bullet only grazes his arm and he lives out the rest of his life in a rather uneventful fashion. Tragically, he never sees Eric Mangini again.

The model suggests that Low End Theory is the superior album to Midnight Marauders, though both are some of the best rap offerings from the early 90s. The decision is based on the former’s topical diversity and Busta Rhymes’ verse on “Scenario.” Unsurprisingly, the field of Crossfadermetrics is still incapable of answering the age-old question of 3 Feet High and Rising vs. De La Soul is Dead.

I feel that this scholarly paper/blog post lays the foundation for future research into the nascent field of Crossfadermetrics. I invite economics scholars (you know you want to, Krugman) to take a further analytical and regression-based approach to hip-hop.