Recall that I guessed a function Y(t) (or rather, its rate of change) for YouTube views vs. time for popular videos. I was interested enough in the idea that I actually tested it on three popular YouTube videos:
1) By My Side – David Choi MV;
2) Rebecca Black – My Moment; and
3) Nigahiga – 4,000,000 Subscribers.
For those who do not wish to go through the original post, here was my original guess:
We’ll see how well it stands up against experimental data.
1) David Choi
For this video, I recorded only the number of likes for ~6 hours as a proxy; the YouTube data for views were deemed unreliable. The results are as follows:
The solid line is a logarithmic fit to the data. It’s a very good fit, with an $R^2$ value of 0.994. But alas, there is no oscillatory nature to the data, at least at first glance. So perhaps the oscillatory component is negligible or doesn’t exist. But what about the other part? Since the derivative of a logarithm is 1/x, this data fits that part of the guess perfectly. Fantastic. However, note that the fit starts diverging from the data points at large times. So I figure a correction might be needed.
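For anyone who wants to redo this kind of fit outside a spreadsheet, here is a minimal sketch of a logarithmic least-squares fit with its R² value. The function name `log_fit` and the numbers are my own stand-ins: the data is generated from a made-up logarithm purely to show the mechanics, and it is not the recorded like counts.

```python
import numpy as np

def log_fit(t, y):
    """Least-squares fit of y = a*ln(t) + b; returns (a, b, r_squared)."""
    x = np.log(t)                      # requires t > 0
    a, b = np.polyfit(x, y, 1)         # straight-line fit in ln(t)
    pred = a * x + b
    ss_res = np.sum((y - pred) ** 2)   # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return a, b, 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT the recorded likes): an exact logarithm.
t_hours = np.linspace(0.5, 6.0, 12)
likes = 120.0 * np.log(t_hours) + 300.0

a, b, r2 = log_fit(t_hours, likes)
print(a, b, r2)
```

With real data, the place to look is the residuals at large t, since that is where the fit above starts to drift.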
The possibility of a correction, along with the peculiar lack of oscillation with day/night, prompted me to do another, more rigorous analysis, on a much more popular video.
2) Rebecca Black
I was thrilled when Vicki alerted me to this <6 hours after the video was posted. For this video, I tried taking data points at irregular but short intervals of 3 hours or less. The data gets sparser after a few days, as I was not near my computer the weekend after the video was posted. And here is the data:
Note that the logarithmic fit again performs very well, with $R^2$ values over 0.9, but now the oscillations show up.
Now for this video I had enough data points to approximate the instantaneous rate of change of views at all times by assuming
$$Y'(t_{\text{mid}}) \approx \frac{Y(t_2) - Y(t_1)}{t_2 - t_1}$$
for any two adjacent data points at $t_1$ and $t_2$, where $t_{\text{mid}}$ is the average of $t_1$ and $t_2$. For better accuracy, I only did this for adjacent data points that were less than four hours apart.
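The midpoint approximation described above can be sketched in a few lines. The function name `midpoint_rates`, the `max_gap` parameter, and the sample numbers are hypothetical; only the rule itself (slope between adjacent points, assigned to the midpoint, skipping gaps of four hours or more) comes from the text.

```python
import numpy as np

def midpoint_rates(t, views, max_gap=4.0):
    """Slope between adjacent samples, assigned to the midpoint time.

    Assumes t is strictly increasing (hours). Pairs whose spacing is
    max_gap hours or more are dropped.
    """
    t = np.asarray(t, dtype=float)
    views = np.asarray(views, dtype=float)
    dt = np.diff(t)
    keep = dt < max_gap
    t_mid = (t[:-1] + t[1:]) / 2.0     # average of t1 and t2
    rate = np.diff(views) / dt         # views per hour
    return t_mid[keep], rate[keep]

# Made-up sample: the 3h -> 9h pair is 6 hours apart, so it is skipped.
t = [0.0, 2.0, 3.0, 9.0, 10.0]
views = [0.0, 2000.0, 2600.0, 4400.0, 4600.0]
t_mid, rate = midpoint_rates(t, views)
print(t_mid, rate)
```

Dropping the wide pairs matters because a slope averaged over a long gap smears out exactly the day/night oscillation being looked for.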
The derivatives are plotted below:
And WOW, look at the oscillations! But as it turns out, the exponential fit works best here (which is why those fits appear on the graph). And yes, I know the $R^2$ values take a hit because any spreadsheet's attempt at an $x^{-k}$ fit will blow up at 0, but more importantly, the curvatures of the data and 1/x don’t match. A graphical comparison of the shapes of 1/(x+1) and $e^{-x}$ demonstrates this nicely:
(Source; green is $e^{-x}$, blue is 1/(x+1). Red graph will be explained shortly.)
However, neither one works fully alone.
–We just demonstrated that 1/(x+1) does not work as well for Rebecca Black’s derivative, but it does for David Choi, assuming the derivative is as good a fit as the plot itself.
–The better fit for Rebecca Black’s plot itself, however, is logarithmic, which is unusual given the prior statement. Notice the fit in the graph three images above becomes better for large times, whereas for David Choi, the fit becomes worse.
–In either case, the exponential derivative should not work for large times because that converges to zero very quickly, while we reasoned that it should converge much more slowly.
This is where the extra graph in the figure comes in. The sum of 1/(x+1) and $e^{-x}$, the red graph, looks very much like the exponential alone (Rebecca Black style) at small times, but it does not converge to zero as fast, as would be expected. Varying the shift of the rational component (not shown) could make the derivative look a lot more like 1/x… as would be expected from a David Choi perspective. Either way, the derivative looks completely 1/x-ish after a long time, as the exponential dies out rapidly, and this is supported by Rebecca Black looking completely logarithmic after a long time. And by shifting the graphs, we can alter the appearance of the graph for smaller values of t.
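A quick numerical check of that argument: the sketch below computes what share of the sum comes from the exponential at a few values of x. The helper names and the choice of sample points are mine; the two functions are the ones from the figure.

```python
import numpy as np

def rational(x, shift=1.0):
    """Shifted rational term 1/(x + shift)."""
    return 1.0 / (x + shift)

def exponential(x):
    """Exponential term e^(-x)."""
    return np.exp(-x)

x = np.array([0.0, 1.0, 5.0, 20.0])
total = rational(x) + exponential(x)

# Fraction of the summed curve contributed by the exponential.
exp_share = exponential(x) / total
print(exp_share)
```

The share starts at one half at x = 0 and falls steadily from there, so the sum is dominated by the rational term at large x, which is the "1/x-ish after a long time" behavior.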
So ignoring the oscillations, which we’ll add back later, we assume a form
where everything other than t is a constant to be determined later.
We’re not done yet though – we still haven’t figured out why Rebecca Black oscillated but not David Choi. Or what exactly the “correction” is. Onto the next video…
3) Nigahiga
The plot of views vs. time:
(The fit really goes awry at large times in this case. Yikes.)
Time derivative, in the same fashion as for Rebecca Black:
And here we see slight oscillatory motion, but it dies to a negligible amplitude very quickly. Weird. Or is it?
Consider a damped mass-spring system. Assuming that the resistive force is proportional to velocity, a differential equation can be set up which has solutions of the following form:
where ζ depends on the mass, the spring constant, and the coefficient of the resistive force.
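For reference, this is the standard textbook form of the damped oscillator, which may differ in notation from the equation originally shown here. The symbols m (mass), c (resistive coefficient), k (spring constant), A, φ, and ω₀ are assumed names; only ζ is named in the text.

```latex
% Textbook damped oscillator with a velocity-proportional resistive force.
% Symbol names other than \zeta are assumptions.
m\ddot{x} + c\dot{x} + kx = 0, \qquad
\omega_0 = \sqrt{k/m}, \qquad
\zeta = \frac{c}{2\sqrt{mk}}

% Underdamped case (0 \le \zeta < 1): a decaying oscillation
x(t) = A\, e^{-\zeta\omega_0 t}
       \cos\!\left(\omega_0\sqrt{1-\zeta^2}\; t + \phi\right)
```

For ζ ≥ 1 the system returns to rest without oscillating at all, which is the regime that would correspond to a video with no visible day/night swing.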
Well, my belief, based on the data I have collected here, is that something similar is going on with YouTube videos. Depending on various factors, video popularity (quantified by the rate of change of views/likes over time) damps out over time… and it can either damp out without oscillating much between night and day (if at all), à la David Choi/Nigahiga, or damp out very slowly over several oscillations, à la Rebecca Black. The 1/x term ensures that the derivative never reaches zero overnight and that total views vs. time never reaches an upper limit. Superposing the damped oscillation term and the general 1/x term in the derivative gives the behavior we want. [I’ll add something about the interpretation of a lack of night/day oscillation later.]
Regarding the correction: we’ll just add an exponent to the 1/x factor. This is a work in progress, lol… don’t know what better things to do here.
The general solution for the non-oscillatory video is EQUATION 1, with the correction:
The general solution for the oscillatory video is as follows:
Again, everything other than t is a constant. The square in the cosine term prevents the derivative from going negative, which wouldn’t make sense.
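Here is a rough sketch of how such a model could be evaluated numerically. The functional forms are my reading of the verbal description (a shifted power-law term, plus a decaying exponential times a squared cosine); they are assumptions, as are all function names, parameter names, and constants, which are arbitrary and not fitted to anything.

```python
import numpy as np

def rate_smooth(t, a, b, n):
    """Assumed non-oscillatory rate: a / (t + b)**n (n is the 'correction')."""
    return a / (t + b) ** n

def rate_oscillatory(t, a, b, n, c, k, omega, phi):
    """Assumed oscillatory rate: smooth term plus a damped cos^2 term."""
    damped = c * np.exp(-k * t) * np.cos(omega * t + phi) ** 2
    return rate_smooth(t, a, b, n) + damped

def cumulative(t, rate):
    """Running trapezoidal integral of rate over t, starting at zero."""
    steps = 0.5 * (rate[1:] + rate[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(steps)))

# Arbitrary constants; omega = pi/24 gives cos^2 a 24-hour period.
t = np.linspace(0.0, 120.0, 2401)          # hours
rate = rate_oscillatory(t, a=5000.0, b=1.0, n=1.1,
                        c=3000.0, k=0.02, omega=np.pi / 24.0, phi=0.0)
views = cumulative(t, rate)
print(rate[:3], views[-1])
```

Because both terms are non-negative for positive constants, the rate never dips below zero and the cumulative views only ever increase, which is the property the squared cosine is there to guarantee.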
Hopefully I’ll add more to this and maybe correct this again in the future. My original guess wasn’t too bad though, I suppose.
P.S. I plugged in some arbitrary constants to generate these plots that somewhat resembled the experimental data:
1) Rebecca Black