Can we predict how long movies will be "in release"?
Initial analysis shows that pre-release variables such as production budget do not accurately predict how long a movie will stay in theatres. The decision appears to be made on the fly, based on how well a movie is performing relative to its competition. For this project, I scraped a set of features from the web that vary from day to day: box office rank, profits per theatre, number of theatres, and the number of days the movie has already been in release. Using an autoregressive time-series model, I make daily predictions of how far along each movie is in its "in release" tenure.
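To make the modelling idea concrete, here is a minimal sketch of a lagged-feature linear regression, used as a simple stand-in for the autoregressive model. It is not the project's actual code: the helper names (make_lagged, fit_linear, predict_linear), the choice of two lags, and all the numbers are assumptions made up for illustration.

```python
import numpy as np

def make_lagged(features, n_lags):
    """Row for day t holds the features of days t-n_lags .. t, oldest first."""
    features = np.asarray(features, dtype=float)
    return np.array([
        features[t - n_lags : t + 1].ravel()
        for t in range(n_lags, len(features))
    ])

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, *weights]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply fitted coefficients to a lagged feature matrix."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef

# Synthetic single-movie example (made-up numbers, not scraped data).
days = np.arange(1, 21)                   # days already in release
rank = np.minimum(1 + days // 2, 10)      # box office rank slipping over time
per_theatre = 5000.0 * 0.9 ** days        # per-theatre take decaying daily
theatres = 3000 - 100 * days              # theatre count shrinking
features = np.column_stack([rank, per_theatre, theatres, days])
progress = days / days[-1]                # fraction of the run elapsed (target)

n_lags = 2                                # hypothetical choice
X = make_lagged(features, n_lags)
y = progress[n_lags:]
coef = fit_linear(X, y)
pred = predict_linear(coef, X)            # one prediction per day
```

In practice the end of a run is unknown while a movie is still showing, so a model like this would presumably be fit on movies whose runs are complete and then applied day by day to movies still in release.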
Visualizing demographic distributions by county
Adtech companies are still trying to design algorithms that accurately predict click-through rate. In this project, we attempt to estimate the probability that a particular user is in an optimal age/income demographic, given the user's county. Census data provides a joint lognormal distribution of age and income at the national level, but only separate distributions of each at the county level. We estimated the joint distribution for each county by shifting the national mu to account for the difference in age between the county and the nation. Assuming the IP address reflects a county-specific location, we plan to evaluate how useful this feature may be in predicting click-through rate.
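The mu adjustment described above can be sketched as follows. This is a minimal illustration and not the project's code: it assumes mu means the log-space mean of a bivariate lognormal, keeps the national spread and correlation unchanged for every county, and estimates the target-demographic probability by Monte Carlo sampling. The function names and every number are hypothetical.

```python
import math
import random

def shift_age_mu(national_mu, county_log_age_mu):
    """Shift the age component of the national (log-age, log-income) mean
    by the county-minus-nation difference; income is left unchanged."""
    mu_age, mu_income = national_mu
    delta = county_log_age_mu - mu_age
    return (mu_age + delta, mu_income)

def prob_in_target(mu, sigma, rho, age_range, income_range,
                   n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(age and income both fall in their ranges)
    under a bivariate lognormal with log-space means mu, log-space standard
    deviations sigma, and log-space correlation rho."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        log_age = mu[0] + sigma[0] * z1
        # Correlate income with age via a 2x2 Cholesky factor.
        log_income = mu[1] + sigma[1] * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        age, income = math.exp(log_age), math.exp(log_income)
        if (age_range[0] <= age <= age_range[1]
                and income_range[0] <= income <= income_range[1]):
            hits += 1
    return hits / n_samples

# Made-up parameters for illustration only (not census values).
national_mu = (math.log(38), math.log(55_000))
sigma, rho = (0.4, 0.7), 0.3
county_mu = shift_age_mu(national_mu, math.log(45))   # a hypothetical older county

target_age, target_income = (40, 60), (50_000, 150_000)
p_nation = prob_in_target(national_mu, sigma, rho, target_age, target_income)
p_county = prob_in_target(county_mu, sigma, rho, target_age, target_income)
```

The per-county probability computed this way is the kind of feature that could then be attached to each impression through the county inferred from the IP address.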
Evaluating the impact of scientific research topics
I am using APIs from various academic databases to obtain information about NIH grant awards and the published abstracts that result from that funding. Using natural language processing techniques, I am clustering the abstracts by topic. Initial analysis suggests that I may be able to assign a value to each topic cluster by computing its award funding per citation.
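As a rough sketch of the valuation step, the snippet below aggregates award funding per citation for each topic cluster. It assumes the clustering has already assigned a cluster label to each record and that each record pairs one award's funding with the citations attributed to it. The record layout, field names, and figures are hypothetical and are not taken from the NIH data.

```python
from collections import defaultdict

def funding_per_citation(records):
    """Return {cluster: total funding / total citations} for each cluster.

    Each record is a dict with 'cluster', 'funding', and 'citations' keys
    (a hypothetical layout). Clusters with zero citations map to None.
    """
    funding = defaultdict(float)
    citations = defaultdict(int)
    for record in records:
        funding[record["cluster"]] += record["funding"]
        citations[record["cluster"]] += record["citations"]
    return {
        cluster: (funding[cluster] / citations[cluster]
                  if citations[cluster] else None)
        for cluster in funding
    }

# Made-up records for illustration only.
records = [
    {"cluster": 0, "funding": 100_000.0, "citations": 10},
    {"cluster": 0, "funding": 200_000.0, "citations": 20},
    {"cluster": 1, "funding": 50_000.0, "citations": 0},
]
value_by_cluster = funding_per_citation(records)
```

A lower figure means fewer award dollars per citation. If one award produces several abstracts in different clusters, its funding would need to be split across them before aggregating, which this sketch does not attempt.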