BIAM530 Week 5 iLab Analyzing Data with Hadoop

Question 1: Are there other data columns in the GHCN-Daily file that might be relevant to 311 call volume besides what was used in the tutorial? If so, what are they, and how might you include them in the analysis?

Question 2: Review what other data files are available on the NOAA National Centers for Environmental Information Quick Links page. Select one other file that might contain data of interest to a company or government agency and briefly describe an analysis that could be performed using this data.

Question 3: Visit the NYC Open Data site and follow the links to Social Services and the 311 Service Requests page. In addition to the data columns used in the tutorial, what other available data columns might be useful in an analysis of 311 calls? Suggest some analyses that might be performed using these data.

Question 4: Explore the NYC Open Data site to identify other data sets available besides the 311 service calls. Select one of these data sets and briefly describe how it could be used in an analysis for a government agency or business.

Question 5: Summarize the major steps performed in the analysis shown in the tutorial, and the IBM BigSheets functions used in the analysis.

Question 6: Compare and contrast the analysis shown in the tutorial, using Hadoop and IBM BigSheets, with how a similar analysis might be performed on a smaller data set using Microsoft Excel. What was similar and what was different? What challenges would be faced by an analyst skilled in Excel when adjusting to working with big data sets using Hadoop?

