Data Structures, Data Mining and Big Data with Python
This course introduces advanced Python programming features, with an emphasis on using cloud computing to solve large data problems. Topics include ETL using the command-line interface, functional programming, MySQL, the MapReduce framework using Hadoop Streaming and MRJob, and Spark with SparkML. Students will gain practical experience with Amazon Web Services Elastic Compute Cloud (EC2) and Elastic MapReduce (EMR). Students will also explore how Python's built-in data structures, such as lists, dictionaries, and tuples, can be used to perform increasingly complex data analysis. The course also covers building regression and clustering models in Python for data mining, and introduces machine learning for analysis and analytics.
Prerequisites: I&C SCI X426.59 Intermediate Python: Data Structures, Algorithms and Object-Oriented Programming