A required course in the Data Science Certificate Program
Big data is one of the most important technology trends affecting the way organizations operate and compete. As more companies collect large amounts of data through their daily operations, the ability to analyze and glean knowledge from big data has become an integral part of a successful business. This course helps students navigate the complex layers of big data while providing insight into how to use technologies and architectures effectively to create and manage big data workflows. Concepts covered include an introduction to big data and related technologies, big data processing architectures, the major concepts behind big data management, and the application of these topics to big data analysis. Students will gain an understanding of the characteristics of big data and of techniques for working on big data platforms through hands-on exercises with the tools and systems used by data scientists and data engineers, including Hadoop (HiveQL and Pig), Apache Spark, and Spark SQL.
Prerequisite: I&C SCI X427.05 Fundamentals of Data Science
NOTE: This course may use live sessions via Zoom. While students are highly encouraged to attend, all sessions are optional and will be recorded. A device with audio and video capability is needed to participate. The following student guide provides additional resources and information on how to access and use your course's Zoom sessions.