iLearn

Lecture

Online Lecture: Recorded lecture material for download + online live sessions

Lecture-Podcasts: The lecture is provided as a podcast series for download. New episodes are released on Mondays from 4pm.

Lecture-Live-Sessions: The material provided on Monday and Tuesday will be briefly recapped and discussed in the live session (Tuesdays 13:00-13:45), which is held via Zoom (entry links are provided via the iLearn portal). In the live sessions, you have the opportunity to ask questions about the topics covered.

Live-Session: Tuesday 13:00 - 13:45 Zoom link

First date: 03.11.2020

Exercise Class

Thursday 8:15 - 9:45 Zoom link: https://uni-kiel.zoom.us/j/82387350214

On 19.11.2020, use this room: https://zoom.us/j/93951871074

First date: 12.11.2020

Lecture Slides

Slides will be uploaded at a later date.

Abstract

Knowledge Discovery is the nontrivial process of discovering interesting, valid, novel, and potentially useful information in large data collections. It is a multidisciplinary field that draws on work from areas including data management and databases, statistics, pattern recognition, machine learning, information retrieval, recommendation systems, knowledge-based systems, high-performance computing, and data visualization, among others. Data Mining is the key component of the knowledge discovery process: it performs the analysis of the data to reveal new information. Although data mining emerged as early as the late 1980s, it has become a very important field for many applications in academia and industry in the context of data-intensive scientific discovery (the foundation of a new age of science). This lecture introduces the field of knowledge discovery by focusing on basic concepts and algorithms for data mining and related concepts for data preprocessing.

Learning Goals

In this lecture, students will gain an understanding of the field of Knowledge Discovery and learn the principal techniques, methods, and tools associated with data mining, including frequent itemset mining and association rule mining, classification, clustering, and outlier detection.

Topics

  • Introduction to Data Mining and Knowledge Discovery in Databases
  • Data preprocessing, feature selection, similarity and distance functions
  • Frequent itemset mining and association rules
  • Classification
  • Clustering
  • Outlier detection
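
As a rough illustration of the frequent itemset and association rule topic listed above, the following minimal sketch computes support and confidence on a toy data set. The transactions, the function names, and the threshold are hypothetical and not taken from the lecture material. The enumeration is deliberately brute-force and is not one of the dedicated algorithms such as Apriori or FP-tree.

```python
from itertools import combinations

# Hypothetical toy transactions (not from the lecture material).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets with support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Assumed threshold: an itemset is frequent if it occurs in at least half
# of the transactions.
frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in frequent.items():
    print(itemset, s)

# Confidence of the rule {butter} -> {bread}.
print(confidence({"butter"}, {"bread"}, transactions))
```

Brute-force enumeration is exponential in the number of distinct items, so it is only practical for very small examples like this one.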

The class does not follow one specific textbook, but most of its material is covered in the following textbooks:

  • Primary sources:
    Han J., Kamber M., Pei J., Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011
    and Tan P.-N., Steinbach M., Kumar V., Introduction to Data Mining, Addison-Wesley, 2006.

  • Further sources:
    Liu B., Web Data Mining, Springer, 2007.
    Witten I. H., Frank E., Hall M. A., Data Mining, Morgan Kaufmann, 2011.

Exam

The exam in the first examination period is scheduled for 24.02.2021 at 16:00 and will be held in digital form via OLAT. More information will follow by email.

Lecture Material


Lecture Week | Date | Pages | Section | Slides (Script) | Podcast Recordings | Live Session
1 | 02/03.11.2020 | 1-54 | Section 1: Introduction | Intro, Why to Study Data Mining?, What is KDDM? | Intro, Why to Study Data Mining?, What is KDDM? | 03.11.20 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
2 | 09/10.11.2020 | 55-64 | Section 2: Features | Data Preprocessing, Decomposing a Data Set | Data Preprocessing, Decomposing a Data Set | 10.11.20 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
3 | 16/17.11.2020 | 65-89 | Section 2: Data Descriptors & Feature Spaces | Data Descriptors, Feature Spaces | Data Descriptors, Feature Spaces | 17.11.20 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
4 | 23/24.11.2020 | 89-110 | Section 2: Data Descriptors & Feature Spaces & Section 3: Frequent Itemsets and Association Rule Mining | Text Features, FIM/ARM Intro | Text Features, FIM/ARM Intro | 24.11.20 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
5 | 30.11/01.12.2020 | 111-144 | Section 3: Frequent Itemsets and Association Rule Mining | FIM-Apriori, FIM-ARM | FIM-Apriori, FIM-ARM | 01.12.20 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
6 | 07/08.12.2020 | 145-159 | Section 4: Frequent Itemset Mining II | FIM-FPTree | FIM-FPTree | 08.12.20 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
7 | 14/15.12.2020 | 160-178 | Section 4: Frequent Itemset Mining II | FIM-Partition, FIM-CFI | FIM-Partition, FIM-CFI | 15.12.20 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
8 | 04/05.01.2021 | 178-202 | Section 4: Frequent Itemset Mining II / Section 5: Classification | FIM-Categorical, Classification Part 1 | FIM-Categorical, Classification Part 1 | 05.01.21 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
9 | 11/12.01.2021 | 202-226 | Section 5: Classification | Decision Trees (part 1), Decision Trees (part 2) | Decision Trees (part 1), Decision Trees (part 2) | !!! Monday 11.01.21 05:00pm !!!; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
10 | 18/19.01.2021 | 226-249 | Section 5: Classification | Overfitting, kNN Classification | Overfitting, kNN Classification | Tuesday 19.01.21 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
11 | 25/26.01.2021 | 249-280 | Section 5/6: Classification/Clustering | Classification: Evaluation, Clustering: Introduction | Classification: Evaluation, Clustering: Introduction | Tuesday 19.01.21 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
12 | 01/02.02.2021 | 280-316 | Section 6: Clustering | Clustering: Partition-Based, Clustering: Hierarchical | Clustering: Partition-Based, Clustering: Hierarchical | Tuesday 02.02.21 01:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
13 | 08/09.02.2021 | 317-358 | Section 6: Clustering | Clustering: Density-Based, Clustering: Model-Based | Clustering: Density-Based, Clustering: Model-Based | Monday 08.02.21 05:00pm; Meeting-Link: Link; Meeting-ID: 921-8174-8973; PW:1234
14 | | 359-401 | Section 7: Outlier Detection (Not relevant for the exam!!!) | Outlier Detection: Intro, Outlier Detection: Model/Distance-based, Outlier Detection: Density/Cluster-based | Outlier Detection: Intro, Outlier Detection: Model/Distance-based, Outlier Detection: Density/Cluster-based | No online meeting