Technical University of Munich

  • Data Analytics and Machine Learning Group
  • TUM School of Computation, Information and Technology

Open Topics

We offer multiple Bachelor/Master theses, Guided Research projects and IDPs in the area of data mining/machine learning. A non-exhaustive list of open topics is given below.

If you are interested in an internal thesis or a guided research project, please send your CV and transcript of records to Prof. Stephan Günnemann via email ([email protected]) or to the project's reference PhD student. 

If you are interested in an external thesis with us, please write directly to Prof. Stephan Günnemann via email ([email protected]).

Generative Models for Drug Discovery

Type: Master Thesis / Guided Research

Prerequisites:

  • Strong machine learning knowledge
  • Proficiency with Python and deep learning frameworks (PyTorch or TensorFlow)
  • Knowledge of graph neural networks (e.g. GCN, MPNN)
  • No formal education in chemistry, physics or biology needed!

Description:

Effectively designing molecular geometries is essential to advancing pharmaceutical innovation, a domain that has received considerable attention through the success of generative models. These models promise a more efficient exploration of the vast chemical space and the generation of novel compounds with specific properties by leveraging their learned representations, potentially leading to the discovery of molecules with unique properties that would otherwise go undiscovered. Our topics lie at the intersection of generative models, such as diffusion and flow matching models, and graph representation learning, e.g., graph neural networks. Projects can focus on model development with an emphasis on downstream tasks (e.g., diffusion guidance at inference time) and on a better understanding of the limitations of existing models.

Contact: Leon Hetzel

References:

  • Equivariant Diffusion for Molecule Generation in 3D
  • Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation
  • Structure-based Drug Design with Equivariant Diffusion Models

Efficient Machine Learning: Pruning, Quantization, Distillation, and More

Type: Master's Thesis / Guided Research / Hiwi

  • Strong knowledge in machine learning
  • Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch)

The efficiency of machine learning algorithms is commonly evaluated by looking at target performance, speed and memory footprint metrics. Reducing the costs associated with these metrics is of primary importance for real-world applications with limited resources (e.g. embedded systems, real-time predictions). In this project, you will investigate solutions to improve the efficiency of machine learning models by looking at multiple techniques like pruning, quantization, distillation, and more.
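As a rough illustration of one of the techniques named above, the sketch below shows unstructured magnitude pruning on a flat list of weights. It is a minimal pure-Python example under stated assumptions: the function name `magnitude_prune` and the toy weights are hypothetical, and a real project would typically operate on framework tensors and fine-tune after pruning.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    `weights` is a flat list of floats; `sparsity` is a value in [0, 1].
    Returns a new list and leaves the input untouched.
    """
    num_to_drop = int(len(weights) * sparsity)
    if num_to_drop == 0:
        return list(weights)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:num_to_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


# Hypothetical toy weights: half of them are removed.
pruned = magnitude_prune([0.5, -0.1, 0.03, -0.8], sparsity=0.5)
```

Here the two smallest-magnitude entries are set to zero while the larger ones are kept unchanged, which reduces the number of non-zero parameters at the cost of some approximation error.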

Contact: Bertrand Charpentier

  • The Efficiency Misnomer
  • A Gradient Flow Framework for Analyzing Network Pruning
  • Distilling the Knowledge in a Neural Network
  • A Survey of Quantization Methods for Efficient Neural Network Inference

Deep Generative Models

Type:  Master Thesis / Guided Research

  • Strong machine learning and probability theory knowledge
  • Knowledge of generative models and their basics (e.g., Normalizing Flows, Diffusion Models, VAE)
  • Optional: Neural ODEs/SDEs, Optimal Transport, Measure Theory

With recent advances, such as Diffusion Models, Transformers, Normalizing Flows, Flow Matching, etc., the field of generative models has gained significant attention in the machine learning and artificial intelligence research community. However, many problems and questions remain open, and the application to complex data domains such as graphs, time series, point processes, and sets is often non-trivial. We are interested in supervising motivated students to explore and extend the capabilities of state-of-the-art generative models for various data domains.

Contact: Marcel Kollovieh, David Lüdke

  • Flow Matching for Generative Modeling
  • Auto-Encoding Variational Bayes
  • Denoising Diffusion Probabilistic Models 
  • Structured Denoising Diffusion Models in Discrete State-Spaces

Active Learning for Multi Agent 3D Object Detection 

Type: Master's Thesis

Industrial partner: BMW

Prerequisites: 

  • Strong knowledge in machine learning 
  • Knowledge in Object Detection 
  • Excellent programming skills 
  • Proficiency with Python and deep learning frameworks (TensorFlow or PyTorch) 

Description: 

In autonomous driving, state-of-the-art deep neural networks are used for perception tasks such as 3D object detection. To provide promising results, these networks often require a lot of complex annotation data for training. These annotations are often costly and redundant. Active learning is used to select the most informative samples for annotation and to cover a dataset with as little annotated data as possible.

The objective is to explore active learning approaches for 3D object detection using combined uncertainty- and diversity-based methods.
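To make the idea of selecting the most informative samples concrete, the following sketch ranks unlabeled samples by the entropy of their predicted class probabilities and picks the top ones for annotation. It covers only the uncertainty part, not the diversity part, and it is an assumption-laden toy: the names `entropy` and `select_most_uncertain`, the sample identifiers, and the probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, budget):
    """Return the `budget` sample ids whose predictions have the highest entropy.

    `predictions` maps a sample id to its predicted class probabilities.
    """
    ranked = sorted(predictions, key=lambda s: entropy(predictions[s]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs for three unlabeled frames.
predictions = {
    "frame_a": [0.98, 0.01, 0.01],  # confident prediction
    "frame_b": [0.34, 0.33, 0.33],  # nearly uniform, very uncertain
    "frame_c": [0.70, 0.20, 0.10],  # moderately uncertain
}
to_annotate = select_most_uncertain(predictions, budget=2)
```

A diversity-based criterion would additionally penalize selecting samples that are very similar to each other, which pure entropy ranking does not do.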

Contact: Sebastian Schmidt

References: 

  • Exploring Diversity-based Active Learning for 3D Object Detection in Autonomous Driving   
  • Efficient Uncertainty Estimation for Semantic Segmentation in Videos   
  • KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection
  • Towards Open World Active Learning for 3D Object Detection   

Graph Neural Networks

Type:  Master's thesis / Bachelor's thesis / guided research

  • Knowledge of graph/network theory

Graph neural networks (GNNs) have recently achieved great success in a wide variety of applications, such as chemistry, reinforcement learning, knowledge graphs, traffic networks, or computer vision. These models leverage graph data by updating node representations based on messages passed between nodes connected by edges, or by transforming node representations using spectral graph properties. These approaches are very effective, but many theoretical aspects of these models remain unclear and there are many possible extensions to improve GNNs and go beyond the nodes' direct neighbors and simple message aggregation.
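The message-passing update described above can be sketched in a few lines. This is a simplified illustration, not a full GNN layer: it uses mean aggregation over direct neighbors and a fixed 50/50 mix of a node's own features and the aggregated message, whereas real models use learned weight matrices and nonlinearities. The function name `message_passing_step` and the toy graph are hypothetical.

```python
def message_passing_step(features, edges):
    """One round of mean-aggregation message passing on an undirected graph.

    `features` maps each node to a list of floats; `edges` is a list of (u, v) pairs.
    Each node's new representation mixes its own features with the mean of its
    neighbors' features (fixed 0.5/0.5 weights, an assumption for illustration).
    """
    neighbors = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    updated = {}
    for node, own in features.items():
        messages = [features[n] for n in neighbors[node]]
        if not messages:
            # Isolated nodes keep their representation.
            updated[node] = list(own)
            continue
        mean_msg = [
            sum(m[i] for m in messages) / len(messages) for i in range(len(own))
        ]
        updated[node] = [0.5 * own[i] + 0.5 * mean_msg[i] for i in range(len(own))]
    return updated


# Toy path graph a - b - c with one-dimensional features.
new_features = message_passing_step(
    {"a": [1.0], "b": [3.0], "c": [5.0]},
    [("a", "b"), ("b", "c")],
)
```

Stacking several such rounds lets information travel beyond direct neighbors, which is one of the extensions the paragraph alludes to.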

Contact: Simon Geisler

  • Semi-supervised classification with graph convolutional networks
  • Relational inductive biases, deep learning, and graph networks
  • Diffusion Improves Graph Learning
  • Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
  • Reliable Graph Neural Networks via Robust Aggregation

Physics-aware Graph Neural Networks

Type:  Master's thesis / guided research

  • Proficiency with Python and deep learning frameworks (JAX or PyTorch)
  • Knowledge of graph neural networks (e.g. GCN, MPNN, SchNet)
  • Optional: Knowledge of machine learning on molecules and quantum chemistry

Deep learning models, especially graph neural networks (GNNs), have recently achieved great success in predicting quantum mechanical properties of molecules. There are many applications for these models, such as finding the best method of chemical synthesis or selecting candidates for drugs, construction materials, batteries, or solar cells. However, GNNs have only been proposed in recent years and there remain many open questions about how to best represent and leverage quantum mechanical properties and methods.

Contact: Nicholas Gao

  • Directional Message Passing for Molecular Graphs
  • Neural message passing for quantum chemistry
  • Learning to Simulate Complex Physics with Graph Networks
  • Ab initio solution of the many-electron Schrödinger equation with deep neural networks
  • Ab-Initio Potential Energy Surfaces by Pairing GNNs with Neural Wave Functions
  • Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

Robustness Verification for Deep Classifiers

Type: Master's thesis / Guided research

  • Strong machine learning knowledge (at least equivalent to IN2064 plus an advanced course on deep learning)
  • Strong background in mathematical optimization (preferably in a machine learning setting)
  • Proficiency with Python and deep learning frameworks (PyTorch or TensorFlow)
  • (Preferred) Knowledge of training techniques to obtain classifiers that are robust against small perturbations in data

Description: Recent work shows that deep classifiers suffer in the presence of adversarial examples: misclassified points that are very close to the training samples or even visually indistinguishable from them. This undesired behaviour constrains the possibilities of deploying promising neural-network-based classification methods in safety-critical scenarios. Therefore, new training methods should be proposed that promote (or preferably ensure) robust behaviour of the classifier around training samples.
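A small worked example may help to see how a tiny perturbation can flip a prediction. The sketch below applies a sign-of-gradient step, in the spirit of the fast gradient sign method, to a linear classifier, where the gradient of the margin with respect to the input is simply the weight vector. All names (`predict`, `fgsm_linear`) and the toy numbers are hypothetical, and a linear model is used only to keep the example self-contained.

```python
def sign(value):
    """Return -1, 0 or 1 depending on the sign of `value`."""
    return (value > 0) - (value < 0)


def predict(weights, bias, x):
    """Linear classifier with labels in {-1, +1}."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


def fgsm_linear(weights, x, label, eps):
    """Perturb `x` by at most `eps` per coordinate to reduce the margin.

    For a linear score w.x + b, the margin label * (w.x + b) decreases fastest
    (under a max-norm budget) when each coordinate moves by -eps * label * sign(w_i).
    """
    return [xi - eps * label * sign(w) for w, xi in zip(weights, x)]


# Toy setup: a correctly classified point close to the decision boundary.
weights, bias = [1.0, -1.0], 0.0
x, label = [0.1, 0.0], 1
x_adv = fgsm_linear(weights, x, label, eps=0.1)
```

In this toy case the original point is classified as +1, while the perturbed point, which differs by at most 0.1 in each coordinate, is classified as -1. Robustness verification asks for guarantees that no such perturbation exists within a given budget.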

Contact: Aleksei Kuvshinov

References (Background):

  • Intriguing properties of neural networks
  • Explaining and harnessing adversarial examples
  • SoK: Certified Robustness for Deep Neural Networks
  • Certified Adversarial Robustness via Randomized Smoothing
  • Formal guarantees on the robustness of a classifier against adversarial manipulation
  • Towards deep learning models resistant to adversarial attacks
  • Provable defenses against adversarial examples via the convex outer adversarial polytope
  • Certified defenses against adversarial examples
  • Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks

Uncertainty Estimation in Deep Learning

Type: Master's Thesis / Guided Research

  • Strong knowledge in probability theory

Safe prediction is a key feature in many intelligent systems. Classically, machine learning models compute output predictions regardless of the underlying uncertainty of the encountered situations. In contrast, aleatoric and epistemic uncertainty bring knowledge about undecidable and uncommon situations. The uncertainty view can be a substantial help to detect and explain unsafe predictions, and therefore make ML systems more robust. The goal of this project is to improve uncertainty estimation in ML models for various types of tasks.
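To illustrate the distinction between aleatoric and epistemic uncertainty, the sketch below uses one common entropy-based decomposition over an ensemble of classifiers: the entropy of the averaged prediction is the total uncertainty, the average entropy of the individual predictions is taken as the aleatoric part, and their difference as the epistemic part. This is only one of several possible formulations and not necessarily the one used in the project; the function names and the toy ensemble outputs are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty of an ensemble into aleatoric and epistemic parts.

    `member_probs` is a list of class-probability vectors, one per ensemble member.
    Returns (total, aleatoric, epistemic), all measured in nats.
    """
    n = len(member_probs)
    num_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n for c in range(num_classes)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / n
    return total, aleatoric, total - aleatoric


# Members agree that the outcome is a coin flip: purely aleatoric uncertainty.
agree = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
# Members are individually certain but contradict each other: purely epistemic.
disagree = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
```

Both cases have the same total uncertainty, but only the second one could in principle be reduced by collecting more training data.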

Contact: Tom Wollschläger, Dominik Fuchsgruber, Bertrand Charpentier

  • Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift
  • Predictive Uncertainty Estimation via Prior Networks
  • Posterior Network: Uncertainty Estimation without OOD samples via Density-based Pseudo-Counts
  • Evidential Deep Learning to Quantify Classification Uncertainty
  • Weight Uncertainty in Neural Networks

Hierarchies in Deep Learning

Type:  Master's Thesis / Guided Research

Multi-scale structures are ubiquitous in real-life datasets. As an example, phylogenetic nomenclature naturally reveals a hierarchical classification of species based on their evolutionary history. Learning multi-scale structures can help to reveal natural and meaningful organizations in the data and to obtain compact data representations. The goal of this project is to leverage multi-scale structures to improve the speed, performance and understanding of deep learning models.

Contact: Marcel Kollovieh, Bertrand Charpentier

  • Tree Sampling Divergence: An Information-Theoretic Metric for Hierarchical Graph Clustering
  • Hierarchical Graph Representation Learning with Differentiable Pooling
  • Gradient-based Hierarchical Clustering
  • Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space

Data Science

  • Mathematics and Computer Science

Student theses

3d face reconstruction using deep learning.

Student thesis : Master

3D fingerprint detection in ancient museum sculptures from CT data

Achieving long term fairness through curiosity driven reinforcement learning: how intrinsic motivation influences fairness in algorithmic decision making

A coherent temporal visualization of algorithm dynamics over large graphs

Student thesis : Bachelor

A comparative study for process mining approaches in a real-life environment

  • A comparative study on unsupervised deep learning methods for X-ray image denoising with multi-image Self2Self and single frequency denoising
  • A comparison of quantitative evaluation and human perception of quality of generated images of faces
  • A computational biology framework: a data analysis tool to support biomedical engineers in their research
  • Active learning for text classification
  • Active learning in VAE latent space
  • Activity recognition using deep learning in videos under clinical setting
  • A dashboard for emulating LSTM-based predictive process monitoring and its qualitative evaluation
  • A data cleaning assistant
  • A data cleaning assistant for machine learning
  • Adding formal specifications to a legacy code generator
  • A deep learning approach for clustering a multi-class dataset
  • A detailed understanding of actor involvement in business processes
  • Adopting the factorized model of execution in a graph database engine
  • Advances in understanding and initializing einsum networks
  • Adversarial attacks on deep dreams
  • Adversarial datasets through sentence length and conjunctions
  • Adversarial NLP benchmarks: data characteristics complicating automated generation of adversarial examples
  • Adversarial noise benchmarking on image caption
  • Aerial imagery pixel-level segmentation
  • Aethra db: optimising analytical processing through query-tailored code generation
  • A feasibility study on automated database exercise generation with large language models
  • A forecasting framework for recirculation in baggage handling systems
  • A framework for understanding business process remaining time predictions
  • Age(ing) in software development
  • Aggregated information visualization for process alignments
  • A heuristic approach for the VRPTW using dual information of its LP formulation
  • A hybrid model for pedestrian motion prediction
  • Algorithms for center-based trajectory clustering
  • Aligning incompatible state (and action) representations through disentanglement for multi-task transfer in offline RL
  • Allocation decision-making in service supply chain with deep reinforcement learning
  • A method for identifying undesired medical treatment variants using process and data mining techniques
  • A method to determine actual time worked from event logs
  • An adaptive and scrutable math tutoring system
  • An adversarial analysis of inference capabilities acquired by state-of-the-art NLP models from the RTE dataset
  • Analysis and improvement of process models with respect to key performance indicators: a debt collection case study
  • Analysis of the influence of routines on task execution performance
  • Analyzing application usage logs to understand the users
  • Analyzing causes of outlier cascade behavior in baggage handling systems
  • Analyzing collaborations and routines in event graphs using statistics and pattern mining
  • Analyzing complexity progression and complexity correlation of SQL questions on Stack Overflow
  • Analyzing customer journey with process mining: from discovery to recommendations
  • Analyzing data of operating rooms in hospitals to reduce rework
  • Analyzing policy gradient approaches towards rapid policy transfer
  • Analyzing routines and habits in event graphs using statistics and pattern mining

data mining bachelor thesis

The chair typically offers various thesis topics each semester in the areas of computational statistics, machine learning, data mining, optimization and statistical software. You are welcome to suggest your own topic as well.

Before you apply for a thesis topic make sure that you fit the following profile:

  • Knowledge in machine learning.
  • Good R or Python skills.

Before you start writing your thesis you must look for a supervisor within the group.

Send an email to the contact person listed in the list of potential thesis topics with the following information:

  • Planned starting date of your thesis.
  • Thesis topic (from the list of thesis topics or your own suggestion).
  • Previously attended classes on machine learning and programming with R.

Your application will only be processed if it contains all required information.

Potential Thesis Topics

Below is a list of potential thesis topics.

Available thesis topics

Title Type Supervisor
MA Casalicchio
MA Rügamer
MA Bender
MA Casalicchio
MA Bothmann
MA Bothmann
MA Feurer
MA Feurer
MA Feurer
MA Feurer
MA Feurer
MA Feurer
MA Feurer
MA Feurer/Casalicchio
MA Feurer

Disputation

The disputation of a thesis lasts about 60-90 minutes and consists of two parts. Only the first part is relevant for the grade; it takes 30 minutes for a bachelor thesis and 40 minutes for a master thesis. Here, the student is expected to summarize the main results of the thesis in a presentation. The supervisor(s) will ask questions about the content of the thesis during the presentation. In the second part (after the presentation), the supervisors will give detailed feedback and discuss the thesis with the student. This will take about 30 minutes.

  • How do I prepare for the disputation?

You have to prepare a presentation. If there is a longer gap between handing in your thesis and the disputation, you might want to reread your thesis.

  • How many slides should I prepare?

That’s up to you, but you have to respect the time limit. Preparing more than 20 slides for a Bachelor’s presentation or more than 30 slides for a Master’s is VERY likely a very bad idea.

  • Where do I present?

Bernd’s office, in front of the big TV. At least one PhD will be present, maybe more. If you want to present in front of a larger audience in the seminar room or the old library, please book the room yourself and inform us.

  • English or German?

We do not care, you can choose.

  • What do I have to bring with me?

A document (Prüfungsprotokoll) which you get from “Prüfungsamt” (Frau Maxa or Frau Höfner) for the disputation. Your laptop or a USB stick with the presentation. You can also email Bernd a PDF.

  • How does the grading work?

The student will be graded regarding the quality of the thesis, the presentation and the oral discussion of the work. The grade is mainly determined by the written thesis itself, but the grade can improve or drop depending on the presentation and your answers to defense questions.

  • What should the presentation cover?

The presentation should cover your thesis, including motivation, introduction, description of new methods and results of your research. Please do NOT explain already existing methods in detail here, put more focus on novel work and the results.

  • What kind of questions will be asked after the presentation?

The questions will be directly connected to your thesis and related theory.

Student Research Projects

We are always interested in mentoring interesting student research projects. Please contact us directly with an interesting research idea. In the future you will also be able to find research project topics below.

Available projects

Currently we are not offering any student research projects.

For more information please visit the official web page Studentische Forschungsprojekte (Lehre@LMU)

Current Theses (With Working Titles)

Title Type
Empirical Evaluation of Methods for Discrete Time-to-event Analysis BA
Enhancing stance prediction by utilizing party manifestos MA
Examining and Mitigating Gender Bias in German Word Embeddings BA
Exploring the Effects of Domain Shift on Inferred Topics in Neural and Non-Neural Topic Models BA
Transformer Uncertainty Estimation with Stochastic Attention MA
Transfer Learning of Simulation to Hardware Direction Finding for Indoor Position MA
Reliable Self-supervised Learning for Medical Image Analysis MA
Quantification of Uncertainties via Deep Learning for Medical Image Segmentation MA
Deep Efficient Transformers for Learning Representation of Genomic Sequences MA
Self-Supervised Multimodal Metric Learning MA
Diverse Sentence Embedding for Legal Multi-Label Document Classification MA
Unsupervised Domain Adaptive Object Detection MA
Uncertainty-Aware Self-Supervised Learning MA
Data-driven Lag-lead Selection for Exposure-Lag-Response Associations BA
Probabilistic Deep Learning of Liver Failure in Therapeutical Cancer Treatment MA
Model agnostic Feature Importance by Loss Measures MA
Model-agnostic interpretable machine learning methods for multivariate MA
Time Series Forecasting MA
Normalizing Flows for Interpretability Measures MA
Representation Learning for Semi-Supervised Genome Sequence Classification MA
Neural Architecture Search for Genomic Sequence Data MA
Comparison of Machine Learning Models For Competing Risks Survival Analysis MA
Multi-accuracy calibration for survival models MA
MA

Completed Theses (LMU Munich)

Title Type Completed
Domain transfer across country, time and modality in multiclass-classification BA 2022
Predicted Sentiments of Customer Texts as Covariates for Time Series Forecasting MA 2022
Gaussian Process Regression and Bayesian Deep Learning for Insurance Tariff Migration MA 2022
Transformer Model for Genome Sequence Analysis BA 2022
Self-supervised Representation Learning for Genome Sequence Data MA 2022
Self-supervised Learning Framework for Imbalanced Positive-Unlabeled Data MA 2022
A comparative Evaluation of the Utility of linguistic Features for Part-of-Speech-Tagging BA 2022
Evaluating pre-trained language models on partially unlabeled multilingual economic corpora MA 2022
How Different is Stereotypical Bias in Different Languages? Analysis Multilingual Language Models MA 2022
Leveraging pairwise constraints for topic discovery in weakly annotated text data MA 2022
Word Embedding Evaluation with Intrinsic Evaluators BA 2022
Application of neural topic models to twitter data from German politicians BA 2022
Visualizing Hyperparameter Performance Dependencies BA 2022
Deep Self-Supervised Divergence Learning MA 2021
Neural Architecture Search for Genomic Sequence Data MA 2021
Multi-state modeling in the context of predictive maintenance MA 2021
Model Based Quality Diversity Optimization MA 2021
mlr3automl - Automated Machine Learning in R MA 2021
Knowledge distillation - Compressing arbitrary learners into a neural net MA 2020
Personality Prediction Based on Mobile Gaze and Touch Data MA 2020
Identifying Subgroups induced by Interaction Effects MA 2020
Benchmarking: Tests and Visualizations MA 2019
Counterfactual Explanations MA 2019
Methodik, Anwendungen und Interpretation moderner Benchmark-Studien am Beispiel der Risikomodellierung bei akuter Cholangitis MA 2019
Machine Learning pipeline search with Bayesian Optimization and Reinforcement Learning MA 2019
Visualization and Efficient Replay Memory for Reinforcement Learning BA 2019
Neural Network Embeddings for Categorical Data BA 2019
Localizing phosphorylation sites by deep learning-based fragment ion intensity MA 2019
Average Marginal Effects in Machine Learning MA 2019
Wearable-based Severity Detection in the Context of Parkinson’s Disease Using Deep Learning Techniques MA 2018
Bayesian Optimization under Noise for Model Selection in Machine Learning MA 2018
Interpretable Machine Learning - An Application Study using the Munich Rent Index MA 2018
Automatic Gradient Boosting MA 2018
Efficient and Distributed Model-Based Boosting for Large Datasets MA 2018
Linear individual model-agnostic explanations - discussion and empirical analysis of modifications MA 2018
Extending Hyperband with Model-Based Sampling Strategies MA 2018
Reinforcement learning in R MA 2018
Anomaly Detection using Machine Learning Methods MA 2018
RNN Bandmatrix MA 2018
Configuration of deep neural networks using model-based optimization MA 2017
Kernelized anomaly detection MA 2017
Automatic model selection and hyperparameter optimization MA 2017
mlrMBO / RF distance based infill criteria MA 2017
Kostensensitive Entscheidungsbäume für beobachtungsabhängige Kosten BA 2016
Implementation of 3D Model Visualization for Machine Learning BA 2016
Eine Simulationsstudie zum Sampled Boosting BA 2016
Implementation and Comparison of Stacking Methods for Machine Learning MA 2016
Runtime estimation of ML models BA 2016
Process Mining: Checking Methods for Process Conformance MA 2016
Implementation of Multilabel Algorithms and their Application on Driving Data MA 2016
Stability Selection for Component-Wise Gradient Boosting in Multiple Dimensions MA 2016
Detecting Future Equipment Failures: Predictive Maintenance in Chemical Industrial Plants MA 2016
Fault Detection for Fire Alarm Systems based on Sensor Data MA 2016
Laufzeitanalyse von Klassifikationsverfahren in R BA 2015
Benchmark Analysis for Machine Learning in R BA 2015
Implementierung und Evaluation ergänzender Korrekturmethoden für statistische Lernverfahren bei unbalancierten Klassifikationsproblemen BA 2014

Completed Theses (Supervised by Bernd Bischl at TU Dortmund)

Title Type Completed
Anwendung von Multilabel-Klassifikationsverfahren auf Medizingerätestatusreporte zur Generierung von Reparaturvorschlägen MA 2015
Erweiterung der Plattform OpenML um Ereigniszeitanalysen MA 2015
Modellgestützte Algorithmenkonfiguration bei Feature-basierten Instanzen: Ein Ansatz über das Profile-Expected-Improvement Dipl. 2015
Modellbasierte Hyperparameteroptimierung für maschinelle Lernverfahren auf großen Daten MA 2015
Implementierung einer Testsuite für mehrkriterielle Optimierungsprobleme BA 2014
R-Pakete für Datenmanagement und -manipulation großer Datensätze BA 2014
Lokale Kriging-Verfahren zur Modellierung und Optimierung gemischter Parameterräume mit Abhängigkeitsstrukturen BA 2014
Kostensensitive Algorithmenselektion für stetige Black-Box-Optimierungsprobleme basierend auf explorativer Landschaftsanalyse MA 2013
Exploratory Landscape Analysis für mehrkriterielle Optimierungsprobleme MA 2013
Feature-based Algorithm Selection for the Traveling-Salesman-Problem BA 2013
Implementierung und Untersuchung einer parallelen Support Vector Machine in R Dipl. 2013
Sequential Model-Based Optimization by Ensembles: A Reinforcement Learning Based Approach Dipl. 2012
Vorhersage der Verkehrsdichte in Warschau basierend auf dem Traffic Simulation Framework BA 2011
Klassifikation von Blutgefäßen und Neuronen des menschlichen Gehirns anhand von ultramikroskopierten 3D-Bilddaten BA 2011
Uncertainty Sampling zur Auswahl optimaler Sampler aus der trunkierten Normalverteilung BA 2011
Over-/Undersampling für unbalancierte Klassifikationsprobleme im Zwei-Klassen-Fall BA 2010

Bachelor and Master Theses

BSc (100), MSc (76), MSc SciComp (15), Diploma (5)

  • Alexandra Kowalewski: Querying Web Tables with Language Models, Bachelor Thesis, July 2024.
  • Nils Krehl: Real-world Clinical Knowledge Graph Construction and Exploration, Master Thesis, July 2024.
  • Christopher Lindenberg: Exploring the Behavior of Function Calling Systems Using Small LLMs, Bachelor Thesis, July 2024.
  • Björn Bulkens: Towards Automatic Generation of Knowledge Graphs using LLMs, Master Thesis, May 2024.
  • Abdulghani Almasri: A Framework to Measure Coherence in Query Sessions, Bachelor Thesis, May 2024.
  • Stefan Lenert: Building Conversational Question Answering Systems, Master Thesis, March 2024.
  • Alexander Kosnac: Quantity-centric Summarization Techniques for Documents, Bachelor Thesis, March 2024.
  • Nicolas Hellthaler: Footnote-Augmented Documents for Passage Retrieval, Bachelor Thesis, February 2024.
  • Simon Gimmini: Exploring Temporal Patterns in Art Through Diffusion Models, Master Thesis, February 2024.
  • Xingqi Cheng: A Rule-based Post-processor for Temporal Knowledge Graph Extrapolation, Master Thesis, January 2024.
  • Raphael Ebner: Leveraging Large Language Models for Information Extraction and Knowledge Representation, Bachelor Thesis, January 2024.
  • Angelina Basova: Table Extraction from PDF Documents, Master Thesis, December 2023.
  • Milena Bruseva: Benchmarking Vector Databases: A Framework for Evaluating Embedding Based Retrieval, Master Thesis, December 2023.
  • Luis Wettach: Medical Electronic Data Capture at Home – A Privacy Compliant Framework, Master Thesis, December 2023.
  • Jayson Pyanowski: Semantic Search with Contextualized Query Generation, Master Thesis, December 2023.
  • Philipp Göldner: Information Retrieval using Sparse Embeddings, Master Thesis, December 2023.
  • Vivian Kazakova: A Topic Modeling Framework for Biomedical Text Analysis, Bachelor Thesis, October 2023.
  • Dennis Geiselmann: Context-Aware Dense Retrieval, Master Thesis, October 2023.
  • Konrad Goldenbaum: Semantic Search and Topic Exploration of Scientific Paper Corpora, Bachelor Thesis, October 2023.
  • Yingying Cao: Keyword-based Summarization of (Legal) Documents, Master Thesis Scientific Computing, August 2023.
  • Julian Freyberg: Structural and Logical Document Layout Analysis using Graph Neural Networks, Master Thesis, August 2023.
  • Marina Walther: A Universal Online Social Network Conversation Model, Master Thesis, August 2023.
  • David Pohl: Zero-Shot Word Sense Disambiguation using Word Embeddings, Bachelor Thesis, August 2023.
  • Klemens Gerber: Automatic Enrichment of Company Information in Knowledge Graphs, Master Thesis, August 2023.
  • Bastian Müller: An Adaptable Question Answering Framework with Source-Citations, Bachelor Thesis, August 2023.
  • Jiahui Li: Styled Text Summarization via Domain-specific Paraphrasing, Master Thesis Scientific Computing, July 2023.
  • Sophia Matthis: Multi-Aspect Exploration of Plenary Protocols, Master Thesis, June 2023.
  • Till Rostalski: A Generic Patient Similarity Framework for Clinical Data Analysis, Bachelor Thesis, June 2023.
  • David Jackson: Automated Extraction of Drug Analysis and Discovery Networks, Master Thesis Scientific Computing, May 2023.
  • Christopher Brückner: Multi-Feature Clustering of Search Results, Master Thesis, April 2023.
  • Paul Dietze: Formula Classification and Mathematical Token Embeddings, Bachelor Thesis, April 2023.
  • Sophia Hammes: A Neural-Based Approach for Link Discovery in the Process Management Domain, Master Thesis, March 2023.
  • Fabian Kneissl: Time-Dependent Graph Modeling of Twitter Conversations, Master Thesis, March 2023.
  • Lucienne-Sophie Marmé: A Bootstrap Approach for Classifying Political Tweets into Policy Fields, Bachelor Thesis, March 2023.
  • Jing Fan: Assessing Factual Accuracy of Generated Text Using Semantic Role Labeling, Bachelor Thesis, March 2023.
  • Fabio Gebhard: A Rule-based Approach for Numerical Question Answering, Master Thesis, December 2022.
  • Severin Laicher: Learning and Exploring Similarity of Sales Items and its Dependency on Sales Data, Master Thesis, September 2022.
  • Raeesa Yousaf: Explainability of Graph Roles Extracted from Networks, Bachelor Thesis, September 2022.
  • Julian Seibel: Towards GAN-based Open-World Knowledge Graph Completion, Master Thesis, June 2022.
  • Claire Zhao Sun: Extracting and Exploring Causal Factors from Financial Documents, Master Thesis Scientific Computing, May 2022.
  • Ziqiu Zhou: Semantic Extensions of OSM Data Through Mining Tweets in the Domain of Disaster Management, Master Thesis, May 2022.
  • Lukas Ballweg: Analysis of Lobby Networks and their Extraction from Semi-Structured Data, Bachelor Thesis, April 2022.
  • Benjamin Wagner: Benchmarking Graph Databases for Knowledge Graph Handling, Bachelor Thesis, March 2022.
  • Cedric Bender: Exploration and Analysis of Methods for German Tweet Stream Summarization, Bachelor Thesis, March 2022.
  • Johannes Klüh: Polyphonic Music Generation for Multiple Instruments using Music Transformer, Bachelor Thesis, March 2022.
  • Nicolas Reuter: Automatic Annotation of Song Lyrics Using Wikipedia Resources, Master Thesis, December 2021.
  • Mateusz Chrzastek: Extraktive Keyphrases form Noun Chunk Similarity, Bachelor Thesis, October 2021.
  • Fabrizio Primerano: Document Information Extraction from Visually-rich Documents with Unbalanced Class Structure, Master Thesis, October 2021.
  • Sarah Marie Bopp: Gender-centric Analysis of Tweets from German Politicians, Bachelor Thesis, September 2021.
  • Philipp Göldner: A Framework for Numerical Information Extraction, Bachelor Thesis, July 2021.
  • Robin Khanna: Adaptive Topic Modelling for Twitter Data, Bachelor Thesis, July 2021.
  • Thomas Rekers: Correlating Postings from Different Social Media Platforms, Master Thesis, July 2021.
  • Duc Anh Phi: Background Linking of News Articles, Master Thesis, May 2021.
  • Eike Harms: Linking Table and Text Quantities in Documents, Master Thesis, April 2021.
  • Raphael Arndt: Regelbasierte Binärklassifizierung von Webseiten, Bachelor Thesis, April 2021.
  • Jonas Gann: Integrating Identity Management Providers based on Online Access Law, Bachelor Thesis, March 2021.
  • Björn Ternes: Kontextbasierte Informationsextraktion aus Datenschutzerklärungen, Bachelor Thesis, March 2021.
  • Fabio Becker: A Generative Model for Dynamic Networks with Community Structures, Master Thesis, December 2020.
  • Jan-Gabriel Mylius: Visual Analysis of Paragraph Similarity, Bachelor Thesis, December 2020.
  • Alexander Hebel: Information Retrieval mit PostgreSQL, Master Thesis, November 2020.
  • Jonas Albrecht: Lexikon-basierte Sentimentanalyse von Tweets , Bachelor Thesis, November 2020.
  • Marina Walther: A Network-based Approach to Investigate Medical Time Series Data , Bachelor Thesis, September 2020.
  • Stefan Hickl: Automatisierte Generierung von Inhaltsverzeichnissen aus PDF-Dokumenten , Bachelor Thesis, September 2020.
  • Christopher Brückner: Structure-centric Near-Duplicate Detection , Bachelor Thesis, August 2020.
  • David Jackson: Extracting Knowledge Graphs from Biomedical Literature , Bachelor Thesis, August 2020.
  • David Richter: Single-Pass Training von Klassifikatoren basierend auf einem großem Web-Korpus , Master Thesis, August 2020.
  • Julian Freyberg: Time-sensitive Multi-label Classification of News Articles , Bachelor Thesis, July 2020.
  • John Ziegler: Modelling and Exploration of Property Graphs for Open Source Intelligence , Master Thesis, August 2020.
  • Johannes Keller: A Network-based Approach for Modeling Twitter Topics , Master Thesis, June 2020.
  • Erik Koynov: Three Stage Statute Retrieval Algorithm with BERT and Hierachical Pretraining , Bachelor Thesis, May 2020.
  • Fabian Kaiser: Cross-Reference Resolution in German and European Law , Master Thesis, April 2020.
  • Hasan Malik: Open Numerical Information Extraction , Master Thesis, Scientific Computing, March 2020.
  • Matthias Rein: Exploration of User Networks and Content Analysis of the German Political Twittersphere , Master Thesis, March 2020.
  • Philip Hausner: Time-centric Content Exploration in Large Document Collections , Master Thesis, March 2020.
  • Mohammad Dawas: On the Analysis of Networks Extracted from Relational Databases , Master Thesis, Scientific Computing, February 2020.
  • Lea Zimmermann: Mapping Machine Learning Frameworks to End2End Infrastructures , Bachelor Thesis, February 2020
  • Bente Nittka: Modelling Verdict Documents for Automated Judgment Grounds Prediction , Bachelor Thesis, November 2019
  • Michael Pronkin: A Framework for a Person-Centric Gazetteer Service , Bachelor Thesis, November 2019
  • Jessica Löhr: Analysis and Exploration of Register Data of Companies , Bachelor Thesis, October 2019
  • Seida Basha: Extraction of Comment Threads of Political News Articles , Bachelor Thesis, September 2019
  • Lukas Rüttgers: Analyse von YouTube-Kommentaren zur Förderung von Diskussionen , Master Thesis, Scientific Computing, July 2019
  • Gloria Feher: Concepts in Context: A Network-based Approach , Master Thesis, July 2019
  • Dennis Aumiller: Implementation of a Relational Document Hypergraph for Information Retrieval , Master Thesis, April 2019
  • Raheel Ahsan: Efficient Entity Matching , Master Thesis, Scientific Computing, March 2019
  • Christian Straßberger: Time-Varying Graphs to Explore Medical Time Series , Master Thesis, Scientific Computing, March 2019
  • Frederik Schwabe: Zitationsnetzwerke in Gesetzestexten und juristischen Entscheidungen , Bachelor Thesis, February 2019
  • Kilian Claudius Valenti: Extraktion und Exploration von Kookkurenznetzwerken aus Arztbriefen , Bachelor Thesis, February 2019
  • Satya Almasian: Learning Joint Vector Representation of Words and Named Entities , Master Thesis, Scientific Computing, October 2018
  • Naghmeh Fazeli: Evolutionary Analysis of News Article Networks , Master Thesis, Scientific Computing, October 2018
  • Lukas Kades: Development and Evaluation of an Indoor Simulation Model for Visitor Behaviour on a Trade Fair , Master Thesis, October 2018
  • David Stronczek: Named Entity Disambiguation using Implicit Networks , Master Thesis, August 2018
  • Julius Franz Foitzik: A Social Network Approach towards Location-based Recommendation , Master Thesis, April 2018
  • Carine Dengler: Network-based Modeling and Analysis of Political Debates , Master Thesis, May 2018
  • Maximilian Langknecht: Exploration-Based Feature Analysis of Time Series Using Minimum Spanning Trees ,  Bachelor Thesis, May 2018
  • Jayson Salazar: Extraction and Analysis of Dynamic Co-occurence Networks from Medical Text , Master Thesis, Scientific Computing, April 2018
  • Fabio Becker: Toponym Resolution in HeidelPlace , Bachelor Thesis, April 2018
  • Felix Stern: Correlating Finance News Articles and Stock Indexes , Master Thesis, March 2018
  • Oliver Hommel: Symbolical Inversion of Formulas in an OLAP Context , Master Thesis, Scientific Computing,  March 2018
  • Jan Greulich: Reasoning with Imprecise Temporal and Geographical Data , Master Thesis, February 2018
  • Johannes Visintini: Modelling and Analyzing Political Activity Networks , Bachelor Thesis, February 2018
  • Sebastian Lackner:  Efficient Algorithms for Anti-community Detection , Master Thesis, February 2018
  • Leonard Henger: Erstellung eines konzeptionellen Datenmodells für Zeitreihen und Erkennung von Zeitreihenausreißern , Bachelor Thesis, December 2017
  • Christian Kromm: Short-term travel time prediction in complex contents , Master Thesis, December 2017
  • Christian Schütz: A Generative Model for Correlated Geospatial Property Graphs with Social Network Characteristics , Bachelor Thesis, December 2017
  • Sophia Stahl: Association Rule Based Pattern Mining of Cancer Genome Variants , Master Thesis, December 2017
  • Patrick Breithaupt: Evolving Topic-centric Information Networks , Master Thesis, October 2017
  • Michael Müller: Graph Based Event Summarization , Master Thesis, September 2017
  • Slavin Donchev: Statement Extraction from German Newspaper Articles , Bachelor Thesis, August 2017
  • Dennis Aumiller: Mining Relationship Networks from University Websites , Bachelor Thesis, August 2017
  • Katja Hauser: Latent Information Networks from German Newspaper Articles , Bachelor Thesis, April 2017
  • Xiaoyu Ye: Extraction and Analysis of Organization and Person Networks , Master Thesis, April 2017
  • Martin Enderlein: Modeling and Exploring Company Networks , Bachelor Thesis, January 2017
  • Ludwig Richter: A Generic Gazetter Data Model and an Extensible Framework for Geoparsing , Master Thesis, October 2016
  • Benjamin Keller: Matching Unlabeled Instances against a Known Data Schema Using Active Learning , Bachelor Thesis, August 2016
  • Julien Stern: Generation and Analysis of Event Networks from GDELT Data , Bachelor Thesis, July 2016
  • Hüseyin Dagaydin: Personalized Filtering of SAP Internal Search Results based on Search Behavior , Master Thesis, March 2016
  • Zaher Aldefai: Improvement of SAP Search HANA results through Text Analysis , Master Thesis, April 2016
  • Jens Cram: Adapting In-Memory Representations of Property Graphs to Mixed Workloads , Bachelor Thesis, April 2016
  • Antonio Jiménez Fernández: Collection and Analysis of User Generated Comments on News Articles , Bachelor Thesis, April 2016
  • Nils Weiher: Temporal Affiliation Network Extraction from Wikidata , Bachelor Thesis, March 2016
  • Claudia Dünkel: Erweiterung des Wu-Holme Modells für Zitationsnetzwerke , Bachelor Thesis, January 2016
  • Muhammad El-Hindi: VisIndex: A Multi-dimensional Tree Index for Histogram Queries , Master Thesis, December 2015
  • Annika Boldt: Rahmenwerk für kontextsensitive Hilfe von webbasierten Anwendungen , Master Thesis, December 2015
  • Carine Dengler: Das INDY-Bildanalyseframework für die Geschichtswissenschaften , Bachelor Thesis, October 2015
  • Leif-Nissen Lundbaek: Conceptional analysis of cryptocurrencies towards smart financial networks , Master Thesis, Scientific Computing, October 2015
  • Viktor Bersch: Effiziente Identifikation von Ereignissen zur Auswertung komplexer Angriffsmuster auf IT Infrastrukturen , Master Thesis, September 2015
  • Ranjani Dilip Banhatti: Graph Regularization Parameter for Non-Negative Matrix Factorization , Master Thesis, Scientific Computing, September 2015
  • Konrad Kühne: Temporal-Topological Analysis of Evolutionary Message Networks , Bachelor Thesis, July 2015
  • Stefanie Bachmann: The K-Function and its use for Bandwidth Parameter Estimation , Bachelor Thesis, July 2015
  • Philipp Daniel Freiberger: Temporal Evolution of Communities in Affiliation Networks , Bachelor Thesis, June 2015
  • Johannes Auer: Bewertung von GitHub Projekten anhand von Eventdaten , Bachelor Thesis, March 2015
  • Christian Kromm: Erkennung und Analyse von Regionalen Hashtag Communities in Twitter , Bachelor Thesis, March 2015
  • Matthias Brandt: Evolution of Correlation of Hashtags in Twitter, Master Thesis, February 2015
  • Jonas Scholten: Effizientes Indexing von Twitter-Daten für temporale und räumliche TopK-Suche unter Verwendung von Mongo DB , Bachelor Thesis, February 2015
  • Patrick Breithaupt: Experimentelle Analyse des Exponetial Random Graph Modells , Bachelor Thesis, February 2015
  • Timm Schäuble: Classification of Temporal Relations between Events , Bachelor Thesis, January 2015
  • Andreas Spitz: Analysis and Exploration of Centrality and Referencing Patterns in Networks of News Articles , Master Thesis, November 2014
  • Tobias Zatti: Simulation und Erweiterung von sozialen Netzwerken durch Random Graphs am Beispiel von Twitter , Bachelor Thesis, November 2014
  • Ludwig Richter: Automated Field-Boundary Detection by Trajectory Analysis of Agricultural Machinery , Bachelor Thesis, August 2014
  • Thomas Metzger: Mining Sequential Patterns from Song Lists , Bachelor Thesis, July 2014
  • Arthur Arlt: Determining Rates of False Positives and Negatives in Fast Flux Botnet Detection , Master Thesis, July 2014.
  • Hanna Lange: Stream-based Event and Place Detection from Social Media , June 2014
  • Christian Karr: Effektive Indexierung von räumlichen und zeitlichen Daten , Bachelor Thesis, May 2014
  • Haikuhi Jaghinyan: Evaluation of the HANA Graph Engine, Bachelor Thesis, March 2014
  • Sebastian Rode: Speeding Up Graph Traversals in the SAP HANA Database , Diploma Thesis, Mathematics/Computer Science, March 2014
  • Isil Özge Pekel: Performing Cluster Analysis on Column Store Databases , Master Thesis, March 2014
  • Andreas Runk: Integrating Information about Persons from Linked Open Data , Master Thesis, February 2014
  • Tobias Limpert: Verbesserung der spatio-temporal Event Extraktion und ihrer Kontextinformation durch Relationsextraktionsmethoden , Bachelor Thesis, December 2013
  • Christian Seyda: Comparison of graph-based and vector-space geographical topic detection , Master Thesis, December 2013
  • Bartosz Bogasz: Generation of Place Summaries from Wikipedia , Master Thesis, December 2013
  • David Richter: Segmentierung geographischer Regionen aus Social Media mittels Superpixelverfahren , Bachelor Thesis, October 2013
  • Marek Walkowiak: Gazetteer-gestützte Erkennung und Disambiguierung von Toponymen in Text , Bachelor Thesis, October 2013
  • Mirko Kiefer: Histo: A Protocol for Peer-to-Peer Data Synchronization in Mobile Apps , Bachelor Thesis, September 2013
  • Daniel Egenolf: Extraktion und Normalisierung von Personeninformation für die Kombination mit Spatio-temporal Events , Bachelor Thesis, September 2013
  • Lisa Tuschner: Tag-Recommendation auf Basis von Flickr Daten , Bachelor Thesis, September 2013
  • Edward-Robert Tyercha: An Efficient Access Structure to 3D Mesh Data with Column Store Databases , Master Thesis, September 2013
  • Matthias Iacsa: Study of NetPLSA with respect to regularization in multidimensional spaces , Bachelor Thesis, July 2013
  • Timo Haas: Analyse und Exploration von temporalen Aspekten in OSM-Daten , Bachelor Thesis, June 2013
  • Julian Wintermayr: Evaluation of Semantic Web storage solutions focusing on Spatial and Temporal Queries , Bachelor Thesis, June 2013
  • Bertil Nestorius Baron: Aggregate Center Queries in Dynamic Road Networks , Diploma Thesis, Mathematics/Computer Science, May 2013
  • Viktor Bersch: Methoden zur temporalen Analyse und Exploration von Reviews , Bachelor Thesis, May 2013
  • Cornelius Ratsch: Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems , Master Thesis, April 2013
  • Andreas Zerkowitz: Aufbau und Analyse eines Event-Repository aus Wikipedia , Bachelor Thesis, April 2013
  • Erik von der Osten: Influential Graph Properties of Collaborative-Filtering based Recommender Systems , Diploma Thesis, Mathematics/Computer Science, March 2013
  • Philipp Harth: Local Similarity in Geometric Graphs via Spectral Correspondence , Master Thesis, February 2013
  • Benjamin Kirchholtes: A General Solution for the Point Cloud Docking Problem , Master Thesis, February 2013
  • Manuel Kaufmann: Modellierung und Analyse heuristischer und linguistischer Methoden zur Eventextraktion , Bachelor Thesis, November 2012
  • Dennis Runz: Socio-Spatial Event Detection in Dynamic Interaction Graphs , Master Thesis, November 2012
  • Andreas Schuster: Compressed Data Structures for Tuple Identifiers in Column-Oriented Databases , Master Thesis, October 2012
  • Christian Kapp: Person Comparison based on Name Normalization and Spatio-temporal Events , Master Thesis, September 2012
  • Jörg Hauser: Algorithms for Model Assignment in Multi-Gene Phylogenetics , Master Thesis, August 2012
  • Andreas Klein: The CSGridFile for Managing and Querying Point Data in Column Stores , Master Thesis, August 2012
  • Andreas Runk: Dynamisches Rerouting in Strassennetzwerken , Bachelor Thesis, August 2012
  • Markus Neusinger: Erkennung von Sternströmen mit Hilfe moderner Clusteringverfahren , Diploma Thesis Physics/Computer Science, August 2012
  • Clemens Maier: Visualisierung und Modellierung des auf BRF+ aufgebauten Workflows , Bachelor Thesis, August 2012
  • Daniel Kruck: Investigation of Exact Graph and Tree Isomorphism Problems , Bachelor Thesis, July 2012
  • Andreas Fay: Correlation and Exploration of Events , Master Thesis, February 2012
  • Cornelius Ratsch: Extending Context-Aware Query Autocompletion , Bachelor Thesis, February 2012
  • Alexander Wilhelm: Spezifikation und Suche komplexer Routen in Strassennetzwerken , Diploma Thesis, Mathematics/Computer Science, February 2012
  • Britta Keller: Ein Event-basiertes Ähnlichkeitsmodell für biomedizinische Dokumente , Bachelor Thesis, February 2012
  • Simon Jarke: Effiziente Suche von Substrukturen in grossen geometrischen Graphen , Master Thesis, November 2011
  • Markus Kurz: Visualizing and Exploring Nonparametric Density Estimations of Context-aware Itemsets , Bachelor Thesis, October 2011
  • Frank Tobian: Modelle und Rankingverfahren zur Kombination von textueller und geographischer Suche , Bachelor Thesis, September 2011
  • Alexander Hochmuth: Efficient Computation of Hot Spots in Road Networks , Bachelor Thesis, June 2011
  • Selina Raschack: Spezifikation von Mustern auf räumlichen Daten und Suche von zugehörigen Musterinstanzen , Bachelor Thesis, May 2011
  • Bechir Ben Slama: Dynamische Erkennung von Ausreißern in Straßennetzwerken , Master Thesis, March 2011
  • Marcus Schaber: Scalable Routing using Spatial Database Systems , Bachelor Thesis, March 2011
  • Edward-Robert Tyercha: Co-Location Pattern Mining mit MapReduce , Bachelor Thesis, March 2011
  • Benjamin Hiller: Analyse und Verarbeitung von OpenStreetMap-Daten mit MapReduce , Bachelor Thesis, March 2011
  • Serge Thiery Akoa Owona: Apache Cassandra as Database System for the Activiti BPM Engine , Bachelor Thesis, February 2011
  • Maik Häsner: Bestimmung und Überwachung von Hot Spots in Strassennetzwerken , Master Thesis, October 2010.
  • Philipp Harth: Scale-Dependent Pattern Mining on Volunteered Geographic Information , Bachelor Thesis, August 2010.
  • Peter Artmann: Design and Implementation of a Rule-based Warning and Messaging System , Bachelor Thesis, June 2010.
  • Christopher Röcker: Analyse und Rekonstruktion unvollständiger Sensordaten , Bachelor Thesis, March 2010.
  • Andreas Klein: Eine Indexstruktur zur Verwaltung und Anfrage an Moving Regions auf Grundlage des TPR∗-Baumes , Bachelor Thesis, February 2010.
  • Benjamin Kirchholtes: Object Recognition and Extraction in Satellites Images using the Insight Segmentation and Registration Toolkit (ITK) , Bachelor Thesis, February 2010.
  • Fabian Rühle: Performance Analysis of Column-based Main Memory Databases , Bachelor Thesis, December 2009.
  • Pavel Popov: GeoDok: Extraktion und Visualisierung von Ortsinformationen in Dokumenten , Bachelor Thesis, December 2009.

Prof. Bouros has extensive experience advising and (co-)supervising undergraduate and graduate student projects.

The Data Management group offers thesis topics (bachelor or master) on query processing, database systems and information systems. Some suggested topics can be found at the following link (last updated on June 5, 2024). Interested students can also propose their own topics.

For more information, contact Prof. Bouros via email (bouros@uni-mainz.de).

Ongoing (first by type, then in alphabetical order)

  • Christian Rauch, Doctorate thesis Temporal Information Retrieval (working title)
  • Achilleas Michalopoulos, Doctorate thesis Partitioning and Indexing Techniques for Scalable Spatial Query Evaluation University of Ioannina Primary supervisor: Prof. Nikos Mamoulis
  • Dimitrios Tsitsigkos, Doctorate thesis Join Operators for Complex Data University of Ioannina Primary supervisor: Prof. Nikos Mamoulis
  • Samia Mubarika Goraya, Master thesis Comparing State-of-the-Art NLP Techniques for Information Extraction from Systems Neuroscience Publications
  • Sebastian Hemberger, Bachelor thesis Indexing composite interval data (working title)
  • Jakob Heeß, Bachelor thesis Managing and visualizing systems neuroscience data (working title)
  • Joshua Kempter, Bachelor thesis A user interface for displaying, exploring and searching through systems neuroscience data
  • Dennis Scheck, Bachelor thesis Generating interval data (working title)

Completed (by completion year)

  • 2024 - Tobias Laures, Master thesis Academic Social Networks: Evolution of Scientific Communities during the COVID-19 Pandemic
  • 2024 - Giorgos Kotsinas, Master thesis Ranking Queries over Range Data University of Ioannina Primary supervisor: Prof. Nikos Mamoulis
  • 2024 - Jan Raider, Bachelor thesis A Graphical User Interface for Managing Interval Data
  • 2024 - Patrick Götz, Bachelor thesis A graphical user interface for systems neuroscience data
  • 2024 - Julius Hoffmann, Bachelor thesis A Graphical Interface for Managing Geosocial networks
  • 2024 - Timo Suk, Master thesis Route Inference as a Tool for Generating Training Data for ML-based tasks University of Konstanz Primary supervisor: Dr. Theodoros Chondrogiannis
  • 2023 - Yazhou Pan, Bachelor thesis Processing spatial keyword search queries
  • 2023 - Lina Khidair, Bachelor thesis An interactive graphical user interface for systems neuroscience graph data
  • 2023 - Huu Duy Nguyen, Bachelor thesis Path Diversification for Evacuation Planning
  • 2023 - Nina Röckelein, Bachelor thesis Measuring the Attractiveness of Transportation Modes using Relative Reachability and Mobility Patterns University of Konstanz Primary supervisor: Dr. Theodoros Chondrogiannis
  • 2023 - Abed Al Rahman Sansour, Bachelor thesis A Graphical User Interface for generating Geosocial networks
  • 2023 - George Christodoulou, PhD thesis Interval Data Management in Main Memory Department of Computer Science and Engineering, University of Ioannina, Greece Primary supervisor: Prof. Nikos Mamoulis
  • 2023 - Arjanit Arifi, Bachelor thesis Training ETA models with recovered routes
  • 2023 - Maximilian Detlef Zerbe, Bachelor thesis Spatio-textual outlier detection
  • 2023 - Adrian Kissinger, Bachelor thesis Selecting geospatial data on maps for routing applications
  • 2023 - David Betz, Bachelor thesis Extending the EURASIM Interface for Evacuation Planning in Urban Areas
  • 2022 - Judith Kunz, Master thesis Finding Similarity Through Dissimilarity: Utilizing path dissimilarity for role extraction in networks University of Konstanz Primary supervisor: Dr. Theodoros Chondrogiannis (for Prof. Dr. Michael Grossniklaus)
  • 2022 - Samia Mubarika Goraya, Bachelor thesis Generating complex and composite data
  • 2022 - Claudia Berenice Perez Martinez, Bachelor thesis An interactive interface for generating road networks
  • 2022 - Lisa-Patricia Barth, Master thesis App-Based Specification and Visualization of User Preferences in Routing Primary supervisor: Prof. Dr. Stefan Kramer
  • 2021 - Yannic Marcel Moog, Bachelor thesis Extending k -Shortest Paths with Limited Overlap
  • 2021 - Marc Araujo Conde, Bachelor thesis A Database with Amnesia Primary supervisor: Prof. Dr. Felix Schuhknecht
  • 2021 - Daniel-Valentin Kowalski, Bachelor thesis Evaluating GeoSocial Reachability Queries
  • 2021 - Christian Häcker, Bachelor thesis k -Most Diverse Near-Shortest Paths
  • 2020 - Artur Titkov, Bachelor thesis Spatially Combined Text Searches
  • 2019 - Mohamed Masarwa, Bachelor thesis Monitoring Geo-Social Influence in Location-Aware Social Networks
  • 2019 - Alina Gerhards, Master thesis Equi-joins on GPU Primary supervisor: Prof. Bertil Schmidt
  • 2019 - Matthias Sawatzky, Bachelor thesis A Web Application for Analysis and Comparison of k-Shortest Paths with Limited Overlap Algorithms on Road Networks
  • 2017 - Theodoros Chondrogiannis, PhD thesis Efficient Algorithms for Route Planning Problems on Road Networks Faculty of Computer Science, Free University of Bozen-Bolzano, Italy Primary supervisor: Prof. Johann Gamper
  • 2016 - Shuyao Qi, PhD thesis Advanced Ranking Queries on Composite Data Department of Computer Science, University of Hong Kong, China PR Primary supervisor: Prof. Nikos Mamoulis
  • 2016 - Jens C. B. Madsen and Benjamin Petersen, Graduate student project Continuous Monitoring of Geo-Socially Influential Users Department of Computer Science, Aarhus University, Denmark
  • 2016 - Tobias Sommer, Graduate student project Outlier detection on spatio-temporal data
  • 2015 - Florian Hönicke, Diploma thesis Optimizing Set Similarity Joins on MapReduce Department of Computer Science, Humboldt-Universität zu Berlin, Germany Primary supervisor: Prof. Johann-Christoph Freytag
  • 2015 - John Liagouris, Doctorate thesis Web Data Management with Applications in Privacy (in Greek) School of Electrical and Computer Engineering, National Technical University of Athens, Greece
  • 2014 - Anja Kunkel, Undergraduate student project (Studienarbeit) Set Containment Joins using Two Prefix Trees Department of Computer Science, Humboldt-Universität zu Berlin, Germany
  • 2012 - Shen Ge, PhD thesis Advanced analysis and join queries in multidimensional spaces Department of Computer Science, University of Hong Kong, China PR Primary supervisor: Prof. Nikos Mamoulis
  • 2009 - Dora Kontogianni, Diploma thesis GRAPHIT-DB: Graph Data Management System (II) (in Greek) School of Electrical and Computer Engineering, National Technical University of Athens, Greece
  • 2009 - Vassilis Giannopoulos, Diploma thesis P-Miner+: Portal Catalogs Administration Supporting Usage Data Mining Processes (in Greek) School of Electrical and Computer Engineering, National Technical University of Athens, Greece
  • 2008 - John Liagouris and Trifon Farmakakis, Diploma thesis DataBase supported Reasoning System (DBRS) (in Greek) School of Electrical and Computer Engineering, National Technical University of Athens, Greece
  • 2007 - Lemonia Boula, Diploma thesis GRAPHIT-DB: Graph Data Management System (I) (in Greek) School of Electrical and Computer Engineering, National Technical University of Athens, Greece
  • 2006 - Theodore Galanis, Diploma thesis P-Miner: Portal Catalogs Administration Supporting Usage Data Mining Processes (in Greek) School of Electrical and Computer Engineering, National Technical University of Athens, Greece

Purdue University Graduate School

Multi-Agent-Based Collaborative Machine Learning in Distributed Resource Environments

This dissertation presents decentralized and agent-based solutions for organizing machine learning resources, such as datasets and learning models. It aims to democratize the analysis of these resources through a simple yet flexible query structure, automate common ML tasks such as training, testing, model selection, and hyperparameter tuning, and enable privacy-centric building of ML models over distributed datasets. Based on networked multi-agent systems, the proposed approach represents ML resources as autonomous and self-reliant entities. This representation makes the resources easily movable, scalable, and independent of geographical locations, alleviating the need for centralized control and management units. Additionally, as all machine learning and data mining tasks are conducted near their resources, providers can apply customized rules independently of other parts of the system.

Degree Type

  • Doctor of Philosophy
  • Computer and Information Technology

Campus location

  • West Lafayette

  • Artificial life and complex adaptive systems
  • Autonomous agents and multiagent systems
  • Machine learning not elsewhere classified
  • Distributed systems and algorithms

CC BY 4.0

Bachelor's thesis

SirongHuang/Twitter-data-mining

Twitter data mining - social media as a rising customer service channel.

Presentation slides ---- Thesis paper

Collect data from the Twitter API using Python scripts that record customer service interactions with major consumer electronics brands such as Apple and Samsung

Parse JSON data objects in Python and use Pandas to organize tweets into customer service sessions, one per unique customer

Use NLTK (Natural Language Toolkit) to conduct text mining and sentiment analysis

Use a Naive Bayes classifier with a bag-of-words representation to predict a customer's sentiment towards the customer service received
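The original scripts are no longer available, so the following is only a minimal sketch of the last step, not the thesis code. It assumes a multinomial Naive Bayes model with add-one smoothing over bag-of-words counts, written with the standard library instead of NLTK so it runs without extra dependencies. The class name `NaiveBayesSentiment`, the `tokenize` helper and the four training tweets are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its word tokens (the bag of words)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of tweets
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][token]
                # Add-one smoothing avoids log(0) for words unseen in this class.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-written training tweets (not data from the thesis).
train_texts = [
    "thanks for the quick help, great support",
    "problem solved, thank you so much",
    "still broken and nobody answers, terrible service",
    "worst support ever, my phone is still not working",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("thank you, great help"))
print(model.predict("terrible service, still broken"))
```

With real data, each training example would be one customer service session and its label, and a held-out set would be needed to judge how well the classifier generalizes.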

Unfortunately, my code was lost for good when my computer was damaged, and I was too inexperienced at the time to back it up on GitHub.

UKnowledge

Theses and Dissertations--Mining Engineering

Theses/Dissertations from 2024

THE METHODOLOGY FOR INTEGRATING ROBOTIC SYSTEMS IN UNDEGROUND MINING MACHINES , Peter Kolapo

DISCRETE ELEMENT MODELING TO PREDICT MUCKPILE PROFILES FROM CAST BLASTING , Russell Lamont

AUTONOMOUS SHUTTLE CAR DOCKING TO A CONTINUOUS MINER USING RGB-DEPTH IMAGERY , Sky Rose

Theses/Dissertations from 2023

ASSESSMENT OF AIR OVERPRESSURE FROM BLASTING USING COMPUTATIONAL FLUID DYNAMICS , Cecilia Estefania Aramayo

RECOVERY OF VALUABLE METALS FROM ELECTRONIC WASTE USING A NOVEL AMMONIA-BASED HYDROMETALLURGICAL PROCESS , Peijia Lin

AN ACID BAKING APPROACH TO ENHANCE RARE EARTH ELEMENT RECOVERY FROM BITUMINOUS COAL SOURCES , Ahmad Nawab

PREDICTION OF DYNAMIC SUBSIDENCE IN THE PROXIMITY OF LONGWALL PANEL BOUNDARIES , JESUS DAVID ROMERO BENITEZ

Prediction of Blast-Induced Ground Vibrations: A Comparison Between Empirical and Artificial-Neural-Network Approaches , Luis F. Velasquez

A LABORATORY AND NUMERICAL INVESTIGATION OF THE STRENGTH OF IRREGULARLY SHAPED PILLARS , Zachary Wedding

Theses/Dissertations from 2022

DEVELOPMENT OF UNIVARIATE AND MULTIVARIATE FORECASTING MODELS FOR METHANE GAS EMISSIONS IN UNDERGROUND COAL MINES , Juan Diaz

PARAMETRIC NUMERICAL ANALYSIS OF INCLINED COAL PILLARS , Robin Flattery

Strain Energy Analysis Related To Strata Failure During Caving Operations , Caroline Gerwig

LAPTOP RECYCLING CASE STUDY: ESTIMATING THE CONTAINED VALUE AND VALUE RECOVERY PROCESS FEASIBILITY OF END-OF-LIFE CONSUMER ELECTRONICS , Zebulon Hart

INVESTIGATION INTO, & ANALYSIS OF TEMPERATURE & STRAIN DATA FOR COAL MINE SEAL MATERIAL DURING CURING , Stephanus Jaco van den Berg

Theses/Dissertations from 2021

DEVELOPMENT OF AN AUTONOMOUS NAVIGATION SYSTEM FOR THE SHUTTLE CAR IN UNDERGROUND ROOM & PILLAR COAL MINES , Vasileios Androulakis

Investigation of Coal Burst Potential Using Numerical Modeling and Rock Burst Indices , Cristian David Cardenas Triana

Capture of Respirable Dust using Maintenance Free Impingement Screen , Neeraj Kumar Gupta

OXIDATION PRETREATMENT FOR ENHANCED LEACHABILITY OF RARE EARTH ELEMENTS FROM BITUMINOUS COAL SOURCES , Tushar Gupta

AN APPROACH FOR PREDICTING FLOW CHARACTERISTICS AT THE CONTINUOUS MINER FACE , Kayla Henderson

CONCEPTS FOR DEVELOPMENT OF SHUTTLE CAR AUTONOMOUS DOCKING WITH CONTINUOUS MINER USING 3-D DEPTH CAMERA , Sibley Miller

MODELING OF RARE EARTH SOLVENT EXTRACTION PROCESS FOR FLOWSHEET DESIGN AND OPTIMIZATION , Vaibhav Kumar Srivastava

Application of a Novel Ventilation Simplification Algorithm , Caitlin V. Strong

A METHODOLOGY FOR AUTONOMOUS ROOF BOLT INSTALLATION USING INDUSTRIAL ROBOTICS , Anastasia Xenaki

Theses/Dissertations from 2020

NUMERICAL APPROXIMATION OF THE GROUND REACTION AND SUPPORT REACTION CURVES FOR UNDERGROUND LIMESTONE MINES , Jesus Castillo Gomez

Design Thinking in Practice - The case of WienerLinien

The E&I institute is proud to share another submission of a Bachelor thesis in the newly introduced video thesis format.

Our E&I student Emma Litschka has successfully completed her Bachelor thesis on “Design Thinking in Practice: The Case of WienerLinien” under the supervision of our E&I colleague Shtefi Mladenovska.

Emma created an educational video that exemplifies how a group of students in the E&I Project Course “InnoLab” employed Design Thinking to help WienerLinien tackle the challenge of elderly people not using public transport.

The students interviewed 65 elderly people in Vienna and observed their behaviour in public transport to identify their key frustrations. They employed five ideation methods to generate more than 80 ideas for solving these frustrations, and after ranking them based on technical feasibility and customer value, one of the solutions they chose to work on was a navigation screen for elderly people.

To turn this idea into a reality, the students developed a paper prototype for the navigation screen by simply sketching the layout of a user interface, which they then transformed into a digital mock-up that simulates the user interface using the web application Figma.

Finally, to ensure that this solution actually satisfied user needs, they gathered feedback from the users. Ten seniors tried out the prototype, providing feedback on features they liked and features they felt needed some improvement. The feedback was used to adjust the prototype accordingly and test it again.

We thank our contact person at WienerLinien, Ms. Lilian Izsak, for providing the students with the opportunity to work on this exciting project.

The student team that worked on this project in the InnoLab project course includes Katharina Anic, Aaron Farrokhnejad Afshar, Richard Wagentristl, and Marina Yazykova. They delivered this project under the supervision of our colleagues Erik Kommol and Carola Wandres. We thank them for the excellent work and for making their detailed documentation available for this thesis.

Do you also want to make an impact with your Bachelor Thesis?

The newly introduced Video Thesis format might be an opportunity!

Develop an educational video for an element (theory, method, tool, framework, etc.) of Core Lecture 1 and thereby help future E&I students learn while gaining public visibility as an E&I expert!

Follow this link to learn more about this format.

Is there a CL 1 element that you think deserves such a video?

Contact the CL-1 module experts and apply for supervision.

Follow this link to learn more about the application process and module expert contact details.



CENG Connection

Today’s news, tomorrow’s impact.

  • Digital Democracy: Transforming Legislative Transparency and Civic Engagement in California 

Professor points to a video screen about Digital Democracy

  • July 30, 2024
  • CalMatters , Collaboration , Computer Science , Digital Democracy

A decadelong passion for local journalism and a push for government transparency in California have culminated in the latest version of Digital Democracy, a groundbreaking project led by a dedicated professor.  

Digital Democracy offers journalists and citizens unparalleled access to a comprehensive, searchable database of state-level legislative information, fostering greater civic engagement and healthier public interest decisions in California – the world’s fifth-largest economy with an annual budget of about $300 billion.  

“We are processing 200 hours of hearings a week, with up to 10 happening simultaneously during the Legislature’s busy periods,” said Foaad Khosmood, computer science professor and research director at the Institute for Advanced Technology & Public Policy (IATPP). “It would take an army of 200 reporters to cover that many hearings.”  

Since his days as a technology manager for the Mustang Daily (now Mustang News) at Cal Poly, Khosmood has been a staunch supporter of local journalism. Disturbed by the ongoing cuts to reporters and resources in newsrooms, he has witnessed the rise of “news deserts” – communities with limited access to credible and comprehensive information – across California and the country.  

Digital Democracy has provided Khosmood with an outlet for action and activism. 

“We wanted to empower journalists to write stories that help Californians understand their state government and hold their politicians accountable,” Khosmood said. “With local journalism devastated, Digital Democracy is our way of making a difference.”  

Group from Cal Poly stands in front of the state Capitol

Project Evolution  

Digital Democracy was initially launched in 2015 by former state Sen. Sam Blakeslee and then-Lt. Gov. Gavin Newsom through the IATPP with key contributors including Khosmood and Cal Poly alumna Christine Robertson, the original designer and manager of the platform. 

In March 2024, a revamped Digital Democracy was launched by CalMatters, a nonprofit news organization, broadening its reach and impact.  

The database consolidates hard-to-access public information, capturing every word from public hearings and floor sessions, the full text of bills with amendments, votes, and lists of supporters and opponents. It also includes financial data such as campaign donations, expenditures, gifts and travel, and district data like voter registration, election results and demographics.  

“We use all kinds of AI on this,” said Khosmood, noting that the database and artificial intelligence were designed and built by faculty and students at Cal Poly. 

Additionally, it supports journalism by providing reporters with quick and easy access to extensive legislative information, enabling them to uncover relationships, patterns and anomalies within the Legislature and the policymaking process.  

“We are leveraging the uniqueness of this database, which compiles records that aren’t available in a single location anywhere else,” Khosmood said. “Some of these records were never available before.” 

For example, through Digital Democracy, CalMatters reporters uncovered that legislators often kill bills by simply not voting. This insight was made possible by the database, which provided access to 1 million votes cast by current legislators over the past five years.  

Digital Democracy also assists journalists in identifying stories about the legislative process with a custom AI tool that scans the database and suggests story ideas. These ideas and the resulting stories are then shared with news outlets across the state.  

Professor and grad student work on laptops

Innovative Data Solutions  

Cal Poly computer science graduate Thomas Gerrity, who was one of at least 12 graduate students to write their master’s thesis on Digital Democracy, now oversees the technical program for CalMatters.  

“It’s interesting to see legislators take notice, realizing they are being monitored,” he said.  

He highlighted a feature that tracks the number of words each legislator speaks, creating a “Top 10 Most Talkative” list. Legislators have taken note, with some even mentioning their ranking during addresses about bills or issues.  
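As a rough illustration of how such a ranking could be computed, the sketch below counts words per speaker in a transcript. The input format (pairs of speaker and utterance), the function name `most_talkative` and the sample data are assumptions made for this example; the article does not describe how Digital Democracy actually implements the feature.

```python
from collections import Counter


def most_talkative(transcript, top_n=10):
    """Rank speakers by total words spoken.

    transcript: iterable of (speaker, utterance) pairs (hypothetical format).
    Returns a list of (speaker, word_count) tuples, highest count first.
    """
    counts = Counter()
    for speaker, utterance in transcript:
        # Whitespace splitting is a simple stand-in for real tokenization.
        counts[speaker] += len(utterance.split())
    return counts.most_common(top_n)


# Made-up sample data, for illustration only.
sample = [
    ("Legislator A", "Thank you, I move the bill."),
    ("Legislator B", "I second."),
    ("Legislator A", "I respectfully ask for an aye vote."),
]
ranking = most_talkative(sample)
```

A production system would also have to attribute each utterance to the right speaker, which is a separate and harder problem than the counting shown here.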

Gerrity regularly collaborates with Cal Poly students and at least one from outside Cal Poly who is working on the project this summer. Sasha Prostota, a data science student from UCLA studying applied math, is helping tackle one of the major challenges in mining data from legislative hearings: names.  

There is no standardized identification when a person, organization or company testifies in a public hearing. Variations like “ACLU” vs. “American Civil Liberties Union” or subgroups such as “Association for Commuter Transportation” vs. “Association for Commuter Transportation, Southern California Chapter” can complicate data analysis, because the different names have to be recognized as belonging to the same entity.

“This was a serious problem that came up early, and there was no existing solution,” Khosmood said. “Many organizations have different names, but it’s crucial we accurately characterize these groups to determine their political influence.”  

Working out of IATPP headquarters on campus, Prostota and Gerrity are developing a methodology to identify relationships between groups and their names, with the goal of creating an automated system.  
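To make the naming problem concrete, here is a minimal sketch of rule-based name matching. It is not the methodology Prostota and Gerrity are developing, which the article does not describe; the function names, the stopword list and the three relation labels are all assumptions chosen for illustration.

```python
import re

# Assumption: small words skipped when forming an acronym.
STOPWORDS = {"for", "of", "the", "and"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", name.lower()).split())


def acronym(name):
    """Initials of the significant words, e.g. 'aclu'."""
    return "".join(w[0] for w in normalize(name).split() if w not in STOPWORDS)


def relation(a, b):
    """Classify two names as 'same', 'acronym', 'subgroup' or 'unrelated'."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return "same"
    if na == acronym(b) or nb == acronym(a):
        return "acronym"
    short, long_ = sorted((na, nb), key=len)
    # A longer name that extends the shorter one is treated as a subgroup.
    if long_.startswith(short + " "):
        return "subgroup"
    return "unrelated"
```

With these rules, "ACLU" and "American Civil Liberties Union" are linked as an acronym pair, and the Southern California Chapter is linked to its parent association as a subgroup. Real testimony data would likely need fuzzier matching than these exact rules, for example to handle misspellings or reordered words.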

“Applying what I’ve learned in my classes to something impactful and meaningful is very rewarding,” said Prostota, who advocates for greater transparency between citizens and their government in California.  

For more details and to see Digital Democracy in action, explore the project here.

By Emily Slater

Founder of CalMatters gives a presentation at Cal Poly


COMMENTS

  1. PDF The application of data mining methods

    This thesis first introduces the basic concepts of data mining, such as the definition of data mining, its basic function, common methods and basic process, and two common data mining methods, classification and clustering. Then a data mining application in network is discussed in detail, followed by a brief introduction on data mining ...

  2. Open Theses

    Open Topics We offer multiple Bachelor/Master theses, Guided Research projects and IDPs in the area of data mining/machine learning. A non-exhaustive list of open topics is listed below. If you are interested in an internal thesis or a guided research project, please send your CV and transcript of records to Prof. Stephan Günnemann via email ([email protected]) or to the project's ...

  3. PDF DATA PREPROCESSING FOR DATA MINING

    1 INTRODUCTION TO DATA MINING: 1.1 Background, 1.2 Definition, 1.3 Data Source, 1.4 Application, 1.5 Challenges. 2 RELATED TECHNIQUES VS DATA MINING: 2.1 Data warehouse, 2.2 Online analytical processing, 2.3 Statistics and Machine Learning. 3 WORKING THEORY OF DATA MINING: 3.1 Task, 3.2 Process, 3.3 Data preprocessing

  4. PDF Big data mining

    Bachelor Thesis 2020/2021 - Richie Lee. 2 Literature: The foundation of this thesis, Mining big data using parsimonious factor, machine learning, variable selection and shrinkage methods by Kim and Swanson (2018), focuses on the usefulness of factor models in the context of prediction using big data. In particular, this research examines perfor-

  5. Data Mining

    Data Mining. Data Science; Data and Artificial Intelligence; Overview; Fingerprint; Network; Researchers (45) Projects (2) Research output (638) Datasets (4) Prizes (17) Activities (6) ... Student thesis: Bachelor. File. A Deep Learning Approach for Clustering a Multi-Class Dataset Kamat, V. (Author) ...

  6. PDF Bachelor's Thesis (UAS) Degree Program in Information Technology

    1.2.1 Data Mining. Data mining, the technology, art and science of delving into complex and large bodies of data in order to ascertain useful patterns, is a part of the general process of knowledge discovery in databases (KDD). Practitioners and theoreticians are incessantly seeking ameliorated techniques to make the process more accurate, cost

  7. Application of Data Mining Methods for Customer Clustering

    The motivation behind this thesis is to investigate the value of clustering in the machine learning/data mining context for customer segmentation. Classical database marketing methods are combined with data mining tools. Data mining techniques can be used to create the segments automatically.

  8. PDF BACHELOR THESIS APPENDICES

    The bachelor thesis is thematically focused on an in-depth coverage of the principles of data mining. The main goal of this bachelor thesis is to understand the concepts of data mining and data mining techniques, the process of managing and extracting data, analyzing and establishing

  9. Dissertations / Theses: 'Data mining'

    This thesis presents a data mining methodology for this problem, as well as for others in domains with similar types of data, such as human activity monitoring. It focuses on the variable selection stage of the data mining process, where inputs are chosen for models to learn from and make inferences. Selecting inputs from vehicle telemetry data ...

  10. PDF Hash-based Approach to Data Mining

    My thesis, with the subject "hash-based approach to data mining" focuses on the hash-based method to improve performance of finding association rules in the transaction databases and use the PHS (perfect hashing and data shrinking) algorithm to build a system, which helps directors of shops/stores to have a detailed view about his business.

  11. Data Science

    A method for identifying undesired medical treatment variants using process and data mining techniques Cremers, L. M. W. (Author) Vanderfeesten, ... Student thesis: Bachelor. File. Analysis and improvement of process models with respect to key performance indicators: a debt collection case study Syring, A. F. (Author) ...

  12. Statistical Learning and Data Science Chair :: Theses

    The chair typically offers various thesis topics each semester in the areas computational statistics, machine learning, data mining, optimization and statistical software. ... (bachelor thesis) and 40 minutes (master thesis). Here, the student is expected to summarize his/her main results of the thesis in a presentation. The supervisor(s) will ...

  13. Bachelor and Master Theses

    Ziqiu Zhou: Semantic Extensions of OSM Data Through Mining Tweets in the Domain of Disaster Management, Master Thesis, May 2022. Lukas Ballweg: Analysis of Lobby Networks and their Extraction from Semi-Structured Data, Bachelor Thesis, April 2022.

  14. PDF Data mining using open source software for small business ...

    Bachelor Thesis Degree Programme in BIT 2016. Abstract 3 May 2015 Authors Antoine Dubuis The title of your thesis Data mining using open source software for small business Number of pages and appendices 63+4 Supervisors Arvo Lipitsainen - Advisor from Haaga-Helia University of Applied Sciences

  15. Theses

    Theses. Prof. Bouros has a long experience in advising and (co-)supervising undergraduate and graduate student projects. The Data Management group offers interesting topics for theses (bachelor or master) on query processing, database systems and information systems. Some suggested topics can be found in the following link (last update on June ...

  16. Multi-Agent-Based Collaborative Machine Learning in Distributed

    This dissertation presents decentralized and agent-based solutions for organizing machine learning resources, such as datasets and learning models. It aims to democratize the analysis of these resources through a simple yet flexible query structure, automate common ML tasks such as training, testing, model selection, and hyperparameter tuning, and enable privacy-centric building of ML models ...

  17. PDF Bachelor Thesis A machine learning approach to enhance the ...

    Bachelor Thesis A machine learning approach to enhance the privacy of customers En maskininärningsmetod för ökad kundintegritet. Jesper Anderberg ... The report also examines how data mining is affected in the context of private information. In the first case, the authors collected biometric samples from 200 people. According to

  18. GitHub

    Twitter data mining - social media as a rising customer service channel. Presentation slides ---- Thesis paper. Collect data from the Twitter API using Python scripts that record customer service interactions for major consumer electronics brands such as Apple and Samsung. Parse JSON data objects in Python and organize tweets into customer ...

  19. Theses and Dissertations--Mining Engineering

    The Methodology for Integrating Robotic Systems in Underground Mining Machines, Peter Kolapo. Discrete Element Modeling to Predict Muckpile Profiles from Cast Blasting, Russell Lamont. Autonomous Shuttle Car Docking to a Continuous Miner Using RGB-Depth Imagery, Sky Rose. Theses/Dissertations from 2023

  20. Challenges and Potentials of Process Mining

    This thesis sheds light on the topic of process mining on a scientific basis. Among other things, the methods, challenges and limitations, the potential of process mining and the degree of maturity of various algorithms are analysed. The work is of interest to anyone who, in addition to colourful graphs, would like to understand how exactly ...

  21. PDF Data Mining Thesis Topics in Finland

    Bachelor of Engineering Information Technology Thesis 5 May 2017. Abstract Author Title Number of Pages Date Ari Bajo Rouvinen Data Mining Thesis Topics in Finland 46 pages ... This thesis is based on data mining the Theseus dataset. This dataset is maintained by Arene Ry [1], the Rector's Conference of Finnish Universities of Applied ...

  22. PDF Graduate Courses of Study

    This course covers the final stages in the preparation of a Master's Thesis. Stages include implementation of a research plan, data collection and analysis, development of the thesis manuscript according to program requirements and guidelines, thesis defense, and final submission of the manuscript. Six credit hours.

  23. Design Thinking in Practice

    The E&I institute is proud to share another submission of a Bachelor thesis in the newly introduced video thesis format. Our E&I student Emma Litschka has successfully completed her Bachelor thesis on "Design Thinking in Practice: The Case of WienerLinien" under the supervision of our E&I colleague Shtefi Mladenovska.

  24. PDF Data mining in medical diagnostic support system

    Bachelor's Thesis Degree Programme in BIT 2019. Abstract Date: 9 May 2019 Author(s) Khoa Nguyen Degree programme Report/thesis title Data mining in medical diagnostic support system Number of pages and appendix pages 45 + 5 Health and education are always vital issues for any country in the world. ... Data mining is a technology based ...

  25. Digital Democracy: Transforming Legislative Transparency and Civic

    A decade-long passion for local journalism and a push for government transparency in California have culminated in the latest version of Digital Democracy, a groundbreaking project led by a dedicated professor. Digital Democracy offers journalists and citizens unparalleled access to a comprehensive, searchable database of state-level legislative information, fostering greater civic engagement ...

  26. What Makes Fake News Appeal to You? Empirical Evidence from the Tweets

    Bachelor or diploma: 472 (63.5) Postgraduate Degree: 185 (24.9) Twitter use per day (min) Under 30 min: ... Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, Melbourne. ... [Thesis, University of Missouri-Columbia].