Designing a Data Warehouse for Collected Data About User Activity in Social Networks Using Elasticsearch
Abstract
This paper designs a data warehouse for storing data collected from social networks. It describes how to create indexes holding the data and how to select a configuration with an appropriate number of shards and replicas, covering the primary states of the cluster and the options for scaling it. It also describes the features of working with Elasticsearch, a non-relational database, when handling data on user activity in social network posts. Facebook and Instagram were chosen as the social networks for analysis. The paper describes the advantages and disadvantages of using such a data store compared to Apache Kafka.
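As an illustration of the shard and replica configuration mentioned above, the following minimal sketch builds the request body that Elasticsearch accepts when an index is created. The field names (`network`, `post_id`, `created_at`, `text`, `likes`, `comments`) and the values of 3 primary shards and 1 replica are assumptions chosen for the example, not the configuration reported in the paper. The sketch only constructs the JSON body; sending it to a cluster is left out.

```python
import json


def build_index_body(primary_shards: int, replicas: int) -> dict:
    """Build a create-index request body with shard and replica settings.

    The mapping below is hypothetical: it guesses at fields a post-activity
    document might contain and is not taken from the paper.
    """
    if primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the number of replicas cannot be negative")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                "network": {"type": "keyword"},    # e.g. "facebook" or "instagram"
                "post_id": {"type": "keyword"},
                "created_at": {"type": "date"},
                "text": {"type": "text"},
                "likes": {"type": "integer"},
                "comments": {"type": "integer"},
            }
        },
    }


def total_shard_copies(body: dict) -> int:
    """Count the shard copies the cluster must place: primaries plus their replicas."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


if __name__ == "__main__":
    body = build_index_body(primary_shards=3, replicas=1)
    print(json.dumps(body, indent=2))
    print("shard copies to allocate:", total_shard_copies(body))
```

The number of primary shards is fixed when the index is created, whereas the number of replicas can be changed later, so the replica count is the easier of the two to adjust when scaling the cluster.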
The study analyses existing data insertion Application Programming Interfaces (APIs) and data visualisation tools integrated with Elasticsearch. It describes the use of the Bulk API to insert many records into the database at once. The designed data warehouse uses Kibana, a data visualisation and analytics tool integrated with the selected database. The ability to insert and view logs using Elasticsearch, Logstash, and Kibana (the ELK stack) is also shown. Data ingestion was tested by shipping logs into the database using Beats. The obtained results can help implement a system for analysing user activity from social network data with Elasticsearch as the central component.
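The Bulk API mentioned above expects a newline-delimited JSON body in which each document is preceded by an action line. The sketch below shows one way such a payload could be assembled. The index name `social-posts` and the sample documents are hypothetical, and the helper `build_bulk_payload` is introduced here only for illustration. The resulting string would be sent to the `_bulk` endpoint with the `application/x-ndjson` content type; the HTTP call itself is omitted.

```python
import json
from typing import Iterable


def build_bulk_payload(index_name: str, docs: Iterable[dict]) -> str:
    """Serialise documents into the newline-delimited body used by the Bulk API.

    Each document becomes two lines: an "index" action naming the target
    index, followed by the document source. The body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""  # nothing to send
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample records; real data would come from the collector.
    sample_docs = [
        {"network": "facebook", "post_id": "p1", "likes": 10, "comments": 2},
        {"network": "instagram", "post_id": "p2", "likes": 25, "comments": 4},
    ]
    print(build_bulk_payload("social-posts", sample_docs), end="")
```

Batching many documents into one request in this way avoids the per-request overhead of inserting records one at a time.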
Copyright (c) 2023 Iryna Mysiuk

This work is licensed under a Creative Commons Attribution 4.0 International License.




