Benchmarking Relational and Array-Based Models for Genealogical Data Storage in PostgreSQL

Authors

  • Suwanto Raharjo, Faculty of Science and Information Technology, Universitas AKPRIND Yogyakarta
  • Ema Utami, Faculty of Computer Science, Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.52436/1.jutif.2026.7.1.5629

Keywords:

genealogical database, PostgreSQL, hierarchical data, UUID array, tree traversal

Abstract

Genealogical information systems manage inherently hierarchical data structures that represent family relationships across multiple generations. Traditional implementations predominantly rely on normalized relational database designs using junction tables to model parent–child relationships. While this approach ensures strong referential integrity, it often incurs substantial performance overhead due to complex join operations during deep hierarchical traversal. Recent versions of PostgreSQL provide native support for array data types. This study compares two genealogical database models implemented in PostgreSQL: a normalized relational model using a junction table and a denormalized model that stores child identifiers directly as UUID arrays. To evaluate their performance, we conducted controlled benchmarking experiments using synthetically generated genealogical datasets with varying generational depth and branching patterns. The comparison focuses on storage efficiency, recursive traversal performance, and write operation costs under realistic hierarchical workloads. Results obtained from a large-scale dataset containing more than 7 million individual records show that the UUID array–based model reduces disk space usage by 31%. During deep recursive traversal involving over 12 million nodes at the tenth generation, the array-based model demonstrates improved data locality, leading to a 5.2% reduction in execution latency and 7% fewer shared buffer accesses compared to the relational model. Contrary to common expectations in normalized database design, the array-based model also achieves 22% faster single-insert performance because it avoids foreign key validation and multiple index updates. This improvement comes with slightly higher write amplification, reflected in a 6.6% increase in buffer usage due to PostgreSQL’s multi-version concurrency control mechanism.
These findings contribute to the field of Informatics by providing empirical evidence on how database internal mechanisms influence performance trade-offs in hierarchical data management, offering guidance for designing scalable and read-efficient information systems beyond genealogical applications.
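
The two storage models compared in the abstract can be illustrated with a minimal in-memory sketch in Python. This is an analogy only, not the authors' PostgreSQL schema or benchmark code: the function names, the sample family, and the use of plain Python lists and dictionaries are all assumptions made for illustration. It shows why the junction-table model needs a join-like scan of relationship rows at each generation, while the array model reads each parent's child identifiers directly from the parent record. It does not reproduce any of the reported measurements.

```python
from collections import defaultdict
from uuid import uuid4


def build_models(edges):
    """Store the same parent->child edges in both (hypothetical) layouts."""
    # Relational analogue: one row per relationship, as in a junction table.
    junction_rows = list(edges)
    # Array analogue: each parent record carries a list of child UUIDs.
    children_arrays = defaultdict(list)
    for parent_id, child_id in edges:
        children_arrays[parent_id].append(child_id)
    return junction_rows, dict(children_arrays)


def descendants_junction(junction_rows, root, max_depth):
    """Generation-by-generation traversal that rescans the edge rows each
    time, loosely mirroring a join against the junction table."""
    frontier, found = {root}, []
    for depth in range(1, max_depth + 1):
        next_frontier = [c for p, c in junction_rows if p in frontier]
        if not next_frontier:
            break
        found.extend((c, depth) for c in next_frontier)
        frontier = set(next_frontier)
    return found


def descendants_array(children_arrays, root, max_depth):
    """Generation-by-generation traversal that expands each parent's own
    child list, with no separate relationship table to consult."""
    frontier, found = [root], []
    for depth in range(1, max_depth + 1):
        next_frontier = [c for p in frontier for c in children_arrays.get(p, [])]
        if not next_frontier:
            break
        found.extend((c, depth) for c in next_frontier)
        frontier = next_frontier
    return found


# A small, made-up three-generation family keyed by random UUIDs.
names = ["grandparent", "parent_a", "parent_b", "child_a1", "child_a2", "child_b1"]
ids = {name: uuid4() for name in names}
edges = [
    (ids["grandparent"], ids["parent_a"]),
    (ids["grandparent"], ids["parent_b"]),
    (ids["parent_a"], ids["child_a1"]),
    (ids["parent_a"], ids["child_a2"]),
    (ids["parent_b"], ids["child_b1"]),
]
junction_rows, children_arrays = build_models(edges)

if __name__ == "__main__":
    via_junction = descendants_junction(junction_rows, ids["grandparent"], 10)
    via_array = descendants_array(children_arrays, ids["grandparent"], 10)
    # Both layouts should yield the same (descendant, generation) pairs.
    print(set(via_junction) == set(via_array), len(via_array))
```

Both functions return the same set of (descendant, generation) pairs for a tree-shaped input; the difference lies only in how each generation is located. In a real PostgreSQL deployment the trade-offs also depend on indexing, foreign key checks, and MVCC behavior, which this sketch does not model.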

Published

2026-02-15

How to Cite

[1]
S. Raharjo and E. Utami, “Benchmarking Relational and Array-Based Models for Genealogical Data Storage in PostgreSQL”, J. Tek. Inform. (JUTIF), vol. 7, no. 1, pp. 662–674, Feb. 2026.