Reducing Formal Context Extraction: A Newly Proposed Framework from Big Corpora

Automating the extraction of concept hierarchies from free text is advantageous because manual generation is frequently labor- and resource-intensive. Free result, the whole procedure for concept hierarchy learning from free text entails several phases, including sentence-level text processing, sentence splitting, and tokenization. Lemmatization is after formal context analysis (FCA) to derive the pairings. Nevertheless, there could be a few uninteresting and incorrect pairings in the formal context. It may take a while to generate formal context; thus, size reduction formal context is necessary to weed out irrelevant and incorrect pairings to extract the concept lattice and hierarchies more quickly. This study aims to propose a framework for reducing formal context in extracting concept hierarchies from free text to reduce the ambiguity of the formal context. We achieve this by reducing the size of the formal context using a hybrid of a WordNet-based method and a frequency-based technique. Using 385 samples from the Wikipedia corpus and the suggested framework, tests are carried out to examine the reduced size of formal context, leading to concept lattice and concept hierarchy. With the help of concept lattice-invariants, the generated formal context lattice is compared to the normal one. In contrast to basic ones, the homomorphic between the resultant lattices retains up to 98% of the quality of the generating concept hierarchies, and the reduced concept lattice receives the structural connection of the standard one. Additionally, the new framework is compared to five baseline techniques to calculate the running time on random datasets with various densities. The findings demonstrate that, in various fill ratios, hybrid approaches of the proposed method outperform other indicated competing strategies in concept lattice performance.
View on arXiv@article{hassan2025_2504.06285, title={ Reducing Formal Context Extraction: A Newly Proposed Framework from Big Corpora }, author={ Bryar A. Hassan and Shko M. Qader and Alla A. Hassan and Joan Lu and Aram M. Ahmed and Jafar Majidpour and Tarik A. Rashid }, journal={arXiv preprint arXiv:2504.06285}, year={ 2025 } }