Whether it is offline company documents or dynamic webpages, the underlying structures that we use to store data in are specified in the form of databases. PostgreSQL is an object-relational database management system that uses the SQL standard and is available for use on all main operating systems, including Mac, Windows, Linux, and UNIX, in a variety of different flavors. It’s open source, and has a large range of programming interfaces, making it a popular choice worldwide, including beingused by companies such as Etsy and Skype.
When we have databases full of data, much of the underlying data can be text – and it can even be lengthy text at that. Consider, for instance, all of a company’s HR records in a database, or a personal blogging site full of posts. Each of these two examples would be text rich, and if we wanted to drill down to find specific textmatches, such as “HTML6 CSS”, for our HR records to find an employee with both of these capabilities, or “Mickey Mouse Road Runner”, to find blog posts that mention both characters for our blogging site, then we would need to use a full text search to scan each record for these terms.
Full text search is often expected, or even taken for granted, across systems. But the actual implementation of a full text search is more than just a trivial task. For instance, we would probably like to get results for “HTML5 CSS”, as well as “Mickey Road Runner”, in case these results, too, are relevant. A good full text search will return results for these (with other words between), too.
Adding a full text search engine to a database management system adds in extra complexity to a project. It requires more upkeep, as well as extra storage space.
If a database solution that includes full text search is what is needed, the PostgreSQL may indeed be the answer. It sidesteps adding in an extra full text search engine, and may be the right kind of solution for your data.
While PostgreSQL’s full text search capabilities aren’t as advanced, fast when scaled, or as accurate as other more complex solutions, they might just be good enough for your data, especially if configured cleverly.
The full text function of PostgreSQL indexes documents after they have been preprocessed. Each document added to a database is preprocessed and an index created. It does then by breaking up each document into tokens. Tokens can be items such as a word, or a number.
The parser that preprocesses each document does things like remove stop words (such as “and”, “the”, “with”, etc.), remove casing (“E” in a word is the same as “e” and vice versa), and include synonyms in search, including similar phrases (“baby” would also find “infant” and “newborn”).
The full text search has a default ranking system that can be manipulated to give you more accurate results based on whatever factors that you consider to be important, such as document length, phrase density, or unique phrasing.
You may also highlight the matched phrases in your search output.
There is a lot to learn about PostgreSQL’s full text search functionality and how to successfully configure it to do what you want within your database implementation, whatever form that may be in. We recommend viewing the PostgreSQL documentation chapter itself on the matter or have a read of this in depth blog post.
Smartsourcing is a brief guide to the world of modern technology partnerships. It was developed through a collaborative effort of top Zymr executives as we uncovered a gap in the market between the perception of what outsourcing used to be, and how leading technology innovators are leveraging this globalized approach to value generation. Read this guide to learn...