Expand the search capabilities of GitLab by using Elasticsearch
GitLab is great for hosting code when you know where that code is and what is going on in your project. However, when you involve other people in your project, or you need to search all of the projects you have access to for code, issues, milestones, etc., there is no good way to do it.
I searched for a single filename which I know exists in a repository and I couldn't find even the name of the file (let alone the contents of it).
I propose expanding the search capabilities of GitLab by integrating Elasticsearch to index everything GitLab has to offer, whether it's code, issues, milestones, merge requests, wall, wiki, what have you.
Roman Mykhailiuk commented
Are there any plans to implement at least Postgres full text search in GitLab, to be able to search all the code, not just inside one repository?
We have a repository for production-ready modules, but we are having trouble searching for a module name, which corresponds to a folder name within the production repository. A basic search by filename or folder name would be a huge usability improvement.
Andrew Pennebaker commented
Adding all the dependencies needed to run Elasticsearch (JVM, etc.) would make GitLab hard to install and maintain. It also requires a lot of extra resources (increasing the memory requirements). We are afraid that there will be a split between people running GitLab with Elasticsearch and people running it without, greatly complicating future development.
We think there are still a lot of possibilities for improving GitLab search with PostgreSQL. Maybe we should focus on improving that. Long term we see GitLab using PostgreSQL exclusively and we don't mind improving the search experience for people using PostgreSQL only. A good article about it is http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
Of course this is not as good as using Elasticsearch, and there are valid comments about this in https://news.ycombinator.com/item?id=8381748, but there is a big advantage to keeping everyone working with the same GitLab. Search is something that could use more development attention, and we should put our effort into something that everyone can use.
Matthias Niehoff commented
Hi guys, what is the current progress on this? We are very interested in this feature, and the screenshot provided by Zzet looks very cool!
Hi, guys. Result of the integration: http://puu.sh/7JZD2.png
Sam, https://github.com/zzet/elasticsearch-git#integration-with-gitlab is certainly interesting, let us know what you come up with. And obviously we greatly prefer a generic approach over a configurable one, but for now we are trying to be open-minded about this.
Even better, that appears to have some integration with GitLab! I'm going to play with it and see what I come up with.
After a quick search, it turns out there's a Ruby gem, elasticsearch-git: https://github.com/zzet/elasticsearch-git
I'm thinking of attempting to implement Elasticsearch as an add-on component to GitLab (not a requirement). I feel like PostgreSQL full-text search is a bit heavy on the processing side to effectively serve the scope of the content I'm talking about. Additionally, my org uses MySQL as the backend for this, so in this scenario Postgres full-text search wouldn't apply at all.
For people running on a Raspberry Pi or DigitalOcean: if this is configurable as an optional external service to plug in, they can simply run Elasticsearch on a different system or not at all.
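To make the "optional external service" idea concrete, here is a rough sketch of how a backend could be chosen from configuration. This is not GitLab code (GitLab is a Ruby application); every name here, including `pick_backend`, `DatabaseSearch`, `ExternalSearch` and the `elasticsearch_url` setting, is hypothetical and only illustrates the fallback behaviour: use the external search service when one is configured, otherwise keep the built-in search.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatabaseSearch:
    """Built-in fallback: naive substring match over file paths and contents.

    Stands in for whatever search the database already offers.
    """
    documents: dict[str, str]  # hypothetical store: path -> file content

    def search(self, query: str) -> list[str]:
        q = query.lower()
        return sorted(
            path
            for path, body in self.documents.items()
            if q in path.lower() or q in body.lower()
        )


@dataclass
class ExternalSearch:
    """Placeholder for an optional search service running on another host."""
    url: str  # hypothetical setting, e.g. the address of an Elasticsearch node

    def search(self, query: str) -> list[str]:
        # The actual HTTP call is deliberately left out of this sketch.
        raise NotImplementedError("query the external service at self.url")


def pick_backend(config: dict[str, str], documents: dict[str, str]):
    """Use the external service only when it is configured."""
    url = config.get("elasticsearch_url")  # hypothetical config key
    if url:
        return ExternalSearch(url)
    return DatabaseSearch(documents)


if __name__ == "__main__":
    docs = {"modules/auth/login.rb": "def login; end", "README.md": "Production modules"}
    backend = pick_backend({}, docs)  # nothing configured: built-in search
    print(backend.search("auth"))
```

With an empty config, a small install never touches the external service; setting the URL switches the backend without changing the callers.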
This suggestion makes sense, but what about the people running GitLab on a server with 512 MB on DigitalOcean or on a Raspberry Pi? Can we maybe leverage PostgreSQL full-text search to improve the situation for everyone?
We are asking for suggestions in http://feedback.gitlab.com/forums/176466-general/suggestions/3887815-improve-search-so-it-searches-code-diffs-and-file