This article describes a simple, smart search algorithm that I have used in several iOS apps. The algorithm is smarter than a trivial substring match but not as complex as something like Xcode’s fuzzy matching of method names. It doesn’t understand English or alternate word forms. It’s smart, but simple!
If the data to be searched in your app is representable as a list of one or more words, the algorithm might be a good way to allow its users to quickly find relevant matches when searching.
The simplest way to describe the features of the algorithm and see it in use is with a sample project. You can download SmartSearchExample from GitHub and should be able to run it in the iOS simulator without changes. You will need to configure your development team in the app target’s “Signing & Capabilities” tab in order to run it on a real device.
The app shows an alphabetical list of the 2,612 SF Symbols available in iOS 14. This data is hopefully familiar to most iOS developers and works well as an example because there are lots of occurrences of some words (such as “arrow”, “circle” and “fill”). A search bar can be used to filter the list of names. The rules defining how matching occurs are the heart of the algorithm and this article.
At its heart, the algorithm works by matching word prefixes. It can be used when the data to be search is structured as a list of words. I prefer to use the slightly more abstract term “tokens” instead of “words”: groups of characters separated by whitespace.
To understand my requirements for how the search algorithm should work I will describe some use cases using the SF Symbol names as an example. Symbol names can be considered to be lists of tokens separated by periods, e.g.
heart.circle.fill is effectively three tokens:
fill. Replacing periods with spaces transforms the symbol names into a form that is suitable for the algorithm (which expects whitespace-separated tokens).
- A search for the empty string should match all symbols. Requirement 1: search strings with no tokens match everything.
- A search for “he” should match symbol names which contain a token that begins with “he” (e.g.
bolt.heart.fill). A match should not occur for a symbol that only has the two letters “he” somewhere within a token (e.g.
rectangle.dashed). Requirement 2: matches only occur on token prefixes.
- A search for “x bin” should match
xmark.bin.fill. The matches should not be a union of the matches of separate searches for “x” and “bin”. Requirement 3: each token in the search string must match a token in the data.
- A search for “up up” should only match symbol names which have two (or more) tokens beginning with “up”. Requirement 4: each token in the search string must match a separate token in the data.
- A search for “fill circle sq” should match
square.fill.on.circle.fill. Requirement 5: the order of the tokens in the search string and data do not need to be the same.
- A search for “p pe” should match (among others)
person.badge.plus. In this example, the “p” could be considered to match either
plus, but “pe” can only match
person. The more restrictive (i.e. longer) search token “pe” must match
person, allowing "p" to match
plus. In combination with requirement 4 (each search token matches a separate data token), this avoids giving different results depending upon the order of tokens in the strings. This is important and subtle behaviour but is actually easy to implement. Requirement 6: longer search tokens should be matched first.
- Requirement 7: leading and/or trailing spaces in the search string should be ignored.
- Requirement 8: multiple spaces between tokens in the search string should be treated as a single space; they do not create zero length search tokens.
Using the algorithm will require a matcher object to be initialised with the search text. That object will have a method which takes a candidate string as a parameter (the data item to test for a match, a space-separated SF Symbol name in the sample app). The method returns a boolean value indicating whether the candidate string matched the search text or not. In the sample project the SF symbol names (with periods replaced by spaces) are searched each time the search bar contents changes. Those that match are shown on screen:
The matcher object’s initialiser splits the search text into separate tokens, delimited by whitespace. These tokens will be needed each time a test is made on a candidate string to see if the candidate string matches the search text.
The match method is where the real work is done. The algorithm is relatively simple: check each search token matches a token in the candidate string. If all search tokens can be matched, return
true; otherwise return
false. A candidate token matches a search token if it begins with that string (e.g. “heart” matches “he”).
During each test for a match, candidate tokens are “consumed” as soon as they are matched so that they will not match another search token (requirement 4). If any of the search tokens are prefixes of other tokens (e.g. “p pe”) the order in which they are checked is important to ensure correct results. Testing the search tokens in order of decreasing length ensures the longer search tokens are matched first (requirement 6).
Real Swift Code
I don’t want to spend much time in this article discussing the sample project itself. I want to focus on how to use the searching algorithm and how matching is implemented. The app has an array of
SFSymbolName structs that defines the list of all the symbol names. A list style collection view shows rows with the image and name of matching symbols (or all symbols if a search is not being performed).
ViewController.swift file implements
UISearchResultsUpdating to recalculate the list of symbols to show each time the search bar text changes. The
sfSymbolNames property is an array of
SFSymbolName structs to show. It defaults to all the symbols and has a custom setter which triggers a reload of the collection view with the search results:
The search algorithm will ignore leading/trailing whitespace but it’s a worthwhile optimisation to check for a special case of an empty or all-whitespace search string and just show all symbols.
For non-trivial cases, a
SmartSearchMatcher object (actually a struct) is created using the space-stripped search text. The app then filters over its array of all the SF Symbols and uses the
matches method to test if it should be included in the results. (In a real app the symbol names with periods replaced with spaces would be cached, but I wanted to make the code as simple and clear as possible to focus on the important details.)
SmartSearchMatcher.swift implements the matcher. The code in the sample project has some comments which I’m not going to show here. The public API and initialiser are very simple:
split method defaults to omitting empty subsequences which is how the algorithm ignores leading/trailing space and treats multiple spaces between tokens as a single space. Note that the code uses
isWhitespace rather than assuming only actual space characters will be used to delimit tokens. Finally, the tokens are sorted by decreasing length to ensure longer search tokens are matched first.
matches method is relatively simple (but smart!):
true for the trivial case where there are no search tokens (requirement 1: search strings with no tokens match everything).
The candidate string (the data item being tested) is split into tokens. The order these are tested is not important to the result so they don’t need to be sorted.
The outer loop iterates over each of the search tokens (remember, these are already sorted in decreasing order of length by the initialiser). The inner loop tests a single search token against all the (remaining) candidate tokens. There is probably a shorter way to implement this, but I want the code to be clear. Matches are case insensitive and ignore differences in accented characters (e.g. “é” matches “E”).
If the inner loop completes without matching a search token then we can give up: the candidate string is not a match. If a candidate token does match, it is removed from the array so that it will not be matched again.
Finally, if all of the search tokens match a candidate token, we have a match!
Real World Considerations
The data being searched needs to be in an appropriate form for the algorithm to be useful: whitespace-delimited tokens. I’ve used this algorithm in several of my own apps: Easy Shopping List, Easy Cooking Timer, Meeting Cost Timers and Adaptivity. The first three search over user-created names (shopping list items, meal names, meeting template names). Adaptivity is more interesting and I will discuss it in more detail below.
Search Text and Data Formats
The sample app demonstrates a simple conversion of actual SF Symbol names to tokens by replacing each period with a space. But what if the user’s search text contains a period? As presented above, the code in
updateSearchResults would not match anything because there are no periods in the candidate strings passed to the
matches method. That’s not a good experience. If I literally search for “xmark.bin” I would expect to see
In the sample app there is additional code that (mostly) handles cases like this: if the search text contains a single token then a check is first made to see if the search text matches the full SF Symbol name (which is treated as one token since it contains periods and not spaces):
In practice this means that searches containing a period will only match strict prefixes of the full symbol names.
Adaptivity Search Examples
Amongst the many features of my app Adaptivity are views for browsing System Colors and System Images.
The names of System Colors can be considered to be “word”-like: camel case property names. In Adaptivity’s System Colors view, each component of the name is considered as a separate token. A search for “gr” matches colors with a token beginning with “gr” such as
Similar to this article’s sample app, the full name of the color is also checked against the search text for a strict prefix match (allowing “systemgr” to match
systemGreen, and the six system grays but not match
It’s debatable whether
tertiarySystemGroupedBackground actually should match a search for “systemgr”, but I’ve chosen to make the rule that special case behaviour for search text with a single token will only match strict prefixes of the full un-tokenised color name.
Adaptivity’s System Images view has some even more complex matching behaviour. The user can choose between viewing iOS 13, 14 or 14.2 SF Symbol sets (corresponding to v1.1, v2 or v2.1 of Apple’s SF Symbols Mac app). Some symbols were renamed in v2 and v2.1. Adaptivity can show which iOS version a symbol was added and any deprecated names.
The deprecated names are also considered for searching, allowing “split” to match
sleep (which was named
circle.bottomthird.split in iOS 13). Some of the tokens between periods in symbol names are actually made up of more than one English word. Thus a search for “pro” matches
airpodpro.left (and more). A search for “drop f” will match
drop.fill using the normal rules, but also matches
This is achieved by having my own custom data structures (one per SF Symbols data set), which defines the symbol names along with any deprecated versions and in such a way that I can derive the full name (e.g.
eyedropper.halffull), the “simple” tokenised name (e.g.
eyedropper halffull) and also another tokenised form with separate words (e.g.
eye dropper half full).
The algorithm is smart, but simple. Here’s a few possible optimisations that could be worth implementing:
- Add an overload of
matcheswhich takes an array of candidate tokens to avoid repeatedly splitting the same candidate strings over and over again on subsequent searches. In the sample app, all the SF symbol names will be tokenised every time the search text changes.
- If candidate strings have lots of tokens, the inner loop my be more efficient if it iterated over a set of tokens as opposed to an array. Depending on how many matching tokens there are, the cost of removing each matching token from an array might be more than the overhead of building the set (which has O(1) time complexity for removal).
- If the search token does not match the very beginning of the candidate token, we don’t care if it matches later. Using
range:of:options:to search for a match and then, if a match is found, testing if was at the start of the candidate token will needlessly search the whole string. A smarter way of matching only the candidate token prefix could save a little time. If we didn’t care about case insensitivity or accented characters, we could use
starts:with:. In my apps, I do care!
- A meta-optimisation would be to take advantage of the fact that some types of change to the search text cannot possibly match any items that a previous search had already failed to match. For example, adding a character to the end of a search token, or adding a new search token, will match either the same or fewer results as before. It is wasted effort to test any data items that didn’t match a less restrictive search text. The matches from a previous search could be used as a smaller data set for a subsequent search if this could be detected.
- Search results could be cached if it is likely the user would repeat a previous search (e.g. when removing a character from the end of a search token, changing back to a previously-used search text).
- If the number of items to search is large, or the user could be expected to quickly type relatively long search text, delaying the search until the search text had remain unchanged for a fraction of a second could avoid making searches which are immediately obsolete because the search text has changed. This kind of optimisation is common when searching requires performing a (relatively) slow network request to look for matches.
In my own experience, I’ve never felt like I needed to consider these kinds of optimisations and make the search algorithm more complicated. My data sets are relatively small and the number of search tokens is typically small. The effort required to update the collection view and redraw the UI to show the new results is most likely far greater than the time taken to perform the search (although, I’ve not measured this).
The searching within an app can be made more useful and powerful by treating the search text and underlying data as a series of tokens and filtering based on matching token prefixes. As seen in the sample and Adaptivity’s System Colors and System Images views, more complex representations of the underlying data can allow for even smarter matching. The algorithm remains the same, but multiple representations of each item in the data set (system color or system image) is checked for matches.
Other Articles That You Might Like
The latest iPhone 12 screen sizes are discussed in detail with many screenshots in How iOS Apps Adapt to the various iPhone 12 Screen Sizes. That has links to earlier articles in the series for other device sizes that Apple has introduced over the years.
If you found any of these articles helpful then please take a look at my apps in the iOS App Store to see if there’s anything that you’d like to download (especially the paid ones 😀).
As an iOS developer you might also be interested in Working with Multiple Versions of Xcode.
If you work with a lot of Xcode projects you might like my Mac Menu Bar utility XcLauncher. It’s like having browser bookmarks for your favorite Xcode projects, workspaces, playgrounds, and Swift packages. There is more information on my website about XcLauncher’s features.