# 12.8. 測試與除錯

The behavior of a custom text search configuration can easily become confusing. The functions described in this section are useful for testing text search objects. You can test a complete configuration, or test parsers and dictionaries separately.

## 12.8.1. Configuration Testing

The function`ts_debug`allows easy testing of a text search configuration.

```
ts_debug([
config
regconfig
, 
] 
document
text
,
         OUT 
alias
text
,
         OUT 
description
text
,
         OUT 
token
text
,
         OUT 
dictionaries
regdictionary[]
,
         OUT 
dictionary
regdictionary
,
         OUT 
lexemes
text[]
)
         returns setof record
```

`ts_debug`displays information about every token of`document`*\_as produced by the parser and processed by the configured dictionaries. It uses the configuration specified by*`config`\_, or`default_text_search_config`if that argument is omitted.

`ts_debug`returns one row for each token identified in the text by the parser. The columns returned are

* `aliastext`— short name of the token type
* `descriptiontext`— description of the token type
* `tokentext`— text of the token
* `dictionariesregdictionary[]`— the dictionaries selected by the configuration for this token type
* `dictionaryregdictionary`— the dictionary that recognized the token, or`NULL`if none did
* `lexemestext[]`— the lexeme(s) produced by the dictionary that recognized the token, or`NULL`if none did; an empty array (`{}`) means it was recognized as a stop word

Here is a simple example:

```
SELECT * FROM ts_debug('english','a fat  cat sat on a mat - it ate a fat rats');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes 
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | fat   | {english_stem} | english_stem | {fat}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | cat   | {english_stem} | english_stem | {cat}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | sat   | {english_stem} | english_stem | {sat}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | on    | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | mat   | {english_stem} | english_stem | {mat}
 blank     | Space symbols   |       | {}             |              | 
 blank     | Space symbols   | -     | {}             |              | 
 asciiword | Word, all ASCII | it    | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | ate   | {english_stem} | english_stem | {ate}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | fat   | {english_stem} | english_stem | {fat}
 blank     | Space symbols   |       | {}             |              | 
 asciiword | Word, all ASCII | rats  | {english_stem} | english_stem | {rat}
```

For a more extensive demonstration, we first create a`public.english`configuration and Ispell dictionary for the English language:

```
CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);

ALTER TEXT SEARCH CONFIGURATION public.english
   ALTER MAPPING FOR asciiword WITH english_ispell, english_stem;
```

```
SELECT * FROM ts_debug('public.english','The Brightest supernovaes');
   alias   |   description   |    token    |         dictionaries          |   dictionary   |   lexemes   
-----------+-----------------+-------------+-------------------------------+----------------+-------------
 asciiword | Word, all ASCII | The         | {english_ispell,english_stem} | english_ispell | {}
 blank     | Space symbols   |             | {}                            |                | 
 asciiword | Word, all ASCII | Brightest   | {english_ispell,english_stem} | english_ispell | {bright}
 blank     | Space symbols   |             | {}                            |                | 
 asciiword | Word, all ASCII | supernovaes | {english_ispell,english_stem} | english_stem   | {supernova}
```

In this example, the word`Brightest`was recognized by the parser as an`ASCII word`(alias`asciiword`). For this token type the dictionary list is`english_ispell`and`english_stem`. The word was recognized by`english_ispell`, which reduced it to the noun`bright`. The word`supernovaes`is unknown to the`english_ispell`dictionary so it was passed to the next dictionary, and, fortunately, was recognized (in fact,`english_stem`is a Snowball dictionary which recognizes everything; that is why it was placed at the end of the dictionary list).

The word`The`was recognized by the`english_ispell`dictionary as a stop word ([Section 12.6.1](https://www.postgresql.org/docs/10/static/textsearch-dictionaries.html#textsearch-stopwords)) and will not be indexed. The spaces are discarded too, since the configuration provides no dictionaries at all for them.

You can reduce the width of the output by explicitly specifying which columns you want to see:

```
SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english','The Brightest supernovaes');
   alias   |    token    |   dictionary   |   lexemes   
-----------+-------------+----------------+-------------
 asciiword | The         | english_ispell | {}
 blank     |             |                | 
 asciiword | Brightest   | english_ispell | {bright}
 blank     |             |                | 
 asciiword | supernovaes | english_stem   | {supernova}
```

## 12.8.2. Parser Testing

The following functions allow direct testing of a text search parser.

```
ts_parse(
parser_name
text
, 
document
text
,
         OUT 
tokid
integer
, OUT 
token
text
) returns 
setof record

ts_parse(
parser_oid
oid
, 
document
text
,
         OUT 
tokid
integer
, OUT 
token
text
) returns 
setof record
```

`ts_parse`parses the given\_`document`\_and returns a series of records, one for each token produced by parsing. Each record includes a`tokid`showing the assigned token type and a`token`which is the text of the token. For example:

```
SELECT * FROM ts_parse('default', '123 - a number');
 tokid | token
-------+--------
    22 | 123
    12 |
    12 | -
     1 | a
    12 |
     1 | number
```

```
ts_token_type(
parser_name
text
, OUT 
tokid
integer
,
              OUT 
alias
text
, OUT 
description
text
) returns 
setof record

ts_token_type(
parser_oid
oid
, OUT 
tokid
integer
,
              OUT 
alias
text
, OUT 
description
text
) returns 
setof record
```

`ts_token_type`returns a table which describes each type of token the specified parser can recognize. For each token type, the table gives the integer`tokid`that the parser uses to label a token of that type, the`alias`that names the token type in configuration commands, and a short`description`. For example:

```
SELECT * FROM ts_token_type('default');
 tokid |      alias      |               description                
-------+-----------------+------------------------------------------
     1 | asciiword       | Word, all ASCII
     2 | word            | Word, all letters
     3 | numword         | Word, letters and digits
     4 | email           | Email address
     5 | url             | URL
     6 | host            | Host
     7 | sfloat          | Scientific notation
     8 | version         | Version number
     9 | hword_numpart   | Hyphenated word part, letters and digits
    10 | hword_part      | Hyphenated word part, all letters
    11 | hword_asciipart | Hyphenated word part, all ASCII
    12 | blank           | Space symbols
    13 | tag             | XML tag
    14 | protocol        | Protocol head
    15 | numhword        | Hyphenated word, letters and digits
    16 | asciihword      | Hyphenated word, all ASCII
    17 | hword           | Hyphenated word, all letters
    18 | url_path        | URL path
    19 | file            | File or path name
    20 | float           | Decimal notation
    21 | int             | Signed integer
    22 | uint            | Unsigned integer
    23 | entity          | XML entity
```

## 12.8.3. Dictionary Testing

The`ts_lexize`function facilitates dictionary testing.

```
ts_lexize(
dict
regdictionary
, 
token
text
) returns 
text[]
```

`ts_lexize`returns an array of lexemes if the input\_`token`\_is known to the dictionary, or an empty array if the token is known to the dictionary but it is a stop word, or`NULL`if it is an unknown word.

Examples:

```
SELECT ts_lexize('english_stem', 'stars');
 ts_lexize
-----------
 {star}

SELECT ts_lexize('english_stem', 'a');
 ts_lexize
-----------
 {}
```

## Note

The`ts_lexize`function expects a single\_token\_, not text. Here is a case where this can be confusing:

```
SELECT ts_lexize('thesaurus_astro','supernovae stars') is null;
 ?column?
----------
 t
```

The thesaurus dictionary`thesaurus_astro`does know the phrase`supernovae stars`, but`ts_lexize`fails since it does not parse the input text but treats it as a single token. Use`plainto_tsquery`or`to_tsvector`to test thesaurus dictionaries, for example:

```
SELECT plainto_tsquery('supernovae stars');
 plainto_tsquery
-----------------
 'sn'
```
