Custom PII
Overview
Jibber AI currently supports 42 types of PII out of the box, but you may want to add your own.
You add new PII types by defining them in a JSON configuration file.
NOTE: This feature is only available with the Docker solution and is not available with Rapid API.
Example Custom PII File
The custom_pii.json file below contains three example rules for extracting a time from text:
[
  {
    "name": "GENERAL_TIME_CONF_ON_PROXIMITY",
    "pattern": "\\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\\b",
    "pattern_case_sensitive": false,
    "keywords_pattern": "\\b(time|clock)\\b",
    "keywords_case_sensitive": true,
    "keywords_distance": 100,
    "confidence": "medium",
    "confidence_not_near_keyword": "low"
  },
  {
    "name": "GENERAL_TIME_MUST_BE_NEAR_KEYWORD",
    "pattern": "\\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\\b",
    "pattern_case_sensitive": false,
    "pattern_must_be_near_keyword": true,
    "keywords_pattern": "\\b(time|clock)\\b",
    "keywords_case_sensitive": true,
    "keywords_distance": 100,
    "confidence": "high"
  },
  {
    "name": "GENERAL_TIME_NO_KEYWORD",
    "pattern": "\\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\\b",
    "pattern_case_sensitive": false,
    "confidence": "high"
  }
]
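Before adding a rule to the file, it can help to try the pattern locally. The snippet below is a minimal sketch that runs the time pattern from the example through Python's `re` module. The helper name `find_times` and the sample sentence are illustrative only, and Python's regex engine is an assumption here: Jibber AI may use a different engine with slightly different syntax.

```python
import re

# The time pattern from the example rules, written as a Python raw string
# (a raw string needs single backslashes; JSON needs them doubled).
TIME_PATTERN = re.compile(
    r"\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\b",
    re.IGNORECASE,  # mirrors "pattern_case_sensitive": false
)


def find_times(text):
    """Return the outermost capture group of every match in text."""
    return [match.group(1) for match in TIME_PATTERN.finditer(text)]


print(find_times("The time is 09:30 and the clock read 23:59, not 25:00."))
# prints ['09:30', '23:59']
```

The value `25:00` is skipped because the hour part of the pattern only accepts `00` to `23`.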
JSON File Properties
The JSON file must conform to the following rules:
The table below describes the various settings:
Setting | Description |
---|---|
name | The name of the custom rule. This should not include spaces. |
pattern | The regular expression to match the PII data. Only the top level regular expression groups are returned. You could use something like https://regex101.com/ to verify the pattern before you use it here. |
pattern_case_sensitive | Boolean value indicating if the pattern is case sensitive. |
pattern_must_be_near_keyword | Boolean value indicating if the pattern must be near a keyword in order to return the match. |
keywords_pattern | The regular expression to match the PII keywords. You can match multiple keywords using the or (`\|`) operator. You could use something like https://regex101.com/ to verify the keyword pattern before you use it here. |
keywords_case_sensitive | Boolean value indicating if the keyword pattern is case sensitive. |
keywords_distance | Integer value indicating the maximum distance, in characters, between a pattern match and a keyword. |
confidence | The default confidence level if a match is found. |
confidence_not_near_keyword | The confidence level to return if a match is found but is not near a keyword, and the pattern_must_be_near_keyword setting is absent or set to false. |
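To make the interaction between these settings concrete, here is a minimal Python sketch of one possible reading of the table. It is not Jibber AI's implementation. The function `apply_rule`, the default values for omitted settings, and the way distance is measured (the character gap between the match and the nearest keyword) are all assumptions made for illustration.

```python
import re


def _compile(pattern, case_sensitive):
    # Assumption: case-insensitive matching when the setting is false or absent.
    return re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)


def apply_rule(rule, text):
    """Return (value, confidence) pairs for one rule dict, as read from the table."""
    pattern = _compile(rule["pattern"], rule.get("pattern_case_sensitive", False))
    has_keywords = "keywords_pattern" in rule
    keyword_spans = []
    if has_keywords:
        keywords = _compile(
            rule["keywords_pattern"], rule.get("keywords_case_sensitive", False)
        )
        keyword_spans = [m.span() for m in keywords.finditer(text)]
    max_gap = rule.get("keywords_distance", 0)
    must_be_near = rule.get("pattern_must_be_near_keyword", False)

    results = []
    for match in pattern.finditer(text):
        start, end = match.span()
        # Assumed distance: characters between the match and a keyword (0 if they overlap).
        near = any(
            max(k_start - end, start - k_end, 0) <= max_gap
            for k_start, k_end in keyword_spans
        )
        # Return the top-level group when the pattern defines one.
        value = match.group(1) if pattern.groups else match.group(0)
        if not has_keywords or near:
            results.append((value, rule["confidence"]))
        elif not must_be_near:
            fallback = rule.get("confidence_not_near_keyword", rule["confidence"])
            results.append((value, fallback))
        # Otherwise the match is dropped: it must be near a keyword and is not.
    return results
```

Under this reading, a rule shaped like GENERAL_TIME_CONF_ON_PROXIMITY reports a time close to "time" or "clock" as medium and any other time as low, while a rule shaped like GENERAL_TIME_MUST_BE_NEAR_KEYWORD drops times that are not near a keyword.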
Setup
Jibber AI looks for a file in the location /jibber_data/[LANGUAGE_CODE]/custom_pii.json
The language code should match the Docker image you intend to run. For English, that would be /jibber_data/en/custom_pii.json
A typical approach to get the custom_pii.json file into the container is to mount a host folder as a volume.
Example: if you start the Docker container using docker run and the custom_pii.json file is on the host at /jibber_host_folder/en/custom_pii.json:
docker run -v /jibber_host_folder:/jibber_data -d -p 5000:8000 jibberhub/jibber_extractor_en:1.0
Replace `1.0` with the version of Jibber AI you want to run.
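The host folder has to mirror the layout the container expects, with a subfolder per language code. As a rough sketch, the Python helper below writes a list of rules to the matching host path. The function name `write_custom_pii` is hypothetical and not part of Jibber AI. Generating the file with `json.dump` also takes care of escaping: a regex backslash such as `\b` is written as `\\b`, which is what a JSON string requires.

```python
import json
from pathlib import Path


def write_custom_pii(host_folder, language_code, rules):
    """Write rules to <host_folder>/<language_code>/custom_pii.json and return the path."""
    target = Path(host_folder) / language_code / "custom_pii.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as handle:
        # json.dump escapes backslashes, so r"\b" is stored as "\\b" in the file.
        json.dump(rules, handle, indent=2)
    return target


# One rule taken from the example file above.
EXAMPLE_RULES = [
    {
        "name": "GENERAL_TIME_NO_KEYWORD",
        "pattern": r"\b((0[0-9]|1[0-9]|2[0-3]):[0-5][0-9])\b",
        "pattern_case_sensitive": False,
        "confidence": "high",
    }
]
```

Calling `write_custom_pii("/jibber_host_folder", "en", EXAMPLE_RULES)` would create /jibber_host_folder/en/custom_pii.json, which the volume mapping above exposes inside the container as /jibber_data/en/custom_pii.json.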
If you use Docker Compose, you can map the folder in the Docker Compose file:
version: "3"
services:
  jibber-service:
    image: jibberhub/jibber_extractor_en:1.0
    environment:
      - TOKEN=my-license-token-from-jibber-ai
    ports:
      - "8000:8000"
    volumes:
      - /jibber_host_folder:/jibber_data
Replace `1.0` with the version of Jibber AI you want to run.
There are various ways to mount volumes. Refer to the Docker, Docker Compose, or Kubernetes documentation on mounting volumes in a container for more information.