Custom Entity
Overview
Jibber AI currently supports 20 different types of entities depending on the language. But you may want to add your own.
You can supply training data in a single, easy-to-read file. On startup, Jibber AI automatically trains a new entity model from that data, ready to extract entities from text.
NOTE: This feature is only available with the Docker solution and is not available with Rapid API.
Example Training File
In the custom_entity.json file below, you will find an example of a training file:
{
"entities": [
"DRUG",
"CONDITION"
],
"nerPosition": "before",
"training": [
"<DRUG>Abilify</DRUG> is a medication that is used to treat <CONDITION>schizophrenia</CONDITION> and <CONDITION>bipolar disorder</CONDITION>.",
"I take <DRUG>Bupropion</DRUG> to manage my <CONDITION>depression</CONDITION>.",
"<DRUG>Carvedilol</DRUG> is a medication that is used to treat <CONDITION>heart failure</CONDITION> and <CONDITION>high blood pressure</CONDITION>.",
"I use <DRUG>Dexedrine</DRUG> to manage my <CONDITION>attention deficit hyperactivity disorder (ADHD)</CONDITION>.",
"<DRUG>Eliquis</DRUG> is a medication that is used to prevent <CONDITION>blood clots</CONDITION>.",
"The doctor prescribed <DRUG>Fluoxetine</DRUG> to manage my <CONDITION>depression</CONDITION> and <CONDITION>anxiety</CONDITION>.",
"<DRUG>Gabapentin</DRUG> is a medication that is used to treat <CONDITION>nerve pain</CONDITION> and <CONDITION>seizures</CONDITION>.",
"I take <DRUG>Hydrocodone</DRUG> to manage my <CONDITION>pain</CONDITION>.",
"<DRUG>Insulin glargine</DRUG> is a long-acting insulin that is used to manage <CONDITION>diabetes</CONDITION>.",
"<DRUG>Jardiamet</DRUG> is a medication that is used to manage <CONDITION>type 2 diabetes</CONDITION>.",
"The doctor advised some time off to manage the <CONDITION>stress</CONDITION>."
],
"validation": [
"The doctor prescribed <DRUG>Effexor</DRUG> to manage my <CONDITION>depression</CONDITION> and <CONDITION>anxiety</CONDITION>.",
"<DRUG>Famotidine</DRUG> is a medication that is used to manage <CONDITION>acid reflux</CONDITION> and <CONDITION>stomach ulcers</CONDITION>.",
"I take <DRUG>Gilenya</DRUG> to manage <CONDITION>multiple sclerosis</CONDITION>.",
"<DRUG>Hydrochlorothiazide</DRUG> is a medication that is used to manage <CONDITION>high blood pressure</CONDITION>.",
"The doctor prescribed <DRUG>Invega</DRUG> to manage my <CONDITION>schizophrenia</CONDITION>."
]
}
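Tagged strings like the ones in the training list above can be tedious to write by hand. The following is a minimal Python sketch, not part of Jibber AI, that wraps character spans of a plain sentence in entity tags. The function name `tag_example` and the `(start, end, entity)` span format are assumptions made for illustration.

```python
def tag_example(text, spans):
    """Wrap each (start, end, entity) character span of text in <ENTITY>...</ENTITY> tags."""
    parts = []
    cursor = 0
    for start, end, entity in sorted(spans):
        # Reject empty, out-of-range, or overlapping spans.
        if start < cursor or end > len(text) or start >= end:
            raise ValueError(f"invalid or overlapping span: {(start, end, entity)}")
        parts.append(text[cursor:start])
        parts.append(f"<{entity}>{text[start:end]}</{entity}>")
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


def span_of(text, phrase, entity):
    """Locate the first occurrence of phrase in text and return it as a span."""
    start = text.index(phrase)
    return (start, start + len(phrase), entity)


sentence = "I take Bupropion to manage my depression."
spans = [
    span_of(sentence, "Bupropion", "DRUG"),
    span_of(sentence, "depression", "CONDITION"),
]
print(tag_example(sentence, spans))
```

The resulting string can be pasted into the training or validation list of custom_entity.json.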
JSON File Properties
The JSON file must conform to the following rules:
The table below describes the various settings:
Setting | Description |
---|---|
entities | This is a list of entity names that Jibber AI will look for in the training and validation data examples. |
nerPosition | Jibber AI can run entity extraction to extract standard entities and also custom entities, but you can control this behaviour using the nerPosition setting:
|
training | Contains a list of examples. Each entity should be enclosed in HTML-style tags, with the tag name being the name of the entity. For example: "<INVENTOR>Thomas Edison</INVENTOR> invented the light bulb." NOTES:
|
validation | The validation examples are used by the machine learning component to assess how well the model is performing, and the results are fed back into the next iteration of training. |
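Because training starts automatically on startup, it can be worth checking the file before mounting it. The sketch below is an illustrative pre-flight check in Python, not a Jibber AI tool: it parses the JSON strictly and verifies that every tag in the training and validation examples is listed in entities and is properly closed. The function names are hypothetical, and treating nested tags as an error is an assumption, since the settings table does not say whether nesting is allowed.

```python
import json
import re

# Matches opening and closing tags such as <DRUG> and </DRUG>.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][A-Za-z0-9_]*)>")


def check_example(example, entities):
    """Return a list of problems found in one tagged example string."""
    problems = []
    open_tag = None
    for match in TAG_RE.finditer(example):
        closing, name = match.group(1) == "/", match.group(2)
        if name not in entities:
            problems.append(f"unknown entity tag: {name}")
        elif not closing:
            if open_tag is not None:
                # Assumption: nested entity tags are not supported.
                problems.append(f"<{name}> opened inside <{open_tag}>")
            open_tag = name
        elif open_tag != name:
            problems.append(f"</{name}> has no matching opening tag")
        else:
            open_tag = None
    if open_tag is not None:
        problems.append(f"<{open_tag}> is never closed")
    return problems


def check_training_file(text):
    """Parse the file contents and check every training and validation example."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    entities = set(data.get("entities", []))
    problems = []
    for section in ("training", "validation"):
        for index, example in enumerate(data.get(section, [])):
            for problem in check_example(example, entities):
                problems.append(f"{section}[{index}]: {problem}")
    return problems
```

A typical use would be to read custom_entity.json, pass its contents to `check_training_file`, and only mount the file into the container when the returned list is empty.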
Setup
Jibber AI looks for a file in the location /jibber_data/[LANGUAGE_CODE]/custom_entity.json
The language code should match the Docker image you intend to run. For English, that would be /jibber_data/en/custom_entity.json
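The path convention above can be sketched in Python as follows. This is an illustrative helper, not part of Jibber AI: `container_path` and `install_training_file` are hypothetical names, and the host folder layout simply mirrors the container layout so that the whole folder can be mounted at /jibber_data.

```python
import shutil
from pathlib import Path, PurePosixPath

CONTAINER_ROOT = PurePosixPath("/jibber_data")
FILE_NAME = "custom_entity.json"


def container_path(language_code):
    """Path where the container looks for the training file."""
    return CONTAINER_ROOT / language_code / FILE_NAME


def install_training_file(source, host_folder, language_code):
    """Copy a training file to <host_folder>/<language_code>/custom_entity.json."""
    target = Path(host_folder) / language_code / FILE_NAME
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target


print(container_path("en"))
```

For example, `install_training_file("my_entities.json", "/jibber_host_folder", "en")` would place the file where a container with /jibber_host_folder mounted at /jibber_data expects it.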
A typical approach to get the custom_entity.json file in the container is:
Example: if you start the Docker container with docker run and the custom_entity.json file is on the host at /jibber_host_folder/en/custom_entity.json, mount the host folder as a volume:
docker run -v /jibber_host_folder:/jibber_data -d -p 5000:8000 jibberhub/jibber_extractor_en:1.0
Replace `1.0` with the version of Jibber AI you want to run.
If using Docker Compose, you can map the folder in the Docker Compose file.
version: "3"
services:
  jibber-service:
    image: jibberhub/jibber_extractor_en:1.0
    environment:
      - TOKEN=my-license-token-from-jibber-ai
    ports:
      - "8000:8000"
    volumes:
      - /jibber_host_folder:/jibber_data
Replace `1.0` with the required version of Jibber AI.
There are various ways to mount volumes. Refer to the Docker, Docker Compose, or Kubernetes documentation on mounting volumes in a container for more information.
Training
Jibber AI automatically looks for the custom_entity.json file on startup and checks if there's a model available matching the data in the file.
If a model is not found, it will automatically start training a model based on the contents of custom_entity.json. The model will be saved to
Training can take a while to run. If multiple Jibber AI containers start up at the same time (for example, when you are load balancing), run the training on one container first by starting a single container in train-only mode.
When run in train-only mode, the container starts up, checks for a training file, and trains a model if required. The output model is written to disk, and the service then shuts down. You can use that model when starting the containers in normal mode.
By doing it this way, you can train and test in isolation and then copy the model to the production system when you are ready.
To start the container in train only mode, you can use the following Docker command:
docker run --rm -v /jibber_host_folder:/jibber_data jibberhub/jibber_extractor_en:1.0 python -m train
Replace `1.0` with the version you require.