How can I train a model on both text and numeric data?

Posted by boringblobking@reddit | LocalLLaMA

Is there a standard way of doing this? For example, say you have patient data taken from GP records. The data includes things like age, gender, and whether they smoke, which are discrete values, but it also contains free text describing their conditions etc. How would you train a single model on all of this data to make inferences?

One idea I had was to build a vanilla neural network that takes the discrete data as input features and, for the text, use BERT to encode it and feed the resulting embeddings into the same network alongside them. Is this likely to work? Is there a more standard way of dealing with such situations?
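
To make the idea concrete, here is a rough PyTorch sketch of that setup, not a tested recipe. Everything in it is an assumption for illustration: the class name `TabularTextClassifier`, the field layout (one numeric column, a gender category, a smoker flag), the two-class target, and the random tensors standing in for BERT embeddings (768 wide, matching BERT-base's hidden size). In practice the text embeddings would come from an actual encoder rather than `torch.randn`.

```python
import torch
import torch.nn as nn


class TabularTextClassifier(nn.Module):
    """Hypothetical fusion model: tabular fields plus a precomputed text embedding."""

    def __init__(self, n_numeric=1, n_genders=3, text_dim=768, hidden=64, n_classes=2):
        super().__init__()
        # Learned embedding for the categorical gender field (assumed 3 categories).
        self.gender_emb = nn.Embedding(n_genders, 4)
        # Shrink the wide text vector so it does not swamp the few tabular features.
        self.text_proj = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        # Input width: numeric + gender embedding (4) + smoker flag (1) + projected text.
        self.head = nn.Sequential(
            nn.Linear(n_numeric + 4 + 1 + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, numeric, gender, smoker, text_emb):
        parts = [
            numeric,                        # (batch, n_numeric)
            self.gender_emb(gender),        # (batch, 4)
            smoker.unsqueeze(1).float(),    # (batch, 1)
            self.text_proj(text_emb),       # (batch, hidden)
        ]
        # Concatenate everything into one feature vector per patient.
        return self.head(torch.cat(parts, dim=1))


torch.manual_seed(0)
batch = 8
numeric = torch.randn(batch, 1)            # e.g. standardized age (made-up data)
gender = torch.randint(0, 3, (batch,))     # category index
smoker = torch.randint(0, 2, (batch,))     # 0 or 1
text_emb = torch.randn(batch, 768)         # stand-in for BERT sentence embeddings
labels = torch.randint(0, 2, (batch,))     # made-up binary target

model = TabularTextClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on the fake batch.
logits = model(numeric, gender, smoker, text_emb)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Whether to keep the text encoder frozen and precompute the embeddings (as assumed here) or fine-tune it end to end is a separate choice; the frozen version is cheaper and is the one this sketch reflects.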