Roadmap & Development
We're continuing to innovate on Octavia for Groups. Our development objectives for the coming months are outlined below.
Model 1 was our first model, a proof of concept trained on 5,000 real and simulated examples of conversations. It was only approximately 95% reliable in our testing, but it proved that it was possible to moderate a community with AI.
Model 1's main weakness was that it could not tell the difference between genuine enthusiasm for a project and shilling. It also handled soft shilling poorly.
Model 2 uses 10x more training data and is a much larger model. This allowed us to achieve 99.5% accuracy with very few false positives. Model 2's main weakness is that content like "This token is so sexy" is mislabeled as a potential sexual content violation.
Model 0003 - Not Yet Trained
Model 3 has significantly more training data and is intended to solve the remaining problems. It is also proficient in some other languages; however, further testing and training are required to see this through to completion.
We're working on developing custom rules, allowing you to specify your own rule set in plain English to fine-tune the AI moderation how you want.
Unprompted Automated Support
Octavia will soon determine when and where she can be most useful, and will be able to respond automatically to users needing help when (and only when) she knows she can provide a clear and concise answer. Just like your community management team would!
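The "when (and only when)" behaviour described above can be pictured as a simple gate in front of the reply step. The following is a minimal sketch under stated assumptions, not Octavia's actual implementation: the `DraftAnswer` type, the `confidence` score, and both thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftAnswer:
    """A candidate reply plus a hypothetical confidence score in [0.0, 1.0]."""
    text: str
    confidence: float


def maybe_reply(
    draft: Optional[DraftAnswer],
    min_confidence: float = 0.9,  # assumed threshold, not a real Octavia setting
    max_chars: int = 400,         # assumed proxy for "clear and concise"
) -> Optional[str]:
    """Return the reply text only if it passes every gate, otherwise None."""
    if draft is None:
        return None  # nothing to say
    if draft.confidence < min_confidence:
        return None  # not sure enough, so stay silent
    if len(draft.text) > max_chars:
        return None  # too long to count as concise
    return draft.text


confident = DraftAnswer("Use the /rules command to see the group rules.", 0.97)
unsure = DraftAnswer("Maybe try asking an admin?", 0.40)
```

Staying silent is the default here: a reply is only sent when every check passes, which mirrors the idea of responding only when a clear answer is available.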
Audio Abuse Detection
Just like the regular Octavia Assistant, Octavia for Groups will need to transcribe all audio into text she can understand. Once transcribed, it will be moderated just like any other text message.
Image Abuse Detection
We never want any of our staff, or anyone else's staff, to be exposed to awful, abusive content. That’s why Octavia automatically compares all images posted to Octavia for Groups communities against known child exploitation hash datasets provided by organisations like the National Center for Missing and Exploited Children and Industry Hash Sharing. This technology uses fuzzy hashing to detect CSAM for moderation. When a match is found, the user is immediately muted and the content is deleted. Because this process is based on fuzzy hashing, false positives are possible. To reduce their impact, we plan to also process the image through the LLaVA image/language model. If LLaVA also detects that the image is abusive, we ban the user permanently from all groups that Octavia manages and notify the admins of the group.
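The two-stage flow described above (a fuzzy hash match first, then a second model check before a permanent ban) can be sketched roughly as follows. This is an illustration under stated assumptions only: hashes are modelled as plain integers compared by Hamming distance, the `max_distance` threshold and the action names are hypothetical, and `second_check` merely stands in for the planned image/language model step. Real hash datasets use their own formats and matching rules, which are not shown here.

```python
from typing import Callable, Iterable, List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def matches_known_hash(
    image_hash: int,
    known_hashes: Iterable[int],
    max_distance: int = 4,  # assumed tolerance for a "fuzzy" match
) -> bool:
    """True if the image hash is within max_distance bits of any known hash."""
    return any(hamming_distance(image_hash, k) <= max_distance for k in known_hashes)


def moderate_image(
    image_hash: int,
    known_hashes: Iterable[int],
    second_check: Callable[[], bool],  # hypothetical stand-in for the model check
    max_distance: int = 4,
) -> List[str]:
    """Return the list of (hypothetical) actions to take for one image."""
    if not matches_known_hash(image_hash, known_hashes, max_distance):
        return []  # no hash match: nothing to do at this stage
    # Stage 1: a fuzzy match alone triggers the immediate, reversible actions.
    actions = ["delete_content", "mute_user"]
    # Stage 2: only escalate when the second check agrees with the hash match.
    if second_check():
        actions += ["ban_user_all_groups", "notify_admins"]
    return actions


known = [0b1111000011110000]
near_match = 0b1111000011110001  # differs from the known hash by one bit
no_match = 0b0000111100001111    # differs from the known hash by sixteen bits
```

Keeping the permanent ban behind the second check means a fuzzy-hash false positive results only in a mute and a deletion, which an admin can reverse.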
Image Based Rule Breaking
Additionally, users may share abusive images that are intended to get around AI moderation. To address this, we are building a LLaVA model to process images and understand them in context, covering both any text they contain and their visual content.
If we detect possible abuse, the image is deleted, the user is reported to the admins, and the user is prevented from uploading photos until the admins can evaluate and take action.