Unfortunately I don't know much about this topic, and I don't know if anyone here does. That's especially true for Asia/Malaysia. I think it would be great to apply these analyses to new language groups like that, though! We'd like to help, but I don't know if we have the information.
From searching the web, I see a lot of information, though!
Have you seen this webpage?
http://www.linguistics.ucsb.edu/faculty/stgries/teaching/groningen/
There are a number of PDF articles linked there, so that's a good place to start. It also has information to help you start applying the analyses yourself.
Using R can be a challenge when you first start, but it's a great option when you want to analyze large amounts of text. For what you're doing, working with large corpora is the best way, and for Malaysian you'd probably need to create your own corpus (for example, from online newspapers or blogs). That could be done in R, and searching for collexemes in a custom corpus is relatively easy (compared to something like part-of-speech tagging!).
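To give a rough idea of what a collexeme search on a custom corpus involves, here is a minimal sketch. It's in Python rather than R, just to keep it self-contained; the same steps carry over to R. Everything in it is an assumption for illustration: the six-sentence toy corpus is invented, the "construction" is simply the word slot right after a node word, the corpus size is counted in tokens (a simplification), and the names `fisher_right_tail` and `collexemes` are my own, not from any package. It ranks slot fillers by a one-sided Fisher exact test, which is the usual association measure in collostructional analysis.

```python
from collections import Counter
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric distribution, i.e. the
    probability of seeing the word in the slot at least `a` times by chance.
    """
    n = a + b + c + d
    row1 = a + b          # total occurrences of the word
    col1 = a + c          # total occurrences of the slot
    denom = comb(n, col1)
    max_a = min(row1, col1)
    return sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, max_a + 1)
    ) / denom


def collexemes(sentences, node):
    """Rank words that fill the slot right after `node` by attraction."""
    word_freq = Counter()
    slot_freq = Counter()
    for sent in sentences:
        tokens = sent.lower().split()  # naive whitespace tokenization
        word_freq.update(tokens)
        for left, right in zip(tokens, tokens[1:]):
            if left == node:
                slot_freq[right] += 1

    n = sum(word_freq.values())          # corpus size in tokens (simplification)
    slot_total = sum(slot_freq.values())
    results = []
    for word, a in slot_freq.items():
        b = word_freq[word] - a          # word outside the slot
        c = slot_total - a               # other words in the slot
        d = n - a - b - c                # everything else
        results.append((word, a, fisher_right_tail(a, b, c, d)))
    # Smallest p-value first = most strongly attracted collexeme
    return sorted(results, key=lambda r: r[2])


# Invented toy corpus, for illustration only
toy_corpus = [
    "saya makan nasi",
    "dia makan nasi",
    "mereka makan roti",
    "saya minum kopi",
    "dia minum air",
    "saya beli nasi",
]

ranked = collexemes(toy_corpus, "makan")
for word, freq, p in ranked:
    print(f"{word}\t{freq}\t{p:.4f}")
```

With a real corpus you'd swap the toy list for your scraped newspaper or blog text and replace the "word after the node" slot with whatever pattern defines your construction. The counting and the 2x2 table stay the same.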
I recently read (several chapters from) a book by Zeldes (2012), "Productivity in Argument Selection".
https://www.google.com/search?q=productivity+in+argument+selection
There's also a link there to a PDF of an in-press article, I think, so you could read that.
The basic point of the book is that our grammatical knowledge also includes frequency information, such as how often we use certain forms with new arguments. It's interesting and new. It hasn't been tested in Asian languages as far as I know, so you could consider looking into that if it interests you.