I'm building a JavaScript chat bot for something, and I ran into an issue:
I use string.split() to tokenize my input like this:
tokens = message.split(" ");

Now my problem is that I need 4 tokens to make up the command and 1 token for the message. When I send this: !finbot msg testuser 12345 Hello sir, this is a test message

these are the tokens I get: ["!finbot", "msg", "testuser", "12345", "Hello", "sir,", "this", "is", "a", "test", "message"]

How can I make it come out like this instead: ["!finbot", "msg", "testuser", "12345", "Hello sir, this is a test message"]

The reason I want it like this is that the first token (tokens[0]) is the call, the second (tokens[1]) is the command, the third (tokens[2]) is the user, the fourth (tokens[3]) is the password (it's a password-protected message thing... just for fun) and the fifth (tokens[4]) is the actual message.
Right now, it only sends "Hello", because I only use the 5th token.
The reason I can't just do message = tokens[4] + tokens[5]; and so on is that messages don't always have the same number of words.

I hope I gave enough information for you to help me. If you guys know the answer (or know a better way to do this) please tell me so.

Thanks!

asked Aug 27, 2016 at 19:16 by Finlay Roelofs

4 Answers

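Use the limit parameter of String.split. Note that in JavaScript the limit simply truncates the result (the rest of the string is discarded, not kept in the last element), so this gives you only the first four tokens: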

tokens = message.split(" ", 4);

From there, you just need to get the message from the original string. Reusing the nthIndex() function from another answer, you can get the index of the 4th occurrence of the space character and take whatever comes after it.

var text = message.substring(nthIndex(message, ' ', 4) + 1); // + 1 skips the space itself

Or if you need it in your tokens array:

tokens[4] = message.substring(nthIndex(message, ' ', 4) + 1);
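The linked nthIndex() helper isn't reproduced on this page. Below is a minimal sketch of one, assuming the (string, pattern, n) signature implied by the call above, combined with the limited split. Both nthIndex and parseCommand are illustrative names, not built-ins.

```javascript
// Hypothetical helper: index of the nth occurrence of `pattern` in `str`,
// or -1 if there are fewer than n occurrences.
function nthIndex(str, pattern, n) {
  let index = -1;
  for (let i = 0; i < n; i++) {
    // Start searching one character past the previous hit.
    index = str.indexOf(pattern, index + 1);
    if (index === -1) return -1;
  }
  return index;
}

// Hypothetical parser: four tokens from split(), plus the rest of the string.
function parseCommand(message) {
  // In JavaScript the limit drops everything after the 4th piece.
  const tokens = message.split(" ", 4);
  const fourthSpace = nthIndex(message, " ", 4);
  // Fewer than four spaces: there is no message part to add.
  if (fourthSpace === -1) return tokens;
  // + 1 skips the space itself so the message doesn't start with a blank.
  tokens.push(message.substring(fourthSpace + 1));
  return tokens;
}
```

For the question's input, parseCommand("!finbot msg testuser 12345 Hello sir, this is a test message") returns ["!finbot", "msg", "testuser", "12345", "Hello sir, this is a test message"].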

I would probably start by taking the string like you did, and tokenizing it:

const myInput = string.split(" ");

If you're using JS ES6, you should be able to do something like:

const [call, command, userName, password, ...messageTokens] = myInput;
const message = messageTokens.join(" ");

However, if you can't use rest syntax (...) in destructuring, you can do the same thing with shift(), which removes and returns the first element of myInput (it's just much more verbose):

const call = myInput.shift();
const command = myInput.shift();
const userName = myInput.shift();
const password = myInput.shift();
const message = myInput.join(" ");

If you need them as an array again, you can just put those parts back together:

const output = [call, command, userName, password, message];
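Putting the destructuring version together, here is a minimal sketch of a parser that returns named fields. The function name parseFinbot and the object shape are illustrative, not from the question.

```javascript
// Hypothetical parser for "<call> <command> <user> <password> <message...>".
function parseFinbot(input) {
  const [call, command, userName, password, ...messageTokens] = input.split(" ");
  // Rejoin the remaining words with the same single-space separator.
  const message = messageTokens.join(" ");
  return { call, command, userName, password, message };
}
```

Because the message is split and rejoined on the same single space, it round-trips exactly, including any double spaces inside it. Extra spaces among the first four tokens, however, produce empty-string tokens and shift the fields. With fewer than five words, the missing fields are undefined and message is "".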

If you can use ES6, you can do:

let [c1, c2, c3, c4, ...rest] = input.split(" ");
let msg = rest.join(" ");
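The same idea can be parameterized on the number of leading tokens, using slice() instead of a rest element. splitHead is an illustrative name, not a built-in.

```javascript
// Hypothetical helper: the first n space-separated tokens, then the remainder
// of the input (possibly an empty string) as one final element.
function splitHead(input, n) {
  const parts = input.split(" ");
  const head = parts.slice(0, n);
  const rest = parts.slice(n).join(" ");
  return [...head, rest];
}
```

splitHead(input, 4) gives the five-element array from the question. Note that the last element is always present: for input with no more than n tokens it is "".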

You could fall back on a regular expression, since you defined your format as "4 tokens of non-space characters separated by spaces, followed by a message":

function tokenize(msg) {
    return (/^(\S+) (\S+) (\S+) (\S+) (.*)$/.exec(msg) || []).slice(1, 6);
}

This has the possibly unwanted behaviour of returning an empty array if msg does not match the format. If that's not acceptable, remove the || [] and handle the null that exec() returns on a failed match. The number of tokens is also fixed at 4, plus the required message. For a more generic approach you could use:
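A sketch of the "remove the || [] and handle accordingly" variant: it returns null on a mismatch so the caller must deal with it. It also swaps .* for [\s\S]*, because . does not match line breaks, so with the original pattern a multi-line message would fail to match. tokenizeStrict is a hypothetical name.

```javascript
// Hypothetical variant of tokenize(): null instead of [] when the input
// doesn't match the expected format.
function tokenizeStrict(msg) {
  // [\s\S]* (unlike .*) also matches line breaks, so multi-line messages work.
  const match = /^(\S+) (\S+) (\S+) (\S+) ([\s\S]*)$/.exec(msg);
  // No match: e.g. fewer than four tokens, or no space after the fourth one.
  return match === null ? null : match.slice(1, 6);
}

const parsed = tokenizeStrict("!finbot msg testuser");
if (parsed === null) {
  // The caller decides how to report a malformed command.
  console.log("Usage: !finbot <command> <user> <password> <message>");
}
```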

function tokenizer(msg, nTokens) {
    var token = /(\S+)\s*/g, tokens = [], match;

    while (nTokens && (match = token.exec(msg))) {
        tokens.push(match[1]);
        nTokens -= 1; // or nTokens--, whichever is your style
    }

    if (nTokens) {
        // exec() returned null, could not match enough tokens
        throw new Error('EOL when reading tokens');
    }

    tokens.push(msg.slice(token.lastIndex));
    return tokens;
}

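This relies on the global (g) flag of JavaScript regexps: each exec() call on the same string resumes searching at the regexp's lastIndex property. Because the pattern also consumes the whitespace after each token, lastIndex ends up at the start of the message after the loop, and the rest of the string is sliced from there.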

Given

var msg = '!finbot msg testuser 12345 Hello sir, this is a test message';

then

> tokenizer(msg, 4)
[ '!finbot',
  'msg',
  'testuser',
  '12345',
  'Hello sir, this is a test message' ]
> tokenizer(msg, 3)
[ '!finbot',
  'msg',
  'testuser',
  '12345 Hello sir, this is a test message' ]
> tokenizer(msg, 2)
[ '!finbot',
  'msg',
  'testuser 12345 Hello sir, this is a test message' ]

Note that an empty string is always appended to the returned array, even if the given message string contains only tokens:

> tokenizer('asdf', 1)
[ 'asdf', '' ]  // An empty "message" at the end
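To see the lastIndex mechanism that tokenizer() relies on in isolation, here is a small demo with the same token pattern (the variable names are illustrative):

```javascript
// A global (g) regex remembers where its last match ended in lastIndex,
// and the next exec() call resumes searching from there.
const tokenRe = /(\S+)\s*/g;
const text = "!finbot msg Hello sir";

const first = tokenRe.exec(text);       // first[1] is "!finbot"
const afterFirst = tokenRe.lastIndex;   // 8: \s* also consumed the space after "!finbot"
const second = tokenRe.exec(text);      // resumes at index 8, so second[1] is "msg"
const afterSecond = tokenRe.lastIndex;  // 12
const rest = text.slice(afterSecond);   // "Hello sir"
```

Since tokenizer() creates its regex literal inside the function, lastIndex starts at 0 on every call. A single global regex shared across calls would carry its lastIndex from one call into the next.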
