Hi all
I'm trying to remove the URLs from strings generated by tweets (using twitter4j). I've been trying all sorts of things with no luck.
So let's say I have a string containing
"Hello, follow me! http://google.com"
or
"This is a tweet with an instagram photo https://instagr.am/blablabla"
I want to strip just the url from that string, but that url may be anything.
I've been trying things with regex, and other instructions I've found on the web, but I'm really lost. Could anyone point me in the right direction?
Answers
Don't...
:)
I mean Twitter already provides you with a lot of stuff for each "entity", like the URL in both long and short versions. And twitter4j does handle those for you:
https://twitter4j.org/javadoc/twitter4j/EntitySupport.html
I have some code at home I can post later if you need.
It is something like:
If no URL is present in the Status it returns null, I think.
Thanks for the reply! So then can I use getURLEntities() to remove the url from the string?
I haven't quite figured out how to remove a string from a string yet. Is it possible without splitting the string into separate words first?
Also, this might be more appropriate in the Library questions department, since the solution seems to be part of twitter4j.
http://docs.Oracle.com/javase/8/docs/api/java/lang/String.html#replace-java.lang.CharSequence-java.lang.CharSequence-
The entities are not for extracting a URL from a String. You get a string that is just the URL, directly from Twitter. See:
https://dev.twitter.com/overview/api/entities-in-twitter-objects
To deal with Strings yourself, follow GoToLoop's path. But as you noticed, dealing with this from scratch is not trivial, so Twitter already supplies you with this stuff in an easy way: the entities.
Alright, thanks for pointing me in the right direction guys! I've got this! I'll post my results when I get there :)
@GoToLoop, I know of the existence of replace(), but it can only replace characters with something else (or nothing), it won't work with a String. Is there a workaround? It would really facilitate the process of stripping the urls from tweets...
The other way to go would involve splitting the tweet String into words, comparing each word to each possible URL entity, then putting together only the words that don't match a URL. I've already tried constructing something like this, but that didn't really work out so well.
Together with a URL entity there is an array of ints containing the start and end index of that specific URL in the tweet string (the end being the index just past the URL's last char), like:
This is a tweet http://someurl.com
Would have [16, 34] (if I counted right :)
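If you want to use those indices directly, here is a minimal sketch of cutting index ranges out of a String. It assumes each pair is [start, end) as in the example above, with end being one past the URL's last char. The class and method names (IndexStrip, removeRanges) are made up for illustration and the ranges are passed in as plain int pairs, so nothing here is twitter4j API.

```java
import java.util.Arrays;
import java.util.Comparator;

class IndexStrip {
    // Deletes every [start, end) range from text, then trims edge whitespace.
    // Ranges are processed from the back of the string to the front, so
    // deleting one range never shifts the indices of the ranges still to do.
    static String removeRanges(String text, int[][] ranges) {
        int[][] sorted = ranges.clone(); // leave the caller's ordering alone
        Arrays.sort(sorted, Comparator.comparingInt((int[] r) -> r[0]).reversed());
        StringBuilder sb = new StringBuilder(text);
        for (int[] r : sorted) {
            sb.delete(r[0], r[1]); // end index is exclusive
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String tweet = "This is a tweet http://someurl.com";
        System.out.println(removeRanges(tweet, new int[][] { {16, 34} }));
        // prints "This is a tweet"
    }
}
```

Going back to front is what lets this handle URLs in the middle of a tweet without losing the text between them.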
That is another brilliant hint. Thanks a lot!
replace() is a method of class String. So I don't get it when you say it won't work on one! :-@
If you've got ahold of the URL String and wanna remove it outta the bigger String, it's as simple as:
biggerStr = biggerStr.replace(urlString, "");
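To make the two flavours of replace() concrete, here is a small sketch. String has a char overload and a CharSequence overload, and the second one takes whole Strings and matches them literally (no regex). The class and helper names (ReplaceDemo, stripUrl) are just for illustration.

```java
class ReplaceDemo {
    // Removes every literal occurrence of url from tweet,
    // then trims the whitespace left over at the edges.
    static String stripUrl(String tweet, String url) {
        return tweet.replace(url, "").trim();
    }

    public static void main(String[] args) {
        // char overload: swaps single characters
        System.out.println("a-b-c".replace('-', '+')); // prints "a+b+c"
        // CharSequence overload: works on whole Strings
        System.out.println(stripUrl("Hello, follow me! http://google.com", "http://google.com"));
        // prints "Hello, follow me!"
    }
}
```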
Ok, this is what I've got working atm:
This removes all consecutive urls in a tweet. They are usually at the end, so I could do without the urlEnd parameter and thus avoid having to combine the tweetTarra strings, but just in case I've got it like this. Text in between the first and last url is lost though. I think I could do this prettier and better with a for-loop, but I'm not quite there yet :)
something like this:
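A minimal sketch of that kind of loop, not the code from the post: it assumes you already have the URL strings exactly as they appear in the tweet text (with twitter4j they would presumably come from the URL entities) and passes them in as a plain String[]. UrlStripper and stripUrls are hypothetical names.

```java
class UrlStripper {
    // Removes each given URL from the tweet text.
    // urls is a stand-in for the strings collected from the tweet's URL
    // entities; they must match the tweet text exactly for replace() to hit.
    static String stripUrls(String tweet, String[] urls) {
        if (urls == null) {
            return tweet; // nothing to strip
        }
        String result = tweet;
        for (String url : urls) {
            result = result.replace(url, "");
        }
        // Collapse the gaps the removed URLs leave behind.
        return result.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String tweet = "This is a tweet with an instagram photo https://instagr.am/blablabla";
        String[] urls = { "https://instagr.am/blablabla" };
        System.out.println(stripUrls(tweet, urls));
        // prints "This is a tweet with an instagram photo"
    }
}
```

Unlike cutting from the first url to the last, this keeps any text that sits between two urls.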
Nice job, @_vk! A shorter solution using my replace() hint! :)>-
Just 1 tip: No need to check for length. As long as URLs isn't null, for ( : ) won't crash! ;)) :-c :)
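A quick way to see that for yourself, as a sketch with made-up names (EmptyLoopDemo, countIterations): an enhanced for over an empty array simply runs zero times, while a null array throws a NullPointerException.

```java
class EmptyLoopDemo {
    // Counts how many times the enhanced for loop body runs.
    static int countIterations(String[] urls) {
        int n = 0;
        for (String url : urls) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Empty array: the loop body never runs, no length check needed.
        System.out.println(countIterations(new String[0])); // prints 0

        // Null array: the loop itself throws.
        try {
            countIterations(null);
        } catch (NullPointerException e) {
            System.out.println("null array throws NullPointerException");
        }
    }
}
```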
Thanks, GTL